PythonJob to run a Python function on a remote computer

The PythonJob task is a built-in task that allows the user to run a Python function on a remote computer.

File Handling

Remote Folder

List the files in the remote folder:

$ ls
aiida.out        inputs.pickle   _scheduler-stderr.txt  script.py
_aiidasubmit.sh  results.pickle  _scheduler-stdout.txt

Each task creates a script.py file on the remote computer which, as sketched after this list, includes:

  • The function definition.

  • Loading inputs from inputs.pickle.

  • Running the function with the loaded inputs.

  • Saving the results into results.pickle.
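For illustration, a generated script.py might look roughly like the following. This is a minimal sketch of those four steps, not the exact file that PythonJob produces:

import pickle

# The function definition, copied into the script.
def add(x, y):
    return x + y

# Load the inputs pickled by the WorkGraph engine.
with open("inputs.pickle", "rb") as handle:
    inputs = pickle.load(handle)

# Run the function with the loaded inputs.
result = add(**inputs)

# Save the results so they can be retrieved and parsed later.
with open("results.pickle", "wb") as handle:
    pickle.dump(result, handle)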

About the data

For a CalcJob, the input data needs to be an AiiDA data node; however, we don’t require the user to install AiiDA or the same Python environment on the remote computer. This means we should pass plain Python data as arguments when running the Python function on the remote computer. The WorkGraphEngine handles this data transformation when preparing and launching the CalcJob.

All AiiDA data passed to the function should have a value attribute corresponding to its raw Python data. PickledData, Int, Float, Str and Bool fulfill this requirement, while List, Dict and StructureData do not.
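A short illustration of this distinction, assuming the AiiDA 2.x API:

from aiida import orm, load_profile

load_profile()

orm.Int(5).value        # 5, a plain Python int
orm.Float(1.5).value    # 1.5
orm.Str("abc").value    # 'abc'
orm.Bool(True).value    # True

# List, Dict and StructureData expose their content through methods
# such as get_list() and get_dict() rather than a plain value attribute.
orm.List([1, 2]).get_list()   # [1, 2]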

Inputs and Outputs

Inputs for each task are pickled into the inputs.pickle file. Outputs from each task are pickled into the results.pickle file.

Parent Folder

The parent_folder parameter allows a task to access the output files of a parent task. This feature is particularly useful when you want to reuse data generated by a previous computation in subsequent computations. For example, a multiply task can read a result.txt file created by an earlier add task.
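A minimal sketch of that pattern, building on the add/multiply tasks used in the example below. The path parent_folder/result.txt is an assumption about where the copied files land in the working directory; the subfolder name is controlled by the parent_folder_name input described later:

from aiida_workgraph import WorkGraph, task

@task()
def add(x, y):
    z = x + y
    # persist the result so a later task can read it
    with open("result.txt", "w") as f:
        f.write(str(z))
    return z

@task()
def multiply(y):
    # read the value written by the parent task; we assume the copied
    # files land in a "parent_folder" subdirectory of the working directory
    with open("parent_folder/result.txt", "r") as f:
        x = int(f.read())
    return x * y

wg = WorkGraph("parent_folder_example")
wg.add_task(add, name="add", run_remotely=True)
wg.add_task(
    multiply,
    name="multiply",
    parent_folder=wg.tasks["add"].outputs["remote_folder"],
    run_remotely=True,
)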

Example

from aiida_workgraph import WorkGraph, task
from aiida import load_profile

load_profile()

# define add task
@task()
def add(x, y):
    return x + y

# define multiply task
@task()
def multiply(x, y):
    return x * y

wg = WorkGraph("first_workflow")
wg.add_task(add, name="add", run_remotely=True)
wg.add_task(multiply, name="multiply", x=wg.tasks["add"].outputs["result"], run_remotely=True)


#------------------------- Submit the calculation -------------------
wg.submit(
    inputs={
        "add": {"x": 2, "y": 3, "computer": "localhost"},
        "multiply": {"y": 4, "computer": "localhost"},
    },
    wait=True,
)
print("\nResult of multiply is {} \n\n".format(wg.tasks["multiply"].outputs['result'].value))

Using parent_folder_name for Data Continuity

AiiDA runs each job in a separate folder, so a calculation sometimes needs data from previous calculations to be accessible in its working directory. This has traditionally been managed with the parent_folder input, which specifies a source from which to copy the necessary data. The new parent_folder_name input streamlines this process by letting users define a subfolder within the working directory in which to organize these files.

Example Usage: NSCF Calculation

In an NSCF calculation, which depends on outputs from a preceding SCF calculation, the workflow can be configured as follows:

nscf_task = wg.add_task(
    pw_calculator,
    name="nscf",
    parent_folder=scf_task.outputs["remote_folder"],
    parent_output_folder="out",
    parent_folder_name="out",
    run_remotely=True,
)

This setup copies the entire content of the out folder from the SCF calculation’s remote folder into an out subfolder within the working directory of the NSCF job.

Handling Multiple Data Sources with copy_files

The traditional parent_folder method is limited when calculations require inputs from multiple remote directories. For instance, Bader charge analysis with Quantum ESPRESSO may need both valence and all-electron density data from different calculations.

The new copy_files input allows flexible linkage to multiple remote folders. It copies the necessary files from diverse sources into a single job’s working directory, under subfolder names generated dynamically from the task and socket names (for example, pp_valence_remote_folder for the remote_folder output of the pp_valence task).

Example Usage: Bader Charge Analysis

For a Bader analysis requiring different charge density files:

bader_task = wg.add_task(
    bader_calculator,
    name="bader",
    command=bader_command,
    charge_density_folder="pp_valence_remote_folder",
    reference_charge_density_folder="pp_all_remote_folder",
    run_remotely=True,
)
wg.add_link(pp_valence.outputs["remote_folder"], bader_task.inputs["copy_files"])
wg.add_link(pp_all.outputs["remote_folder"], bader_task.inputs["copy_files"])

The bader_calculator function then uses the specified charge density data:

import os

@task()
def bader_calculator(
    command: str = "bader",
    charge_density_folder: str = "./",
    charge_density_filename: str = "charge_density.cube",
    reference_charge_density_folder: str = "./",
    reference_charge_density_filename: str = "charge_density.cube",
):
    """Run Bader charge analysis."""
    command_str = f"{command} {charge_density_folder}/{charge_density_filename}"
    if reference_charge_density_filename:
        command_str += f" -ref {reference_charge_density_folder}/{reference_charge_density_filename}"
    os.system(command_str)

    # Parse the atomic charges from ACF.dat, skipping the two header lines
    # and the four summary lines at the end; the charge is the fifth column.
    with open("ACF.dat", "r") as f:
        lines = f.readlines()
        charges = [float(line.split()[4]) for line in lines[2:-4]]

    return charges