PythonJob

The PythonJob task is a built-in task that allows users to run a Python function on a remote computer.
File Handling
Remote Folder
List the files in the remote folder:
$ ls
aiida.out inputs.pickle _scheduler-stderr.txt script.py
_aiidasubmit.sh results.pickle _scheduler-stdout.txt
Each task creates a script.py file on the remote computer, which includes:

- The function definition.
- Loading inputs from inputs.pickle.
- Running the function with the loaded inputs.
- Saving the results into results.pickle.
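To make the structure concrete, here is an illustrative sketch of what such a generated script amounts to. The real script.py is produced by the engine; the function and input values here are only for demonstration:

```python
import pickle

# Stand-in for the inputs.pickle that the engine uploads before submission:
with open("inputs.pickle", "wb") as f:
    pickle.dump({"x": 2, "y": 3}, f)

# --- the generated script then roughly does the following ---

# 1. The function definition, embedded in the script
def add(x, y):
    return x + y

# 2. Load inputs from inputs.pickle
with open("inputs.pickle", "rb") as f:
    inputs = pickle.load(f)

# 3. Run the function with the loaded inputs
result = add(**inputs)

# 4. Save the results into results.pickle
with open("results.pickle", "wb") as f:
    pickle.dump(result, f)
```

Because only pickle and plain Python are involved, the remote machine needs neither AiiDA nor the full local environment.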
About the data

For a CalcJob, the input data needs to be an AiiDA data node; however, we don't require the user to install AiiDA or the same Python environment on the remote computer. This means we should pass normal Python data as arguments when running the Python function on the remote computer. The WorkGraphEngine handles this data transformation when preparing and launching the CalcJob.

All AiiDA data passed to the function should have a value attribute, which corresponds to its raw Python data. PickledData, Int, Float, Str and Bool fulfill this requirement, while List, Dict and StructureData do not.
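The requirement can be illustrated without AiiDA installed. The class below is a hypothetical stand-in for PickledData (not the real AiiDA class): all that matters is that the node exposes its raw Python data through a value attribute, which the engine can then extract and send to the remote function:

```python
import pickle


class PickledDataLike:
    """Illustrative stand-in for PickledData; not the real AiiDA class."""

    def __init__(self, obj):
        # Store the object in serialized form, as a pickled-data node would.
        self._pickled = pickle.dumps(obj)

    @property
    def value(self):
        # The `value` attribute returns the raw Python data.
        return pickle.loads(self._pickled)


node = PickledDataLike([1, 2, 3])
print(node.value)  # [1, 2, 3]
```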
Inputs and Outputs

Inputs for each task are pickled into the inputs.pickle file. Outputs from each task are pickled into the results.pickle file.
Parent Folder
The parent_folder parameter allows a task to access the output files of a parent task. This feature is particularly useful when you want to reuse data generated by a previous computation in subsequent computations. In the provided example, the multiply task uses the result.txt file created by the add task.
Example
from aiida_workgraph import WorkGraph, task
from aiida import orm, load_profile

load_profile()

# define add task
@task()
def add(x, y):
    return x + y

# define multiply task
@task()
def multiply(x, y):
    return x * y

wg = WorkGraph("first_workflow")
wg.add_task(add, name="add", run_remotely=True)
wg.add_task(multiply, name="multiply", x=wg.tasks["add"].outputs[0], run_remotely=True)

# ------------------------- Submit the calculation -------------------
wg.submit(
    inputs={
        "add": {"x": 2, "y": 3, "computer": "localhost"},
        "multiply": {"y": 4, "computer": "localhost"},
    },
    wait=True,
)
print("\nResult of multiply is {} \n\n".format(wg.tasks["multiply"].outputs["result"].value))
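The workflow logic can be sanity-checked locally with plain Python, without AiiDA or a remote computer. The chained call mirrors the link from the add task's output to the multiply task's x input:

```python
# Local sanity check of the workflow above: multiply(add(2, 3), 4).
def add(x, y):
    return x + y


def multiply(x, y):
    return x * y


result = multiply(x=add(x=2, y=3), y=4)
print(result)  # 20
```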
Using parent_folder_name for Data Continuity

AiiDA runs each job in a separate folder. Sometimes a calculation requires data from previous calculations to be accessible in the current job's working directory. Traditionally, this has been managed with the parent_folder input, which specifies a source for copying the necessary data. The new parent_folder_name input streamlines this process by allowing users to define a subfolder within the working directory to organize these files effectively.
Example Usage: NSCF Calculation
In the context of an NSCF calculation, where data dependency exists on outputs from a SCF calculation, the workflow can be configured as follows:
nscf_task = wg.add_task(
    pw_calculator,
    name="nscf",
    parent_folder=scf_task.outputs["remote_folder"],
    parent_output_folder="out",
    parent_folder_name="out",
    run_remotely=True,
)
This setup copies all content of the out folder from the SCF calculation's remote folder into an out folder within the working directory of the NSCF job.
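In plain Python, the copy that this configuration achieves amounts to the following sketch. The engine performs the copy on the remote computer; the temporary directories and the data-file name here are stand-ins for illustration:

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for the SCF calculation's remote_folder, containing an "out" subfolder:
scf_remote = Path(tempfile.mkdtemp())
(scf_remote / "out").mkdir()
(scf_remote / "out" / "data-file.xml").write_text("<xml/>")

# Stand-in for the working directory of the NSCF job:
nscf_workdir = Path(tempfile.mkdtemp())

# parent_output_folder="out" selects the source subfolder;
# parent_folder_name="out" names the destination subfolder:
shutil.copytree(scf_remote / "out", nscf_workdir / "out")

print(sorted(p.name for p in (nscf_workdir / "out").iterdir()))  # ['data-file.xml']
```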
Handling Multiple Data Sources with copy_files
The traditional parent_folder
method is limited when calculations require inputs from multiple remote directories. For instance, Bader charge analysis with Quantum ESPRESSO may need both valence and all-electron density data from different calculations.
The new copy_files input allows flexible linkage to multiple remote folders. It copies the necessary files from diverse sources into a single job's directory, under dynamically generated subfolder names based on task and socket names.
Example Usage: Bader Charge Analysis
For a Bader analysis requiring different charge density files:
bader_task = wg.add_task(
    bader_calculator,
    name="bader",
    command=bader_command,
    charge_density_folder="pp_valence_remote_folder",
    reference_charge_density_folder="pp_all_remote_folder",
    run_remotely=True,
)
wg.add_link(pp_valence.outputs["remote_folder"], bader_task.inputs["copy_files"])
wg.add_link(pp_all.outputs["remote_folder"], bader_task.inputs["copy_files"])
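A rough sketch of the staging that copy_files performs is given below. The exact subfolder naming scheme is an assumption here (written as "&lt;task_name&gt;_&lt;socket_name&gt;", matching the folder names passed to bader_calculator above); the engine generates the names from the linked task and socket:

```python
import shutil
import tempfile
from pathlib import Path


def stage_copy_files(sources, workdir):
    """Hypothetical sketch: copy each linked remote folder into the job's
    working directory under a subfolder named from task and socket names.

    sources: mapping of (task_name, socket_name) -> remote folder path.
    """
    for (task_name, socket_name), src in sources.items():
        shutil.copytree(src, Path(workdir) / f"{task_name}_{socket_name}")


# Two stand-in "remote folders", one per parent pp.x calculation:
pp_valence = Path(tempfile.mkdtemp())
(pp_valence / "charge_density.cube").write_text("valence")
pp_all = Path(tempfile.mkdtemp())
(pp_all / "charge_density.cube").write_text("all-electron")

workdir = Path(tempfile.mkdtemp())
stage_copy_files(
    {
        ("pp_valence", "remote_folder"): pp_valence,
        ("pp_all", "remote_folder"): pp_all,
    },
    workdir,
)
print(sorted(p.name for p in workdir.iterdir()))
# ['pp_all_remote_folder', 'pp_valence_remote_folder']
```

Both density files then sit in distinct subfolders of one working directory, which is what the Bader task needs.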
The bader_calculator function uses the specified charge density data:
@task()
def bader_calculator(
    command: str = "pw.x",
    charge_density_folder: str = "./",
    charge_density_filename: str = "charge_density.cube",
    reference_charge_density_folder: str = "./",
    reference_charge_density_filename: str = "charge_density.cube",
):
    """Run Bader charge analysis."""
    # Import inside the function: the function body is shipped to the
    # remote computer, so it must carry its own imports.
    import os

    command_str = f"{command} {charge_density_folder}/{charge_density_filename}"
    if reference_charge_density_filename:
        command_str += f" -ref {reference_charge_density_folder}/{reference_charge_density_filename}"
    os.system(command_str)

    # Parse the atomic charges (5th column) from the ACF.dat file
    # written by the Bader code, skipping its header and footer lines.
    with open("ACF.dat", "r") as f:
        lines = f.readlines()
    charges = [float(line.split()[4]) for line in lines[2:-4]]
    return charges
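The ACF.dat parsing above can be checked with a minimal fake file. The layout mimics the real output of the Bader code (two header lines, one row per atom with the charge in the fifth column, then a four-line footer); the numbers are invented:

```python
# Minimal fake ACF.dat to exercise the parsing logic (values are invented):
acf = """\
    #         X           Y           Z        CHARGE     MIN DIST    ATOMIC VOL
 --------------------------------------------------------------------------------
    1      0.0000      0.0000      0.0000      8.1234      0.5000      10.0
    2      1.0000      1.0000      1.0000      1.4382      0.4000       5.0
 --------------------------------------------------------------------------------
   VACUUM CHARGE:   0.0
   VACUUM VOLUME:   0.0
   NUMBER OF ELECTRONS:   9.5616
"""
with open("ACF.dat", "w") as f:
    f.write(acf)

# Same parsing as in bader_calculator: skip 2 header and 4 footer lines,
# take the 5th column (CHARGE) of each atom row.
with open("ACF.dat", "r") as f:
    lines = f.readlines()
charges = [float(line.split()[4]) for line in lines[2:-4]]
print(charges)  # [8.1234, 1.4382]
```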