Streamline Your Research: Integrate Your Favorite Tools with AiiDA-WorkGraph

AiiDA is sometimes viewed as having a steep initial setup, primarily because users must develop AiiDA plugins to run their codes on remote computers. Developing a plugin involves writing input generators and output parsers and configuring codes for execution. Software packages such as ASE already provide APIs for these tasks and are widely used in various communities. Requiring developers to adapt such packages to AiiDA-specific requirements introduces significant overhead, including the need to maintain two codebases. Moreover, AiiDA’s emphasis on data provenance requires data to be converted into its own database format, which differs from the formats other packages use.

The new PythonJob built-in task in AiiDA-WorkGraph lets users call their existing package APIs to run jobs on remote computers. For example, users can now use ASE’s calculator directly to perform DFT calculations remotely by writing standard Python functions. The WorkGraph manages the execution of these functions on remote computers, handles checkpoints, manages data transformations, and records data provenance.

Highlighted Benefits

  • Leveraging Existing Packages: Use existing packages such as ASE directly, relying on their own APIs for managing inputs, execution, and outputs. This removes much of the duplicated work in the traditional AiiDA setup process.

  • Broad Accessibility: The PythonJob task aims to make AiiDA more accessible by enabling users from various backgrounds to integrate their existing tools without converting them into AiiDA-specific formats. This enhancement broadens AiiDA’s appeal across diverse scientific and engineering communities.

Real-world Workflow: Atomization Energy of a Molecule

The atomization energy, \(\Delta E\), of a molecule made of identical atoms (such as N2) can be expressed as:

\[\Delta E = n_{\text{atom}} \times E_{\text{atom}} - E_{\text{molecule}}\]

Where:

  • \(\Delta E\) is the atomization energy of the molecule.

  • \(n_{\text{atom}}\) is the number of atoms.

  • \(E_{\text{atom}}\) is the energy of an isolated atom.

  • \(E_{\text{molecule}}\) is the energy of the molecule.
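As a quick check of the arithmetic, the formula can be written as a plain Python function. This is a minimal sketch that does not depend on AiiDA or ASE; the function name `atomization_energy_from_values` and the energies passed to it are illustrative placeholders, not results of any calculation.

```python
def atomization_energy_from_values(n_atom: int, e_atom: float, e_molecule: float) -> float:
    """Return n_atom * e_atom - e_molecule (all energies in the same unit)."""
    return n_atom * e_atom - e_molecule


# Illustrative placeholder energies in eV (not real calculation results).
delta_e = atomization_energy_from_values(n_atom=2, e_atom=5.0, e_molecule=0.5)
print(f"Atomization energy: {delta_e:.3f} eV")
```

With these placeholder values the function evaluates 2 × 5.0 − 0.5.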

Define a task using ASE EMT potential

[1]:
from aiida_workgraph import task, WorkGraph
from ase import Atoms

@task()
def emt(atoms: Atoms) -> float:
    """Calculate the energy of an Atoms object using the EMT calculator."""
    from ase.calculators.emt import EMT
    atoms.calc = EMT()
    energy = atoms.get_potential_energy()
    return energy


@task()
def atomization_energy(molecule: Atoms,
                       energy_molecule: float,
                       energy_atom: float) -> float:
    """Calculate the atomization energy of a molecule."""
    energy = energy_atom*len(molecule) - energy_molecule
    return energy

Define a workgraph

[2]:
wg = WorkGraph("atomization_energy")
pw_atom = wg.add_task("PythonJob", function=emt, name="emt_atom")
pw_mol = wg.add_task("PythonJob", function=emt, name="emt_mol")
# create the task to calculate the atomization energy
wg.add_task("PythonJob", function=atomization_energy, name="atomization_energy",
             energy_atom=pw_atom.outputs["result"],
             energy_molecule=pw_mol.outputs["result"])
wg.to_html()

Prepare the inputs and submit the workgraph

Computer: Users can designate the remote computer where the job will be executed. This action will create an AiiDA code, python3@computer, if it does not already exist.

Data: It is recommended that users employ standard Python data types as inputs. The WorkGraph is responsible for transferring and serializing this data to AiiDA-compatible formats. During serialization, the WorkGraph searches for a corresponding AiiDA data entry point based on the module and class names (e.g., ase.atoms.Atoms). If an appropriate entry point is found, it is utilized for serialization. If no entry point is found, the data is serialized into binary format using PickledData (pickle).
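The lookup-then-fallback logic described above can be sketched with the standard library alone. This is not the WorkGraph implementation: the `SERIALIZERS` dictionary is a hypothetical stand-in for AiiDA data entry points, and `type_key` and `serialize` are names invented for this illustration.

```python
import pickle
from fractions import Fraction
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry mapping "module.ClassName" to a serializer function.
# In AiiDA this role is played by data entry points; the dict is a stand-in.
SERIALIZERS: Dict[str, Callable[[Any], bytes]] = {
    "fractions.Fraction": lambda obj: f"{obj.numerator}/{obj.denominator}".encode(),
}


def type_key(obj: Any) -> str:
    """Build the lookup key from the object's module and class names."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__name__}"


def serialize(obj: Any) -> Tuple[str, bytes]:
    """Use a registered serializer if one matches, otherwise fall back to pickle."""
    serializer = SERIALIZERS.get(type_key(obj))
    if serializer is not None:
        return "registered", serializer(obj)
    return "pickle", pickle.dumps(obj)


print(serialize(Fraction(3, 4)))  # handled by the registered serializer
print(serialize({"a": 1})[0])     # no registered match, so pickle is used
```

The same key construction applied to an ASE `Atoms` object yields `ase.atoms.Atoms`, the example named in the text.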

Python Version: To ensure compatibility, the Python version on the remote computer should match the version used on the localhost. Users can create a matching virtual environment using Conda. It’s essential to activate this environment prior to executing the script.
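One simple way to compare interpreters is to print the major and minor version on both machines. The snippet below is a small sketch using only the standard library; the helper `same_minor_version` and the remote version `(3, 11)` are hypothetical and only illustrate the comparison.

```python
import sys
from typing import Tuple


def same_minor_version(local: Tuple[int, int], remote: Tuple[int, int]) -> bool:
    """Return True when both interpreters share the same major.minor version."""
    return tuple(local[:2]) == tuple(remote[:2])


local_version = sys.version_info[:2]
print("Local Python: {}.{}".format(*local_version))

# Run the lines above on the remote machine as well and compare the output,
# or pass the remote version in explicitly (the value here is hypothetical).
print(same_minor_version(local_version, (3, 11)))
```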

For operational deployments, metadata can be passed to the scheduler to configure the execution environment:

metadata = {
    "options": {
        'custom_scheduler_commands': 'module load anaconda\nconda activate py3.11\n',
    }
}
[3]:
from ase.build import molecule
from ase import Atoms
from aiida import load_profile

load_profile()

# create input structure
n_atom = Atoms("N", pbc=True)
n_atom.center(vacuum=5.0)
n2_molecule = molecule("N2", pbc=True)
n2_molecule.center(vacuum=5.0)

metadata = {
    "options": {
        'custom_scheduler_commands': '# test\n',
        # 'custom_scheduler_commands': 'module load anaconda\nconda activate py3.11\n',
    }
}
#------------------------- Set the inputs -------------------------
wg.tasks["emt_atom"].set({"atoms": n_atom,
                          "computer": "localhost",
                          "metadata": metadata})
wg.tasks["emt_mol"].set({"atoms": n2_molecule,
                         "computer": "localhost",
                         "metadata": metadata})
wg.tasks["atomization_energy"].set({"molecule": n2_molecule})
#------------------------- Submit the calculation -------------------
wg.submit(wait=True, timeout=200)
#------------------------- Print the output -------------------------
print('Atomization energy:                  {:0.3f} eV'.format(wg.tasks['atomization_energy'].outputs["result"].value.value))

WorkGraph process created, PK: 50402
Atomization energy:                  9.651 eV

Remote Folder

Users can inspect the remote folder to review the files generated by the PythonJob. To check the job status, use the command:

verdi process list -a
50350  7m ago     WorkGraph<atomization_energy>                       ⏹ Finished [0]
50354  7m ago     PythonJob<emt>                                      ⏹ Finished [0]
50358  7m ago     PythonJob<emt>                                      ⏹ Finished [0]
50368  7m ago     PythonJob<atomization_energy>                       ⏹ Finished [0]

To access the remote folder and view its contents, replace <calcjob-pk> with the appropriate calculation job PK (e.g., 50354):

# replace <calcjob-pk> with the calcjob pk, e.g. 50354
verdi calcjob gotocomputer <calcjob-pk>

To list the files in the remote folder:

$ ls
aiida.out        inputs.pickle   _scheduler-stderr.txt  script.py
_aiidasubmit.sh  results.pickle  _scheduler-stdout.txt

Each task’s inputs are serialized into the inputs.pickle file, and outputs are stored in the results.pickle file. The script.py file on the remote computer executes the Python function.
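To make the role of these three files concrete, the following sketch mimics the load, call, and store cycle locally. It is an assumption-laden illustration, not the script that PythonJob actually generates: the helper `run_job`, the `{"result": ...}` layout of the results file, and the use of a temporary directory in place of the remote folder are all invented for this example.

```python
import pickle
import tempfile
from pathlib import Path


def run_job(workdir: Path, function) -> None:
    """Load keyword inputs, call the function, and store the result."""
    with open(workdir / "inputs.pickle", "rb") as handle:
        inputs = pickle.load(handle)
    result = function(**inputs)
    # Assumed layout of the results file: a dict with a single "result" key.
    with open(workdir / "results.pickle", "wb") as handle:
        pickle.dump({"result": result}, handle)


def add(x, y):
    return x + y


# Simulate one job in a temporary directory standing in for the remote folder.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    with open(workdir / "inputs.pickle", "wb") as handle:
        pickle.dump({"x": 2, "y": 3}, handle)
    run_job(workdir, add)
    with open(workdir / "results.pickle", "rb") as handle:
        results = pickle.load(handle)

print(results)
```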

Use Parent Folder

The parent_folder parameter enables a task to access the output files of a parent task. This functionality is particularly beneficial for reusing data from a previous computation in subsequent processes. For example, in the setup below, the multiply task utilizes the file created by the add task from the remote_folder.

[6]:
from aiida_workgraph import WorkGraph, task

# define add task
@task()
def add(x, y):
    z = x + y
    with open("result.txt", "w") as f:
        f.write(str(z))

# define multiply task
@task()
def multiply(x, y):
    with open("parent_folder/result.txt", "r") as f:
        z = int(f.read())
    return x*y + z

wg = WorkGraph("PythonJob_parent_folder")
wg.add_task("PythonJob", function=add, name="add")
wg.add_task("PythonJob", function=multiply, name="multiply",
             parent_folder=wg.tasks["add"].outputs["remote_folder"],
             )

wg.to_html()

Upload files or folders to the remote computer

The upload_files parameter allows users to upload files or folders to the remote computer. The files will be uploaded to the working directory of the remote computer.

import os

# upload_files needs the full (absolute) path of each file or folder
input_file = os.path.abspath("input.txt")
input_folder = os.path.abspath("inputs_folder")

# upload both into the working directory of the "add" task on the remote computer
wg.submit(inputs = {"add": {
                            "computer": "localhost",
                            "upload_files": {"input.txt": input_file,
                                             "inputs_folder": input_folder,
                                             },
                            },
                    },
          wait=True)
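The submission above assumes that `input.txt` and `inputs_folder` already exist locally. As a hedged sketch, they could be created and checked as follows before submitting; the temporary directory, the file contents, and the file name `another_input.txt` are placeholders chosen for this example.

```python
import os
import tempfile
from pathlib import Path

# Create example inputs in a temporary directory (names mirror the snippet above).
base = Path(tempfile.mkdtemp())
(base / "input.txt").write_text("2\n")
(base / "inputs_folder").mkdir()
(base / "inputs_folder" / "another_input.txt").write_text("3\n")

# Absolute paths, suitable as values of the upload_files dictionary.
input_file = os.path.abspath(base / "input.txt")
input_folder = os.path.abspath(base / "inputs_folder")

upload_files = {"input.txt": input_file, "inputs_folder": input_folder}
for name, path in upload_files.items():
    # Verify each path exists before handing it to the workflow.
    print(name, os.path.exists(path))
```

Checking the paths locally first avoids submitting a job that would fail only once the upload is attempted.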

Conclusion

The new PythonJob task in AiiDA-WorkGraph lets researchers bring existing software tools into computational workflows without writing AiiDA plugins. It reduces setup effort while the WorkGraph continues to handle remote execution, data serialization, and provenance, which makes AiiDA easier to adopt across scientific disciplines.