Data Serialization
In AiiDA
Data serialization in AiiDA is critical for several reasons:
Storing data in the database: This is important for maintaining data provenance.
Storing intermediate data in checkpoints: This enables the restart of calculations.
In WorkGraph
In WorkGraph, all input data are passed into the wg
namespace. The wg
namespace is dynamic and can accept any data type, without enforcing validation on the data types. However, in AiiDA, when generating the node graph, only AiiDA data types are displayed, meaning only AiiDA data types can be linked within the graph. This does not imply a loss of data provenance, as WorkGraph itself does not generate any data. Only calcfunction
and CalcJob
generate new data, and the input data for these processes must be AiiDA data types to preserve data provenance. Data provenance can always be traced by checking the input data of the calcfunction
and CalcJob
.
There are reasons why we don’t serialize all data in the wg
namespace:
Flexibility for non-AiiDA components: WorkGraph supports non-AiiDA component as task, meaning any Python function can be used as a task in the graph. These functions do not require AiiDA data as input, allowing for a variety of data types.
Respecting existing serialization methods: For AiiDA components (e.g.,
CalcJob
,WorkChain
), some input ports may have explicitly defined serialization methods, which must be respected.
However, ensuring that all data within the wg
namespace are JSON-serializable is beneficial to guarantee that checkpoints can be saved and loaded correctly.
PythonJob
PythonJob
is a special case of CalcJob
that runs a Python function on a remote computer. The input data for the function does not need to be of AiiDA data type, and users are not required to provide AiiDA data types as input. When WorkGraph launches the PythonJob
, it serializes all input data for the function. However, if users provide non-JSON-serializable data as input, the checkpoint will fail. Thus, it is necessary to serialize all input data of the function when initializing the WorkGraph process.