molmod/acid-test

🚨 This repository is under development as part of the preparation for the ACID 2 release.

You can view the latest version of the ACID 1 dataset and validation results at the following URLs:

The AutoCorrelation Integral Drill (ACID) 2 -- Test Bench

This repository contains the scripts and StepUp workflows to validate algorithms and their implementations for computing an integral of an autocorrelation function, using the "AutoCorrelation Integral Drill" (ACID) test set. More details on the ACID test can be found in the corresponding ACID Git repository.

A description, test reports, and an archived copy of this repository can be found on Zenodo: 10.5281/zenodo.18947912.

For now, ACID is only used to validate the STACIE algorithm and its implementation. We also plan to test other programs in the future, including:

License

All files in this dataset are distributed under a choice of license: either the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0) or the GNU Lesser General Public License, version 3 or later (LGPL-v3+). The SPDX License Expression for the documentation is CC-BY-SA-4.0 OR LGPL-3.0-or-later.

You should have received a copy of the CC BY-SA 4.0 and LGPL-v3+ licenses along with the data set. If not, see:

Citation

If you use this dataset in your research, please cite the following publication:

Gözdenur Toraman, Dieter Fauconnier, and Toon Verstraelen "STable AutoCorrelation Integral Estimator (STACIE): Robust and accurate transport properties from molecular dynamics simulations" Journal of Chemical Information and Modeling 2025, 65 (19), 10445–10464, doi:10.1021/acs.jcim.5c01475, arXiv:2506.20438.

@article{Toraman2025,
 author = {G\"{o}zdenur Toraman and Dieter Fauconnier and Toon Verstraelen},
 title = {STable AutoCorrelation Integral Estimator (STACIE): Robust and accurate transport properties from molecular dynamics simulations},
 journal = {Journal of Chemical Information and Modeling},
 volume = {65},
 number = {19},
 pages = {10445--10464},
 year = {2025},
 month = {sep},
 url = {https://doi.org/10.1021/acs.jcim.5c01475},
 doi = {10.1021/acs.jcim.5c01475},
}

Overview

This repository consists of four main parts:

  1. 1_dataset/: Contains a script to download the appropriate ACID dataset from Zenodo. It will mirror the 1_dataset/output directory of the ACID repository, which contains the raw data files for the test set.
  2. 2_validation/: Workflows to recompute the validation results for a selection of autocorrelation integral estimators with the ACID test set. Subdirectories test_* contain workflows for different implementations and versions.
  3. 3_report/: A workflow with post-processing scripts of the validation results to regenerate the figures and tables, similar to those in the initial STACIE paper.
  4. 4_zenodo/: A workflow to package and upload the generated data to Zenodo.

When regenerating the data and validation results, the workflows in these directories must be executed in the order listed above. Each directory contains a README.md file that provides more details.

All instructions below assume that you are working on a compute cluster with SLURM job scheduling. If you are working on a local machine, run job.sh scripts directly instead of submitting them with sbatch.
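To avoid maintaining two sets of commands, you could use a small wrapper like the one below. It is hypothetical (not part of this repository): it submits a job script with sbatch when SLURM is available and otherwise runs it directly.

```shell
# Hypothetical helper: submit with sbatch when SLURM is available,
# otherwise run the job script directly with bash.
run_job() {
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$1"
    else
        bash "$1"
    fi
}
```

For example, `run_job download.sh` would then work both on the cluster and on a local machine.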

ACID Data Download

Before any of the validations can be performed, the ACID dataset must be downloaded. Be mindful of the size of the dataset (43 GB) and the bandwidth of your internet connection.
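Before submitting the download job, it may be worth verifying that the target filesystem has enough free space, since the archive and the unpacked data together need well over 43 GB:

```shell
# Show the free space on the filesystem that holds 1_dataset/.
df -h 1_dataset/
```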

(cd 1_dataset/; sbatch download.sh)

This job script only uses wget, unzip and standard POSIX commands, which should be present by default on most Linux systems.

Instead of downloading the large dataset, you can also regenerate it locally by following the instructions in the ACID repository. Due to differences in floating-point arithmetic and compiler optimizations, the generated dataset may differ from the one on Zenodo, but it should be sufficiently similar for validation purposes.

To use your local copy, run the script link.sh in the same directory. It assumes that the output directory is located at ../acid/1_dataset/output.
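In essence, link.sh creates a symbolic link. A sketch of the idea (the actual script may differ) under the assumption stated above, i.e. that the acid checkout sits next to this repository:

```shell
# Sketch of what link.sh is assumed to do: replace 1_dataset/output with
# a symlink to a locally regenerated ACID dataset. Seen from inside
# 1_dataset/, the target ../acid/1_dataset/output becomes ../../acid/...
cd 1_dataset/
ln -sfn ../../acid/1_dataset/output output
```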

Software Environments

There are two approaches to software environments in this repository:

  1. The 3_report/ and 4_zenodo/ directories use the software environment defined in the top-level requirements.in file. This venv is also suitable for working with this repository in general, e.g. it includes pre-commit.
  2. The directories 2_validation/test_* define their own software environments, as needed by the different implementations being validated. Such independent environments allow for benchmarking different versions of the same software, or for supporting incompatible requirements between different implementations. A local requirements.in file is used to define the software environment for each workflow.

To create a virtual environment, run or submit the top-level setup-venv-pip.sh from the directory that contains the requirements.in file. If you want this script to use a specific Python version, set the PYTHON3 environment variable before running it. For example:

export PYTHON3=/usr/bin/python3.13  # optional
cd 2_validation/test_stacie_v1.0.0
sbatch ../../setup-venv-pip.sh

After the virtual environment has been created, you can run or submit the script job.sh to perform the actual work. If you want to work interactively with the virtual environment, you can source the .loadvenv script in the workflow directory.
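An interactive session in one of the validation directories might look like this (the directory name matches the example above; the rest is a sketch):

```shell
cd 2_validation/test_stacie_v1.0.0
source .loadvenv   # activate the venv created by setup-venv-pip.sh
command -v python  # should now resolve inside the local virtual environment
```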

Note that the workflows and scripts in this repository require Python 3.11 or higher. So far, they have only been tested on an x86_64 Linux system. All results on Zenodo were generated using the following module on the Tier-2 VSC compute cluster donphan:

module load Python/3.13.1-GCCcore-14.2.0

When the setup-venv-pip.sh script detects the presence of the $VSC_HOME environment variable, it will automatically load this Python module and include it in the generated .loadvenv script.
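A minimal sketch of that detection logic (the real setup-venv-pip.sh may differ in the details):

```shell
# If $VSC_HOME is set, assume we are on a VSC cluster: load the Python
# module now and record the same step in the generated .loadvenv script.
if [ -n "${VSC_HOME:-}" ]; then
    module load Python/3.13.1-GCCcore-14.2.0
    echo "module load Python/3.13.1-GCCcore-14.2.0" >> .loadvenv
fi
```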

How to Work With This Git Repository

Please follow these guidelines to make clean commits to this repository:

  1. Install pre-commit on your system. (It is included in the requirements.in file, so it will be installed in the virtual environment when you run setup-venv-pip.sh.)
  2. Install the pre-commit hook by running pre-commit install in the root directory of this repository.
  3. Use git commit as you normally would.

If you are working in an environment with limited permissions, you can install pre-commit locally by running the following commands:

wget https://github.com/pre-commit/pre-commit/releases/download/v4.5.1/pre-commit-4.5.1.pyz
python pre-commit-4.5.1.pyz install

How to Make a New Release

After updating the contents of the repository, the following steps are needed to make a new release on Zenodo:

  • Update CHANGELOG.md with a new version section, describing the changes since the last release.

  • Update the version number in 4_zenodo/zenodo.yaml.

  • Upload a draft release to Zenodo by running

    (cd 4_zenodo/; sbatch job.sh)
  • Visit the dataset page on Zenodo and click on "New version". The files and metadata will already be present due to the previous step. Request the DOI for the new draft and add this information to CHANGELOG.md.

  • Commit all changes to Git and run git tag with the new version number.

  • Ensure that all validation results are up to date by running the workflows in 2_validation/. For example:

    (cd 2_validation/test_stacie_v1.0.0/; sbatch job.sh)

    Note that the tests write JSON files with validation results to 3_report/results, which are included in the Zenodo release.

  • Recompile all PDF files in the repository to include the Git hash in the PDF frontmatter:

    (cd 3_report/; sbatch job.sh)
  • Sync your local data one last time with Zenodo:

    (cd 4_zenodo/; sbatch job.sh)
  • Log in to https://zenodo.org/, go to your draft release, check that all files have been uploaded correctly, and publish the release.

  • Push your commits and tags to GitHub.
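The Git bookkeeping in the steps above can be sketched as follows (the version number and branch name are only examples):

```shell
# Commit the release preparation, tag it, and push both to GitHub.
git commit -am "Prepare release v1.2.3"
git tag v1.2.3
git push origin main v1.2.3
```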