Reimplement Snakebids expand using SnakemakeFormatter  #513

@pvandyken

Summary

Snakebids currently calls Snakemake’s expand directly. To support optional-wildcard templates and run cleanly in contexts where Snakemake may not be installed, we should implement Snakebids’ own expand behavior.

Key differences from Snakemake:

  • For each template path, format it once per row of the zip-list using Snakebids’ SnakemakeFormatter (snakebids/utils/snakemake_templates.py).
  • Snakebids does not allow a custom combinator function.
  • Mixed-in additional entities are combined with the zip-list rows using product.
  • Optional wildcards are supported.
  • Duplicate outputs may arise when a path doesn’t contain all wildcards; these are removed by order-preserving deduplication via dict.fromkeys(...).

Additionally:

  • allow_missing must be supported, but implemented inside SnakemakeFormatter.
  • If a wildcard is missing and allow_missing=True, the wildcard must remain in its original brace-wrapped form, including any constraint: the output should contain exactly the {name,constraint} text that appeared in the input template.

Context / links

Upstream reference:

  • Snakemake expand: src/snakemake/io/__init__.py in the snakemake/snakemake repository

Requirements

1) Reimplement expand in Snakebids (stop calling Snakemake’s)

Modify Snakebids expand (in src/snakebids/core/datasets.py) to no longer call Snakemake’s expand. Instead, expansions should be generated by formatting template paths with SnakemakeFormatter.

2) Formatting approach

For each provided template path:

  • Iterate rows of the zip-list (zipped entities).
  • For each row, format the path once using SnakemakeFormatter.format(...).
  • Mixed-in entities provided directly to expand should combine with zip-list rows using product (i.e., each zip row expands over all combinations of the mixed-in values).
  • Snakebids expand does not support swapping the combinator function.
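
The formatting approach above can be sketched roughly as follows. This is only an illustration: expand_rows is a hypothetical helper name, plain str.format stands in for SnakemakeFormatter.format, and the zip-list is assumed to be a dict mapping each entity to a list of equal length.

```python
from itertools import product


def expand_rows(template, zip_list, **mixed_in):
    """Format `template` once per zip-list row, crossed with mixed-in values."""
    # Rows of the zip-list: values at the same index belong together.
    keys = list(zip_list)
    rows = [dict(zip(keys, values)) for values in zip(*zip_list.values())]

    # Mixed-in entities are combined with product. With no mixed-in
    # entities, product() yields a single empty combination.
    extra_keys = list(mixed_in)
    extra_combos = [
        dict(zip(extra_keys, combo)) for combo in product(*mixed_in.values())
    ]

    # ASSUMPTION: str.format is a stand-in for SnakemakeFormatter.format.
    return [
        template.format(**row, **extra)
        for row in rows
        for extra in extra_combos
    ]


zip_list = {"subject": ["01", "01", "02"], "session": ["a", "b", "a"]}
paths = expand_rows(
    "sub-{subject}/ses-{session}/sub-{subject}_ses-{session}_run-{run}.nii.gz",
    zip_list,
    run=["1", "2"],
)
```

Here the three zip rows each expand over the two run values, giving six paths, with the zip-list order as the outer loop.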

3) Optional wildcards + missing entity handling

  • Optional wildcards in templates are handled by SnakemakeFormatter.
  • None and "" both represent “missing entity” inputs; convert any None values to "" prior to formatting.
  • No dummy or special wildcard labels may be provided directly as entity arguments to expand (e.g. _acq_, _subject_d_, __d__, ___). Ordinary entities may still be optional.
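
A minimal sketch of the input normalization described above. normalize_entities is a hypothetical name, and the underscore check is only a heuristic inferred from the examples listed (_acq_, _subject_d_, __d__, ___); the real rule for recognizing special labels would come from the formatter module.

```python
def normalize_entities(entities):
    """Reject special labels and map None to "" (both mean "missing")."""
    cleaned = {}
    for name, values in entities.items():
        # ASSUMPTION: special labels start and end with an underscore.
        # This matches the examples in the issue but is not the real rule.
        if name.startswith("_") and name.endswith("_"):
            raise ValueError(f"{name!r} is a reserved wildcard label")
        # None and "" both represent a missing entity; pass "" downstream.
        cleaned[name] = ["" if value is None else value for value in values]
    return cleaned


entities = normalize_entities({"acquisition": [None, "mprage", ""]})
```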

4) allow_missing behavior is implemented in SnakemakeFormatter

  • Add allow_missing: bool = False to SnakemakeFormatter.__init__.
  • When allow_missing=True, any missing wildcard must be replaced by the original brace-wrapped wildcard including constraint as given in the input template, not by "".
  • Constraint stripping and parsing must move from parse to get_value.
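
To illustrate why moving constraint handling into get_value makes allow_missing straightforward, here is a simplified stand-in built on Python's string.Formatter. ConstraintAwareFormatter is a hypothetical class, not the actual SnakemakeFormatter, and it only copes with simple constraints (no ":", "!", ".", "[" or nested braces, which string.Formatter would interpret itself).

```python
import string


class ConstraintAwareFormatter(string.Formatter):
    """Stand-in formatter: constraints are stripped in get_value, not parse."""

    def __init__(self, allow_missing=False):
        super().__init__()
        self.allow_missing = allow_missing

    def get_value(self, key, args, kwargs):
        if isinstance(key, int):
            return super().get_value(key, args, kwargs)
        # `key` still carries the constraint, e.g. "session,\d+",
        # because parse() was left untouched.
        name, _, _constraint = key.partition(",")
        if name in kwargs:
            return kwargs[name]
        if self.allow_missing:
            # Re-emit the field exactly as written, constraint included.
            return "{" + key + "}"
        raise KeyError(name)


fmt = ConstraintAwareFormatter(allow_missing=True)
partial = fmt.format(r"sub-{subject}_ses-{session,\d+}.nii", subject="01")
# partial is now: sub-01_ses-{session,\d+}.nii
```

Because get_value receives the full field text, the original {name,constraint} form is still available at the point where the wildcard turns out to be missing.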

5) Duplicate output handling

  • Apply order-preserving deduplication: list(dict.fromkeys(outputs)).
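
As a small illustration of where duplicates come from, consider a template that omits one of the zipped entities. The rows and template below are made up for the example, and str.format again stands in for the formatter.

```python
rows = [
    {"subject": "01", "session": "a"},
    {"subject": "01", "session": "b"},
    {"subject": "02", "session": "a"},
]

# The template has no {session} wildcard, so the first two rows
# produce the same path.
template = "sub-{subject}/anat.nii.gz"
outputs = [template.format(**row) for row in rows]

# dict keys keep insertion order, so the first occurrence wins.
unique = list(dict.fromkeys(outputs))
```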

6) AnnotatedString flag preservation (Snakemake optional dependency)

  • If an input template is an AnnotatedString with .flags, every output must be an AnnotatedString with those .flags.
  • Snakebids must support running without Snakemake installed, so this logic must be gated behind an import check.
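
One possible shape for the import gate, as a sketch only. carry_flags is a hypothetical helper name; the import path snakemake.io.AnnotatedString and the copy of the flags dict onto each output are assumptions about how this would be wired up, not a confirmed design.

```python
try:
    # ASSUMPTION: AnnotatedString is importable from snakemake.io.
    from snakemake.io import AnnotatedString
except ImportError:
    # Snakemake is not installed; templates can only be plain strings.
    AnnotatedString = None


def carry_flags(template, outputs):
    """Return outputs, re-wrapped with the template's flags when it has any."""
    if AnnotatedString is None or not isinstance(template, AnnotatedString):
        return list(outputs)
    flagged = []
    for path in outputs:
        annotated = AnnotatedString(path)
        # Copy so that outputs do not share one mutable flags dict.
        annotated.flags = dict(template.flags)
        flagged.append(annotated)
    return flagged


plain = carry_flags("sub-{subject}.nii", ["sub-01.nii", "sub-02.nii"])
```

With a plain str template, or with Snakemake absent, the outputs pass through unchanged.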

Testing

Most required test scenarios are already covered in tests/test_datasets.py (notably TestExpandables). Additional tests should focus on:

  • Ensuring None → "" conversion for entity values before formatting with SnakemakeFormatter.
  • Constraint-bearing wildcards are preserved verbatim under allow_missing=True.
  • AnnotatedString propagation (conditional on snakemake installed).

Tasks

  1. Refactor SnakemakeFormatter

    • Add allow_missing to __init__.
    • Stop stripping constraints in parse().
    • Implement constraint parsing/stripping logic inside get_value.
    • When missing and allow_missing=True, return the original {field_name} verbatim (brace-wrapped, including constraint).
  2. Reimplement Snakebids expand

    • Replace direct Snakemake expand calls with formatter-based expansion.
    • Ensure mixed-in entities are combined with zip-list rows using product.
    • Convert None → "" for entity values before formatting.
    • Validate that no dummy/special wildcard labels are passed as direct entity args.
    • Preserve order and deduplicate with dict.fromkeys.
  3. AnnotatedString flag propagation

    • Detect Snakemake AnnotatedString templates (only if snakemake installed).
    • Ensure derived outputs are AnnotatedString with same .flags.
  4. Tests

    • Reuse existing dataset expansion tests in tests/test_datasets.py as primary coverage.
    • Add focused tests for:
      • Constraint-preserving partial expansion (allow_missing=True).
      • None handling in expansion input values.
      • AnnotatedString flag propagation (conditional).

Acceptance Criteria

  • Snakebids no longer depends on calling Snakemake’s expand.
  • expand uses SnakemakeFormatter for per-row formatting.
  • allow_missing is implemented in SnakemakeFormatter, preserving brace-wrapped wildcards as specified.
  • Optional entities support None and "" as missing values, with correct conversion before formatting.
  • Duplicate outputs are removed with order preserved.
  • AnnotatedString flags are preserved when present and safely ignored if Snakemake is absent.
  • All tests pass; code meets lint/typing standards.

Labels: enhancement (New feature or request)
