Skip to content

define a module for chunk grids, and a registry#3735

Open
d-v-b wants to merge 7 commits intozarr-developers:mainfrom
d-v-b:chore/chunk-grids-module
Open

define a module for chunk grids, and a registry#3735
d-v-b wants to merge 7 commits intozarr-developers:mainfrom
d-v-b:chore/chunk-grids-module

Conversation

@d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Mar 1, 2026

This PR creates a modular structure for our chunk grids. Currently we only have the regular chunk grid, but we will get rectilinear chunks soon, and this layout will make that addition much easier.

This PR also adds an explicit registry for chunk grids. Previously we handled chunk grid class lookup in the ChunkGrid.from_dict method. A registry is better, and consistent with the rest of the codebase.

Since this is a purely internal change, I don't think we need release notes.

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Mar 1, 2026
@d-v-b d-v-b requested a review from maxrjones March 1, 2026 16:18
@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 1, 2026

closes #3734, and supports #3534

@d-v-b d-v-b requested review from a team and jhamman March 2, 2026 02:06
@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 5, 2026

It would be good to have some eyes on this. we need this functionality for the rectilinear chunk grid.

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two reasons why this is actually public-facing:

  1. New functions added to __all__ in src/zarr/registry.py (which is outside .core)
  2. New zarr.chunk_grid entry point group in _collect_entrypoints

This could be kept private (and have a lower review bar) if the chunk grid registry is first prototyped inside zarr.core.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

I don't think we need to prototype this, it's using the same pattern as the registries we already have. I will add a release note but since we only support a single chunk grid implementation, there's not much anyone can do with this right now (besides register a different regular chunk grid).

@maxrjones
Copy link
Member

I don't think we need to prototype this, it's using the same pattern as the registries we already have. I will add a release note but since we only support a single chunk grid implementation, there's not much anyone can do with this right now (besides register a different regular chunk grid).

I would prefer that we build the functionality that allows using the chunk grid registry before exposing it in the public API. It's more about only having truly usable components in our public API than it is about prototyping. What is the downside of delaying its exposure via zarr.registry?

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

the registry itself works in this pr. someone who wants to register a different chunk grid can do that -- but it has to be some form of regular chunk grid. someone might want to register a different regular chunk grid with a different name, for example. that's possible, but I highly doubt anyone will want to do this.

I think the registry itself is logically distinct from the internal chunk grid implementation. so shipping the registry, in its final form, seems better to me than releasing a private registry and then modifying the registry again in a later PR.

@maxrjones
Copy link
Member

The registry exposes register_chunk_grid as public API, but there's no public API for creating an array with a custom chunk grid. zarr.create() and zarr.open_array() only accept chunk_shape: tuple[int, ...]. So the only way to actually use a registered chunk grid is through internal APIs like ArrayV3Metadata, which defeats the purpose of a public registry.

I agree that the registry is logically distinct from the internal chunk grid implementation. Still, I don't see value in adding something to the public API that cannot yet be used.

@maxrjones
Copy link
Member

Is there something about your plans for follow-up PRs that requires this functionality be public?

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

the goal is to be totally done with the chunk grid registry part of the problem. we can either do that in multiple PRs, or just one. I think "just one" is simpler, which is why this PR defines the chunk grid registry as a regular registry, like the other registries we already have. It's not very useful yet, but that's OK -- it doesn't break anything, users will not see it or be confused by it, and it's a necessary step (arguably a blocker) for the rest of the rectilinear chunk grid work we are trying to do.

@maxrjones
Copy link
Member

the goal is to be totally done with the chunk grid registry part of the problem. we can either do that in multiple PRs, or just one. I think "just one" is simpler, which is why this PR defines the chunk grid registry as a regular registry, like the other registries we already have. It's not very useful yet, but that's OK -- it doesn't break anything, users will not see it or be confused by it, and it's a necessary step (arguably a blocker) for the rest of the rectilinear chunk grid work we are trying to do.

Could we compromise and remove register_chunk_grid and get_chunk_grid_class from __all__? That way they would not show up in the public API docs or be returned from docs searches, such that I'd agree with you that users would not see it. I would gladly take on the follow-up PR to re-add them so more work is not added to your plate.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

that's fine, we will just have to revert those changes as soon as we start working on the next PRs (namely, the next PR that adds a new chunk grid)

@maxrjones
Copy link
Member

maxrjones commented Mar 6, 2026

that's fine, we will just have to revert those changes as soon as we start working on the next PRs (namely, the next PR that adds a new chunk grid)

can you share your plan for the sequence of PRs? __all__ only impact the docs and from zarr.registry import * behavior, so it's essential to understand what you plan to make public when to be able to understand the impact of my suggestions.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

my plan, which is pretty roughly defined at this point based on taking a look at the internals of #3534, goes like this:

  • add a module for chunk grids, and a registry for chunk grids (this PR)
  • add array_shape information to the base chunk grid, because this simplifies a lot of code in the rectilinear chunk grid PR, and the regular chunk grid can use it too (Feature/chunk grid array shape #3737 )

These are both self-contained, and don't depend on the regular chunk grid. There might be more changes like that inside #3534, but these are the first two I found. Once these changes are sorted out, we rebase #3534 on top of main and see what else we can shake out.

The registry exposes register_chunk_grid as public API, but there's no public API for creating an array with a custom chunk grid.

actually, where are you seeing this? (I should have asked this at the start). This PR only touches the chunk_grids package, which is not public API (it's inside zarr.core). (edit -- a few other files are touched but they are all in zarr.core, so not public)

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

ah I see -- the changes in registry.py. duh. I agree it doesn't make sense to expose anything in the registry without exposing the chunk grid base class, so I should do that in this PR.

@maxrjones
Copy link
Member

The registry exposes register_chunk_grid as public API, but there's no public API for creating an array with a custom chunk grid.

actually, where are you seeing this? (I should have asked this at the start). This PR only touches the chunk_grids package, which is not public API (it's inside zarr.core). (edit -- a few other files are touched but they are all in zarr.core, so not public)

https://zarr--3735.org.readthedocs.build/en/3735/api/zarr/registry/#zarr.registry.register_chunk_grid

this is collected into the public docs via https://github.com/d-v-b/zarr-python/blob/20dd3117f98fcb6a3e5e4eedf45fd4f1d981a950/docs/api/zarr/registry.md?plain=1#L5, which exposes all functions included in __all__.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

and something I failed to mention in the PR description -- this PR adds methods to the base chunk grid class that the rectilinear chunk grid will use. The goal was to ensure that, after this PR, we confine as much as possible any changes to the chunk_grids/rectilinear.py file

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

...and now that I look at the ChunkGrid base class, I realize I don't like it very much at all (it works, and will be fine for the rectilinear chunk grid, but maybe safer to keep it private until later). So I removed the public API parts of this PR -- everything should be internal now.

@d-v-b
Copy link
Contributor Author

d-v-b commented Mar 6, 2026

thanks max! good call to be careful on what we make public here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs release notes Automatically applied to PRs which haven't added release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants