Thanks for doing this @williamsnell!

This is great, @williamsnell! I'm wondering if this could be exposed as part of the general array configuration, which is where

@d-v-b I've pushed a new commit moving this to One note: I'm not sure how this interacts with sharding now - following the existing code I hardcoded

I don't think we want this new configuration option to change the behavior of the sharding codec. A missing subchunk inside a shard is conveyed explicitly via the shard index, so from the sharding codec's POV you can't have a subchunk appear missing due to a network error.

If I've understood correctly, we'll want to make this tweak to

 def _get_chunk_spec(self, shard_spec: ArraySpec) -> ArraySpec:
+    # Because the shard index and inner chunks should be stored
+    # together, we detect missing data via the shard index.
+    # The inner chunks defined here are thus allowed to return
+    # None, even if fill_missing_chunks=False at the array level.
+    config = replace(shard_spec.config, fill_missing_chunks=True)
     return ArraySpec(
         shape=self.chunk_shape,
         dtype=shard_spec.dtype,
         fill_value=shard_spec.fill_value,
-        config=shard_spec.config,
+        config=config,
         prototype=shard_spec.prototype,
     )

With this change, I think my previous point was wrong - we would be able to use
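
To make the effect of the `replace(...)` line in the diff concrete, here is a minimal, self-contained sketch. It assumes `replace` is `dataclasses.replace` and that the config is a frozen dataclass; `ArrayConfigSketch`, `ArraySpecSketch`, and `inner_chunk_spec` are hypothetical stand-ins written for this illustration, not the zarr-python classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ArrayConfigSketch:
    # Hypothetical stand-in for the array-level config (only the fields needed here).
    write_empty_chunks: bool = False
    fill_missing_chunks: bool = True


@dataclass(frozen=True)
class ArraySpecSketch:
    # Hypothetical stand-in for ArraySpec, reduced to three fields.
    shape: tuple
    fill_value: int
    config: ArrayConfigSketch


def inner_chunk_spec(shard_spec: ArraySpecSketch, chunk_shape: tuple) -> ArraySpecSketch:
    # Copy the shard's config but force fill_missing_chunks=True for inner
    # chunks, since absence inside a shard is signalled by the shard index.
    config = replace(shard_spec.config, fill_missing_chunks=True)
    return ArraySpecSketch(
        shape=chunk_shape,
        fill_value=shard_spec.fill_value,
        config=config,
    )


# The array-level setting is strict, but the inner-chunk spec is not.
shard_spec = ArraySpecSketch(
    shape=(8, 8),
    fill_value=0,
    config=ArrayConfigSketch(fill_missing_chunks=False),
)
inner = inner_chunk_spec(shard_spec, (4, 4))
```

Because `replace` returns a new object, the shard-level spec keeps `fill_missing_chunks=False` while only the inner-chunk spec is relaxed.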
example in zarr-python zarr-developers#486.
self.supports_partial_decode`.
expected behaviour of fill_missing_chunks for both sharding and write_empty_chunks via tests. Use elif to make control flow slightly clearer.
6db55a1 to de7afd8

I've committed the change to I've also made two more changes:

Add config options for whether a missing chunk should:
- be filled with fill_value (current behaviour; retained as default)
- raise MissingChunkError

This PR is entirely based on the work of @tomwhite in this issue. I've started this PR as this is an important feature that I'd like to see merged.
I've added a test (based on the demo in the issue) and a minor docs tweak.
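
The two behaviours described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the dict-backed store, the `read_chunk` helper, and the locally defined `MissingChunkError` class are hypothetical and do not reflect how zarr-python implements or exposes the option.

```python
class MissingChunkError(KeyError):
    """Local stand-in for the error named in the PR description."""


def read_chunk(store, key, chunk_shape, fill_value, fill_missing_chunks=True):
    # Hypothetical reader: `store` is a plain dict mapping chunk keys to
    # nested lists, and chunks are 2-D for simplicity.
    data = store.get(key)
    if data is not None:
        return data
    if fill_missing_chunks:
        # Default behaviour: a missing chunk reads back as fill_value.
        rows, cols = chunk_shape
        return [[fill_value] * cols for _ in range(rows)]
    # Strict behaviour: surface the missing chunk instead of hiding it.
    raise MissingChunkError(key)


store = {"0.0": [[1, 2], [3, 4]]}

present = read_chunk(store, "0.0", (2, 2), fill_value=0)
filled = read_chunk(store, "0.1", (2, 2), fill_value=0)

try:
    read_chunk(store, "0.1", (2, 2), fill_value=0, fill_missing_chunks=False)
    raised = False
except MissingChunkError:
    raised = True
```

With the strict setting, a chunk that is absent (for example because a write never completed) produces an error rather than silently reading as fill values.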
Questions:
- config.md - is this the right place?

TODO:
- docs/user-guide/*.md
- changes/