add setting to define filename pattern for part exports #1490

Open
arthurpassos wants to merge 5 commits into antalya-26.1 from export_filename_pattern_setting

Conversation

@arthurpassos
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add setting to define filename pattern for part exports - helps with sharding - port of unmerged and unreviewed PR #1383

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Mar 9, 2026

Workflow [PR], commit [4120b57]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d69971b4f2


manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");
manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");
manifest.filename_pattern = json->getValue<String>("filename_pattern");


P1: Preserve manifest backward compatibility for filename_pattern

Deserialization now requires filename_pattern unconditionally, but metadata written by earlier versions does not include this key. Any node that reads an older exports/.../metadata.json (for example while checking existing exports or canceling an export in StorageReplicatedMergeTree) will throw during fromJsonString, breaking in-flight export management after upgrade. Make this field optional and fall back to the default pattern when absent.
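The fallback suggested here could look roughly like the following. This is a minimal C++ sketch using a `std::map` stand-in for the parsed JSON object; the real manifest uses Poco JSON, and `getOrDefault`/`parseFilenamePattern` are hypothetical helper names, not the actual ClickHouse API:

```cpp
#include <map>
#include <string>

// Stand-in for the parsed metadata.json object (the real code uses Poco JSON).
using JsonStub = std::map<std::string, std::string>;

// Hypothetical helper: read a key if present, otherwise fall back to a default,
// so manifests written before this PR still deserialize successfully.
std::string getOrDefault(const JsonStub & json, const std::string & key, const std::string & def)
{
    auto it = json.find(key);
    return it != json.end() ? it->second : def;
}

std::string parseFilenamePattern(const JsonStub & json)
{
    // Older manifests lack "filename_pattern"; default to the documented pattern.
    return getOrDefault(json, "filename_pattern", "{part_name}_{checksum}");
}
```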


Collaborator


@arthurpassos, what do you think about ^^?

Collaborator Author


Nobody is using this feature yet, so it is OK to introduce backwards-incompatible changes like this. We literally have zero users so far.

- **Type**: `String`
- **Default**: `{part_name}_{checksum}`
- **Description**: Pattern for the filename of the exported merge tree part. The `part_name` and `checksum` are calculated and replaced on the fly. Additional macros are supported.
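As an illustration of the on-the-fly substitution described above, a minimal sketch; the function names are hypothetical and this is not the actual ClickHouse implementation:

```cpp
#include <string>

// Replace every occurrence of `placeholder` in `pattern` with `value`.
static void replaceAll(std::string & pattern, const std::string & placeholder, const std::string & value)
{
    for (size_t pos = pattern.find(placeholder); pos != std::string::npos;
         pos = pattern.find(placeholder, pos + value.size()))
        pattern.replace(pos, placeholder.size(), value);
}

// Illustrative only: expand the two computed placeholders of the default
// "{part_name}_{checksum}" pattern. Any remaining {macro} tokens would then
// go through the server's generic macro expansion.
std::string expandExportFilename(std::string pattern, const std::string & part_name, const std::string & checksum)
{
    replaceAll(pattern, "{part_name}", part_name);
    replaceAll(pattern, "{checksum}", checksum);
    return pattern;
}
```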

Collaborator


Why do we duplicate part_export.md content here?

Collaborator Author


Well, this is export partition (a slightly different feature), and at some point there might be settings that are supported only by export part and not by export partition.

I don't have a good answer tbh.

Macros::MacroExpansionInfo macro_info;
macro_info.table_id = storage_id;
filename = local_context->getMacros()->expand(filename, macro_info);

Collaborator


Why do we need special logic for {part_name} and {checksum}?
In other words, why do we not put it inside expand()?

Collaborator Author


Because part_name and checksum are calculated on the fly based on the data part being exported. They are not meant to be extracted from macros, it would not even work tbh

ilejn previously approved these changes Mar 9, 2026
Collaborator

@ilejn ilejn left a comment


LGTM

@arthurpassos arthurpassos added port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Mar 9, 2026
@ilejn
Collaborator

ilejn commented Mar 9, 2026

test_export_replicated_mt_partition_to_object_storage/test.py::test_export_partition_from_replicated_database_uses_db_shard_replica_macros test failure could be related to this PR.

@arthurpassos
Collaborator Author

test_export_replicated_mt_partition_to_object_storage/test.py::test_export_partition_from_replicated_database_uses_db_shard_replica_macros test failure could be related to this PR.

This was an interesting one: it succeeds in all suites except Integration tests (amd_asan, db disk, old analyzer, 6/6). The key here is the db_disk variant, which sets the shard and replica macros. When expanding the macros, the configured macros take precedence over the fields set in the code. That's why it is failing. That being said, we have a few options:

  1. Skip this test for the db_disk suite variant
  2. Use the same values the suite code sets for the macros - the tests will succeed, but we won't know whether the expansion used the config values or the ones set in the code
  3. An if-else branch

I vote for option 1.
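The precedence behavior described above can be modeled in a few lines. This is an illustrative sketch, not the actual `Macros::expand` implementation; the function and parameter names are hypothetical:

```cpp
#include <map>
#include <string>

// Illustrative model of the macro precedence issue: macros configured in the
// server config (<macros>) shadow implicit values supplied by the caller
// (here, the shard/replica derived from the DatabaseReplicated identity).
std::string resolveMacro(
    const std::map<std::string, std::string> & configured,
    const std::map<std::string, std::string> & implicit,
    const std::string & name)
{
    // Configured macros win, which is why the db_disk suite (which sets the
    // shard/replica macros) resolves different values than other suites.
    if (auto it = configured.find(name); it != configured.end())
        return it->second;
    if (auto it = implicit.find(name); it != implicit.end())
        return it->second;
    return "{" + name + "}";  // unresolved placeholder stays as-is
}
```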

@Selfeer
Collaborator

Selfeer commented Mar 10, 2026

PR #1490 Audit Review

AI audit note: This review was generated by AI (gpt-5.3-codex).

Audit update for PR #1490 (export filename pattern for part/partition export)


Confirmed defects

Medium: Backward-incompatible manifest deserialization for existing partition exports

  • Impact: After upgrade, any code path that reads pre-PR exports/.../metadata.json can throw and break export lifecycle operations (overwrite check, cancellation lookup, export status polling/listing) for those entries.
  • Anchor: src/Storages/ExportReplicatedMergeTreePartitionManifest.h / ExportReplicatedMergeTreePartitionManifest::fromJsonString, plus readers in src/Storages/StorageReplicatedMergeTree.cpp and src/Storages/MergeTree/ExportPartitionManifestUpdatingTask.cpp.
  • Trigger: A metadata.json created before this PR (without filename_pattern) is read by upgraded binaries.
  • Why defect: Deserialization now unconditionally calls json->getValue<String>("filename_pattern"); older manifests do not contain that key, so parse fails instead of falling back to default behavior.
  • Transition: existing export entry in ZooKeeper → metadata read → manifest parse → exception path instead of normal state update.
  • Fault injection mapping: Inject legacy metadata missing filename_pattern; expected fail-closed parse now occurs.
  • Fix direction (short): Make filename_pattern optional in manifest parse and default to "{part_name}_{checksum}" when absent.
  • Regression test direction (short): Add a unit/integration test that deserializes legacy manifest JSON and verifies cancel/list/overwrite-check paths continue to work.
  • Affected subsystem and blast radius: Replicated partition export metadata management; impacts any cluster carrying old export metadata across upgrade.
  • Code evidence:
    • src/Storages/ExportReplicatedMergeTreePartitionManifest.h: manifest.filename_pattern = json->getValue<String>("filename_pattern");
    • src/Storages/StorageReplicatedMergeTree.cpp: reads manifest for existing-entry expiry check and cancellation scan.
    • src/Storages/MergeTree/ExportPartitionManifestUpdatingTask.cpp: reads manifest while polling/refreshing export tasks.

Medium: {shard} / {replica} expansion can ignore DatabaseReplicated identity when config macros are present

  • Impact: Filename patterns intended to include database-level shard/replica can resolve to server-config macros instead, defeating per-database disambiguation and reintroducing collision risk in shared destinations.
  • Anchor: src/Storages/MergeTree/ExportPartTask.cpp / buildDestinationFilename and src/Common/Macros.cpp / Macros::expand.
  • Trigger: Using pattern with {shard} / {replica} in environments where server <macros> defines those names (e.g. remote-db-disk variants).
  • Why defect: buildDestinationFilename populates macro_info.shard/replica, but Macros::expand explicitly prefers configured macros over implicit values; DB-derived values are shadowed.
  • Transition: export settings include shard/replica pattern → macro expansion → config macro selected → filename not tied to DatabaseReplicated identity.
  • Fault injection mapping: Inject server config macros for shard/replica while exporting from DatabaseReplicated and observe resolved filename source.
  • Fix direction (short): Ensure DB-derived shard/replica take precedence for this expansion path (or perform explicit replacement before generic macro expansion).
  • Regression test direction (short): Add/enable a test variant where server macros are set and assert exported filenames still contain DatabaseReplicated shard/replica values.
  • Affected subsystem and blast radius: Export filename generation for replicated databases; impacts correctness of collision-avoidance naming in mixed macro environments.
  • Code evidence:
    • src/Storages/MergeTree/ExportPartTask.cpp: sets macro_info.shard/replica then calls expand.
    • src/Common/Macros.cpp: /// Prefer explicit macros over implicit. branch resolves configured macros first.
    • tests/integration/test_export_replicated_mt_partition_to_object_storage/test.py: test_export_partition_from_replicated_database_uses_db_shard_replica_macros is skipped for remote-db-disk due to this precedence conflict.
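The fix direction above could be sketched as follows. `preExpandDbIdentity` and `replaceToken` are hypothetical names; the idea is only that the DB-derived values are substituted before the pattern ever reaches the generic macro expander, so configured `<macros>` cannot shadow them:

```cpp
#include <string>

// Replace every occurrence of `token` in `s` with `value`.
static void replaceToken(std::string & s, const std::string & token, const std::string & value)
{
    for (size_t pos = s.find(token); pos != std::string::npos;
         pos = s.find(token, pos + value.size()))
        s.replace(pos, token.size(), value);
}

// Sketch of the suggested fix: substitute the DatabaseReplicated-derived
// {shard}/{replica} values explicitly *before* generic macro expansion runs.
std::string preExpandDbIdentity(std::string pattern, const std::string & db_shard, const std::string & db_replica)
{
    replaceToken(pattern, "{shard}", db_shard);
    replaceToken(pattern, "{replica}", db_replica);
    return pattern;  // remaining macros would still go through Macros::expand
}
```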

Coverage summary

  • Scope reviewed: PR origin/antalya-26.1...pr-1490 for settings wiring, replicated partition manifest lifecycle, export filename expansion, scheduler propagation, and added tests/docs.
  • Categories failed: Backward/upgrade compatibility on manifest schema; error-contract consistency for macro source precedence.
  • Categories passed: Call-graph construction, transition mapping (entry→manifest→scheduler→part export), branch analysis for new filename path, C++ bug-class scan (lifetime, iterator invalidation, lock-order, exception rollback, integer/signedness, RAII, UB) with no additional confirmed defects.
  • Assumptions/limits: Static code audit only (no runtime execution); severity reflects realistic production paths inferred from current call sites.

@Selfeer
Collaborator

Selfeer commented Mar 10, 2026

PR #1490 CI Triage

PR: Altinity/ClickHouse#1490
Title: Add setting to define filename pattern for part exports
Author: arthurpassos
Base branch: antalya-26.1
State: OPEN
Latest commit: 4120b578d87fa7d5bc47bbb553d3487bff761fae ("skip test", 2026-03-10)
CI Report: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1490&sha=4120b578d87fa7d5bc47bbb553d3487bff761fae&name_0=PR

Summary

| Category | Count | Tests |
| --- | --- | --- |
| PR-caused regression (fixed) | 2 | 02995_new_settings_history, test_export_partition_from_replicated_database_uses_db_shard_replica_macros |
| Pre-existing flaky | 2 | 00145_aggregate_functions_statistics, test_move_after_processing[another_bucket-AzureQueue] |
| Infrastructure issue | 1 | Stateless tests (amd_debug, distributed plan, s3 storage, sequential) job failure |
| Pre-existing broken | 4 | 02815_no_throw_in_simple_queries, 03206_no_exceptions_clickhouse_local, 03441_deltalake_clickhouse_public_datasets, 03441_deltalake_clickhouse_virtual_columns |
| Cascade failures | 0 | |

Detailed Analysis

PR-Caused Regressions (Both Addressed)

1. 02995_new_settings_history — Fast test — FIXED

2. test_export_partition_from_replicated_database_uses_db_shard_replica_macros — Integration tests (amd_asan, db disk, old analyzer, 6/6) — ADDRESSED (skipped)

  • Check: Integration tests (amd_asan, db disk, old analyzer, 6/6) (2026-03-09 16:50:30, first commit)
  • Root cause: The remote disk test suite sets shard/replica macros in helpers/cluster.py that take precedence over the ones from the DatabaseReplicated definition. When expanding macros for the export filename pattern, the configured macros are preferred, causing the test assertion to fail.
  • History: This test has never failed before on any other PR, only on PR #1490 (add setting to define filename pattern for part exports). Directly related to the PR's changes.
  • Fix: Commit 4120b578 ("skip test") adds skip_if_remote_database_disk_enabled(cluster) to skip this test when running with remote database disk.
  • Current status: Integration tests are pending rerun on the latest commit.

Pre-existing Flaky Tests (Unrelated)

3. 00145_aggregate_functions_statistics — Stateless tests (amd_debug, parallel)

4. test_storage_s3_queue/test_0.py::test_move_after_processing[another_bucket-AzureQueue] — Integration tests (arm_binary, distributed plan, 3/4)

Infrastructure Issues

5. Stateless tests (amd_debug, distributed plan, s3 storage, sequential) — Job failure

  • GitHub status: failure (completed in 13m 2s)
  • Database: Zero failed tests — all tests show OK or SKIPPED.
  • Analysis: The GitHub Actions job failed but no individual test failed. The "Run" step is showing as failed, likely due to a runner/infrastructure issue (timeout, OOM, or runner crash).
  • Relation to PR: None. Infrastructure-level failure.

Pre-existing Broken Tests

These 4 tests are marked BROKEN in both the first and latest commit runs, indicating pre-existing issues:

| Test | Occurrences |
| --- | --- |
| 02815_no_throw_in_simple_queries | 2 runs |
| 03206_no_exceptions_clickhouse_local | 2 runs |
| 03441_deltalake_clickhouse_public_datasets | 2 runs |
| 03441_deltalake_clickhouse_virtual_columns | 2 runs |

CI Status by Build Type

| Build | Status |
| --- | --- |
| Build (amd_debug) | PASS |
| Build (amd_release) | PASS |
| Build (amd_asan) | PASS |
| Build (amd_binary) | PASS |
| Build (arm_asan) | PASS |
| Build (arm_binary) | PASS |
| Build (arm_release) | PASS |
| Install packages (amd_release) | PASS |
| Install packages (arm_release) | PASS |
| Quick functional tests | PASS |
| AST fuzzer (amd_debug) | PASS |
| BuzzHouse (amd_debug) | PASS |
| Stress test (amd_debug) | PASS |
| Stress test (arm_asan) | PASS |
| Stress test (arm_asan, s3) | PASS |
| Unit tests (asan) | PASS |
| Stateless tests (amd_asan, distributed plan, parallel, 1/2) | PASS |
| Stateless tests (amd_asan, distributed plan, parallel, 2/2) | PASS |
| Stateless tests (amd_binary, ParallelReplicas, s3 storage, parallel) | PASS |
| Stateless tests (amd_debug, distributed plan, s3 storage, parallel) | PASS |
| Stateless tests (arm_binary, parallel) | PASS |
| Stateless tests (arm_asan, targeted) | PASS |
| Integration tests (amd_asan, targeted) | PASS |
| Integration tests (amd_asan, db disk, old analyzer, 1-5/6) | PASS |
| Integration tests (amd_binary, 1-5/5) | PASS |
| Integration tests (arm_binary, distributed plan, 1,2,4/4) | PASS |

Recommendations

  1. No blocking PR-caused failures remain. Both PR-caused regressions have been addressed:

    • Settings history updated in commit 2ca197c9
    • Export partition integration test skipped for remote disk configuration in commit 4120b578
  2. Wait for pending integration test reruns on the latest commit to confirm the export partition test fix works correctly.

  3. The 00145_aggregate_functions_statistics flaky test is a known issue unrelated to this PR and should not block merge.

  4. The S3 queue AzureQueue flaky test is a known issue unrelated to this PR and should not block merge.

  5. The infrastructure failure on Stateless tests (amd_debug, distributed plan, s3 storage, sequential) should be rerun if a clean CI is needed.

Conclusion

This PR is safe to merge from a CI perspective. All actual test failures are either:

  • Fixed by subsequent commits in the PR
  • Pre-existing flaky tests unrelated to the changes
  • Infrastructure issues

@Selfeer
Collaborator

Selfeer commented Mar 10, 2026

@arthurpassos can you check if the issues make sense?

@arthurpassos
Collaborator Author

@arthurpassos can you check if the issues make sense?

1 is not relevant.
2 has already been discussed in this PR #1490 (comment). At the same time, I need to double-check with either Dima or Misha whether the current behavior is the intended one.
