[ML] Add per-PR changelog YAML entries with schema validation #2920
edsavage wants to merge 2 commits into elastic:main from
Replace the monolithic CHANGELOG.asciidoc with per-PR YAML changelog files in docs/changelog/. Each PR that changes user-visible behaviour adds a small YAML file (<PR_NUMBER>.yaml) with structured metadata (area, type, summary). This eliminates merge conflicts in the monolithic changelog and simplifies backports.

Includes:
- JSON schema for validating changelog entries
- Python validation script (validate_changelogs.py)
- Python bundler script (bundle_changelogs.py) for release notes
- Gradle tasks: validateChangelogs, bundleChangelogs
- Buildkite CI step (soft-fail during rollout)
- Skip validation via >test, >refactoring, >docs, >build labels

Made-with: Cursor
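The label-based skip in the last bullet reduces to a set intersection. A minimal sketch, assuming CI can hand the script the PR's labels as a comma-separated string (how Buildkite exposes them is not specified here); changelog_required and parse_labels are hypothetical names, not functions from this PR:

```python
# Hypothetical helper: decide whether a PR needs a changelog entry.
# The skip labels are the ones listed in the PR description; how CI
# obtains the PR's labels is an assumption left to the pipeline.
SKIP_LABELS = frozenset({">test", ">refactoring", ">docs", ">build"})


def parse_labels(raw):
    """Split a comma-separated label string such as '>bug, :ml'."""
    return [label.strip() for label in raw.split(",") if label.strip()]


def changelog_required(pr_labels):
    """Return True unless the PR carries at least one skip label."""
    return not SKIP_LABELS.intersection(pr_labels)


if __name__ == "__main__":
    print(changelog_required(parse_labels(">enhancement,:ml")))  # True
    print(changelog_required(parse_labels(">test, :ml")))        # False
```

Keeping the label list in one constant means the CI step and any local tooling can share the same skip rule.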
✅ Snyk checks have passed. No issues have been found so far.
Review: Interaction with the existing monolithic changelog

The new per-PR YAML changelog system and the existing

No migration or replacement plan

The existing

No deduplication

Nothing prevents the same change from appearing in both the monolithic file and a per-PR YAML entry.

No release workflow

Format mismatch

The existing changelog uses AsciiDoc macros like

Grouping mismatch

The existing file groups by Elasticsearch version (

Suggestions

To be production-ready, this would need:
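The deduplication concern can at least be detected mechanically by comparing the PR numbers referenced in the monolithic file with the <PR_NUMBER>.yaml file names. A minimal sketch, assuming the AsciiDoc changelog links PRs with an {ml-pull}NNNN[#NNNN] attribute macro (adjust the pattern if the real macro differs); all function names are hypothetical:

```python
import re

# Assumption: PR links in the monolithic changelog look like
# "(See {ml-pull}2914[#2914].)". Change the pattern if they do not.
ML_PULL_REF = re.compile(r"\{ml-pull\}(\d+)\[")


def prs_in_asciidoc(text):
    """Return the set of PR numbers referenced via {ml-pull} macros."""
    return {int(n) for n in ML_PULL_REF.findall(text)}


def prs_in_yaml_names(names):
    """Return PR numbers from file names of the form <PR_NUMBER>.yaml."""
    prs = set()
    for name in names:
        stem, dot, ext = name.partition(".")
        if dot and ext == "yaml" and stem.isdigit():
            prs.add(int(stem))
    return prs


def duplicated_prs(asciidoc_text, yaml_names):
    """PRs described both in the monolithic file and in a per-PR entry."""
    return sorted(prs_in_asciidoc(asciidoc_text) & prs_in_yaml_names(yaml_names))
```

In CI the inputs would come from reading the AsciiDoc file and listing docs/changelog/ (for example with pathlib's read_text and glob("*.yaml")); any non-empty result means a change is recorded twice.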
@edsavage, I am very excited about this change. However, this should be the first step towards resolving #2217, and we need to decide on the complete plan for how to integrate ML changelogs into the ES release docs process. I expect that some design decisions will extend or adjust the YAML schema that you are using now. Once we have this, we should ditch CHANGELOG.asciidoc completely and only use the single schema. I think a couple of things have changed since 2022 that make #2217 more approachable and relevant:
Can you please plan the required changes for the complete integration of the release doc processes, and identify the open questions we still need to answer before moving forward?
Design Plan: Integrating ml-cpp Changelogs into the ES Release Notes Pipeline

Following up on @valeriy42's request to plan the complete integration of ml-cpp changelog entries into the Elasticsearch release documentation process, resolving #2217.

Current State
Proposed Design

Phase 1: Per-PR YAML changelogs in ml-cpp

Developers add structured changelog entries with each ml-cpp PR. The schema should align with the ES changelog schema (…):

```yaml
pr: 2914
summary: "Split build and test into separate pipeline steps"
area: Machine Learning
type: enhancement
issues: []
```

Key schema decisions:
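For an entry like the one above, the checks the PR attributes to validate_changelogs.py (field types, enum values, filename/PR cross-check) can be sketched in plain Python. This operates on an already-parsed dict so it needs no YAML library, and the ALLOWED_TYPES set is an assumed subset modelled on ES changelog types, not the actual enum from the PR's JSON schema; validate_entry is a hypothetical name:

```python
# Minimal sketch of per-entry checks; the real script validates against
# the JSON schema in docs/changelog/. ALLOWED_TYPES is an assumption.
ALLOWED_TYPES = {
    "bug", "enhancement", "feature", "deprecation", "breaking",
    "known-issue", "security", "upgrade", "regression",
}
REQUIRED = {"pr": int, "summary": str, "area": str, "type": str}


def validate_entry(filename, entry):
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for field, expected in REQUIRED.items():
        value = entry.get(field)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{field}: expected {expected.__name__}")
    if isinstance(entry.get("type"), str) and entry["type"] not in ALLOWED_TYPES:
        errors.append(f"type: unknown value {entry['type']!r}")
    issues = entry.get("issues", [])
    if not isinstance(issues, list) or not all(
        isinstance(i, int) and not isinstance(i, bool) for i in issues
    ):
        errors.append("issues: expected a list of issue numbers")
    # Filename convention: <PR_NUMBER>.yaml, matching the pr field.
    stem = filename.rsplit("/", 1)[-1].removesuffix(".yaml")
    if not filename.endswith(".yaml") or not stem.isdigit():
        errors.append("filename: expected <PR_NUMBER>.yaml")
    elif isinstance(entry.get("pr"), int) and int(stem) != entry["pr"]:
        errors.append(f"pr: {entry['pr']} does not match filename {stem}")
    return errors
```

Returning all problems rather than stopping at the first lets a soft-fail CI step report everything wrong with an entry in one run.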
Auto-generation of changelog entries

In the ES repo,
Developers can then customise the generated file if needed (e.g. adjusting the summary wording). This automation likely runs via Homer or another internal Elastic tool configured in

Proposed mechanism:
This replicates the ES workflow while being self-contained, with no dependency on Homer or external tooling.

As an alternative (or complement), we could provide a CLI helper:

```bash
# Generate changelog YAML from PR metadata
./dev-tools/generate_changelog.sh 2914
```

Validation: CI validates entries against the schema on every PR (soft-fail initially, then hard-fail).

Location:

Skip logic: PRs labelled

Phase 2: Integration with the ES release notes pipeline

Three possible approaches, in order of preference:

Option A: ES build pulls ml-cpp changelogs at bundle time (recommended)

Extend
Requires a PR to

Option B: CI pushes ml-cpp entries to the ES repo

When an ml-cpp PR is merged, a GitHub Actions workflow creates a corresponding YAML file in the ES repo via PR:
Simpler to implement, but adds cross-repo coupling and noise to the ES repo.

Option C: Release-time script (interim)

A script collects ml-cpp changelogs, converts them to an ES-compatible format, and creates a single PR in the ES repo at release time. Less automation but lowest risk; a good interim step while working toward Option A.

Phase 3: Deprecate CHANGELOG.asciidoc

Once the YAML system is integrated:
Phase 4: Backport considerations

Changelog YAML files travel with the PR; they are just files in the repo. When backporting:
Open Questions
Suggested Next Steps
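Returning to the auto-generation step in Phase 1: whether it runs as a merge-time workflow or as the ./dev-tools/generate_changelog.sh helper, its core is rendering PR metadata into the entry format shown earlier. A minimal sketch, assuming the title and labels have already been fetched from GitHub (not shown), that area is always "Machine Learning" as in the example entry, and using a hypothetical LABEL_TO_TYPE mapping; the real label-to-type rules are still an open question:

```python
import json

# Hypothetical label-to-type mapping; the real mapping is undecided.
LABEL_TO_TYPE = {">bug": "bug", ">enhancement": "enhancement", ">feature": "feature"}


def render_entry(pr_number, title, labels, issues=()):
    """Render a changelog entry as YAML text, or None if no single type label applies."""
    types = [LABEL_TO_TYPE[label] for label in labels if label in LABEL_TO_TYPE]
    if len(types) != 1:
        return None  # zero or ambiguous type labels: leave it to the author
    issue_list = ", ".join(str(i) for i in issues)
    return (
        f"pr: {pr_number}\n"
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"summary: {json.dumps(title)}\n"
        "area: Machine Learning\n"
        f"type: {types[0]}\n"
        f"issues: [{issue_list}]\n"
    )
```

Writing the text directly avoids a PyYAML dependency and keeps the output byte-for-byte predictable, so developers see a minimal diff when they adjust the generated summary.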
Summary
Replaces the monolithic CHANGELOG.asciidoc approach with per-PR YAML changelog files, modelled after the Elasticsearch repository's changelog system.
Each PR that changes user-visible behaviour adds a small YAML file (docs/changelog/<PR_NUMBER>.yaml) with structured metadata:

What's included
- docs/changelog/: directory for per-PR YAML entries, with a README explaining the format and a JSON schema for validation
- dev-tools/validate_changelogs.py: Python script that validates entries against the schema (filename convention, field types, enum values, PR number cross-check)
- dev-tools/bundle_changelogs.py: Python script that generates consolidated release notes (Markdown or AsciiDoc) from individual YAML entries, grouped by type and area
- Gradle tasks validateChangelogs and bundleChangelogs for local developer use
- Buildkite step in format_and_validation.yml.sh, configured as soft_fail during rollout, with automatic skip for PRs labelled >test, >refactoring, >docs, or >build

Benefits
Rollout plan
The CI step is set to soft_fail: true initially, giving the team time to adopt the new workflow before making it mandatory.

Test plan
- validate_changelogs.py locally with valid and invalid YAML files
- bundle_changelogs.py markdown and asciidoc output
- validateChangelogs task wiring

Made with Cursor
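The bundling step attributed to bundle_changelogs.py (grouping entries by type and area into consolidated release notes) can be sketched as follows. This covers Markdown output only, works on already-parsed entries, and assumes a heading order and titles (TYPE_TITLES) that may differ from the real script; bundle_markdown is a hypothetical name:

```python
from collections import defaultdict

# Assumed heading order and titles; unknown types follow alphabetically.
TYPE_TITLES = {
    "breaking": "Breaking changes",
    "feature": "New features",
    "enhancement": "Enhancements",
    "bug": "Bug fixes",
}


def bundle_markdown(entries):
    """Group parsed entries by type, then area, and render Markdown."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        grouped[entry["type"]][entry["area"]].append(entry)
    order = [t for t in TYPE_TITLES if t in grouped] + sorted(
        t for t in grouped if t not in TYPE_TITLES
    )
    lines = []
    for etype in order:
        lines.append(f"## {TYPE_TITLES.get(etype, etype.capitalize())}")
        for area in sorted(grouped[etype]):
            lines.append(f"\n### {area}\n")
            # Sort by PR number so the output is stable across runs.
            for entry in sorted(grouped[etype][area], key=lambda e: e["pr"]):
                lines.append(f"- {entry['summary']} (#{entry['pr']})")
        lines.append("")
    return "\n".join(lines)
```

Deterministic ordering (by type, then area, then PR number) keeps regenerated release notes diff-friendly; an AsciiDoc renderer would reuse the same grouping and only change the heading and list markup.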