[ML] Add per-PR changelog YAML entries with schema validation by edsavage · Pull Request #2920 · elastic/ml-cpp

edsavage · 2026-02-26T00:01:00Z

Summary

Replaces the monolithic CHANGELOG.md approach with per-PR YAML changelog files, modelled after the Elasticsearch repository's changelog system.

Each PR that changes user-visible behaviour adds a small YAML file (docs/changelog/<PR_NUMBER>.yaml) with structured metadata:

pr: 2914
summary: Split build and test into separate pipeline steps
area: Build
type: enhancement
issues: []

What's included

docs/changelog/ — directory for per-PR YAML entries, with a README explaining the format and a JSON schema for validation
dev-tools/validate_changelogs.py — Python script that validates entries against the schema (filename convention, field types, enum values, PR number cross-check)
dev-tools/bundle_changelogs.py — Python script that generates consolidated release notes (Markdown or AsciiDoc) from individual YAML entries, grouped by type and area
Gradle tasks — validateChangelogs and bundleChangelogs for local developer use
Buildkite CI step — added to format_and_validation.yml.sh as a soft_fail step during rollout, with automatic skip for PRs labelled >test, >refactoring, >docs, or >build
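The checks listed for validate_changelogs.py (filename convention, field types, enum values, PR number cross-check) could be sketched roughly as below. This is not the script from the PR: the field list, the `ALLOWED_TYPES` values and the `validate_entry` helper are assumptions for illustration, and the entry is taken as an already-parsed dict so the sketch needs no YAML library. The area enum is left unchecked because its values are not listed here.

```python
import re

# Illustrative assumptions: the authoritative field list and enums live in
# the JSON schema under docs/changelog/, which is not reproduced here.
REQUIRED_FIELDS = {"pr": int, "summary": str, "area": str, "type": str}
OPTIONAL_FIELDS = {"issues": list}
ALLOWED_TYPES = {"bug", "enhancement", "feature", "breaking", "deprecation"}
FILENAME_RE = re.compile(r"^(\d+)\.yaml$")


def validate_entry(filename, entry):
    """Return a list of error strings for one already-parsed changelog entry."""
    errors = []
    known = {**REQUIRED_FIELDS, **OPTIONAL_FIELDS}

    # Filename convention: <PR_NUMBER>.yaml
    match = FILENAME_RE.match(filename)
    if match is None:
        errors.append(f"{filename}: filename must be <PR_NUMBER>.yaml")

    # Missing required fields
    for field in REQUIRED_FIELDS:
        if field not in entry:
            errors.append(f"{filename}: missing required field '{field}'")

    # Extra fields and field types (bool is deliberately not accepted as int)
    for field, value in entry.items():
        if field not in known:
            errors.append(f"{filename}: unexpected field '{field}'")
        elif type(value) is not known[field]:
            errors.append(f"{filename}: field '{field}' must be {known[field].__name__}")

    # Enum check for type
    entry_type = entry.get("type")
    if isinstance(entry_type, str) and entry_type not in ALLOWED_TYPES:
        errors.append(f"{filename}: invalid type '{entry_type}'")

    # PR number cross-check against the filename
    pr = entry.get("pr")
    if match is not None and type(pr) is int and pr != int(match.group(1)):
        errors.append(f"{filename}: pr {pr} does not match filename")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a CI step report every issue in a file in one run.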

Benefits

No more merge conflicts in CHANGELOG.md
Simpler backports — changelog entry travels with the PR, no separate file to conflict
Structured data — enables automated release notes generation
Schema validation — catches errors early in CI

Rollout plan

The CI step is set to soft_fail: true initially, giving the team time to adopt the new workflow before making it mandatory.

Test plan

Validated validate_changelogs.py locally with valid and invalid YAML files
Verified error messages for: wrong filename, missing fields, invalid enums, PR number mismatch, extra fields
Tested bundle_changelogs.py markdown and asciidoc output
Confirmed Gradle validateChangelogs task wiring
CI build passes with the new Buildkite step
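For reference, the "grouped by type and area" Markdown output exercised above might be produced along these lines. This is a minimal sketch, not the code in bundle_changelogs.py: the heading levels, the ordering and the `bundle_markdown` name are assumptions, and entries are passed in as already-parsed dicts.

```python
from collections import defaultdict


def bundle_markdown(entries):
    """Render parsed changelog entries as Markdown, grouped by type then area."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        grouped[entry["type"]][entry["area"]].append(entry)

    lines = []
    for entry_type in sorted(grouped):
        lines.append(f"## {entry_type}")
        for area in sorted(grouped[entry_type]):
            lines.append(f"### {area}")
            # Sort by PR number so the output is stable between runs.
            for entry in sorted(grouped[entry_type][area], key=lambda e: e["pr"]):
                lines.append(f"- {entry['summary']} (#{entry['pr']})")
    return "\n".join(lines)
```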

Made with Cursor

Replace the monolithic CHANGELOG.md with per-PR YAML changelog files in docs/changelog/. Each PR that changes user-visible behaviour adds a small YAML file (<PR_NUMBER>.yaml) with structured metadata (area, type, summary). This eliminates merge conflicts in CHANGELOG.md and simplifies backports.

Includes:
- JSON schema for validating changelog entries
- Python validation script (validate_changelogs.py)
- Python bundler script (bundle_changelogs.py) for release notes
- Gradle tasks: validateChangelogs, bundleChangelogs
- Buildkite CI step (soft-fail during rollout)
- Skip validation via >test, >refactoring, >docs, >build labels

Made-with: Cursor

prodsecmachine · 2026-02-26T00:01:16Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues



edsavage · 2026-02-27T01:54:42Z

Review: Interaction with the existing monolithic changelog

The new per-PR YAML changelog system and the existing docs/CHANGELOG.asciidoc are completely independent — there's no integration between them. A few things to consider before merging:

No migration or replacement plan

The existing docs/CHANGELOG.asciidoc is untouched. There's no code to append bundled entries into it, nor any plan to deprecate it. Contributors could end up maintaining both systems in parallel.

No deduplication

Nothing prevents the same change from appearing in both the monolithic file and a per-PR YAML entry.

No release workflow

bundle_changelogs.py can output AsciiDoc, but there's no automation to merge its output into docs/CHANGELOG.asciidoc at release time, nor to clean out processed YAML files after a release.

Format mismatch

The existing changelog uses AsciiDoc macros like {ml-pull}2863[#2863] for links, while the bundler generates raw GitHub URLs. They wouldn't be stylistically consistent if combined.
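To make the mismatch concrete, here is a small sketch of the two link styles for the same PR. The `{ml-pull}` form is the one quoted above from docs/CHANGELOG.asciidoc; the exact URL shape the bundler emits is assumed here, and both function names are hypothetical.

```python
def asciidoc_pull_link(pr):
    """Link in the style of the existing changelog, e.g. {ml-pull}2863[#2863]."""
    # Doubled braces in an f-string produce literal braces.
    return f"{{ml-pull}}{pr}[#{pr}]"


def raw_pull_link(pr):
    """Raw GitHub URL for an ml-cpp PR (assumed shape of the bundler output)."""
    return f"https://github.com/elastic/ml-cpp/pull/{pr}"
```

Emitting the macro form from the AsciiDoc renderer would be one way to keep bundled entries consistent with the existing file.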

Grouping mismatch

The existing file groups by Elasticsearch version (== {es} version 9.4.0), while the YAML schema has no version field — entries are just grouped by type and area.

Suggestions

To be production-ready, this would need:

A decision on whether the monolithic file is being replaced or supplemented
A release-time workflow to merge YAML entries into the existing format (or replace it)
Cleanup of processed YAML files after each release
Consistent link/reference formatting between the two systems

valeriy42 · 2026-03-06T10:26:12Z

@edsavage, I am very excited about this change. However, this should be the first step towards resolving #2217, and we need to decide on a complete plan for how to integrate the ML changelog into the ES release docs process. I expect that some design decisions will extend or adjust the YAML schema that you are using now. Once we have this, we should ditch CHANGELOG.asciidoc completely and only use the single schema.

I think a couple of things have changed since 2022 that make #2217 more approachable and relevant:

The introduction of AI code assistants has significantly reduced implementation costs, so Dave Roberts's ROI argument no longer carries the same weight.
We have many new developers who are more comfortable with ES processes. Aligning the ML-CPP documentation process with ES will simplify their work and reduce errors.

Can you please plan the required changes for the complete integration of the release doc processes, and identify the open questions we still need to answer before moving forward?

edsavage · 2026-03-09T02:35:18Z

Design Plan: Integrating ml-cpp Changelogs into the ES Release Notes Pipeline

Following up on @valeriy42's request to plan the complete integration of ml-cpp changelog entries into the Elasticsearch release documentation process, resolving #2217.

Current State

ml-cpp maintains docs/CHANGELOG.asciidoc manually
At ES release time, someone manually copies relevant entries into the Elasticsearch release notes
Elasticsearch uses per-PR YAML files (docs/changelog/<PR>.yaml) validated against a JSON schema, bundled and rendered by Gradle tasks (generateReleaseNotes)
Machine Learning is already a valid area in the ES changelog schema

Proposed Design

Phase 1: Per-PR YAML changelogs in ml-cpp

Developers add structured changelog entries with each ml-cpp PR. The schema should align with the ES changelog schema (build-tools-internal/src/main/resources/changelog-schema.json) as closely as possible:

pr: 2914
summary: "Split build and test into separate pipeline steps"
area: Machine Learning
type: enhancement
issues: []

Key schema decisions:

Use the ES area enum — most entries would use Machine Learning, but some could use Inference or other valid ES areas
Use the ES type enum — bug, enhancement, feature, breaking, deprecation, etc.
Support highlight and breaking objects — same structure as ES, for entries warranting release highlights or breaking change notices
pr field — references the ml-cpp PR number (not an ES PR). This diverges from ES where pr is always an ES PR number
Add optional es-pr field — for cross-repo changes where a corresponding ES PR exists

Auto-generation of changelog entries

In the ES repo, elasticsearchmachine automatically generates changelog YAML files for PRs. When a PR is opened, the bot:

Creates docs/changelog/<PR_NUMBER>.yaml with fields derived from the PR metadata (title, labels, linked issues)
Pushes a commit to the PR branch (attributed to the PR author) with the message Update docs/changelog/<PR_NUMBER>.yaml
Comments on the PR: "Hi @author, I've created a changelog YAML for you."
If the PR title or labels change, the bot updates the file and comments: "I've updated the changelog YAML for you."

Developers can then customise the generated file if needed (e.g. adjusting the summary wording).

This automation likely runs via Homer or another internal Elastic tool configured in elastic/elasticsearch-infra. Since ml-cpp doesn't have this integration, we should replicate and build on it with a GitHub Action:

Proposed mechanism — changelog-check GitHub Action (runs on pull_request):

On PR open/edit/label: the workflow checks whether docs/changelog/<PR_NUMBER>.yaml exists in the PR branch
If missing and required: it auto-generates a changelog YAML file from PR metadata:
- pr — from the PR number
- summary — from the PR title
- area — defaults to Machine Learning (can be overridden by PR labels)
- type — inferred from PR labels (>bug → bug, >enhancement → enhancement, >feature → feature, >breaking → breaking, >deprecation → deprecation, default → enhancement)
- issues — extracted from any Fixes #NNN / Closes #NNN references in the PR body
Commit the generated file directly to the PR branch, so the developer can review and adjust it
If PR metadata changes: update the generated file (matching the ES bot behaviour)
If already manually edited: validate the existing file against the schema but don't overwrite the developer's changes
If not required: skip silently (based on skip labels)

This replicates the ES workflow while being self-contained — no dependency on Homer or external tooling.
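The metadata mapping described above (type from labels, issues from Fixes/Closes references, area defaulting to Machine Learning) could be sketched as follows. The function name and the shape of its inputs are hypothetical, the label-based area override is left out because no label-to-area mapping has been defined yet, and fetching the PR metadata and committing the file are out of scope here.

```python
import re

# Label-to-type mapping as proposed above; any other label set falls back
# to "enhancement".
LABEL_TO_TYPE = {
    ">bug": "bug",
    ">enhancement": "enhancement",
    ">feature": "feature",
    ">breaking": "breaking",
    ">deprecation": "deprecation",
}
ISSUE_RE = re.compile(r"\b(?:fixes|closes)\s+#(\d+)", re.IGNORECASE)


def build_changelog_entry(pr_number, title, labels, body):
    """Derive a changelog entry dict from PR metadata (hypothetical helper)."""
    entry_type = next(
        (LABEL_TO_TYPE[label] for label in labels if label in LABEL_TO_TYPE),
        "enhancement",
    )
    # De-duplicate and sort issue numbers referenced in the PR body.
    issues = sorted({int(number) for number in ISSUE_RE.findall(body or "")})
    return {
        "pr": pr_number,
        "summary": title,
        "area": "Machine Learning",
        "type": entry_type,
        "issues": issues,
    }
```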

As an alternative (or complement), we could provide a CLI helper:

# Generate changelog YAML from PR metadata
./dev-tools/generate_changelog.sh 2914

Validation: CI validates entries against the schema on every PR (soft-fail initially, then hard-fail).

Location: docs/changelog/<PR_NUMBER>.yaml in the ml-cpp repo.

Skip logic: PRs labelled >test, >refactoring, >docs, >build, or >non-issue would not require a changelog entry.
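As a minimal sketch of that skip rule, assuming the PR's labels are available as a list of strings (the `changelog_required` name is hypothetical):

```python
# Labels that exempt a PR from needing a changelog entry, as listed above.
SKIP_LABELS = frozenset({">test", ">refactoring", ">docs", ">build", ">non-issue"})


def changelog_required(labels):
    """Return True unless the PR carries at least one skip label."""
    return SKIP_LABELS.isdisjoint(labels)
```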

Phase 2: Integration with the ES release notes pipeline

Three possible approaches, in order of preference:

Option A — ES build pulls ml-cpp changelogs at bundle time (recommended)

Extend BundleChangelogsTask in elastic/elasticsearch to read changelogs from ml-cpp in addition to the local docs/changelog/ directory:

Add a Gradle configuration for external changelog sources (repo + path)
At bundle time, fetch ml-cpp's docs/changelog/ directory (via git clone or GitHub API)
Merge ml-cpp entries into the bundle, adjusting PR links to point to elastic/ml-cpp

Requires a PR to elastic/elasticsearch build-tools-internal and buy-in from the ES build/release team.

Option B — CI pushes ml-cpp entries to the ES repo

When an ml-cpp PR is merged, a GitHub Actions workflow creates a corresponding YAML file in the ES repo via PR:

ml-cpp CI creates docs/changelog/ml-cpp-<PR>.yaml in the ES repo
Uses the standard ES schema with a naming convention to avoid PR number collisions
The pr field would need special handling (or an external_pr / source_repo field)

Simpler to implement but adds cross-repo coupling and noise to the ES repo.
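A rough sketch of the naming convention under Option B follows. The `source_repo` field is only one of the options floated here and is not part of the ES schema today; the helper name is hypothetical.

```python
def to_es_repo_entry(entry, source_repo="elastic/ml-cpp"):
    """Return (filename, entry) for placing an ml-cpp entry in the ES repo.

    The ml-cpp- prefix avoids collisions with ES PR numbers. The source_repo
    field is an assumed schema extension, not an existing ES field.
    """
    filename = f"ml-cpp-{entry['pr']}.yaml"
    converted = dict(entry)  # copy so the caller's entry is left untouched
    converted["source_repo"] = source_repo
    return filename, converted
```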

Option C — Release-time script (interim)

A script collects ml-cpp changelogs, converts them to ES-compatible format, and creates a single PR in the ES repo at release time. Less automation but lowest risk — good as an interim step while working toward Option A.

Phase 3: Deprecate CHANGELOG.asciidoc

Once the YAML system is integrated:

Stop updating docs/CHANGELOG.asciidoc
Replace its contents with a pointer to the ES release notes
Add a pruneChangelogs equivalent that removes YAML files after they are included in a release
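A pruning step of this kind could be as small as the sketch below. It assumes the set of PR numbers already included in a release is known from elsewhere (for example from the bundle); the `prune_changelogs` name and the dry-run default are illustrative choices, not an existing task.

```python
from pathlib import Path


def prune_changelogs(changelog_dir, released_prs, dry_run=True):
    """Remove <PR>.yaml files for released PRs and return the affected paths.

    With dry_run=True (the default) nothing is deleted, so the list can be
    reviewed before the files are actually removed.
    """
    affected = []
    for pr in sorted(released_prs):
        path = Path(changelog_dir) / f"{pr}.yaml"
        if path.is_file():
            if not dry_run:
                path.unlink()
            affected.append(path)
    return affected
```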

Phase 4: Backport considerations

Changelog YAML files travel with the PR — they are just files in the repo. When backporting:

The YAML file is cherry-picked along with the code change
The same entry appears on the version branch, which is correct for that version's release notes
No special handling needed (this is an advantage of per-file changelogs vs a monolithic file)

Open Questions

PR number linkage — The ES schema uses ES PR numbers. ml-cpp entries reference ml-cpp PRs. How should these appear in the generated release notes? Options:
- Use {ml-pull} macro format (existing convention in CHANGELOG.asciidoc)
- Extend the ES schema with a source_repo field
- Use a filename convention (e.g., ml-cpp-2914.yaml)
Cross-repo changes — When a change spans both ES and ml-cpp (e.g., new ML feature with a Java API surface), where does the changelog entry live? Both repos? One with a cross-reference? The current convention is to mark the ES PR as >non-issue and reference both {ml-pull} and {es-pull} in the ml-cpp changelog.
ES build-tools ownership — Extending BundleChangelogsTask requires buy-in from the ES build/release team. Should we propose this, or start with a simpler integration path (Option C)?
Homer / elasticsearchmachine integration — elasticsearchmachine auto-generates changelog YAML files for ES PRs (creates the file, commits it to the PR branch, and updates it when PR metadata changes). This automation likely runs via Homer or similar internal tooling in elastic/elasticsearch-infra. Key questions:
- Could this automation be extended to also handle ml-cpp PRs?
- Would the ES team prefer ml-cpp to use the same tooling, or is an independent GitHub Action acceptable?
- Who owns the elasticsearchmachine changelog automation and can we request changes?
Version scoping — ES changelogs are pruned per release. How do we handle the version boundary in ml-cpp? Should we prune after each ES release that includes ml-cpp changes?
Which PRs need entries? — Should every ml-cpp PR have a changelog entry, or only user-facing changes? What labels indicate "no changelog needed"?
Historical entries — Should we backfill existing CHANGELOG.asciidoc entries as YAML, or draw a line and only use YAML going forward?

Suggested Next Steps

Align on the schema: confirm that using the ES area/type enums works for ml-cpp
Investigate elasticsearchmachine: determine how the ES changelog auto-generation works and whether it can be extended to ml-cpp, or whether an independent GitHub Action is preferred (the Homer / elasticsearchmachine integration question above)
Resolve the ES build-tools ownership question: reach out to the ES build/release team about the preferred integration method
Implement Phase 1: update this PR to use the ES-compatible schema, including the changelog-check GitHub Action for auto-generation
Start with Option C: build a release-time script as an interim integration while pursuing Option A

[ML] Add >non-issue to changelog validation skip labels

22dfa52

Made-with: Cursor

edsavage wants to merge 2 commits into elastic:main from edsavage:changelog-yaml-per-pr
