Skip to content

Fix incorrect last sequence id when sending messages in batch#546

Merged
BewareMyPower merged 5 commits intoapache:mainfrom
bigo-sg:fix-531-batch-last-sequence-id
Mar 5, 2026
Merged

Fix incorrect last sequence id when sending messages in batch#546
BewareMyPower merged 5 commits intoapache:mainfrom
bigo-sg:fix-531-batch-last-sequence-id

Conversation

@zhanglistar
Copy link
Contributor

Fixes #531

Problem

When sending messages with sendAsync and then flush(), getLastSequenceId() returns a wrong value when batching is enabled.

  • After sending 3 messages (sequence ids [0, 1, 2]), the result is 4 instead of 2.
  • After sending 5 messages, the result is 8 instead of 4.

The sequence id is effectively doubled per message in the batch.

Root cause

  1. Batch metadata
    Commands::serializeSingleMessagesToBatchPayload used the last message's sequence_id (messages.back()) for the batch metadata. The client then assumed the broker ack referred to the first id and computed lastSequenceIdPublished_ = sequenceId + messagesCount - 1, which is wrong when the broker echoes or uses the last id.

  2. Ack handling
    In ProducerImpl::ackReceived, the code assumed the broker always sends the first sequence id of the batch and set lastSequenceIdPublished_ = sequenceId + messagesCount - 1. When the broker sends the last sequence id (common for batch acks), this formula double-counts and yields the wrong last sequence id.

Solution

  1. lib/Commands.cc
    In serializeSingleMessagesToBatchPayload, return the first message's sequence_id (messages.front()) for the batch metadata so the batch is identified by the first message's id.

  2. lib/ProducerImpl.cc
    In ackReceived:

    • Treat the ack as matching the pending op when sequenceId is in [expectedFirstSequenceId, expectedLastSequenceId].
    • If the broker sent the first id: lastSequenceIdPublished_ = expectedLastSequenceId.
    • If the broker sent the last id: lastSequenceIdPublished_ = sequenceId.
  3. tests/ProducerTest.cc
    Add testGetLastSequenceIdAfterBatchFlush to verify:

    • After 3 messages in a batch and flush(), getLastSequenceId() is 2.
    • After 5 messages total, getLastSequenceId() is 4.

How to verify

./tests/pulsar_tests --gtest_filter="ProducerTest.testGetLastSequenceIdAfterBatchFlush"

@BewareMyPower
Copy link
Contributor

FAILED TESTS (11/579):
      90 ms: ./pulsar-tests KeyBasedBatchingTest.testSequenceId (try #1)
      42 ms: ./pulsar-tests KeyBasedBatchingTest.testSequenceId (try #2)
      81 ms: ./pulsar-tests KeyBasedBatchingTest.testSequenceId (try #3)
      31 ms: ./pulsar-tests KeyBasedBatchingTest.testSequenceId (try #4)

zhangzhibiao added 2 commits March 5, 2026 18:52
@BewareMyPower BewareMyPower added the bug Something isn't working label Mar 5, 2026
@BewareMyPower BewareMyPower modified the milestones: 4.2.0, 4.1.0 Mar 5, 2026
Comment on lines +867 to +868
// After seek-to-end the broker may close the consumer and trigger reconnect; allow a short
// delay for hasMessageAvailable to become false (avoids flakiness when reconnect completes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay to reduce flakiness to avoid blocking PRs, but it should be noted that these changes don't fix the bug of hasMessageAvailable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That's a little tricky here.

@BewareMyPower BewareMyPower merged commit 23b60d1 into apache:main Mar 5, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Incorrect last sequence id when sending messages in batch

2 participants