perf(saexec): Remove TrieDB memory leak by alarso16 · Pull Request #236 · ava-labs/strevm

alarso16 · 2026-02-25T15:53:23Z

Currently, if we call triedb.Commit() every N blocks, SAE follows this pattern:

State 0 will be on disk
States [1, N] will be generated in memory
State N will be committed, moving all dirty nodes at that root from memory onto disk
All outdated nodes in [1, N) will remain in the dirty cache

Although not a correctness bug, keeping these tries is generally unnecessary and leads to a memory leak.
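The pattern above can be illustrated with a toy model. This is a minimal sketch, not the real TrieDB API: `dirtyCache`, `insert`, `commit`, `dereference`, and `run` are hypothetical names, and states are keyed by block height rather than by root hash. It only counts how many uncommitted states stay in memory with and without dereferencing.

```go
package main

import "fmt"

// dirtyCache is a toy stand-in for a trie database's in-memory layer.
// It is NOT the real TrieDB; it only records which states are held where.
type dirtyCache struct {
	inMemory map[uint64]bool // keyed by block height for simplicity
	onDisk   map[uint64]bool
}

func newDirtyCache() *dirtyCache {
	// State 0 starts on disk, as in the description above.
	return &dirtyCache{inMemory: map[uint64]bool{}, onDisk: map[uint64]bool{0: true}}
}

// insert records a newly generated state in memory.
func (c *dirtyCache) insert(height uint64) { c.inMemory[height] = true }

// commit moves one state from memory to disk; other states are untouched.
func (c *dirtyCache) commit(height uint64) {
	delete(c.inMemory, height)
	c.onDisk[height] = true
}

// dereference drops a state from memory without persisting it.
func (c *dirtyCache) dereference(height uint64) { delete(c.inMemory, height) }

// run executes `blocks` blocks, committing every n (n must be non-zero).
// If deref is true, the uncommitted states of each interval are dropped
// right after the commit.
func run(blocks, n uint64, deref bool) *dirtyCache {
	c := newDirtyCache()
	for h := uint64(1); h <= blocks; h++ {
		c.insert(h)
		if h%n != 0 {
			continue
		}
		c.commit(h)
		if deref {
			for old := h - n + 1; old < h; old++ {
				c.dereference(old)
			}
		}
	}
	return c
}

func main() {
	leaky := run(12, 4, false)
	fixed := run(12, 4, true)
	fmt.Println("stale states without dereferencing:", len(leaky.inMemory))
	fmt.Println("stale states with dereferencing:", len(fixed.inMemory))
}
```

In a real trie database, nodes are shared between tries, so the actual memory cost per stale root is smaller than a full state; the sketch only shows that the count of unreferenced roots grows without bound unless something drops them.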

The expected functionality should be:

For validators: keep the minimum states around to guarantee that all consensus-necessary states are accessible
For API nodes: keep all states on disk.

Now, all nodes will default to NOT storing all states. API nodes must set the necessary config.

saedb/saedb.go

alarso16 · 2026-02-25T19:10:24Z

saedb/saedb.go

+
+// Record tracks the root and may commit the trie associated with the root
+// to the database if the height is on an multiple of [CommitTrieDBEvery].
+func (e *StateRecorder) Record(root common.Hash, height uint64) error {


This is the same logic as coreth's core/state_manager.go, but simpler. There is a "we're not ready to commit yet, but we have a huge amount of state in memory, let's offload some into the database" function there, but I don't think it's necessary until proven otherwise.
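A rough sketch of the "commit on an interval, keep a bounded window in memory" idea follows. It is an assumption-laden illustration, not the code in saedb/saedb.go or coreth: `commitEvery` and `stateHistory` are hypothetical stand-ins for [CommitTrieDBEvery] and [saedb.StateHistory], and heights are used in place of state roots.

```go
package main

import "fmt"

const (
	commitEvery  = 4 // hypothetical stand-in for CommitTrieDBEvery
	stateHistory = 2 // hypothetical number of recent states kept in memory
)

// shouldCommit reports whether the state at this height is persisted.
func shouldCommit(height uint64) bool { return height%commitEvery == 0 }

// recorder is a toy model that logs what would be committed or dropped.
type recorder struct {
	recent    []uint64 // heights still referenced in memory, oldest first
	committed []uint64 // heights written to disk
	dropped   []uint64 // heights dereferenced from memory
}

// record tracks a new height, commits it on interval boundaries, and drops
// the oldest tracked height once more than stateHistory are held. A dropped
// height that was committed earlier remains available on disk.
func (r *recorder) record(height uint64) {
	if shouldCommit(height) {
		r.committed = append(r.committed, height)
	}
	r.recent = append(r.recent, height)
	if len(r.recent) > stateHistory {
		r.dropped = append(r.dropped, r.recent[0])
		r.recent = r.recent[1:]
	}
}

// run records heights 1 through blocks and returns the resulting recorder.
func run(blocks uint64) *recorder {
	r := &recorder{}
	for h := uint64(1); h <= blocks; h++ {
		r.record(h)
	}
	return r
}

func main() {
	r := run(10)
	fmt.Println("committed to disk:", r.committed)
	fmt.Println("still in memory:", r.recent)
	fmt.Println("dropped from memory:", len(r.dropped))
}
```

The sketch deliberately omits the memory-pressure offload path mentioned in the comment above, matching the choice to leave it out until it is shown to be needed.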

alarso16 · 2026-02-25T21:06:30Z

saedb/saedb.go

+}
+
+// Close commits the most recent state to the database for shutdown.
+func (e *StateRecorder) Close() (errs error) {


Is there a better pattern for this? I could avoid the defer, but we should report all these errors, right?
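One common way to report every error from a shutdown path is `errors.Join` from the standard library (Go 1.20 and later), which returns nil when all of its arguments are nil. The sketch below shows both a loop-based variant and a named-return variant with `defer`; `resource`, `closeAll`, and `commitAndClose` are hypothetical names and not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a toy closer that can be told to fail.
type resource struct {
	name string
	fail bool
}

func (r resource) Close() error {
	if r.fail {
		return fmt.Errorf("closing %s failed", r.name)
	}
	return nil
}

// closeAll closes every resource and reports all failures, not just the first.
func closeAll(rs ...resource) error {
	var errs []error
	for _, r := range rs {
		if err := r.Close(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil if errs is empty
}

// commitAndClose shows the named-return variant: the deferred Close error is
// joined with whatever commit returned, so neither is lost.
func commitAndClose(commit func() error, r resource) (errs error) {
	defer func() { errs = errors.Join(errs, r.Close()) }()
	return commit()
}

func main() {
	err := closeAll(
		resource{name: "snapshots", fail: true},
		resource{name: "cache", fail: false},
		resource{name: "db", fail: true},
	)
	fmt.Println(err)

	err = commitAndClose(func() error { return errors.New("commit failed") }, resource{name: "db", fail: true})
	fmt.Println(err)
}
```

The joined error still works with `errors.Is` and `errors.As` for each wrapped error, so callers can test for a specific failure.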

sae/recovery.go

saedb/saedb.go

sae/recovery_test.go

saedb/saedb.go

saexec/saexec.go

saexec/saexec_test.go

saedb/saedb.go

saexec/saexec_test.go

ARR4N

Thank you for digging deeply enough to figure out that this would be a problem, and for fixing it!

saexec/saexec.go

saexec/saexec_test.go

sae/recovery.go

saedb/saedb.go

ARR4N · 2026-02-26T13:09:44Z

saexec/saexec_test.go

+		// We expect to not find blocks older than [saedb.StateHistory]
+		for _, b := range chain.AllBlocks() {
+			sdb, err := e.StateDB(b.PostExecutionStateRoot())
+			inMemory := b.NumberU64()+saedb.StateHistory > uint64(numBlocks) //nolint:gosec // positive plus positive


If you define numBlocks as a uint64 then the linter won't complain:

numBlocks := uint64(saedb.StateHistory) + 10

Would it be better to just make StateHistory a uint64? The buffer takes an int for some reason...

saexec/saexec_test.go

ARR4N · 2026-02-26T13:21:13Z

saexec/saexec_test.go

+	final := chain.Last()
+	require.NoErrorf(t, final.WaitUntilExecuted(ctx), "%T.WaitUntilExecuted() on last-enqueued block", final)
+
+	t.Run("remove in memory state", func(t *testing.T) {


Tests that share logic and only differ in the declaration of parameters are easier to reason about. It's also unnecessary to check that a non-nil StateDB is returned when there's a nil error because that's implied by all other usage.

t.Run("access state", func(t *testing.T) {
	for _, b := range chain.AllBlocks() {
		var want testerr.Want
		switch num := b.NumberU64(); {
		case num > numBlocks-saedb.StateHistory: // Still referenced
		case saedb.ShouldCommitTrieDB(num): // On disk
		default:
			want = testerr.As(func(got *trie.MissingNodeError) string {
				if r := b.PostExecutionStateRoot(); got.NodeHash != r {
					return fmt.Sprintf("%T for hash %#x", got, r)
				}
				return ""
			})
		}
		_, err := e.StateDB(b.PostExecutionStateRoot())
		if diff := testerr.Diff(err, want); diff != "" {
			t.Errorf("%T.StateDB([post-execution root of block %d]) %s", e, b.NumberU64(), diff)
		}
	}
})

That does look cleaner. I also tried to share it with the recovery portion of the test, since it's all the same logic besides one line.

saexec/saexec_test.go

worstcase/state_benchmark_test.go


stale

ARR4N

Primarily readability and structure but the approach LGTM in general.

saexec/recorder.go

ARR4N · 2026-03-03T13:24:16Z

saexec/recorder.go

+// WorstCaseState returns a [worstcase.State] at the starting at the provided settled block.
+func (s *stateRecorder) WorstCaseState(hooks hook.Points, config *params.ChainConfig, settled *blocks.Block) (*worstcase.State, error) {
+	return worstcase.NewState(hooks, config, s.cache, settled, s.snaps)
+}


It doesn't make sense for the saexec.Executor to construct a worstcase.State. Although the plumbing works, the abstraction is strange and is being governed by the former.

This refactoring allows the Executor to be injected into the worstcase.State constructor instead of calling it:

package worstcase

type StateDBOpener interface {
	StateDB(root common.Hash) (*state.StateDB, error)
}

func New(hook.Points, *params.CacheConfig, *types.Block, StateDBOpener)

I agree, I didn't love the abstraction, but didn't see a better way. This does seem to be an improvement, but still feels a little unnatural (specifically for testing). I added it; let me know if you agree, and we can refactor this constructor some more.
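The dependency direction suggested in this thread can be sketched with toy types. Everything below is a stand-in under stated assumptions: `Hash`, `StateDB`, `State`, `NewState`, and `executor` are simplified placeholders, not the real `common.Hash`, `state.StateDB`, `worstcase.State`, or `saexec.Executor`. Only the shape of the `StateDBOpener` interface follows the suggestion above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hash and StateDB are toy stand-ins for common.Hash and *state.StateDB.
type Hash [32]byte

type StateDB struct{ Root Hash }

// StateDBOpener is declared on the consumer side, mirroring the suggestion.
type StateDBOpener interface {
	StateDB(root Hash) (*StateDB, error)
}

// State is a toy stand-in for worstcase.State.
type State struct{ db *StateDB }

// NewState opens the settled root through whichever opener is injected, so
// the consumer never needs to know who owns the underlying cache.
func NewState(settledRoot Hash, opener StateDBOpener) (*State, error) {
	db, err := opener.StateDB(settledRoot)
	if err != nil {
		return nil, fmt.Errorf("opening settled state: %w", err)
	}
	return &State{db: db}, nil
}

var errUnknownRoot = errors.New("unknown root")

// executor is a toy producer that only knows a fixed set of roots. It
// satisfies StateDBOpener without importing the consumer.
type executor struct{ known map[Hash]bool }

func (e *executor) StateDB(root Hash) (*StateDB, error) {
	if !e.known[root] {
		return nil, errUnknownRoot
	}
	return &StateDB{Root: root}, nil
}

func main() {
	settled := Hash{1}
	exec := &executor{known: map[Hash]bool{settled: true}}

	if _, err := NewState(settled, exec); err != nil {
		fmt.Println("unexpected error:", err)
	}
	_, err := NewState(Hash{2}, exec)
	fmt.Println(err)
}
```

For tests, a small fake that implements the single method is enough, which may ease the "unnatural for testing" concern raised in the reply.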

saexec/recorder.go

saexec/execution.go

worstcase/state_benchmark_test.go

saexec/saexec_test.go


ARR4N

Pretty much there. The move of the interface and test helper is trivial and I just have one open question. Please DM to take another look and we can almost certainly merge this today.

worstcase/state_test.go

ARR4N · 2026-03-05T13:21:26Z

saexec/recorder.go

+	}
+
+	// If we have new state, commit changes to database for easier startup.
+	if err := s.cache.TrieDB().Commit(root, true /* log */); err != nil {


Why log here when we're not doing it in the regular path?

I thought that logging the final state root known to the database is helpful, specifically if there are issues restarting the VM after shutdown.

FWIW setting log = false still does log, but at a debug instead of info level


alarso16 · 2026-03-06T20:24:44Z

sae/vm_test.go

 	}
 }
+
+func TestRegressionLoseStateBeforeSettlement(t *testing.T) {


I'm not sure this regression test is still necessary....

saedb/recorder.go

alarso16 · 2026-03-06T20:27:18Z

saexec/saexec_test.go

 // cancel the returned context, which is useful when waiting for blocks that
 // can never finish execution because of an error.
-func newSUT(tb testing.TB, hooks *saehookstest.Stub) (context.Context, SUT) {
+func newSUT(tb testing.TB, opts ...sutOption) (context.Context, *SUT) {


Since I now had two optional fields, it seemed better to just provide the infrastructure.
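The variadic `opts ...sutOption` signature in the diff is the functional-options pattern. A minimal sketch, assuming hypothetical option names (`withArchival`, `withHookName`) and a hypothetical `sutConfig`, since the two optional fields are not named in this thread:

```go
package main

import "fmt"

// sutConfig holds the optional settings; both fields are hypothetical.
type sutConfig struct {
	archival bool
	hookName string
}

// sutOption mutates the config before the system under test is built.
type sutOption func(*sutConfig)

func withArchival() sutOption {
	return func(c *sutConfig) { c.archival = true }
}

func withHookName(name string) sutOption {
	return func(c *sutConfig) { c.hookName = name }
}

// newConfig applies options over defaults, so callers only mention what
// differs from the default.
func newConfig(opts ...sutOption) sutConfig {
	c := sutConfig{hookName: "default"}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", newConfig())
	fmt.Printf("%+v\n", newConfig(withArchival(), withHookName("stub")))
}
```

Existing call sites that need no options keep compiling unchanged, which is the main benefit over adding positional parameters.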


ARR4N

Haven't finished the review but I've noticed something that requires a bit of work (as a test) so thought I'd share it early.

saedb/tracker.go

ARR4N · 2026-03-13T13:38:30Z

sae/consensus.go

 			continue
 		}
 		vm.blocks.Delete(s.Hash())
+		vm.exec.Untrack(s.PostExecutionStateRoot())


If the last-settled block has no transactions (allowed by SAE) and len(settles) > 1 then b.LastSettled().ParentBlock() will have the same post-execution state root and we'll un-track it early.

keep := b.LastSettled()
for _, s := range settles {
	if s.Hash() != keep.Hash() {
		vm.blocks.Delete(s.Hash())
	}
	// If `s` is the parent of `keep` and the latter has no transactions
	// then we MUST NOT dereference the state root too early.
	if r := s.PostExecutionStateRoot(); r != keep.PostExecutionStateRoot() {
		vm.exec.Untrack(r)
	}
}
if h := parentLastSettled.Hash(); h != keep.Hash() {
	// i.e. `parentLastSettled` was the last block's `keep`
	vm.blocks.Delete(h)
	vm.exec.Untrack(parentLastSettled.PostExecutionStateRoot())
}

I presume this would be catastrophic as block-building couldn't then open the settled state, so such a scenario requires a test.

I think the second half of my block is incorrect and should follow the same pattern as in the loop.

Each time a state root is seen by the Tracker, a reference is added. Thus, it won't be removed until Untrack is called as many times as Track was called. Since each block will call Track, even if the state root is not unique, everything works fine.

Added a comment at the location of use (here) to explain that we don't need to check for duplicate roots.
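The reference-counting behaviour described in the reply can be sketched as follows. This is an illustration under assumptions, not the code in saedb/tracker.go: `tracker`, its methods, and the toy `Hash` type are hypothetical, and the real Tracker presumably also dereferences the trie once the count reaches zero.

```go
package main

import "fmt"

// Hash is a toy stand-in for common.Hash.
type Hash [32]byte

// tracker counts how many blocks reference each state root.
type tracker struct {
	refs map[Hash]int
}

func newTracker() *tracker { return &tracker{refs: map[Hash]int{}} }

// Track adds one reference, even if the root is already tracked.
func (t *tracker) Track(root Hash) { t.refs[root]++ }

// Untrack removes one reference and reports whether the root was released,
// i.e. whether this call removed the last reference.
func (t *tracker) Untrack(root Hash) bool {
	n, ok := t.refs[root]
	if !ok {
		return false
	}
	if n > 1 {
		t.refs[root] = n - 1
		return false
	}
	delete(t.refs, root)
	return true
}

// Tracked reports whether at least one reference remains.
func (t *tracker) Tracked(root Hash) bool { return t.refs[root] > 0 }

func main() {
	t := newTracker()
	root := Hash{1}

	// A parent block and its empty child share a post-execution root.
	t.Track(root)
	t.Track(root)

	fmt.Println("released after settling parent:", t.Untrack(root))
	fmt.Println("still tracked for child:", t.Tracked(root))
	fmt.Println("released after settling child:", t.Untrack(root))
}
```

With counting in place, the empty-block case raised at the top of this thread needs no duplicate-root check at the call site, because the parent's Untrack only cancels the parent's own Track.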


alarso16 added 2 commits February 25, 2026 10:51

feat(saexec): Remove stale tries from memory

a0bbb03

fix: typo

2af3b20

alarso16 commented Feb 25, 2026


saedb/saedb.go (outdated, resolved)

alarso16 added 3 commits February 25, 2026 11:23

fix: test failures

faef651

fix: edge case

d791625

chore: linter

fc48ae5

alarso16 marked this pull request as ready for review February 25, 2026 16:56

alarso16 requested review from ARR4N and StephenButtolph as code owners February 25, 2026 16:56

alarso16 added 2 commits February 25, 2026 12:27

refactor: simplify bounds check

24cefa2

style: clarify tracked roots

49408a7

alarso16 commented Feb 25, 2026


alarso16 added 3 commits February 25, 2026 14:24

docs: Clarify concurrency

a8f2251

fix: error handling

dd2e54d

fix: Close errors

dde89d1

alarso16 commented Feb 25, 2026


JonathanOppenheimer previously requested changes Feb 25, 2026


JonathanOppenheimer reviewed Feb 25, 2026


saexec/saexec_test.go (outdated, resolved)

ARR4N requested changes Feb 26, 2026


alarso16 added 2 commits February 26, 2026 09:40

refactor: move packages

5beb5ba

style: comments

97c72f4

alarso16 requested review from ARR4N and JonathanOppenheimer February 26, 2026 16:11

chore: nit comments

9f6102c

JonathanOppenheimer reviewed Feb 26, 2026


worstcase/state_benchmark_test.go (outdated, resolved)

alarso16 mentioned this pull request Feb 26, 2026

refactor: Unify types.MakeSigner calls #247

Merged

JonathanOppenheimer assigned alarso16 Feb 26, 2026

Merge remote-tracking branch 'origin/main' into alarso16/triedb-deref…

61bff75

…erence

ARR4N requested changes Mar 3, 2026


alarso16 added 2 commits March 3, 2026 11:04

refactor: move worstcase constructor

49b9673

style: unnecessary parentheses

af139a7

alarso16 requested a review from ARR4N March 3, 2026 16:10

Merge remote-tracking branch 'origin/main' into alarso16/triedb-deref…

63ae3f3

…erence

ARR4N reviewed Mar 5, 2026


alarso16 added 3 commits March 5, 2026 09:57

refactor: move interface

c2eadf5

chore: lint

45f68a7

test: add regression for bug

81557cd

alarso16 force-pushed the alarso16/triedb-dereference branch from 0cd9b80 to 81557cd Compare March 5, 2026 21:56

ARR4N mentioned this pull request Mar 6, 2026

feat(sae): implement StateAtBlock and StateAtTransaction for state tracing support #245

Merged

alarso16 marked this pull request as draft March 6, 2026 16:12

alarso16 added 5 commits March 6, 2026 11:17

fix: Address bug

616a181

chore: cleanup names

28607ac

test: ensure drop works

3913728

Merge remote-tracking branch 'origin/main' into alarso16/triedb-deref…

c05417b

…erence

feat: Add archival mode

5d938de

alarso16 changed the title ~~perf(saexec): Remove stale tries from memory~~ perf(saexec): Remove TrieDB memory leak Mar 6, 2026

alarso16 commented Mar 6, 2026


alarso16 marked this pull request as ready for review March 6, 2026 20:27

chore: rename functions

1db3ce7

alarso16 force-pushed the alarso16/triedb-dereference branch from 16a0619 to 1db3ce7 Compare March 10, 2026 13:31

Merge remote-tracking branch 'origin/main' into alarso16/triedb-deref…

12db2e2

…erence

ARR4N reviewed Mar 13, 2026


alarso16 added 2 commits March 13, 2026 10:58

chore: comments

6fd072a

Merge remote-tracking branch 'origin/main' into alarso16/triedb-deref…

d936db4

…erence
