feat(ethexe/node-loader): support multiple validator nodes #5208
Conversation
Summary of Changes (Gemini Code Assist): This pull request introduces support for multiple validator nodes in the node-loader, improving its robustness and its ability to handle API failures. It also includes a script to start a local Vara.eth network with multiple validators, streamlining development and testing.
Code Review
This pull request introduces support for multiple validator nodes in the node-loader. It adds an EthexeRpcPool to manage connections to multiple ethexe-node endpoints, with logic for random selection, reconnection, and retries on failure. A new script, start-local-network.sh, is also included to facilitate setting up a local test network with multiple validators.
My review focuses on the new connection management and retry logic. I've identified a potential race condition in connection handling that could lead to creating unnecessary connections, and significant code duplication in the retry logic for RPC calls. I've provided suggestions to improve both of these aspects for better performance and maintainability.
```rust
async fn reconnect_client(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
) -> Result<Arc<VaraEthApi>> {
    let endpoint = self
        .endpoints
        .get(endpoint_idx)
        .ok_or_else(|| anyhow!("invalid endpoint index: {endpoint_idx}"))?;

    tracing::warn!(
        endpoint_idx,
        endpoint = %endpoint.url,
        "Connecting ethexe RPC client"
    );

    let client = Arc::new(VaraEthApi::new(&endpoint.url, api.clone()).await?);
    let mut lock = endpoint.client.write().await;
    *lock = Some(client.clone());

    tracing::info!(
        endpoint_idx,
        endpoint = %endpoint.url,
        "Connected ethexe RPC client"
    );

    Ok(client)
}

async fn get_or_connect_client(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
) -> Result<Arc<VaraEthApi>> {
    let endpoint = self
        .endpoints
        .get(endpoint_idx)
        .ok_or_else(|| anyhow!("invalid endpoint index: {endpoint_idx}"))?;

    if let Some(client) = endpoint.client.read().await.clone() {
        return Ok(client);
    }

    self.reconnect_client(endpoint_idx, api).await
}
```
There's a potential race condition in get_or_connect_client that can lead to creating unnecessary connections. If multiple threads call this function for an unconnected endpoint, they might all see that the client is None and proceed to call reconnect_client, resulting in multiple connections being established where only one is needed. The last one to acquire the write lock will win, and the other connections will be dropped.
To fix this, you can use a double-checked locking pattern. This involves checking for the client with a read lock, then acquiring a write lock and checking again before creating a new connection. This ensures that only one thread will create the connection.
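In isolation, the double-checked pattern can be exercised with std's synchronous `RwLock` (a minimal sketch; the PR itself uses an async `RwLock`, and the `get_or_init` name and string "client" here are invented for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

// Counts how many times the "expensive" client is actually constructed.
static CONSTRUCTED: AtomicUsize = AtomicUsize::new(0);

fn get_or_init(slot: &RwLock<Option<Arc<String>>>) -> Arc<String> {
    // Fast path: read lock only.
    if let Some(client) = slot.read().unwrap().clone() {
        return client;
    }
    // Slow path: take the write lock and re-check, so threads that raced
    // past the read check do not each build their own client.
    let mut guard = slot.write().unwrap();
    if let Some(client) = guard.clone() {
        return client;
    }
    CONSTRUCTED.fetch_add(1, Ordering::SeqCst);
    let client = Arc::new("connected".to_string());
    *guard = Some(client.clone());
    client
}

fn main() {
    let slot = Arc::new(RwLock::new(None));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || get_or_init(&slot))
        })
        .collect();
    for h in handles {
        assert_eq!(*h.join().unwrap(), "connected");
    }
    // Despite eight concurrent callers, the client was built exactly once.
    println!("constructed {} time(s)", CONSTRUCTED.load(Ordering::SeqCst));
}
```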
I suggest refactoring get_or_connect_client and inlining reconnect_client to implement this pattern.
```rust
async fn get_or_connect_client(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
) -> Result<Arc<VaraEthApi>> {
    let endpoint = self
        .endpoints
        .get(endpoint_idx)
        .ok_or_else(|| anyhow!("invalid endpoint index: {endpoint_idx}"))?;

    // First, check with a read lock for efficiency.
    if let Some(client) = endpoint.client.read().await.clone() {
        return Ok(client);
    }

    // If no client, acquire a write lock to create one.
    let mut client_guard = endpoint.client.write().await;

    // Double-check in case another thread created the client while we
    // were waiting for the lock.
    if let Some(client) = client_guard.clone() {
        return Ok(client);
    }

    // Still no client, so we are the one to create it.
    tracing::warn!(
        endpoint_idx,
        endpoint = %endpoint.url,
        "Connecting ethexe RPC client"
    );

    let client = Arc::new(VaraEthApi::new(&endpoint.url, api.clone()).await?);
    *client_guard = Some(client.clone());

    tracing::info!(
        endpoint_idx,
        endpoint = %endpoint.url,
        "Connected ethexe RPC client"
    );

    Ok(client)
}
```

```rust
async fn request_code_validation(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
    code: &[u8],
) -> Result<CodeId> {
    for attempt in 1..=RPC_MAX_ATTEMPTS {
        let client = match self.get_or_connect_client(endpoint_idx, api).await {
            Ok(client) => client,
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client; reconnecting and retrying"
                );
                self.invalidate_client(endpoint_idx).await;
                continue;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client"
                );
                return Err(err);
            }
        };

        match client.router().request_code_validation(code).await {
            Ok((_, code_id)) => return Ok(code_id),
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "request_code_validation failed; reconnecting and retrying"
                );
                self.invalidate_client(endpoint_idx).await;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "request_code_validation failed"
                );
                return Err(err.into());
            }
        }
    }

    Err(anyhow!("request_code_validation exhausted retries"))
}

async fn wait_for_code_validation(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
    code_id: CodeId,
) -> Result<()> {
    for attempt in 1..=RPC_MAX_ATTEMPTS {
        let client = match self.get_or_connect_client(endpoint_idx, api).await {
            Ok(client) => client,
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client; reconnecting and retrying"
                );
                self.invalidate_client(endpoint_idx).await;
                continue;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client"
                );
                return Err(err);
            }
        };

        match client.router().wait_for_code_validation(code_id).await {
            Ok(_) => return Ok(()),
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "wait_for_code_validation failed; reconnecting and retrying"
                );
                self.invalidate_client(endpoint_idx).await;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "wait_for_code_validation failed"
                );
                return Err(err.into());
            }
        }
    }

    Err(anyhow!("wait_for_code_validation exhausted retries"))
}

async fn send_message_injected(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
    actor: ActorId,
    payload: &[u8],
    value: u128,
) -> Result<MessageId> {
    for attempt in 1..=RPC_MAX_ATTEMPTS {
        let client = match self.get_or_connect_client(endpoint_idx, api).await {
            Ok(client) => client,
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client; reconnecting and retrying"
                );
                self.invalidate_client(endpoint_idx).await;
                continue;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client"
                );
                return Err(err);
            }
        };

        match client
            .mirror(actor)
            .send_message_injected(payload, value)
            .await
        {
            Ok(mid) => return Ok(mid),
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "send_message_injected failed; reconnecting and retrying"
                );
                self.invalidate_client(endpoint_idx).await;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "send_message_injected failed"
                );
                return Err(err.into());
            }
        }
    }

    Err(anyhow!("send_message_injected exhausted retries"))
}
```
The retry logic is duplicated across request_code_validation, wait_for_code_validation, and send_message_injected. This makes the code harder to maintain and prone to errors if one of them is updated and the others are not.
You can extract this logic into a generic helper function that takes a closure for the specific RPC call. This would reduce code duplication and improve maintainability.
Here's an example of how such a generic helper function could look:
```rust
async fn with_retry<T, F, Fut>(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
    call_name: &str,
    f: F,
) -> Result<T>
where
    F: Fn(Arc<VaraEthApi>) -> Fut,
    Fut: std::future::Future<Output = Result<T>>,
{
    for attempt in 1..=RPC_MAX_ATTEMPTS {
        let client = match self.get_or_connect_client(endpoint_idx, api).await {
            Ok(client) => client,
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client for {}; reconnecting and retrying",
                    call_name
                );
                self.invalidate_client(endpoint_idx).await;
                continue;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "failed to acquire ethexe RPC client for {}",
                    call_name
                );
                return Err(err);
            }
        };

        match f(client).await {
            Ok(result) => return Ok(result),
            Err(err) if attempt < RPC_MAX_ATTEMPTS && is_retryable_rpc_error(&err) => {
                tracing::warn!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "{} failed; reconnecting and retrying",
                    call_name
                );
                self.invalidate_client(endpoint_idx).await;
            }
            Err(err) => {
                tracing::error!(
                    endpoint_idx,
                    attempt,
                    max_attempts = RPC_MAX_ATTEMPTS,
                    error = %err,
                    "{} failed",
                    call_name
                );
                return Err(err.into());
            }
        }
    }

    Err(anyhow!("{} exhausted retries", call_name))
}
```

You could then refactor request_code_validation like this:
```rust
async fn request_code_validation(
    &self,
    endpoint_idx: usize,
    api: &Ethereum,
    code: &[u8],
) -> Result<CodeId> {
    self.with_retry(endpoint_idx, api, "request_code_validation", |client| async move {
        client
            .router()
            .request_code_validation(code)
            .await
            .map(|(_, code_id)| code_id)
    })
    .await
}
```
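Stripped of async and tracing, the closure-based retry idea can be exercised standalone. This is a hedged sketch: the `RpcError` enum, `MAX_ATTEMPTS` constant, and `is_retryable` helper below are invented stand-ins for the crate's real types.

```rust
const MAX_ATTEMPTS: u32 = 3;

// Invented stand-in for the crate's real error type.
#[derive(Debug)]
enum RpcError {
    Transient, // e.g. a dropped connection; worth retrying
    Fatal,     // e.g. a rejected transaction; retrying won't help
}

fn is_retryable(err: &RpcError) -> bool {
    matches!(err, RpcError::Transient)
}

// Generic retry driver: `f` receives the attempt number and performs
// the actual call; transient errors are retried up to MAX_ATTEMPTS.
fn with_retry<T>(
    call_name: &str,
    mut f: impl FnMut(u32) -> Result<T, RpcError>,
) -> Result<T, String> {
    for attempt in 1..=MAX_ATTEMPTS {
        match f(attempt) {
            Ok(v) => return Ok(v),
            Err(err) if attempt < MAX_ATTEMPTS && is_retryable(&err) => {
                eprintln!("{call_name} failed on attempt {attempt}/{MAX_ATTEMPTS} ({err:?}); retrying");
            }
            Err(err) => return Err(format!("{call_name} failed: {err:?}")),
        }
    }
    Err(format!("{call_name} exhausted retries"))
}

fn main() {
    // Succeeds on the third attempt after two transient failures.
    let result = with_retry("request_code_validation", |attempt| {
        if attempt < 3 { Err(RpcError::Transient) } else { Ok(42u32) }
    });
    assert_eq!(result, Ok(42));

    // A fatal error aborts immediately, without retrying.
    let result: Result<u32, String> = with_retry("send_message", |_| Err(RpcError::Fatal));
    assert!(result.unwrap_err().contains("send_message failed"));
    println!("ok");
}
```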
Added support for multiple validator nodes in node-loader. It randomly switches between them and reconnects when an API call fails. If an endpoint fails too many times, its connection is removed; once all connections are dead, the program terminates.
Also added a start-local-network.sh script that lets users start a proper Vara.eth network with multiple validators, all connected to a single Anvil instance. It can also start the node-loader automatically.
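As a rough illustration of the behavior described above (random endpoint selection, removal after repeated failures, termination once no endpoints remain), here is a minimal stdlib-only sketch. The `Pool` type, `MAX_FAILURES` threshold, and URLs are invented for the example and the real node-loader's internals may differ:

```rust
use std::collections::HashMap;

const MAX_FAILURES: u32 = 3;

// Illustrative endpoint pool with per-endpoint failure counters.
struct Pool {
    endpoints: Vec<String>,
    failures: HashMap<String, u32>,
}

impl Pool {
    fn new(endpoints: Vec<String>) -> Self {
        Self { endpoints, failures: HashMap::new() }
    }

    // Pick an endpoint pseudo-randomly; a seed parameter keeps the
    // sketch deterministic (a real implementation would use an RNG).
    fn pick(&self, seed: usize) -> Option<&String> {
        if self.endpoints.is_empty() {
            return None;
        }
        self.endpoints.get(seed % self.endpoints.len())
    }

    // Record a failure; after MAX_FAILURES the endpoint is dropped.
    fn report_failure(&mut self, url: &str) {
        let count = self.failures.entry(url.to_string()).or_insert(0);
        *count += 1;
        if *count >= MAX_FAILURES {
            self.endpoints.retain(|e| e.as_str() != url);
        }
    }

    // When every endpoint is gone, the loader has nothing left to call.
    fn is_dead(&self) -> bool {
        self.endpoints.is_empty()
    }
}

fn main() {
    let mut pool = Pool::new(vec![
        "ws://127.0.0.1:9944".to_string(),
        "ws://127.0.0.1:9945".to_string(),
    ]);
    assert!(pool.pick(7).is_some());

    // Fail the first endpoint out of the pool.
    for _ in 0..MAX_FAILURES {
        pool.report_failure("ws://127.0.0.1:9944");
    }
    assert_eq!(pool.endpoints, vec!["ws://127.0.0.1:9945".to_string()]);

    // Fail the last one too: the pool is dead and the loader would terminate.
    for _ in 0..MAX_FAILURES {
        pool.report_failure("ws://127.0.0.1:9945");
    }
    println!("pool dead: {}", pool.is_dead());
}
```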