
Increase poll log interval to reduce log spam #541

Draft
kmontemayor2-sc wants to merge 5 commits into main from kmonte/bump-gs-polling-timeout

@kmontemayor2-sc
Collaborator

Scope of work done

Since dataset building takes on the order of minutes, we might as well increase the log interval to reduce log spam.

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Supports both GcsUri (production) and LocalUri (testing).
timeout: Maximum time in seconds to wait for the signal. Defaults to 3600.
poll_interval: Time in seconds between poll attempts. Defaults to 10.
log_every_n_attempts: Number of attempts between log messages. Defaults to 60.
Collaborator

Suggested change
log_every_n_attempts: Number of attempts between log messages. Defaults to 60.
log_every_n_attempts: Number of attempts between log messages. Defaults to 60, i.e. with poll_interval set to 10 and log_every_n_attempts set to 60, we log every 600 seconds (every 10 minutes).

Isn't 10 minutes too long? I usually consider something to be hanging if there are no logs for more than 2-4 minutes.

Collaborator Author

With this change we log updates every 10 minutes. I think having every compute rank (e.g. one per GPU) log every minute is probably too frequent and clogs up the logs.

I guess we can make it 5 minutes? We could also update the log message here to say when to expect the next update.
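
For illustration, the cadence discussed in this thread can be sketched as below. This is a minimal sketch under stated assumptions, not the code in this PR: `wait_for_signal`, `seconds_between_logs`, the injectable `signal_exists`, `sleep`, and `log` arguments, and the default of 30 attempts (the 5-minute option) are all hypothetical names and values chosen for the example. Elapsed time is approximated as attempts times `poll_interval` rather than read from a clock.

```python
import time
from typing import Callable


def seconds_between_logs(poll_interval: float, log_every_n_attempts: int) -> float:
    """Quiet period between two consecutive progress messages."""
    return poll_interval * log_every_n_attempts


def wait_for_signal(
    signal_exists: Callable[[], bool],
    timeout: float = 3600,
    poll_interval: float = 10,
    log_every_n_attempts: int = 30,  # assumption: the 5-minute option from this thread
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Poll until signal_exists() returns True; return the number of sleeps taken."""
    log_gap = seconds_between_logs(poll_interval, log_every_n_attempts)
    attempts = 0
    while not signal_exists():
        # Approximation: ignores the time spent inside signal_exists().
        waited = attempts * poll_interval
        if waited >= timeout:
            raise TimeoutError(f"No signal after ~{waited:.0f}s")
        if attempts % log_every_n_attempts == 0:
            # Tell the reader when the next message is due, so silence is not read as a hang.
            log(f"Still waiting (~{waited:.0f}s elapsed); next update in ~{log_gap:.0f}s")
        sleep(poll_interval)
        attempts += 1
    return attempts
```

With `poll_interval=10`, `seconds_between_logs(10, 60)` gives 600 seconds (the 10-minute cadence in the suggested docstring) and `seconds_between_logs(10, 30)` gives 300 seconds (the 5-minute alternative).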

@svij-sc
Collaborator

svij-sc commented Mar 11, 2026

Thank you

