
Add manage command to resync preprint dois v1#11617

Open
Vlad0n20 wants to merge 1 commit into CenterForOpenScience:feature/pbs-26-2 from Vlad0n20:fix/ENG-9044

Conversation


@Vlad0n20 Vlad0n20 commented Mar 2, 2026

Ticket

Purpose

Changes

Side Effects

QE Notes

CE Notes

Documentation


@cslzchen cslzchen left a comment


Looks good overall. In addition to my questions/comments:

  • Can we add the log output from your local run?
  • We should also work with CE to test this command with a copy of production DB.


logger = logging.getLogger(__name__)

RATE_LIMIT_SLEEP = 60 * 5

Nit-picking: let's put a comment mentioning this is 5 min and what this rate limit does.
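A possible version of the constant with the suggested comment (the stated reason for the limit is my assumption about intent, not something confirmed in the PR):

```python
import logging

logger = logging.getLogger(__name__)

# Sleep for 5 minutes (300 s) between rate-limited chunks so repeated
# DOI update requests do not hammer the external DOI service.
RATE_LIMIT_SLEEP = 60 * 5
```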

return qs


def resync_preprint_dois_v1(dry_run=True, batch_size=0, rate_limit=100, provider_id=None):

The default batch_size should not be 0. If we have a huge queryset and run this without providing a batch size, it may take a long time or even get stuck and killed.

queued += 1
continue

if rate_limit and not record_number % rate_limit:

Curious about the reason that led us to rate limit every 100 (default) items?

In addition, should the batch size always be larger than, and a multiple of, the rate limit?
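For reference, the check under review pauses after every `rate_limit`-th record regardless of the batch size. A runnable sketch of that behavior (the function name and return value are mine, for illustration; the sleep is zeroed out so the sketch runs instantly):

```python
import time

RATE_LIMIT_SLEEP = 0  # zeroed here for the sketch; the command uses 60 * 5


def walk_records(records, rate_limit=100):
    """Mirror the modulo check: pause after every `rate_limit`-th record.

    Returns how many pauses happened, so the cadence is easy to verify.
    """
    pauses = 0
    for record_number, _record in enumerate(records, 1):
        # ... per-record work would go here ...
        if rate_limit and not record_number % rate_limit:
            time.sleep(RATE_LIMIT_SLEEP)
            pauses += 1
    return pauses
```

Note that with `rate_limit=0` the condition is always false, so the loop never sleeps.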

)

if batch_size:
preprints_iterable = preprints_to_update[:batch_size]

So if we are doing it in batches, should we have another for loop to iterate over each batch? Or does the batch here just process the first batch_size items, so we have to manually run this command again for the rest?
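If looping over every batch is the intent, a minimal sketch of what that could look like (`iter_batches` is a hypothetical helper, not something in the PR):

```python
def iter_batches(queryset, batch_size):
    """Yield successive slices of `queryset` so a single run covers all
    records, rather than stopping after the first `batch_size` items."""
    start = 0
    while True:
        batch = list(queryset[start:start + batch_size])
        if not batch:
            break
        yield batch
        start += batch_size
```

With a real Django queryset, the ordering should be fixed (e.g. `.order_by('pk')`) so that successive slices are stable.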


queued = 0
skipped = 0
for record_number, preprint in enumerate(preprints_iterable, 1):

Are there any exceptions that we can catch so we continue the loop instead of erroring out and quitting?

I suggest adding errored = 0 to track errored ones.
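A minimal sketch of the suggested try/except plus `errored` counter (`update_doi` is a stand-in for whatever the real per-preprint update call is):

```python
import logging

logger = logging.getLogger(__name__)


def resync_loop(preprints_iterable, update_doi):
    """Count successes and failures instead of letting one bad record
    abort the whole run."""
    queued = 0
    errored = 0
    for record_number, preprint in enumerate(preprints_iterable, 1):
        try:
            update_doi(preprint)
        except Exception:
            # Log the traceback and keep going with the next record.
            logger.exception('Failed to resync record %s', record_number)
            errored += 1
        else:
            queued += 1
    return queued, errored
```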

