Add manage command to resync preprint dois v1 #11617
Vlad0n20 wants to merge 1 commit into CenterForOpenScience:feature/pbs-26-2 from
Conversation
cslzchen
left a comment
Looks good overall. In addition to my questions/comments:
- Can we add the output logs from your local run?
- We should also work with CE to test this command with a copy of production DB.
```python
logger = logging.getLogger(__name__)

RATE_LIMIT_SLEEP = 60 * 5
```
Nit-picking: let's put a comment mentioning this is 5 min and what this rate limit does.
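The suggested comment might look like this (a sketch; the 5-minute value comes from the diff above, and the reason given in the comment is an assumption about what the rate limit is for):

```python
# Sleep for 5 minutes (300 seconds) after every `rate_limit` records,
# so we do not flood the external DOI service with update requests.
RATE_LIMIT_SLEEP = 60 * 5
```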
```python
    return qs


def resync_preprint_dois_v1(dry_run=True, batch_size=0, rate_limit=100, provider_id=None):
```
The default batch_size should not be 0. If we have a huge query set and we run this without providing a batch size, it may take a very long time or even get stuck and be killed.
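One way to address this would be a positive default with a guard against an unbounded run (a sketch; the value 1000 is a hypothetical choice, not from the PR, and the function body is elided):

```python
DEFAULT_BATCH_SIZE = 1000  # hypothetical default; tune against a production-sized DB


def resync_preprint_dois_v1(dry_run=True, batch_size=DEFAULT_BATCH_SIZE,
                            rate_limit=100, provider_id=None):
    # Refuse an unbounded run if the caller passes 0/None explicitly.
    if not batch_size:
        raise ValueError('batch_size must be a positive integer')
    return batch_size
```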
```python
            queued += 1
            continue

        if rate_limit and not record_number % rate_limit:
```
Curious about the reason that led us to rate limit every 100 (default) items?
In addition, should the batch size always be larger than, and a multiple of, the rate limit?
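The pattern being discussed can be sketched as follows (names `record_number`, `rate_limit`, and `RATE_LIMIT_SLEEP` are from the diff; the injectable `sleep` parameter and the loop body are assumptions for illustration):

```python
import time

RATE_LIMIT_SLEEP = 60 * 5  # 5 minutes between bursts


def process(records, rate_limit=100, sleep=time.sleep):
    for record_number, record in enumerate(records, 1):
        # ... resync the DOI for `record` here ...
        if rate_limit and not record_number % rate_limit:
            # Pause after every `rate_limit` records.
            sleep(RATE_LIMIT_SLEEP)
```

With the default of 100, a batch of 250 records pauses twice (after record 100 and record 200), which is why batch size and rate limit interact.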
```python
    )

    if batch_size:
        preprints_iterable = preprints_to_update[:batch_size]
```
So if we are doing this in batches, should we have another for loop to iterate over each batch? Or does the batch here just process the first batch_size items, so that we have to manually run the command again?
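If the intent is to process the whole set in chunks rather than only the first slice, an outer batching helper might look like this (a sketch; a plain list stands in for the queryset, and the helper name is hypothetical):

```python
def iter_batches(items, batch_size):
    # Yield successive slices of `items`, each at most `batch_size` long,
    # so the caller can loop over every batch instead of just the first.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```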
```python
    queued = 0
    skipped = 0
    for record_number, preprint in enumerate(preprints_iterable, 1):
```
Are there any exceptions that we can catch and continue the loop on, instead of erroring out and quitting?
I suggest adding `errored = 0` to track the errored ones.
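The suggestion could be sketched as follows (hypothetical: `sync_doi` stands in for whatever call actually performs the resync, and catching bare `Exception` is a placeholder for the specific exception types that call can raise):

```python
import logging

logger = logging.getLogger(__name__)


def resync(preprints, sync_doi):
    queued = 0
    skipped = 0
    errored = 0  # track failures instead of aborting the whole run
    for record_number, preprint in enumerate(preprints, 1):
        try:
            sync_doi(preprint)
        except Exception:
            # Log and keep going; one bad record should not kill the command.
            logger.exception('Failed to resync DOI for record %s', record_number)
            errored += 1
            continue
        queued += 1
    return queued, skipped, errored
```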
Ticket
Purpose
Changes
Side Effects
QE Notes
CE Notes
Documentation