feat: add --lazy-multimodal-load to defer image process to rollout time #1623

Open
yzlnew wants to merge 1 commit into THUDM:main from yzlnew:feat-add-lazy-multimodal

Conversation

Contributor

@yzlnew yzlnew commented Feb 25, 2026

Avoid OOM during Dataset init for large VLM datasets by deferring process_vision_info calls to rollout time. Controlled by the --lazy-multimodal-load flag (default off, preserving existing behavior).

index: int | None = None
# prompt
prompt: str | list[dict[str, str]] = ""
raw_prompt: list[dict[str, str]] | None = None # original conversation-format prompt for lazy multimodal loading
Contributor
Can we reuse the prompt field to store the original conversation-format prompt?

Contributor Author

When passing --apply-chat-template, the original format is converted to a string?

@yzlnew yzlnew changed the title feat: add --lazy-multimodal-load to defer image decoding to rollout time feat: add --lazy-multimodal-load to defer image process to rollout time Feb 25, 2026