internv3.5 support #1660

Open
samaritan1998 wants to merge 10 commits into THUDM:main from samaritan1998:main

@samaritan1998

No description provided.

samaritan1998 and others added 10 commits February 27, 2026 22:03
- Add FSDP layer wrapping fallback for InternVL HF models
- Fix empty videos list causing IndexError in processor
- Fix list of tensors not being stacked in multimodal_train_inputs
- Add torch_dtype=bfloat16 for model loading
- Add WandB environment variables passthrough to Ray job
- Add InternVL image processing utilities
- Add KIE training shell script with cleanup and checks

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
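
The empty-`videos` fix listed in this commit can be pictured with a minimal sketch. It assumes the `IndexError` came from handing an empty list to a processor that indexes the first element of each modality; the diff itself is not shown here, and `drop_empty_modalities` is a hypothetical helper, not a function from this PR.

```python
def drop_empty_modalities(images=None, videos=None):
    """Build processor kwargs, omitting any modality that has no items.

    Hypothetical helper: the name and signature are illustrative only.
    """
    kwargs = {}
    # Only forward a modality when it holds at least one item, so a
    # downstream `videos[0]` style lookup never sees an empty list.
    if images is not None and len(images) > 0:
        kwargs["images"] = images
    if videos is not None and len(videos) > 0:
        kwargs["videos"] = videos
    return kwargs


print(drop_empty_modalities(images=["img0"], videos=[]))  # {'images': ['img0']}
```
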
Add automatic checkpoint conversion from HuggingFace format (training)
to SGLang format (inference) for InternVL models.

Changes:
- Add model_converter.py with HF->SGLang key mapping and QKV concatenation
- Add --convert-to-sglang and --sglang-model-path arguments
- Integrate conversion into train.py save() function
- Add convert_internvl_checkpoint.py CLI tool
- Add start_sglang_internvl.sh helper script
- Add model_conversion_guide.md documentation

Key features:
- Handles vision_tower -> vision_model naming
- Handles multi_modal_projector -> mlp1 naming
- Concatenates separate q/k/v_proj weights into qkv
- Automatically converts after each checkpoint save
- Preserves both HF and SGLang formats

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
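
The key mapping and QKV concatenation described in this commit might look roughly like the sketch below. It is not the code from `model_converter.py`: the exact key layout (prefixes at the start of each key, a fused key named `qkv`, fusion limited to the vision tower) is an assumption, and weights are modeled as plain nested lists so the example runs without any tensor library.

```python
import re

# Prefix renames named in the commit message; applying them only at the
# start of a key is an assumption.
PREFIX_RENAMES = {
    "vision_tower.": "vision_model.",
    "multi_modal_projector.": "mlp1.",
}

# Assumed pattern: separate q/k/v projections inside the vision model.
_QKV = re.compile(r"^(vision_model\..*)\.([qkv])_proj\.(weight|bias)$")


def rename_key(key):
    """Apply the first matching prefix rename, or return the key unchanged."""
    for old, new in PREFIX_RENAMES.items():
        if key.startswith(old):
            return new + key[len(old):]
    return key


def concat_rows(parts):
    """Concatenate along the first dimension (rows of nested lists)."""
    return [row for part in parts for row in part]


def convert_state_dict(state, concat=concat_rows):
    """Rename keys and fuse q/k/v projections into a single qkv entry."""
    converted = {}
    pending = {}  # (prefix, kind) -> {"q": ..., "k": ..., "v": ...}
    for key, value in state.items():
        new_key = rename_key(key)
        match = _QKV.match(new_key)
        if match is None:
            converted[new_key] = value
            continue
        prefix, which, kind = match.groups()
        pending.setdefault((prefix, kind), {})[which] = value
    for (prefix, kind), parts in pending.items():
        missing = {"q", "k", "v"} - set(parts)
        if missing:
            raise KeyError(f"{prefix}: missing {sorted(missing)} for {kind}")
        # Order matters: the fused tensor is laid out as q, then k, then v.
        converted[f"{prefix}.qkv.{kind}"] = concat([parts[n] for n in "qkv"])
    return converted
```

With real tensors, the `concat` argument could be swapped for something like `lambda parts: torch.cat(parts, dim=0)`. Note that a later commit in this PR removes the conversion step entirely.
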
Add debugging tools and documentation for InternVL multimodal training.

Changes:
- Add multimodal logging in sglang_rollout.py
- Add request dumping for debugging
- Convert <IMG_CONTEXT> to <image> placeholder for SGLang
- Add test_sglang_request.py for testing dumped requests
- Add internvl_training_lifecycle.md documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
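
The `<IMG_CONTEXT>` to `<image>` conversion mentioned in this commit could be sketched as below. The exact rule used in `sglang_rollout.py` is not shown in this PR summary; this version assumes each image appears in the prompt as a run of `<IMG_CONTEXT>` tokens, optionally wrapped in `<img>` and `</img>`, and `to_sglang_prompt` is a hypothetical name.

```python
import re

# One image = one or more <IMG_CONTEXT> tokens, optionally wrapped in
# <img>...</img>. The wrapper tokens are an assumption about the prompt format.
_IMAGE_RUN = re.compile(r"(?:<img>)?(?:<IMG_CONTEXT>)+(?:</img>)?")


def to_sglang_prompt(prompt):
    """Collapse each expanded image span into a single <image> placeholder.

    Limitation: two images whose context tokens are adjacent with no
    wrapper between them would be merged into one placeholder.
    """
    return _IMAGE_RUN.sub("<image>", prompt)


expanded = "<img><IMG_CONTEXT><IMG_CONTEXT></img>\nRead the receipt."
print(to_sglang_prompt(expanded))
```
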
…g-fixes

Feature/internvl kie training fixes
- Use non-HF InternVL model for both training and inference
- Add img_context_token_id setting for non-HF InternVL models
- Update load_processor to detect and handle non-HF InternVL models
- Fix image placeholder handling (<image> vs <IMG_CONTEXT>)
- Convert numpy arrays to lists in build_processor_kwargs
- Add InternVL-specific data loading to preserve <image> placeholders
- Remove HF-to-SGLang checkpoint conversion (no longer needed)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
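
The numpy-to-list conversion listed in this commit can be illustrated with a small recursive helper. This is a sketch, not the body of `build_processor_kwargs`: `to_plain_lists` is a hypothetical name, and it relies on duck typing (anything exposing `.tolist()`, as numpy arrays do) so the example needs no third-party import. The motivation is not stated in the commit; one common reason is that arrays are not JSON-serializable.

```python
def to_plain_lists(value):
    """Recursively replace array-like objects with plain Python lists.

    Hypothetical helper: treats any object with a .tolist() method
    (for example a numpy array) as array-like.
    """
    if hasattr(value, "tolist"):
        return value.tolist()
    if isinstance(value, dict):
        return {k: to_plain_lists(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples are returned as lists as well.
        return [to_plain_lists(v) for v in value]
    return value
```
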
…g-fixes

feat: support non-HF format InternVL training Final
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…g-fixes

docs: add InternVL training documentation and summary