
openai api: max_tokens -> max_completion_tokens #1222

Open

shihaobai wants to merge 1 commit into main from openai_update

Conversation

@shihaobai (Collaborator)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the handling of maximum token generation within the LightLLM API. It introduces a more explicit parameter, max_completion_tokens, to replace the generic max_tokens, which is now marked for deprecation. Additionally, it enhances the flexibility of max_new_tokens by allowing a value of -1 to signify dynamic calculation of the maximum possible output length, and improves the structure of parameter validation by separating length-specific checks.

Highlights

  • API Parameter Deprecation: The max_tokens parameter in CompletionRequest and ChatCompletionRequest models has been deprecated. Users are now encouraged to use the new max_completion_tokens parameter instead.
  • Flexible Max New Tokens: The default value for max_new_tokens in sampling parameters has been changed from 16 to -1. When max_new_tokens is set to -1, it is calculated dynamically as the number of tokens remaining after the prompt, up to max_req_total_len.
  • Refactored Length Verification: Length-related validation logic for max_new_tokens and min_new_tokens has been extracted into a new verify_length method within SamplingParams classes, improving code organization and reusability.
  • API Handler Updates: The chat_completions_impl and completions_impl functions now prioritize max_completion_tokens for setting max_new_tokens, falling back to the deprecated max_tokens if max_completion_tokens is not provided.
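
The deprecation pattern described above can be sketched in plain Python (a simplified standalone model using dataclasses, not the actual Pydantic models in lightllm/server/api_models.py):

```python
# Hedged sketch: keep a deprecated max_tokens field alongside the
# preferred max_completion_tokens, warning when the old name is used.
import warnings
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChatCompletionRequest:
    max_completion_tokens: Optional[int] = None
    max_tokens: Optional[int] = None  # deprecated alias, kept for old clients

    def __post_init__(self) -> None:
        if self.max_tokens is not None:
            warnings.warn(
                "max_tokens is deprecated; use max_completion_tokens",
                DeprecationWarning,
                stacklevel=2,
            )
```

Clients that still send max_tokens keep working, but see a DeprecationWarning pointing them at the new parameter.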


Changelog
  • lightllm/server/api_models.py
    • Imported deprecated from typing_extensions to mark parameters as deprecated.
    • Deprecated the max_tokens field in CompletionRequest and ChatCompletionRequest models.
    • Added a new max_completion_tokens field to CompletionRequest and ChatCompletionRequest models as the preferred alternative.
  • lightllm/server/api_openai.py
    • Modified chat_completions_impl and completions_impl to conditionally set max_new_tokens based on request.max_completion_tokens first, then request.max_tokens.
    • Adjusted the assignment of stop_sequences to ensure it's only set if request.stop is not None.
  • lightllm/server/core/objs/py_sampling_params.py
    • Changed the default value of max_new_tokens in __init__ from 16 to -1.
    • Removed length-related validation checks (max_new_tokens, min_new_tokens) from the verify method.
    • Created a new verify_length method to encapsulate the length-related validation logic.
  • lightllm/server/core/objs/sampling_params.py
    • Updated the default value of max_new_tokens in the init method from 16 to -1.
    • Moved length-related validation checks from the verify method to a newly introduced verify_length method.
  • lightllm/server/httpserver/manager.py
    • Added logic within _check_and_repair_length to calculate sampling_params.max_new_tokens as self.max_req_total_len - prompt_tokens when its value is -1.
    • Invoked the new sampling_params.verify_length() method to ensure length parameters are valid after potential adjustments.
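
The -1 handling added to _check_and_repair_length can be sketched as a standalone function (the names prompt_tokens and max_req_total_len follow the changelog above; this is not the actual lightllm code):

```python
# Hedged sketch: expand a max_new_tokens of -1 to the number of tokens
# left in the request budget after the prompt.
def repair_max_new_tokens(max_new_tokens: int, prompt_tokens: int,
                          max_req_total_len: int) -> int:
    if max_new_tokens == -1:
        # -1 means "generate as much as the total-length budget allows".
        max_new_tokens = max_req_total_len - prompt_tokens
    return max_new_tokens
```

For example, with a 4096-token budget and a 1000-token prompt, -1 expands to 3096 new tokens, while an explicit value like 256 is left unchanged (and is then validated by verify_length()).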

@gemini-code-assist bot left a comment
Code Review

This pull request renames max_tokens to max_completion_tokens to align with the OpenAI API, deprecating the old parameter. The changes are logical and well-contained, including a positive change to how the default maximum tokens are calculated. However, there is a significant issue in the implementation: the deprecated max_tokens parameter incorrectly takes precedence over the new max_completion_tokens parameter. This should be reversed to ensure the new API is adopted correctly.

Note: Security Review did not run due to the size of the PR.

Comment on lines +212 to +215:

    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens
Severity: high

The current logic for setting max_new_tokens gives precedence to the deprecated max_tokens parameter. If both max_completion_tokens and max_tokens are provided, the value from max_tokens will overwrite the one from max_completion_tokens. To ensure the new parameter is prioritized, this logic should be adjusted.

Suggested change:

    # Before
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens

    # After
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    elif request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens
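
The precedence bug is easy to demonstrate in isolation. The sketch below (simplified standalone functions, not the actual handler code) contrasts the if/if ordering from the diff with the if/elif ordering the review suggests:

```python
from typing import Optional

def resolve_original(max_completion_tokens: Optional[int],
                     max_tokens: Optional[int]) -> dict:
    d = {}
    if max_completion_tokens is not None:
        d["max_new_tokens"] = max_completion_tokens
    if max_tokens is not None:
        # Runs even when the new parameter was already set,
        # so the deprecated value silently wins.
        d["max_new_tokens"] = max_tokens
    return d

def resolve_suggested(max_completion_tokens: Optional[int],
                      max_tokens: Optional[int]) -> dict:
    d = {}
    if max_completion_tokens is not None:
        d["max_new_tokens"] = max_completion_tokens
    elif max_tokens is not None:
        # Deprecated parameter is only a fallback.
        d["max_new_tokens"] = max_tokens
    return d
```

With both parameters set (say 100 and 50), the original ordering resolves to the deprecated max_tokens value of 50, while the suggested if/elif ordering resolves to max_completion_tokens, 100.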

Comment on lines +546 to +549:

    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens

Severity: high

Similar to the chat completions endpoint, the logic here incorrectly gives precedence to the deprecated max_tokens parameter. The new max_completion_tokens parameter should be prioritized to ensure correct behavior and a smooth transition for users.

Suggested change:

    # Before
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens

    # After
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    elif request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens
