
openai api: max_tokens -> max_completion_tokens #1222

Open

shihaobai wants to merge 1 commit into main from openai_update

Conversation

@shihaobai (Collaborator)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the handling of maximum token generation within the LightLLM API. It introduces a more explicit parameter, max_completion_tokens, to replace the generic max_tokens, which is now marked for deprecation. Additionally, it enhances the flexibility of max_new_tokens by allowing a value of -1 to signify dynamic calculation of the maximum possible output length, and improves the structure of parameter validation by separating length-specific checks.

Highlights

  • API Parameter Deprecation: The max_tokens parameter in CompletionRequest and ChatCompletionRequest models has been deprecated. Users are now encouraged to use the new max_completion_tokens parameter instead.
  • Flexible Max New Tokens: The default value for max_new_tokens in sampling parameters has been changed from 16 to -1. When max_new_tokens is set to -1, it is calculated dynamically as the number of tokens remaining after the prompt, up to max_req_total_len.
  • Refactored Length Verification: Length-related validation logic for max_new_tokens and min_new_tokens has been extracted into a new verify_length method within SamplingParams classes, improving code organization and reusability.
  • API Handler Updates: The chat_completions_impl and completions_impl functions now prioritize max_completion_tokens for setting max_new_tokens, falling back to the deprecated max_tokens if max_completion_tokens is not provided.
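
The deprecation pattern described above can be sketched in plain Python (a simplified standalone model using dataclasses, not the actual Pydantic models in lightllm/server/api_models.py):

```python
# Hedged sketch: keep a deprecated max_tokens field alongside the
# preferred max_completion_tokens, warning when the old name is used.
import warnings
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChatCompletionRequest:
    max_completion_tokens: Optional[int] = None
    max_tokens: Optional[int] = None  # deprecated alias, kept for old clients

    def __post_init__(self) -> None:
        if self.max_tokens is not None:
            warnings.warn(
                "max_tokens is deprecated; use max_completion_tokens",
                DeprecationWarning,
                stacklevel=2,
            )
```

Clients that still send max_tokens keep working, but see a DeprecationWarning pointing them at the new parameter.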


Changelog
  • lightllm/server/api_models.py
    • Imported deprecated from typing_extensions to mark parameters as deprecated.
    • Deprecated the max_tokens field in CompletionRequest and ChatCompletionRequest models.
    • Added a new max_completion_tokens field to CompletionRequest and ChatCompletionRequest models as the preferred alternative.
  • lightllm/server/api_openai.py
    • Modified chat_completions_impl and completions_impl to conditionally set max_new_tokens based on request.max_completion_tokens first, then request.max_tokens.
    • Adjusted the assignment of stop_sequences to ensure it's only set if request.stop is not None.
  • lightllm/server/core/objs/py_sampling_params.py
    • Changed the default value of max_new_tokens in __init__ from 16 to -1.
    • Removed length-related validation checks (max_new_tokens, min_new_tokens) from the verify method.
    • Created a new verify_length method to encapsulate the length-related validation logic.
  • lightllm/server/core/objs/sampling_params.py
    • Updated the default value of max_new_tokens in the init method from 16 to -1.
    • Moved length-related validation checks from the verify method to a newly introduced verify_length method.
  • lightllm/server/httpserver/manager.py
    • Added logic within _check_and_repair_length to calculate sampling_params.max_new_tokens as self.max_req_total_len - prompt_tokens when its value is -1.
    • Invoked the new sampling_params.verify_length() method to ensure length parameters are valid after potential adjustments.
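
The -1 handling added to _check_and_repair_length can be sketched as a standalone function (the names prompt_tokens and max_req_total_len follow the changelog above; this is not the actual lightllm code):

```python
# Hedged sketch: expand a max_new_tokens of -1 to the number of tokens
# left in the request budget after the prompt.
def repair_max_new_tokens(max_new_tokens: int, prompt_tokens: int,
                          max_req_total_len: int) -> int:
    if max_new_tokens == -1:
        # -1 means "generate as much as the total-length budget allows".
        max_new_tokens = max_req_total_len - prompt_tokens
    return max_new_tokens
```

For example, with a 4096-token budget and a 1000-token prompt, -1 expands to 3096 new tokens, while an explicit value like 256 is left unchanged (and is then validated by verify_length()).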

@gemini-code-assist bot left a comment
Code Review

This pull request renames max_tokens to max_completion_tokens to align with the OpenAI API, deprecating the old parameter. The changes are logical and well-contained, including a positive change to how the default maximum tokens are calculated. However, there is a significant issue in the implementation: the deprecated max_tokens parameter incorrectly takes precedence over the new max_completion_tokens parameter. This should be reversed to ensure the new API is adopted correctly.

Note: Security Review did not run due to the size of the PR.

Comment on lines +212 to +215:

    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens
Severity: high

The current logic for setting max_new_tokens gives precedence to the deprecated max_tokens parameter. If both max_completion_tokens and max_tokens are provided, the value from max_tokens will overwrite the one from max_completion_tokens. To ensure the new parameter is prioritized, this logic should be adjusted.

Suggested change:

    # Before
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens

    # After
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    elif request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens
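
The precedence bug is easy to demonstrate in isolation. The sketch below (simplified standalone functions, not the actual handler code) contrasts the if/if ordering from the diff with the if/elif ordering the review suggests:

```python
from typing import Optional

def resolve_original(max_completion_tokens: Optional[int],
                     max_tokens: Optional[int]) -> dict:
    d = {}
    if max_completion_tokens is not None:
        d["max_new_tokens"] = max_completion_tokens
    if max_tokens is not None:
        # Runs even when the new parameter was already set,
        # so the deprecated value silently wins.
        d["max_new_tokens"] = max_tokens
    return d

def resolve_suggested(max_completion_tokens: Optional[int],
                      max_tokens: Optional[int]) -> dict:
    d = {}
    if max_completion_tokens is not None:
        d["max_new_tokens"] = max_completion_tokens
    elif max_tokens is not None:
        # Deprecated parameter is only a fallback.
        d["max_new_tokens"] = max_tokens
    return d
```

With both parameters set (say 100 and 50), the original ordering resolves to the deprecated max_tokens value of 50, while the suggested if/elif ordering resolves to max_completion_tokens, 100.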

Comment on lines +546 to +549:

    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens

Severity: high

Similar to the chat completions endpoint, the logic here incorrectly gives precedence to the deprecated max_tokens parameter. The new max_completion_tokens parameter should be prioritized to ensure correct behavior and a smooth transition for users.

Suggested change:

    # Before
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    if request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens

    # After
    if request.max_completion_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_completion_tokens
    elif request.max_tokens is not None:
        sampling_params_dict["max_new_tokens"] = request.max_tokens
