Add support for FP32 and mixed precision in PDLP (#910)
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
📝 Walkthrough
Adds FP32 PDLP support and an optional mixed-precision SpMV path. Core PDLP/PDHG and utility components are templated over a floating type f_t, new CLI flags control precision and mixed-precision SpMV, FP32 matrix copies and SpMV helpers are added, and many template instantiation guards were extended to enable float builds.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 7
🧹 Nitpick comments (6)
docs/cuopt/source/lp-qp-milp-settings.rst (1)
201-207: Prefer an explicit label for the Mixed Precision SpMV cross-reference.
Using a named anchor avoids fragile section-title references and keeps links stable if headings change.
📚 Suggested doc tweak
```diff
-For an alternative that maintains FP64 accuracy while improving performance, see :ref:`Mixed Precision SpMV`.
+For an alternative that maintains FP64 accuracy while improving performance, see :ref:`mixed-precision-spmv`.

+.. _mixed-precision-spmv:
+
 Mixed Precision SpMV
 ^^^^^^^^^^^^^^^^^^^^
```

As per coding guidelines for `docs/**/*`, documentation should prioritize consistency and clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cuopt/source/lp-qp-milp-settings.rst` around lines 201 - 207, Add a stable named anchor for the "Mixed Precision SpMV" section and use that label in the earlier cross-reference: insert a label line like ".. _mixed-precision-spmv:" immediately before the "Mixed Precision SpMV" heading, and replace the existing inline reference "see :ref:`Mixed Precision SpMV`" with "see :ref:`mixed-precision-spmv`" (or another consistent label name) so the link remains stable if the heading text changes.

cpp/src/mip_heuristics/presolve/third_party_presolve.cpp (1)
33-46: Avoid full extra copies in same-type vector conversion.
`convert_vector` takes `const&`, so when `f_t == double`, the `double -> double` path copies very large vectors (Ax, bounds, objective) unnecessarily before PSLP calls.

Proposed refactor
```diff
-template <typename To, typename From>
-std::vector<To> convert_vector(const std::vector<From>& src)
+template <typename To, typename From>
+std::vector<To> convert_vector(std::vector<From> src)
 {
   if constexpr (std::is_same_v<To, From>) {
-    return src;  // No conversion needed
+    return std::move(src);  // No conversion needed
   } else {
     std::vector<To> dst(src.size());
     for (size_t i = 0; i < src.size(); ++i) {
       dst[i] = static_cast<To>(src[i]);
     }
     return dst;
   }
 }
@@
-std::vector<double> h_coefficients = convert_vector<double>(h_coefficients_ft);
-std::vector<double> h_obj_coeffs   = convert_vector<double>(h_obj_coeffs_ft);
-std::vector<double> h_var_lb       = convert_vector<double>(h_var_lb_ft);
-std::vector<double> h_var_ub       = convert_vector<double>(h_var_ub_ft);
-std::vector<double> h_constr_lb    = convert_vector<double>(h_constr_lb_ft);
-std::vector<double> h_constr_ub    = convert_vector<double>(h_constr_ub_ft);
+std::vector<double> h_coefficients = convert_vector<double>(std::move(h_coefficients_ft));
+std::vector<double> h_obj_coeffs   = convert_vector<double>(std::move(h_obj_coeffs_ft));
+std::vector<double> h_var_lb       = convert_vector<double>(std::move(h_var_lb_ft));
+std::vector<double> h_var_ub       = convert_vector<double>(std::move(h_var_ub_ft));
+std::vector<double> h_constr_lb    = convert_vector<double>(std::move(h_constr_lb_ft));
+std::vector<double> h_constr_ub    = convert_vector<double>(std::move(h_constr_ub_ft));
```

As per coding guidelines "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".
Also applies to: 295-300
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 33 - 46, The current template convert_vector<To,From>(const std::vector<From>&) unconditionally returns a std::vector<To>, causing a full copy even when To==From; fix this by adding an overload for the identical-type case: implement template<typename T> const std::vector<T>& convert_vector(const std::vector<T>& src) { return src; } and keep the existing two-type template for actual conversions (std::vector<To> convert_vector(const std::vector<From>&) with static_cast loop); update usages (e.g., where convert_vector is called for Ax, bounds, objective and the other occurrence mentioned) to accept/handle a const reference when available to avoid unnecessary large-vector copies.

benchmarks/linear_programming/cuopt/run_pdlp.cu (1)
193-198: Add fail-fast validation for unsupported flag combinations.
`--pdlp-fp32`/`--mixed-precision-spmv` constraints are documented, but incompatible combinations are only surfaced later by solver validation. A small upfront check in `main` would return clearer CLI errors earlier and avoid unnecessary setup/parse work.

💡 Suggested guard in `main`

```diff
 bool use_fp32 = program.get<bool>("--pdlp-fp32");
+const int method             = program.get<int>("--method");
+const bool crossover_enabled = program.get<int>("--crossover") != 0;
+const bool mixed_spmv        = program.get<bool>("--mixed-precision-spmv");
+
+if (use_fp32 && (method != static_cast<int>(cuopt::linear_programming::method_t::PDLP) ||
+                 crossover_enabled)) {
+  std::cerr << "--pdlp-fp32 is only supported for PDLP method without crossover.\n";
+  return 1;
+}
+if (use_fp32 && mixed_spmv) {
+  std::cerr << "--mixed-precision-spmv has no effect in FP32 mode.\n";
+}
 if (use_fp32) {
   return run_solver<float>(program, handle_);
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/linear_programming/cuopt/run_pdlp.cu` around lines 193 - 198, The CLI currently defers reporting incompatible flag combinations until solver validation; add an early fail-fast check in main: read the boolean for "--pdlp-fp32" (use_fp32) and the boolean for "--mixed-precision-spmv" from program (same API used for program.get<bool>(...)) before calling run_solver<float>/run_solver<double>, and if the combination is unsupported (e.g. use_fp32 && mixed-precision-spmv true) print a clear error to stderr and return a non-zero exit code to abort setup immediately; place this guard just before the existing branch that calls run_solver to avoid unnecessary parsing/setup work.

cpp/tests/linear_programming/pdlp_test.cu (1)
1930-2055: Add one FP32 test with `mixed_precision_spmv=true` to verify no-op behavior.
The PR contract says mixed precision SpMV has no effect in FP32 mode. Adding that explicit case would lock in this behavior.
As per coding guidelines,
`**/*test*.{cpp,cu,py}`: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1930 - 2055, Add a new FP32 unit test mirroring the existing float32_papilo_presolve_works/run_float32 pattern that sets solver_settings.mixed_precision_spmv = true and verifies numerical correctness (expect CUOPT_TERIMINATION_STATUS_OPTIMAL and that solution.get_additional_termination_information().primal_objective matches afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works) to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs op_problem via cuopt::mps_parser::parse_mps<int,float>, sets solver_settings.method = cuopt::linear_programming::method_t::PDLP and solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_, op_problem, solver_settings), and asserts both termination status is CUOPT_TERIMINATION_STATUS_OPTIMAL and the primal objective equals afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for FP32.

cpp/src/pdlp/cusparse_view.cu (1)
1091: Consider removing unnecessary synchronization in initialization setup.
Line 1091 synchronizes after mixed precision matrix transforms, but no CPU-side code depends on the transformed data before the solve loop. Downstream initialization functions (`compute_initial_step_size`, `compute_initial_primal_weight`) use scaled coefficients, not the float matrices. The mixed precision matrices are consumed only later in GPU kernels on the same stream, where stream ordering ensures correctness without explicit host-blocking synchronization.
Note: This is initialization code, not a hot solver iteration path, so the performance impact is minimal. However, the synchronization appears redundant, since stream ordering already guarantees correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` at line 1091, Remove the blocking host synchronization call handle_ptr_->get_stream().synchronize() after creating the mixed-precision matrices; instead rely on CUDA stream ordering so GPU kernels that consume the mixed-precision data run on the same stream without host-side synchronize. Locate the call in cusparse_view.cu (the initialization path where mixed-precision transforms are performed) and delete the synchronize() invocation; keep the rest of the initialization (including compute_initial_step_size and compute_initial_primal_weight) unchanged since they only use scaled coefficients on the host and do not require the float matrices to be host-visible before the solve kernels execute.

cpp/src/pdlp/pdlp.cu (1)
2974-2996: Consider adding a clarifying comment for the asymmetric instantiation macros.
The `PDLP_INSTANTIATE_FLOAT` vs `MIP_INSTANTIATE_DOUBLE` asymmetry is intentional (FP32 is PDLP-specific, while MIP always uses FP64 and relies on the PDLP double instantiation). A brief comment would help future maintainers understand this design decision.

📝 Suggested documentation
```diff
+// PDLP supports both FP32 and FP64 precision modes.
+// Float instantiation is controlled by PDLP_INSTANTIATE_FLOAT (PDLP-specific).
+// Double instantiation uses MIP_INSTANTIATE_DOUBLE since MIP depends on PDLP<double>.
 #if PDLP_INSTANTIATE_FLOAT
 template class pdlp_solver_t<int, float>;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2974 - 2996, Add a short clarifying comment above the asymmetric instantiation blocks explaining why PDLP_INSTANTIATE_FLOAT and MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is PDLP-specific while MIP always uses FP64 and depends on PDLP double instantiation; place the comment immediately above the two preprocessor blocks that instantiate pdlp_solver_t and the compute_weights_initial_primal_weight_from_squared_norms function templates so future maintainers see the rationale for the asymmetry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/math_optimization/solver_settings.cu`:
- Around line 74-75: The FP32 default tolerances
(CUOPT_PRIMAL_INFEASIBLE_TOLERANCE, CUOPT_DUAL_INFEASIBLE_TOLERANCE and
CUOPT_MIP_ABSOLUTE_GAP) use f_t(1e-10) which is below float epsilon and may
underflow to denorm/zero; update the default construction to clamp to the type's
machine epsilon (e.g., replace raw f_t(1e-10) with std::max(f_t(1e-10),
std::numeric_limits<f_t>::epsilon()) or choose separate defaults for float vs
double) so pdlp_settings.tolerances.primal_infeasible_tolerance and
dual_infeasible_tolerance (and the MIP absolute gap default) are representable
in FP32.
In `@cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu`:
- Around line 182-184: The float instantiation guard in simple_rounding.cu is
using PDLP_INSTANTIATE_FLOAT which is inconsistent with the double guard
(MIP_INSTANTIATE_DOUBLE) and other MIP files; update the preprocessor check
around the INSTANTIATE(float) call to use MIP_INSTANTIATE_FLOAT to match the
file's MIP-specific instantiation pattern (or, only if this code truly needs to
be shared with PDLP, use the combined guard MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT) so that INSTANTIATE(float) follows the same macro
convention as INSTANTIATE(double) and other MIP files.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 599-667: The code unconditionally allocates mixed-precision
buffers and creates mixed cuSPARSE descriptors (A_float_, A_T_float_, A_mixed_,
A_T_mixed_, buffer_non_transpose_mixed_, buffer_transpose_mixed_, and calls to
mixed_precision_spmv_buffersize) whenever enable_mixed_precision_spmv is true,
which wastes memory for batch paths that use SpMM on the FP64 descriptors; wrap
the entire mixed-precision allocation block so it only runs when mixed precision
is enabled AND the current execution will use SpMV (e.g., !is_batch_mode or a
flag like use_spmv) — move the allocations, cub::DeviceTransform calls,
cusparseCreateCsr calls, and buffer_size computations inside that guard and skip
them in batch/SpMM execution paths to avoid unnecessary memory use.
- Around line 606-616: The two unwrapped calls to
cub::DeviceTransform::Transform that convert doubles to floats (the calls using
double_to_float_functor with sources op_problem_scaled.coefficients and A_T_.
and destinations A_float_ and A_T_float_ respectively, using
handle_ptr->get_stream().value()) must be wrapped with RAFT_CUDA_TRY to surface
CUDA errors; locate the Transform invocations and wrap each call with
RAFT_CUDA_TRY( ... ) so any returned cudaError_t is checked, and apply the same
change for the identical pair of Transform calls found later in the file.
In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 201-207: Replace the raw cusparseSpMatDescr_t handles A_mixed_ and
A_T_mixed_ with the existing RAII wrapper type cusparse_sp_mat_descr_wrapper_t
(i.e., change their declarations to use cusparse_sp_mat_descr_wrapper_t instead
of cusparseSpMatDescr_t), update the code paths that call cusparseCreateCsr to
assign into these wrappers, and remove any manual calls to cusparseDestroySpMat
(relying on the wrapper destructor). Ensure all references to A_mixed_,
A_T_mixed_, cusparseCreateCsr and mixed_precision_enabled_ compile against the
wrapper API so the descriptors are automatically destroyed.
In `@cpp/src/pdlp/pdhg.hpp`:
- Around line 32-33: The constructor signature in pdhg.hpp currently defaults
enable_mixed_precision_spmv to true, which can silently change behavior; change
the default to false at the constructor boundary by updating the parameter
default from enable_mixed_precision_spmv = true to enable_mixed_precision_spmv =
false in the PDHG (pdhg) constructor declaration so callers that omit this
argument keep the documented default-off behavior.
In `@cpp/src/pdlp/solve.cu`:
- Around line 607-612: The FP32+crossover unsupported check currently inside the
template block (f_t) runs too late; move the validation into the beginning of
run_pdlp (or the earlier dispatch function that selects the f_t instantiation)
so invalid configurations fail fast. Specifically, add a guard that mirrors the
existing cuopt_expects(!settings.crossover, ...) check at the top of run_pdlp
(or immediately before selecting/instantiating the PDLP template) to throw a
ValidationError when std::is_same_v<f_t, float> (or when dispatch would pick
f_t=float) and settings.crossover==true; reference run_pdlp, the dispatch call
site for f_t, and the existing cuopt_expects check to locate and duplicate the
logic early in the call path.
---
Nitpick comments:
In `@benchmarks/linear_programming/cuopt/run_pdlp.cu`:
- Around line 193-198: The CLI currently defers reporting incompatible flag
combinations until solver validation; add an early fail-fast check in main: read
the boolean for "--pdlp-fp32" (use_fp32) and the boolean for
"--mixed-precision-spmv" from program (same API used for program.get<bool>(...))
before calling run_solver<float>/run_solver<double>, and if the combination is
unsupported (e.g. use_fp32 && mixed-precision-spmv true) print a clear error to
stderr and return a non-zero exit code to abort setup immediately; place this
guard just before the existing branch that calls run_solver to avoid unnecessary
parsing/setup work.
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp`:
- Around line 33-46: The current template convert_vector<To,From>(const
std::vector<From>&) unconditionally returns a std::vector<To>, causing a full
copy even when To==From; fix this by adding an overload for the identical-type
case: implement template<typename T> const std::vector<T>& convert_vector(const
std::vector<T>& src) { return src; } and keep the existing two-type template for
actual conversions (std::vector<To> convert_vector(const std::vector<From>&)
with static_cast loop); update usages (e.g., where convert_vector is called for
Ax, bounds, objective and the other occurrence mentioned) to accept/handle a
const reference when available to avoid unnecessary large-vector copies.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Line 1091: Remove the blocking host synchronization call
handle_ptr_->get_stream().synchronize() after creating the mixed-precision
matrices; instead rely on CUDA stream ordering so GPU kernels that consume the
mixed-precision data run on the same stream without host-side synchronize.
Locate the call in cusparse_view.cu (the initialization path where
mixed-precision transforms are performed) and delete the synchronize()
invocation; keep the rest of the initialization (including
compute_initial_step_size and compute_initial_primal_weight) unchanged since
they only use scaled coefficients on the host and do not require the float
matrices to be host-visible before the solve kernels execute.
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 2974-2996: Add a short clarifying comment above the asymmetric
instantiation blocks explaining why PDLP_INSTANTIATE_FLOAT and
MIP_INSTANTIATE_DOUBLE differ: FP32 instantiation is PDLP-specific while MIP
always uses FP64 and depends on PDLP double instantiation; place the comment
immediately above the two preprocessor blocks that instantiate pdlp_solver_t and
the compute_weights_initial_primal_weight_from_squared_norms function templates
so future maintainers see the rationale for the asymmetry.
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 1930-2055: Add a new FP32 unit test mirroring the existing
float32_papilo_presolve_works/run_float32 pattern that sets
solver_settings.mixed_precision_spmv = true and verifies numerical correctness
(expect CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32); locate and modify the pdlp_test.cu tests around
TEST(pdlp_class, run_float32) or TEST(pdlp_class, float32_papilo_presolve_works)
to add TEST(pdlp_class, float32_mixed_precision_spmv_noop) which constructs
op_problem via cuopt::mps_parser::parse_mps<int,float>, sets
solver_settings.method = cuopt::linear_programming::method_t::PDLP and
solver_settings.mixed_precision_spmv = true, calls solve_lp(&handle_,
op_problem, solver_settings), and asserts both termination status is
CUOPT_TERIMINATION_STATUS_OPTIMAL and the primal objective equals
afiro_primal_objective_f32 to guarantee the mixed-precision flag is a no-op for
FP32.
In `@docs/cuopt/source/lp-qp-milp-settings.rst`:
- Around line 201-207: Add a stable named anchor for the "Mixed Precision SpMV"
section and use that label in the earlier cross-reference: insert a label line
like ".. _mixed-precision-spmv:" immediately before the "Mixed Precision SpMV"
heading, and replace the existing inline reference "see :ref:`Mixed Precision
SpMV`" with "see :ref:`mixed-precision-spmv`" (or another consistent label name)
so the link remains stable if the heading text changes.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (42)
benchmarks/linear_programming/cuopt/run_pdlp.cu
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
cpp/src/dual_simplex/sparse_matrix.cpp
cpp/src/math_optimization/solution_writer.cu
cpp/src/math_optimization/solution_writer.hpp
cpp/src/math_optimization/solver_settings.cu
cpp/src/mip_heuristics/diversity/lns/rins.cu
cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
cpp/src/mip_heuristics/mip_constants.hpp
cpp/src/mip_heuristics/presolve/gf2_presolve.cpp
cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
cpp/src/mip_heuristics/problem/presolve_data.cu
cpp/src/mip_heuristics/problem/problem.cu
cpp/src/mip_heuristics/solution/solution.cu
cpp/src/mip_heuristics/solver_solution.cu
cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
cpp/src/pdlp/cusparse_view.cu
cpp/src/pdlp/cusparse_view.hpp
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
cpp/src/pdlp/optimization_problem.cu
cpp/src/pdlp/pdhg.cu
cpp/src/pdlp/pdhg.hpp
cpp/src/pdlp/pdlp.cu
cpp/src/pdlp/pdlp_warm_start_data.cu
cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
cpp/src/pdlp/saddle_point.cu
cpp/src/pdlp/solution_conversion.cu
cpp/src/pdlp/solve.cu
cpp/src/pdlp/solver_settings.cu
cpp/src/pdlp/solver_solution.cu
cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
cpp/src/pdlp/termination_strategy/convergence_information.cu
cpp/src/pdlp/termination_strategy/infeasibility_information.cu
cpp/src/pdlp/termination_strategy/termination_strategy.cu
cpp/src/pdlp/translate.hpp
cpp/src/pdlp/utilities/problem_checking.cu
cpp/tests/linear_programming/pdlp_test.cu
docs/cuopt/source/lp-qp-features.rst
docs/cuopt/source/lp-qp-milp-settings.rst
cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu: outdated review comment (resolved)
/ok to test 366fd6a
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tests/linear_programming/pdlp_test.cu (1)
54-59: ⚠️ Potential issue | 🟠 Major
Add NaN/Inf guard to `is_incorrect_objective` to prevent masked solver regressions.
The function accepts NaN values silently. When `objective` is NaN, `std::abs(objective)` returns NaN, and the comparisons evaluate to false, causing the helper to incorrectly report the objective as valid. This can mask solver regressions. The codebase already uses `std::isfinite` extensively in solver components (pdlp, feasibility_jump, barrier, etc.) for this exact purpose.

Suggested fix
```diff
 template <typename f_t>
 static bool is_incorrect_objective(f_t reference, f_t objective)
 {
+  if (!std::isfinite(reference) || !std::isfinite(objective)) { return true; }
   if (reference == 0) { return std::abs(objective) > 0.01; }
   if (objective == 0) { return std::abs(reference) > 0.01; }
   return std::abs((reference - objective) / reference) > 0.01;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 54 - 59, The helper is_incorrect_objective must treat NaN/Inf as incorrect: at the start of is_incorrect_objective(f_t reference, f_t objective) check std::isfinite(reference) and std::isfinite(objective) and return true if either is not finite, then proceed with the existing zero checks and relative-difference logic; update includes/usings if necessary to ensure std::isfinite is available and referenced.
🧹 Nitpick comments (4)
cpp/src/pdlp/cpu_pdlp_warm_start_data.cu (1)
16-39: Consider batching synchronization for better performance (optional).
Each helper function synchronizes after its copy operation. When `convert_to_cpu_warmstart` or `convert_to_gpu_warmstart` copies 9 vector fields, this results in 9 separate synchronization points.
For warm start data that isn't in a hot path, this is acceptable. However, if performance becomes a concern, consider refactoring to batch all copies and synchronize once at the end of the conversion functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu` around lines 16 - 39, The helpers device_to_host_vector and host_to_device_vector currently call stream.synchronize() after each raft::copy, causing multiple sync points when convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields; change the helpers to avoid per-copy synchronization by either (a) removing the stream.synchronize() calls and documenting that the caller must synchronize the stream once after batching copies, or (b) adding an optional parameter (e.g. bool do_sync = true) so callers can disable per-copy sync and perform a single stream.synchronize() at the end of convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_* functions to batch the copies and call stream.synchronize() once.

cpp/tests/linear_programming/pdlp_test.cu (1)
1931-2054: Add an explicit FP32 “mixed_precision_spmv is a no-op” test.
PR behavior states mixed-precision SpMV has no effect in FP32 mode, but this block doesn’t directly assert that contract yet.
Suggested test addition
```diff
+TEST(pdlp_class, float32_mixed_precision_spmv_no_effect)
+{
+  const raft::handle_t handle_{};
+  auto path = make_path_absolute("linear_programming/afiro_original.mps");
+  cuopt::mps_parser::mps_data_model_t<int, float> op_problem =
+    cuopt::mps_parser::parse_mps<int, float>(path, true);
+
+  auto settings_base                 = pdlp_solver_settings_t<int, float>{};
+  settings_base.method               = cuopt::linear_programming::method_t::PDLP;
+  settings_base.mixed_precision_spmv = false;
+
+  auto settings_mixed                 = settings_base;
+  settings_mixed.mixed_precision_spmv = true;
+
+  auto solution_base  = solve_lp(&handle_, op_problem, settings_base);
+  auto solution_mixed = solve_lp(&handle_, op_problem, settings_mixed);
+
+  EXPECT_EQ((int)solution_base.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_EQ((int)solution_mixed.get_termination_status(), CUOPT_TERIMINATION_STATUS_OPTIMAL);
+  EXPECT_NEAR(solution_base.get_additional_termination_information().primal_objective,
+              solution_mixed.get_additional_termination_information().primal_objective,
+              1e-2f);
+}
```

As per coding guidelines:
`**/*test*.{cpp,cu,py}`: Write tests validating numerical correctness of optimization results (not just 'runs without error').

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 1931 - 2054, Add a test that explicitly verifies mixed-precision SpMV is a no-op in FP32: create a pdlp_solver_settings_t<int,float> (same as in run_float32) set method = cuopt::linear_programming::method_t::PDLP and set solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem, solver_settings), then assert termination status equals CUOPT_TERIMINATION_STATUS_OPTIMAL and that solution.get_additional_termination_information().primal_objective matches afiro_primal_objective_f32 (use is_incorrect_objective to check numerical equality), mirroring the checks in the existing run_float32/papilo/pslp tests so the behavior is explicitly tested.

cpp/src/pdlp/pdhg.cu (1)
254-280: Consolidate repeated mixed-vs-standard SpMV dispatch into one helper.
The same precision-dispatch branch is repeated in three methods. A shared helper would reduce drift risk and keep future algorithm/descriptor changes in one place.
♻️ Suggested refactor sketch
```diff
+template <typename i_t, typename f_t>
+template <typename MatStd, typename MatMixed, typename VecX, typename VecY,
+          typename BufStd, typename BufMixed>
+inline void pdhg_solver_t<i_t, f_t>::run_spmv_dispatch(MatStd mat_std,
+                                                       MatMixed mat_mixed,
+                                                       VecX vec_x,
+                                                       VecY vec_y,
+                                                       BufStd buf_std,
+                                                       BufMixed buf_mixed)
+{
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (cusparse_view_.mixed_precision_enabled_) {
+      mixed_precision_spmv(handle_ptr_->get_cusparse_handle(),
+                           CUSPARSE_OPERATION_NON_TRANSPOSE,
+                           reusable_device_scalar_value_1_.data(),
+                           mat_mixed,
+                           vec_x,
+                           reusable_device_scalar_value_0_.data(),
+                           vec_y,
+                           CUSPARSE_SPMV_CSR_ALG2,
+                           buf_mixed,
+                           stream_view_);
+      return;
+    }
+  }
+  RAFT_CUSPARSE_TRY(raft::sparse::detail::cusparsespmv(handle_ptr_->get_cusparse_handle(),
+                                                       CUSPARSE_OPERATION_NON_TRANSPOSE,
+                                                       reusable_device_scalar_value_1_.data(),
+                                                       mat_std,
+                                                       vec_x,
+                                                       reusable_device_scalar_value_0_.data(),
+                                                       vec_y,
+                                                       CUSPARSE_SPMV_CSR_ALG2,
+                                                       (f_t*)buf_std,
+                                                       stream_view_));
+}
```

As per coding guidelines,
`Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.`

Also applies to: 308-334, 356-382
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdhg.cu` around lines 254 - 280, Duplicate precision-dispatch SpMV logic should be extracted into a single helper: create a templated utility (e.g., spmv_dispatch or dispatch_spmv) that accepts the cusparse handle (handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars (reusable_device_scalar_value_1_.data(), reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers (cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers (cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args (CUSPARSE_SPMV_CSR_ALG2 and both buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream (stream_view_), then replace the repeated branches in pdhg.cu (and the other two occurrences) with calls to that helper which checks cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or raft::sparse::detail::cusparsespmv accordingly.

cpp/src/pdlp/cusparse_view.cu (1)
1069-1082: Remove the blocking stream synchronization after mixed-precision matrix transforms.
Line 1081's `handle_ptr_->get_stream().synchronize()` blocks the host after submitting two matrix transforms (`A_` → `A_float_` and `A_T_` → `A_T_float_`) to the same GPU stream. Since subsequent mixed-precision SpMV operations are queued on the same stream, the hardware will ensure correct ordering without an explicit host stall. Removing this synchronization eliminates an unnecessary pipeline blockage in the hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1069 - 1082, The explicit host-side stall is caused by handle_ptr_->get_stream().synchronize() after launching two cub::DeviceTransform::Transform calls that convert A_→A_float_ and A_T_→A_T_float_; remove the synchronize call so the transforms run asynchronously on the same stream (GPU will preserve ordering) and let subsequent mixed-precision SpMV kernels queue on that stream without blocking the host; ensure no other code relies on the host wait and delete only the handle_ptr_->get_stream().synchronize() invocation referenced here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/solve.cu`:
- Around line 1151-1156: The current hard validation using cuopt_expects blocks
float-precision users; update solve_lp() to auto-map float precision to
method_t::PDLP when the user has not explicitly selected an incompatible method
(i.e., settings.method is still method_t::Concurrent/default) instead of
failing: detect float precision and if settings.method == method_t::Concurrent,
set settings.method = method_t::PDLP and emit a log/info message about the
override; keep the existing cuopt_expects check only for the case where the user
explicitly set an incompatible method (e.g., DualSimplex/Barrier) so an error is
still raised for explicit invalid combinations. Ensure you reference and modify
the code paths around solve_lp(), settings.method, and the float-precision
branch that currently calls run_pdlp().
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Line 23: Update the preprocessor guard that gates the PDLP float tests:
replace the overly-broad conditional that uses MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT with a check that only uses PDLP_INSTANTIATE_FLOAT so
PDLP float tests compile only when PDLP is instantiated for float; locate the
conditional in pdlp_test.cu (the lines surrounding the existing `#if`
MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT) and change it to `#if`
PDLP_INSTANTIATE_FLOAT, leaving the rest of the test code unchanged.
---
Outside diff comments:
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 54-59: The helper is_incorrect_objective must treat NaN/Inf as
incorrect: at the start of is_incorrect_objective(f_t reference, f_t objective)
check std::isfinite(reference) and std::isfinite(objective) and return true if
either is not finite, then proceed with the existing zero checks and
relative-difference logic; update includes/usings if necessary to ensure
std::isfinite is available and referenced.
---
Nitpick comments:
In `@cpp/src/pdlp/cpu_pdlp_warm_start_data.cu`:
- Around line 16-39: The helpers device_to_host_vector and host_to_device_vector
currently call stream.synchronize() after each raft::copy, causing multiple sync
points when convert_to_cpu_warmstart/convert_to_gpu_warmstart copy many fields;
change the helpers to avoid per-copy synchronization by either (a) removing the
stream.synchronize() calls and documenting that the caller must synchronize the
stream once after batching copies, or (b) adding an optional parameter (e.g.
bool do_sync = true) so callers can disable per-copy sync and perform a single
stream.synchronize() at the end of
convert_to_cpu_warmstart/convert_to_gpu_warmstart; update those convert_*
functions to batch the copies and call stream.synchronize() once.
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1069-1082: The explicit host-side stall is caused by
handle_ptr_->get_stream().synchronize() after launching two
cub::DeviceTransform::Transform calls that convert A_→A_float_ and
A_T_→A_T_float_; remove the synchronize call so the transforms run
asynchronously on the same stream (GPU will preserve ordering) and let
subsequent mixed-precision SpMV kernels queue on that stream without blocking
the host; ensure no other code relies on the host wait and delete only the
handle_ptr_->get_stream().synchronize() invocation referenced here.
In `@cpp/src/pdlp/pdhg.cu`:
- Around line 254-280: Duplicate precision-dispatch SpMV logic should be
extracted into a single helper: create a templated utility (e.g., spmv_dispatch
or dispatch_spmv) that accepts the cusparse handle
(handle_ptr_->get_cusparse_handle()), operation flag, alpha/beta device scalars
(reusable_device_scalar_value_1_.data(),
reusable_device_scalar_value_0_.data()), both mixed and standard matrix pointers
(cusparse_view_.A_mixed_, cusparse_view_.A), input/output buffers
(cusparse_view_.tmp_primal, cusparse_view_.dual_gradient), algorithm/buffer args
(CUSPARSE_SPMV_CSR_ALG2 and both
buffer_non_transpose_mixed_.data()/buffer_non_transpose.data()), and the stream
(stream_view_), then replace the repeated branches in pdhg.cu (and the other two
occurrences) with calls to that helper which checks
cusparse_view_.mixed_precision_enabled_ and calls mixed_precision_spmv or
raft::sparse::detail::cusparsespmv accordingly.
In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 1931-2054: Add a test that explicitly verifies mixed-precision
SpMV is a no-op in FP32: create a pdlp_solver_settings_t<int,float> (same as in
run_float32) set method = cuopt::linear_programming::method_t::PDLP and set
solver_settings.mixed_precision_spmv = true, call solve_lp(&handle_, op_problem,
solver_settings), then assert termination status equals
CUOPT_TERIMINATION_STATUS_OPTIMAL and that
solution.get_additional_termination_information().primal_objective matches
afiro_primal_objective_f32 (use is_incorrect_objective to check numerical
equality), mirroring the checks in the existing run_float32/papilo/pslp tests so
the behavior is explicitly tested.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- cpp/src/dual_simplex/sparse_matrix.cpp
- cpp/src/math_optimization/solution_writer.cu
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/solver_solution.cu
- cpp/src/pdlp/cpu_pdlp_warm_start_data.cu
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/optimization_problem.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/restart_strategy/localized_duality_gap_container.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/utilities/problem_checking.cu
- cpp/tests/linear_programming/pdlp_test.cu
🚧 Files skipped from review as they are similar to previous changes (15)
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/pdlp/optimal_batch_size_handler/optimal_batch_size_handler.cu
- cpp/src/pdlp/pdhg.hpp
- cpp/src/pdlp/solution_conversion.cu
- cpp/src/pdlp/pdlp_warm_start_data.cu
- cpp/src/pdlp/solver_solution.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/restart_strategy/weighted_average_solution.cu
- cpp/src/mip_heuristics/local_search/rounding/simple_rounding.cu
- cpp/src/mip_heuristics/problem/problem.cu
// Float precision only supports PDLP without presolve/crossover
cuopt_expects(settings.method == method_t::PDLP,
              error_type_t::ValidationError,
              "Float precision only supports PDLP method. DualSimplex, Barrier, and Concurrent "
              "require double precision.");
return run_pdlp(problem, settings, timer, is_batch_mode);
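The check above boils down to a simple precondition on the method enum. A minimal stand-alone sketch of the same pattern (hypothetical types, not the cuOpt API):

```cpp
#include <cassert>
#include <stdexcept>

// Illustration only: mirrors the method enum used in solver settings.
enum class method_t { Concurrent, PDLP, DualSimplex, Barrier };

// Sketch of the precondition enforced above: when solving in float
// precision, any method other than PDLP is rejected up front.
void check_float_method(method_t method)
{
  if (method != method_t::PDLP) {
    throw std::invalid_argument(
      "Float precision only supports PDLP method. DualSimplex, Barrier, and "
      "Concurrent require double precision.");
  }
}
```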
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check default method initialization =="
fd 'solver_settings\.hpp$' -o 'solver_settings\.cu$' | head -20 | xargs -r rg -n "method_t|method\s*=" | head -40
echo ""
echo "== Check FP32 flag handling and method assignment in CLI / dispatch =="
rg -n "pdlp-fp32|PDLP_INSTANTIATE_FLOAT|method_t::PDLP|CUOPT_METHOD" | head -60
echo ""
echo "== Check the solve.cu file and method dispatch =="
fd 'solve\.cu$' | head -10
Repository: NVIDIA/cuopt
Length of output: 274
🏁 Script executed:
#!/bin/bash
# Let's specifically look at the solve.cu file around line 1151 and trace back
echo "== Context around line 1151-1156 in solve.cu =="
fd 'pdlp/solve\.cu$' | xargs -r sed -n '1140,1170p'
echo ""
echo "== Trace back to function entry point =="
fd 'pdlp/solve\.cu$' | xargs -r sed -n '1100,1160p' | head -80
Repository: NVIDIA/cuopt
Length of output: 145
🏁 Script executed:
#!/bin/bash
# Search for how float settings are constructed and method is set
echo "== Search for solver_settings_t<float> instantiation =="
rg -n "solver_settings_t\s*<\s*float|pdlp_solver_settings_t\s*<\s*float" -A 5
echo ""
echo "== Search for method assignment before solve =="
rg -n "settings\.method\s*=" | grep -v "==" | head -20
Repository: NVIDIA/cuopt
Length of output: 2752
🏁 Script executed:
#!/bin/bash
# Find the actual solve.cu file location
echo "== Find solve.cu files =="
find . -name "solve.cu" -type f 2>/dev/null | head -20
echo ""
echo "== Check which solve.cu contains the validation check =="
grep -r "Float precision only supports PDLP" --include="*.cu" --include="*.hpp" --include="*.h" 2>/dev/null | head -10
Repository: NVIDIA/cuopt
Length of output: 438
🏁 Script executed:
#!/bin/bash
# Once we find the file, let's look at the broader context
FILE=$(grep -r "Float precision only supports PDLP" --include="*.cu" -l 2>/dev/null | head -1)
echo "Found file: $FILE"
if [ -n "$FILE" ]; then
echo ""
echo "== Lines 1100-1200 (broader context) =="
sed -n '1100,1200p' "$FILE" | head -100
fi
Repository: NVIDIA/cuopt
Length of output: 4595
🏁 Script executed:
#!/bin/bash
# Search for float-related method handling and defaults
echo "== Search for float or FP32 in context of method selection =="
rg -n "float|FP32|fp32" -B 2 -A 2 --include="*.cu" --include="*.hpp" | grep -i "method\|pdlp\|float" | head -80
echo ""
echo "== Look for default method values in solver_settings =="
find . -name "*solver_settings*" -type f | xargs grep -n "method.*=" | grep -i "default\|=\s*method_t" | head -30
Repository: NVIDIA/cuopt
Length of output: 633
🏁 Script executed:
#!/bin/bash
# Search for callers of solve_lp
echo "== Find callers of solve_lp =="
rg -n "solve_lp\(" --type cu --type cpp --type hpp | head -30
echo ""
echo "== Check solver_settings initialization/defaults =="
find . -name "*solver_settings*" -type f | head -10
Repository: NVIDIA/cuopt
Length of output: 696
🏁 Script executed:
#!/bin/bash
# Look at solver_settings headers to find default method
fd "solver_settings" -t f | xargs -r cat 2>/dev/null | grep -n "method" | head -40
Repository: NVIDIA/cuopt
Length of output: 1585
🏁 Script executed:
#!/bin/bash
# Check how float precision settings are handled in CLI or main dispatch
echo "== Search for float/FP32 handling in solve dispatch =="
rg -n "is_same.*float|float.*instantiate|PDLP_INSTANTIATE" -B 3 -A 3 | head -100
echo ""
echo "== Check for method override based on precision =="
rg -n "settings\.method\s*=\s*method_t::PDLP" | head -20
Repository: NVIDIA/cuopt
Length of output: 7282
🏁 Script executed:
#!/bin/bash
# Find the public API that calls solve_lp
echo "== Search for solve function signatures in public headers =="
fd ".*\.hpp" -path "*/linear_programming/*" | head -10 | xargs grep -n "solve\(" | grep -v "run_\|solve_lp" | head -20
echo ""
echo "== Check the main solve entry point =="
rg -n "optimization_problem_solution_t.*solve\(" -A 10 | head -80
Repository: NVIDIA/cuopt
Length of output: 3448
🏁 Script executed:
#!/bin/bash
# Look for where float dispatch happens
echo "== Find main solve implementation that dispatches to solve_lp =="
fd "*.cpp" -o "*.cu" | head -20 | xargs grep -l "solve_lp\(" 2>/dev/null | head -5
echo ""
echo "== Check if method is set to PDLP for float in any dispatch =="
rg -n "if.*float\|if constexpr.*float" -B 3 -A 3 | grep -i "method\|pdlp" | head -40
Repository: NVIDIA/cuopt
Length of output: 329
🏁 Script executed:
#!/bin/bash
# Look for the solve function in the public API headers
echo "== Check linear_programming.hpp for solve entry point =="
fd "linear_programming.hpp" | xargs cat 2>/dev/null | grep -n "solve" | head -30
Repository: NVIDIA/cuopt
Length of output: 112
🏁 Script executed:
#!/bin/bash
echo "== Read solve.hpp header for main solve API =="
cat cpp/include/cuopt/linear_programming/solve.hpp | head -150
Repository: NVIDIA/cuopt
Length of output: 6828
🏁 Script executed:
#!/bin/bash
echo "== Search for main solve function in solve.hpp =="
grep -n "^optimization_problem_solution_t.*solve\|^template.*solve" cpp/include/cuopt/linear_programming/solve.hpp | head -20
echo ""
echo "== Search entire solve.cu for solve function definitions (not solve_lp) =="
grep -n "^optimization_problem_solution_t.*solve\|^template.*solve" cpp/src/pdlp/solve.cu | head -20
Repository: NVIDIA/cuopt
Length of output: 602
🏁 Script executed:
#!/bin/bash
echo "== Check if there's a wrapper solve function that dispatches based on type =="
rg -n "template.*optimization_problem_solution_t" cpp/src/pdlp/solve.cu -A 10 | head -100
Repository: NVIDIA/cuopt
Length of output: 3422
🏁 Script executed:
#!/bin/bash
echo "== Read solve_lp implementation starting at line 1161 =="
sed -n '1161,1330p' cpp/src/pdlp/solve.cu
Repository: NVIDIA/cuopt
Length of output: 8559
🏁 Script executed:
#!/bin/bash
echo "== Check for any settings.method assignment or override logic =="
sed -n '1161,1330p' cpp/src/pdlp/solve.cu | grep -n "settings.method\|method_t::"
Repository: NVIDIA/cuopt
Length of output: 169
🏁 Script executed:
#!/bin/bash
echo "== Check batch_pdlp_solve implementation for method override =="
sed -n '912,980p' cpp/src/pdlp/solve.cu | grep -n "method\|settings"
Repository: NVIDIA/cuopt
Length of output: 918
🏁 Script executed:
#!/bin/bash
echo "== Search for any CLI or user-facing handling of float/fp32 =="
find . -name "*.py" -o -name "*.cpp" | xargs grep -l "pdlp.*fp32\|float.*method\|PDLP_INSTANTIATE_FLOAT" 2>/dev/null | head -10
echo ""
echo "== Check if there's a wrapper that enforces PDLP for float before calling solve_lp =="
rg -n "is_same.*float" -B 5 -A 5 | head -80
Repository: NVIDIA/cuopt
Length of output: 6115
Auto-map float precision to PDLP instead of enforcing hard method validation.
The concern is valid. With default method=Concurrent in solver_settings and no automatic override for float precision in solve_lp(), users requesting float precision will hit the validation error at line 1151-1156 unless they explicitly set method=PDLP beforehand. This is an unnecessary UX friction since float support is specifically PDLP-only.
Rather than enforce a hard cuopt_expects validation, auto-map float mode to method_t::PDLP when the method is unspecified or left at default Concurrent. Only error on explicit, incompatible method requests (e.g., user explicitly selects DualSimplex with float precision).
Add logic in solve_lp() analogous to the quadratic objective override: if float precision and method is Concurrent (the default), silently reassign to PDLP with a log message.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/solve.cu` around lines 1151 - 1156, The current hard validation
using cuopt_expects blocks float-precision users; update solve_lp() to auto-map
float precision to method_t::PDLP when the user has not explicitly selected an
incompatible method (i.e., settings.method is still
method_t::Concurrent/default) instead of failing: detect float precision and if
settings.method == method_t::Concurrent, set settings.method = method_t::PDLP
and emit a log/info message about the override; keep the existing cuopt_expects
check only for the case where the user explicitly set an incompatible method
(e.g., DualSimplex/Barrier) so an error is still raised for explicit invalid
combinations. Ensure you reference and modify the code paths around solve_lp(),
settings.method, and the float-precision branch that currently calls run_pdlp().
No I don't want users thinking they are using concurrent while in FP32. They should realize that they should, and will, use PDLP only
@Kh4ster, understood! The current validation approach is the right design choice. Requiring users to explicitly set method=PDLP when using FP32 precision ensures clarity and avoids confusion about which solver method is actually being used. The validation error at lines 1151-1156 correctly enforces this requirement and helps users make informed decisions about their solver configuration.
🧠 Learnings used
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
Learnt from: tmckayus
Repo: NVIDIA/cuopt PR: 819
File: cpp/src/linear_programming/solve.cu:1338-1366
Timestamp: 2026-02-02T17:16:54.629Z
Learning: In `cpp/src/linear_programming/solve.cu`, the remote vs. local solve routing intentionally checks only `is_remote_execution_enabled()` without checking `force_gpu_memory()` or backend type. This design supports remote solving of GPU-resident problems, enabling workflows where users manipulate problems/solutions on a local GPU but execute the solve on a remote machine with a better GPU.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 899
File: cpp/src/mip_heuristics/solve.cu:169-193
Timestamp: 2026-02-24T14:37:15.024Z
Learning: In cuOpt's MIP solver (cpp/src/mip_heuristics/solve.cu and related files), the objective_scaling_factor sign encodes the user problem's objective sense: if >= 0, the user problem is minimization; if < 0, the user problem is maximization (which is internally converted to minimization by negating the objective). Therefore, checking the sign of objective_scaling_factor is a valid way to determine the problem direction and set appropriate "no bound" values for callbacks.
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 814
File: cpp/src/dual_simplex/branch_and_bound.hpp:154-159
Timestamp: 2026-01-29T04:25:05.434Z
Learning: In cpp/src/dual_simplex/branch_and_bound.hpp and branch_and_bound.cpp, the mutex_original_lp_ protects original_lp_ specifically during the root cut pass phase. The public API set_new_solution() can be called from external threads during this phase while the main thread is adding cuts/slacks to original_lp_. Functions like best_first_thread, exploration_ramp_up, and plunge_from execute after the cut passes complete (after the OpenMP parallel barrier) when original_lp_ is fixed, so they don't need mutex protection for their reads.
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 814
File: cpp/src/dual_simplex/cuts.cpp:1278-1278
Timestamp: 2026-01-30T23:34:26.318Z
Learning: In the cuOpt dual simplex solver (cpp/src/dual_simplex/cuts.cpp and related files), slack variable coefficients in the constraint matrix are always exactly 1.0 or -1.0 (not floating-point approximations), so exact equality checks like `assert(std::abs(lp.A.x[col_start]) == 1.0)` are appropriate and should not be flagged as requiring epsilon tolerances.
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 721
File: cpp/src/linear_programming/optimization_problem.cu:106-205
Timestamp: 2026-01-29T11:25:28.213Z
Learning: The function `csr_matrices_equivalent_with_permutation` in `cpp/src/linear_programming/optimization_problem.cu` is only used in CI testing with known inputs, so strict problem size validation is not required.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 840
File: python/cuopt/cuopt/linear_programming/solver_settings/solver_settings.py:178-180
Timestamp: 2026-02-09T17:57:51.918Z
Learning: In cuOpt's Python solver settings (python/cuopt/cuopt/linear_programming/solver_settings/solver_settings.py), the set_optimality_tolerance() method should only update the six optimality-related tolerances (absolute_dual_tolerance, relative_dual_tolerance, absolute_primal_tolerance, relative_primal_tolerance, absolute_gap_tolerance, relative_gap_tolerance) and must not modify infeasibility detection thresholds (primal_infeasible_tolerance, dual_infeasible_tolerance), which require stricter default values for correct infeasibility detection.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Learnt from: rg20
Repo: NVIDIA/cuopt PR: 816
File: cpp/src/math_optimization/solver_settings.cu:99-102
Timestamp: 2026-02-05T15:54:25.780Z
Learning: In the PSLP integration (PR `#816`), the CUOPT_PRESOLVE parameter was intentionally changed from a boolean to an integer enum (presolver_t) with values Default/None/Papilo/PSLP. This is a deliberate breaking change; the CLI expects integers, and boolean compatibility was intentionally removed.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 798
File: cpp/src/utilities/models/fj_predictor/quantize.cpp:1149-1161
Timestamp: 2026-01-26T14:17:25.105Z
Learning: The files in cpp/src/utilities/models/*/quantize.cpp (fj_predictor, cpufj_predictor, pdlp_predictor, dualsimplex_predictor, bounds_strengthening_predictor) are autogenerated model artifacts and should not have manual code changes suggested; any improvements should target the model generation tooling or calling code validation layer.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
#include <cuopt/linear_programming/pdlp/solver_settings.hpp>
#include <cuopt/linear_programming/pdlp/solver_solution.hpp>
#include <cuopt/linear_programming/solve.hpp>
#include <mip_heuristics/mip_constants.hpp>
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Locate macro definitions/usages
echo "=== Searching for macro definitions and usages ==="
rg -n -C3 '\b(MIP_INSTANTIATE_FLOAT|PDLP_INSTANTIATE_FLOAT)\b' --type-add 'cpp:*.cu' --type-add 'cpp:*.cuh' --type-add 'cpp:*.hpp' --type-add 'cpp:*.h'
echo ""
echo "=== Checking for float template instantiations ==="
rg -n -C3 'solve_lp<[^>]*float|pdlp_solver_settings_t<[^>]*float|mps_data_model_t<int,\s*float>' --type-add 'cpp:*.cu' --type-add 'cpp:*.cuh' --type-add 'cpp:*.hpp' --type-add 'cpp:*.h'
echo ""
echo "=== Examining guard usage in pdlp_test.cu ==="
rg -n -C5 '#if.*MIP_INSTANTIATE_FLOAT|#if.*PDLP_INSTANTIATE_FLOAT' cpp/tests/linear_programming/pdlp_test.cu
Repository: NVIDIA/cuopt
Length of output: 46827
🏁 Script executed:
# Check how PDLP_INSTANTIATE_FLOAT is defined and used globally
grep -r "PDLP_INSTANTIATE_FLOAT" cpp/src/mip_heuristics/mip_constants.hpp | head -1
Repository: NVIDIA/cuopt
Length of output: 88
Refactor the float test gate to check only PDLP instantiation.
The guard #if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT at lines 50 and 1930 is overly permissive. PDLP float tests will compile if either MIP or PDLP is instantiated for float. This can accidentally enable PDLP float tests in builds where only MIP float is instantiated. Change to #if PDLP_INSTANTIATE_FLOAT to ensure tests only compile when PDLP float instantiation is actually enabled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/linear_programming/pdlp_test.cu` at line 23, Update the
preprocessor guard that gates the PDLP float tests: replace the overly-broad
conditional that uses MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT with a
check that only uses PDLP_INSTANTIATE_FLOAT so PDLP float tests compile only
when PDLP is instantiated for float; locate the conditional in pdlp_test.cu (the
lines surrounding the existing `#if` MIP_INSTANTIATE_FLOAT ||
PDLP_INSTANTIATE_FLOAT) and change it to `#if` PDLP_INSTANTIATE_FLOAT, leaving the
rest of the test code unchanged.
In both cases the pdlp float32 tests should compile and run
@Kh4ster, thank you for the clarification! You're absolutely right—the guard #if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT is correct and intentional. The PDLP float32 tests should indeed compile and run when either macro is defined, ensuring broader test coverage across different instantiation configurations. I've noted this for future reviews.
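The intended behavior — float tests compiling when either macro is defined — is just the OR'd guard. A minimal, self-contained illustration (dummy macro values; the real macros come from the build configuration):

```cpp
#include <cassert>

// Dummy values for illustration: a build where only MIP is instantiated
// for float. With the OR'd guard the float tests still compile.
#define MIP_INSTANTIATE_FLOAT 1
#define PDLP_INSTANTIATE_FLOAT 0

#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT
constexpr bool float_tests_compiled = true;
#else
constexpr bool float_tests_compiled = false;
#endif
```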
✏️ Learnings added
Learnt from: Kh4ster
Repo: NVIDIA/cuopt PR: 910
File: cpp/tests/linear_programming/pdlp_test.cu:23-23
Timestamp: 2026-02-27T15:33:38.577Z
Learning: In cpp/tests/linear_programming/pdlp_test.cu and similar PDLP test files, the preprocessor guard `#if MIP_INSTANTIATE_FLOAT || PDLP_INSTANTIATE_FLOAT` for float tests is intentional. PDLP float32 tests should compile and run when either MIP_INSTANTIATE_FLOAT or PDLP_INSTANTIATE_FLOAT is defined, not just when PDLP_INSTANTIATE_FLOAT is defined.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 721
File: cpp/src/linear_programming/optimization_problem.cu:106-205
Timestamp: 2026-01-29T11:25:28.213Z
Learning: The function `csr_matrices_equivalent_with_permutation` in `cpp/src/linear_programming/optimization_problem.cu` is only used in CI testing with known inputs, so strict problem size validation is not required.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 899
File: cpp/src/mip_heuristics/solve.cu:169-193
Timestamp: 2026-02-24T14:37:15.024Z
Learning: In cuOpt's MIP solver (cpp/src/mip_heuristics/solve.cu and related files), the objective_scaling_factor sign encodes the user problem's objective sense: if >= 0, the user problem is minimization; if < 0, the user problem is maximization (which is internally converted to minimization by negating the objective). Therefore, checking the sign of objective_scaling_factor is a valid way to determine the problem direction and set appropriate "no bound" values for callbacks.
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Learnt from: tmckayus
Repo: NVIDIA/cuopt PR: 819
File: cpp/src/linear_programming/solve.cu:1338-1366
Timestamp: 2026-02-02T17:16:54.629Z
Learning: In `cpp/src/linear_programming/solve.cu`, the remote vs. local solve routing intentionally checks only `is_remote_execution_enabled()` without checking `force_gpu_memory()` or backend type. This design supports remote solving of GPU-resident problems, enabling workflows where users manipulate problems/solutions on a local GPU but execute the solve on a remote machine with a better GPU.
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
|
/ok to test 2cb9ce5 |
This PR adds support for FP32 and mixed precision in PDLP (not MIP, Dual Simplex or Barrier).
These two new options are exposed through new solver settings and CLI flags. What each feature enables is detailed below:
FP32 Precision Mode
By default, PDLP operates in FP64 (double) precision. Users can switch to FP32 (float) precision for the entire solve. FP32 uses half the memory of FP64 and makes PDHG iterations on average twice as fast, but it may require more iterations to converge due to reduced numerical accuracy. FP32 mode is only supported with the PDLP method (not concurrent) and without crossover.
Note: The default precision is FP64 (double).
Mixed Precision SpMV
When running PDLP in FP64 mode, users can enable mixed precision sparse matrix-vector products (SpMV) during PDHG iterations. In this mode, the constraint matrix and its transpose are stored in FP32 while vectors and the compute type remain in FP64. This allows SpMV operations to be faster thanks to reduced memory bandwidth requirements, while maintaining FP64 accuracy in the accumulation. This will make PDHG iterations faster while limiting the potential negative impact on convergence (compared to running in FP32 mode). Convergence checking and restart logic always use the full FP64 matrix, so this mode does not reduce memory usage since both the FP32 and FP64 copies of the matrix are kept in memory. Mixed precision SpMV only applies in FP64 mode and has no effect when running in FP32.
Note: The default value is false.
Summary by CodeRabbit
New Features
Compatibility & Notes
Documentation