Conversation
…Error Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a pandas ≥2.1 IndexError crash that occurred when running pyprophet export matrix at any summarization level. The root cause was using .iloc[] with a Series of index labels (returned by groupby().apply(lambda x: x[col].idxmin())), which was silently accepted in older pandas but explicitly rejected in ≥2.1. The fix replaces this with the equivalent and more efficient .loc[groupby()[col].idxmin()].
Changes:
- Replace
.iloc[groupby().apply(lambda x: x["m_score"].idxmin())]with.loc[groupby()["m_score"].idxmin()]in_summarize_precursor_level - Apply the same fix in
_summarize_peptide_level
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin() | ||
| data = data.loc[idx] |
There was a problem hiding this comment.
The same IndexError: DataFrame indexer is not allowed for .iloc bug that this PR fixes in pyprophet/io/_base.py also exists in pyprophet/export/export_compound.py (lines 179-183) and pyprophet/glyco/export.py (lines 501-505). Both files still use the broken pattern:
data = data.iloc[
data.groupby(["run_id", "transition_group_id"]).apply(
lambda x: x["m_score"].idxmin()
)
]
These should be updated to use the same fix applied here:
idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
data = data.loc[idx]
These will crash with pandas ≥2.1 exactly as described in the issue.
| idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin() | ||
| data = data.loc[idx] |
There was a problem hiding this comment.
There are no tests for export matrix (the export_quant_matrix path via _summarize_precursor_level, _summarize_peptide_level, _summarize_protein_level, and _summarize_gene_level). The existing test file tests/test_pyprophet_export.py covers many other export paths. Adding at least one integration test that exercises pyprophet export matrix --level=precursor (and ideally --level=peptide) would prevent regressions like this from going undetected.
pyprophet export matrixcrashes withIndexError: DataFrame indexer is not allowed for .ilocon pandas ≥2.1, affecting--level=peptide,--level=protein, and--level=precursorexports.Root Cause
groupby().apply(lambda x: x["m_score"].idxmin())returns aSeriesof index labels, not integer positions. Passing this to.iloc[]was silently accepted in older pandas but is explicitly rejected in ≥2.1.Changes
pyprophet/io/_base.py— two instances fixed (_summarize_precursor_leveland_summarize_peptide_level): replace.iloc[groupby().apply(lambda x: x[...].idxmin())]with the correct and more efficient.loc[groupby()[...].idxmin()]Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.