Fix pandas IndexError in export matrix summarization (pandas ≥2.1)#194

Merged
singjc merged 2 commits into master from
copilot/fix-11459058-63005398-96111a0d-2f3d-4c1b-9d46-73063a5fa252
Mar 3, 2026

Copilot AI commented Mar 3, 2026

pyprophet export matrix crashes with IndexError: DataFrame indexer is not allowed for .iloc on pandas ≥2.1, affecting --level=peptide, --level=protein, and --level=precursor exports.

Root Cause

groupby().apply(lambda x: x["m_score"].idxmin()) returns a Series of index labels, not integer positions. Passing this to .iloc[] was silently accepted in older pandas but is explicitly rejected in ≥2.1.

Changes

  • pyprophet/io/_base.py — two instances fixed (_summarize_precursor_level and _summarize_peptide_level): replace .iloc[groupby().apply(lambda x: x[...].idxmin())] with the correct and more efficient .loc[groupby()[...].idxmin()]
# Before (broken in pandas ≥2.1):
data = data.iloc[
    data.groupby(["run_id", "transition_group_id"]).apply(
        lambda x: x["m_score"].idxmin()
    )
]

# After:
idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
data = data.loc[idx]
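
The label-versus-position distinction described under Root Cause can be checked on a toy frame. This is a minimal sketch, not code from pyprophet: the column names follow the snippet above, while the values and the non-default index are made up so that labels and positions visibly differ.

```python
import pandas as pd

# Toy data (assumed values). The index starts at 10 so that index labels
# and integer positions are clearly different things.
data = pd.DataFrame(
    {
        "run_id": [1, 1, 2, 2],
        "transition_group_id": ["a", "a", "b", "b"],
        "m_score": [0.05, 0.01, 0.20, 0.03],
    },
    index=[10, 11, 12, 13],
)

# idxmin() returns the index *label* of the lowest m_score in each group.
idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
print(idx.tolist())

# Labels belong with .loc, which looks rows up by label.
best = data.loc[idx]
print(best)

# Treating the same labels as positions fails here: the frame has only
# four rows, so positions 11 and 13 do not exist.
try:
    data.iloc[idx.to_numpy()]
except IndexError as exc:
    print(f"iloc with labels failed: {exc}")
```

With a default `RangeIndex`, labels and positions happen to coincide, which is presumably why the old pattern produced the expected rows when it was still accepted.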
Original prompt

This section details the original issue you should resolve

<issue_title>export matrix: pandas IndexError in _summarize_peptide_level (v3.0.7)</issue_title>
<issue_description>## Description

pyprophet export matrix crashes with a pandas IndexError in _summarize_peptide_level (file pyprophet/io/_base.py, line ~804). This affects both --level=peptide and --level=protein exports.

Environment

  • pyprophet 3.0.7
  • pandas 2.1.4 (also reproducible with 2.3.3)
  • Python 3.10.19
  • Linux

Error

File "pyprophet/io/_base.py", line 804, in _summarize_peptide_level
    data = data.iloc[
        data.groupby(["run_id", "transition_group_id"]).apply(
            lambda x: x["m_score"].idxmin()
        )
    ]
IndexError: DataFrame indexer is not allowed for .iloc
Consider using .loc for automatic alignment.

Root cause

groupby().apply() returns a Series object. In recent pandas versions (>=2.1), passing a Series to .iloc[] is no longer allowed. The fix is to use .loc[] with idxmin() directly:

# Before (broken):
data = data.iloc[
    data.groupby(["run_id", "transition_group_id"]).apply(
        lambda x: x["m_score"].idxmin()
    )
]

# After (fixed):
idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
data = data.loc[idx]

Note: there is a second instance of the same .iloc pattern in the same function (for _summarize_protein_level) that should be fixed as well.

Steps to reproduce

  1. Run pyprophet score on an .osw file
  2. Run pyprophet export matrix --level=peptide

The crash occurs in _summarize_peptide_level regardless of the data content.</issue_description>

<agent_instructions>Can you look into this issue.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

…Error

Co-authored-by: singjc <32938975+singjc@users.noreply.github.com>
Copilot AI changed the title from "[WIP] export matrix: pandas IndexError in _summarize_peptide_level (v3.0.7)" to "Fix pandas IndexError in export matrix summarization (pandas ≥2.1)" Mar 3, 2026
@singjc singjc marked this pull request as ready for review March 3, 2026 16:03
Copilot AI review requested due to automatic review settings March 3, 2026 16:04
Copilot AI left a comment

Pull request overview

This PR fixes a pandas ≥2.1 IndexError crash that occurred when running pyprophet export matrix at any summarization level. The root cause was using .iloc[] with a Series of index labels (returned by groupby().apply(lambda x: x[col].idxmin())), which was silently accepted in older pandas but explicitly rejected in ≥2.1. The fix replaces this with the equivalent and more efficient .loc[groupby()[col].idxmin()].

Changes:

  • Replace .iloc[groupby().apply(lambda x: x["m_score"].idxmin())] with .loc[groupby()["m_score"].idxmin()] in _summarize_precursor_level
  • Apply the same fix in _summarize_peptide_level

Comment on lines +777 to +778
idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
data = data.loc[idx]

Copilot AI Mar 3, 2026

The same IndexError: DataFrame indexer is not allowed for .iloc bug that this PR fixes in pyprophet/io/_base.py also exists in pyprophet/export/export_compound.py (lines 179-183) and pyprophet/glyco/export.py (lines 501-505). Both files still use the broken pattern:

data = data.iloc[
    data.groupby(["run_id", "transition_group_id"]).apply(
        lambda x: x["m_score"].idxmin()
    )
]

These should be updated to use the same fix applied here:

idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
data = data.loc[idx]

These will crash with pandas ≥2.1 exactly as described in the issue.
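
One way to look for further occurrences of this pattern across a source tree is a small text scan. The sketch below is an assumption-laden illustration, not part of pyprophet: `find_iloc_groupby` and `scan_tree` are hypothetical helper names, and the regular expression only matches the specific shape shown above (`.iloc[` followed, possibly on the next line, by `<name>.groupby(`).

```python
import re
from pathlib import Path

# Matches ".iloc[" followed (across optional whitespace or newlines)
# by "<name>.groupby(". This is a heuristic for this exact shape only.
PATTERN = re.compile(r"\.iloc\[\s*[\w.]+\.groupby\(")


def find_iloc_groupby(source: str) -> list[int]:
    """Return 1-based line numbers where the suspicious pattern starts."""
    return [source.count("\n", 0, m.start()) + 1 for m in PATTERN.finditer(source)]


def scan_tree(root: str) -> dict[str, list[int]]:
    """Scan every .py file under root and collect the matching line numbers."""
    hits: dict[str, list[int]] = {}
    for path in Path(root).rglob("*.py"):
        lines = find_iloc_groupby(path.read_text(encoding="utf-8", errors="ignore"))
        if lines:
            hits[str(path)] = lines
    return hits


broken = (
    "import pandas as pd\n"
    "data = data.iloc[\n"
    '    data.groupby(["run_id", "transition_group_id"]).apply(\n'
    '        lambda x: x["m_score"].idxmin()\n'
    "    )\n"
    "]\n"
)
fixed = (
    'idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()\n'
    "data = data.loc[idx]\n"
)
print(find_iloc_groupby(broken))
print(find_iloc_groupby(fixed))
```

Calling `scan_tree("pyprophet")` from a checkout would list candidate files. A text match like this can miss reformatted variants, so any hit still needs a manual look.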

Comment on lines +777 to +778
idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
data = data.loc[idx]

Copilot AI Mar 3, 2026

There are no tests for export matrix (the export_quant_matrix path via _summarize_precursor_level, _summarize_peptide_level, _summarize_protein_level, and _summarize_gene_level). The existing test file tests/test_pyprophet_export.py covers many other export paths. Adding at least one integration test that exercises pyprophet export matrix --level=precursor (and ideally --level=peptide) would prevent regressions like this from going undetected.
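
Short of a full CLI integration test, the selection step itself can be pinned down with a unit-level check. This is only a sketch under stated assumptions: `select_best_per_group` is a hypothetical stand-in that mirrors the two fixed lines, not a function in pyprophet, and the fixture values are invented.

```python
import pandas as pd


def select_best_per_group(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the fixed selection step (not pyprophet API)."""
    idx = data.groupby(["run_id", "transition_group_id"])["m_score"].idxmin()
    return data.loc[idx]


def test_select_best_per_group_keeps_lowest_m_score():
    # Non-default index so that a label/position mix-up cannot pass by accident.
    data = pd.DataFrame(
        {
            "run_id": [1, 1, 1, 2, 2],
            "transition_group_id": ["a", "a", "b", "a", "a"],
            "m_score": [0.3, 0.1, 0.2, 0.05, 0.5],
        },
        index=[100, 101, 102, 103, 104],
    )

    result = select_best_per_group(data)

    # One row per (run_id, transition_group_id) group, each with the lowest m_score.
    assert result.index.tolist() == [101, 102, 103]
    assert result["m_score"].tolist() == [0.1, 0.2, 0.05]
    assert not result.duplicated(["run_id", "transition_group_id"]).any()


test_select_best_per_group_keeps_lowest_m_score()
```

A test of this shape runs under pytest without an .osw fixture. It does not replace the suggested integration test, which would also exercise the surrounding export code.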

@singjc singjc enabled auto-merge March 3, 2026 16:41
@singjc singjc disabled auto-merge March 3, 2026 16:41
@singjc singjc merged commit 9526aec into master Mar 3, 2026
6 of 9 checks passed
@singjc singjc deleted the copilot/fix-11459058-63005398-96111a0d-2f3d-4c1b-9d46-73063a5fa252 branch March 3, 2026 16:41
Development

Successfully merging this pull request may close these issues.

export matrix: pandas IndexError in _summarize_peptide_level (v3.0.7)

3 participants