drop support for compute capability <= 7.0 for newer cuDNN versions #170
bedroge wants to merge 1 commit into EESSI:main
Conversation
Ultimately we could make the same kind of lookup table as for CUDA. I initially started working on it, but it's a lot of work, and as mentioned, it's not really clear what is and isn't supported. We could also consider a simpler lookup table with just the min and max supported CCs per X.Y.Z version? But then again, https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html says that 12.1 is not supported, while the binaries do seem to indicate that it is supported, so it's very confusing and unclear...
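To make the "min+max per version" idea concrete, here is a minimal sketch of what such a simplified lookup table could look like. The version keys and CC ranges below are placeholders, not values taken from NVIDIA's support matrix, and the function name is my own:

```python
# Hypothetical sketch of a simplified per-version lookup table for cuDNN,
# mapping an X.Y version prefix to a (min, max) range of supported compute
# capabilities. All values here are PLACEHOLDERS, not verified data.
CUDNN_SUPPORTED_CC_RANGE = {
    '9.1': (5.0, 9.0),   # placeholder range
    '9.5': (7.5, 12.0),  # placeholder range
}


def cudnn_supports_cc(cudnn_version, cc):
    """Return True if compute capability `cc` falls within the supported
    range for the given cuDNN version. Unknown versions are treated as
    supported, so an incomplete table never wrongly rejects a build."""
    prefix = '.'.join(cudnn_version.split('.')[:2])
    if prefix not in CUDNN_SUPPORTED_CC_RANGE:
        return True
    lo, hi = CUDNN_SUPPORTED_CC_RANGE[prefix]
    return lo <= float(cc) <= hi
```

Being permissive for unknown versions keeps the maintenance burden low: the table only needs an entry when a version is known to drop or add CC support, and the sanity check still acts as a backstop.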
cuda_ccs_string = re.sub(r'[a-zA-Z]', '', cuda_ccs_string).replace(',', '_')
# Also replace periods, those are not officially supported in environment variable names
var = f"EESSI_IGNORE_CUDNN_{cudnn_ver}_CC_{cuda_ccs_string}".replace('.', '_')
errmsg = f"EasyConfigs using cuDNN {cudnn_ver} or older are not supported for (all) requested Compute "
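For illustration, running the variable-name construction from the diff on a concrete (hypothetical) cuDNN version and CC list shows what the resulting override variable looks like:

```python
import re

# Example inputs (hypothetical, for illustration only)
cudnn_ver = '9.5.0'
cuda_ccs_string = '7.0,8.0a'  # e.g. a requested list of compute capabilities

# Strip architecture suffix letters (e.g. '8.0a' -> '8.0') and turn
# commas into underscores
cuda_ccs_string = re.sub(r'[a-zA-Z]', '', cuda_ccs_string).replace(',', '_')
# Also replace periods, as those are not officially supported in
# environment variable names
var = f"EESSI_IGNORE_CUDNN_{cudnn_ver}_CC_{cuda_ccs_string}".replace('.', '_')
print(var)  # EESSI_IGNORE_CUDNN_9_5_0_CC_7_0_8_0
```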
I think this is wrong: in your case the cuDNN is too new, not too old, right?
My 2 cents:
I just feel like a lookup table is a lot of work to set up and to maintain, while (according to the docs) the supported CCs don't change that often. Also, wouldn't the sanity check still catch unsupported CCs, as it did for CC 7.0 in EESSI/software-layer#1410? So whenever we run into this, we can mark those as unsupported in the hooks (and, if necessary, change the if statement to something else if there are going to be too many combinations)?
Hm, I don't think it's too bad to maintain, but admittedly it may be easier for CUDA than for cuDNN, since we can just query the list from …

The fact that it doesn't do so for CC 11.0 may be a minor detail, since the CUDA sanity check will then indeed report that this is also invalid. The only downside of not including that case (and maybe also an upper limit) right away is that when sites install this with …

Anyway, I'm also OK with leaving that out for now. If you can have a look at my (minor) review comment, I'll see if I can test the PR locally, and merge it if it works as expected.
This one is a little bit trickier than CUDA itself, as the list of supported compute capabilities in the docs (https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html) doesn't really match what running `cuobjdump` on the binaries shows. Also, there seem to be some gaps in the matrix, and I wonder if that's really correct.

So for now I've chosen an easier approach by just checking if we're building with a newer cuDNN and compute capability <= 7.0, and in that case I do the same thing as what @casparvl implemented for CUDA. In order to check if cuDNN is used as a dependency, I've generalized Caspar's `get_cuda_version` into a `get_dependency_software_version` function.

Tested this locally with EESSI-extend and the cuDNN from EESSI/software-layer#1410 on a V100 (CC 7.0) and an RTX PRO 6000 (CC 12.0f), and got the expected result: on the RTX PRO 6000 I get a full cuDNN installation, while for the V100 I get the following output during the build:
and a module file that has:
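The check described in this PR, skipping the full installation when a newer cuDNN is combined with a compute capability <= 7.0, could be sketched roughly as below. The function names, the threshold version, and the overall structure are my own illustration, not the actual hook code:

```python
import re


def _version_tuple(ver):
    """Turn a version string like '9.5.0' into (9, 5, 0) for comparison."""
    return tuple(int(p) for p in ver.split('.'))


def should_skip_full_install(cudnn_version, compute_capabilities,
                             threshold='9.5.0'):
    """Return True if this build combines a cuDNN at or beyond `threshold`
    (a placeholder version, NOT the value used in the PR) with any compute
    capability <= 7.0, in which case only a stub would be installed."""
    if cudnn_version is None:
        # cuDNN is not a dependency of this easyconfig
        return False
    if _version_tuple(cudnn_version) < _version_tuple(threshold):
        return False
    # Strip architecture suffix letters such as the 'f' in '12.0f'
    return any(float(re.sub(r'[a-zA-Z]', '', cc)) <= 7.0
               for cc in compute_capabilities)
```

With this sketch, a V100 build (`['7.0']`) with a new enough cuDNN would be skipped, while an RTX PRO 6000 build (`['12.0f']`) would proceed to a full installation, matching the behaviour described above.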