Conversation
…ensorFlow-v2-18-1
|
Looks good to me. Is the same fix needed for the CUDA version? Anyhow, you have to trigger the build for one architecture, I guess it only applies to a single EESSI version? Ah, yeah #171 notes this. Maybe good to mention this in the title or description too. |
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2 |
|
New job on instance
|
We should pause modifications to the hooks file as soon as we start building TensorFlow, that could take some time and we do not want to rebuild successful builds |
boegel
left a comment
There was a problem hiding this comment.
I could be wrong, but it seems like at least some part of this could/should be done in the TensorFlow easyblock or the easyconfig file being used, especially since at first sight they don't seem specific to EESSI at all?
| ec['preconfigopts'] = ec.get('preconfigopts', '') + ( | ||
| 'export GCC_HOST_COMPILER_PATH=$EBROOTGCC/bin/gcc && ' | ||
| 'sed -i \'s|--define=PREFIX=/usr|--define=PREFIX=\\$EESSI_EPREFIX|g\' .bazelrc && ' | ||
| 'cat > /tmp/fix_h5py.py << \'EOF\'\n' |
There was a problem hiding this comment.
I don't like the use of a hardcoded path in a fixed location...
Can we at least use /tmp/$USER/, or even better mktemp -d ?
Some comments in this hook to explain exactly what all of this does and why it's required would be helpful I think, since it looks quite involved...
| 'with open("requirements_lock_3_12.txt", "r") as f:\n' | ||
| ' content = f.read()\n' | ||
| 'content = content.replace("h5py==3.11.0 \\\\", "h5py==3.15.1 \\\\")\n' | ||
| 'content = content.replace(\n' |
There was a problem hiding this comment.
Can we use re.sub here instead, especially below to avoid the long repetition.
Something like this (totally untested):
regex = re.compile("(--hash=sha256:f4e025e852754ca833401777c25888acb96889ee2c27e7e629a19aee288833f0)", re.M)
extra_hashes = "...'
content = regex.sub(extra_hashes, content)| if get_eessi_envvar('EESSI_CPU_FAMILY') == 'aarch64': | ||
| ec['prebuildopts'] = ec.get('prebuildopts', '') + ( | ||
| # KleidiAI in TF 2.18 has similar -march/-mcpu conflict as XNNPACK | ||
| # The easyblock already excludes XNNPACK from -mcpu=native, extend the same exclusion to KleidiAI |
There was a problem hiding this comment.
Isn't this something that should be fixed in the TensorFlow easyblock, or in the easyconfig file for TensorFlow 2.18.1?
Now that I think of it, same question w.r.t. the hash stuff above...
Would that be a lot cleaner by adding a patch file (seems like that could work, since it's done pre-configure)?
Replacing h5py with a version that provides wheels for both x86 and ARM resolves the initial GLIBC error. This could be applied as a patch, but in this PR it is implemented in a parse hook as a proof of concept.
Results for x86_64 builds:
KleidiAI in TF 2.18 has a similar -march/-mcpu conflict as XNNPACK, the easyblock already excludes XNNPACK from -mcpu=native for aarch64, extending the same exclusion to KleidiAI, this change can also be added to the easyblock.
Results for aarch64 builds:
see #171