#python #pytorch #gpu #nvidia
Question:
The details are as follows —
Output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:42:00.0 Off |                    0 |
| 23%   35C    P0    65W / 235W |      0MiB / 11441MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Exception raised —
Traceback (most recent call last):
File "MetaWSD/train_wsd.py", line 108, in <module>
meta_learner.training(train_episodes, val_episodes)
File "/mnt/lustre/users/k21036268/wsd/MetaWSD/models/maml.py", line 121, in training
ls, acc, prec, rcl, f1 = self.meta_model(epts, self.updates)
File "/mnt/lustre/users/k21036268/wsd/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/lustre/users/k21036268/wsd/MetaWSD/models/seq_meta.py", line 109, in forward
self.initialize_output_layer(episode.n_classes)
File "/mnt/lustre/users/k21036268/wsd/MetaWSD/models/seq_meta.py", line 231, in initialize_output_layer
device=self.device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Output of env_collection.py
Collecting environment information...
PyTorch version: 1.4.0
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.6.1810 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.3.4
Python version: 3.6.8 (default, Apr 2 2020, 13:34:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] (64-bit runtime)
Python platform: Linux-3.10.0-1062.9.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K40c
Nvidia driver version: 418.40.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] numpydoc==1.1.0
[pip3] pytorch-pretrained-bert==0.6.2
[pip3] pytorch-transformers==1.1.0
[pip3] torch==1.4.0
[pip3] torchtext==0.6.0
[pip3] torchvision==0.5.0
[conda] Could not collect
I have already tried setting export TORCH_CUDA_ARCH_LIST=All
before installing PyTorch, but I get the same exception. Since I am working on a cluster, I cannot build PyTorch from source (gcc 4.8.5 is not a supported version for a manual build).
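For reference, here is a minimal sketch of how one could check whether the installed PyTorch binary actually ships kernels for the local GPU. It assumes a CUDA-capable setup; note that `torch.cuda.get_arch_list` only exists in newer PyTorch releases (it is not in 1.4), so the call is guarded with `getattr`. The Tesla K40c reports compute capability 3.5, which the discussion linked below says the pre-built wheels no longer target.

```python
import torch

def gpu_kernel_support(device_index=0):
    """Return True/False if we can tell whether the installed PyTorch
    binary ships kernels for the local GPU, or None if that cannot be
    determined (no GPU, or the API is missing in this PyTorch version)."""
    if not torch.cuda.is_available():
        return None
    # e.g. (3, 5) for a Tesla K40c
    major, minor = torch.cuda.get_device_capability(device_index)
    # torch.cuda.get_arch_list() is only available in newer releases
    arch_list = getattr(torch.cuda, "get_arch_list", lambda: [])()
    if not arch_list:
        return None
    return f"sm_{major}{minor}" in arch_list

print(gpu_kernel_support())
```

If this returns False, the "no kernel image is available" error is expected: the wheel was simply not compiled for the card's architecture, and TORCH_CUDA_ARCH_LIST has no effect on pre-built binaries (it only matters when building from source).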
Comments:
1. discuss.pytorch.org/t/k40-is-not-supported-by-pytorch/80356