Unable to run PyTorch on GPU: "no kernel image is available for execution on the device"

#python #pytorch #gpu #nvidia

Question:

The details are as follows:

Output of nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:42:00.0 Off |                    0 |
| 23%   35C    P0    65W / 235W |      0MiB / 11441MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
 

The exception I get:

 Traceback (most recent call last):
  File "MetaWSD/train_wsd.py", line 108, in <module>
    meta_learner.training(train_episodes, val_episodes)
  File "/mnt/lustre/users/k21036268/wsd/MetaWSD/models/maml.py", line 121, in training
    ls, acc, prec, rcl, f1 = self.meta_model(epts, self.updates)
  File "/mnt/lustre/users/k21036268/wsd/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/users/k21036268/wsd/MetaWSD/models/seq_meta.py", line 109, in forward
    self.initialize_output_layer(episode.n_classes)
  File "/mnt/lustre/users/k21036268/wsd/MetaWSD/models/seq_meta.py", line 231, in initialize_output_layer
    device=self.device) + stdv
RuntimeError: CUDA error: no kernel image is available for execution on the device
 

Output of env_collection.py

 Collecting environment information...
PyTorch version: 1.4.0
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 7.6.1810 (Core)  (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.3.4

Python version: 3.6.8 (default, Apr  2 2020, 13:34:55)  [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] (64-bit runtime)
Python platform: Linux-3.10.0-1062.9.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K40c
Nvidia driver version: 418.40.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] numpydoc==1.1.0
[pip3] pytorch-pretrained-bert==0.6.2
[pip3] pytorch-transformers==1.1.0
[pip3] torch==1.4.0
[pip3] torchtext==0.6.0
[pip3] torchvision==0.5.0
[conda] Could not collect
 

I have already tried setting export TORCH_CUDA_ARCH_LIST=All before installing PyTorch, but I get the same exception. Since I am working on a cluster, I cannot build PyTorch from source (gcc 4.8.5 is not supported for a manual build).
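For context, this error generally means the installed binary contains no GPU code that the device can execute. TORCH_CUDA_ARCH_LIST is read when PyTorch is compiled from source, so setting it before installing a prebuilt wheel would not be expected to change anything. The sketch below is one way to check for such a mismatch. The helper name `can_run` is hypothetical, and it assumes the usual CUDA compatibility rules: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any equal or newer capability. `torch.cuda.get_arch_list()` may not exist in older releases such as 1.4.0, so the script guards for that.

```python
import re

# Matches entries such as "sm_37" or "compute_70" (optional letter suffix).
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)[a-z]?$")


def can_run(capability, arch_list):
    """Return True if any entry in arch_list can execute on a device
    with the given (major, minor) compute capability."""
    major, minor = capability
    for arch in arch_list:
        m = _ARCH_RE.match(arch)
        if m is None:
            continue  # ignore entries this sketch does not understand
        kind, a_major, a_minor = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "sm":
            # Binary code: same major version, minor not newer than the device.
            if a_major == major and a_minor <= minor:
                return True
        elif (a_major, a_minor) <= (major, minor):
            # PTX: can be JIT-compiled for an equal or newer device.
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("CUDA is not available")
        else:
            cap = torch.cuda.get_device_capability(0)
            get_arch_list = getattr(torch.cuda, "get_arch_list", None)
            if get_arch_list is None:
                print("capability:", cap, "(this build cannot report its arch list)")
            else:
                archs = get_arch_list()
                print("capability:", cap, "archs:", archs, "runnable:", can_run(cap, archs))
```

The Tesla K40c has compute capability 3.5. With a made-up arch list such as `["sm_37", "sm_50", "sm_60", "sm_70"]` (an illustration, not the actual contents of any particular wheel), `can_run((3, 5), ...)` returns False, which is the kind of situation that produces this error.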

Comments:

1. discuss.pytorch.org/t/k40-is-not-supported-by-pytorch/80356