#tensorflow
Question:
I am training a GAN with tensorpack. After training finishes, here is the log file:
2019-04-19 10:14:19.311373: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-19 10:14:19.374343: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-19 10:14:19.374547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.759
pciBusID: 0000:01:00.0
totalMemory: 2.94GiB freeMemory: 2.17GiB
2019-04-19 10:14:19.374562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 3GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
[0419 10:14:20 @base.py:211] Initializing the session ...
[0419 10:14:20 @base.py:218] Graph Finalized.
[0419 10:14:20 @concurrency.py:37] Starting EnqueueThread QueueInput/input_queue ...
[0419 10:14:20 @base.py:250] Start Epoch 1
...
[0419 10:25:45 @base.py:250] Start Epoch 5
100%|#######################################|10000/10000[02:51<00:00,58.38it/s]
[0419 10:28:36 @base.py:260] Epoch 5 (global_step 50000) finished, time:2 minutes 51 seconds.
and here is the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 59%   67C    P2    74W / 120W |    965MiB /  3010MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1371      G   /usr/lib/xorg/Xorg                           313MiB |
|    0      2293      G   compiz                                       192MiB |
|    0      2823      G   ...uest-channel-token=17545882067829269512   131MiB |
|    0      7632      G   ...-token=7C806614AA650E661A2E8895D83D4B4E    41MiB |
|    0      7824      G   /opt/teamviewer/tv_bin/TeamViewer             13MiB |
|    0     28413      C   python3                                      269MiB |
+-----------------------------------------------------------------------------+
As you can see, training runs on the GPU, and an epoch takes only a few minutes to finish.
However, after training completes, the checkpoint is restored, and I found that it is restored on the CPU, not the GPU.
Here are the log file and nvidia-smi output for that stage:
2019-04-19 10:28:55.169308: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-19 10:28:55.236758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-19 10:28:55.236987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1060 3GB major: 6 minor: 1 memoryClockRate(GHz): 1.759
pciBusID: 0000:01:00.0
totalMemory: 2.94GiB freeMemory: 2.20GiB
2019-04-19 10:28:55.237005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 3GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
7%|##7 |700/10000[14:36<3:14:43, 0.80it/s][0419 10:28:55 @sessinit.py:117] Restoring checkpoint from train_log/TGAN_synthesizer:ISOT-1/model-50000 ...
16%|#####9 |1574/10000[31:59<2:51:25, 0.82it/s] 16%|######1 |1606/10000[32:37<2:47:25, 0.84it/s]
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 41%   42C    P8     7W / 120W |    665MiB /  3010MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1371      G   /usr/lib/xorg/Xorg                           313MiB |
|    0      2293      G   compiz                                       196MiB |
|    0      2823      G   ...uest-channel-token=17545882067829269512    97MiB |
|    0      7632      G   ...-token=7C806614AA650E661A2E8895D83D4B4E    41MiB |
|    0      7824      G   /opt/teamviewer/tv_bin/TeamViewer             13MiB |
+-----------------------------------------------------------------------------+
Why is the checkpoint restored on the CPU instead of the GPU? Restoring the checkpoint on the CPU takes a very long time. How can I configure checkpoint restoration to run on the GPU?
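For reference, below is a minimal sketch using the plain TF 1.x API (this is not the original TGAN/tensorpack code, which is not included here). It restores the checkpoint from the path printed in the log with device-placement logging enabled, so one can see whether the restored ops actually land on /device:GPU:0 or on the CPU; the variable names and the use of import_meta_graph are illustrative assumptions.

# A minimal sketch, assuming the standard TF 1.x API (NOT the original
# TGAN/tensorpack code). It restores the checkpoint from the path shown in
# the log while logging where every op is placed.
import tensorflow as tf

ckpt_path = "train_log/TGAN_synthesizer:ISOT-1/model-50000"  # path from the log above

# Import the saved graph; clear_devices=True drops any device strings baked
# into the meta graph, so placement is decided fresh on this machine.
saver = tf.train.import_meta_graph(ckpt_path + ".meta", clear_devices=True)

config = tf.ConfigProto(
    allow_soft_placement=True,   # fall back to CPU only if an op has no GPU kernel
    log_device_placement=True,   # print the device chosen for every op
)
config.gpu_options.allow_growth = True  # don't grab the whole 3 GB card at once

with tf.Session(config=config) as sess:
    saver.restore(sess, ckpt_path)
    # Run the sampling/inference ops here and check the placement log
    # for ops that ended up on /device:CPU:0.

If the placement log shows the heavy ops on CPU:0, the inference graph was likely built without a visible GPU or explicitly pinned to the CPU, rather than the restore itself being the problem.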
Comments:
1. Include your code snippet so it can be analyzed in detail.
2. Hmm, a very good question, but I don't know why.