How does one pip install torch 1.9.x with CUDA 11.1 when pip fails with a MemoryError?

#python #pip #pytorch

Question:

I tried to install torch 1.9.x with pip3, but got this error:

 (metalearning_gpu) miranda9~/automl-meta-learning $ pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html


Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.9.0+cu111
ERROR: Exception:
Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
    status = self.run(options, args)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
    return func(self, options, args)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 315, in run
    requirement_set = resolver.resolve(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
    if not criterion.candidates:
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
    return any(self)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
    candidate = func()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 204, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 295, in __init__
    super().__init__(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
    self.dist = self._prepare()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
    dist = self._prepare_distribution()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 305, in _prepare_distribution
    return self._factory.preparer.prepare_linked_requirement(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 550, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 239, in unpack_url
    file = get_http_url(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/operations/prepare.py", line 102, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/download.py", line 132, in __call__
    resp = _http_get_download(self._session, link)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/download.py", line 115, in _http_get_download
    resp = session.get(target_url, headers=HEADERS, stream=True)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/session.py", line 454, in request
    return super().request(method, url, *args, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/cachecontrol/adapter.py", line 44, in send
    cached_response = self.controller.cached_request(request)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_vendor/cachecontrol/controller.py", line 139, in cached_request
    cache_data = self.cache.get(cache_url)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/pip/_internal/network/cache.py", line 54, in get
    return f.read()
MemoryError
 

It says there was a memory error, but I don't understand where or how. How do I start debugging this?

Nothing about my home directory looks like a problem:

 (metalearning_gpu) miranda9~/automl-meta-learning $ du -hs ~
150G    /home/miranda9
 

Answer #1:

I'm not sure why, but once I got a node to run the installation on (one with a GPU), the installation worked... is this normal?!

 (synthesis) miranda9~/automl-meta-learning $ condor_submit -i interactive.sub 
Submitting job(s).
1 job(s) submitted to cluster 17192.






Could not find conda environment: metalearning_cpu
You can list all discoverable environments with `conda info --envs`.

Welcome to slot1_1@vision-02.cs.illinois.edu!
Could not find conda environment: metalearning_cpu
You can list all discoverable environments with `conda info --envs`.

(synthesis) miranda9~/automl-meta-learning $ conda activate metalearning_gpu
(metalearning_gpu) miranda9~/automl-meta-learning $ pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.9.0+cu111
  Using cached https://download.pytorch.org/whl/cu111/torch-1.9.0+cu111-cp39-cp39-linux_x86_64.whl (2041.4 MB)
Collecting torchvision==0.10.0+cu111
  Using cached https://download.pytorch.org/whl/cu111/torchvision-0.10.0+cu111-cp39-cp39-linux_x86_64.whl (23.1 MB)
Collecting torchaudio==0.9.0
  Using cached torchaudio-0.9.0-cp39-cp39-manylinux1_x86_64.whl (1.9 MB)
Collecting typing-extensions
  Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Collecting numpy
  Using cached numpy-1.21.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.8 MB)
Collecting pillow>=5.3.0
  Using cached Pillow-8.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Installing collected packages: typing-extensions, torch, pillow, numpy, torchvision, torchaudio
Successfully installed numpy-1.21.2 pillow-8.3.2 torch-1.9.0+cu111 torchaudio-0.9.0 torchvision-0.10.0+cu111 typing-extensions-3.10.0.2
(metalearning_gpu) miranda9~/automl-meta-learning $ 
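
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the traceback ends in pip's HTTP cache calling f.read(), which loads the whole cached file into memory, and the torch wheel is about 2 GB. If the login node caps per-process memory and the compute node does not, that would explain the difference. A minimal sketch of checks to run on each node follows; the function name check_node_memory is made up for illustration.

```shell
# Hypothetical helper: print memory limits and pip cache details for this node.
check_node_memory() {
    # Per-process virtual memory cap for this shell, in KiB ("unlimited" = no cap)
    echo "virtual memory limit: $(ulimit -v)"

    # Total and available RAM on this node (free is Linux-specific)
    if command -v free >/dev/null 2>&1; then
        free -h
    fi

    # Location and size of pip's cache (the "pip cache" subcommand needs pip 20.1+)
    if command -v pip3 >/dev/null 2>&1; then
        pip3 cache dir || echo "pip cache subcommand unavailable"
        pip3 cache info || echo "pip cache subcommand unavailable"
    fi
}
```

Running check_node_memory on the login node and again inside the interactive job should show whether the memory limit differs between the two. Note that du -hs ~ reports disk usage, not RAM, so it cannot reveal this kind of limit.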
 

You might want to try this command instead:

 pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
 

Comments:

1. I don't know why, but this works and it just saved me a whole day! Thanks a lot for sharing the answer, @Charlie Parker

Answer #2:

You can try adding --no-cache-dir at the end to avoid problems with RAM.
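
Applied to the command from the question, that could look like the sketch below. The versions are the ones from the question; the install_torch wrapper and the DRY_RUN variable are made up for illustration and are not pip features. With --no-cache-dir pip skips its cache entirely, so it never reaches the cache read where the traceback ends, at the cost of downloading the wheel again.

```shell
# Hypothetical wrapper around the install command from the question.
# DRY_RUN=1 only prints the command instead of running it.
install_torch() {
    set -- pip3 install \
        "torch==1.9.0+cu111" "torchvision==0.10.0+cu111" "torchaudio==0.9.0" \
        -f https://download.pytorch.org/whl/torch_stable.html \
        --no-cache-dir
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"   # preview the exact command line
    else
        "$@"        # run pip with the cache disabled
    fi
}
```

Calling install_torch runs the installation; setting DRY_RUN=1 first prints the command so it can be checked or copied.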