# Script runs fine on a different machine
Question:
I have a CNN that I can train on my local desktop computer (in theory, that is, with a batch size of 1 and over a very long time). However, when I run the same code on another machine, I get some strange (?) errors. In fact, earlier versions of the script did run there, and the error points to a function that, as far as I know, I have not changed since the script last ran correctly on the server.
I think the original error responsible for the failure is this one:
Unable to locate the source code of <function load_image_train at 0x14f2f6d0b820>.
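For context, this message is raised when Python's `inspect` module cannot read a function's source file, and AutoGraph relies on it (the traceback below goes through `inspect.findsource`). The following standard-library sketch reproduces only that lookup failure, not the TensorFlow part: a function compiled from a string under the pseudo-filename `<string>` has no file on disk, so `inspect.getsource` raises `OSError`. The helper names `make_function_without_source` and `source_error` are hypothetical.

```python
import inspect


def make_function_without_source():
    """Compile a hypothetical load_image_train from a string, so its code object
    points at the pseudo-file '<string>' instead of a real .py file."""
    namespace = {}
    src = "def load_image_train(x):\n    return x\n"
    exec(compile(src, "<string>", "exec"), namespace)
    return namespace["load_image_train"]


def source_error(fn):
    """Return the OSError message if inspect cannot recover fn's source, else None."""
    try:
        inspect.getsource(fn)
    except OSError as err:
        return str(err)
    return None


if __name__ == "__main__":
    fn = make_function_without_source()
    print(source_error(fn))  # an OSError message: no source file to read
    print(fn(3))             # the function itself still runs normally
```

The function remains callable; only its source is unavailable. That is the situation in which AutoGraph prints "will run it as-is" in the warnings further down.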
The function load_image_train is a training-image loader that I took from this site.
It is defined in the script as follows:
@tf.function
def load_image_train(datapoint: dict) -> tuple:
    input_image = tf.image.resize(datapoint['image'], (IMG_SIZE, IMG_SIZE))
    input_mask = tf.image.resize(datapoint['segmentation_mask'], (IMG_SIZE, IMG_SIZE))
    if tf.random.uniform(()) > 0.5:
        input_image = tf.image.flip_left_right(input_image)
        input_mask = tf.image.flip_left_right(input_mask)
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask
So what is the problem here, and why does the script work on one machine but not on the other?
Full execution output (system information and errors):
Running on Linux 4.18.0-193.65.2.el8_2.x86_64.
Python version: 3.8.0 (default, Mar 9 2020, 18:02:46)
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
Tensorflow version: 2.6.0
UTC time (start): 2021-10-28 07:58:46.099075
Local time (start): 2021-10-28 09:58:50.733831
N GPUs available: 4
2021-10-28 09:58:53.298641: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-28 09:58:55.358312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30988 MB memory: -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
2021-10-28 09:58:55.360055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30988 MB memory: -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
2021-10-28 09:58:55.361619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 30988 MB memory: -> device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0
2021-10-28 09:58:55.363165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 30988 MB memory: -> device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:b2:00.0, compute capability: 7.0
WARNING:tensorflow:AutoGraph could not transform <function parse_image at 0x14f2f76173a0> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the entire output.
Cause: Unable to locate the source code of <function parse_image at 0x14f2f76173a0>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function load_image_train at 0x14f2f6d0b820> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the entire output.
Cause: Unable to locate the source code of <function load_image_train at 0x14f2f6d0b820>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Traceback (most recent call last):
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/pyct/parser.py", line 154, in parse_entity
original_source = inspect_utils.getimmediatesource(entity)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/pyct/inspect_utils.py", line 151, in getimmediatesource
lines, lnum = inspect.findsource(obj)
File "/usr/lib64/python3.8/inspect.py", line 798, in findsource
raise OSError('could not get source code')
OSError: could not get source code
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 432, in converted_call
converted_f = _convert_actual(target_entity, program_ctx)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 274, in _convert_actual
transformed, module, source_map = _TRANSPILER.transform(entity, program_ctx)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/pyct/transpiler.py", line 286, in transform
return self.transform_function(obj, user_context)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/pyct/transpiler.py", line 470, in transform_function
nodes, ctx = super(PyToPy, self).transform_function(fn, user_context)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/pyct/transpiler.py", line 346, in transform_function
node, source = parser.parse_entity(fn, future_features=future_features)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/pyct/parser.py", line 156, in parse_entity
raise ValueError(
ValueError: Unable to locate the source code of <function load_image_train at 0x14f2f6d0b820>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./LeleNet/py3/LeleNet_trn.py", line 377, in <module>
dataset["train"] = dataset["train"]
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1863, in map
return ParallelMapDataset(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 5020, in __init__
self._map_func = StructuredFunctionWrapper(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4218, in __init__
self._function = fn_factory()
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3150, in get_concrete_function
graph_function = self._get_concrete_function_garbage_collected(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3116, in _get_concrete_function_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3463, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3298, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1007, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4195, in wrapped_fn
ret = wrapper_helper(*args)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4125, in wrapper_helper
ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 933, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 759, in _initialize
self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3066, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3463, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3298, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1007, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 668, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 983, in wrapper
return autograph.converted_call(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
return _fall_back_unconverted(f, args, kwargs, options, e)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 486, in _fall_back_unconverted
return _call_unconverted(f, args, kwargs, options)
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 463, in _call_unconverted
return f(*args, **kwargs)
File "./LeleNet/py3/LeleNet_trn.py", line 351, in load_image_train
if tf.random.uniform(()) > 0.5:
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 900, in __bool__
self._disallow_bool_casting()
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 503, in _disallow_bool_casting
self._disallow_when_autograph_enabled(
File "/home/kit/ifgg/mp3890/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 489, in _disallow_when_autograph_enabled
raise errors.OperatorNotAllowedInGraphError(
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
System/Python information for the machine on which the script runs without errors:
Running on Windows 8.1.
Python version: 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
Tensorflow version: 2.5.0
If needed, the full script can be found here. In both cases, I ran it from a terminal/command prompt as `~$ python LeleNet_trn.py "fcd" 6 60 -op "adam"`.
Update:
The problem apparently occurs only when I run the script from the home directory, specifying the path to the script as `python ./some_folder/script.py`. When I run `python ~/some_folder/script.py`, or `cd ./some_folder` followed by `python script.py`, the issue does not occur.
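One plausible explanation, offered as an assumption since the full script is not reproduced here: on Python 3.8, a script launched as `python ./some_folder/script.py` keeps a relative `__file__` (and a relative `co_filename` on its functions), whereas Python 3.9 changed `__main__.__file__` to an absolute path, which would fit the Windows machine (Python 3.9.5) not showing the error. `inspect.findsource`, which appears in the traceback, resolves such a relative path against the *current* working directory. If the script calls `os.chdir()` before the dataset pipeline is built, the source lookup fails, AutoGraph falls back to running `load_image_train` unconverted, and the plain Python `if` on a tensor then raises `OperatorNotAllowedInGraphError`. The standard-library sketch below simulates only the lookup part; `some_folder/mod.py` and the `os.chdir()` call are hypothetical stand-ins for the real script.

```python
import inspect
import linecache
import os
import tempfile

# Hypothetical module body standing in for the real training script.
MODULE_SOURCE = "def load_image_train(x):\n    return x\n"


def lookup_after_chdir(change_dir: bool) -> bool:
    """Compile a hypothetical ./some_folder/mod.py under a *relative* filename
    (as Python 3.8 does for `python ./some_folder/script.py`), optionally
    chdir elsewhere, and report whether inspect can still find its source."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as home:
        folder = os.path.join(home, "some_folder")
        os.makedirs(folder)
        with open(os.path.join(folder, "mod.py"), "w") as f:
            f.write(MODULE_SOURCE)
        try:
            os.chdir(home)  # like starting from the home directory
            rel_path = os.path.join(".", "some_folder", "mod.py")
            namespace = {}
            with open(rel_path) as f:
                exec(compile(f.read(), rel_path, "exec"), namespace)
            fn = namespace["load_image_train"]
            if change_dir:
                os.chdir(folder)  # hypothetical os.chdir() somewhere in the script
            linecache.clearcache()  # make inspect re-read the file from disk
            try:
                inspect.getsource(fn)
                return True
            except OSError:
                return False
        finally:
            os.chdir(old_cwd)  # restore cwd before the temp dir is removed


if __name__ == "__main__":
    print(lookup_after_chdir(change_dir=False))  # True: relative path still valid
    print(lookup_after_chdir(change_dir=True))   # False: relative path now points nowhere
```

With `python ~/some_folder/script.py` the shell expands `~`, so the path is absolute and a later `chdir` does not affect it. The `cd ./some_folder` case presumably works because the fallback lookup searches `sys.path`, whose first entry is the script's directory, and a bare `script.py` is found there while `./some_folder/script.py` is not. If this is indeed the cause, launching with an absolute path, or building data paths with `os.path.join` instead of changing directory, should avoid the error.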