Flexible shapes not working when converting ONNX to MLModel with coremltools 4

#python #coreml #coremltools #mlmodel #onnx-coreml

Question:

I can't get flexible shapes to work with an ONNX model that I'm converting to an MLModel using coremltools 4.0. The original model comes from PyTorch, but I can't use the new unified converter because coremltools currently doesn't support the reflection_pad2d layer used in the model.

coremltools compiles the model without any warnings or errors, and the resulting spec shows that flexible shapes are supported:

 input {
  name: "input"
  type {
    imageType {
      width: 1024
      height: 1024
      colorSpace: BGR
      imageSizeRange {
        widthRange {
          lowerBound: 256
          upperBound: -1
        }
        heightRange {
          lowerBound: 256
          upperBound: -1
        }
      }
    }
  }
}
output {
  name: "output"
  type {
    imageType {
      width: 1024
      height: 1024
      colorSpace: RGB
      imageSizeRange {
        widthRange {
          lowerBound: 256
          upperBound: -1
        }
        heightRange {
          lowerBound: 256
          upperBound: -1
        }
      }
    }
  }
}
  
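(For reference, the description above was printed with the snippet below, where mlmodel is the converted model:)

spec = mlmodel.get_spec()
print(spec.description)  # prints the input/output section shown above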

But running a prediction with the model fails with:

 MyApp[5773:4974761] [espresso] [Espresso::handle_ex_plan] exception=Invalid X-dimension 1/814 status=-7
MyApp[5773:4974761] [coreml] Error binding image input buffer input: -7
MyApp[5773:4974761] [coreml] Failure in bindInputsAndOutputs.
prediction error: Error Domain=com.apple.CoreML Code=0 "Error binding image input buffer input." UserInfo={NSLocalizedDescription=Error binding image input buffer input.}
  

Enumerated shapes do work with the model, but they're no use here without more than 10k enumerated sizes, which just doesn't feel like a solution.
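For illustration, the enumerated workaround looks roughly like this (a sketch; the 64-pixel step and the 4096 cap are arbitrary choices, and even this coarse grid already needs thousands of entries):

from coremltools.models.neural_network import flexible_shape_utils

# Every size the app may ever feed the model has to be listed explicitly.
sizes = [flexible_shape_utils.NeuralNetworkImageSize(height=h, width=w)
         for h in range(256, 4097, 64)
         for w in range(256, 4097, 64)]  # 61 x 61 = 3721 entries
flexible_shape_utils.add_enumerated_image_sizes(spec, feature_name='input', sizes=sizes)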

The model is a fully convolutional network, it doesn't appear to use any fixed shapes (see the spec output below), and it works with different input sizes in PyTorch, so it seems like it should be possible to get flexible shapes working somehow.
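For example, a size-agnostic sanity check like this passes in PyTorch (torch_model is the loaded network; the sizes are arbitrary):

import torch

with torch.no_grad():
    for h, w in [(256, 256), (512, 768), (1024, 1024)]:
        out = torch_model(torch.randn(1, 3, h, w))
        print((h, w), '->', tuple(out.shape))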

I tried enabling flexible input shapes with image input/output:

import coremltools as ct
from coremltools.converters.onnx import convert
from coremltools.models.neural_network import flexible_shape_utils

input_names = ['input']
output_names = ['output']
channels = 3

input_shape = ct.Shape(shape=(channels, ct.RangeDim(), ct.RangeDim()))
# also tried:
input_shape = ct.Shape(shape=(channels, ct.RangeDim(256, 4096), ct.RangeDim(256, 4096)))
# and:
input_shape = ct.Shape(shape=(channels, ct.RangeDim(256, -1), ct.RangeDim(256, -1)))

model_input = ct.TensorType(shape=input_shape)

def add_flexible_shapes(spec):
    img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(
        height_range=(256, -1), width_range=(256, -1))
    # also tried:
    # img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(
    #     height_range=(256, 4096), width_range=(256, 4096))
    flexible_shape_utils.update_image_size_range(spec, feature_name=input_names[0], size_range=img_size_ranges)
    flexible_shape_utils.update_image_size_range(spec, feature_name=output_names[0], size_range=img_size_ranges)
    return spec

mlmodel = convert('torch_model.onnx',
                  [model_input],
                  image_input_names=input_names,
                  image_output_names=output_names,
                  ...
)

spec = mlmodel.get_spec()

# tried with and without adding flexible shapes
spec = add_flexible_shapes(spec)
  
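The prediction can also be exercised from Python on macOS; a minimal sketch (with an arbitrary non-1024 size) that should hit the same binding error:

from PIL import Image
import coremltools as ct

mlmodel.save('torch_model.mlmodel')
model = ct.models.MLModel('torch_model.mlmodel')
img = Image.new('RGB', (768, 512))    # width x height, inside the declared range
print(model.predict({'input': img}))  # expected to fail the same way as in the app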

I also tried converting the model with multiarray input/output first, then switching the input and output to images, and then adding the flexible shapes:

import torch
import coremltools as ct
import coremltools.proto.FeatureTypes_pb2 as ft

torch.onnx.export(torch_model, example_input, 'torch_model.onnx',
                  input_names=input_names, output_names=output_names, verbose=True)

mlmodel = ct.converters.onnx.convert(model='torch_model.onnx',
                                     ...
)

spec = mlmodel.get_spec()

input = spec.description.input[0]
input.type.imageType.colorSpace = ft.ImageFeatureType.RGB
input.type.imageType.height = 1024
input.type.imageType.width = 1024

output = spec.description.output[0]
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = 1024
output.type.imageType.width = 1024

spec = add_flexible_shapes(spec)
  
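For completeness, the multiarray route can also stay a multiarray and get a flexible shape range directly instead of being converted to images; a sketch using the flexible_shape_utils multiarray API (untested with this model):

from coremltools.models.neural_network import flexible_shape_utils

shape_range = flexible_shape_utils.NeuralNetworkMultiArrayShapeRange()
shape_range.add_channel_range((3, 3))    # channels stay fixed at 3
shape_range.add_height_range((256, -1))  # -1 = no upper bound
shape_range.add_width_range((256, -1))
flexible_shape_utils.update_multiarray_shape_range(spec, feature_name='input', shape_range=shape_range)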

I went through all the layers in the spec and I can't see a single one that uses a fixed shape (apart from the input/output descriptions):

 specificationVersion: 4
description {
  input {
    name: "input"
    type {
      imageType {
        width: 1024
        height: 1024
        colorSpace: RGB
      }
    }
  }
  output {
    name: "output"
    type {
      imageType {
        width: 1024
        height: 1024
        colorSpace: RGB
      }
    }
  }
  metadata {
    userDefined {
      key: "com.github.apple.coremltools.source"
      value: "onnx==1.7.0"
    }
    userDefined {
      key: "com.github.apple.coremltools.version"
      value: "4.0"
    }
  }
}
neuralNetwork {
  layers {
    name: "Pad_0"
    input: "input"
    output: "63"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
      }
    }
  }
  layers {
    name: "Conv_1"
    input: "63"
    output: "64"
    convolution {
      outputChannels: 16
      kernelChannels: 3
      nGroups: 1
      kernelSize: 9
      kernelSize: 9
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_2"
    input: "64"
    output: "65"
    batchnorm {
      channels: 16
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_3"
    input: "65"
    output: "66"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_4"
    input: "66"
    output: "67"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_5"
    input: "67"
    output: "68"
    convolution {
      outputChannels: 32
      kernelChannels: 16
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_6"
    input: "68"
    output: "69"
    batchnorm {
      channels: 32
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_7"
    input: "69"
    output: "70"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_8"
    input: "70"
    output: "71"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_9"
    input: "71"
    output: "72"
    convolution {
      outputChannels: 64
      kernelChannels: 32
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_10"
    input: "72"
    output: "73"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_11"
    input: "73"
    output: "74"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_12"
    input: "74"
    output: "75"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_13"
    input: "75"
    output: "76"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_14"
    input: "76"
    output: "77"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_15"
    input: "77"
    output: "78"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_16"
    input: "78"
    output: "79"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_17"
    input: "79"
    output: "80"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_18"
    input: "80"
    output: "81"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_19"
    input: "81"
    input: "74"
    output: "82"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_20"
    input: "82"
    output: "83"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_21"
    input: "83"
    output: "84"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_22"
    input: "84"
    output: "85"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_23"
    input: "85"
    output: "86"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_24"
    input: "86"
    output: "87"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_25"
    input: "87"
    output: "88"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_26"
    input: "88"
    output: "89"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_27"
    input: "89"
    input: "82"
    output: "90"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_28"
    input: "90"
    output: "91"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_29"
    input: "91"
    output: "92"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_30"
    input: "92"
    output: "93"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_31"
    input: "93"
    output: "94"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_32"
    input: "94"
    output: "95"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_33"
    input: "95"
    output: "96"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_34"
    input: "96"
    output: "97"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_35"
    input: "97"
    input: "90"
    output: "98"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_36"
    input: "98"
    output: "99"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_37"
    input: "99"
    output: "100"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_38"
    input: "100"
    output: "101"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_39"
    input: "101"
    output: "102"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_40"
    input: "102"
    output: "103"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_41"
    input: "103"
    output: "104"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_42"
    input: "104"
    output: "105"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_43"
    input: "105"
    input: "98"
    output: "106"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_44"
    input: "106"
    output: "107"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_45"
    input: "107"
    output: "108"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_46"
    input: "108"
    output: "109"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_47"
    input: "109"
    output: "110"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_48"
    input: "110"
    output: "111"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_49"
    input: "111"
    output: "112"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_50"
    input: "112"
    output: "113"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_51"
    input: "113"
    input: "106"
    output: "114"
    addBroadcastable {
    }
  }
  layers {
    name: "Upsample_52"
    input: "114"
    output: "123"
    upsample {
      scalingFactor: 4
      scalingFactor: 4
    }
  }
  layers {
    name: "Pad_53"
    input: "123"
    output: "124"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_54"
    input: "124"
    output: "125"
    convolution {
      outputChannels: 32
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_55"
    input: "125"
    output: "126"
    batchnorm {
      channels: 32
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_56"
    input: "126"
    output: "127"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Upsample_57"
    input: "127"
    output: "136"
    upsample {
      scalingFactor: 4
      scalingFactor: 4
      mode: BILINEAR
    }
  }
  layers {
    name: "Pad_58"
    input: "136"
    output: "137"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_59"
    input: "137"
    output: "138"
    convolution {
      outputChannels: 16
      kernelChannels: 32
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_60"
    input: "138"
    output: "139"
    batchnorm {
      channels: 16
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_61"
    input: "139"
    output: "140"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_62"
    input: "140"
    output: "141"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
      }
    }
  }
  layers {
    name: "Conv_63"
    input: "141"
    output: "output"
    convolution {
      outputChannels: 3
      kernelChannels: 16
      nGroups: 1
      kernelSize: 9
      kernelSize: 9
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  arrayInputShapeMapping: EXACT_ARRAY_MAPPING
  imageInputShapeMapping: RANK4_IMAGE_MAPPING
}
  

Comments:

1. I've seen similar reports in the past. This sounds like a bug in Core ML. If you want to get to the bottom of it, you could do a kind of binary search: remove the bottom half of the layers from the model and see whether it works now. If not, remove half of the remaining layers, and so on. At some point you may find that the model works again, and then you can pinpoint the layer where things go wrong. A rough sketch of one bisection step is below.
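One bisection step might look like this (a sketch; the declared output is switched to a multiarray because intermediate blobs aren't 3-channel images):

import coremltools as ct
import coremltools.proto.FeatureTypes_pb2 as ft

spec = ct.utils.load_spec('torch_model.mlmodel')

keep = len(spec.neuralNetwork.layers) // 2  # drop the bottom half of the layers
del spec.neuralNetwork.layers[keep:]
spec.description.output[0].name = spec.neuralNetwork.layers[-1].output[0]
spec.description.output[0].type.multiArrayType.dataType = ft.ArrayFeatureType.FLOAT32

ct.models.MLModel(spec).save('truncated.mlmodel')  # then re-test prediction on this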

2. Thanks for the tips! I was able to fully reproduce the problem with just two conv2d layers, so something in the ONNX conversion must be fundamentally broken. I've filed a bug report with full reproduction steps here: github.com/apple/coremltools/issues/988