Could not find the provided job class (com.job.className) in the user lib directory when deploying Apache Flink 1.11 on Kubernetes

#kubernetes

Question:

Today I am trying to deploy Apache Flink 1.11 on my Kubernetes cluster (version v1.16) following the official documentation. This is my jobmanager Job YAML; I only made small edits (the original comes from the official docs, and I re-exported it from Kubernetes):

 [root@k8smaster flink]# cat flink-jobmanager.yaml 
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: flink
    component: jobmanager
    job-name: flink-jobmanager
  name: flink-jobmanager
  namespace: infrastructure
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: flink
        component: jobmanager
        job-name: flink-jobmanager
    spec:
      containers:
      - args:
        - standalone-job
        - --job-classname
        - com.job.ClassName
        - <optional arguments>
        - <job arguments>
        image: flink:1.11.0-scala_2.11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 60
          successThreshold: 1
          tcpSocket:
            port: 6123
          timeoutSeconds: 1
        name: jobmanager
        ports:
        - containerPort: 6123
          name: rpc
          protocol: TCP
        - containerPort: 6124
          name: blob-server
          protocol: TCP
        - containerPort: 8081
          name: webui
          protocol: TCP
        securityContext:
          runAsUser: 9999
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/flink/conf
          name: flink-config-volume
        - mountPath: /opt/flink/usrlib
          name: job-artifacts-volume
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
          name: flink-config
        name: flink-config-volume
      - hostPath:
          path: /home
        name: job-artifacts-volume
  

The Job is created successfully, but when I enter the pod and check the log, it shows the following:

  Starting Job Manager
sed: couldn't open temporary file /opt/flink/conf/sedktEwGn: Read-only file system
sed: couldn't open temporary file /opt/flink/conf/sedvZ7Znn: Read-only file system
/docker-entrypoint.sh: 72: /docker-entrypoint.sh: cannot create /opt/flink/conf/flink-conf.yaml: Permission denied
/docker-entrypoint.sh: 91: /docker-entrypoint.sh: cannot create /opt/flink/conf/flink-conf.yaml.tmp: Read-only file system
Starting standalonejob as a console application on host flink-jobmanager-dfhtt.
2020-08-19 16:50:46,850 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
2020-08-19 16:50:46,852 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Preconfiguration: 
2020-08-19 16:50:46,852 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - 
JM_RESOURCE_PARAMS extraction logs:
jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
logs: INFO  [] - Loading configuration property: jobmanager.rpc.address, flink-jobmanager
INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 2
INFO  [] - Loading configuration property: blob.server.port, 6124
INFO  [] - Loading configuration property: jobmanager.rpc.port, 6123
INFO  [] - Loading configuration property: taskmanager.rpc.port, 6122
INFO  [] - Loading configuration property: queryable-state.proxy.ports, 6125
INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
INFO  [] - Loading configuration property: parallelism.default, 2
INFO  [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
INFO  [] - Final Master Memory configuration:
INFO  [] -   Total Process Memory: 1.563gb (1677721600 bytes)
INFO  [] -     Total Flink Memory: 1.125gb (1207959552 bytes)
INFO  [] -       JVM Heap:         1024.000mb (1073741824 bytes)
INFO  [] -       Off-heap:         128.000mb (134217728 bytes)
INFO  [] -     JVM Metaspace:      256.000mb (268435456 bytes)
INFO  [] -     JVM Overhead:       192.000mb (201326592 bytes)
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Starting StandaloneApplicationClusterEntryPoint (Version: 1.11.0, Scala: 2.11, Rev:d04872d, Date:2020-06-29T16:13:14+02:00)
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  OS current user: flink
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.262-b10
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Maximum heap size: 989 MiBytes
2020-08-19 16:50:46,853 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  JAVA_HOME: /usr/local/openjdk-8
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  No Hadoop Dependency available
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  JVM Options:
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Xmx1073741824
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Xms1073741824
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -XX:MaxMetaspaceSize=268435456
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlog.file=/opt/flink/log/flink--standalonejob-0-flink-jobmanager-dfhtt.log
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Program Arguments:
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     --configDir
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     /opt/flink/conf
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     --job-classname
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     com.job.ClassName
2020-08-19 16:50:46,854 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     <optional arguments>
2020-08-19 16:50:46,855 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     <job arguments>
2020-08-19 16:50:46,855 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Classpath: /opt/flink/lib/flink-csv-1.11.0.jar:/opt/flink/lib/flink-json-1.11.0.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.11-1.11.0.jar:/opt/flink/lib/flink-table_2.11-1.11.0.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.11-1.11.0.jar:::
2020-08-19 16:50:46,855 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
2020-08-19 16:50:46,855 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Registered UNIX signal handlers for [TERM, HUP, INT]
2020-08-19 16:50:46,875 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Could not create application program.
org.apache.flink.util.FlinkException: Could not find the provided job class (com.job.ClassName) in the user lib directory (/opt/flink/usrlib).
    at org.apache.flink.client.deployment.application.ClassPathPackagedProgramRetriever.getJobClassNameOrScanClassPath(ClassPathPackagedProgramRetriever.java:140) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.client.deployment.application.ClassPathPackagedProgramRetriever.getPackagedProgram(ClassPathPackagedProgramRetriever.java:123) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:110) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.main(StandaloneApplicationClusterEntryPoint.java:78) [flink-dist_2.11-1.11.0.jar:1.11.0]
  

My questions are:

  • why does the log report Read-only file system at the beginning, and what should I do to fix it?
  • why does it show Could not find the provided job class (com.job.ClassName) in the user lib directory (/opt/flink/usrlib), and what should I do to fix it?
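For context, a minimal sketch of a pattern that is commonly used to address both symptoms (this is an assumption about a possible fix, not my current setup): copy the read-only ConfigMap into a writable emptyDir with an initContainer so docker-entrypoint.sh can edit flink-conf.yaml, make sure the directory mounted at /opt/flink/usrlib actually contains the job jar, and point --job-classname at the jar's real main class instead of the literal com.job.ClassName placeholder. All names below (com.example.MyJobMain, /home/flink-jobs, my-job.jar) are illustrative:

```yaml
# Hypothetical fragment of the jobmanager pod spec (names are illustrative).
spec:
  initContainers:
  # Copy the read-only ConfigMap contents into a writable volume,
  # so the entrypoint's sed/cp calls on flink-conf.yaml can succeed.
  - name: copy-flink-config
    image: busybox:1.32
    command: ["sh", "-c", "cp /config-src/* /config-dst/"]
    volumeMounts:
    - mountPath: /config-src
      name: flink-config-volume
    - mountPath: /config-dst
      name: flink-config-writable
  containers:
  - name: jobmanager
    image: flink:1.11.0-scala_2.11
    args:
    - standalone-job
    - --job-classname
    - com.example.MyJobMain        # the real main class inside the jar
    volumeMounts:
    - mountPath: /opt/flink/conf
      name: flink-config-writable  # writable copy instead of the ConfigMap
    - mountPath: /opt/flink/usrlib
      name: job-artifacts-volume   # must contain the job jar itself
  volumes:
  - name: flink-config-volume
    configMap:
      name: flink-config
  - name: flink-config-writable
    emptyDir: {}
  - name: job-artifacts-volume
    hostPath:
      path: /home/flink-jobs       # directory holding e.g. my-job.jar
```

The key point of the sketch is that a ConfigMap volume is always mounted read-only, so anything the entrypoint needs to rewrite has to live on a writable volume, and the class-not-found error simply means no jar containing the named class is present under /opt/flink/usrlib.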

I have already tried this on 2 Kubernetes clusters (a production cluster running v1.15 and my localhost cluster running v1.18), and the error is the same. I have no idea how to fix it now.

Comments:

1. Could you please check whether this resolves your problems?

2. I have already read that issue but still have no idea; I will read it again, thanks! @acid_fuji