Spark on Kubernetes doesn't launch executors (it doesn't even try). Why?

#apache-spark #kubernetes

Question:

Following the instructions, I am trying to deploy my PySpark application to the Azure AKS free tier with spark.executor.instances=5:

 spark-submit \
    --master k8s://https://xxxxxxx-xxxxxxx.hcp.westeurope.azmk8s.io:443 \
    --deploy-mode cluster \
    --name sparkbasics \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=aosb06.azurecr.io/sparkbasics:v300 \
    local:///opt/spark/work-dir/main.py


Everything runs fine (including the application itself), except that I don't see any executor pods at all, only the driver pod:

 kubectl get pods                                
NAME                                  READY   STATUS      RESTARTS   AGE
sparkbasics-f374377b3c78ac68-driver   0/1     Completed   0          52m
 
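Executor pods, if any had been created, would carry the spark-role=executor label (the driver pod's spark-role=driver label is visible in the describe output below), so a label-filtered listing confirms that none exist:

 kubectl get pods -l spark-role=executor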

The Dockerfile is the one from the Spark distribution.

What could the problem be? Could it be a resource allocation issue?
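One way to rule out node-level resource pressure is to check the node's allocations (the node name below is taken from the describe output further down; kubectl top needs the metrics-server add-on, which AKS enables by default):

 kubectl describe node aks-default-31057657-vmss000000 | grep -A 8 'Allocated resources'
 kubectl top nodes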

There doesn't seem to be anything wrong in the driver logs.

 kubectl logs <driver-pod>

2021-08-12 22:25:54,332 INFO spark.SparkContext: Running Spark version 3.1.2
2021-08-12 22:25:54,378 INFO resource.ResourceUtils: ==============================================================
2021-08-12 22:25:54,378 INFO resource.ResourceUtils: No custom resources configured for spark.driver.
2021-08-12 22:25:54,379 INFO resource.ResourceUtils: ==============================================================
2021-08-12 22:25:54,379 INFO spark.SparkContext: Submitted application: SimpleApp
2021-08-12 22:25:54,403 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
2021-08-12 22:25:54,422 INFO resource.ResourceProfile: Limiting resource is cpu
2021-08-12 22:25:54,422 INFO resource.ResourceProfileManager: Added ResourceProfile id: 0
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing view acls to: 185,aovsyannikov
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing modify acls to: 185,aovsyannikov
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing view acls groups to: 
2021-08-12 22:25:54,475 INFO spark.SecurityManager: Changing modify acls groups to: 
2021-08-12 22:25:54,475 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(185, aovsyannikov); groups with view permissions: Set(); users  with modify permissions: Set(185, aovsyannikov); groups with modify permissions: Set()
2021-08-12 22:25:54,717 INFO util.Utils: Successfully started service 'sparkDriver' on port 7078.
2021-08-12 22:25:54,781 INFO spark.SparkEnv: Registering MapOutputTracker
2021-08-12 22:25:54,818 INFO spark.SparkEnv: Registering BlockManagerMaster
2021-08-12 22:25:54,843 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2021-08-12 22:25:54,844 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
2021-08-12 22:25:54,848 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
2021-08-12 22:25:54,862 INFO storage.DiskBlockManager: Created local directory at /var/data/spark-1e9aa64b-e0a1-44ae-a097-ebb3c2f32404/blockmgr-c51b9095-5426-4a00-b17a-461de2b80357
2021-08-12 22:25:54,892 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MiB
2021-08-12 22:25:54,909 INFO spark.SparkEnv: Registering OutputCommitCoordinator
2021-08-12 22:25:55,023 INFO util.log: Logging initialized @3324ms to org.sparkproject.jetty.util.log.Slf4jLog
2021-08-12 22:25:55,114 INFO server.Server: jetty-9.4.40.v20210413; built: 2021-04-13T20:42:42.668Z; git: b881a572662e1943a14ae12e7e1207989f218b74; jvm 1.8.0_275-b01
2021-08-12 22:25:55,139 INFO server.Server: Started @3442ms
2021-08-12 22:25:55,184 INFO server.AbstractConnector: Started ServerConnector@59b3b32{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2021-08-12 22:25:55,184 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

 
 kubectl describe pod <driver-pod>
Name:         sparkbasics-f374377b3c78ac68-driver
Namespace:    default
Priority:     0
Node:         aks-default-31057657-vmss000000/10.240.0.4
Start Time:   Fri, 13 Aug 2021 01:25:47 +0300
Labels:       spark-app-selector=spark-256cc7f64af9451b89e0098397980974
              spark-role=driver
Annotations:  <none>
Status:       Succeeded
IP:           10.244.0.28
IPs:
  IP:  10.244.0.28
Containers:
  spark-kubernetes-driver:
    Container ID:  containerd://b572a4056014cd4b0520b808d64d766254d30c44ba12fc98717aee3b4814f17d
    Image:         aosb06.azurecr.io/sparkbasics:v300
    Image ID:      aosb06.azurecr.io/sparkbasics@sha256:965393784488025fffc7513edcb4a62333ba59a5ee3076346fd8d335e1715213
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.deploy.PythonRunner
      local:///opt/spark/work-dir/main.py
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 13 Aug 2021 01:25:51 +0300
      Finished:     Fri, 13 Aug 2021 01:56:40 +0300
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1433Mi
    Requests:
      cpu:     1
      memory:  1433Mi
    Environment:
      SPARK_USER:                 aovsyannikov
      SPARK_APPLICATION_ID:       spark-256cc7f64af9451b89e0098397980974
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SB_KEY_STORAGE:             <set to the key 'STORAGE' in secret 'sparkbasics'>     Optional: false
      SB_KEY_OPENCAGE:            <set to the key 'OPENCAGE' in secret 'sparkbasics'>    Optional: false
      SB_KEY_STORAGEOUT:          <set to the key 'STORAGEOUT' in secret 'sparkbasics'>  Optional: false
      SPARK_LOCAL_DIRS:           /var/data/spark-1e9aa64b-e0a1-44ae-a097-ebb3c2f32404
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume-driver (rw)
      /var/data/spark-1e9aa64b-e0a1-44ae-a097-ebb3c2f32404 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wlqjt (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  spark-conf-volume-driver:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-drv-6f83b17b3c78af1f-conf-map
    Optional:  false
  default-token-wlqjt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wlqjt
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
 

Comments:

1. Could you add the output of the kubectl describe pod sparkbasics-f374377b3c78ac68-driver (the Events section) and kubectl logs sparkbasics-f374377b3c78ac68-driver commands?

Answer #1:

I found a bug in the PySpark application itself.

     ...
    SparkSession.builder.master("local")
    ...
 

It should be without the master:

     ...
    SparkSession.builder
    ...

 

As simple as that 🙁
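For reference, here is a minimal main.py sketch of the corrected setup (the app name and the toy job are illustrative). Leaving .master() out lets the spark.master value supplied by spark-submit (k8s://...) take effect; a hard-coded .master("local") overrides it, so the driver runs everything in-process and never asks Kubernetes for executor pods.

 from pyspark.sql import SparkSession

 # No .master(...) here: in cluster mode spark-submit already provides
 # spark.master (k8s://...), and a builder option would override it.
 spark = (
     SparkSession.builder
     .appName("sparkbasics")  # illustrative app name
     .getOrCreate()
 )

 # Trivial job, just enough to exercise the executors.
 print(spark.range(100).count())

 spark.stop()

With that change, kubectl get pods should show the five requested executor pods alongside the driver while the job is running.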