#apache-spark #spark-structured-streaming
Question:
I have 3 workers showing in the Spark UI, running on 3 guest virtual machines (one hosting the master and two slaves). I am trying to run a Twitter streaming application, but I get the errors below. I have seen other threads about similar problems, but I do not understand the errors. Some of those threads mention memory problems, so I checked whether my host CPU and RAM were OK while spark-submit was running. RAM is fine, but CPU usage reaches 100%. Could that be the problem? Any input that helps me understand the relationship between the master and the workers after the cluster starts would be appreciated.
MASTER OUTPUT:
Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host pd --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 00:11:05 INFO Master: Started daemon with process name: 10852@pd
21/03/05 00:11:06 INFO SignalUtils: Registering signal handler for TERM
21/03/05 00:11:06 INFO SignalUtils: Registering signal handler for HUP
21/03/05 00:11:06 INFO SignalUtils: Registering signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/03/05 00:11:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 00:11:13 INFO SecurityManager: Changing view acls to: *****
21/03/05 00:11:13 INFO SecurityManager: Changing modify acls to: *****
21/03/05 00:11:13 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:11:13 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:11:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(*****); groups with view permissions: Set(); users with modify permissions: Set(*****); groups with modify permissions: Set()
21/03/05 00:11:17 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
21/03/05 00:11:18 INFO Master: Starting Spark master at spark://pd:7077
21/03/05 00:11:18 INFO Master: Running Spark version 3.1.1
21/03/05 00:11:19 INFO Utils: Successfully started service 'MasterUI' on port 8080.
21/03/05 00:11:20 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://pd:8080
21/03/05 00:11:21 INFO Master: Registering worker 10.0.2.15:34975 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:21 INFO Master: I have been elected leader! New state: ALIVE
21/03/05 00:11:25 INFO Master: Registering worker ***.***.**.104:43757 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:28 INFO Master: Registering worker 10.0.2.15:34975 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:31 INFO Master: Registering worker ***.***.**.103:38529 with 1 cores, 1024.0 MiB RAM
21/03/05 00:19:02 INFO Master: Registering app TwitterSentimentAnalysis
21/03/05 00:19:02 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305001902-0000
21/03/05 00:19:02 INFO Master: Launching executor app-20210305001902-0000/0 on worker worker-20210305001307-***.***.**.103-38529
21/03/05 00:19:03 INFO Master: Launching executor app-20210305001902-0000/1 on worker worker-20210305001123-***.***.**.104-43757
21/03/05 00:19:03 INFO Master: Launching executor app-20210305001902-0000/2 on worker worker-20210305001308-10.0.2.15-34975
21/03/05 00:19:40 INFO Master: Received unregister request from application app-20210305001902-0000
21/03/05 00:19:40 INFO Master: Removing app app-20210305001902-0000
21/03/05 00:19:40 WARN Master: Got status update for unknown executor app-20210305001902-0000/0
21/03/05 00:19:40 WARN Master: Got status update for unknown executor app-20210305001902-0000/2
21/03/05 00:19:41 WARN Master: Got status update for unknown executor app-20210305001902-0000/1
21/03/05 00:19:41 INFO Master: ***.***.**.104:60360 got disassociated, removing it.
21/03/05 00:19:41 INFO Master: pd:40465 got disassociated, removing it.
21/03/05 00:47:39 INFO Master: Registering app TwitterSentimentAnalysis
21/03/05 00:47:39 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305004739-0001
21/03/05 00:47:39 INFO Master: Launching executor app-20210305004739-0001/0 on worker worker-20210305001307-***.***.**.103-38529
21/03/05 00:47:39 INFO Master: Launching executor app-20210305004739-0001/1 on worker worker-20210305001123-***.***.**.104-43757
21/03/05 00:47:39 INFO Master: Launching executor app-20210305004739-0001/2 on worker worker-20210305001308-10.0.2.15-34975
21/03/05 00:48:05 INFO Master: Received unregister request from application app-20210305004739-0001
21/03/05 00:48:05 INFO Master: Removing app app-20210305004739-0001
21/03/05 00:48:05 WARN Master: Got status update for unknown executor app-20210305004739-0001/2
21/03/05 00:48:05 WARN Master: Got status update for unknown executor app-20210305004739-0001/0
21/03/05 00:48:06 WARN Master: Got status update for unknown executor app-20210305004739-0001/1
21/03/05 00:48:06 INFO Master: ***.***.**.104:60388 got disassociated, removing it.
21/03/05 00:48:06 INFO Master: pd:34007 got disassociated, removing it.
21/03/05 00:48:52 INFO Master: Registering app TwitterSentimentAnalysis
21/03/05 00:48:52 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305004852-0002
21/03/05 00:48:52 INFO Master: Launching executor app-20210305004852-0002/0 on worker worker-20210305001307-***.***.**.103-38529
21/03/05 00:48:52 INFO Master: Launching executor app-20210305004852-0002/1 on worker worker-20210305001123-***.***.**.104-43757
21/03/05 00:48:52 INFO Master: Launching executor app-20210305004852-0002/2 on worker worker-20210305001308-10.0.2.15-34975
21/03/05 00:49:19 INFO Master: Received unregister request from application app-20210305004852-0002
21/03/05 00:49:19 INFO Master: Removing app app-20210305004852-0002
21/03/05 00:49:19 WARN Master: Got status update for unknown executor app-20210305004852-0002/0
21/03/05 00:49:19 WARN Master: Got status update for unknown executor app-20210305004852-0002/2
21/03/05 00:49:19 WARN Master: Got status update for unknown executor app-20210305004852-0002/1
21/03/05 00:49:19 INFO Master: ***.***.**.104:60402 got disassociated, removing it.
21/03/05 00:49:19 INFO Master: pd:42517 got disassociated, removing it.
WORKER OUTPUT:
Spark Command: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://pd:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 00:11:15 INFO Worker: Started daemon with process name: 10993@pd
21/03/05 00:11:15 INFO SignalUtils: Registering signal handler for TERM
21/03/05 00:11:15 INFO SignalUtils: Registering signal handler for HUP
21/03/05 00:11:15 INFO SignalUtils: Registering signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/03/05 00:11:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 00:11:20 INFO SecurityManager: Changing view acls to: ****
21/03/05 00:11:20 INFO SecurityManager: Changing modify acls to: ****
21/03/05 00:11:20 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:11:20 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:11:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(*******); groups with view permissions: Set(); users with modify permissions: Set(*****); groups with modify permissions: Set()
21/03/05 00:11:23 INFO Utils: Successfully started service 'sparkWorker' on port 43757.
21/03/05 00:11:23 INFO Worker: Worker decommissioning not enabled, SIGPWR will result in exiting.
21/03/05 00:11:24 INFO Worker: Starting Spark worker ***.***.**.104:43757 with 1 cores, 1024.0 MiB RAM
21/03/05 00:11:24 INFO Worker: Running Spark version 3.1.1
21/03/05 00:11:24 INFO Worker: Spark home: /opt/spark
21/03/05 00:11:24 INFO ResourceUtils: ==============================================================
21/03/05 00:11:24 INFO ResourceUtils: No custom resources configured for spark.worker.
21/03/05 00:11:24 INFO ResourceUtils: ==============================================================
21/03/05 00:11:24 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
21/03/05 00:11:24 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://pd:8081
21/03/05 00:11:24 INFO Worker: Connecting to master pd:7077...
21/03/05 00:11:24 INFO TransportClientFactory: Successfully created connection to pd/***.***.**.104:7077 after 98 ms (0 ms spent in bootstraps)
21/03/05 00:11:25 INFO Worker: Successfully registered with master spark://pd:7077
21/03/05 00:19:03 INFO Worker: Asked to launch executor app-20210305001902-0000/1 for TwitterSentimentAnalysis
21/03/05 00:19:03 INFO SecurityManager: Changing view acls to: *****
21/03/05 00:19:03 INFO SecurityManager: Changing modify acls to: *****
21/03/05 00:19:03 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:19:03 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:19:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(*****); groups with view permissions: Set(); users with modify permissions: Set(*****); groups with modify permissions: Set()
21/03/05 00:19:04 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=40465" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:40465" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305001902-0000" "--worker-url" "spark://Worker@***.***.**.104:43757"
21/03/05 00:19:41 INFO Worker: Executor app-20210305001902-0000/1 finished with state EXITED message Command exited with code 0 exitStatus 0
21/03/05 00:19:41 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
21/03/05 00:19:41 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210305001902-0000, execId=1)
21/03/05 00:19:41 INFO Worker: Asked to kill unknown executor app-20210305001902-0000/1
21/03/05 00:19:41 INFO ExternalShuffleBlockResolver: Application app-20210305001902-0000 removed, cleanupLocalDirs = true
21/03/05 00:19:41 INFO Worker: Cleaning up local directories for application app-20210305001902-0000
21/03/05 00:47:39 INFO Worker: Asked to launch executor app-20210305004739-0001/1 for TwitterSentimentAnalysis
21/03/05 00:47:39 INFO SecurityManager: Changing view acls to: *****
21/03/05 00:47:39 INFO SecurityManager: Changing modify acls to: ******
21/03/05 00:47:39 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:47:39 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:47:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(********); groups with view permissions: Set(); users with modify permissions: Set(delalma); groups with modify permissions: Set()
21/03/05 00:47:41 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=34007" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:34007" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305004739-0001" "--worker-url" "spark://Worker@***.***.**.104:43757"
21/03/05 00:48:05 INFO Worker: Asked to kill executor app-20210305004739-0001/1
21/03/05 00:48:05 INFO ExecutorRunner: Runner thread for executor app-20210305004739-0001/1 interrupted
21/03/05 00:48:05 INFO ExecutorRunner: Killing process!
21/03/05 00:48:06 INFO Worker: Executor app-20210305004739-0001/1 finished with state KILLED exitStatus 0
21/03/05 00:48:06 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
21/03/05 00:48:06 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210305004739-0001, execId=1)
21/03/05 00:48:06 INFO Worker: Cleaning up local directories for application app-20210305004739-0001
21/03/05 00:48:06 INFO ExternalShuffleBlockResolver: Application app-20210305004739-0001 removed, cleanupLocalDirs = true
21/03/05 00:48:52 INFO Worker: Asked to launch executor app-20210305004852-0002/1 for TwitterSentimentAnalysis
21/03/05 00:48:52 INFO SecurityManager: Changing view acls to: *******
21/03/05 00:48:52 INFO SecurityManager: Changing modify acls to: ******
21/03/05 00:48:52 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:48:52 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:48:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(*******); groups with view permissions: Set(); users with modify permissions: Set(delalma); groups with modify permissions: Set()
21/03/05 00:48:54 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=42517" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:42517" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305004852-0002" "--worker-url" "spark://Worker@***.***.**.104:43757"
21/03/05 00:49:19 INFO Worker: Asked to kill executor app-20210305004852-0002/1
21/03/05 00:49:19 INFO ExecutorRunner: Runner thread for executor app-20210305004852-0002/1 interrupted
21/03/05 00:49:19 INFO ExecutorRunner: Killing process!
21/03/05 00:49:19 INFO Worker: Executor app-20210305004852-0002/1 finished with state KILLED exitStatus 143
21/03/05 00:49:19 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 1
21/03/05 00:49:19 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210305004852-0002, execId=1)
21/03/05 00:49:19 INFO Worker: Cleaning up local directories for application app-20210305004852-0002
21/03/05 00:49:19 INFO ExternalShuffleBlockResolver: Application app-20210305004852-0002 removed, cleanupLocalDirs = true
WORKER LOG ON THE MASTER NODE, FROM THE UI:
Spark Executor Command: "/usr/lib/jvm/java-11-openjdk-amd64/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=42517" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@pd:42517" "--executor-id" "1" "--hostname" "***.***.**.104" "--cores" "1" "--app-id" "app-20210305004852-0002" "--worker-url" "spark://Worker@***.***.**.104:43757"
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 00:49:02 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 11595@pd
21/03/05 00:49:02 INFO SignalUtils: Registering signal handler for TERM
21/03/05 00:49:02 INFO SignalUtils: Registering signal handler for HUP
21/03/05 00:49:02 INFO SignalUtils: Registering signal handler for INT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/03/05 00:49:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 00:49:07 INFO SecurityManager: Changing view acls to: ******
21/03/05 00:49:07 INFO SecurityManager: Changing modify acls to: ******
21/03/05 00:49:07 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:49:07 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:49:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(*****); groups with view permissions: Set(); users with modify permissions: Set(****); groups with modify permissions: Set()
21/03/05 00:49:08 INFO TransportClientFactory: Successfully created connection to pd/***.***.**.104:42517 after 336 ms (0 ms spent in bootstraps)
21/03/05 00:49:09 INFO SecurityManager: Changing view acls to: ******
21/03/05 00:49:09 INFO SecurityManager: Changing modify acls to: *****
21/03/05 00:49:09 INFO SecurityManager: Changing view acls groups to:
21/03/05 00:49:09 INFO SecurityManager: Changing modify acls groups to:
21/03/05 00:49:09 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(*****); groups with view permissions: Set(); users with modify permissions: Set(********); groups with modify permissions: Set()
21/03/05 00:49:10 INFO TransportClientFactory: Successfully created connection to pd/***.***.**.104:42517 after 26 ms (0 ms spent in bootstraps)
21/03/05 00:49:10 INFO DiskBlockManager: Created local directory at /tmp/spark-64626d64-5b29-4a13-8ee2-cc98d61f7a2c/executor-db784ed0-ccba-4e37-b97b-325221d118e0/blockmgr-3da06abd-90af-4a44-a423-c4b5253427df
21/03/05 00:49:10 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
21/03/05 00:49:12 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@pd:42517
21/03/05 00:49:12 INFO WorkerWatcher: Connecting to worker spark://Worker@***.***.**.104:43757
21/03/05 00:49:12 INFO TransportClientFactory: Successfully created connection to /***.***.**.104:43757 after 27 ms (0 ms spent in bootstraps)
21/03/05 00:49:12 INFO WorkerWatcher: Successfully connected to spark://Worker@***.***.**.104:43757
21/03/05 00:49:12 INFO ResourceUtils: ==============================================================
21/03/05 00:49:12 INFO ResourceUtils: No custom resources configured for spark.executor.
21/03/05 00:49:12 INFO ResourceUtils: ==============================================================
21/03/05 00:49:12 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
21/03/05 00:49:12 INFO Executor: Starting executor ID 1 on host ***.***.**.104
21/03/05 00:49:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41941.
21/03/05 00:49:13 INFO NettyBlockTransferService: Server created on ***.***.**.104:41941
21/03/05 00:49:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/05 00:49:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, ***.***.**.104, 41941, None)
21/03/05 00:49:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, ***.***.**.104, 41941, None)
21/03/05 00:49:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, ***.***.**.104, 41941, None)
21/03/05 00:49:19 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
21/03/05 00:49:19 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
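One way to read the master output is to measure how long each application stayed registered. The sketch below is a minimal, hypothetical helper: the name app_lifetimes and the regular expressions are assumptions written against the log format shown in this post, not part of Spark. It pairs each "Registered app ... with ID" line with the matching "Received unregister request" line.

```python
import re
from datetime import datetime

# Patterns for the two master log lines of interest.
# The line layout is assumed from the master output shown in this post.
REGISTERED = re.compile(
    r"^(\S+ \S+) INFO Master: Registered app \S+ with ID (\S+)$"
)
UNREGISTER = re.compile(
    r"^(\S+ \S+) INFO Master: Received unregister request from application (\S+)$"
)
TIME_FORMAT = "%y/%m/%d %H:%M:%S"


def app_lifetimes(log_lines):
    """Return {app_id: seconds between registration and the unregister request}."""
    started = {}
    lifetimes = {}
    for line in log_lines:
        line = line.strip()
        match = REGISTERED.match(line)
        if match:
            started[match.group(2)] = datetime.strptime(match.group(1), TIME_FORMAT)
            continue
        match = UNREGISTER.match(line)
        if match and match.group(2) in started:
            ended = datetime.strptime(match.group(1), TIME_FORMAT)
            delta = ended - started[match.group(2)]
            lifetimes[match.group(2)] = delta.total_seconds()
    return lifetimes


# Two lines copied from the master output for the first run.
sample = [
    "21/03/05 00:19:02 INFO Master: Registered app TwitterSentimentAnalysis with ID app-20210305001902-0000",
    "21/03/05 00:19:40 INFO Master: Received unregister request from application app-20210305001902-0000",
]
print(app_lifetimes(sample))
```

Applied to the three runs in the master output, this gives lifetimes of 38, 26 and 27 seconds. In each run the unregister request comes from the application itself, which suggests the driver program ended on its own rather than the master or a worker failing.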
Comments:
1. Can you show the error logs you are referring to, and the full stack trace? I can only see INFO and WARN logs.
2. @mike I added the stderr from the WORKER LOG ON THE MASTER NODE. Does this help you see what happened? Thanks
3. @mike I also added the full MASTER and WORKER log output.
4. Maybe you forgot to call awaitTermination in your code?
5. @mike thanks, you put me on the right track. The problem is not in the Spark cluster but in the Python code. I followed exactly this tutorial: ch-nabarun.medium.com /. … But I realized that the socket connection returns a 401 error. I am not sure yet how to solve it.
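For readers wondering what the awaitTermination suggestion refers to, here is a minimal sketch, assuming the Structured Streaming API named in the tags. The function name run_stream, the application name, and the use of the built-in "rate" source in place of the Twitter socket are illustrative assumptions, not the asker's code. With the older DStream API, the equivalent blocking call is StreamingContext.awaitTermination().

```python
def run_stream(master_url="local[1]", timeout_seconds=None):
    """Start a toy streaming query and block until it terminates.

    Hypothetical helper for illustration; not taken from the original post.
    """
    # Imported here so the module can be loaded where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url)
        .appName("AwaitTerminationSketch")
        .getOrCreate()
    )

    # The "rate" source generates rows on a timer; it stands in for the
    # Twitter socket source used in the question.
    events = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

    # start() returns immediately; the query runs in the background.
    query = events.writeStream.format("console").outputMode("append").start()

    # Block the driver. Without this call the script reaches its end,
    # the application unregisters, and the executors are told to shut down.
    query.awaitTermination(timeout_seconds)

    # Clean up once the wait returns (timeout reached or query terminated).
    query.stop()
    spark.stop()
```

Calling run_stream(master_url="spark://pd:7077") would point the sketch at the cluster from the question; with timeout_seconds left as None the call blocks until the query stops or fails. As comment 5 notes, the asker's actual cause turned out to be a 401 from the socket connection, so this sketch only illustrates the suggestion in comment 4.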