#python #apache-spark #pyspark #apache-kafka
Question:
I am trying to run a Spark Kafka stream over SSL:
kafkaParams = {
    'metadata.broker.list': 'localhost:8080, anotherhost:8080',
    'security.protocol': 'ssl',
    'ssl.truststore.location': <location to .pem file>
}
ks = KafkaUtils.createDirectStream(ssc, [topic], kafkaParams)
When starting KafkaUtils with SSL, I get the following:
2020-10-14 02:06:55 WARN VerifiableProperties:83 - Property security.protocol is not valid
2020-10-14 02:06:55 WARN VerifiableProperties:83 - Property ssl.truststore.location is not valid
Traceback (most recent call last):
File "transformer_clc.py", line 27, in <module>
ks = KafkaUtils.createDirectStream(ssc, [topic], kafkaParams)
File "site-packages/pyspark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 146, in createDirectStream
File "site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o49.createDirectStreamWithoutMessageHandler.
: org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Configuration details:
Spark: 2.4.0, Scala: 2.11.12
I use the following command to start the consumer:
spark-submit --jars jars-old/spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar transformer_clc.py
Any help is appreciated.
Comments:
1. Do you really need Spark Streaming? Why not use Structured Streaming, which is easier to use?
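Following that suggestion, here is a minimal Structured Streaming sketch of the same consumer. This is an illustration under stated assumptions, not the asker's code: the topic name and truststore path are hypothetical, and it assumes the Kafka SQL connector is supplied at submit time (e.g. `--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0` for Spark 2.4.0 / Scala 2.11). Note that in Structured Streaming, Kafka client properties must be prefixed with `kafka.`, and the Kafka clients bundled with Spark 2.4 expect a JKS or PKCS12 truststore, not a .pem file.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("transformer_clc").getOrCreate()

# Kafka source options: every Kafka client property carries the "kafka."
# prefix; the truststore path and topic name below are placeholders.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:8080,anotherhost:8080")
      .option("kafka.security.protocol", "SSL")
      .option("kafka.ssl.truststore.location", "/path/to/truststore.jks")
      .option("subscribe", "my-topic")
      .load())

# Kafka delivers key/value as binary; cast to strings before processing.
query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())
query.awaitTermination()
```

Unlike the 0.8 `KafkaUtils.createDirectStream` path, which predates SSL support in the Kafka consumer, this connector is built on the 0.10+ client and understands the security options; running the sketch requires a live, SSL-enabled broker.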