#apache-spark #pyspark #cassandra #spark-cassandra-connector
Вопрос:
Я пытался подать заявку на spark, но получил следующее исключение:
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int) WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release 21/11/13 13:17:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2021-11-13T13:17:46 0330 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 21/11/13 13:17:47 INFO SparkContext: Running Spark version 3.2.0 21/11/13 13:17:47 INFO ResourceUtils: ============================================================== 21/11/13 13:17:47 INFO ResourceUtils: No custom resources configured for spark.driver. 21/11/13 13:17:47 INFO ResourceUtils: ============================================================== 21/11/13 13:17:47 INFO SparkContext: Submitted application: examstat 21/11/13 13:17:47 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -gt; name: cores, amount: 1, script: , vendor: , memory -gt; name: memory, amount: 1024, script: , vendor: , offHeap -gt; name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -gt; name: cpus, amount: 1.0) 21/11/13 13:17:47 INFO ResourceProfile: Limiting resource is cpu 21/11/13 13:17:47 INFO ResourceProfileManager: Added ResourceProfile id: 0 21/11/13 13:17:47 INFO SecurityManager: Changing view acls to: alisaberi 21/11/13 13:17:47 INFO SecurityManager: Changing modify acls to: alisaberi 21/11/13 13:17:47 INFO SecurityManager: Changing view acls groups to: 21/11/13 13:17:47 INFO SecurityManager: Changing modify acls groups to: 21/11/13 13:17:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(alisaberi); groups with view permissions: Set(); users with modify permissions: Set(alisaberi); groups with modify permissions: Set() 21/11/13 13:17:47 INFO Utils: Successfully started service 'sparkDriver' on port 62135. 21/11/13 13:17:47 INFO SparkEnv: Registering MapOutputTracker 21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMaster 21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 21/11/13 13:17:47 INFO DiskBlockManager: Created local directory at /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/blockmgr-e6d2444c-2aa6-4690-ac82-7a4ab1d86b6b 21/11/13 13:17:47 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB 21/11/13 13:17:47 INFO SparkEnv: Registering OutputCommitCoordinator 21/11/13 13:17:47 INFO Utils: Successfully started service 'SparkUI' on port 4040. 21/11/13 13:17:47 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.3:4040 21/11/13 13:17:47 INFO SparkContext: Added JAR file:///Users/alisaberi/Desktop/test-great-expectations/spark-cassandra-connector-assembly_2.12-3.1.0.jar at spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038 21/11/13 13:17:47 INFO Executor: Starting executor ID driver on host 192.168.1.3 21/11/13 13:17:47 INFO Executor: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038 21/11/13 13:17:47 INFO TransportClientFactory: Successfully created connection to /192.168.1.3:62135 after 42 ms (0 ms spent in bootstraps) 21/11/13 13:17:47 INFO Utils: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar to /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/fetchFileTemp11862606911562884947.tmp 21/11/13 13:17:48 INFO Executor: Adding file:/private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/spark-cassandra-connector-assembly_2.12-3.1.0.jar to class loader 21/11/13 13:17:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62138. 21/11/13 13:17:48 INFO NettyBlockTransferService: Server created on 192.168.1.3:62138 21/11/13 13:17:48 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 21/11/13 13:17:48 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None) 21/11/13 13:17:48 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.3:62138 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.3, 62138, None) 21/11/13 13:17:48 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None) 21/11/13 13:17:48 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.3, 62138, None) 21/11/13 13:17:48 WARN SparkSession: Cannot use com.datastax.spark.connector.CassandraSparkExtensions to configure session extensions. java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging at java.base/java.lang.ClassLoader.defineClass1(Native Method) at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016) at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151) at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825) at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:468) at org.apache.spark.util.Utils$.classForName(Utils.scala:216) at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1194) at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$applyExtensions(SparkSession.scala:1192) at org.apache.spark.sql.SparkSession.lt;initgt;(SparkSession.scala:104) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:238) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.base/java.lang.Thread.run(Thread.java:832) Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ... 33 more 21/11/13 13:17:48 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. 21/11/13 13:17:48 INFO SharedState: Warehouse path is 'file:/Users/alisaberi/Desktop/test-great-expectations/spark-warehouse'. /Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/context.py:77: FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead. Traceback (most recent call last): File "/Users/alisaberi/Desktop/test-great-expectations/test.py", line 33, in lt;modulegt; sqlContext.read File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 164, in load File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__ File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o56.load. : java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging at java.base/java.lang.ClassLoader.defineClass1(Native Method) at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016) at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151) at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825) at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646) at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:55) at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72) at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81) at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:233) at scala.Option.map(Option.scala:230) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) at py4j.ClientServerConnection.run(ClientServerConnection.java:106) at java.base/java.lang.Thread.run(Thread.java:832) Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ... 28 more 21/11/13 13:17:49 INFO SparkContext: Invoking stop() from shutdown hook 21/11/13 13:17:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.3:4040 21/11/13 13:17:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 21/11/13 13:17:49 INFO MemoryStore: MemoryStore cleared 21/11/13 13:17:49 INFO BlockManager: BlockManager stopped 21/11/13 13:17:49 INFO BlockManagerMaster: BlockManagerMaster stopped 21/11/13 13:17:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 21/11/13 13:17:49 INFO SparkContext: Successfully stopped SparkContext 21/11/13 13:17:49 INFO ShutdownHookManager: Shutdown hook called 21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-ef03b69b-8170-49e1-a24f-af46ff8ada7d 21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/pyspark-42c7c117-c948-4b16-82a6-39017769cff9 21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb
Приложение используется spark-cassandra-connector
для чтения с кассандры. Вот код:
from pyspark.sql import SQLContext, SparkSession from pyspark.context import SparkContext spark = SparkSession .builder .appName("Test") .master('local[*]') .config('spark.cassandra.connection.host', 'localhost') .getOrCreate() spark.read .format("org.apache.spark.sql.cassandra") .options(table="gps", keyspace="test") .load().show()
Я попробовал два разных подхода к подаче заявки:
$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 ./test.py
$SPARK_HOME/bin/spark-submit --jars /Full/Path/to/spark-cassandra-connector-assembly_2.12-3.1.0.jar
Кроме того, когда я запускаю тот же код в pyspark
оболочке, он работает нормально.
Spark 3.2.0
spark-cassandra-connector 3.1.0
cassandra 4.0.1
Комментарии:
1. какая версия Java используется? добавьте вывод
java -version
. Я не могу воспроизвести это — отлично работает с java 8, 11, 15. с обоими подходами2. @AlexOtt Вот результат
java --version
:java 15.0.2 2021-01-19 Java(TM) SE Runtime Environment (build 15.0.2 7-27) Java HotSpot(TM) 64-Bit Server VM (build 15.0.2 7-27, mixed mode, sharing)
3. попробуйте использовать java 11. Официально Spark совместим только с 8 и 11. Из документации: Spark работает на Java 8/11, Scala 2.12
4. @AlexOtt попробовал java 8, но все еще безуспешно
5. странный. работает для меня из коробки с вашим кодом