Getting NoClassDefFoundError with Spark using spark-cassandra-connector 3.1.0

#apache-spark #pyspark #cassandra #spark-cassandra-connector

Question:

I tried to submit a Spark application but got the following exception:

 WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
 WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
 WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
 WARNING: All illegal access operations will be denied in a future release
 21/11/13 13:17:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 2021-11-13T13:17:46+0330 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 21/11/13 13:17:47 INFO SparkContext: Running Spark version 3.2.0
 21/11/13 13:17:47 INFO ResourceUtils: ==============================================================
 21/11/13 13:17:47 INFO ResourceUtils: No custom resources configured for spark.driver.
 21/11/13 13:17:47 INFO ResourceUtils: ==============================================================
 21/11/13 13:17:47 INFO SparkContext: Submitted application: examstat
 21/11/13 13:17:47 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
 21/11/13 13:17:47 INFO ResourceProfile: Limiting resource is cpu
 21/11/13 13:17:47 INFO ResourceProfileManager: Added ResourceProfile id: 0
 21/11/13 13:17:47 INFO SecurityManager: Changing view acls to: alisaberi
 21/11/13 13:17:47 INFO SecurityManager: Changing modify acls to: alisaberi
 21/11/13 13:17:47 INFO SecurityManager: Changing view acls groups to:
 21/11/13 13:17:47 INFO SecurityManager: Changing modify acls groups to:
 21/11/13 13:17:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(alisaberi); groups with view permissions: Set(); users with modify permissions: Set(alisaberi); groups with modify permissions: Set()
 21/11/13 13:17:47 INFO Utils: Successfully started service 'sparkDriver' on port 62135.
 21/11/13 13:17:47 INFO SparkEnv: Registering MapOutputTracker
 21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMaster
 21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
 21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
 21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
 21/11/13 13:17:47 INFO DiskBlockManager: Created local directory at /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/blockmgr-e6d2444c-2aa6-4690-ac82-7a4ab1d86b6b
 21/11/13 13:17:47 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
 21/11/13 13:17:47 INFO SparkEnv: Registering OutputCommitCoordinator
 21/11/13 13:17:47 INFO Utils: Successfully started service 'SparkUI' on port 4040.
 21/11/13 13:17:47 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.3:4040
 21/11/13 13:17:47 INFO SparkContext: Added JAR file:///Users/alisaberi/Desktop/test-great-expectations/spark-cassandra-connector-assembly_2.12-3.1.0.jar at spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038
 21/11/13 13:17:47 INFO Executor: Starting executor ID driver on host 192.168.1.3
 21/11/13 13:17:47 INFO Executor: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038
 21/11/13 13:17:47 INFO TransportClientFactory: Successfully created connection to /192.168.1.3:62135 after 42 ms (0 ms spent in bootstraps)
 21/11/13 13:17:47 INFO Utils: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar to /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/fetchFileTemp11862606911562884947.tmp
 21/11/13 13:17:48 INFO Executor: Adding file:/private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/spark-cassandra-connector-assembly_2.12-3.1.0.jar to class loader
 21/11/13 13:17:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62138.
 21/11/13 13:17:48 INFO NettyBlockTransferService: Server created on 192.168.1.3:62138
 21/11/13 13:17:48 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
 21/11/13 13:17:48 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None)
 21/11/13 13:17:48 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.3:62138 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.3, 62138, None)
 21/11/13 13:17:48 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None)
 21/11/13 13:17:48 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.3, 62138, None)
 21/11/13 13:17:48 WARN SparkSession: Cannot use com.datastax.spark.connector.CassandraSparkExtensions to configure session extensions.
 java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
 	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
 	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
 	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
 	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
 	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
 	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
 	at java.base/java.lang.Class.forName0(Native Method)
 	at java.base/java.lang.Class.forName(Class.java:468)
 	at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
 	at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1194)
 	at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192)
 	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
 	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
 	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$applyExtensions(SparkSession.scala:1192)
 	at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:104)
 	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
 	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
 	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
 	at py4j.Gateway.invoke(Gateway.java:238)
 	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
 	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
 	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
 	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
 	at java.base/java.lang.Thread.run(Thread.java:832)
 Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
 	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
 	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
 	... 33 more
 21/11/13 13:17:48 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
 21/11/13 13:17:48 INFO SharedState: Warehouse path is 'file:/Users/alisaberi/Desktop/test-great-expectations/spark-warehouse'.
 /Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/context.py:77: FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
 Traceback (most recent call last):
   File "/Users/alisaberi/Desktop/test-great-expectations/test.py", line 33, in <module>
     sqlContext.read
   File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 164, in load
   File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
   File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
   File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
 py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
 : java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
 	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
 	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
 	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
 	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
 	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
 	at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:55)
 	at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
 	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
 	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:233)
 	at scala.Option.map(Option.scala:230)
 	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
 	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
 	at py4j.Gateway.invoke(Gateway.java:282)
 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
 	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
 	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
 	at java.base/java.lang.Thread.run(Thread.java:832)
 Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging
 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
 	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
 	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
 	... 28 more
 21/11/13 13:17:49 INFO SparkContext: Invoking stop() from shutdown hook
 21/11/13 13:17:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.3:4040
 21/11/13 13:17:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
 21/11/13 13:17:49 INFO MemoryStore: MemoryStore cleared
 21/11/13 13:17:49 INFO BlockManager: BlockManager stopped
 21/11/13 13:17:49 INFO BlockManagerMaster: BlockManagerMaster stopped
 21/11/13 13:17:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
 21/11/13 13:17:49 INFO SparkContext: Successfully stopped SparkContext
 21/11/13 13:17:49 INFO ShutdownHookManager: Shutdown hook called
 21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-ef03b69b-8170-49e1-a24f-af46ff8ada7d
 21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/pyspark-42c7c117-c948-4b16-82a6-39017769cff9
 21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb

The application uses spark-cassandra-connector to read from Cassandra. Here is the code:

 from pyspark.sql import SQLContext, SparkSession
 from pyspark.context import SparkContext

 spark = SparkSession \
     .builder \
     .appName("Test") \
     .master('local[*]') \
     .config('spark.cassandra.connection.host', 'localhost') \
     .getOrCreate()

 spark.read \
     .format("org.apache.spark.sql.cassandra") \
     .options(table="gps", keyspace="test") \
     .load().show()

I tried two different ways of submitting the application:

$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 ./test.py

$SPARK_HOME/bin/spark-submit --jars /Full/Path/to/spark-cassandra-connector-assembly_2.12-3.1.0.jar
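When the `--jars` approach still fails with a NoClassDefFoundError, one quick sanity check is whether the missing class is actually packaged inside the assembly jar being shipped (a jar is just a zip archive). A minimal sketch; the jar path in the commented example is an assumption to be replaced with your local copy:

```python
import zipfile

def jar_contains(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) contains the given class.

    class_name uses the JVM binary name, e.g.
    'com.datastax.spark.connector.util.Logging'.
    """
    # Class files are stored under slash-separated package paths.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example usage (path is hypothetical; point it at your actual assembly jar):
# jar_contains("/Full/Path/to/spark-cassandra-connector-assembly_2.12-3.1.0.jar",
#              "com.datastax.spark.connector.util.Logging")
```

If the class is missing from the jar, the problem is the artifact itself; if it is present, the problem is more likely how the jar reaches the driver/executor classpath.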

Also, when I run the same code in the pyspark shell, it works fine.

Versions: Spark 3.2.0, spark-cassandra-connector 3.1.0, Cassandra 4.0.1

Comments:

1. Which Java version is being used? Add the output of java -version. I cannot reproduce this; it works fine for me with Java 8, 11, and 15, with both approaches.

2. @AlexOtt Here is the output of java --version: java 15.0.2 2021-01-19 Java(TM) SE Runtime Environment (build 15.0.2+7-27) Java HotSpot(TM) 64-Bit Server VM (build 15.0.2+7-27, mixed mode, sharing)

3. Try Java 11. Spark is officially compatible only with 8 and 11. From the documentation: "Spark runs on Java 8/11, Scala 2.12".
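The version check suggested in comment 3 can be automated. A hedged sketch of parsing the `java -version` banner into a major version number; the regex covers the common legacy ("1.8.0_292") and modern ("11.0.11", "15.0.2") formats, which is an assumption rather than an exhaustive treatment:

```python
import re

def parse_java_major(version_line: str) -> int:
    """Parse the major Java version out of a `java -version` banner line.

    Handles both the legacy "1.8.0_292" scheme (-> 8) and the modern
    "11.0.11" / "15.0.2" scheme (-> 11 / 15).
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    major = int(m.group(1))
    # Pre-Java-9 releases report as 1.x, so the major is the second field.
    return int(m.group(2)) if major == 1 else major

# Per the Spark docs quoted above, Spark 3.2 supports Java 8 and 11 only.
SUPPORTED = {8, 11}
```

A startup assertion such as `assert parse_java_major(banner) in SUPPORTED` would have flagged the Java 15 runtime before any Spark code ran.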

4. @AlexOtt I tried Java 8, but still no luck.

5. Strange. It works for me out of the box with your code.