#azure #apache-spark #azure-databricks #spark-submit #typesafe-config
Question:
I am trying to pass a Typesafe config file to a spark-submit task and print the details from the config file.
```scala
import org.slf4j.{Logger, LoggerFactory}
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.sql.SparkSession

object Bootstrap extends MyLogging {

  val spark: SparkSession = SparkSession.builder.enableHiveSupport().getOrCreate()
  val config: Config = ConfigFactory.load("application.conf")

  def main(args: Array[String]): Unit = {
    val url: String = config.getString("db.url")
    val user: String = config.getString("db.user")
    println(url)
    println(user)
  }
}
```
The application.conf file:
```hocon
db {
  url = "jdbc:postgresql://localhost:5432/test"
  user = "test"
}
```
I uploaded the application.conf file to DBFS and used the same path when creating the job.
Spark submit job JSON:
```json
{
  "new_cluster": {
    "spark_version": "6.4.x-esr-scala2.11",
    "azure_attributes": {
      "availability": "ON_DEMAND_AZURE",
      "first_on_demand": 1,
      "spot_bid_max_price": -1
    },
    "node_type_id": "Standard_DS3_v2",
    "enable_elastic_disk": true,
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class", "Bootstrap",
      "--conf", "spark.driver.extraClassPath=dbfs:/tmp/",
      "--conf", "spark.executor.extraClassPath=dbfs:/tmp/",
      "--files", "dbfs:/tmp/application.conf",
      "dbfs:/tmp/code-assembly-0.1.0.jar"
    ]
  },
  "email_notifications": {},
  "name": "application-conf-test",
  "max_concurrent_runs": 1
}
```
I used the above JSON to create the spark-submit job and tried to run it with the Databricks CLI.
Error:
```
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'db'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:147)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
    at Bootstrap$.main(Test.scala:16)
    at Bootstrap.main(Test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
```
I can see the line below in the logs, but the file is not getting loaded:
```
21/09/22 07:21:43 INFO SparkContext: Added file dbfs:/tmp/application.conf at dbfs:/tmp/application.conf with timestamp 1632295303654
21/09/22 07:21:43 INFO Utils: Fetching dbfs:/tmp/application.conf to /local_disk0/spark-20456b30-fddd-42d7-9b23-9e4c0d3c91cd/userFiles-ee199161-6f48-4c47-b1c7-763ce7c0895f/fetchFileTemp4713981355306806616.tmp
```
Please help me pass this Typesafe config file to the spark-submit job using the appropriate spark-submit job parameters.
We have tried the spark_submit_task parameter variants below in the above JSON, but we still face the same issue:
```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraClassPath=/tmp/application.conf", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```

```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraClassPath=/tmp/", "--conf", "spark.executor.extraClassPath=/tmp/", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```

```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraClassPath=dbfs:/tmp/application.conf", "--conf", "spark.executor.extraClassPath=dbfs:/tmp/application.conf", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```

```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraClassPath=dbfs:/tmp/", "--conf", "spark.executor.extraClassPath=dbfs:/tmp/", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```

```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraClassPath=dbfs:./", "--conf", "spark.executor.extraClassPath=dbfs:./", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```

```json
[ "--class", "Bootstrap", "--driver-java-options", "-Dconfig.file=application.conf", "--conf", "spark.executor.extraJavaOptions=-Dconfig.file=application.conf", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```

```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraJavaOptions=-Dconfig.file=application.conf", "--conf", "spark.executor.extraJavaOptions=-Dconfig.file=application.conf", "--files", "dbfs:/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```
Answer #1:
It is easier to explicitly pass the file name as a parameter to the job and refer to it as /dbfs/tmp/application.conf
(you will need to handle this parameter in your code):
```json
[ "--class", "Bootstrap", "dbfs:/tmp/code-assembly-0.1.0.jar", "/dbfs/tmp/application.conf" ]
```
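For example, the main method could read the path from the first job argument and parse the file directly instead of looking it up on the classpath. A minimal sketch (the argument handling here is an assumption, not code from the original answer):

```scala
import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

object Bootstrap {
  def main(args: Array[String]): Unit = {
    // Expect the config path (e.g. /dbfs/tmp/application.conf) as the first argument
    val configPath = args.headOption.getOrElse(
      sys.error("usage: Bootstrap <path-to-application.conf>"))

    // Parse the file from the local filesystem rather than the classpath
    val config: Config = ConfigFactory.parseFile(new File(configPath)).resolve()

    println(config.getString("db.url"))
    println(config.getString("db.user"))
  }
}
```

On Databricks the /dbfs/ FUSE mount exposes DBFS paths to ordinary file I/O, which is why the path is passed as /dbfs/tmp/application.conf rather than dbfs:/tmp/application.conf.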
or refer to it via extra Java options:
```json
[ "--class", "Bootstrap", "--conf", "spark.driver.extraJavaOptions=-Dconfig.file=/dbfs/tmp/application.conf", "dbfs:/tmp/code-assembly-0.1.0.jar" ]
```
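Note that per the Typesafe Config documentation, the config.file system property is only honored by the no-argument ConfigFactory.load(); calling load("application.conf") with an explicit resource name always reads that classpath resource and ignores the property. So for the -Dconfig.file approach, the loading code would need to look roughly like this (a sketch, not code from the original answer):

```scala
import com.typesafe.config.{Config, ConfigFactory}

object Bootstrap {
  // ConfigFactory.load() with no arguments honors -Dconfig.file,
  // whereas ConfigFactory.load("application.conf") does not.
  val config: Config = ConfigFactory.load()

  def main(args: Array[String]): Unit = {
    println(config.getString("db.url"))
    println(config.getString("db.user"))
  }
}
```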
Comments:
1. Hi, I tried the JSONs above for spark-submit, but it did not work. Could you please share the code and the exact JSON you use for spark-submit? That would help me a lot!