HADOOP_HOME и hadoop.home.dir не задан Ошибка при попытке использовать sc.setCheckpointDir(«контрольная точка») Пыспарк

#java #apache-spark #hadoop #pyspark #apache-spark-sql

Вопрос:

Я пытаюсь использовать API ALS на pyspark для модели рекомендаций, но получаю java.lang.Ошибка StackOverflowError, в которой после просмотра я увидел инструкции о том, как ее исправить с помощью контрольных точек.

Когда я пытаюсь установить каталог контрольных точек, я получаю сообщение об ошибке «hadoop не найден».

     sc.setCheckpointDir('checkpoint')
 

Это ответ на ошибку

     Py4JJavaError: An error occurred while calling o83.setCheckpointDir.
: java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:736) at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:271) at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:287) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:865) at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:547) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:587) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:559) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:586) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:559) at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:705) at org.apache.spark.SparkContext.$anonfun$setCheckpointDir$2(SparkContext.scala:2483) at scala.Option.map(Option.scala:230) at org.apache.spark.SparkContext.setCheckpointDir(SparkContext.scala:2480) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Unknown Source) Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:548) at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:569) at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:592) at org.apache.hadoop.util.Shell.(Shell.java:689) at org.apache.hadoop.util.StringUtils.(StringUtils.java:78) at org.apache.hadoop.conf.Configuration.getTimeDurationHelper(Configuration.java:1814) at org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1791) at org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183) at org.apache.hadoop.util.ShutdownHookManager$HookEntry.(ShutdownHookManager.java:207) at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:302) at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:181) at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50) at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48) at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153) at org.apache.spark.util.ShutdownHookManager$.(ShutdownHookManager.scala:58) at org.apache.spark.util.ShutdownHookManager$.(ShutdownHookManager.scala) at org.apache.spark.util.Utils$.createTempDir(Utils.scala:326) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:343) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:894) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:1039) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:468) at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:439) at org.apache.hadoop.util.Shell.(Shell.java:516) ... 21 more
 

Я буду признателен за ответ о том, как это исправить. Я работаю в среде Windows