Visualizing a decision tree with graphviz: how to fix FileNotFoundError?

#graphviz

Question:

I want to visualize a decision tree with graphviz.

I found some example code (https://gist.github.com/WillKoehrsen/ff77f5f308362819805a3defd9495ffd ):

from sklearn.datasets import load_iris
iris = load_iris()

# Model (can also use single decision tree)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)

# Train
model.fit(iris.data, iris.target)
# Extract single tree
estimator = model.estimators_[5]

from sklearn.tree import export_graphviz
# Export as dot file
export_graphviz(estimator, out_file='tree.dot', 
                feature_names = iris.feature_names,
                class_names = iris.target_names,
                rounded = True, proportion = False, 
                precision = 2, filled = True)

# Convert to png using system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])

# Display in jupyter notebook
from IPython.display import Image
Image(filename = 'tree.png')
 

It should output an image of the rendered tree (image not included here).

However, when I run this in a Jupyter notebook, I get a FileNotFoundError:

 ---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-3-6d9aafea91ef> in <module>()
     21 # Convert to png using system command (requires Graphviz)
     22 from subprocess import call
---> 23 call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
     24 
     25 # Display in jupyter notebook

C:\ProgramData\Anaconda3\lib\subprocess.py in call(timeout, *popenargs, **kwargs)
    302     retcode = call(["ls", "-l"])
    303     """
--> 304     with Popen(*popenargs, **kwargs) as p:
    305         try:
    306             return p.wait(timeout=timeout)

C:\ProgramData\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    754                                 c2pread, c2pwrite,
    755                                 errread, errwrite,
--> 756                                 restore_signals, start_new_session)
    757         except:
    758             # Cleanup if the child failed starting.

C:\ProgramData\Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1153                                          env,
   1154                                          os.fspath(cwd) if cwd is not None else None,
-> 1155                                          startupinfo)
   1156             finally:
   1157                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] Das System kann die angegebene Datei nicht finden
 

(The last message says that my system could not find the specified file.)

From an Anaconda Prompt in administrator mode, I ran conda install -c anaconda graphviz without errors:

 Collecting package metadata: done
Solving environment: 
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - defaults/win-64::anaconda==5.3.1=py37_0
  - defaults/win-64::astropy==3.0.4=py37hfa6e2cd_0
  - defaults/win-64::bkcharts==0.2=py37_0
  - defaults/win-64::blaze==0.11.3=py37_0
  - defaults/win-64::bokeh==0.13.0=py37_0
  - defaults/win-64::bottleneck==1.2.1=py37h452e1ab_1
  - defaults/win-64::dask==0.19.1=py37_0
  - defaults/win-64::datashape==0.5.4=py37_1
  - defaults/win-64::h5py==2.8.0=py37h3bdd7fb_2
  - defaults/win-64::imageio==2.4.1=py37_0
  - defaults/win-64::matplotlib==2.2.3=py37hd159220_0
  - defaults/win-64::mkl-service==1.1.2=py37hb217b18_5
  - defaults/win-64::mkl_fft==1.0.4=py37h1e22a9b_1
  - defaults/win-64::mkl_random==1.0.1=py37h77b88f5_1
  - defaults/win-64::numba==0.39.0=py37h830ac7b_0
  - defaults/win-64::numexpr==2.6.8=py37h9ef55f4_0
  - defaults/win-64::numpy==1.15.1=py37ha559c80_0
  - defaults/win-64::numpy-base==1.15.1=py37h8128ebf_0
  - defaults/win-64::odo==0.5.1=py37_0
  - defaults/win-64::pandas==0.23.4=py37h830ac7b_0
  - defaults/win-64::patsy==0.5.0=py37_0
  - defaults/win-64::pytables==3.4.4=py37he6f6034_0
  - defaults/win-64::pytest-arraydiff==0.2=py37h39e3cac_0
  - defaults/win-64::pytest-astropy==0.4.0=py37_0
  - defaults/win-64::pytest-doctestplus==0.1.3=py37_0
  - defaults/win-64::pywavelets==1.0.0=py37h452e1ab_0
  - defaults/win-64::scikit-image==0.14.0=py37h6538335_1
  - defaults/win-64::scikit-learn==0.19.2=py37heebcf9a_0
  - defaults/win-64::scipy==1.1.0=py37h4f6bf74_1
  - defaults/win-64::seaborn==0.9.0=py37_0
  - defaults/win-64::statsmodels==0.9.0=py37h452e1ab_0
done

## Package Plan ##

  environment location: C:\ProgramData\Anaconda3

  added / updated specs:
    - graphviz


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.1.23  |                0         158 KB  anaconda
    certifi-2019.3.9           |           py37_0         155 KB  anaconda
    conda-4.6.12               |           py37_1         2.1 MB  anaconda
    graphviz-2.38.0            |                4        37.7 MB  anaconda
    openssl-1.1.1              |       he774522_0         5.7 MB  anaconda
    vc-14.1                    |       h21ff451_3           5 KB  anaconda
    vs2015_runtime-15.5.2      |                3         2.2 MB  anaconda
    ------------------------------------------------------------
                                           Total:        48.1 MB

The following packages will be UPDATED:

  openssl            conda-forge::openssl-1.1.1b-hfa6e2cd_2 --> anaconda::openssl-1.1.1-he774522_0
  vs2015_runtime     pkgs/main::vs2015_runtime-14.15.26706~ --> anaconda::vs2015_runtime-15.5.2-3

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    conda-forge::ca-certificates-2019.3.9~ --> anaconda::ca-certificates-2019.1.23-0
  certifi                                       conda-forge --> anaconda
  conda                    conda-forge::conda-4.6.12-py37_2 --> anaconda::conda-4.6.12-py37_1
  graphviz           conda-forge::graphviz-2.38.0-h6538335~ --> anaconda::graphviz-2.38.0-4
  vc                          pkgs/main::vc-14.1-h0510ff6_4 --> anaconda::vc-14.1-h21ff451_3


Proceed ([y]/n)? y


Downloading and Extracting Packages
certifi-2019.3.9     | 155 KB    | ############################################################################ | 100%
conda-4.6.12         | 2.1 MB    | ############################################################################ | 100%
graphviz-2.38.0      | 37.7 MB   | ############################################################################ | 100%
openssl-1.1.1        | 5.7 MB    | ############################################################################ | 100%
vc-14.1              | 5 KB      | ############################################################################ | 100%
ca-certificates-2019 | 158 KB    | ############################################################################ | 100%
vs2015_runtime-15.5. | 2.2 MB    | ############################################################################ | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
 

Surprisingly, it worked just fine when I ran the code on kaggle.com.

Any ideas about what the problem might be and how to solve it?

Answer #1:

Just solved it.

The trick was to add the graphviz path to the Windows environment variables (PATH); here is a good description: https://bobswift.atlassian.net/wiki/spaces/GVIZ/pages/20971549/How+to+install+Graphviz+software

After that I restarted my computer, and voilà.
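
The same idea can be checked from inside the notebook. The [WinError 2] in the question refers to the dot executable that call() tries to start, not to tree.dot. What follows is only a minimal sketch under assumptions: the helper name ensure_on_path and the two candidate directories are hypothetical, and the directories are guesses at where Graphviz might be installed, so adjust them to your machine. The helper looks for the executable with shutil.which and, if it is missing, extends PATH for the current Python process only.

```python
import os
import shutil


def ensure_on_path(executable, candidate_dirs):
    """Return the full path to `executable`, or None if it cannot be found.

    If the executable is not on PATH yet, try each existing directory in
    `candidate_dirs` and append the first one that contains it to PATH
    (for the current process only).
    """
    found = shutil.which(executable)
    if found:
        return found
    for directory in candidate_dirs:
        if not os.path.isdir(directory):
            continue
        new_path = os.environ.get("PATH", "") + os.pathsep + directory
        found = shutil.which(executable, path=new_path)
        if found:
            os.environ["PATH"] = new_path
            return found
    return None


# Hypothetical install locations (assumptions, not taken from the question).
candidates = [
    r"C:\ProgramData\Anaconda3\Library\bin\graphviz",
    r"C:\Program Files (x86)\Graphviz2.38\bin",
]

dot_path = ensure_on_path("dot", candidates)
print(dot_path or "dot not found; check where Graphviz is installed")
```

If dot_path is not None, it can be passed to call() in place of 'dot', for example call([dot_path, '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600']). Unlike editing the Windows environment variables, this change lasts only for the running kernel, so it does not need a restart but has to be repeated in each session.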