fluentd cannot connect to elasticsearch in the cluster

#elasticsearch #kubernetes #fluentd #rke

Question:

I tried to set up an EFK stack. While E and K work fine in the default namespace, the Fluentd container cannot connect to elasticsearch.

 kubectl get services -n default
NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
elasticsearch-master            ClusterIP   10.43.40.136    <none>        9200/TCP,9300/TCP   92m
elasticsearch-master-headless   ClusterIP   None            <none>        9200/TCP,9300/TCP   92m
kibana-kibana                   ClusterIP   10.43.152.189   <none>        5601/TCP            74m
kubernetes                      ClusterIP   10.43.0.1       <none>        443/TCP             14d
 
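
A quick way to double-check that Elasticsearch itself answers on that service (a sketch, assuming the chart defaults, i.e. no TLS or authentication on port 9200):

 # Forward the service locally and query cluster health:
kubectl -n default port-forward svc/elasticsearch-master 9200:9200 &
sleep 2
curl -s http://localhost:9200/_cluster/health?pretty
 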

I installed fluentd from this repository and changed the URL in it to point at elasticsearch:

https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch-rbac.yaml
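
In that manifest the elasticsearch output is configured through environment variables, so the change boils down to something like the following (the FLUENT_ELASTICSEARCH_* variable names come from that repo's manifest; the daemonset name and the exact host value here are my assumptions):

 # Point the daemonset at the in-cluster service FQDN instead of the manifest default:
kubectl -n kube-system set env daemonset/fluentd \
  FLUENT_ELASTICSEARCH_HOST=elasticsearch-master.default.svc.cluster.local \
  FLUENT_ELASTICSEARCH_PORT=9200 \
  FLUENT_ELASTICSEARCH_SCHEME=http
 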

 kubectl -n kube-system get pods | grep fluentd
fluentd-4fd2s                                1/1     Running     0          51m
fluentd-7t2v5                                1/1     Running     0          49m
fluentd-dfnfg                                1/1     Running     0          50m
fluentd-lvrsv                                1/1     Running     0          48m
fluentd-rv4td                                1/1     Running     0          50m
 
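
All pods are Running; to be sure the edited URL actually made it into them, the environment can be inspected (a sketch; pod name taken from the listing above, and it assumes printenv/grep exist in the image):

 # Confirm the pods picked up the intended Elasticsearch target:
kubectl -n kube-system exec fluentd-dfnfg -- printenv | grep -i elasticsearch
 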

but the log tells me:

 2021-07-23 21:38:59 +0000 [info]: starting fluentd-1.13.2 pid=7 ruby="2.6.8"
2021-07-23 21:38:59 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.6.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "-r", "/fluentd/vendor/bundle/ruby/2.6.0/gems/fluent-plugin-elasticsearch-5.0.5/lib/fluent/plugin/elasticsearch_simple_sniffer.rb", "--under-supervisor"]
2021-07-23 21:39:01 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2021-07-23 21:39:01 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2021-07-23 21:39:01 +0000 [warn]: #0 [filter_kube_metadata] !! The environment variable 'K8S_NODE_NAME' is not set to the node name which can affect the API server and watch efficiency !!
2021-07-23 21:39:01 +0000 [info]: adding match pattern="**" type="elasticsearch"
2021-07-23 21:39:09 +0000 [warn]: #0 [out_es] Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached
2021-07-23 21:39:09 +0000 [warn]: #0 [out_es] Remaining retry: 14. Retry to communicate after 2 second(s).
2021-07-23 21:39:18 +0000 [warn]: #0 [out_es] Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached
2021-07-23 21:39:18 +0000 [warn]: #0 [out_es] Remaining retry: 13. Retry to communicate after 4 second(s).
2021-07-23 21:39:31 +0000 [warn]: #0 [out_es] Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached
2021-07-23 21:39:31 +0000 [warn]: #0 [out_es] Remaining retry: 12. Retry to communicate after 8 second(s).
2021-07-23 21:39:52 +0000 [warn]: #0 [out_es] Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached
2021-07-23 21:39:52 +0000 [warn]: #0 [out_es] Remaining retry: 11. Retry to communicate after 16 second(s).
2021-07-23 21:40:29 +0000 [warn]: #0 [out_es] Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached
2021-07-23 21:40:29 +0000 [warn]: #0 [out_es] Remaining retry: 10. Retry to communicate after 32 second(s).
2021-07-23 21:41:38 +0000 [warn]: #0 [out_es] Could not communicate to Elasticsearch, resetting connection and trying again. connect_write timeout reached
 

I installed dig, and the service name resolves without any problem:

 root@fluentd-dfnfg:/home/fluent# nslookup elasticsearch-master.default.svc.cluster.local
Server:     10.43.0.10
Address:    10.43.0.10#53

Name:   elasticsearch-master.default.svc.cluster.local
Address: 10.43.40.136
 
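
Since resolution works, the connect_write timeout suggests the TCP connection itself never completes rather than a DNS problem. A check of raw reachability from inside one of the fluentd pods would look roughly like this (a sketch; it assumes curl is available in the image, and uses a pod name from the listing above):

 # DNS resolves, so test whether a TCP connection to port 9200 actually succeeds:
kubectl -n kube-system exec -it fluentd-dfnfg -- \
  curl -sv --max-time 5 http://elasticsearch-master.default.svc.cluster.local:9200
 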

Beyond that, I have run out of ideas.

PS: I am running hardened RKE2. (https://github.com/rancherfederal/rke2-ansible)
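
A hardened profile may ship restrictive NetworkPolicies that drop cross-namespace traffic such as kube-system -> default (this is an assumption about the profile, not something confirmed above); listing them is a cheap check:

 # Look for NetworkPolicies that could block traffic from kube-system to default:
kubectl get networkpolicy -A
 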