org.apache.kafka.common.errors.TimeoutException: Global task did not make progress to restore state within x ms when redeploying the application on K8s

#spring-boot #apache-kafka #apache-kafka-streams #spring-cloud-stream #spring-cloud-stream-binder-kafka

Question:

This application uses Spring Cloud Stream to create a global table and is deployed on Kubernetes as a StatefulSet. It mounts a volume for the RocksDB contents through a persistent volume claim backed by a Google Cloud regional persistent disk:

apiVersion: apps/v1
kind: StatefulSet
metadata:
...
spec:
  ...
  template:
    metadata:
      ...
    spec:
      containers:
      - env:
        ...
        volumeMounts:
        - mountPath: /tmp/kafka-streams
          name: rocksdb
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: rocksdb
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: regionalpd-storageclass
      volumeMode: Filesystem
    status:
      phase: Pending
 

It works fine, but sometimes, after I update the image to a new version of the application, the new container fails to start. It hangs until it finally throws the following exception:

 org.springframework.context.ApplicationContextException: Failed to start bean 'streamsBuilderFactoryManager'; nested exception is org.springframework.kafka.KafkaException: Could not start stream: ; nested exception is org.springframework.kafka.KafkaException: Could not start stream: ; nested exception is org.apache.kafka.streams.errors.StreamsException: Exception caught during initialization of GlobalStreamThread
    at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:181)
    at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:54)
    at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:356)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
    at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
    at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:586)
    at org.springframework.boot.web.reactive.context.ReactiveWebServerApplicationContext.refresh(ReactiveWebServerApplicationContext.java:64)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:754)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:434)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:338)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1343)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1332)
    at io.xxx.MyApplicationKt.main(SentrioGatewayApplication.kt:27)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
Caused by: org.springframework.kafka.KafkaException: Could not start stream: ; nested exception is org.springframework.kafka.KafkaException: Could not start stream: ; nested exception is org.apache.kafka.streams.errors.StreamsException: Exception caught during initialization of GlobalStreamThread
    at org.springframework.cloud.stream.binder.kafka.streams.StreamsBuilderFactoryManager.start(StreamsBuilderFactoryManager.java:96)
    at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:178)
    ... 22 common frames omitted
Caused by: org.springframework.kafka.KafkaException: Could not start stream: ; nested exception is org.apache.kafka.streams.errors.StreamsException: Exception caught during initialization of GlobalStreamThread
    at org.springframework.kafka.config.StreamsBuilderFactoryBean.start(StreamsBuilderFactoryBean.java:333)
    at org.springframework.cloud.stream.binder.kafka.streams.StreamsBuilderFactoryManager.start(StreamsBuilderFactoryManager.java:87)
    ... 23 common frames omitted
Caused by: org.apache.kafka.streams.errors.StreamsException: Exception caught during initialization of GlobalStreamThread
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread.initialize(GlobalStreamThread.java:400)
    at org.apache.kafka.streams.processor.internals.GlobalStreamThread.run(GlobalStreamThread.java:281)
Caused by: org.apache.kafka.common.errors.TimeoutException: Global task did not make progress to restore state within 300000 ms. Adjust `task.timeout.ms` if needed.
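
For context, the last line of the trace names the Kafka Streams setting `task.timeout.ms`, and the volume above is mounted at `/tmp/kafka-streams`, which I assume is the application's `state.dir`. The sketch below only shows where those two keys live in a plain Kafka Streams configuration. The class name `StreamsTimeoutConfig` and the 600000 ms value are hypothetical and chosen for illustration; raising the timeout is not known to resolve the stalled restore.

```java
import java.util.Properties;

// Hypothetical helper, not part of the application described above.
final class StreamsTimeoutConfig {

    // Builds Kafka Streams properties with an explicit state directory and task timeout.
    static Properties build(String stateDir, long taskTimeoutMs) {
        Properties props = new Properties();
        // "state.dir" is the directory where Kafka Streams keeps its RocksDB stores.
        props.setProperty("state.dir", stateDir);
        // "task.timeout.ms" is the setting named in the TimeoutException message.
        props.setProperty("task.timeout.ms", Long.toString(taskTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // 600000 ms is an arbitrary illustrative value, not a recommendation.
        Properties props = build("/tmp/kafka-streams", 600_000L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With the Spring Cloud Stream Kafka Streams binder, the same keys would normally be passed through the binder's `spring.cloud.stream.kafka.streams.binder.configuration.*` properties rather than built by hand.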
 

When this happens, I have to delete the StatefulSet and its persistent volume claims and create them again, which causes downtime.

Why am I getting this exception? How can I fix it?