Mongodb 4.4. Члены набора реплик застряли при запуске (шаг WT_VERB_RECOVERY)

#mongodb #wiredtiger

Вопрос:

После перезапуска участника реплики (возможно, нечистого) процесс запуска сервера зависает без каких-либо признаков использования процессора или ввода-вывода диска.

Выходные данные журнала являются следующими:

 
{"t":{"$date":"2021-09-20T18:15:43.621 00:00"},"s":"I", "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2021-09-20T18:15:43.623 00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2021-09-20T18:15:43.634 00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-09-20T18:15:43.634 00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2021-09-20T18:15:43.634 00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-09-20T18:15:43.703 00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":15724,"port":27017,"dbPath":"/mnt/mongodb_volume/mongodb_db","architecture":"64-bit","host":"mongodb-his
tory-nyc3-r1.supplyspy.com"}}
{"t":{"$date":"2021-09-20T18:15:43.703 00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.8","gitVersion":"83b8bb8b6b325d8d8d3dfd2ad9f744bdad7d6ca0","openSSLVersion":"OpenSSL 1.1
.1  11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-09-20T18:15:43.703 00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}}
{"t":{"$date":"2021-09-20T18:15:43.703 00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/mnt/mongodb_volume/mongodb_db/automation-mongod.conf","net":{"bindIp":"0.0.0.
0","port":27017},"operationProfiling":{"mode":"slowOp","slowOpThresholdMs":4000},"processManagement":{"fork":true,"timeZoneInfo":"/usr/share/zoneinfo"},"replication":{"enableMajorityReadConcern":false,"replSetName":"history-main"},"security":{"authoriz
ation":"enabled","keyFile":"/var/lib/mongodb-mms-automation/keyfile"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-1"},"storage":{"dbPath":"/mnt/mongodb_volume/mongodb_db","engine":"wiredTiger","journal":{"commitIntervalMs":200,"enabled":true}
},"systemLog":{"destination":"file","logAppend":true,"path":"/var/log/mongodb/mongod.log"}}}}
{"t":{"$date":"2021-09-20T18:15:43.704 00:00"},"s":"W",  "c":"STORAGE",  "id":22271,   "ctx":"initandlisten","msg":"Detected unclean shutdown - Lock file is not empty","attr":{"lockFile":"/mnt/mongodb_volume/mongodb_db/mongod.lock"}}
{"t":{"$date":"2021-09-20T18:15:43.704 00:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
{"t":{"$date":"2021-09-20T18:15:43.704 00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=3476M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false
,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progr
ess],"}}
{"t":{"$date":"2021-09-20T18:15:45.519 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161745:519173][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering lo
g 167037 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:45.892 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161745:892803][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering lo
g 167038 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.051 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:51008][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log
 167039 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.141 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:141837][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROG
RESS] Main recovery loop: starting at 167037/84361856 to 167039/256"}}
{"t":{"$date":"2021-09-20T18:15:46.142 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:142283][15724:0x7ff77c0e9ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering lo
g 167037 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.434 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:434289][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY_PRO
GRESS] Recovering log 167038 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.745 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:745696][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY_PRO
GRESS] Recovering log 167039 through 167039"}}
{"t":{"$date":"2021-09-20T18:15:46.800 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:800525][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY | W
T_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (1632161384, 162)"}}
{"t":{"$date":"2021-09-20T18:15:46.800 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632161746:800622][15724:0x7ff77c0e9ac0], file:sizeStorer.wt, txn-recover: [WT_VERB_RECOVERY | W
T_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (1632161379, 162)"}}
 

Я уже более 2 часов не получаю никаких других выходных данных.

Размер данных на диске составляет ~700 ГБ.


Редактировать: После долгого ожидания я получил больше выходных данных:

 {"t":{"$date":"2021-09-20T22:23:55.310 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:306696][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 66 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.320 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:320968][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 67 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.354 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:354841][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 68 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.355 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:355064][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 69 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.359 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:359206][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 70 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.362 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:362785][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 71 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.370 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:370297][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 72 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.374 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:374281][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 73 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:23:55.382 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632176635:382329][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 14888 seconds and has inspected 74 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.560 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:560668][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 75 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.563 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:563027][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 76 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.563 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:563135][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 77 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.563 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:563211][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 78 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-20T22:53:48.611 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632178428:611968][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 16681 seconds and has inspected 79 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.052 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:52779][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 80 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.059 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:59610][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 81 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.071 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:71578][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 82 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.078 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:78379][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 83 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.082 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:82011][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 84 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.084 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:84104][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 85 files. For more detailed logging, enable WT_VERB_RTS"}}
{"t":{"$date":"2021-09-21T09:33:46.086 00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1632216826:86957][15724:0x7ff77c0e9ac0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 55079 seconds and has inspected 86 files. For more detailed logging, enable WT_VERB_RTS"}}


 

Ответ №1:

«Обнаружено нечистое завершение работы» означает, что сервер не был остановлен корректно. В зависимости от характера завершения работы в файлах данных могут присутствовать несоответствия или повреждения.

Поэтому теперь он проверяет файлы данных и возвращает их в самосогласованное состояние.

Сообщения «был запущен для» показывают, что он добивается прогресса. Дайте ему закончить, и, надеюсь, он сможет вернуть данные в пригодное для использования состояние.

Комментарии:

1. Через 3 ночи он в конце концов закончил процесс восстановления.