MongoDB config server used too much memory and was OOM-killed in Docker

#mongodb #memory

Question:

This is my first thread here, so I'll try to describe my question in detail. I have a MongoDB sharded cluster running in Docker, consisting of 2 mongos members, 1 config server (a replica set with 3 members), and 2 shards (each shard is also a replica set with 3 members). Each member runs in its own Docker container. I can't figure out why my primary config server was killed by the OOM killer at some point.

I have to give the config server much more memory to prevent this, even though I set a limit on the WiredTiger cache in the config file, as shown below. At first the config server ran fine with 2G of memory, but now it has 8G and almost 99% of it is in use. Each of the two shards holds no more than 28G of data.

    # wiredTiger memory limit in config file
    wiredTigerCacheSizeGB = 1

    ---

    # docker stats $dockerId
    CONTAINER      CPU %   MEM USAGE / LIMIT   MEM %    NET I/O     BLOCK I/O             PIDS
    d-f2uxxxp41y   3.86%   7.943 GiB / 8 GiB   99.29%   0 B / 0 B   39.52 GB / 1.539 TB   0

Below is what I could find in the mongod log and in the output of db.serverStatus. I know the OS can take all the memory that WiredTiger does not allocate, but I want to know how the OS can be using so much, and for what. Where can I find that out? I hope someone here can offer some help:

    # ps aux
    USER  PID  %CPU  %MEM  VSZ       RSS      TTY  STAT  START  TIME      COMMAND
    root  1    0.0   0.0   11684     1308     ?    Ss    2020   0:00      /bin/bash /entrypoint.sh
    root  160  0.0   0.0   20612     804      ?    Ss    2020   12:33     crond
    root  161  10.8  2.8   15042880  7611028  ?    SLl   2020   93735:18  /opt/jmiss_mongo/mongod --config /var/jmiss_mongo/config/configsvr.conf

    ---

    # mongostat
    insert  query  update  delete  getmore  command  dirty  used   flushes  vsize  res    qrw  arw  net_in  net_out  conn  set                         repl  time
    *0      9      *0      *0      0        11|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  896b    68.7k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:41.036
    *0      2      2       *0      11       16|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  12.7k   80.1k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:42.036
    *0      *0     *0      *0      0        11|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  1.21k   70.4k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:43.036
    *0      *0     *0      *0      0        9|0      0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  793b    69.6k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:44.036
    *0      *0     *0      *0      1        11|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  1.15k   68.2k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:45.035
    *0      *0     *0      *0      5        11|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  5.71k   72.2k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:46.036
    *0      *0     *0      *0      1        11|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  1.15k   68.5k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:47.035
    *0      *0     *0      *0      0        7|0      0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  1.17k   69.4k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:48.036
    *0      *0     *0      *0      1        13|0     0.4%   73.3%  0        14.3G  7.24G  0|0  1|0  1.27k   71.0k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:49.034
    *0      *0     *0      *0      0        6|0      0.0%   73.3%  1        14.3G  7.24G  0|0  1|0  920b    66.7k    30    mgset-1725952967_configsvr  PRI   Oct 28 08:16:50.036
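The three tools quoted so far report memory in different units, so a quick conversion makes them easier to compare. The following is only a sketch in Python (not part of the original post) using the figures quoted above. It assumes that `ps` reports RSS in KiB and that the samples were taken close enough together to be comparable; the variable names are illustrative.

```python
# Cross-check of the memory figures quoted in the post.
# Assumption: ps reports RSS in KiB (the usual Linux behaviour).
KIB_PER_GIB = 1024 * 1024

ps_rss_kib = 7611028          # RSS column for mongod in `ps aux`
mongostat_res_gib = 7.24      # `res` column in mongostat
docker_usage_gib = 7.943      # MEM USAGE in `docker stats`
docker_limit_gib = 8.0        # LIMIT in `docker stats`

ps_rss_gib = ps_rss_kib / KIB_PER_GIB

# Memory charged to the container that is not part of mongod's RSS.
gap_gib = docker_usage_gib - ps_rss_gib

# Share of the container limit in use, as docker stats would show it.
docker_percent = docker_usage_gib / docker_limit_gib * 100

print(f"mongod RSS (ps):        {ps_rss_gib:.2f} GiB")
print(f"mongod res (mongostat): {mongostat_res_gib:.2f} GiB")
print(f"container usage:        {docker_usage_gib:.2f} GiB ({docker_percent:.2f}% of limit)")
print(f"usage not in mongod RSS: {gap_gib:.2f} GiB")
```

Under these assumptions `ps` and mongostat agree closely, so nearly all of the container's usage is the mongod process itself. The remaining part may be page cache charged to the container, depending on how the Docker version in use accounts for it; that is something to verify, not a conclusion from these numbers.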
    mgset-172xxx_configsvr:PRIMARY> show dbs
    admin   0.000GB
    config  0.149GB
    local   2.596GB

    mgset-172xxx_configsvr:PRIMARY> db.serverStatus().mem
    {
      "bits" : 64,
      "resident" : 7432,
      "virtual" : 14690,
      "supported" : true,
      "mapped" : 0,
      "mappedWithJournal" : 0
    }

    mgset-172xxx_configsvr:PRIMARY> db.serverStatus().tcmalloc
    {
      "generic" : {
        "current_allocated_bytes" : NumberLong("12324143616"),
        "heap_size" : NumberLong("13836734464")
      },
      "tcmalloc" : {
        "pageheap_free_bytes" : 154181632,
        "pageheap_unmapped_bytes" : 1024569344,
        "max_total_thread_cache_bytes" : NumberLong(1073741824),
        "current_total_thread_cache_bytes" : 24271352,
        "total_free_bytes" : 333839872,
        "central_cache_free_bytes" : 306076200,
        "transfer_cache_free_bytes" : 3492320,
        "thread_cache_free_bytes" : 24271352,
        "aggressive_memory_decommit" : 0,
        "pageheap_committed_bytes" : NumberLong("12812165120"),
        "pageheap_scavenge_count" : 20856825,
        "pageheap_commit_count" : 27858250,
        "pageheap_total_commit_bytes" : NumberLong("21789588615168"),
        "pageheap_decommit_count" : 20856825,
        "pageheap_total_decommit_bytes" : NumberLong("21776776450048"),
        "pageheap_reserve_count" : 6709,
        "pageheap_total_reserve_bytes" : NumberLong("13836734464"),
        "spinlock_total_delay_ns" : NumberLong("665771200416"),
        ------------------------------------------------
        MALLOC:    12290579360 (11721.2 MiB) Bytes in use by application
        MALLOC: +    187674624 (  179.0 MiB) Bytes in page heap freelist
        MALLOC: +    306094552 (  291.9 MiB) Bytes in central cache freelist
        MALLOC: +      3133024 (    3.0 MiB) Bytes in transfer cache freelist
        MALLOC: +     24618024 (   23.5 MiB) Bytes in thread cache freelists
        MALLOC: +    112812288 (  107.6 MiB) Bytes in malloc metadata
        MALLOC:   ------------
        MALLOC: =  12924911872 (12326.2 MiB) Actual memory used (physical + swap)
        MALLOC: +   1024634880 (  977.2 MiB) Bytes released to OS (aka unmapped)
        MALLOC:   ------------
        MALLOC: =  13949546752 (13303.3 MiB) Virtual address space used
        MALLOC:
        MALLOC:        1696320 Spans in use
        MALLOC:             96 Thread heaps in use
        MALLOC:           4096 Tcmalloc page size
        ------------------------------------------------
        Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
        Bytes released to the OS take up virtual address space but no physical memory.
      }
    }

    mgset-172xxx_configsvr:PRIMARY> db.serverStatus().wiredTiger.cache
    {
      "application threads page read from disk to cache count" : 23400,
      "application threads page read from disk to cache time (usecs)" : 3238111,
      "application threads page write from cache to disk count" : 44687084,
      "application threads page write from cache to disk time (usecs)" : 1003631736,
      "bytes belonging to page images in the cache" : 261364892,
      "bytes belonging to the cache overflow table in the cache" : 182,
      "bytes currently in the cache" : 773199299,
      "bytes not belonging to page images in the cache" : 511834406,
      "bytes read into cache" : 619883360,
      "bytes written from cache" : 723664179500,
      "cache overflow cursor application thread wait time (usecs)" : 0,
      "cache overflow cursor internal thread wait time (usecs)" : 0,
      "cache overflow score" : 0,
      "cache overflow table entries" : 0,
      "cache overflow table insert calls" : 0,
      "cache overflow table remove calls" : 0,
      "checkpoint blocked page eviction" : 6,
      "eviction calls to get a page" : 4827751,
      "eviction calls to get a page found queue empty" : 4789295,
      "eviction calls to get a page found queue empty after locking" : 1098,
      "eviction currently operating in aggressive mode" : 0,
      "eviction empty score" : 0,
      "eviction passes of a file" : 38655,
      "eviction server candidate queue empty when topping up" : 1661,
      "eviction server candidate queue not empty when topping up" : 1890,
      "eviction server evicting pages" : 0,
      "eviction server slept, because we did not make progress with eviction" : 1709793,
      "eviction server unable to reach eviction goal" : 0,
      "eviction state" : 32,
      "eviction walk target pages histogram - 0-9" : 37762,
      "eviction walk target pages histogram - 10-31" : 598,
      "eviction walk target pages histogram - 128 and higher" : 0,
      "eviction walk target pages histogram - 32-63" : 162,
      "eviction walk target pages histogram - 64-128" : 133,
      "eviction walks abandoned" : 3314,
      "eviction walks gave up because they restarted their walk twice" : 28244,
      "eviction walks gave up because they saw too many pages and found no candidates" : 1131,
      "eviction walks gave up because they saw too many pages and found too few candidates" : 181,
      "eviction walks reached end of tree" : 62763,
      "eviction walks started from root of tree" : 34971,
      "eviction walks started from saved location in tree" : 3684,
      "eviction worker thread active" : 4,
      "eviction worker thread created" : 0,
      "eviction worker thread evicting pages" : 46955,
      "eviction worker thread removed" : 0,
      "eviction worker thread stable number" : 0,
      "failed eviction of pages that exceeded the in-memory maximum count" : 11,
      "failed eviction of pages that exceeded the in-memory maximum time (usecs)" : 44,
      "files with active eviction walks" : 0,
      "files with new eviction walks started" : 34519,
      "force re-tuning of eviction workers once in a while" : 0,
      "hazard pointer blocked page eviction" : 361,
      "hazard pointer check calls" : 49572,
      "hazard pointer check entries walked" : 85708,
      "hazard pointer maximum array length" : 2,
      "in-memory page passed criteria to be split" : 4086,
      "in-memory page splits" : 1977,
      "internal pages evicted" : 2561,
      "internal pages split during eviction" : 26,
      "leaf pages split during eviction" : 2463,
      "maximum bytes configured" : 1073741824,
      "maximum page size at eviction" : 0,
      "modified pages evicted" : 32533,
      "modified pages evicted by application threads" : 0,
      "operations timed out waiting for space in cache" : 0,
      "overflow pages read into cache" : 0,
      "page split during eviction deepened the tree" : 1,
      "page written requiring cache overflow records" : 0,
      "pages currently held in the cache" : 8975,
      "pages evicted because they exceeded the in-memory maximum count" : 1995,
      "pages evicted because they exceeded the in-memory maximum time (usecs)" : 1163695,
      "pages evicted because they had chains of deleted items count" : 285,
      "pages evicted because they had chains of deleted items time (usecs)" : 25953,
      "pages evicted by application threads" : 0,
      "pages queued for eviction" : 37601,
      "pages queued for urgent eviction" : 9950,
      "pages queued for urgent eviction during walk" : 824,
      "pages read into cache" : 23433,
      "pages read into cache after truncate" : 7634,
      "pages read into cache after truncate in prepare state" : 0,
      "pages read into cache requiring cache overflow entries" : 0,
      "pages read into cache requiring cache overflow for checkpoint" : 0,
      "pages read into cache skipping older cache overflow entries" : 0,
      "pages read into cache with skipped cache overflow entries needed later" : 0,
      "pages read into cache with skipped cache overflow entries needed later by checkpoint" : 0,
      "pages requested from the cache" : 2622634339,
      "pages seen by eviction walk" : 356810,
      "pages selected for eviction unable to be evicted" : 478,
      "pages walked for eviction" : 9543938,
      "pages written from cache" : 44692387,
      "pages written requiring in-memory restoration" : 235,
      "percentage overhead" : 8,
      "tracked bytes belonging to internal pages in the cache" : 2325120,
      "tracked bytes belonging to leaf pages in the cache" : 770874179,
      "tracked dirty bytes in the cache" : 4434563,
      "tracked dirty pages in the cache" : 5,
      "unmodified pages evicted" : 14605
    }
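To see how much of the allocator's memory the WiredTiger cache actually accounts for, the serverStatus figures can be put side by side. This is a minimal sketch in Python (not from the original post); `summarize_memory` is a hypothetical helper, and the inputs are the values of `current_allocated_bytes`, `bytes currently in the cache` and `maximum bytes configured` copied from the output above.

```python
def summarize_memory(allocated_bytes, cache_bytes, cache_max_bytes):
    """Compare tcmalloc's allocated bytes with the WiredTiger cache size."""
    outside_cache = allocated_bytes - cache_bytes
    return {
        # How full the WiredTiger cache is relative to its configured limit.
        "cache_fill_pct": cache_bytes / cache_max_bytes * 100,
        # Bytes allocated by mongod that are not WiredTiger cache content.
        "outside_cache_bytes": outside_cache,
        # The same quantity as a share of everything tcmalloc has allocated.
        "outside_cache_pct": outside_cache / allocated_bytes * 100,
    }


# Figures copied from the serverStatus output in the post.
result = summarize_memory(
    allocated_bytes=12324143616,   # tcmalloc.generic.current_allocated_bytes
    cache_bytes=773199299,         # wiredTiger.cache "bytes currently in the cache"
    cache_max_bytes=1073741824,    # wiredTiger.cache "maximum bytes configured"
)

for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

With these inputs the cache sits below its 1 GiB limit, while most of what tcmalloc reports as allocated lies outside the cache. The calculation only locates the memory on the allocator side; it does not say which part of mongod holds it. Against a live server, the same three values could be read with pymongo from `client.admin.command("serverStatus")`, assuming the field names match the output shown here.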

Comments:

1. Thank you for your work on formatting the content! I'm grateful (: @James Z

2. I find the meaning of the "share edit delete flag" labels here hard to understand. I thought they sat on one line under the title, "share tag" edit delete… … I hope they could light up or something like that. @Stack Overflow