#ceph
Question:
I recovered the MDS service following the steps at https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/. After the recovery I can mount CephFS and access existing files, but the MDS service crashes again as soon as a new file is created,
and I see the error "bad backtrace on directory inode":
-209> 2020-10-15 13:37:25.614 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000eb0d18d
-208> 2020-10-15 13:37:25.647 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000eb16802
-207> 2020-10-15 13:37:25.688 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000e298136
-206> 2020-10-15 13:37:25.725 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000fb2bfa4
-205> 2020-10-15 13:37:25.725 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000eb21cca
-204> 2020-10-15 13:37:25.744 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000e316aeb
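The "bad backtrace on directory inode" damage can often be repaired by an online MDS scrub once a rank is active. A hedged sketch follows; the daemon name `ceph-admin` is taken from the `log_file` in the dump below, and the exact command form varies by release (Mimic drives the scrub through the admin socket):

```shell
# Mimic-era admin-socket form: scrub the whole tree recursively
# and rewrite any damaged backtraces found along the way.
ceph daemon mds.ceph-admin scrub_path / recursive repair

# On newer releases the equivalent is:
#   ceph tell mds.ceph-admin scrub start / recursive,repair

# List any damage entries that remain afterwards.
ceph tell mds.ceph-admin damage ls
```

This only helps once the MDS stays up long enough to scrub; it does not by itself address the unlink assertion shown below.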
Below is the MDS crash log from trying to unlink the file `master`:
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=178132 cs=1 l=1). rx osd.15 seq 3 0x55b5771e5600 osd_op_reply(141 609.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv286524 ondisk = 0) v8
-2> 2020-10-15 13:37:26.561 7f40a2655700 1 -- 10.160.33.40:6800/3705547350 <== osd.15 10.160.33.42:6810/2500 3 ==== osd_op_reply(141 609.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv286524 ondisk = 0) v8 ==== 240 0 341 (673764597 0 1821608010) 0x55b5771e5600 con 0x55b57727b000
-1> 2020-10-15 13:37:26.562 7f409abbd700 4 mds.0.server handle_client_request client_request(client.1310401:9 unlink #0x100027c863a/master 2020-10-15 10:18:33.050263 RETRY=12 caller_uid=0, caller_gid=0{}) v2
0> 2020-10-15 13:37:26.564 7f409abbd700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f409abbd700 time 2020-10-15 13:37:26.564156
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/mds/Server.cc: 6345: FAILED assert(in->first <= straydn->first)
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f40a8c94e1f]
2: (()+0x284fe7) [0x7f40a8c94fe7]
3: (()+0x16e18a) [0x55b5743e318a]
4: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x7e9) [0x55b5743e3979]
5: (Server::handle_client_request(MClientRequest*)+0x49e) [0x55b5743fdc0e]
6: (Server::dispatch(Message*)+0x2db) [0x55b57440189b]
7: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x55b57437b4b4]
8: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x55b574387b5b]
9: (MDSRank::retry_dispatch(Message*)+0x12) [0x55b574388302]
10: (MDSInternalContextBase::complete(int)+0x67) [0x55b5745d9b57]
11: (MDSRank::_advance_queues()+0xd1) [0x55b574386d51]
12: (MDSRank::ProgressThread::entry()+0x43) [0x55b5743873e3]
13: (()+0x7e25) [0x7f40a68bfe25]
14: (clone()+0x6d) [0x7f40a599fbad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mds.ceph-admin.log
--- end dump of recent events ---
2020-10-15 13:37:26.581 7f409abbd700 -1 *** Caught signal (Aborted) **
in thread 7f409abbd700 thread_name:mds_rank_progr
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (()+0x3e2a20) [0x55b574657a20]
2: (()+0xf6d0) [0x7f40a68c76d0]
3: (gsignal()+0x37) [0x7f40a58d7277]
4: (abort()+0x148) [0x7f40a58d8968]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x242) [0x7f40a8c94f62]
6: (()+0x284fe7) [0x7f40a8c94fe7]
7: (()+0x16e18a) [0x55b5743e318a]
8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x7e9) [0x55b5743e3979]
9: (Server::handle_client_request(MClientRequest*)+0x49e) [0x55b5743fdc0e]
10: (Server::dispatch(Message*)+0x2db) [0x55b57440189b]
11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x55b57437b4b4]
12: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x55b574387b5b]
13: (MDSRank::retry_dispatch(Message*)+0x12) [0x55b574388302]
14: (MDSInternalContextBase::complete(int)+0x67) [0x55b5745d9b57]
15: (MDSRank::_advance_queues()+0xd1) [0x55b574386d51]
16: (MDSRank::ProgressThread::entry()+0x43) [0x55b5743873e3]
17: (()+0x7e25) [0x7f40a68bfe25]
18: (clone()+0x6d) [0x7f40a599fbad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2020-10-15 13:37:26.581 7f409abbd700 -1 *** Caught signal (Aborted) **
in thread 7f409abbd700 thread_name:mds_rank_progr
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (()+0x3e2a20) [0x55b574657a20]
2: (()+0xf6d0) [0x7f40a68c76d0]
3: (gsignal()+0x37) [0x7f40a58d7277]
4: (abort()+0x148) [0x7f40a58d8968]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x242) [0x7f40a8c94f62]
6: (()+0x284fe7) [0x7f40a8c94fe7]
7: (()+0x16e18a) [0x55b5743e318a]
8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x7e9) [0x55b5743e3979]
9: (Server::handle_client_request(MClientRequest*)+0x49e) [0x55b5743fdc0e]
10: (Server::dispatch(Message*)+0x2db) [0x55b57440189b]
11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x55b57437b4b4]
12: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x55b574387b5b]
13: (MDSRank::retry_dispatch(Message*)+0x12) [0x55b574388302]
14: (MDSInternalContextBase::complete(int)+0x67) [0x55b5745d9b57]
15: (MDSRank::_advance_queues()+0xd1) [0x55b574386d51]
16: (MDSRank::ProgressThread::entry()+0x43) [0x55b5743873e3]
17: (()+0x7e25) [0x7f40a68bfe25]
18: (clone()+0x6d) [0x7f40a599fbad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mds.ceph-admin.log
--- end dump of recent events ---
Any ideas on how to fix this?
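For reference, the journal-recovery part of the linked disaster-recovery-experts procedure is roughly the following sketch. The filesystem name `cephfs` and rank `0` are assumptions; the MDS should be stopped and the journal exported before anything is reset:

```shell
# Export the journal first so the destructive steps are reversible.
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

# Recover whatever dentries the journal still holds, then reset it.
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset

# Wipe stale client sessions so they cannot crash the restarted MDS.
cephfs-table-tool all reset session
```

These steps discard unflushed metadata, so they are a last resort; verify the exact syntax against the documentation for your Ceph release.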
Comments:
1. Try lists.ceph.io/hyperkitty/list/ceph-users@ceph.io, you will probably get more answers there.
2. Thanks, I did send an email to ceph-users@ceph.io a few days ago, but the auto-reply I received says my post is "being held until the list moderator can review it for approval".
3. I don't run the mailing list; maybe retry, just wait, or contact the list administrator.