Ceph MDS: "bad backtrace on directory inode" error, and writing a new file fails

#ceph

Question:

I recovered the MDS service by following the steps in https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/. After the recovery I can mount CephFS and access files, but the MDS service crashes again as soon as a new file is created.

The log shows "bad backtrace on directory inode" errors:

   -209> 2020-10-15 13:37:25.614 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000eb0d18d
  -208> 2020-10-15 13:37:25.647 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000eb16802
  -207> 2020-10-15 13:37:25.688 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000e298136
  -206> 2020-10-15 13:37:25.725 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000fb2bfa4
  -205> 2020-10-15 13:37:25.725 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000eb21cca
  -204> 2020-10-15 13:37:25.744 7f40993ba700 -1 log_channel(cluster) log [ERR] : bad backtrace on directory inode 0x1000e316aeb
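A minimal Python sketch for collecting the inodes reported above, assuming the cluster-log line format shown. It also derives the name of each directory's first fragment object, assuming the `<inode hex>.<frag as 8 hex digits>` pattern that appears later in this log (`609.00000000`). The helper names `bad_backtrace_inodes` and `first_dirfrag_object` are hypothetical, and the idea that a directory's backtrace is stored in the `parent` xattr of that object in the metadata pool is an assumption to check against the Ceph documentation.

```python
import re

# Matches cluster-log lines such as:
#   ... log [ERR] : bad backtrace on directory inode 0x1000eb0d18d
BAD_BACKTRACE_RE = re.compile(r"bad backtrace on directory inode (0x[0-9a-f]+)")


def bad_backtrace_inodes(log_text):
    """Return the distinct inode numbers (as ints) reported with a bad backtrace, in first-seen order."""
    seen = []
    for match in BAD_BACKTRACE_RE.finditer(log_text):
        ino = int(match.group(1), 16)
        if ino not in seen:
            seen.append(ino)
    return seen


def first_dirfrag_object(ino, frag=0):
    """Hypothetical helper: RADOS object name for a directory fragment, assuming
    the '<ino hex>.<frag as 8 hex digits>' naming seen in the log ('609.00000000')."""
    return f"{ino:x}.{frag:08x}"


if __name__ == "__main__":
    sample = (
        "-209> 2020-10-15 13:37:25.614 7f40993ba700 -1 log_channel(cluster) log [ERR] : "
        "bad backtrace on directory inode 0x1000eb0d18d\n"
        "-208> 2020-10-15 13:37:25.647 7f40993ba700 -1 log_channel(cluster) log [ERR] : "
        "bad backtrace on directory inode 0x1000eb16802\n"
    )
    for ino in bad_backtrace_inodes(sample):
        # Prints the inode and the object name assumed to hold its backtrace.
        print(hex(ino), first_dirfrag_object(ino))
```

The resulting object names could then be inspected with something like `rados -p <metadata pool> getxattr 1000eb0d18d.00000000 parent`, where the pool name is a placeholder.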
  

Below is the MDS crash log from an attempt to unlink the file "master":

 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=178132 cs=1 l=1). rx osd.15 seq 3 0x55b5771e5600 osd_op_reply(141 609.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv286524 ondisk = 0) v8
    -2> 2020-10-15 13:37:26.561 7f40a2655700  1 -- 10.160.33.40:6800/3705547350 <== osd.15 10.160.33.42:6810/2500 3 ==== osd_op_reply(141 609.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv286524 ondisk = 0) v8 ==== 240 0 341 (673764597 0 1821608010) 0x55b5771e5600 con 0x55b57727b000
    -1> 2020-10-15 13:37:26.562 7f409abbd700  4 mds.0.server handle_client_request client_request(client.1310401:9 unlink #0x100027c863a/master 2020-10-15 10:18:33.050263 RETRY=12 caller_uid=0, caller_gid=0{}) v2
     0> 2020-10-15 13:37:26.564 7f409abbd700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f409abbd700 time 2020-10-15 13:37:26.564156
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/mds/Server.cc: 6345: FAILED assert(in->first <= straydn->first)

 ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f40a8c94e1f]
 2: (()+0x284fe7) [0x7f40a8c94fe7]
 3: (()+0x16e18a) [0x55b5743e318a]
 4: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x7e9) [0x55b5743e3979]
 5: (Server::handle_client_request(MClientRequest*)+0x49e) [0x55b5743fdc0e]
 6: (Server::dispatch(Message*)+0x2db) [0x55b57440189b]
 7: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x55b57437b4b4]
 8: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x55b574387b5b]
 9: (MDSRank::retry_dispatch(Message*)+0x12) [0x55b574388302]
 10: (MDSInternalContextBase::complete(int)+0x67) [0x55b5745d9b57]
 11: (MDSRank::_advance_queues()+0xd1) [0x55b574386d51]
 12: (MDSRank::ProgressThread::entry()+0x43) [0x55b5743873e3]
 13: (()+0x7e25) [0x7f40a68bfe25]
 14: (clone()+0x6d) [0x7f40a599fbad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.ceph-admin.log
--- end dump of recent events ---
2020-10-15 13:37:26.581 7f409abbd700 -1 *** Caught signal (Aborted) **
 in thread 7f409abbd700 thread_name:mds_rank_progr

 ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
 1: (()+0x3e2a20) [0x55b574657a20]
 2: (()+0xf6d0) [0x7f40a68c76d0]
 3: (gsignal()+0x37) [0x7f40a58d7277]
 4: (abort()+0x148) [0x7f40a58d8968]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x242) [0x7f40a8c94f62]
 6: (()+0x284fe7) [0x7f40a8c94fe7]
 7: (()+0x16e18a) [0x55b5743e318a]
 8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x7e9) [0x55b5743e3979]
 9: (Server::handle_client_request(MClientRequest*)+0x49e) [0x55b5743fdc0e]
 10: (Server::dispatch(Message*)+0x2db) [0x55b57440189b]
 11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x55b57437b4b4]
 12: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x55b574387b5b]
 13: (MDSRank::retry_dispatch(Message*)+0x12) [0x55b574388302]
 14: (MDSInternalContextBase::complete(int)+0x67) [0x55b5745d9b57]
 15: (MDSRank::_advance_queues()+0xd1) [0x55b574386d51]
 16: (MDSRank::ProgressThread::entry()+0x43) [0x55b5743873e3]
 17: (()+0x7e25) [0x7f40a68bfe25]
 18: (clone()+0x6d) [0x7f40a599fbad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

  

Any ideas how to fix this?
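To narrow this down, here is a small Python sketch that pulls the triggering request and the failed assertion out of a crash log like the one above. It assumes the request is printed as `client_request(<client>:<tid> <op> #<parent inode>/<name> ...)`, as in the pasted log; the names `summarize_crash`, `REQUEST_RE` and `ASSERT_RE` are hypothetical. On the pasted log it reports an `unlink` of `master` in directory inode `0x100027c863a` with `RETRY=12`, which suggests the client keeps resending the same unlink after each MDS restart, so the MDS may hit `in->first <= straydn->first` again every time.

```python
import re

# The request being handled when the MDS aborted, printed in the log as e.g.
#   client_request(client.1310401:9 unlink #0x100027c863a/master ... RETRY=12 ...)
# '#<ino>/<name>' is read here (an assumption) as "entry <name> in directory inode <ino>".
# Note: <name> is matched up to the first whitespace character.
REQUEST_RE = re.compile(
    r"client_request\((?P<client>client\.\d+):(?P<tid>\d+) (?P<op>\w+) "
    r"#(?P<parent>0x[0-9a-f]+)/(?P<name>\S+)"
    r"(?:.*?RETRY=(?P<retry>\d+))?"
)

# The assertion line, e.g. ".../src/mds/Server.cc: 6345: FAILED assert(in->first <= straydn->first)".
ASSERT_RE = re.compile(
    r"(?P<file>[\w.]+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)"
)


def summarize_crash(log_text):
    """Return (last client request seen, failed assertion), each as a dict or None."""
    requests = [m.groupdict() for m in REQUEST_RE.finditer(log_text)]
    failed = ASSERT_RE.search(log_text)
    last_request = requests[-1] if requests else None
    return last_request, (failed.groupdict() if failed else None)


if __name__ == "__main__":
    sample = (
        "-1> 2020-10-15 13:37:26.562 7f409abbd700  4 mds.0.server handle_client_request "
        "client_request(client.1310401:9 unlink #0x100027c863a/master "
        "2020-10-15 10:18:33.050263 RETRY=12 caller_uid=0, caller_gid=0{}) v2\n"
        "src/mds/Server.cc: 6345: FAILED assert(in->first <= straydn->first)\n"
    )
    request, assertion = summarize_crash(sample)
    print(request)
    print(assertion)
```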

Comments:

1. Try lists.ceph.io/hyperkitty/list/ceph-users@ceph.io; you will probably get more answers there.

2. Thanks. I sent an email to ceph-users@ceph.io a few days ago, but the auto-reply says my post is "being held until the list moderator can review it for approval".

3. I don't maintain the mailing list. Maybe try again, or just wait, or contact the list administrator.