* osd crash after reboot
From: Stefan Priebe @ 2012-12-14 8:12 UTC
To: ceph-devel@vger.kernel.org
Hello list,
after a reboot of my node I see this on all OSDs of this node:
2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function
'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time
2012-12-14 09:03:20.392528
osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
2: (OSD::load_pgs()+0x13ed) [0x6168ad]
3: (OSD::init()+0xaff) [0x617a5f]
4: (main()+0x2de6) [0x55a416]
5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
6: /usr/bin/ceph-osd() [0x557269]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
-29> 2012-12-14 09:03:20.266349 7f8e652f8780 5 asok(0x285c000)
register_command perfcounters_dump hook 0x2850010
-28> 2012-12-14 09:03:20.266366 7f8e652f8780 5 asok(0x285c000)
register_command 1 hook 0x2850010
-27> 2012-12-14 09:03:20.266369 7f8e652f8780 5 asok(0x285c000)
register_command perf dump hook 0x2850010
-26> 2012-12-14 09:03:20.266379 7f8e652f8780 5 asok(0x285c000)
register_command perfcounters_schema hook 0x2850010
-25> 2012-12-14 09:03:20.266383 7f8e652f8780 5 asok(0x285c000)
register_command 2 hook 0x2850010
-24> 2012-12-14 09:03:20.266386 7f8e652f8780 5 asok(0x285c000)
register_command perf schema hook 0x2850010
-23> 2012-12-14 09:03:20.266389 7f8e652f8780 5 asok(0x285c000)
register_command config show hook 0x2850010
-22> 2012-12-14 09:03:20.266392 7f8e652f8780 5 asok(0x285c000)
register_command config set hook 0x2850010
-21> 2012-12-14 09:03:20.266396 7f8e652f8780 5 asok(0x285c000)
register_command log flush hook 0x2850010
-20> 2012-12-14 09:03:20.266398 7f8e652f8780 5 asok(0x285c000)
register_command log dump hook 0x2850010
-19> 2012-12-14 09:03:20.266401 7f8e652f8780 5 asok(0x285c000)
register_command log reopen hook 0x2850010
-18> 2012-12-14 09:03:20.267686 7f8e652f8780 0 ceph version
0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824), process
ceph-osd, pid 7212
-17> 2012-12-14 09:03:20.268738 7f8e652f8780 1 finished
global_init_daemonize
-16> 2012-12-14 09:03:20.275957 7f8e652f8780 0
filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to work
-15> 2012-12-14 09:03:20.275968 7f8e652f8780 0
filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore
fiemap' config option
-14> 2012-12-14 09:03:20.276177 7f8e652f8780 0
filestore(/ceph/osd.1/) mount did NOT detect btrfs
-13> 2012-12-14 09:03:20.277051 7f8e652f8780 0
filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
-12> 2012-12-14 09:03:20.277585 7f8e652f8780 0
filestore(/ceph/osd.1/) mount found snaps <>
-11> 2012-12-14 09:03:20.278899 7f8e652f8780 0
filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs
not detected
-10> 2012-12-14 09:03:20.290745 7f8e652f8780 0 journal kernel
version is 3.6.10
-9> 2012-12-14 09:03:20.320728 7f8e652f8780 0 journal kernel
version is 3.6.10
-8> 2012-12-14 09:03:20.328381 7f8e652f8780 0
filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to work
-7> 2012-12-14 09:03:20.328391 7f8e652f8780 0
filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore
fiemap' config option
-6> 2012-12-14 09:03:20.328574 7f8e652f8780 0
filestore(/ceph/osd.1/) mount did NOT detect btrfs
-5> 2012-12-14 09:03:20.329579 7f8e652f8780 0
filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
-4> 2012-12-14 09:03:20.329612 7f8e652f8780 0
filestore(/ceph/osd.1/) mount found snaps <>
-3> 2012-12-14 09:03:20.330786 7f8e652f8780 0
filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs
not detected
-2> 2012-12-14 09:03:20.340711 7f8e652f8780 0 journal kernel
version is 3.6.10
-1> 2012-12-14 09:03:20.370707 7f8e652f8780 0 journal kernel
version is 3.6.10
0> 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In
function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780
time 2012-12-14 09:03:20.392528
osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
2: (OSD::load_pgs()+0x13ed) [0x6168ad]
3: (OSD::init()+0xaff) [0x617a5f]
4: (main()+0x2de6) [0x55a416]
5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
6: /usr/bin/ceph-osd() [0x557269]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
Stefan

* Re: osd crash after reboot
From: Stefan Priebe @ 2012-12-14 8:22 UTC
To: ceph-devel@vger.kernel.org
same log more verbose:
11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] read_log done
-11> 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] handle_loaded
-10> 2012-12-14 09:17:50.648581 7fb6e0d6b780 20 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] exit Initial 0.015080 0 0.000000
-9> 2012-12-14 09:17:50.648591 7fb6e0d6b780 20 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] enter Reset
-8> 2012-12-14 09:17:50.648599 7fb6e0d6b780 20 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] set_last_peering_reset 3996
-7> 2012-12-14 09:17:50.648609 7fb6e0d6b780 10 osd.3 4233 load_pgs loaded pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=3996 lcod 0'0 mlcod 0'0 inactive] log(1379'2968,3988'3969]
-6> 2012-12-14 09:17:50.648649 7fb6e0d6b780 15 filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head 'info'
-5> 2012-12-14 09:17:50.648664 7fb6e0d6b780 10 filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head 'info' = 5
-4> 2012-12-14 09:17:50.648672 7fb6e0d6b780 20 osd.3 0 get_map 3316 - loading and decoding 0x2943e00
-3> 2012-12-14 09:17:50.648678 7fb6e0d6b780 15 filestore(/ceph/osd.3/) read meta/a09ec88/osdmap.3316/0//-1 0~0
-2> 2012-12-14 09:17:50.648705 7fb6e0d6b780 10 filestore(/ceph/osd.3/) error opening file /ceph/osd.3//current/meta/DIR_8/DIR_8/osdmap.3316__0_0A09EC88__none with flags=0 and mode=0: (2) No such file or directory
-1> 2012-12-14 09:17:50.648722 7fb6e0d6b780 10 filestore(/ceph/osd.3/) FileStore::read(meta/a09ec88/osdmap.3316/0//-1) open error: (2) No such file or directory
0> 2012-12-14 09:17:50.649586 7fb6e0d6b780 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fb6e0d6b780 time 2012-12-14 09:17:50.648733
osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
2: (OSD::load_pgs()+0x13ed) [0x6168ad]
3: (OSD::init()+0xaff) [0x617a5f]
4: (main()+0x2de6) [0x55a416]
5: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
6: /usr/bin/ceph-osd() [0x557269]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/20 journaler
0/ 5 objectcacher
0/ 5 client
0/20 osd
0/ 0 optracker
0/ 0 objclass
0/20 filestore
0/20 journal
0/ 0 ms
1/ 5 mon
0/ 0 monc
0/ 5 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal (Aborted) **
in thread 7fb6e0d6b780
ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
1: /usr/bin/ceph-osd() [0x7a1889]
2: (()+0xeff0) [0x7fb6e0750ff0]
3: (gsignal()+0x35) [0x7fb6deb1a1b5]
4: (abort()+0x180) [0x7fb6deb1cfc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
6: (()+0xcb166) [0x7fb6df3ad166]
7: (()+0xcb193) [0x7fb6df3ad193]
8: (()+0xcb28e) [0x7fb6df3ad28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x805659]
10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
11: (OSD::load_pgs()+0x13ed) [0x6168ad]
12: (OSD::init()+0xaff) [0x617a5f]
13: (main()+0x2de6) [0x55a416]
14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
15: /usr/bin/ceph-osd() [0x557269]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal (Aborted) **
in thread 7fb6e0d6b780
ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
1: /usr/bin/ceph-osd() [0x7a1889]
2: (()+0xeff0) [0x7fb6e0750ff0]
3: (gsignal()+0x35) [0x7fb6deb1a1b5]
4: (abort()+0x180) [0x7fb6deb1cfc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
6: (()+0xcb166) [0x7fb6df3ad166]
7: (()+0xcb193) [0x7fb6df3ad193]
8: (()+0xcb28e) [0x7fb6df3ad28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x805659]
10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
11: (OSD::load_pgs()+0x13ed) [0x6168ad]
12: (OSD::init()+0xaff) [0x617a5f]
13: (main()+0x2de6) [0x55a416]
14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
15: /usr/bin/ceph-osd() [0x557269]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/20 journaler
0/ 5 objectcacher
0/ 5 client
0/20 osd
0/ 0 optracker
0/ 0 objclass
0/20 filestore
0/20 journal
0/ 0 ms
1/ 5 mon
0/ 0 monc
0/ 5 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
Stefan

* Re: osd crash after reboot
From: Stefan Priebe @ 2012-12-14 9:14 UTC
To: ceph-devel@vger.kernel.org
One more IMPORTANT note. This might happen due to the fact that a disk was missing (disk failure) after the reboot.
fstab and mountpoint are working with UUIDs so they match but the journal block device:
osd journal = /dev/sde1
didn't match anymore - as the numbers got renumbered due to the failed disk. Is there a way to use some kind of UUIDs here too for journal?
Stefan

* Re: osd crash after reboot
From: Dennis Jacobfeuerborn @ 2012-12-14 14:52 UTC
To: Stefan Priebe
Cc: ceph-devel@vger.kernel.org
On 12/14/2012 10:14 AM, Stefan Priebe wrote:
> One more IMPORTANT note. This might happen due to the fact that a disk was
> missing (disk failure) after the reboot.
>
> fstab and mountpoint are working with UUIDs so they match but the journal
> block device:
> osd journal = /dev/sde1
>
> didn't match anymore - as the numbers got renumbered due to the failed disk.
> Is there a way to use some kind of UUIDs here too for journal?
You should be able to use /dev/disk/by-uuid/* instead. That should give you
a stable view of the filesystems.
Regards,
Dennis

* Re: osd crash after reboot
From: Mark Nelson @ 2012-12-14 15:01 UTC
To: Dennis Jacobfeuerborn
Cc: Stefan Priebe, ceph-devel@vger.kernel.org
On 12/14/2012 08:52 AM, Dennis Jacobfeuerborn wrote:
> On 12/14/2012 10:14 AM, Stefan Priebe wrote:
>> One more IMPORTANT note. This might happen due to the fact that a disk was
>> missing (disk failure) after the reboot.
>>
>> fstab and mountpoint are working with UUIDs so they match but the journal
>> block device:
>> osd journal = /dev/sde1
>>
>> didn't match anymore - as the numbers got renumbered due to the failed disk.
>> Is there a way to use some kind of UUIDs here too for journal?
>
> You should be able to use /dev/disk/by-uuid/* instead. That should give you
> a stable view of the filesystems.
I often map partitions to something in /dev/disk/by-partlabel and use
those in my ceph.conf files. That way disks can be remapped behind the
scenes and the ceph configuration doesn't have to change even if disks
get replaced.
> Regards,
> Dennis

* Re: osd crash after reboot
From: Stefan Priebe - Profihost AG @ 2012-12-14 15:11 UTC
To: Mark Nelson
Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org
Hi Mark,
but how do I set a label for a partition without a FS, like the journal blockdev?
On 14.12.2012 16:01, Mark Nelson wrote:
> I often map partitions to something in /dev/disk/by-partlabel and use
> those in my ceph.conf files. That way disks can be remapped behind the
> scenes and the ceph configuration doesn't have to change even if disks
> get replaced.
Greets,
Stefan

* Re: osd crash after reboot
From: Mark Nelson @ 2012-12-14 15:20 UTC
To: Stefan Priebe - Profihost AG
Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org
Hi Stefan,
Here's what I often do when I have a journal and data partition sharing a disk:
sudo parted -s -a optimal /dev/$DEV mklabel gpt
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%
Mark
On 12/14/2012 09:11 AM, Stefan Priebe - Profihost AG wrote:
> Hi Mark,
>
> but how do I set a label for a partition without a FS, like the journal blockdev?
> On 14.12.2012 16:01, Mark Nelson wrote:
>> I often map partitions to something in /dev/disk/by-partlabel and use
>> those in my ceph.conf files. That way disks can be remapped behind the
>> scenes and the ceph configuration doesn't have to change even if disks
>> get replaced.
>
> Greets,
> Stefan
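A minimal sketch, assuming the partition naming scheme from the parted commands above, of how the matching ceph.conf journal line could be generated per OSD. The helper name `journal_conf_line` and the index value are hypothetical; `osd journal` is the setting quoted earlier in the thread, and the `/dev/disk/by-partlabel` path assumes udev creates that directory for GPT partition names on the system in question.

```shell
#!/bin/sh
# journal_conf_line INDEX  (hypothetical helper, not part of Ceph)
# Prints a ceph.conf journal setting that refers to the GPT partition
# named osd-device-INDEX-journal through its by-partlabel symlink,
# so the entry does not depend on /dev/sdX enumeration order.
journal_conf_line() {
    printf 'osd journal = /dev/disk/by-partlabel/osd-device-%s-journal\n' "$1"
}
```

Calling `journal_conf_line 3` prints the line for the journal partition that was created with `i=3`; the output would still have to be placed in the right `[osd.N]` section by hand.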
* Re: osd crash after reboot
2012-12-14 15:20 ` Mark Nelson
@ 2012-12-14 15:25 ` Stefan Priebe - Profihost AG
2012-12-14 16:06 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 15:25 UTC (permalink / raw)
To: Mark Nelson; +Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org
Hi Mark,

On 14.12.2012 16:20, Mark Nelson wrote:
> sudo parted -s -a optimal /dev/$DEV mklabel gpt
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%

My disks are GPT too and I'm also using parted, but I don't want to
recreate my partitions. I haven't seen a way in parted to set such a label
later.

Greets,
Stefan
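For the case described above, where the partitions already exist, parted has a "name" command that sets the name of an existing GPT partition without recreating it. The sketch below is an illustration only: the helper name_partition, the RUN variable, the device /dev/sde and the partition number are assumptions, not values from this thread. It prints the commands by default.

```shell
#!/bin/sh
# Hypothetical helper: name an existing GPT partition in place.
# RUN defaults to "echo" (dry run); set RUN=sudo to actually apply.
RUN=${RUN:-echo}

name_partition() {
    dev=$1    # whole-disk device (placeholder)
    num=$2    # partition number on that disk
    name=$3   # GPT partition name to assign
    # parted's "name" command sets the name of partition <num>.
    $RUN parted -s "$dev" name "$num" "$name"
    # Ask udev to reprocess block devices so the by-partlabel link shows up.
    $RUN udevadm trigger --subsystem-match=block
    $RUN udevadm settle
}

# Dry run; device and partition number are examples only.
name_partition /dev/sde 1 osd-device-3-journal
```

If gdisk is installed, sgdisk --change-name=1:osd-device-3-journal /dev/sde should have the same effect. Either way, the resulting path would be /dev/disk/by-partlabel/osd-device-3-journal, which can then be used for "osd journal" in ceph.conf.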
* Re: osd crash after reboot
2012-12-14 15:20 ` Mark Nelson
2012-12-14 15:25 ` Stefan Priebe - Profihost AG
@ 2012-12-14 16:06 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 16:06 UTC (permalink / raw)
To: Mark Nelson; +Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org
Hello Mark,

On 14.12.2012 16:20, Mark Nelson wrote:
> sudo parted -s -a optimal /dev/$DEV mklabel gpt
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%

Isn't that the part-type argument you're using?

mkpart part-type start-mb end-mb

I like your idea and I think it's a good one, but I want to know why this
works. part-type isn't a filesystem label...

Greets,
Stefan
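The thread does not contain an answer to this question. As far as parted's documentation goes, part-type (primary, extended, logical) applies to msdos partition tables, while on a gpt table the first mkpart argument is taken as the partition name. That name is stored in the GPT entry rather than in a filesystem, which would explain why it also works for a raw journal partition, and udev normally exposes it as a symlink under /dev/disk/by-partlabel. A small sketch to check which names are visible on a host follows; the function name list_partlabels is made up for this example.

```shell
#!/bin/sh
# Print "name -> device" for every partition-name symlink in a directory.
# The directory argument is optional and exists mainly to allow testing.
list_partlabels() {
    dir=${1:-/dev/disk/by-partlabel}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
    return 0
}

# Prints nothing if the host has no named GPT partitions.
list_partlabels
```

If the journal partition shows up in this list, its by-partlabel path stays the same even when the kernel renumbers /dev/sdX devices after a disk failure.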
* Re: osd crash after reboot
2012-12-14 14:52 ` Dennis Jacobfeuerborn
2012-12-14 15:01 ` Mark Nelson
@ 2012-12-14 15:02 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 15:02 UTC (permalink / raw)
To: Dennis Jacobfeuerborn; +Cc: ceph-devel@vger.kernel.org
Hello Dennis,

On 14.12.2012 15:52, Dennis Jacobfeuerborn wrote:
>> didn't match anymore - as the numbers got renumbered due to the failed disk.
>> Is there a way to use some kind of UUIDs here too for journal?
>
> You should be able to use /dev/disk/by-uuid/* instead. That should give you
> a stable view of the filesystems.

Good idea, but only partitions with UUIDs are listed there. When the
journal uses the partition directly, it does not have a UUID.

But this reminded me of /dev/disk/by-id, and that works fine. I'm now using
the WWN.

Greets,
Stefan
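The /dev/disk/by-id approach mentioned above can be sketched as a lookup from the current kernel name to its wwn-* symlink. The function name wwn_links_for and the device /dev/sde1 are assumptions for illustration; the message does not show the actual link that was used, and not every disk exposes a WWN.

```shell
#!/bin/sh
# Print the wwn-* symlinks in a by-id directory that point at a given device.
# The second argument is optional and exists mainly to allow testing.
wwn_links_for() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/wwn-*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
    return 0
}

# Example: which stable name corresponds to the journal partition right now?
wwn_links_for /dev/sde1
```

The printed path, if any, is a candidate value for "osd journal" in ceph.conf in place of /dev/sde1.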
* Re: osd crash after reboot
2012-12-14 9:14 ` Stefan Priebe
2012-12-14 14:52 ` Dennis Jacobfeuerborn
@ 2012-12-14 19:42 ` Sage Weil
2012-12-14 19:47 ` Stefan Priebe
1 sibling, 1 reply; 12+ messages in thread
From: Sage Weil @ 2012-12-14 19:42 UTC (permalink / raw)
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org
On Fri, 14 Dec 2012, Stefan Priebe wrote:
> One more IMPORTANT note. This might happen due to the fact that a disk was
> missing (disk failure) after the reboot.
>
> fstab and mountpoint are working with UUIDs so they match but the journal
> block device:
> osd journal = /dev/sde1
>
> didn't match anymore - as the numbers got renumbered due to the failed disk. Is
> there a way to use some kind of UUIDs here too for journal?

I think others have addressed the uuid question, but one note:

The ceph-osd process has an internal uuid/fingerprint on the journal and
data dir, and will refuse to start if they don't match.

sage

>
> Stefan
>
> On 14.12.2012 09:22, Stefan Priebe wrote:
> > same log more verbose:
> > 11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] read_log done
> > -11> 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] handle_loaded
> > -10> 2012-12-14 09:17:50.648581 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] exit Initial 0.015080 0 0.000000
> > -9> 2012-12-14 09:17:50.648591 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] enter Reset
> > -8> 2012-12-14 09:17:50.648599 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969
(1379'2968,3988'3969] local-les=3307 n=11 ec=10 > > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 > > inactive] set_last_peering_reset 3996 > > -7> 2012-12-14 09:17:50.648609 7fb6e0d6b780 10 osd.3 4233 load_pgs > > loaded pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 > > ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=3996 lcod 0'0 mlcod > > 0'0 inactive] log(1379'2968,3988'3969] > > -6> 2012-12-14 09:17:50.648649 7fb6e0d6b780 15 > > filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head > > 'info' > > -5> 2012-12-14 09:17:50.648664 7fb6e0d6b780 10 > > filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head > > 'info' = 5 > > -4> 2012-12-14 09:17:50.648672 7fb6e0d6b780 20 osd.3 0 get_map 3316 > > - loading and decoding 0x2943e00 > > -3> 2012-12-14 09:17:50.648678 7fb6e0d6b780 15 > > filestore(/ceph/osd.3/) read meta/a09ec88/osdmap.3316/0//-1 0~0 > > -2> 2012-12-14 09:17:50.648705 7fb6e0d6b780 10 > > filestore(/ceph/osd.3/) error opening file > > /ceph/osd.3//current/meta/DIR_8/DIR_8/osdmap.3316__0_0A09EC88__none with > > flags=0 and mode=0: (2) No such file or directory > > -1> 2012-12-14 09:17:50.648722 7fb6e0d6b780 10 > > filestore(/ceph/osd.3/) FileStore::read(meta/a09ec88/osdmap.3316/0//-1) > > open error: (2) No such file or directory > > 0> 2012-12-14 09:17:50.649586 7fb6e0d6b780 -1 osd/OSD.cc: In > > function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fb6e0d6b780 > > time 2012-12-14 09:17:50.648733 > > osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl)) > > > > ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824) > > 1: (OSDService::get_map(unsigned int)+0x918) [0x607f78] > > 2: (OSD::load_pgs()+0x13ed) [0x6168ad] > > 3: (OSD::init()+0xaff) [0x617a5f] > > 4: (main()+0x2de6) [0x55a416] > > 5: (__libc_start_main()+0xfd) [0x7fb6deb06c8d] > > 6: /usr/bin/ceph-osd() [0x557269] > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > > 
needed to interpret this. > > > > --- logging levels --- > > 0/ 5 none > > 0/ 0 lockdep > > 0/ 0 context > > 0/ 0 crush > > 1/ 5 mds > > 1/ 5 mds_balancer > > 1/ 5 mds_locker > > 1/ 5 mds_log > > 1/ 5 mds_log_expire > > 1/ 5 mds_migrator > > 0/ 0 buffer > > 0/ 0 timer > > 0/ 1 filer > > 0/ 1 striper > > 0/ 1 objecter > > 0/ 5 rados > > 0/ 5 rbd > > 0/20 journaler > > 0/ 5 objectcacher > > 0/ 5 client > > 0/20 osd > > 0/ 0 optracker > > 0/ 0 objclass > > 0/20 filestore > > 0/20 journal > > 0/ 0 ms > > 1/ 5 mon > > 0/ 0 monc > > 0/ 5 paxos > > 0/ 0 tp > > 0/ 0 auth > > 1/ 5 crypto > > 0/ 0 finisher > > 0/ 0 heartbeatmap > > 0/ 0 perfcounter > > 1/ 5 rgw > > 1/ 5 hadoop > > 1/ 5 javaclient > > 0/ 0 asok > > 0/ 0 throttle > > -2/-2 (syslog threshold) > > -1/-1 (stderr threshold) > > max_recent 100000 > > max_new 1000 > > log_file /var/log/ceph/ceph-osd.3.log > > --- end dump of recent events --- > > 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal (Aborted) ** > > in thread 7fb6e0d6b780 > > > > ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824) > > 1: /usr/bin/ceph-osd() [0x7a1889] > > 2: (()+0xeff0) [0x7fb6e0750ff0] > > 3: (gsignal()+0x35) [0x7fb6deb1a1b5] > > 4: (abort()+0x180) [0x7fb6deb1cfc0] > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5] > > 6: (()+0xcb166) [0x7fb6df3ad166] > > 7: (()+0xcb193) [0x7fb6df3ad193] > > 8: (()+0xcb28e) [0x7fb6df3ad28e] > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > > const*)+0x7c9) [0x805659] > > 10: (OSDService::get_map(unsigned int)+0x918) [0x607f78] > > 11: (OSD::load_pgs()+0x13ed) [0x6168ad] > > 12: (OSD::init()+0xaff) [0x617a5f] > > 13: (main()+0x2de6) [0x55a416] > > 14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d] > > 15: /usr/bin/ceph-osd() [0x557269] > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > > needed to interpret this. 
> > > > --- begin dump of recent events --- > > 0> 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal > > (Aborted) ** > > in thread 7fb6e0d6b780 > > > > ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824) > > 1: /usr/bin/ceph-osd() [0x7a1889] > > 2: (()+0xeff0) [0x7fb6e0750ff0] > > 3: (gsignal()+0x35) [0x7fb6deb1a1b5] > > 4: (abort()+0x180) [0x7fb6deb1cfc0] > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5] > > 6: (()+0xcb166) [0x7fb6df3ad166] > > 7: (()+0xcb193) [0x7fb6df3ad193] > > 8: (()+0xcb28e) [0x7fb6df3ad28e] > > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > > const*)+0x7c9) [0x805659] > > 10: (OSDService::get_map(unsigned int)+0x918) [0x607f78] > > 11: (OSD::load_pgs()+0x13ed) [0x6168ad] > > 12: (OSD::init()+0xaff) [0x617a5f] > > 13: (main()+0x2de6) [0x55a416] > > 14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d] > > 15: /usr/bin/ceph-osd() [0x557269] > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > > needed to interpret this. 
> > > > --- logging levels --- > > 0/ 5 none > > 0/ 0 lockdep > > 0/ 0 context > > 0/ 0 crush > > 1/ 5 mds > > 1/ 5 mds_balancer > > 1/ 5 mds_locker > > 1/ 5 mds_log > > 1/ 5 mds_log_expire > > 1/ 5 mds_migrator > > 0/ 0 buffer > > 0/ 0 timer > > 0/ 1 filer > > 0/ 1 striper > > 0/ 1 objecter > > 0/ 5 rados > > 0/ 5 rbd > > 0/20 journaler > > 0/ 5 objectcacher > > 0/ 5 client > > 0/20 osd > > 0/ 0 optracker > > 0/ 0 objclass > > 0/20 filestore > > 0/20 journal > > 0/ 0 ms > > 1/ 5 mon > > 0/ 0 monc > > 0/ 5 paxos > > 0/ 0 tp > > 0/ 0 auth > > 1/ 5 crypto > > 0/ 0 finisher > > 0/ 0 heartbeatmap > > 0/ 0 perfcounter > > 1/ 5 rgw > > 1/ 5 hadoop > > 1/ 5 javaclient > > 0/ 0 asok > > 0/ 0 throttle > > -2/-2 (syslog threshold) > > -1/-1 (stderr threshold) > > max_recent 100000 > > max_new 1000 > > log_file /var/log/ceph/ceph-osd.3.log > > --- end dump of recent events --- > > > > Stefan > > > > Am 14.12.2012 09:12, schrieb Stefan Priebe: > > > Hello list, > > > > > > after a reboot of my node i see this on all OSDs of this node after the > > > reboot: > > > > > > 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function > > > 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time > > > 2012-12-14 09:03:20.392528 > > > osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl)) > > > > > > ceph version 0.55-239-gc951c27 > > > (c951c270a42b94b6f269992c9001d90f70a2b824) > > > 1: (OSDService::get_map(unsigned int)+0x918) [0x607f78] > > > 2: (OSD::load_pgs()+0x13ed) [0x6168ad] > > > 3: (OSD::init()+0xaff) [0x617a5f] > > > 4: (main()+0x2de6) [0x55a416] > > > 5: (__libc_start_main()+0xfd) [0x7f8e63093c8d] > > > 6: /usr/bin/ceph-osd() [0x557269] > > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > > > needed to interpret this. 
> > > > > > --- begin dump of recent events --- > > > -29> 2012-12-14 09:03:20.266349 7f8e652f8780 5 asok(0x285c000) > > > register_command perfcounters_dump hook 0x2850010 > > > -28> 2012-12-14 09:03:20.266366 7f8e652f8780 5 asok(0x285c000) > > > register_command 1 hook 0x2850010 > > > -27> 2012-12-14 09:03:20.266369 7f8e652f8780 5 asok(0x285c000) > > > register_command perf dump hook 0x2850010 > > > -26> 2012-12-14 09:03:20.266379 7f8e652f8780 5 asok(0x285c000) > > > register_command perfcounters_schema hook 0x2850010 > > > -25> 2012-12-14 09:03:20.266383 7f8e652f8780 5 asok(0x285c000) > > > register_command 2 hook 0x2850010 > > > -24> 2012-12-14 09:03:20.266386 7f8e652f8780 5 asok(0x285c000) > > > register_command perf schema hook 0x2850010 > > > -23> 2012-12-14 09:03:20.266389 7f8e652f8780 5 asok(0x285c000) > > > register_command config show hook 0x2850010 > > > -22> 2012-12-14 09:03:20.266392 7f8e652f8780 5 asok(0x285c000) > > > register_command config set hook 0x2850010 > > > -21> 2012-12-14 09:03:20.266396 7f8e652f8780 5 asok(0x285c000) > > > register_command log flush hook 0x2850010 > > > -20> 2012-12-14 09:03:20.266398 7f8e652f8780 5 asok(0x285c000) > > > register_command log dump hook 0x2850010 > > > -19> 2012-12-14 09:03:20.266401 7f8e652f8780 5 asok(0x285c000) > > > register_command log reopen hook 0x2850010 > > > -18> 2012-12-14 09:03:20.267686 7f8e652f8780 0 ceph version > > > 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824), process > > > ceph-osd, pid 7212 > > > -17> 2012-12-14 09:03:20.268738 7f8e652f8780 1 finished > > > global_init_daemonize > > > -16> 2012-12-14 09:03:20.275957 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to > > > work > > > -15> 2012-12-14 09:03:20.275968 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore > > > fiemap' config option > > > -14> 2012-12-14 09:03:20.276177 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount did NOT 
detect btrfs > > > -13> 2012-12-14 09:03:20.277051 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported > > > -12> 2012-12-14 09:03:20.277585 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount found snaps <> > > > -11> 2012-12-14 09:03:20.278899 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs > > > not detected > > > -10> 2012-12-14 09:03:20.290745 7f8e652f8780 0 journal kernel > > > version is 3.6.10 > > > -9> 2012-12-14 09:03:20.320728 7f8e652f8780 0 journal kernel > > > version is 3.6.10 > > > -8> 2012-12-14 09:03:20.328381 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to > > > work > > > -7> 2012-12-14 09:03:20.328391 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore > > > fiemap' config option > > > -6> 2012-12-14 09:03:20.328574 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount did NOT detect btrfs > > > -5> 2012-12-14 09:03:20.329579 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported > > > -4> 2012-12-14 09:03:20.329612 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount found snaps <> > > > -3> 2012-12-14 09:03:20.330786 7f8e652f8780 0 > > > filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs > > > not detected > > > -2> 2012-12-14 09:03:20.340711 7f8e652f8780 0 journal kernel > > > version is 3.6.10 > > > -1> 2012-12-14 09:03:20.370707 7f8e652f8780 0 journal kernel > > > version is 3.6.10 > > > 0> 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In > > > function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 > > > time 2012-12-14 09:03:20.392528 > > > osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl)) > > > > > > ceph version 0.55-239-gc951c27 > > > (c951c270a42b94b6f269992c9001d90f70a2b824) > > > 1: (OSDService::get_map(unsigned int)+0x918) [0x607f78] > > > 2: (OSD::load_pgs()+0x13ed) [0x6168ad] > > > 3: 
(OSD::init()+0xaff) [0x617a5f] > > > 4: (main()+0x2de6) [0x55a416] > > > 5: (__libc_start_main()+0xfd) [0x7f8e63093c8d] > > > 6: /usr/bin/ceph-osd() [0x557269] > > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is > > > needed to interpret this. > > > > > > Stefan > > > -- > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: osd crash after reboot
2012-12-14 19:42 ` Sage Weil
@ 2012-12-14 19:47 ` Stefan Priebe
0 siblings, 0 replies; 12+ messages in thread
From: Stefan Priebe @ 2012-12-14 19:47 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hi Sage,

this was just an idea, and I need to fix MY uuid problem. But the crash
itself is still a problem in ceph. Have you looked into my log?

On 14.12.2012 20:42, Sage Weil wrote:
> On Fri, 14 Dec 2012, Stefan Priebe wrote:
>> One more IMPORTANT note. This might happen due to the fact that a disk was
>> missing (disk failure) after the reboot.
>>
>> fstab and mountpoint are working with UUIDs so they match but the journal
>> block device:
>> osd journal = /dev/sde1
>>
>> didn't match anymore - as the numbers got renumbered due to the failed disk. Is
>> there a way to use some kind of UUIDs here too for journal?
>
> I think others have addressed the uuid question, but one note:
>
> The ceph-osd process has an internal uuid/fingerprint on the journal and
> data dir, and will refuse to start if they don't match.

Stefan
end of thread, other threads:[~2012-12-14 19:47 UTC | newest]

Thread overview: 12+ messages
2012-12-14  8:12 osd crash after reboot Stefan Priebe
2012-12-14  8:22 ` Stefan Priebe
2012-12-14  9:14   ` Stefan Priebe
2012-12-14 14:52     ` Dennis Jacobfeuerborn
2012-12-14 15:01       ` Mark Nelson
2012-12-14 15:11         ` Stefan Priebe - Profihost AG
2012-12-14 15:20           ` Mark Nelson
2012-12-14 15:25             ` Stefan Priebe - Profihost AG
2012-12-14 16:06             ` Stefan Priebe - Profihost AG
2012-12-14 15:02       ` Stefan Priebe - Profihost AG
2012-12-14 19:42     ` Sage Weil
2012-12-14 19:47       ` Stefan Priebe