From: Stefan Priebe <s.priebe@profihost.ag>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: osd crash after reboot
Date: Fri, 14 Dec 2012 10:14:13 +0100
Message-ID: <50CAEDE5.9070509@profihost.ag>
In-Reply-To: <50CAE1AA.80801@profihost.ag>
One more IMPORTANT note: this may have happened because a disk was
missing (disk failure) after the reboot.
fstab and the mountpoints use UUIDs, so they still matched, but the
journal block device:
osd journal = /dev/sde1
didn't match anymore, because the device names were renumbered due to the
failed disk. Is there a way to use some kind of UUID for the journal here too?
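One possible approach, offered as an untested sketch rather than a confirmed answer: udev normally maintains persistent symlinks under /dev/disk/ (for example by-id, by-uuid and, for GPT partitions, by-partuuid) that keep pointing at the same partition even when the kernel's sdX names shift. The helper below lists the symlinks that currently resolve to a given device node. The function name stable_aliases and its root parameter are made up for illustration, and /dev/sde1 is simply the journal device from the config line above.

```python
import os


def stable_aliases(device, root="/dev/disk"):
    """Return the symlinks below `root` (by-id, by-partuuid, ...) that
    currently resolve to the same node as `device`.

    Hypothetical helper for illustration; it only reads the symlinks
    that udev is assumed to have created.
    """
    target = os.path.realpath(device)
    found = []
    if not os.path.isdir(root):
        return found
    for sub in sorted(os.listdir(root)):
        subdir = os.path.join(root, sub)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            link = os.path.join(subdir, name)
            # Keep only symlinks whose final target is the requested device.
            if os.path.islink(link) and os.path.realpath(link) == target:
                found.append(link)
    return found


if __name__ == "__main__":
    # /dev/sde1 is the journal device named in the message above.
    for alias in stable_aliases("/dev/sde1"):
        print(alias)
```

Assuming `osd journal` accepts any path to the block device, one of the printed /dev/disk/by-... paths could be used there in place of /dev/sde1. This should be run while the device names are still known to be correct, since the helper only reports what the symlinks point at right now.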
Stefan
On 14.12.2012 09:22, Stefan Priebe wrote:
> same log more verbose:
> 11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0
> inactive] read_log done
> -11> 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996
> pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> inactive] handle_loaded
> -10> 2012-12-14 09:17:50.648581 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> inactive] exit Initial 0.015080 0 0.000000
> -9> 2012-12-14 09:17:50.648591 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> inactive] enter Reset
> -8> 2012-12-14 09:17:50.648599 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> inactive] set_last_peering_reset 3996
> -7> 2012-12-14 09:17:50.648609 7fb6e0d6b780 10 osd.3 4233 load_pgs
> loaded pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11
> ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=3996 lcod 0'0 mlcod
> 0'0 inactive] log(1379'2968,3988'3969]
> -6> 2012-12-14 09:17:50.648649 7fb6e0d6b780 15
> filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head
> 'info'
> -5> 2012-12-14 09:17:50.648664 7fb6e0d6b780 10
> filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head
> 'info' = 5
> -4> 2012-12-14 09:17:50.648672 7fb6e0d6b780 20 osd.3 0 get_map 3316
> - loading and decoding 0x2943e00
> -3> 2012-12-14 09:17:50.648678 7fb6e0d6b780 15
> filestore(/ceph/osd.3/) read meta/a09ec88/osdmap.3316/0//-1 0~0
> -2> 2012-12-14 09:17:50.648705 7fb6e0d6b780 10
> filestore(/ceph/osd.3/) error opening file
> /ceph/osd.3//current/meta/DIR_8/DIR_8/osdmap.3316__0_0A09EC88__none with
> flags=0 and mode=0: (2) No such file or directory
> -1> 2012-12-14 09:17:50.648722 7fb6e0d6b780 10
> filestore(/ceph/osd.3/) FileStore::read(meta/a09ec88/osdmap.3316/0//-1)
> open error: (2) No such file or directory
> 0> 2012-12-14 09:17:50.649586 7fb6e0d6b780 -1 osd/OSD.cc: In
> function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fb6e0d6b780
> time 2012-12-14 09:17:50.648733
> osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
>
> ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
> 1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> 2: (OSD::load_pgs()+0x13ed) [0x6168ad]
> 3: (OSD::init()+0xaff) [0x617a5f]
> 4: (main()+0x2de6) [0x55a416]
> 5: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
> 6: /usr/bin/ceph-osd() [0x557269]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 0 lockdep
> 0/ 0 context
> 0/ 0 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 0 buffer
> 0/ 0 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/20 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/20 osd
> 0/ 0 optracker
> 0/ 0 objclass
> 0/20 filestore
> 0/20 journal
> 0/ 0 ms
> 1/ 5 mon
> 0/ 0 monc
> 0/ 5 paxos
> 0/ 0 tp
> 0/ 0 auth
> 1/ 5 crypto
> 0/ 0 finisher
> 0/ 0 heartbeatmap
> 0/ 0 perfcounter
> 1/ 5 rgw
> 1/ 5 hadoop
> 1/ 5 javaclient
> 0/ 0 asok
> 0/ 0 throttle
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 100000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.3.log
> --- end dump of recent events ---
> 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal (Aborted) **
> in thread 7fb6e0d6b780
>
> ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
> 1: /usr/bin/ceph-osd() [0x7a1889]
> 2: (()+0xeff0) [0x7fb6e0750ff0]
> 3: (gsignal()+0x35) [0x7fb6deb1a1b5]
> 4: (abort()+0x180) [0x7fb6deb1cfc0]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
> 6: (()+0xcb166) [0x7fb6df3ad166]
> 7: (()+0xcb193) [0x7fb6df3ad193]
> 8: (()+0xcb28e) [0x7fb6df3ad28e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x7c9) [0x805659]
> 10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> 11: (OSD::load_pgs()+0x13ed) [0x6168ad]
> 12: (OSD::init()+0xaff) [0x617a5f]
> 13: (main()+0x2de6) [0x55a416]
> 14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
> 15: /usr/bin/ceph-osd() [0x557269]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- begin dump of recent events ---
> 0> 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal
> (Aborted) **
> in thread 7fb6e0d6b780
>
> ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
> 1: /usr/bin/ceph-osd() [0x7a1889]
> 2: (()+0xeff0) [0x7fb6e0750ff0]
> 3: (gsignal()+0x35) [0x7fb6deb1a1b5]
> 4: (abort()+0x180) [0x7fb6deb1cfc0]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
> 6: (()+0xcb166) [0x7fb6df3ad166]
> 7: (()+0xcb193) [0x7fb6df3ad193]
> 8: (()+0xcb28e) [0x7fb6df3ad28e]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x7c9) [0x805659]
> 10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> 11: (OSD::load_pgs()+0x13ed) [0x6168ad]
> 12: (OSD::init()+0xaff) [0x617a5f]
> 13: (main()+0x2de6) [0x55a416]
> 14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
> 15: /usr/bin/ceph-osd() [0x557269]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 0 lockdep
> 0/ 0 context
> 0/ 0 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 0 buffer
> 0/ 0 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/20 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/20 osd
> 0/ 0 optracker
> 0/ 0 objclass
> 0/20 filestore
> 0/20 journal
> 0/ 0 ms
> 1/ 5 mon
> 0/ 0 monc
> 0/ 5 paxos
> 0/ 0 tp
> 0/ 0 auth
> 1/ 5 crypto
> 0/ 0 finisher
> 0/ 0 heartbeatmap
> 0/ 0 perfcounter
> 1/ 5 rgw
> 1/ 5 hadoop
> 1/ 5 javaclient
> 0/ 0 asok
> 0/ 0 throttle
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 100000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.3.log
> --- end dump of recent events ---
>
> Stefan
>
> On 14.12.2012 09:12, Stefan Priebe wrote:
>> Hello list,
>>
>> after a reboot of my node, I see this on all OSDs of this node:
>>
>> 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function
>> 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time
>> 2012-12-14 09:03:20.392528
>> osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
>>
>> ceph version 0.55-239-gc951c27
>> (c951c270a42b94b6f269992c9001d90f70a2b824)
>> 1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
>> 2: (OSD::load_pgs()+0x13ed) [0x6168ad]
>> 3: (OSD::init()+0xaff) [0x617a5f]
>> 4: (main()+0x2de6) [0x55a416]
>> 5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
>> 6: /usr/bin/ceph-osd() [0x557269]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- begin dump of recent events ---
>> -29> 2012-12-14 09:03:20.266349 7f8e652f8780 5 asok(0x285c000)
>> register_command perfcounters_dump hook 0x2850010
>> -28> 2012-12-14 09:03:20.266366 7f8e652f8780 5 asok(0x285c000)
>> register_command 1 hook 0x2850010
>> -27> 2012-12-14 09:03:20.266369 7f8e652f8780 5 asok(0x285c000)
>> register_command perf dump hook 0x2850010
>> -26> 2012-12-14 09:03:20.266379 7f8e652f8780 5 asok(0x285c000)
>> register_command perfcounters_schema hook 0x2850010
>> -25> 2012-12-14 09:03:20.266383 7f8e652f8780 5 asok(0x285c000)
>> register_command 2 hook 0x2850010
>> -24> 2012-12-14 09:03:20.266386 7f8e652f8780 5 asok(0x285c000)
>> register_command perf schema hook 0x2850010
>> -23> 2012-12-14 09:03:20.266389 7f8e652f8780 5 asok(0x285c000)
>> register_command config show hook 0x2850010
>> -22> 2012-12-14 09:03:20.266392 7f8e652f8780 5 asok(0x285c000)
>> register_command config set hook 0x2850010
>> -21> 2012-12-14 09:03:20.266396 7f8e652f8780 5 asok(0x285c000)
>> register_command log flush hook 0x2850010
>> -20> 2012-12-14 09:03:20.266398 7f8e652f8780 5 asok(0x285c000)
>> register_command log dump hook 0x2850010
>> -19> 2012-12-14 09:03:20.266401 7f8e652f8780 5 asok(0x285c000)
>> register_command log reopen hook 0x2850010
>> -18> 2012-12-14 09:03:20.267686 7f8e652f8780 0 ceph version
>> 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824), process
>> ceph-osd, pid 7212
>> -17> 2012-12-14 09:03:20.268738 7f8e652f8780 1 finished
>> global_init_daemonize
>> -16> 2012-12-14 09:03:20.275957 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to
>> work
>> -15> 2012-12-14 09:03:20.275968 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore
>> fiemap' config option
>> -14> 2012-12-14 09:03:20.276177 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount did NOT detect btrfs
>> -13> 2012-12-14 09:03:20.277051 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
>> -12> 2012-12-14 09:03:20.277585 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount found snaps <>
>> -11> 2012-12-14 09:03:20.278899 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs
>> not detected
>> -10> 2012-12-14 09:03:20.290745 7f8e652f8780 0 journal kernel
>> version is 3.6.10
>> -9> 2012-12-14 09:03:20.320728 7f8e652f8780 0 journal kernel
>> version is 3.6.10
>> -8> 2012-12-14 09:03:20.328381 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to
>> work
>> -7> 2012-12-14 09:03:20.328391 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore
>> fiemap' config option
>> -6> 2012-12-14 09:03:20.328574 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount did NOT detect btrfs
>> -5> 2012-12-14 09:03:20.329579 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
>> -4> 2012-12-14 09:03:20.329612 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount found snaps <>
>> -3> 2012-12-14 09:03:20.330786 7f8e652f8780 0
>> filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs
>> not detected
>> -2> 2012-12-14 09:03:20.340711 7f8e652f8780 0 journal kernel
>> version is 3.6.10
>> -1> 2012-12-14 09:03:20.370707 7f8e652f8780 0 journal kernel
>> version is 3.6.10
>> 0> 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In
>> function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780
>> time 2012-12-14 09:03:20.392528
>> osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
>>
>> ceph version 0.55-239-gc951c27
>> (c951c270a42b94b6f269992c9001d90f70a2b824)
>> 1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
>> 2: (OSD::load_pgs()+0x13ed) [0x6168ad]
>> 3: (OSD::init()+0xaff) [0x617a5f]
>> 4: (main()+0x2de6) [0x55a416]
>> 5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
>> 6: /usr/bin/ceph-osd() [0x557269]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> Stefan