* osd crash after reboot
@ 2012-12-14  8:12 Stefan Priebe
  2012-12-14  8:22 ` Stefan Priebe
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Priebe @ 2012-12-14  8:12 UTC
  To: ceph-devel@vger.kernel.org

Hello list,

After rebooting my node, I see this on all of its OSDs:

2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time 2012-12-14 09:03:20.392528
osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))

  ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
  1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
  2: (OSD::load_pgs()+0x13ed) [0x6168ad]
  3: (OSD::init()+0xaff) [0x617a5f]
  4: (main()+0x2de6) [0x55a416]
  5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
  6: /usr/bin/ceph-osd() [0x557269]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
    -29> 2012-12-14 09:03:20.266349 7f8e652f8780  5 asok(0x285c000) register_command perfcounters_dump hook 0x2850010
    -28> 2012-12-14 09:03:20.266366 7f8e652f8780  5 asok(0x285c000) register_command 1 hook 0x2850010
    -27> 2012-12-14 09:03:20.266369 7f8e652f8780  5 asok(0x285c000) register_command perf dump hook 0x2850010
    -26> 2012-12-14 09:03:20.266379 7f8e652f8780  5 asok(0x285c000) register_command perfcounters_schema hook 0x2850010
    -25> 2012-12-14 09:03:20.266383 7f8e652f8780  5 asok(0x285c000) register_command 2 hook 0x2850010
    -24> 2012-12-14 09:03:20.266386 7f8e652f8780  5 asok(0x285c000) register_command perf schema hook 0x2850010
    -23> 2012-12-14 09:03:20.266389 7f8e652f8780  5 asok(0x285c000) register_command config show hook 0x2850010
    -22> 2012-12-14 09:03:20.266392 7f8e652f8780  5 asok(0x285c000) register_command config set hook 0x2850010
    -21> 2012-12-14 09:03:20.266396 7f8e652f8780  5 asok(0x285c000) register_command log flush hook 0x2850010
    -20> 2012-12-14 09:03:20.266398 7f8e652f8780  5 asok(0x285c000) register_command log dump hook 0x2850010
    -19> 2012-12-14 09:03:20.266401 7f8e652f8780  5 asok(0x285c000) register_command log reopen hook 0x2850010
    -18> 2012-12-14 09:03:20.267686 7f8e652f8780  0 ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824), process ceph-osd, pid 7212
    -17> 2012-12-14 09:03:20.268738 7f8e652f8780  1 finished global_init_daemonize
    -16> 2012-12-14 09:03:20.275957 7f8e652f8780  0 filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to work
    -15> 2012-12-14 09:03:20.275968 7f8e652f8780  0 filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
    -14> 2012-12-14 09:03:20.276177 7f8e652f8780  0 filestore(/ceph/osd.1/) mount did NOT detect btrfs
    -13> 2012-12-14 09:03:20.277051 7f8e652f8780  0 filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
    -12> 2012-12-14 09:03:20.277585 7f8e652f8780  0 filestore(/ceph/osd.1/) mount found snaps <>
    -11> 2012-12-14 09:03:20.278899 7f8e652f8780  0 filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs not detected
    -10> 2012-12-14 09:03:20.290745 7f8e652f8780  0 journal  kernel version is 3.6.10
     -9> 2012-12-14 09:03:20.320728 7f8e652f8780  0 journal  kernel version is 3.6.10
     -8> 2012-12-14 09:03:20.328381 7f8e652f8780  0 filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to work
     -7> 2012-12-14 09:03:20.328391 7f8e652f8780  0 filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
     -6> 2012-12-14 09:03:20.328574 7f8e652f8780  0 filestore(/ceph/osd.1/) mount did NOT detect btrfs
     -5> 2012-12-14 09:03:20.329579 7f8e652f8780  0 filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
     -4> 2012-12-14 09:03:20.329612 7f8e652f8780  0 filestore(/ceph/osd.1/) mount found snaps <>
     -3> 2012-12-14 09:03:20.330786 7f8e652f8780  0 filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs not detected
     -2> 2012-12-14 09:03:20.340711 7f8e652f8780  0 journal  kernel version is 3.6.10
     -1> 2012-12-14 09:03:20.370707 7f8e652f8780  0 journal  kernel version is 3.6.10
      0> 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time 2012-12-14 09:03:20.392528
osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))

  ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
  1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
  2: (OSD::load_pgs()+0x13ed) [0x6168ad]
  3: (OSD::init()+0xaff) [0x617a5f]
  4: (main()+0x2de6) [0x55a416]
  5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
  6: /usr/bin/ceph-osd() [0x557269]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Stefan

* Re: osd crash after reboot
  2012-12-14  8:12 osd crash after reboot Stefan Priebe
@ 2012-12-14  8:22 ` Stefan Priebe
  2012-12-14  9:14   ` Stefan Priebe
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Priebe @ 2012-12-14  8:22 UTC
  To: ceph-devel@vger.kernel.org

Same log, more verbose:
11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] read_log done
    -11> 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] handle_loaded
    -10> 2012-12-14 09:17:50.648581 7fb6e0d6b780 20 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] exit Initial 0.015080 0 0.000000
     -9> 2012-12-14 09:17:50.648591 7fb6e0d6b780 20 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] enter Reset
     -8> 2012-12-14 09:17:50.648599 7fb6e0d6b780 20 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] set_last_peering_reset 3996
     -7> 2012-12-14 09:17:50.648609 7fb6e0d6b780 10 osd.3 4233 load_pgs loaded pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=3996 lcod 0'0 mlcod 0'0 inactive] log(1379'2968,3988'3969]
     -6> 2012-12-14 09:17:50.648649 7fb6e0d6b780 15 filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head 'info'
     -5> 2012-12-14 09:17:50.648664 7fb6e0d6b780 10 filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head 'info' = 5
     -4> 2012-12-14 09:17:50.648672 7fb6e0d6b780 20 osd.3 0 get_map 3316 - loading and decoding 0x2943e00
     -3> 2012-12-14 09:17:50.648678 7fb6e0d6b780 15 filestore(/ceph/osd.3/) read meta/a09ec88/osdmap.3316/0//-1 0~0
     -2> 2012-12-14 09:17:50.648705 7fb6e0d6b780 10 filestore(/ceph/osd.3/) error opening file /ceph/osd.3//current/meta/DIR_8/DIR_8/osdmap.3316__0_0A09EC88__none with flags=0 and mode=0: (2) No such file or directory
     -1> 2012-12-14 09:17:50.648722 7fb6e0d6b780 10 filestore(/ceph/osd.3/) FileStore::read(meta/a09ec88/osdmap.3316/0//-1) open error: (2) No such file or directory
      0> 2012-12-14 09:17:50.649586 7fb6e0d6b780 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fb6e0d6b780 time 2012-12-14 09:17:50.648733
osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))

  ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
  1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
  2: (OSD::load_pgs()+0x13ed) [0x6168ad]
  3: (OSD::init()+0xaff) [0x617a5f]
  4: (main()+0x2de6) [0x55a416]
  5: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
  6: /usr/bin/ceph-osd() [0x557269]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
    0/ 5 none
    0/ 0 lockdep
    0/ 0 context
    0/ 0 crush
    1/ 5 mds
    1/ 5 mds_balancer
    1/ 5 mds_locker
    1/ 5 mds_log
    1/ 5 mds_log_expire
    1/ 5 mds_migrator
    0/ 0 buffer
    0/ 0 timer
    0/ 1 filer
    0/ 1 striper
    0/ 1 objecter
    0/ 5 rados
    0/ 5 rbd
    0/20 journaler
    0/ 5 objectcacher
    0/ 5 client
    0/20 osd
    0/ 0 optracker
    0/ 0 objclass
    0/20 filestore
    0/20 journal
    0/ 0 ms
    1/ 5 mon
    0/ 0 monc
    0/ 5 paxos
    0/ 0 tp
    0/ 0 auth
    1/ 5 crypto
    0/ 0 finisher
    0/ 0 heartbeatmap
    0/ 0 perfcounter
    1/ 5 rgw
    1/ 5 hadoop
    1/ 5 javaclient
    0/ 0 asok
    0/ 0 throttle
   -2/-2 (syslog threshold)
   -1/-1 (stderr threshold)
   max_recent    100000
   max_new         1000
   log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal (Aborted) **
  in thread 7fb6e0d6b780

  ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
  1: /usr/bin/ceph-osd() [0x7a1889]
  2: (()+0xeff0) [0x7fb6e0750ff0]
  3: (gsignal()+0x35) [0x7fb6deb1a1b5]
  4: (abort()+0x180) [0x7fb6deb1cfc0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
  6: (()+0xcb166) [0x7fb6df3ad166]
  7: (()+0xcb193) [0x7fb6df3ad193]
  8: (()+0xcb28e) [0x7fb6df3ad28e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x805659]
  10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
  11: (OSD::load_pgs()+0x13ed) [0x6168ad]
  12: (OSD::init()+0xaff) [0x617a5f]
  13: (main()+0x2de6) [0x55a416]
  14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
  15: /usr/bin/ceph-osd() [0x557269]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Stefan

* Re: osd crash after reboot
  2012-12-14  8:22 ` Stefan Priebe
@ 2012-12-14  9:14   ` Stefan Priebe
  2012-12-14 14:52     ` Dennis Jacobfeuerborn
  2012-12-14 19:42     ` Sage Weil
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Priebe @ 2012-12-14  9:14 UTC
  To: ceph-devel@vger.kernel.org

One more IMPORTANT note. This might have happened because a disk was
missing (disk failure) after the reboot.

fstab and the mountpoints use UUIDs, so they still match, but the
journal block device:
osd journal  = /dev/sde1

didn't match anymore, as the device names got renumbered due to the failed
disk. Is there a way to use some kind of UUID for the journal too?
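The renumbering described above can be illustrated with a toy Python model. It assumes, as a simplification, that sdX letters are handed out in probe order to whichever disks are present at boot; real naming also depends on controllers and udev timing, so this is not the kernel's actual algorithm. The disk names are hypothetical.

```python
import string


def assign_sd_names(present_disks):
    """Toy model: give each present disk, in probe order, the next sdX name.

    Simplification only; real kernel/udev naming also depends on the
    controller and on timing. Handles at most 26 disks (sda..sdz).
    """
    if len(present_disks) > 26:
        raise ValueError("toy model only covers sda..sdz")
    return {disk: "sd" + string.ascii_lowercase[i]
            for i, disk in enumerate(present_disks)}


if __name__ == "__main__":
    disks = ["disk0", "disk1", "disk2", "disk3", "disk4", "disk5"]  # hypothetical
    before = assign_sd_names(disks)
    # disk2 fails and is absent at the next boot.
    after = assign_sd_names([d for d in disks if d != "disk2"])
    for d in disks:
        print(d, before[d], "->", after.get(d, "(missing)"))
```

In this model the disk that used to be sde comes up as sdd, and /dev/sde now names a different disk, which is how a line such as `osd journal = /dev/sde1` can end up pointing at the wrong device.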

Stefan

* Re: osd crash after reboot
  2012-12-14  9:14   ` Stefan Priebe
@ 2012-12-14 14:52     ` Dennis Jacobfeuerborn
  2012-12-14 15:01       ` Mark Nelson
  2012-12-14 15:02       ` Stefan Priebe - Profihost AG
  2012-12-14 19:42     ` Sage Weil
  1 sibling, 2 replies; 12+ messages in thread
From: Dennis Jacobfeuerborn @ 2012-12-14 14:52 UTC
  To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org

On 12/14/2012 10:14 AM, Stefan Priebe wrote:
> One more IMPORTANT note. This might happen due to the fact that a disk was
> missing (disk failure) after the reboot.
> 
> fstab and mountpoint are working with UUIDs so they match but the journal
> block device:
> osd journal  = /dev/sde1
> 
> didn't match anymore - as the numbers got renumbered due to the failed disk.
> Is there a way to use some kind of UUIDs here too for journal?

You should be able to use /dev/disk/by-uuid/* instead. That should give you
a stable view of the filesystems.
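A minimal Python sketch of a check that would flag the problem before a reboot: it scans ceph.conf text for `osd journal` values that are kernel-assigned names like /dev/sde1 rather than /dev/disk/by-* paths. It is a simple line-based check under stated assumptions (key written as "osd journal" or "osd_journal", no inline comments), not a Ceph tool.

```python
import re

# Kernel-assigned names like /dev/sde1 depend on probe order; udev's
# /dev/disk/by-* symlinks are the usual stable alternative.
UNSTABLE_DEV = re.compile(r"^/dev/(?:sd|hd|vd)[a-z]+\d*$")
# Matches "osd journal = X" or "osd_journal = X"; inline comments are not handled.
JOURNAL_LINE = re.compile(r"^\s*osd[ _]journal\s*=\s*(\S+)\s*$")


def find_unstable_journals(conf_text):
    """Return (line_number, path) pairs for journals set to kernel device names."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        m = JOURNAL_LINE.match(line)
        if m and UNSTABLE_DEV.match(m.group(1)):
            hits.append((lineno, m.group(1)))
    return hits


if __name__ == "__main__":
    sample = "[osd.3]\n\tosd journal  = /dev/sde1\n"
    for lineno, path in find_unstable_journals(sample):
        print(f"line {lineno}: {path} may change after a disk failure or reboot")
```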

Regards,
  Dennis


* Re: osd crash after reboot
  2012-12-14 14:52     ` Dennis Jacobfeuerborn
@ 2012-12-14 15:01       ` Mark Nelson
  2012-12-14 15:11         ` Stefan Priebe - Profihost AG
  2012-12-14 15:02       ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Nelson @ 2012-12-14 15:01 UTC
  To: Dennis Jacobfeuerborn; +Cc: Stefan Priebe, ceph-devel@vger.kernel.org

On 12/14/2012 08:52 AM, Dennis Jacobfeuerborn wrote:
> On 12/14/2012 10:14 AM, Stefan Priebe wrote:
>> One more IMPORTANT note. This might happen due to the fact that a disk was
>> missing (disk failure) after the reboot.
>>
>> fstab and mountpoint are working with UUIDs so they match but the journal
>> block device:
>> osd journal  = /dev/sde1
>>
>> didn't match anymore - as the numbers got renumbered due to the failed disk.
>> Is there a way to use some kind of UUIDs here too for journal?
>
> You should be able to use /dev/disk/by-uuid/* instead. That should give you
> a stable view of the filesystems.

I often map partitions to something in /dev/disk/by-partlabel and use 
those in my ceph.conf files.  That way disks can be remapped behind the 
scenes and the ceph configuration doesn't have to change even if disks 
get replaced.
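A minimal ceph.conf sketch of this idea, assuming the journal of osd.3 lives on a GPT partition that was given the (hypothetical) name `osd-device-3-journal`; the OSD id and the name are illustrative, not taken from anyone's actual config:

```ini
[osd.3]
    # Refer to the journal by its GPT partition name instead of /dev/sdX1,
    # so a change in /dev/sd* lettering after a disk failure does not
    # point the OSD at the wrong device.
    osd journal = /dev/disk/by-partlabel/osd-device-3-journal
```

Because the name is stored in the GPT partition entry itself, the symlink follows the partition even if the kernel assigns it a different /dev/sdX letter on the next boot.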

>
> Regards,
>    Dennis
>


* Re: osd crash after reboot
  2012-12-14 14:52     ` Dennis Jacobfeuerborn
  2012-12-14 15:01       ` Mark Nelson
@ 2012-12-14 15:02       ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 15:02 UTC (permalink / raw)
  To: Dennis Jacobfeuerborn; +Cc: ceph-devel@vger.kernel.org

Hello Dennis,

On 14.12.2012 15:52, Dennis Jacobfeuerborn wrote:
>> didn't match anymore, as the devices got renumbered due to the failed disk.
>> Is there a way to use some kind of UUIDs here too for the journal?
>
> You should be able to use /dev/disk/by-uuid/* instead. That should give you
> a stable view of the filesystems.

Good idea, but only partitions with a UUID are listed there. When the 
journal uses the raw partition directly, it does not have a UUID.

But this reminded me of /dev/disk/by-id, and that works fine. I'm now 
using the WWN number.
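A small shell sketch for finding which /dev/disk/by-id/wwn-* entry points at a given device, so that path can go into `osd journal` instead of /dev/sde1. It assumes the usual udev layout in which those entries are symlinks to the kernel device node; the function name is hypothetical, and the optional second argument (the directory to search) exists only so the function can be tried without real disks.

```shell
#!/bin/bash
# Print the /dev/disk/by-id/wwn-* links that currently resolve to a device.
# Usage: wwn_links_for /dev/sde1 [/dev/disk/by-id]
wwn_links_for() {
  local target dir link
  target=$(readlink -f -- "$1")   # canonical path of the device we want
  dir=${2:-/dev/disk/by-id}
  for link in "$dir"/wwn-*; do
    # When nothing matches, the glob stays literal and does not exist.
    [ -e "$link" ] || continue
    if [ "$(readlink -f -- "$link")" = "$target" ]; then
      printf '%s\n' "$link"
    fi
  done
}
```

For example, `wwn_links_for /dev/sde1` prints the matching wwn-* link(s) for that partition, provided the disk exposes a WWN at all.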

Greets,
Stefan

* Re: osd crash after reboot
  2012-12-14 15:01       ` Mark Nelson
@ 2012-12-14 15:11         ` Stefan Priebe - Profihost AG
  2012-12-14 15:20           ` Mark Nelson
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 15:11 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org

Hi Mark,

But how do I set a label for a partition without a filesystem, like the journal block device?
On 14.12.2012 16:01, Mark Nelson wrote:
> I often map partitions to something in /dev/disk/by-partlabel and use
> those in my ceph.conf files.  That way disks can be remapped behind the
> scenes and the ceph configuration doesn't have to change even if disks
> get replaced.

Greets,
Stefan

* Re: osd crash after reboot
  2012-12-14 15:11         ` Stefan Priebe - Profihost AG
@ 2012-12-14 15:20           ` Mark Nelson
  2012-12-14 15:25             ` Stefan Priebe - Profihost AG
  2012-12-14 16:06             ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 12+ messages in thread
From: Mark Nelson @ 2012-12-14 15:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org

Hi Stefan,

Here's what I often do when I have a journal and data partition sharing 
a disk:

sudo parted -s -a optimal /dev/$DEV mklabel gpt
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%

Mark

On 12/14/2012 09:11 AM, Stefan Priebe - Profihost AG wrote:
> Hi Mark,
>
> But how do I set a label for a partition without a filesystem, like the journal block device?
> On 14.12.2012 16:01, Mark Nelson wrote:
>> I often map partitions to something in /dev/disk/by-partlabel and use
>> those in my ceph.conf files.  That way disks can be remapped behind the
>> scenes and the ceph configuration doesn't have to change even if disks
>> get replaced.
>
> Greets,
> Stefan


* Re: osd crash after reboot
  2012-12-14 15:20           ` Mark Nelson
@ 2012-12-14 15:25             ` Stefan Priebe - Profihost AG
  2012-12-14 16:06             ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 15:25 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org

Hi Mark,

On 14.12.2012 16:20, Mark Nelson wrote:
> sudo parted -s -a optimal /dev/$DEV mklabel gpt
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%

My disks are GPT too, and I'm also using parted. But I don't want to 
recreate my partitions, and I haven't found a way in parted to set such a 
label afterwards.
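For GPT disks, parted has a `name` command that sets the name of an existing partition without recreating it (sgdisk offers the same through `--change-name=partnum:name`). A hedged sketch, assuming partition 1 holds the journal and partition 2 the data, in the order Mark creates them; the function name, device and OSD id are hypothetical, and the `PARTED` override exists only so the commands can be printed as a dry run instead of executed.

```shell
#!/bin/bash
# Give existing GPT partitions stable names in place (no repartitioning).
# Assumes partition 1 = journal and partition 2 = data; adjust to the real layout.
# Set PARTED="echo parted" to print the commands instead of running them.
name_osd_partitions() {
  local dev=$1 id=$2
  ${PARTED:-sudo parted} -s "$dev" name 1 "osd-device-$id-journal"
  ${PARTED:-sudo parted} -s "$dev" name 2 "osd-device-$id-data"
}
```

Usage: `name_osd_partitions /dev/sde 3`. The /dev/disk/by-partlabel symlinks are maintained by udev, so the new names may not show up there until udev re-processes the device (for example after `partprobe`, `udevadm trigger`, or a reboot).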

Greets,
Stefan

* Re: osd crash after reboot
  2012-12-14 15:20           ` Mark Nelson
  2012-12-14 15:25             ` Stefan Priebe - Profihost AG
@ 2012-12-14 16:06             ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-14 16:06 UTC (permalink / raw)
  To: Mark Nelson; +Cc: Dennis Jacobfeuerborn, ceph-devel@vger.kernel.org

Hello Mark,

On 14.12.2012 16:20, Mark Nelson wrote:
> sudo parted -s -a optimal /dev/$DEV mklabel gpt
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G
> sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100%

Isn't that the part-type argument you're using?
mkpart part-type start-mb end-mb

I like your idea and I think it's a good one, but I want to know why it 
works. part-type isn't a filesystem label...
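On a GPT disk label, parted does not use a `primary`/`logical`/`extended` part-type (those only apply to msdos-style tables); the first `mkpart` argument becomes the partition's GPT name. That name is stored in the partition table, not in a filesystem, which is why it also works for a raw journal partition; lsblk reports it as PARTLABEL and udev links it under /dev/disk/by-partlabel/. A small sketch that looks a device up by that name, assuming a util-linux `lsblk` recent enough to have the PARTLABEL column (the function name is hypothetical):

```shell
#!/bin/bash
# Print /dev/<name> for the partition whose GPT name (PARTLABEL) equals $1.
# Reads "NAME PARTLABEL" lines on stdin, e.g.:
#   lsblk -rno NAME,PARTLABEL | dev_for_partlabel osd-device-3-journal
# Lines without a label (whole disks, unnamed partitions) never match.
dev_for_partlabel() {
  awk -v want="$1" '$2 == want { print "/dev/" $1 }'
}
```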

Greets,
Stefan

* Re: osd crash after reboot
  2012-12-14  9:14   ` Stefan Priebe
  2012-12-14 14:52     ` Dennis Jacobfeuerborn
@ 2012-12-14 19:42     ` Sage Weil
  2012-12-14 19:47       ` Stefan Priebe
  1 sibling, 1 reply; 12+ messages in thread
From: Sage Weil @ 2012-12-14 19:42 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org

On Fri, 14 Dec 2012, Stefan Priebe wrote:
> One more IMPORTANT note. This might happen because a disk was
> missing (disk failure) after the reboot.
> 
> fstab and mountpoint are working with UUIDs, so they match, but the journal
> block device:
> osd journal  = /dev/sde1
> 
> didn't match anymore, as the devices got renumbered due to the failed disk. Is
> there a way to use some kind of UUIDs here too for the journal?

I think others have addressed the uuid question, but one note:

The ceph-osd process has an internal uuid/fingerprint on the journal and 
data dir, and will refuse to start if they don't match.

sage


> 
> Stefan
> 
> On 14.12.2012 09:22, Stefan Priebe wrote:
> > same log more verbose:
> > 11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] read_log done
> >     -11> 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] handle_loaded
> >     -10> 2012-12-14 09:17:50.648581 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] exit Initial 0.015080 0 0.000000
> >      -9> 2012-12-14 09:17:50.648591 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] enter Reset
> >      -8> 2012-12-14 09:17:50.648599 7fb6e0d6b780 20 osd.3 pg_epoch: 3996
> > pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10
> > les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=0 lcod 0'0 mlcod 0'0
> > inactive] set_last_peering_reset 3996
> >      -7> 2012-12-14 09:17:50.648609 7fb6e0d6b780 10 osd.3 4233 load_pgs
> > loaded pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11
> > ec=10 les/c 3307/3307 3306/3306/3306) [3,12] r=0 lpr=3996 lcod 0'0 mlcod
> > 0'0 inactive] log(1379'2968,3988'3969]
> >      -6> 2012-12-14 09:17:50.648649 7fb6e0d6b780 15
> > filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head
> > 'info'
> >      -5> 2012-12-14 09:17:50.648664 7fb6e0d6b780 10
> > filestore(/ceph/osd.3/) collection_getattr /ceph/osd.3//current/0.1_head
> > 'info' = 5
> >      -4> 2012-12-14 09:17:50.648672 7fb6e0d6b780 20 osd.3 0 get_map 3316
> > - loading and decoding 0x2943e00
> >      -3> 2012-12-14 09:17:50.648678 7fb6e0d6b780 15
> > filestore(/ceph/osd.3/) read meta/a09ec88/osdmap.3316/0//-1 0~0
> >      -2> 2012-12-14 09:17:50.648705 7fb6e0d6b780 10
> > filestore(/ceph/osd.3/) error opening file
> > /ceph/osd.3//current/meta/DIR_8/DIR_8/osdmap.3316__0_0A09EC88__none with
> > flags=0 and mode=0: (2) No such file or directory
> >      -1> 2012-12-14 09:17:50.648722 7fb6e0d6b780 10
> > filestore(/ceph/osd.3/) FileStore::read(meta/a09ec88/osdmap.3316/0//-1)
> > open error: (2) No such file or directory
> >       0> 2012-12-14 09:17:50.649586 7fb6e0d6b780 -1 osd/OSD.cc: In
> > function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fb6e0d6b780
> > time 2012-12-14 09:17:50.648733
> > osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
> > 
> >   ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
> >   1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> >   2: (OSD::load_pgs()+0x13ed) [0x6168ad]
> >   3: (OSD::init()+0xaff) [0x617a5f]
> >   4: (main()+0x2de6) [0x55a416]
> >   5: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
> >   6: /usr/bin/ceph-osd() [0x557269]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- logging levels ---
> >     0/ 5 none
> >     0/ 0 lockdep
> >     0/ 0 context
> >     0/ 0 crush
> >     1/ 5 mds
> >     1/ 5 mds_balancer
> >     1/ 5 mds_locker
> >     1/ 5 mds_log
> >     1/ 5 mds_log_expire
> >     1/ 5 mds_migrator
> >     0/ 0 buffer
> >     0/ 0 timer
> >     0/ 1 filer
> >     0/ 1 striper
> >     0/ 1 objecter
> >     0/ 5 rados
> >     0/ 5 rbd
> >     0/20 journaler
> >     0/ 5 objectcacher
> >     0/ 5 client
> >     0/20 osd
> >     0/ 0 optracker
> >     0/ 0 objclass
> >     0/20 filestore
> >     0/20 journal
> >     0/ 0 ms
> >     1/ 5 mon
> >     0/ 0 monc
> >     0/ 5 paxos
> >     0/ 0 tp
> >     0/ 0 auth
> >     1/ 5 crypto
> >     0/ 0 finisher
> >     0/ 0 heartbeatmap
> >     0/ 0 perfcounter
> >     1/ 5 rgw
> >     1/ 5 hadoop
> >     1/ 5 javaclient
> >     0/ 0 asok
> >     0/ 0 throttle
> >    -2/-2 (syslog threshold)
> >    -1/-1 (stderr threshold)
> >    max_recent    100000
> >    max_new         1000
> >    log_file /var/log/ceph/ceph-osd.3.log
> > --- end dump of recent events ---
> > 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal (Aborted) **
> >   in thread 7fb6e0d6b780
> > 
> >   ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
> >   1: /usr/bin/ceph-osd() [0x7a1889]
> >   2: (()+0xeff0) [0x7fb6e0750ff0]
> >   3: (gsignal()+0x35) [0x7fb6deb1a1b5]
> >   4: (abort()+0x180) [0x7fb6deb1cfc0]
> >   5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
> >   6: (()+0xcb166) [0x7fb6df3ad166]
> >   7: (()+0xcb193) [0x7fb6df3ad193]
> >   8: (()+0xcb28e) [0x7fb6df3ad28e]
> >   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x7c9) [0x805659]
> >   10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> >   11: (OSD::load_pgs()+0x13ed) [0x6168ad]
> >   12: (OSD::init()+0xaff) [0x617a5f]
> >   13: (main()+0x2de6) [0x55a416]
> >   14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
> >   15: /usr/bin/ceph-osd() [0x557269]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- begin dump of recent events ---
> >       0> 2012-12-14 09:17:50.714676 7fb6e0d6b780 -1 *** Caught signal
> > (Aborted) **
> >   in thread 7fb6e0d6b780
> > 
> >   ceph version 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824)
> >   1: /usr/bin/ceph-osd() [0x7a1889]
> >   2: (()+0xeff0) [0x7fb6e0750ff0]
> >   3: (gsignal()+0x35) [0x7fb6deb1a1b5]
> >   4: (abort()+0x180) [0x7fb6deb1cfc0]
> >   5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb6df3aedc5]
> >   6: (()+0xcb166) [0x7fb6df3ad166]
> >   7: (()+0xcb193) [0x7fb6df3ad193]
> >   8: (()+0xcb28e) [0x7fb6df3ad28e]
> >   9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x7c9) [0x805659]
> >   10: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> >   11: (OSD::load_pgs()+0x13ed) [0x6168ad]
> >   12: (OSD::init()+0xaff) [0x617a5f]
> >   13: (main()+0x2de6) [0x55a416]
> >   14: (__libc_start_main()+0xfd) [0x7fb6deb06c8d]
> >   15: /usr/bin/ceph-osd() [0x557269]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> > 
> > --- logging levels ---
> >     0/ 5 none
> >     0/ 0 lockdep
> >     0/ 0 context
> >     0/ 0 crush
> >     1/ 5 mds
> >     1/ 5 mds_balancer
> >     1/ 5 mds_locker
> >     1/ 5 mds_log
> >     1/ 5 mds_log_expire
> >     1/ 5 mds_migrator
> >     0/ 0 buffer
> >     0/ 0 timer
> >     0/ 1 filer
> >     0/ 1 striper
> >     0/ 1 objecter
> >     0/ 5 rados
> >     0/ 5 rbd
> >     0/20 journaler
> >     0/ 5 objectcacher
> >     0/ 5 client
> >     0/20 osd
> >     0/ 0 optracker
> >     0/ 0 objclass
> >     0/20 filestore
> >     0/20 journal
> >     0/ 0 ms
> >     1/ 5 mon
> >     0/ 0 monc
> >     0/ 5 paxos
> >     0/ 0 tp
> >     0/ 0 auth
> >     1/ 5 crypto
> >     0/ 0 finisher
> >     0/ 0 heartbeatmap
> >     0/ 0 perfcounter
> >     1/ 5 rgw
> >     1/ 5 hadoop
> >     1/ 5 javaclient
> >     0/ 0 asok
> >     0/ 0 throttle
> >    -2/-2 (syslog threshold)
> >    -1/-1 (stderr threshold)
> >    max_recent    100000
> >    max_new         1000
> >    log_file /var/log/ceph/ceph-osd.3.log
> > --- end dump of recent events ---
> > 
> > Stefan
> > 
> > On 14.12.2012 09:12, Stefan Priebe wrote:
> > > Hello list,
> > > 
> > > after a reboot of my node i see this on all OSDs of this node after the
> > > reboot:
> > > 
> > > 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function
> > > 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time
> > > 2012-12-14 09:03:20.392528
> > > osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
> > > 
> > >   ceph version 0.55-239-gc951c27
> > > (c951c270a42b94b6f269992c9001d90f70a2b824)
> > >   1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> > >   2: (OSD::load_pgs()+0x13ed) [0x6168ad]
> > >   3: (OSD::init()+0xaff) [0x617a5f]
> > >   4: (main()+0x2de6) [0x55a416]
> > >   5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
> > >   6: /usr/bin/ceph-osd() [0x557269]
> > >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > > needed to interpret this.
> > > 
> > > --- begin dump of recent events ---
> > >     -29> 2012-12-14 09:03:20.266349 7f8e652f8780  5 asok(0x285c000)
> > > register_command perfcounters_dump hook 0x2850010
> > >     -28> 2012-12-14 09:03:20.266366 7f8e652f8780  5 asok(0x285c000)
> > > register_command 1 hook 0x2850010
> > >     -27> 2012-12-14 09:03:20.266369 7f8e652f8780  5 asok(0x285c000)
> > > register_command perf dump hook 0x2850010
> > >     -26> 2012-12-14 09:03:20.266379 7f8e652f8780  5 asok(0x285c000)
> > > register_command perfcounters_schema hook 0x2850010
> > >     -25> 2012-12-14 09:03:20.266383 7f8e652f8780  5 asok(0x285c000)
> > > register_command 2 hook 0x2850010
> > >     -24> 2012-12-14 09:03:20.266386 7f8e652f8780  5 asok(0x285c000)
> > > register_command perf schema hook 0x2850010
> > >     -23> 2012-12-14 09:03:20.266389 7f8e652f8780  5 asok(0x285c000)
> > > register_command config show hook 0x2850010
> > >     -22> 2012-12-14 09:03:20.266392 7f8e652f8780  5 asok(0x285c000)
> > > register_command config set hook 0x2850010
> > >     -21> 2012-12-14 09:03:20.266396 7f8e652f8780  5 asok(0x285c000)
> > > register_command log flush hook 0x2850010
> > >     -20> 2012-12-14 09:03:20.266398 7f8e652f8780  5 asok(0x285c000)
> > > register_command log dump hook 0x2850010
> > >     -19> 2012-12-14 09:03:20.266401 7f8e652f8780  5 asok(0x285c000)
> > > register_command log reopen hook 0x2850010
> > >     -18> 2012-12-14 09:03:20.267686 7f8e652f8780  0 ceph version
> > > 0.55-239-gc951c27 (c951c270a42b94b6f269992c9001d90f70a2b824), process
> > > ceph-osd, pid 7212
> > >     -17> 2012-12-14 09:03:20.268738 7f8e652f8780  1 finished
> > > global_init_daemonize
> > >     -16> 2012-12-14 09:03:20.275957 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to
> > > work
> > >     -15> 2012-12-14 09:03:20.275968 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore
> > > fiemap' config option
> > >     -14> 2012-12-14 09:03:20.276177 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount did NOT detect btrfs
> > >     -13> 2012-12-14 09:03:20.277051 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
> > >     -12> 2012-12-14 09:03:20.277585 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount found snaps <>
> > >     -11> 2012-12-14 09:03:20.278899 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs
> > > not detected
> > >     -10> 2012-12-14 09:03:20.290745 7f8e652f8780  0 journal  kernel
> > > version is 3.6.10
> > >      -9> 2012-12-14 09:03:20.320728 7f8e652f8780  0 journal  kernel
> > > version is 3.6.10
> > >      -8> 2012-12-14 09:03:20.328381 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is supported and appears to
> > > work
> > >      -7> 2012-12-14 09:03:20.328391 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount FIEMAP ioctl is disabled via 'filestore
> > > fiemap' config option
> > >      -6> 2012-12-14 09:03:20.328574 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount did NOT detect btrfs
> > >      -5> 2012-12-14 09:03:20.329579 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount syscall(__NR_syncfs, fd) fully supported
> > >      -4> 2012-12-14 09:03:20.329612 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount found snaps <>
> > >      -3> 2012-12-14 09:03:20.330786 7f8e652f8780  0
> > > filestore(/ceph/osd.1/) mount: enabling WRITEAHEAD journal mode: btrfs
> > > not detected
> > >      -2> 2012-12-14 09:03:20.340711 7f8e652f8780  0 journal  kernel
> > > version is 3.6.10
> > >      -1> 2012-12-14 09:03:20.370707 7f8e652f8780  0 journal  kernel
> > > version is 3.6.10
> > >       0> 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In
> > > function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780
> > > time 2012-12-14 09:03:20.392528
> > > osd/OSD.cc: 4385: FAILED assert(_get_map_bl(epoch, bl))
> > > 
> > >   ceph version 0.55-239-gc951c27
> > > (c951c270a42b94b6f269992c9001d90f70a2b824)
> > >   1: (OSDService::get_map(unsigned int)+0x918) [0x607f78]
> > >   2: (OSD::load_pgs()+0x13ed) [0x6168ad]
> > >   3: (OSD::init()+0xaff) [0x617a5f]
> > >   4: (main()+0x2de6) [0x55a416]
> > >   5: (__libc_start_main()+0xfd) [0x7f8e63093c8d]
> > >   6: /usr/bin/ceph-osd() [0x557269]
> > >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > > needed to interpret this.
> > > 
> > > Stefan
> 
> 
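In the verbose log, the assert follows directly on FileStore failing to open `osdmap.3316` under current/meta (`(2) No such file or directory`), apparently while loading pg 0.1, whose info was read just before. A quick way to see which OSD map epochs an OSD still has on disk is to list those files. This sketch assumes the FileStore naming seen in the log (`osdmap.<epoch>__...` somewhere below `<osd data>/current/meta`); the function names are hypothetical.

```shell
#!/bin/bash
# List the map epochs stored under a FileStore OSD's meta collection,
# based on file names like "osdmap.3316__0_0A09EC88__none" seen in the log.
# -name 'osdmap.*' only matches names that start with "osdmap.".
# Usage: osdmap_epochs /ceph/osd.3
osdmap_epochs() {
  find "$1/current/meta" -type f -name 'osdmap.*' |
    sed -n 's|.*/osdmap\.\([0-9][0-9]*\)__.*|\1|p' |
    sort -n
}

# Succeed (exit 0) if the given epoch is present, e.g.:
#   has_osdmap /ceph/osd.3 3316 || echo "epoch 3316 missing"
has_osdmap() {
  osdmap_epochs "$1" | grep -qx -- "$2"
}
```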

* Re: osd crash after reboot
  2012-12-14 19:42     ` Sage Weil
@ 2012-12-14 19:47       ` Stefan Priebe
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Priebe @ 2012-12-14 19:47 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel@vger.kernel.org

Hi Sage,

This was just an idea, and I need to fix MY UUID problem. But even then, 
the crash is still a Ceph problem. Have you looked at my log?
On 14.12.2012 20:42, Sage Weil wrote:
> On Fri, 14 Dec 2012, Stefan Priebe wrote:
>> One more IMPORTANT note. This might happen because a disk was
>> missing (disk failure) after the reboot.
>>
>> fstab and mountpoint are working with UUIDs, so they match, but the journal
>> block device:
>> osd journal  = /dev/sde1
>>
>> didn't match anymore, as the devices got renumbered due to the failed disk. Is
>> there a way to use some kind of UUIDs here too for the journal?
>
> I think others have addressed the uuid question, but one note:
>
> The ceph-osd process has an internal uuid/fingerprint on the journal and
> data dir, and will refuse to start if they don't match.

Stefan

end of thread, other threads:[~2012-12-14 19:47 UTC | newest]

Thread overview: 12+ messages
2012-12-14  8:12 osd crash after reboot Stefan Priebe
2012-12-14  8:22 ` Stefan Priebe
2012-12-14  9:14   ` Stefan Priebe
2012-12-14 14:52     ` Dennis Jacobfeuerborn
2012-12-14 15:01       ` Mark Nelson
2012-12-14 15:11         ` Stefan Priebe - Profihost AG
2012-12-14 15:20           ` Mark Nelson
2012-12-14 15:25             ` Stefan Priebe - Profihost AG
2012-12-14 16:06             ` Stefan Priebe - Profihost AG
2012-12-14 15:02       ` Stefan Priebe - Profihost AG
2012-12-14 19:42     ` Sage Weil
2012-12-14 19:47       ` Stefan Priebe
