* 0.37 crash
From: Martin Mailand @ 2011-10-20 15:39 UTC
  To: ceph-devel

Hi,
today I tried version 0.37 and ceph-osd aborted shortly after startup
(see the log below). It was an upgrade from 0.36.

Best Regards,
  Martin


2011-10-20 17:33:34.350502 7f0ada6f4760 ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896), process ceph-osd, pid 21707
2011-10-20 17:33:34.353543 7f0ada6f4760 filestore(/data/osd2) mount FIEMAP ioctl is NOT supported
2011-10-20 17:33:34.353628 7f0ada6f4760 filestore(/data/osd2) mount detected btrfs
2011-10-20 17:33:34.353656 7f0ada6f4760 filestore(/data/osd2) mount btrfs CLONE_RANGE ioctl is supported
2011-10-20 17:33:34.425059 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_CREATE is supported
2011-10-20 17:33:34.544564 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_DESTROY is supported
2011-10-20 17:33:34.544873 7f0ada6f4760 filestore(/data/osd2) mount btrfs START_SYNC got 0 Success
2011-10-20 17:33:34.544966 7f0ada6f4760 filestore(/data/osd2) mount btrfs START_SYNC is supported (transid 149)
2011-10-20 17:33:34.624965 7f0ada6f4760 filestore(/data/osd2) mount btrfs WAIT_SYNC is supported
2011-10-20 17:33:34.636719 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_CREATE_V2 got 0 Success
2011-10-20 17:33:34.636754 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_CREATE_V2 is supported
2011-10-20 17:33:34.644876 7f0ada6f4760 filestore(/data/osd2) mount found snaps <>
2011-10-20 17:33:34.644983 7f0ada6f4760 filestore(/data/osd2) mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not enabled
2011-10-20 17:33:34.678324 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:34.678737 7f0ada6f4760 journal _open /dev/sda7 fd 14: 476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:34.688215 7f0ada6f4760 journal read_entry 39366656 : seq 4653 710 bytes
2011-10-20 17:33:34.688420 7f0ada6f4760 journal read_entry 39374848 : seq 4654 33 bytes
2011-10-20 17:33:34.695110 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:34.695496 7f0ada6f4760 journal _open /dev/sda7 fd 14: 476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:34.696359 7f0ada6f4760 FileStore is up to date.
2011-10-20 17:33:34.696683 7f0ada6f4760 journal close /dev/sda7
2011-10-20 17:33:34.697970 7f0ada6f4760 filestore(/data/osd2) mount FIEMAP ioctl is NOT supported
2011-10-20 17:33:34.698013 7f0ada6f4760 filestore(/data/osd2) mount detected btrfs
2011-10-20 17:33:34.698031 7f0ada6f4760 filestore(/data/osd2) mount btrfs CLONE_RANGE ioctl is supported
2011-10-20 17:33:34.774980 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_CREATE is supported
2011-10-20 17:33:34.904538 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_DESTROY is supported
2011-10-20 17:33:34.904945 7f0ada6f4760 filestore(/data/osd2) mount btrfs START_SYNC got 0 Success
2011-10-20 17:33:34.904995 7f0ada6f4760 filestore(/data/osd2) mount btrfs START_SYNC is supported (transid 152)
2011-10-20 17:33:34.991585 7f0ada6f4760 filestore(/data/osd2) mount btrfs WAIT_SYNC is supported
2011-10-20 17:33:34.996636 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_CREATE_V2 got 0 Success
2011-10-20 17:33:34.996664 7f0ada6f4760 filestore(/data/osd2) mount btrfs SNAP_CREATE_V2 is supported
2011-10-20 17:33:35.004813 7f0ada6f4760 filestore(/data/osd2) mount found snaps <>
2011-10-20 17:33:35.004902 7f0ada6f4760 filestore(/data/osd2) mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not enabled
2011-10-20 17:33:35.023071 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:35.023353 7f0ada6f4760 journal _open /dev/sda7 fd 14: 476500201472 bytes, block size 4096 bytes, directio = 1
2011-10-20 17:33:35.029846 7f0ada6f4760 journal read_entry 39366656 : seq 4653 710 bytes
2011-10-20 17:33:35.030077 7f0ada6f4760 journal read_entry 39374848 : seq 4654 33 bytes
2011-10-20 17:33:35.036728 7f0ada6f4760 journal  kernel version is 3.1.0
2011-10-20 17:33:35.037142 7f0ada6f4760 journal _open /dev/sda7 fd 14: 476500201472 bytes, block size 4096 bytes, directio = 1
*** Caught signal (Aborted) **
  in thread 0x7f0ace7f9700
  ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)
  1: /usr/bin/ceph-osd() [0x5bd012]
  2: (()+0xfc60) [0x7f0ada2d4c60]
  3: (gsignal()+0x35) [0x7f0ad8a5ad05]
  4: (abort()+0x186) [0x7f0ad8a5eab6]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f0ad93116dd]
  6: (()+0xb9926) [0x7f0ad930f926]
  7: (()+0xb9953) [0x7f0ad930f953]
  8: (()+0xb9a5e) [0x7f0ad930fa5e]
  9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x129) [0x5a7e99]
  10: (OSDMap::decode(ceph::buffer::list&)+0x81) [0x58f9f1]
  11: (OSD::get_map(unsigned int)+0x242) [0x53f6d2]
  12: (OSD::handle_osd_map(MOSDMap*)+0x1f82) [0x56ae72]
  13: (OSD::_dispatch(Message*)+0x36b) [0x56d11b]
  14: (OSD::ms_dispatch(Message*)+0xf6) [0x56e1c6]
  15: (SimpleMessenger::dispatch_entry()+0x88b) [0x5fff2b]
  16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4bd55c]
  17: (()+0x6d8c) [0x7f0ada2cbd8c]
  18: (clone()+0x6d) [0x7f0ad8b0d04d]
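
The named frames in a backtrace like the one above give the call chain that
led to the abort. The following is a minimal Python sketch for extracting
them. It assumes each frame sits on a single line in the form
"N: (symbol+0xoff) [0xaddr]", so frames wrapped by a mail client must be
rejoined first. The helper name named_frames is hypothetical and is not part
of Ceph.

```python
import re

# Matches frames such as "  10: (OSDMap::decode(ceph::buffer::list&)+0x81) [0x58f9f1]".
# Frames without a resolved symbol, e.g. "(()+0xfc60)", match with the name "()".
FRAME_RE = re.compile(r"^\s*(\d+):\s+\((.*)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]\s*$")


def named_frames(log_text):
    """Return (frame_number, symbol) pairs for frames that carry a symbol name."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # not a frame line, or a frame in another format
        number, symbol = int(match.group(1)), match.group(2)
        if symbol != "()":  # skip frames whose symbol was not resolved
            frames.append((number, symbol))
    return frames


# A few frames copied from the report above.
sample = """\
  3: (gsignal()+0x35) [0x7f0ad8a5ad05]
  6: (()+0xb9926) [0x7f0ad930f926]
  10: (OSDMap::decode(ceph::buffer::list&)+0x81) [0x58f9f1]
  11: (OSD::get_map(unsigned int)+0x242) [0x53f6d2]
"""

for number, symbol in named_frames(sample):
    print(number, symbol)
```

Read this way, the trace shows OSD::get_map calling OSDMap::decode, which
calls buffer::list::iterator::copy. __gnu_cxx::__verbose_terminate_handler is
what libstdc++ runs when an exception goes uncaught, so the trace is
consistent with an exception thrown while decoding a stored OSD map. That is
an inference from the frames, not something confirmed in this thread.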


* Re: 0.37 crash
From: Stefan Kleijkers @ 2011-10-20 18:25 UTC
  To: ceph-devel

Hello,

I got the exact same problem. I upgraded from 0.36 to 0.37 and one of the
two OSDs wouldn't start. In the log of that OSD I found the same error as
in Martin's log. The ceph-osd process had status D in ps (uninterruptible
sleep), and top showed a high I/O wait. I also noticed a lot of disk I/O
on the disks.

Stefan
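
The "D" status reported by ps can also be read from /proc/<pid>/stat on
Linux, where the state letter follows the command name in parentheses. The
following is a small Python sketch of that check. The function names and the
sample line (including its PID and trailing fields) are made up for
illustration.

```python
def parse_proc_stat_state(stat_line):
    """Return (pid, comm, state) from one line in /proc/<pid>/stat format."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def is_uninterruptible(stat_line):
    """True if the process is in uninterruptible sleep (state 'D')."""
    return parse_proc_stat_state(stat_line)[2] == "D"


# Illustrative line only; the numbers are not taken from the thread.
example = "1234 (ceph-osd) D 1 1234 1234 0 -1 4202752"
print(parse_proc_stat_state(example), is_uninterruptible(example))
```

On a live system the input would come from reading /proc/<pid>/stat for the
ceph-osd PID. A process that stays in D for a long time is usually blocked
inside the kernel, which fits the high I/O wait described above.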



* Re: 0.37 crash
From: Martin Mailand @ 2011-10-20 18:49 UTC
  To: Stefan Kleijkers; +Cc: ceph-devel

Hi Stefan,
in my case the OSD process was simply terminated, with no I/O wait. Could
you check your dmesg for any btrfs entries? The I/O wait sounds like a
btrfs problem.

Best Regards,
  martin
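
Checking dmesg for btrfs entries, as requested above, amounts to filtering
the kernel log for lines that mention btrfs. The following is a minimal
Python sketch working on captured dmesg text. The function name btrfs_lines
is hypothetical, and the two sample lines are copied from the dmesg paste
later in this thread.

```python
def btrfs_lines(dmesg_text):
    """Return the dmesg lines that mention btrfs, ignoring case."""
    return [line for line in dmesg_text.splitlines() if "btrfs" in line.lower()]


sample_dmesg = """\
[24597.271794] WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xa8/0xc0 [btrfs]()
[24597.271796] Hardware name: X8ST3
"""

for line in btrfs_lines(sample_dmesg):
    print(line)
```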



* Re: 0.37 crash
From: Stefan Kleijkers @ 2011-10-20 20:12 UTC
  To: martin; +Cc: ceph-devel

Hello Martin,

I recreated the filesystem with mkcephfs and the OSD has been rebooted, so
I have no dmesg output left from that time. At the moment I'm running an
rsync workload and I just noticed I have the same problem (see the paste
below). In dmesg I found a couple of btrfs warnings (see below). I found a
patch for them, but I haven't applied it yet
(http://marc.info/?l=linux-btrfs&m=131547325515336&w=2).
I don't know whether the warnings from dmesg have anything to do with the
error in the OSD log, but I don't find any other warnings or errors in
dmesg.

Furthermore, before the error in the OSD log I got a lot of these messages:
2011-10-20 20:58:20.739916 7f5b9e71e700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f5b90e02700' had timed out after 30
2011-10-20 20:58:20.739944 7f5b9e71e700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7f5b91603700' had timed out after 30
2011-10-20 20:58:20.739951 7f5b9e71e700 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f5b96f0f700' had timed out after 60
2011-10-20 20:58:20.739956 7f5b9e71e700 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f5b97710700' had timed out after 60
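
With many such messages it helps to count them per thread pool and timeout.
The following is a short Python sketch, assuming the message format shown
above. It matches across line breaks, so log lines wrapped by a mail client
are handled too. The function name count_heartbeat_timeouts is hypothetical.

```python
import re
from collections import Counter

# Captures the thread pool name and the timeout in seconds. \s+ after
# "is_healthy" lets the pattern span a line break inserted by mail wrapping.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy\s+'(\S+) thread 0x[0-9a-f]+' had timed out after (\d+)"
)


def count_heartbeat_timeouts(log_text):
    """Count 'had timed out' messages keyed by (thread pool, timeout seconds)."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts


sample_log = """\
2011-10-20 20:58:20.739916 7f5b9e71e700 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f5b90e02700' had timed out after 30
2011-10-20 20:58:20.739944 7f5b9e71e700 heartbeat_map is_healthy
'OSD::op_tp thread 0x7f5b91603700' had timed out after 30
2011-10-20 20:58:20.739951 7f5b9e71e700 heartbeat_map is_healthy
'FileStore::op_tp thread 0x7f5b96f0f700' had timed out after 60
2011-10-20 20:58:20.739956 7f5b9e71e700 heartbeat_map is_healthy
'FileStore::op_tp thread 0x7f5b97710700' had timed out after 60
"""

for (pool, seconds), count in count_heartbeat_timeouts(sample_log).items():
    print(pool, seconds, count)
```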

At the moment the OSD daemon is in state Dsl (according to ps), but I don't
see any I/O.

I'm going to reboot the OSD and see if it comes back up.

Kind regards,
Stefan

Paste from dmesg:

[24597.271777] ------------[ cut here ]------------
[24597.271794] WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xa8/0xc0 [btrfs]()
[24597.271796] Hardware name: X8ST3
[24597.271797] Modules linked in: btrfs zlib_deflate lzo_compress raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod target_core_mod configfs ahci libahci e1000e mptsas i7core_edac mptscsih mptbase scsi_transport_sas bnx2 i5000_edac edac_core ipmi_devintf ipmi_msghandler
[24597.271817] Pid: 6921, comm: ceph-osd Tainted: G        W   3.1.0-rc10-un13.1-64-nohz #1
[24597.271819] Call Trace:
[24597.271827]  [<ffffffff81036dca>] warn_slowpath_common+0x7a/0xb0
[24597.271830]  [<ffffffff81036e15>] warn_slowpath_null+0x15/0x20
[24597.271840]  [<ffffffffa015f438>] btrfs_orphan_commit_root+0xa8/0xc0 [btrfs]
[24597.271849]  [<ffffffffa0153c54>] commit_fs_roots+0xc4/0x1b0 [btrfs]
[24597.271856]  [<ffffffffa013a475>] ? btrfs_free_path+0x25/0x30 [btrfs]
[24597.271865]  [<ffffffffa0154c1e>] btrfs_commit_transaction+0x3be/0x7e0 [btrfs]
[24597.271874]  [<ffffffffa0153fb3>] ? wait_current_trans+0x23/0x110 [btrfs]
[24597.271878]  [<ffffffff810e41a6>] ? iput+0x46/0x210
[24597.271887]  [<ffffffffa0155120>] ? join_transaction+0x20/0x250 [btrfs]
[24597.271891]  [<ffffffff81052840>] ? wake_up_bit+0x40/0x40
[24597.271897]  [<ffffffffa01334a7>] btrfs_sync_fs+0x47/0x70 [btrfs]
[24597.271900]  [<ffffffff8102d7cb>] ? pick_next_task_fair+0x10b/0x190
[24597.271909]  [<ffffffffa01806b4>] btrfs_ioctl+0x4f4/0xd60 [btrfs]
[24597.271915]  [<ffffffff810ffc95>] ? fsnotify+0x1e5/0x310
[24597.271919]  [<ffffffff810dc77b>] do_vfs_ioctl+0x9b/0x4f0
[24597.271923]  [<ffffffff810dcc1a>] sys_ioctl+0x4a/0x80
[24597.271928]  [<ffffffff813fcc7b>] system_call_fastpath+0x16/0x1b
[24597.271931] ---[ end trace 3c3922c523490bbc ]---
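
In a kernel call trace like this, frames prefixed with "?" are addresses
found on the stack that the unwinder could not confirm as part of the call
chain. The following Python sketch separates the two kinds, assuming the
"[<address>] symbol+0xoff/0xsize" frame format shown above. The function name
split_call_trace is hypothetical.

```python
import re

# One frame: "[<address>]", an optional "?" marker, then symbol+0xoff/0xsize.
# \s+ also matches a line break, so frames wrapped by a mail client still parse.
KFRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def split_call_trace(dmesg_text):
    """Return (reliable, uncertain) lists of symbols from a kernel call trace."""
    reliable, uncertain = [], []
    for match in KFRAME_RE.finditer(dmesg_text):
        target = uncertain if match.group(1) else reliable
        target.append(match.group(2))
    return reliable, uncertain


# Four frames copied from the paste above, one of them wrapped.
sample_trace = """\
[24597.271840]  [<ffffffffa015f438>] btrfs_orphan_commit_root+0xa8/0xc0 [btrfs]
[24597.271849]  [<ffffffffa0153c54>] commit_fs_roots+0xc4/0x1b0 [btrfs]
[24597.271856]  [<ffffffffa013a475>] ? btrfs_free_path+0x25/0x30 [btrfs]
[24597.271865]  [<ffffffffa0154c1e>]
btrfs_commit_transaction+0x3be/0x7e0 [btrfs]
"""

reliable, uncertain = split_call_trace(sample_trace)
print("reliable:", reliable)
print("uncertain:", uncertain)
```

Applied to the full paste, the reliable frames run from sys_ioctl through
btrfs_ioctl, btrfs_sync_fs and btrfs_commit_transaction down to
btrfs_orphan_commit_root, where the warning was raised.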

Paste from osd log:

common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)', in thread '0x7f5b9e71e700'
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
  ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x214) [0x65f804]
  2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x65fb2f]
  3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x65fd60]
  4: (CephContextServiceThread::entry()+0x5f) [0x5a84bf]
  5: (()+0x69ca) [0x7f5ba019d9ca]
  6: (clone()+0x6d) [0x7f5b9ea1e70d]
  ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x214) [0x65f804]
  2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x65fb2f]
  3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x65fd60]
  4: (CephContextServiceThread::entry()+0x5f) [0x5a84bf]
  5: (()+0x69ca) [0x7f5ba019d9ca]
  6: (clone()+0x6d) [0x7f5b9ea1e70d]
*** Caught signal (Aborted) **
  in thread 0x7f5b9e71e700
  ceph version 0.37 (commit:a6f3bbb744a6faea95ae48317f0b838edb16a896)
  1: /usr/bin/ceph-osd() [0x660814]
  2: (()+0xf8f0) [0x7f5ba01a68f0]
  3: (gsignal()+0x35) [0x7f5b9e96ba75]
  4: (abort()+0x180) [0x7f5b9e96f5c0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5b9f2218e5]
  6: (()+0xcad16) [0x7f5b9f21fd16]
  7: (()+0xcad43) [0x7f5b9f21fd43]
  8: (()+0xcae3e) [0x7f5b9f21fe3e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5e779f]
  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x214) [0x65f804]
  11: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x65fb2f]
  12: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x65fd60]
  13: (CephContextServiceThread::entry()+0x5f) [0x5a84bf]
  14: (()+0x69ca) [0x7f5ba019d9ca]
  15: (clone()+0x6d) [0x7f5b9ea1e70d]
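
The logs in this message suggest two thresholds per worker thread: a grace
period after which is_healthy reports "had timed out after N", and a longer
suicide timeout after which the daemon asserts and aborts. The following is a
toy Python model of that behaviour, not Ceph's implementation. The class and
method names are hypothetical, and the suicide values (300 and 600) are
placeholders, since only the 30 and 60 second grace values appear in the log.

```python
class HeartbeatModel:
    """Toy model: each worker has a grace deadline and a suicide deadline."""

    def __init__(self):
        self._workers = {}  # name -> (grace deadline, suicide deadline)

    def touch(self, name, now, grace, suicide_grace):
        """Record that a worker made progress at time `now` (seconds)."""
        self._workers[name] = (now + grace, now + suicide_grace)

    def check(self, now):
        """Return the workers past their grace deadline.

        Raises RuntimeError if any worker is past its suicide deadline,
        standing in for the failed assert in the log above.
        """
        late = []
        for name, (deadline, suicide_deadline) in self._workers.items():
            if now > suicide_deadline:
                raise RuntimeError(f"{name}: hit suicide timeout")
            if now > deadline:
                late.append(name)
        return late


hb = HeartbeatModel()
hb.touch("OSD::op_tp", now=0, grace=30, suicide_grace=300)        # 300 is assumed
hb.touch("FileStore::op_tp", now=0, grace=60, suicide_grace=600)  # 600 is assumed

print(hb.check(10))  # nothing is late yet
print(hb.check(45))  # only the worker with the 30 second grace is late
```

In this model a worker that never calls touch() again first shows up in the
late list and later triggers the exception, which mirrors the order seen
here: repeated "had timed out" messages followed by "hit suicide timeout".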


