* OSD Crashed when running "rbd list"
From: Chen, Xiaoxi @ 2013-01-08 15:51 UTC
To: ceph-devel
Hi List,
Every time I run "rbd list" after creating a lot of rbd volumes (more than 100), certain OSDs die: osd.65 dies first (that's the fifth disk on the sixth host), and then osd.35 dies.
Is this a bug in 0.55? My Ceph version is 0.55-1 with a 3.7 kernel.
I would like to upgrade to 0.56-1, but there is no package for the 3.7 kernel (raring).
The log of osd.35 is attached. Key messages are below:
1 -- 192.101.11.203:6843/19960 mark_down 192.101.11.206:6861/3735 -- 0x7f331867a000
-38> 2013-01-08 23:37:37.751473 7f3302fc0700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7f3302fc0700 time 2013-01-08 23:37:37.748254
./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)
ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
1: (()+0x22f765) [0x7f3310831765]
2: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
3: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
4: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
5: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
6: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
7: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
9: (()+0x7f9f) [0x7f330ffc5f9f]
10: (clone()+0x6d) [0x7f330e2800cd]
Thanks for the help.
Xiaoxi
[-- Attachment #2: dump_log --]
[-- Type: application/octet-stream, Size: 22036 bytes --]
-66> 2013-01-08 23:37:37.435635 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.205:0/13118 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.420064) v2 -- ?+0 0x7f3313a6ddc0 con 0x7f3317675340
-65> 2013-01-08 23:37:37.476956 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.52 192.101.11.205:0/2635 405 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.461487) v2 ==== 47+0+0 (2186900973 0 0) 0x7f3319a516c0 con 0x7f3316abf600
-64> 2013-01-08 23:37:37.477012 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.205:0/2635 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.461487) v2 -- ?+0 0x7f3319c79dc0 con 0x7f3316abf600
-63> 2013-01-08 23:37:37.486347 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.219 192.101.11.202:0/3053 414 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.480620) v2 ==== 47+0+0 (2770502130 0 0) 0x7f3319cbfc00 con 0x7f33147c1b80
-62> 2013-01-08 23:37:37.486404 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/3053 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.480620) v2 -- ?+0 0x7f3319a516c0 con 0x7f33147c1b80
-61> 2013-01-08 23:37:37.517501 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.417 192.101.11.204:0/26832 411 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.524010) v2 ==== 47+0+0 (728300027 0 0) 0x7f3313b4e1c0 con 0x7f33145ede40
-60> 2013-01-08 23:37:37.517570 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.204:0/26832 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.524010) v2 -- ?+0 0x7f3319cbfc00 con 0x7f33145ede40
-59> 2013-01-08 23:37:37.556975 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.112 192.101.11.201:0/22007 416 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.593957) v2 ==== 47+0+0 (2072717698 0 0) 0x7f3319d808c0 con 0x7f3313d57b80
-58> 2013-01-08 23:37:37.557031 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/22007 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.593957) v2 -- ?+0 0x7f3313b4e1c0 con 0x7f3313d57b80
-57> 2013-01-08 23:37:37.599649 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.118 192.101.11.201:0/22867 424 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.636554) v2 ==== 47+0+0 (532548856 0 0) 0x7f33182c5500 con 0x7f33133eab00
-56> 2013-01-08 23:37:37.599705 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/22867 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.636554) v2 -- ?+0 0x7f3319d808c0 con 0x7f33133eab00
-55> 2013-01-08 23:37:37.631457 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.415 192.101.11.204:0/24273 412 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.637870) v2 ==== 47+0+0 (1829221514 0 0) 0x7f331860f500 con 0x7f33147c14a0
-54> 2013-01-08 23:37:37.631515 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.204:0/24273 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.637870) v2 -- ?+0 0x7f33182c5500 con 0x7f33147c14a0
-53> 2013-01-08 23:37:37.697597 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.11 192.101.11.201:0/21633 424 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.734469) v2 ==== 47+0+0 (2808452608 0 0) 0x7f33134df500 con 0x7f33133dcf20
-52> 2013-01-08 23:37:37.697643 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/21633 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.734469) v2 -- ?+0 0x7f331860f500 con 0x7f33133dcf20
-51> 2013-01-08 23:37:37.735150 7f32df801700 1 -- 192.101.11.203:6842/19960 >> :/0 pipe(0x7f3317d70900 sd=32 :6842 pgs=0 cs=0 l=0).accept sd=32
-50> 2013-01-08 23:37:37.735765 7f3302fc0700 1 -- 192.101.11.203:6842/19960 <== client.5106 192.101.11.201:0/1008516 1 ==== osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4 ==== 152+0+23 (3309219154 0 1114467127) 0x7f3317d70000 con 0x7f331a76b4a0
-49> 2013-01-08 23:37:37.735825 7f3302fc0700 5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735657, event: header_read, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
-48> 2013-01-08 23:37:37.735849 7f3302fc0700 5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735663, event: throttled, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
-47> 2013-01-08 23:37:37.735862 7f3302fc0700 5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735696, event: all_read, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
-46> 2013-01-08 23:37:37.735888 7f3302fc0700 5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735815, event: dispatched, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
-45> 2013-01-08 23:37:37.735902 7f3302fc0700 5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735901, event: waiting_for_osdmap, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
-44> 2013-01-08 23:37:37.735931 7f3302fc0700 10 monclient: renew_subs
-43> 2013-01-08 23:37:37.735937 7f3302fc0700 10 monclient: _send_mon_message to mon.ceph1 at 192.101.11.201:6789/0
-42> 2013-01-08 23:37:37.735943 7f3302fc0700 1 -- 192.101.11.203:6842/19960 --> 192.101.11.201:6789/0 -- mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=177}) v2 -- ?+0 0x7f33187e7340 con 0x7f33176758c0
-41> 2013-01-08 23:37:37.736601 7f3302fc0700 1 -- 192.101.11.203:6842/19960 <== mon.0 192.101.11.201:6789/0 99 ==== osd_map(177..177 src has 1..177) v3 ==== 169+0+0 (2301414109 0 0) 0x7f33162fa600 con 0x7f33176758c0
-40> 2013-01-08 23:37:37.736645 7f3302fc0700 3 osd.34 176 handle_osd_map epochs [177,177], i have 176, src has [1,177]
-39> 2013-01-08 23:37:37.741739 7f3302fc0700 1 -- 192.101.11.203:6843/19960 mark_down 192.101.11.206:6861/3735 -- 0x7f331867a000
-38> 2013-01-08 23:37:37.751473 7f3302fc0700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7f3302fc0700 time 2013-01-08 23:37:37.748254
./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)
ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
1: (()+0x22f765) [0x7f3310831765]
2: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
3: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
4: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
5: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
6: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
7: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
9: (()+0x7f9f) [0x7f330ffc5f9f]
10: (clone()+0x6d) [0x7f330e2800cd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-37> 2013-01-08 23:37:37.753392 7f32fe7b7700 1 osd.34 pg_epoch: 177 pg[0.c1e( empty local-les=175 n=0 ec=1 les/c 175/175 177/177/177) [34] r=0 lpr=177 pi=171-176/2 mlcod 0'0 inactive] state<Start>: transitioning to Primary
-36> 2013-01-08 23:37:37.753592 7f32fe7b7700 1 osd.34 pg_epoch: 177 pg[2.c1c( v 164'572 (0'0,164'572] local-les=175 n=1 ec=1 les/c 175/175 177/177/177) [34] r=0 lpr=177 pi=171-176/2 lcod 164'571 mlcod 0'0 inactive] state<Start>: transitioning to Primary
-35> 2013-01-08 23:37:37.753790 7f32fe7b7700 1 osd.34 pg_epoch: 177 pg[1.c1d( empty local-les=175 n=0 ec=1 les/c 175/175 177/177/177) [34] r=0 lpr=177 pi=171-176/2 mlcod 0'0 inactive] state<Start>: transitioning to Primary
-34> 2013-01-08 23:37:37.753950 7f32fe7b7700 10 monclient: _send_mon_message to mon.ceph1 at 192.101.11.201:6789/0
-33> 2013-01-08 23:37:37.753969 7f32fe7b7700 1 -- 192.101.11.203:6842/19960 --> 192.101.11.201:6789/0 -- osd_alive(want up_thru 177 have 177) v1 -- ?+0 0x7f3318791880 con 0x7f33176758c0
-32> 2013-01-08 23:37:37.831439 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.510 192.101.11.205:0/22503 410 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.815860) v2 ==== 47+0+0 (2347573173 0 0) 0x7f3313b2f880 con 0x7f3315c21080
-31> 2013-01-08 23:37:37.831493 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.205:0/22503 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.815860) v2 -- ?+0 0x7f33134df500 con 0x7f3315c21080
-30> 2013-01-08 23:37:37.831530 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.510 192.101.11.205:6804/22503 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317183400
-29> 2013-01-08 23:37:37.872961 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.12 192.101.11.201:0/23153 411 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.909863) v2 ==== 47+0+0 (869822583 0 0) 0x7f331a253500 con 0x7f3315c21ce0
-28> 2013-01-08 23:37:37.873041 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/23153 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.909863) v2 -- ?+0 0x7f3313b2f880 con 0x7f3315c21ce0
-27> 2013-01-08 23:37:37.873066 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.12 192.101.11.201:6834/23153 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317147200
-26> 2013-01-08 23:37:37.874917 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.214 192.101.11.202:0/31653 424 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.869254) v2 ==== 47+0+0 (2504273477 0 0) 0x7f3319b3ddc0 con 0x7f33133ea840
-25> 2013-01-08 23:37:37.874966 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/31653 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.869254) v2 -- ?+0 0x7f331a253500 con 0x7f33133ea840
-24> 2013-01-08 23:37:37.874994 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.214 192.101.11.202:6816/31653 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3313eb9000
-23> 2013-01-08 23:37:37.987972 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.219 192.101.11.202:0/3053 415 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.982219) v2 ==== 47+0+0 (808581091 0 0) 0x7f3319cbf340 con 0x7f33147c1b80
-22> 2013-01-08 23:37:37.988034 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/3053 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.982219) v2 -- ?+0 0x7f3319b3ddc0 con 0x7f33147c1b80
-21> 2013-01-08 23:37:37.988113 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.219 192.101.11.202:6831/3053 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3316276800
-20> 2013-01-08 23:37:38.016782 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.413 192.101.11.204:0/22042 416 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.023156) v2 ==== 47+0+0 (2277312799 0 0) 0x7f331437e540 con 0x7f33147c11e0
-19> 2013-01-08 23:37:38.016875 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.204:0/22042 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.023156) v2 -- ?+0 0x7f3319cbf340 con 0x7f33147c11e0
-18> 2013-01-08 23:37:38.016908 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.413 192.101.11.204:6813/22042 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f331565cc00
-17> 2013-01-08 23:37:38.040065 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.17 192.101.11.201:0/24014 413 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.077028) v2 ==== 47+0+0 (2165895644 0 0) 0x7f33141ada40 con 0x7f33148ccdc0
-16> 2013-01-08 23:37:38.040136 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/24014 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.077028) v2 -- ?+0 0x7f331437e540 con 0x7f33148ccdc0
-15> 2013-01-08 23:37:38.040211 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.17 192.101.11.201:6852/24014 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f33150fe000
-14> 2013-01-08 23:37:38.091430 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.69 192.101.11.206:0/22754 371 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.117701) v2 ==== 47+0+0 (3119010757 0 0) 0x7f331a819dc0 con 0x7f331897af20
-13> 2013-01-08 23:37:38.091488 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.206:0/22754 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.117701) v2 -- ?+0 0x7f33141ada40 con 0x7f331897af20
-12> 2013-01-08 23:37:38.091551 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.69 192.101.11.206:6858/22754 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f33150fe200
-11> 2013-01-08 23:37:38.142047 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.28 192.101.11.202:0/10345 426 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.136391) v2 ==== 47+0+0 (2478248440 0 0) 0x7f331a4fd880 con 0x7f3313d57e40
-10> 2013-01-08 23:37:38.142091 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/10345 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.136391) v2 -- ?+0 0x7f331a819dc0 con 0x7f3313d57e40
-9> 2013-01-08 23:37:38.142117 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.28 192.101.11.202:6855/10345 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659c00
-8> 2013-01-08 23:37:38.306536 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.27 192.101.11.202:0/9453 421 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.300821) v2 ==== 47+0+0 (2954499570 0 0) 0x7f331a6f6540 con 0x7f3313b11a20
-7> 2013-01-08 23:37:38.306629 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/9453 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.300821) v2 -- ?+0 0x7f331a4fd880 con 0x7f3313b11a20
-6> 2013-01-08 23:37:38.306683 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.27 192.101.11.202:6852/9453 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659a00
-5> 2013-01-08 23:37:38.369817 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.211 192.101.11.202:0/29353 420 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.364104) v2 ==== 47+0+0 (1839497095 0 0) 0x7f3319f4cc40 con 0x7f3313737340
-4> 2013-01-08 23:37:38.369897 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/29353 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.364104) v2 -- ?+0 0x7f331a6f6540 con 0x7f3313737340
-3> 2013-01-08 23:37:38.369937 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.211 192.101.11.202:6807/29353 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659800
-2> 2013-01-08 23:37:38.437758 7f33007bb700 1 -- 192.101.11.203:6844/19960 <== osd.29 192.101.11.202:0/11437 407 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.432069) v2 ==== 47+0+0 (3568911613 0 0) 0x7f331a579500 con 0x7f3315060dc0
-1> 2013-01-08 23:37:38.437835 7f33007bb700 1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/11437 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.432069) v2 -- ?+0 0x7f3319f4cc40 con 0x7f3315060dc0
0> 2013-01-08 23:37:38.437881 7f33007bb700 1 -- 192.101.11.203:6843/19960 --> osd.29 192.101.11.202:6858/11437 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659600
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/osd.34.log
--- end dump of recent events ---
2013-01-08 23:37:38.464910 7f3302fc0700 -1 *** Caught signal (Aborted) **
in thread 7f3302fc0700
ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
1: (()+0x433dd0) [0x7f3310a35dd0]
2: (()+0xfbb0) [0x7f330ffcdbb0]
3: (gsignal()+0x35) [0x7f330e1bfe35]
4: (abort()+0x148) [0x7f330e1c3498]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f330eac2e3d]
6: (()+0x5ef36) [0x7f330eac0f36]
7: (()+0x5ef63) [0x7f330eac0f63]
8: (()+0x5f18e) [0x7f330eac118e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x43d) [0x7f3310adc4ed]
10: (()+0x22f765) [0x7f3310831765]
11: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
12: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
13: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
14: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
15: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
16: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
18: (()+0x7f9f) [0x7f330ffc5f9f]
19: (clone()+0x6d) [0x7f330e2800cd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-01-08 23:38:49.252564 7fa3a642e7c0 0 ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b), process ceph-osd, pid 13310
2013-01-08 23:38:49.577405 7fa3a642e7c0 0 filestore(/data/osd.34) mount FIEMAP ioctl is supported and appears to work
2013-01-08 23:38:49.577432 7fa3a642e7c0 0 filestore(/data/osd.34) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-08 23:38:49.577934 7fa3a642e7c0 0 filestore(/data/osd.34) mount did NOT detect btrfs
2013-01-08 23:38:49.581048 7fa3a642e7c0 0 filestore(/data/osd.34) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-01-08 23:38:49.583209 7fa3a642e7c0 0 filestore(/data/osd.34) mount syscall(SYS_syncfs, fd) fully supported
2013-01-08 23:38:49.585388 7fa3a642e7c0 0 filestore(/data/osd.34) mount syscall(__NR_syncfs, fd) fully supported
2013-01-08 23:38:49.585514 7fa3a642e7c0 0 filestore(/data/osd.34) mount found snaps <>
2013-01-08 23:38:49.594324 7fa3a642e7c0 0 filestore(/data/osd.34) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-08 23:38:49.606596 7fa3a642e7c0 0 journal kernel version is 3.7.0
2013-01-08 23:38:49.606902 7fa3a642e7c0 1 journal _open /dev/sda8 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:38:49.638795 7fa3a642e7c0 0 journal kernel version is 3.7.0
2013-01-08 23:38:49.639061 7fa3a642e7c0 1 journal _open /dev/sda8 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:38:49.679483 7fa3a642e7c0 1 journal close /dev/sda8
2013-01-08 23:38:49.699079 7fa3a642e7c0 0 filestore(/data/osd.34) mount FIEMAP ioctl is supported and appears to work
2013-01-08 23:38:49.699088 7fa3a642e7c0 0 filestore(/data/osd.34) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-08 23:38:49.699316 7fa3a642e7c0 0 filestore(/data/osd.34) mount did NOT detect btrfs
2013-01-08 23:38:49.703818 7fa3a642e7c0 0 filestore(/data/osd.34) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-01-08 23:38:49.705633 7fa3a642e7c0 0 filestore(/data/osd.34) mount syscall(SYS_syncfs, fd) fully supported
2013-01-08 23:38:49.707455 7fa3a642e7c0 0 filestore(/data/osd.34) mount syscall(__NR_syncfs, fd) fully supported
2013-01-08 23:38:49.707553 7fa3a642e7c0 0 filestore(/data/osd.34) mount found snaps <>
2013-01-08 23:38:49.711430 7fa3a642e7c0 0 filestore(/data/osd.34) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-08 23:38:49.719920 7fa3a642e7c0 0 journal kernel version is 3.7.0
2013-01-08 23:38:49.720234 7fa3a642e7c0 1 journal _open /dev/sda8 fd 26: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:38:49.741396 7fa3a642e7c0 0 journal kernel version is 3.7.0
2013-01-08 23:38:49.741709 7fa3a642e7c0 1 journal _open /dev/sda8 fd 26: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:39:40.791212 7fa392df6700 0 log [INF] : 0.30 scrub ok
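In a dump this long, the two lines that matter most are the failed assertion and the version banner that follows it. The following is a minimal sketch of pulling them out mechanically, assuming log lines shaped like the ones in this attachment. The function name `find_failed_asserts` and both regular expressions are illustrative and are not part of Ceph.

```python
import re

# Matches lines such as "./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)"
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$"
)
# Matches lines such as "ceph version 0.55.1 (8e25c8d9...)"
VERSION_RE = re.compile(r"^ceph version (?P<version>\S+) \((?P<sha>[0-9a-f]+)\)")


def find_failed_asserts(log_text):
    """Return one dict per 'FAILED assert' line found in log_text.

    The version is taken from the first 'ceph version' line within the
    three lines after the assertion, or None if there is none.
    """
    lines = [raw.strip() for raw in log_text.splitlines()]
    results = []
    for i, line in enumerate(lines):
        match = ASSERT_RE.match(line)
        if match is None:
            continue
        version = None
        for follow in lines[i + 1:i + 4]:
            found = VERSION_RE.match(follow)
            if found is not None:
                version = found.group("version")
                break
        results.append({
            "file": match.group("file"),
            "line": int(match.group("line")),
            "expr": match.group("expr"),
            "version": version,
        })
    return results


if __name__ == "__main__":
    sample = (
        "./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)\n"
        " ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)\n"
        " 1: (()+0x22f765) [0x7f3310831765]\n"
    )
    for hit in find_failed_asserts(sample):
        print(hit)
```

Quoted copies of the log (lines starting with "> ") are deliberately not matched, so a thread that quotes the same backtrace is not counted twice.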
* Re: OSD Crashed when running "rbd list"
From: James Page @ 2013-01-08 15:56 UTC
To: Chen, Xiaoxi; +Cc: ceph-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
On 08/01/13 15:51, Chen, Xiaoxi wrote:
> I would like to upgrade to 0.56-1 but there is no package for 3.7
> kernel(raring)
I uploaded 0.56.1 to Ubuntu Raring this morning - it's published and
should ripple through archive mirrors in the next few hours.
- --
James Page
Ubuntu Core Developer
Debian Maintainer
james.page@ubuntu.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/
iQIcBAEBCAAGBQJQ7EGwAAoJEL/srsug59jDXpYQANFT1TmBSZlWhARzrRCgPwKG
p80UImsdQwEh8HZ2/hlgWfNM/ybs72yJ5eGYVm6tNSe33pUCLM4tXn/m75vzhhEQ
kvn89qKWVcWKoCcEE2m4ZcDykfmE5Ti0oHhoVRcLkaz3REWRaizBQH4iVV9DY86F
5xhUjEKoHwmpEBcYs2yzbPvEz18dOSmZfBpLMWUJBhofhXQaXUjOKj/0qH0rbfdg
Ntt6ijrP7IFIAYZQ4xbQAs8N7fO/nHna5no/v3KpVY45rHM7/mYjtaWKOenz+CeL
kYlWgQHgMhqP7PSj6FEOEw3ggGJcF4eVL+e/ApFYjJkFIhy+ro46guaSUrvtZGTW
UAPgxeRIRp6d390wlNQQmM/SyzaYRU2vxcCRYm0La2Q5+TrIRit3/4npRvR3az6B
7W/EFYsOJxp5UugYeiMHd3aZRF+ps/4Y3Ay+Uxp2tQO+Ng/PGhDkFz0bN0X5hR8c
Ioha7fWTeibCnPIIDs1GsA4BUDrCqBQoRyiqREYHRxfFISK67RCo6VLVhmjha8Dc
L/8QUCdionsRNjUhzlGlwZxWtPmgFTCDxIXSNXW1QScEaCmceh+P7MXDlXjFlzKG
EYNmTZ5egeNh/OQLWpgEFuqU3tG79yLT2znhXMTl3SMHcNX23LcknsFTZricnxSz
DnKH7gkBhukHrMG/H13V
=8DLv
-----END PGP SIGNATURE-----
* Re: OSD Crashed when running "rbd list"
From: Gregory Farnum @ 2013-01-08 17:12 UTC
To: Chen, Xiaoxi; +Cc: ceph-devel
On Tue, Jan 8, 2013 at 7:51 AM, Chen, Xiaoxi <xiaoxi.chen@intel.com> wrote:
> Hi List,
> Every time I run "rbd list" after creating a lot of rbd volumes (more than 100), certain OSDs die: osd.65 dies first (that's the fifth disk on the sixth host), and then osd.35 dies.
> Is this a bug in 0.55? My Ceph version is 0.55-1 with a 3.7 kernel.
> I would like to upgrade to 0.56-1, but there is no package for the 3.7 kernel (raring).
>
> The log of osd.35 is attached. Key messages are below:
>
> 1 -- 192.101.11.203:6843/19960 mark_down 192.101.11.206:6861/3735 -- 0x7f331867a000
> -38> 2013-01-08 23:37:37.751473 7f3302fc0700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7f3302fc0700 time 2013-01-08 23:37:37.748254
> ./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)
>
> ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
> 1: (()+0x22f765) [0x7f3310831765]
> 2: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
> 3: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
> 4: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
> 5: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
> 6: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
> 7: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
> 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
> 9: (()+0x7f9f) [0x7f330ffc5f9f]
> 10: (clone()+0x6d) [0x7f330e2800cd]
>
> Thanks for the help.
Sounds like you've got a v0.56 binary talking to v0.55 daemons. An
upgrade to v0.56.1 should fix it. See
http://tracker.newdream.net/issues/3715
-Greg
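The diagnosis above comes down to comparing the release of the client binary with the release of the daemons. The following is a rough sketch of that comparison, assuming you have collected the "ceph version ..." banner from each side (the daemon banner appears in the OSD log; the client banner used here is a made-up placeholder, since the thread does not show it). The helper names `parse_version` and `mismatched` are hypothetical and are not part of any Ceph tool.

```python
import re

# Captures the dotted number in banners like "ceph version 0.55.1 (8e25c8d9...)"
VERSION_RE = re.compile(r"ceph version (\d+(?:\.\d+)*)")


def parse_version(banner):
    """Turn a 'ceph version X.Y.Z (...)' banner into a tuple of ints."""
    match = VERSION_RE.search(banner)
    if match is None:
        raise ValueError("not a ceph version banner: %r" % banner)
    return tuple(int(part) for part in match.group(1).split("."))


def mismatched(client_banner, daemon_banners):
    """Return the daemons whose major.minor release differs from the client's.

    daemon_banners maps a daemon name (e.g. "osd.34") to its banner string.
    Only the first two components are compared, so 0.55.1 and 0.55 count
    as the same release while 0.55.1 and 0.56 do not.
    """
    client_release = parse_version(client_banner)[:2]
    return {
        name: banner
        for name, banner in daemon_banners.items()
        if parse_version(banner)[:2] != client_release
    }


if __name__ == "__main__":
    # Daemon banner copied from the attached log.
    daemons = {
        "osd.34": "ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)",
    }
    # Placeholder client banner: an assumption for illustration only.
    client = "ceph version 0.56 (placeholder)"
    for name, banner in mismatched(client, daemons).items():
        print("%s runs a different release than the client: %s" % (name, banner))
```

This only flags a mismatch; it says nothing about whether a given pair of releases is actually incompatible, which is what the tracker issue above covers.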