* OSD Crashed when runing "rbd list"
@ 2013-01-08 15:51 Chen, Xiaoxi
  2013-01-08 15:56 ` James Page
  2013-01-08 17:12 ` Gregory Farnum
  0 siblings, 2 replies; 3+ messages in thread
From: Chen, Xiaoxi @ 2013-01-08 15:51 UTC
  To: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

Hi List,
   Every time I run "rbd list" after creating a lot of rbd volumes (more than a hundred), certain OSDs die: osd.65 dies first (that's the fifth disk on the sixth host), and then osd.35 dies.
   Is this a bug in 0.55? My Ceph version is 0.55-1 with a 3.7 kernel.
   I would like to upgrade to 0.56-1, but there is no package for the 3.7 kernel (raring).

   The log of osd.35 is attached. Key messages are below:

1 -- 192.101.11.203:6843/19960 mark_down 192.101.11.206:6861/3735 -- 0x7f331867a000
   -38> 2013-01-08 23:37:37.751473 7f3302fc0700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7f3302fc0700 time 2013-01-08 23:37:37.748254
./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)

 ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
 1: (()+0x22f765) [0x7f3310831765]
 2: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
 3: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
 4: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
 5: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
 6: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
 7: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
 9: (()+0x7f9f) [0x7f330ffc5f9f]
 10: (clone()+0x6d) [0x7f330e2800cd]
   
   Thanks for the help.
   Xiaoxi
	

[-- Attachment #2: dump_log --]
[-- Type: application/octet-stream, Size: 22036 bytes --]

   -66> 2013-01-08 23:37:37.435635 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.205:0/13118 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.420064) v2 -- ?+0 0x7f3313a6ddc0 con 0x7f3317675340
   -65> 2013-01-08 23:37:37.476956 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.52 192.101.11.205:0/2635 405 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.461487) v2 ==== 47+0+0 (2186900973 0 0) 0x7f3319a516c0 con 0x7f3316abf600
   -64> 2013-01-08 23:37:37.477012 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.205:0/2635 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.461487) v2 -- ?+0 0x7f3319c79dc0 con 0x7f3316abf600
   -63> 2013-01-08 23:37:37.486347 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.219 192.101.11.202:0/3053 414 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.480620) v2 ==== 47+0+0 (2770502130 0 0) 0x7f3319cbfc00 con 0x7f33147c1b80
   -62> 2013-01-08 23:37:37.486404 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/3053 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.480620) v2 -- ?+0 0x7f3319a516c0 con 0x7f33147c1b80
   -61> 2013-01-08 23:37:37.517501 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.417 192.101.11.204:0/26832 411 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.524010) v2 ==== 47+0+0 (728300027 0 0) 0x7f3313b4e1c0 con 0x7f33145ede40
   -60> 2013-01-08 23:37:37.517570 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.204:0/26832 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.524010) v2 -- ?+0 0x7f3319cbfc00 con 0x7f33145ede40
   -59> 2013-01-08 23:37:37.556975 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.112 192.101.11.201:0/22007 416 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.593957) v2 ==== 47+0+0 (2072717698 0 0) 0x7f3319d808c0 con 0x7f3313d57b80
   -58> 2013-01-08 23:37:37.557031 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/22007 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.593957) v2 -- ?+0 0x7f3313b4e1c0 con 0x7f3313d57b80
   -57> 2013-01-08 23:37:37.599649 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.118 192.101.11.201:0/22867 424 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.636554) v2 ==== 47+0+0 (532548856 0 0) 0x7f33182c5500 con 0x7f33133eab00
   -56> 2013-01-08 23:37:37.599705 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/22867 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.636554) v2 -- ?+0 0x7f3319d808c0 con 0x7f33133eab00
   -55> 2013-01-08 23:37:37.631457 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.415 192.101.11.204:0/24273 412 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.637870) v2 ==== 47+0+0 (1829221514 0 0) 0x7f331860f500 con 0x7f33147c14a0
   -54> 2013-01-08 23:37:37.631515 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.204:0/24273 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.637870) v2 -- ?+0 0x7f33182c5500 con 0x7f33147c14a0
   -53> 2013-01-08 23:37:37.697597 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.11 192.101.11.201:0/21633 424 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.734469) v2 ==== 47+0+0 (2808452608 0 0) 0x7f33134df500 con 0x7f33133dcf20
   -52> 2013-01-08 23:37:37.697643 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/21633 -- osd_ping(ping_reply e176 stamp 2013-01-08 23:37:37.734469) v2 -- ?+0 0x7f331860f500 con 0x7f33133dcf20
   -51> 2013-01-08 23:37:37.735150 7f32df801700  1 -- 192.101.11.203:6842/19960 >> :/0 pipe(0x7f3317d70900 sd=32 :6842 pgs=0 cs=0 l=0).accept sd=32
   -50> 2013-01-08 23:37:37.735765 7f3302fc0700  1 -- 192.101.11.203:6842/19960 <== client.5106 192.101.11.201:0/1008516 1 ==== osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4 ==== 152+0+23 (3309219154 0 1114467127) 0x7f3317d70000 con 0x7f331a76b4a0
   -49> 2013-01-08 23:37:37.735825 7f3302fc0700  5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735657, event: header_read, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
   -48> 2013-01-08 23:37:37.735849 7f3302fc0700  5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735663, event: throttled, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
   -47> 2013-01-08 23:37:37.735862 7f3302fc0700  5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735696, event: all_read, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
   -46> 2013-01-08 23:37:37.735888 7f3302fc0700  5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735815, event: dispatched, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
   -45> 2013-01-08 23:37:37.735902 7f3302fc0700  5 --OSD::tracker-- reqid: client.5106.0:2, seq: 8679, time: 2013-01-08 23:37:37.735901, event: waiting_for_osdmap, request: osd_op(client.5106.0:2 rbd_directory [??? rbd.dir_list] 2.30a98c1c RETRY) v4
   -44> 2013-01-08 23:37:37.735931 7f3302fc0700 10 monclient: renew_subs
   -43> 2013-01-08 23:37:37.735937 7f3302fc0700 10 monclient: _send_mon_message to mon.ceph1 at 192.101.11.201:6789/0
   -42> 2013-01-08 23:37:37.735943 7f3302fc0700  1 -- 192.101.11.203:6842/19960 --> 192.101.11.201:6789/0 -- mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=177}) v2 -- ?+0 0x7f33187e7340 con 0x7f33176758c0
   -41> 2013-01-08 23:37:37.736601 7f3302fc0700  1 -- 192.101.11.203:6842/19960 <== mon.0 192.101.11.201:6789/0 99 ==== osd_map(177..177 src has 1..177) v3 ==== 169+0+0 (2301414109 0 0) 0x7f33162fa600 con 0x7f33176758c0
   -40> 2013-01-08 23:37:37.736645 7f3302fc0700  3 osd.34 176 handle_osd_map epochs [177,177], i have 176, src has [1,177]
   -39> 2013-01-08 23:37:37.741739 7f3302fc0700  1 -- 192.101.11.203:6843/19960 mark_down 192.101.11.206:6861/3735 -- 0x7f331867a000
   -38> 2013-01-08 23:37:37.751473 7f3302fc0700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7f3302fc0700 time 2013-01-08 23:37:37.748254
./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)

 ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
 1: (()+0x22f765) [0x7f3310831765]
 2: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
 3: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
 4: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
 5: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
 6: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
 7: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
 9: (()+0x7f9f) [0x7f330ffc5f9f]
 10: (clone()+0x6d) [0x7f330e2800cd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

   -37> 2013-01-08 23:37:37.753392 7f32fe7b7700  1 osd.34 pg_epoch: 177 pg[0.c1e( empty local-les=175 n=0 ec=1 les/c 175/175 177/177/177) [34] r=0 lpr=177 pi=171-176/2 mlcod 0'0 inactive] state<Start>: transitioning to Primary
   -36> 2013-01-08 23:37:37.753592 7f32fe7b7700  1 osd.34 pg_epoch: 177 pg[2.c1c( v 164'572 (0'0,164'572] local-les=175 n=1 ec=1 les/c 175/175 177/177/177) [34] r=0 lpr=177 pi=171-176/2 lcod 164'571 mlcod 0'0 inactive] state<Start>: transitioning to Primary
   -35> 2013-01-08 23:37:37.753790 7f32fe7b7700  1 osd.34 pg_epoch: 177 pg[1.c1d( empty local-les=175 n=0 ec=1 les/c 175/175 177/177/177) [34] r=0 lpr=177 pi=171-176/2 mlcod 0'0 inactive] state<Start>: transitioning to Primary
   -34> 2013-01-08 23:37:37.753950 7f32fe7b7700 10 monclient: _send_mon_message to mon.ceph1 at 192.101.11.201:6789/0
   -33> 2013-01-08 23:37:37.753969 7f32fe7b7700  1 -- 192.101.11.203:6842/19960 --> 192.101.11.201:6789/0 -- osd_alive(want up_thru 177 have 177) v1 -- ?+0 0x7f3318791880 con 0x7f33176758c0
   -32> 2013-01-08 23:37:37.831439 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.510 192.101.11.205:0/22503 410 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.815860) v2 ==== 47+0+0 (2347573173 0 0) 0x7f3313b2f880 con 0x7f3315c21080
   -31> 2013-01-08 23:37:37.831493 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.205:0/22503 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.815860) v2 -- ?+0 0x7f33134df500 con 0x7f3315c21080
   -30> 2013-01-08 23:37:37.831530 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.510 192.101.11.205:6804/22503 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317183400
   -29> 2013-01-08 23:37:37.872961 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.12 192.101.11.201:0/23153 411 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.909863) v2 ==== 47+0+0 (869822583 0 0) 0x7f331a253500 con 0x7f3315c21ce0
   -28> 2013-01-08 23:37:37.873041 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/23153 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.909863) v2 -- ?+0 0x7f3313b2f880 con 0x7f3315c21ce0
   -27> 2013-01-08 23:37:37.873066 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.12 192.101.11.201:6834/23153 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317147200
   -26> 2013-01-08 23:37:37.874917 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.214 192.101.11.202:0/31653 424 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.869254) v2 ==== 47+0+0 (2504273477 0 0) 0x7f3319b3ddc0 con 0x7f33133ea840
   -25> 2013-01-08 23:37:37.874966 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/31653 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.869254) v2 -- ?+0 0x7f331a253500 con 0x7f33133ea840
   -24> 2013-01-08 23:37:37.874994 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.214 192.101.11.202:6816/31653 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3313eb9000
   -23> 2013-01-08 23:37:37.987972 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.219 192.101.11.202:0/3053 415 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:37.982219) v2 ==== 47+0+0 (808581091 0 0) 0x7f3319cbf340 con 0x7f33147c1b80
   -22> 2013-01-08 23:37:37.988034 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/3053 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:37.982219) v2 -- ?+0 0x7f3319b3ddc0 con 0x7f33147c1b80
   -21> 2013-01-08 23:37:37.988113 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.219 192.101.11.202:6831/3053 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3316276800
   -20> 2013-01-08 23:37:38.016782 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.413 192.101.11.204:0/22042 416 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.023156) v2 ==== 47+0+0 (2277312799 0 0) 0x7f331437e540 con 0x7f33147c11e0
   -19> 2013-01-08 23:37:38.016875 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.204:0/22042 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.023156) v2 -- ?+0 0x7f3319cbf340 con 0x7f33147c11e0
   -18> 2013-01-08 23:37:38.016908 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.413 192.101.11.204:6813/22042 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f331565cc00
   -17> 2013-01-08 23:37:38.040065 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.17 192.101.11.201:0/24014 413 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.077028) v2 ==== 47+0+0 (2165895644 0 0) 0x7f33141ada40 con 0x7f33148ccdc0
   -16> 2013-01-08 23:37:38.040136 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.201:0/24014 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.077028) v2 -- ?+0 0x7f331437e540 con 0x7f33148ccdc0
   -15> 2013-01-08 23:37:38.040211 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.17 192.101.11.201:6852/24014 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f33150fe000
   -14> 2013-01-08 23:37:38.091430 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.69 192.101.11.206:0/22754 371 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.117701) v2 ==== 47+0+0 (3119010757 0 0) 0x7f331a819dc0 con 0x7f331897af20
   -13> 2013-01-08 23:37:38.091488 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.206:0/22754 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.117701) v2 -- ?+0 0x7f33141ada40 con 0x7f331897af20
   -12> 2013-01-08 23:37:38.091551 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.69 192.101.11.206:6858/22754 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f33150fe200
   -11> 2013-01-08 23:37:38.142047 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.28 192.101.11.202:0/10345 426 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.136391) v2 ==== 47+0+0 (2478248440 0 0) 0x7f331a4fd880 con 0x7f3313d57e40
   -10> 2013-01-08 23:37:38.142091 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/10345 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.136391) v2 -- ?+0 0x7f331a819dc0 con 0x7f3313d57e40
    -9> 2013-01-08 23:37:38.142117 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.28 192.101.11.202:6855/10345 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659c00
    -8> 2013-01-08 23:37:38.306536 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.27 192.101.11.202:0/9453 421 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.300821) v2 ==== 47+0+0 (2954499570 0 0) 0x7f331a6f6540 con 0x7f3313b11a20
    -7> 2013-01-08 23:37:38.306629 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/9453 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.300821) v2 -- ?+0 0x7f331a4fd880 con 0x7f3313b11a20
    -6> 2013-01-08 23:37:38.306683 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.27 192.101.11.202:6852/9453 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659a00
    -5> 2013-01-08 23:37:38.369817 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.211 192.101.11.202:0/29353 420 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.364104) v2 ==== 47+0+0 (1839497095 0 0) 0x7f3319f4cc40 con 0x7f3313737340
    -4> 2013-01-08 23:37:38.369897 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/29353 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.364104) v2 -- ?+0 0x7f331a6f6540 con 0x7f3313737340
    -3> 2013-01-08 23:37:38.369937 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.211 192.101.11.202:6807/29353 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659800
    -2> 2013-01-08 23:37:38.437758 7f33007bb700  1 -- 192.101.11.203:6844/19960 <== osd.29 192.101.11.202:0/11437 407 ==== osd_ping(ping e176 stamp 2013-01-08 23:37:38.432069) v2 ==== 47+0+0 (3568911613 0 0) 0x7f331a579500 con 0x7f3315060dc0
    -1> 2013-01-08 23:37:38.437835 7f33007bb700  1 -- 192.101.11.203:6844/19960 --> 192.101.11.202:0/11437 -- osd_ping(ping_reply e177 stamp 2013-01-08 23:37:38.432069) v2 -- ?+0 0x7f3319f4cc40 con 0x7f3315060dc0
     0> 2013-01-08 23:37:38.437881 7f33007bb700  1 -- 192.101.11.203:6843/19960 --> osd.29 192.101.11.202:6858/11437 -- osd_map(177..177 src has 1..177) v3 -- ?+0 0x7f3317659600
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent    100000
  max_new         1000
  log_file /var/log/ceph/osd.34.log
--- end dump of recent events ---
2013-01-08 23:37:38.464910 7f3302fc0700 -1 *** Caught signal (Aborted) **
 in thread 7f3302fc0700

 ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
 1: (()+0x433dd0) [0x7f3310a35dd0]
 2: (()+0xfbb0) [0x7f330ffcdbb0]
 3: (gsignal()+0x35) [0x7f330e1bfe35]
 4: (abort()+0x148) [0x7f330e1c3498]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f330eac2e3d]
 6: (()+0x5ef36) [0x7f330eac0f36]
 7: (()+0x5ef63) [0x7f330eac0f63]
 8: (()+0x5f18e) [0x7f330eac118e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x43d) [0x7f3310adc4ed]
 10: (()+0x22f765) [0x7f3310831765]
 11: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
 12: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
 13: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
 14: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
 15: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
 16: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
 18: (()+0x7f9f) [0x7f330ffc5f9f]
 19: (clone()+0x6d) [0x7f330e2800cd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2013-01-08 23:37:38.464910 7f3302fc0700 -1 *** Caught signal (Aborted) **
 in thread 7f3302fc0700

 ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
 1: (()+0x433dd0) [0x7f3310a35dd0]
 2: (()+0xfbb0) [0x7f330ffcdbb0]
 3: (gsignal()+0x35) [0x7f330e1bfe35]
 4: (abort()+0x148) [0x7f330e1c3498]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f330eac2e3d]
 6: (()+0x5ef36) [0x7f330eac0f36]
 7: (()+0x5ef63) [0x7f330eac0f63]
 8: (()+0x5f18e) [0x7f330eac118e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x43d) [0x7f3310adc4ed]
 10: (()+0x22f765) [0x7f3310831765]
 11: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
 12: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
 13: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
 14: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
 15: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
 16: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
 18: (()+0x7f9f) [0x7f330ffc5f9f]
 19: (clone()+0x6d) [0x7f330e2800cd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent    100000
  max_new         1000
  log_file /var/log/ceph/osd.34.log
--- end dump of recent events ---
2013-01-08 23:38:49.252564 7fa3a642e7c0  0 ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b), process ceph-osd, pid 13310
2013-01-08 23:38:49.577405 7fa3a642e7c0  0 filestore(/data/osd.34) mount FIEMAP ioctl is supported and appears to work
2013-01-08 23:38:49.577432 7fa3a642e7c0  0 filestore(/data/osd.34) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-08 23:38:49.577934 7fa3a642e7c0  0 filestore(/data/osd.34) mount did NOT detect btrfs
2013-01-08 23:38:49.581048 7fa3a642e7c0  0 filestore(/data/osd.34) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-01-08 23:38:49.583209 7fa3a642e7c0  0 filestore(/data/osd.34) mount syscall(SYS_syncfs, fd) fully supported
2013-01-08 23:38:49.585388 7fa3a642e7c0  0 filestore(/data/osd.34) mount syscall(__NR_syncfs, fd) fully supported
2013-01-08 23:38:49.585514 7fa3a642e7c0  0 filestore(/data/osd.34) mount found snaps <>
2013-01-08 23:38:49.594324 7fa3a642e7c0  0 filestore(/data/osd.34) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-08 23:38:49.606596 7fa3a642e7c0  0 journal  kernel version is 3.7.0
2013-01-08 23:38:49.606902 7fa3a642e7c0  1 journal _open /dev/sda8 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:38:49.638795 7fa3a642e7c0  0 journal  kernel version is 3.7.0
2013-01-08 23:38:49.639061 7fa3a642e7c0  1 journal _open /dev/sda8 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:38:49.679483 7fa3a642e7c0  1 journal close /dev/sda8
2013-01-08 23:38:49.699079 7fa3a642e7c0  0 filestore(/data/osd.34) mount FIEMAP ioctl is supported and appears to work
2013-01-08 23:38:49.699088 7fa3a642e7c0  0 filestore(/data/osd.34) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-01-08 23:38:49.699316 7fa3a642e7c0  0 filestore(/data/osd.34) mount did NOT detect btrfs
2013-01-08 23:38:49.703818 7fa3a642e7c0  0 filestore(/data/osd.34) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-01-08 23:38:49.705633 7fa3a642e7c0  0 filestore(/data/osd.34) mount syscall(SYS_syncfs, fd) fully supported
2013-01-08 23:38:49.707455 7fa3a642e7c0  0 filestore(/data/osd.34) mount syscall(__NR_syncfs, fd) fully supported
2013-01-08 23:38:49.707553 7fa3a642e7c0  0 filestore(/data/osd.34) mount found snaps <>
2013-01-08 23:38:49.711430 7fa3a642e7c0  0 filestore(/data/osd.34) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-01-08 23:38:49.719920 7fa3a642e7c0  0 journal  kernel version is 3.7.0
2013-01-08 23:38:49.720234 7fa3a642e7c0  1 journal _open /dev/sda8 fd 26: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:38:49.741396 7fa3a642e7c0  0 journal  kernel version is 3.7.0
2013-01-08 23:38:49.741709 7fa3a642e7c0  1 journal _open /dev/sda8 fd 26: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-01-08 23:39:40.791212 7fa392df6700  0 log [INF] : 0.30 scrub ok


* Re: OSD Crashed when runing "rbd list"
  2013-01-08 15:51 OSD Crashed when runing "rbd list" Chen, Xiaoxi
@ 2013-01-08 15:56 ` James Page
  2013-01-08 17:12 ` Gregory Farnum
  1 sibling, 0 replies; 3+ messages in thread
From: James Page @ 2013-01-08 15:56 UTC
  To: Chen, Xiaoxi; +Cc: ceph-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 08/01/13 15:51, Chen, Xiaoxi wrote:
> I would like to upgrade to 0.56-1 but there is no package for 3.7
> kernel(raring)

I uploaded 0.56.1 to Ubuntu Raring this morning - it's published and
should ripple through the archive mirrors in the next few hours.

- -- 
James Page
Ubuntu Core Developer
Debian Maintainer
james.page@ubuntu.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQIcBAEBCAAGBQJQ7EGwAAoJEL/srsug59jDXpYQANFT1TmBSZlWhARzrRCgPwKG
p80UImsdQwEh8HZ2/hlgWfNM/ybs72yJ5eGYVm6tNSe33pUCLM4tXn/m75vzhhEQ
kvn89qKWVcWKoCcEE2m4ZcDykfmE5Ti0oHhoVRcLkaz3REWRaizBQH4iVV9DY86F
5xhUjEKoHwmpEBcYs2yzbPvEz18dOSmZfBpLMWUJBhofhXQaXUjOKj/0qH0rbfdg
Ntt6ijrP7IFIAYZQ4xbQAs8N7fO/nHna5no/v3KpVY45rHM7/mYjtaWKOenz+CeL
kYlWgQHgMhqP7PSj6FEOEw3ggGJcF4eVL+e/ApFYjJkFIhy+ro46guaSUrvtZGTW
UAPgxeRIRp6d390wlNQQmM/SyzaYRU2vxcCRYm0La2Q5+TrIRit3/4npRvR3az6B
7W/EFYsOJxp5UugYeiMHd3aZRF+ps/4Y3Ay+Uxp2tQO+Ng/PGhDkFz0bN0X5hR8c
Ioha7fWTeibCnPIIDs1GsA4BUDrCqBQoRyiqREYHRxfFISK67RCo6VLVhmjha8Dc
L/8QUCdionsRNjUhzlGlwZxWtPmgFTCDxIXSNXW1QScEaCmceh+P7MXDlXjFlzKG
EYNmTZ5egeNh/OQLWpgEFuqU3tG79yLT2znhXMTl3SMHcNX23LcknsFTZricnxSz
DnKH7gkBhukHrMG/H13V
=8DLv
-----END PGP SIGNATURE-----


* Re: OSD Crashed when runing "rbd list"
  2013-01-08 15:51 OSD Crashed when runing "rbd list" Chen, Xiaoxi
  2013-01-08 15:56 ` James Page
@ 2013-01-08 17:12 ` Gregory Farnum
  1 sibling, 0 replies; 3+ messages in thread
From: Gregory Farnum @ 2013-01-08 17:12 UTC
  To: Chen, Xiaoxi; +Cc: ceph-devel

On Tue, Jan 8, 2013 at 7:51 AM, Chen, Xiaoxi <xiaoxi.chen@intel.com> wrote:
> Hi List,
>       Every time I ran "rbd list" after creating a lot of rbd volumes (more than 100s), certain OSDs will die,osd.65 die first and then osd.35 (osd.65,that's the fifth disk on the sixth host) will die.
>   Is it a bug for 0.55? My ceph version is 0.55-1 with 3.7 kernel.
>         I would like to upgrade to 0.56-1 but there is no package for 3.7 kernel(raring)
>
>    Log of osd.35 attached.Key messages are below:
>
> 1 -- 192.101.11.203:6843/19960 mark_down 192.101.11.206:6861/3735 -- 0x7f331867a000
>    -38> 2013-01-08 23:37:37.751473 7f3302fc0700 -1 ./messages/MOSDOp.h: In function 'bool MOSDOp::check_rmw(int)' thread 7f3302fc0700 time 2013-01-08 23:37:37.748254
> ./messages/MOSDOp.h: 57: FAILED assert(rmw_flags)
>
>  ceph version 0.55.1 (8e25c8d984f9258644389a18997ec6bdef8e056b)
>  1: (()+0x22f765) [0x7f3310831765]
>  2: (MOSDOpReply::claim_op_out_data(std::vector<OSDOp, std::allocator<OSDOp> >&)+0) [0x7f3310897850]
>  3: (OSD::handle_op(std::tr1::shared_ptr<OpRequest>)+0x441) [0x7f33108f19c1]
>  4: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0x83) [0x7f33108fd8c3]
>  5: (OSD::do_waiters()+0x104) [0x7f33108fdc64]
>  6: (OSD::ms_dispatch(Message*)+0x317) [0x7f33109027e7]
>  7: (DispatchQueue::entry()+0x353) [0x7f3310b6b743]
>  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f3310ac7dad]
>  9: (()+0x7f9f) [0x7f330ffc5f9f]
>  10: (clone()+0x6d) [0x7f330e2800cd]
>
>    Thanks for the help.

Sounds like you've got a v0.56 binary talking to v0.55 daemons. An
upgrade to v0.56.1 should fix it. See
http://tracker.newdream.net/issues/3715
-Greg
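
For anyone triaging a similar crash, here is a minimal sketch (not part of
Ceph) that pulls the daemon version and any "FAILED assert" lines out of an
OSD log, so the daemon's release series can be compared with the client's.
The function names and the regular expressions are assumptions based only on
the log lines quoted in this thread; other Ceph releases may format these
lines differently, and the client version has to be supplied by hand.

```python
import re

# Both patterns are assumptions derived from the log excerpt in this thread.
VERSION_RE = re.compile(r"ceph version (\d+(?:\.\d+)*)")
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def summarize_osd_log(lines):
    """Return (versions, asserts) found in an iterable of log lines.

    versions is a set of version strings such as "0.55.1";
    asserts is a list of (source_file, line_number, expression) tuples.
    """
    versions = set()
    asserts = []
    for raw in lines:
        # Drop leading mail-quote markers and spaces so quoted logs parse too.
        text = raw.lstrip("> ").rstrip()
        found = VERSION_RE.search(text)
        if found:
            versions.add(found.group(1))
        failed = ASSERT_RE.match(text)
        if failed:
            asserts.append(
                (failed.group("file"), int(failed.group("line")), failed.group("expr"))
            )
    return versions, asserts


def release_series(version):
    """Map "0.55.1" to (0, 55), ignoring the point release."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def series_mismatch(daemon_version, client_version):
    """True when daemon and client come from different release series."""
    return release_series(daemon_version) != release_series(client_version)
```

Feeding it the attached log (for example `summarize_osd_log(open(path))`,
where `path` is wherever the OSD log lives) should report the daemon version
and the check_rmw assert; `series_mismatch` then flags the case described
above, a 0.56 client against 0.55 daemons.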
