All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: MDS crash, wont startup again
       [not found] <CA+WUqKxZoh_5k5qgtUFwnsr2=gaZ4_RONazYQG7enKa75WhNVQ@mail.gmail.com>
@ 2012-05-16  8:11 ` Felix Feinhals
  2012-05-17 21:38   ` Josh Durgin
  0 siblings, 1 reply; 10+ messages in thread
From: Felix Feinhals @ 2012-05-16  8:11 UTC (permalink / raw)
  To: ceph-devel

Hi again,

anything on this Problem? Seems that the only choice for me is to
reinitialize the whole cephfs (mkcephfs...)
:(

2012/5/10 Felix Feinhals <ff@turtle-entertainment.de>:
> Hi List,
>
> we installed a ceph cluster with ceph version 0.46.
> 3 OSDs, 3 MONs and 3 MDSs.
>
> After copying a bunch of files to a ceph-fuse mount all MDS daemons
> crash and now i cant bring them back online.
> I already tried to restart the daemons in different order and also
> removed one OSD, nothing really happened only now we have pgs with
> active+remapped which i think is normal.
> Any hints?
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-16  8:11 ` MDS crash, wont startup again Felix Feinhals
@ 2012-05-17 21:38   ` Josh Durgin
  2012-05-21 12:38     ` Felix Feinhals
  0 siblings, 1 reply; 10+ messages in thread
From: Josh Durgin @ 2012-05-17 21:38 UTC (permalink / raw)
  To: Felix Feinhals; +Cc: ceph-devel

On 05/16/2012 01:11 AM, Felix Feinhals wrote:
> Hi again,
>
> anything on this Problem? Seems that the only choice for me is to
> reinitialize the whole cephfs (mkcephfs...)
> :(

Hi Felix, it looks like your first mail never reached the list.

> 2012/5/10 Felix Feinhals<ff@turtle-entertainment.de>:
>> Hi List,
>>
>> we installed a ceph cluster with ceph version 0.46.
>> 3 OSDs, 3 MONs and 3 MDSs.
>>
>> After copying a bunch of files to a ceph-fuse mount all MDS daemons
>> crash and now i cant bring them back online.
>> I already tried to restart the daemons in different order and also
>> removed one OSD, nothing really happened only now we have pgs with
>> active+remapped which i think is normal.
>> Any hints?

Are all three MDS active? At this point, more than one active MDS is
likely to crash. You can have one active and others standby.

If you've got only one active, what was the backtrace of the crash?
It'll be at the end of the MDS log (by default in /var/log/ceph).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-17 21:38   ` Josh Durgin
@ 2012-05-21 12:38     ` Felix Feinhals
  2012-05-21 18:44       ` Gregory Farnum
  0 siblings, 1 reply; 10+ messages in thread
From: Felix Feinhals @ 2012-05-21 12:38 UTC (permalink / raw)
  To: Josh Durgin; +Cc: ceph-devel

Hi Josh,

i quoted the trace and some other stats in my first email, maybe it
got stuck in the spam filters.
Well next try:

snip

-3> 2012-05-10 14:52:29.509940 7fb1c9351700 1 mds.0.40 handle_mds_map
 i am now mds.0.40
 -2> 2012-05-10 14:52:29.509956 7fb1c9351700 1 mds.0.40 handle_mds_map
 state change up:reconnect --> up:rejoin
 -1> 2012-05-10 14:52:29.509963 7fb1c9351700 1 mds.0.40 rejoin_joint_start
 0> 2012-05-10 14:52:29.512503 7fb1c9351700 -1 *** Caught signal
 (Segmentation fault) **
 in thread 7fb1c9351700

ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: ceph-mds() [0x814279]
 2: (()+0xeff0) [0x7fb1cddbfff0]
 3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x4f) [0x6cb5ef]
 4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55d58b]
 5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29c) [0x572eec]
 6: (MDCache::rejoin_gather_finish()+0x90) [0x5931a0]
 7: (MDCache::rejoin_send_rejoins()+0x2c05) [0x59b9d5]
 8: (MDS::rejoin_joint_start()+0x131) [0x4a8721]
 9: (MDS::handle_mds_map(MMDSMap*)+0x2c4a) [0x4c253a]
 10: (MDS::handle_core_message(Message*)+0x913) [0x4c4513]
 11: (MDS::_dispatch(Message*)+0x2f) [0x4c45ef]
 12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
 13: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
 14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
 15: (()+0x68ca) [0x7fb1cddb78ca]
 16: (clone()+0x6d) [0x7fb1cc63f92d]

snip

I though ceph chooses which MDS is active and which is standby, i just
have 3 in the cluster config:

[mds.a]
host = x

[mds.b]
host = y

[mds.c]
host = z

no global MDS config.
Should i reconfigure this?



2012/5/17 Josh Durgin <josh.durgin@inktank.com>:
> On 05/16/2012 01:11 AM, Felix Feinhals wrote:
>>
>> Hi again,
>>
>> anything on this Problem? Seems that the only choice for me is to
>> reinitialize the whole cephfs (mkcephfs...)
>> :(
>
>
> Hi Felix, it looks like your first mail never reached the list.
>
>
>> 2012/5/10 Felix Feinhals<ff@turtle-entertainment.de>:
>>>
>>> Hi List,
>>>
>>> we installed a ceph cluster with ceph version 0.46.
>>> 3 OSDs, 3 MONs and 3 MDSs.
>>>
>>> After copying a bunch of files to a ceph-fuse mount all MDS daemons
>>> crash and now i cant bring them back online.
>>> I already tried to restart the daemons in different order and also
>>> removed one OSD, nothing really happened only now we have pgs with
>>> active+remapped which i think is normal.
>>> Any hints?
>
>
> Are all three MDS active? At this point, more than one active MDS is
> likely to crash. You can have one active and others standby.
>
> If you've got only one active, what was the backtrace of the crash?
> It'll be at the end of the MDS log (by default in /var/log/ceph).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-21 12:38     ` Felix Feinhals
@ 2012-05-21 18:44       ` Gregory Farnum
  2012-05-22 10:12         ` Felix Feinhals
  0 siblings, 1 reply; 10+ messages in thread
From: Gregory Farnum @ 2012-05-21 18:44 UTC (permalink / raw)
  To: Felix Feinhals; +Cc: Josh Durgin, ceph-devel

On Mon, May 21, 2012 at 5:38 AM, Felix Feinhals
<ff@turtle-entertainment.de> wrote:
> Hi Josh,
>
> i quoted the trace and some other stats in my first email, maybe it
> got stuck in the spam filters.
> Well next try:
>
> snip
>
> -3> 2012-05-10 14:52:29.509940 7fb1c9351700 1 mds.0.40 handle_mds_map
>  i am now mds.0.40
>  -2> 2012-05-10 14:52:29.509956 7fb1c9351700 1 mds.0.40 handle_mds_map
>  state change up:reconnect --> up:rejoin
>  -1> 2012-05-10 14:52:29.509963 7fb1c9351700 1 mds.0.40 rejoin_joint_start
>  0> 2012-05-10 14:52:29.512503 7fb1c9351700 -1 *** Caught signal
>  (Segmentation fault) **
>  in thread 7fb1c9351700
>
> ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
>  1: ceph-mds() [0x814279]
>  2: (()+0xeff0) [0x7fb1cddbfff0]
>  3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x4f) [0x6cb5ef]
>  4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55d58b]
>  5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29c) [0x572eec]
>  6: (MDCache::rejoin_gather_finish()+0x90) [0x5931a0]
>  7: (MDCache::rejoin_send_rejoins()+0x2c05) [0x59b9d5]
>  8: (MDS::rejoin_joint_start()+0x131) [0x4a8721]
>  9: (MDS::handle_mds_map(MMDSMap*)+0x2c4a) [0x4c253a]
>  10: (MDS::handle_core_message(Message*)+0x913) [0x4c4513]
>  11: (MDS::_dispatch(Message*)+0x2f) [0x4c45ef]
>  12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
>  13: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
>  14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
>  15: (()+0x68ca) [0x7fb1cddb78ca]
>  16: (clone()+0x6d) [0x7fb1cc63f92d]

There's nothing obvious here — can you run gdb on the core and get
another backtrace and the info from levels 3-5?

> snip
>
> I though ceph chooses which MDS is active and which is standby, i just
> have 3 in the cluster config:
Yes, it does — if you don't increase the number of allowed MDSes
you'll just get one of them active.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-21 18:44       ` Gregory Farnum
@ 2012-05-22 10:12         ` Felix Feinhals
  2012-05-22 17:56           ` Greg Farnum
  0 siblings, 1 reply; 10+ messages in thread
From: Felix Feinhals @ 2012-05-22 10:12 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Josh Durgin, ceph-devel

Hi,

snip

2012-05-22 12:02:02.678094 7f5f43b5d780  1 -- 0.0.0.0:6800/23071 messenger.start
2012-05-22 12:02:02.678245 7f5f43b5d780  1 -- 0.0.0.0:6800/23071 accepter.start
2012-05-22 12:02:02.679672 7f5f43b5d780  1 -- 0.0.0.0:6800/23071 -->
10.3.0.12:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x13ea400
con 0x13ecdc0
2012-05-22 12:02:02.680105 7f5f43b5c700  1 -- 10.3.0.12:6800/23071
learned my addr 10.3.0.12:6800/23071
2012-05-22 12:02:02.680390 7f5f3ecd1700  0 mds.-1.0 ms_handle_connect
on 10.3.0.12:6789/0
2012-05-22 12:02:02.681402 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 1 ==== mon_map v1 ==== 469+0+0 (56645397 0 0)
0x13eaa00 con 0x13ecdc0
2012-05-22 12:02:02.681529 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 2 ==== auth_reply(proto 1 0 Success) v1 ====
24+0+0 (3086217166 0 0) 0x13ea800 con 0x13ecdc0
2012-05-22 12:02:02.681596 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x1404380 con
0x13ecdc0
2012-05-22 12:02:02.681932 7f5f43b5d780  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:boot seq 1 v0) v2 -- ?+0
0x13f7c80 con 0x13ecdc0
2012-05-22 12:02:02.682060 7f5f43b5d780  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0
0x14041c0 con 0x13ecdc0
2012-05-22 12:02:02.682109 7f5f43b5d780  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mon_subscribe({mdsmap=0+,monmap=2+,osdmap=0}) v2
-- ?+0 0x14048c0 con 0x13ecdc0
2012-05-22 12:02:02.682421 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 3 ==== mon_map v1 ==== 469+0+0 (56645397 0 0)
0x1407600 con 0x13ecdc0
2012-05-22 12:02:02.682554 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 4 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
(311377444 0 0) 0x1404380 con 0x13ecdc0
2012-05-22 12:02:02.682978 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 5 ==== osd_map(234..234 src has 1..234) v3 ====
19741+0+0 (3560848488 0 0) 0x1407400 con 0x13ecdc0
2012-05-22 12:02:02.683587 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
(311377444 0 0) 0x1404a80 con 0x13ecdc0
2012-05-22 12:02:02.683601 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 7 ==== mdsmap(e 52010) v1 ==== 535+0+0
(2872786254 0 0) 0x1407200 con 0x13ecdc0
2012-05-22 12:02:02.683631 7f5f3ecd1700  5 mds.-1.0 handle_mds_map
epoch 52010 from mon.2
2012-05-22 12:02:02.683818 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 8 ==== osd_map(234..234 src has 1..234) v3 ====
19741+0+0 (3560848488 0 0) 0x1407000 con 0x13ecdc0
2012-05-22 12:02:02.683932 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 9 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
(311377444 0 0) 0x14041c0 con 0x13ecdc0
2012-05-22 12:02:02.768244 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 10 ==== mdsmap(e 52011) v1 ==== 733+0+0
(1289862739 0 0) 0x1407a00 con 0x13ecdc0
2012-05-22 12:02:02.768364 7f5f3ecd1700  5 mds.-1.-1 handle_mds_map
epoch 52011 from mon.2
2012-05-22 12:02:02.768413 7f5f3ecd1700  1 mds.-1.0 handle_mds_map standby
2012-05-22 12:02:02.933902 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 11 ==== mdsmap(e 52012) v1 ==== 535+0+0
(3794450792 0 0) 0x1407800 con 0x13ecdc0
2012-05-22 12:02:02.933965 7f5f3ecd1700  5 mds.-1.0 handle_mds_map
epoch 52012 from mon.2
2012-05-22 12:02:02.934007 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071
mark_down 10.3.0.12:6800/23046 -- pipe dne
2012-05-22 12:02:02.934016 7f5f3ecd1700  1 mds.0.49 handle_mds_map i
am now mds.0.49
2012-05-22 12:02:02.934030 7f5f3ecd1700  1 mds.0.49 handle_mds_map
state change up:standby --> up:replay
2012-05-22 12:02:02.934040 7f5f3ecd1700  1 mds.0.49 replay_start
2012-05-22 12:02:02.934052 7f5f3ecd1700  1 mds.0.49  recovery set is
2012-05-22 12:02:02.934055 7f5f3ecd1700  1 mds.0.49  need osdmap epoch
235, have 234
2012-05-22 12:02:02.934058 7f5f3ecd1700  1 mds.0.49  waiting for
osdmap 235 (which blacklists prior instance)
2012-05-22 12:02:02.934076 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 --
mon_subscribe({mdsmap=52013+,monmap=2+,osdmap=235}) v2 -- ?+0
0x14041c0 con 0x13ecdc0
2012-05-22 12:02:02.934154 7f5f3ecd1700  1 mds.0.cache
handle_mds_failure mds.0 : recovery peers are
2012-05-22 12:02:02.934184 7f5f3ecd1700  5 mds.0.migrator
handle_mds_failure_or_stop mds.0
2012-05-22 12:02:02.992970 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 12 ==== osd_map(235..235 src has 1..235) v3
==== 300+0+0 (2376656173 0 0) 0x1407600 con 0x13ecdc0
2012-05-22 12:02:02.993145 7f5f3ecd1700  2 mds.0.49 boot_start 1:
opening inotable
2012-05-22 12:02:02.993411 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:1 mds0_inotable [read 0~0]
1.b852b893) v4 -- ?+0 0x13e36c0 con 0x1408a00
2012-05-22 12:02:02.993471 7f5f3ecd1700  2 mds.0.49 boot_start 1:
opening sessionmap
2012-05-22 12:02:02.993553 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:2 mds0_sessionmap [read 0~0]
1.3270c60b) v4 -- ?+0 0x13e3d80 con 0x1408a00
2012-05-22 12:02:02.993577 7f5f3ecd1700  2 mds.0.49 boot_start 1:
opening anchor table
2012-05-22 12:02:02.993625 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:3 mds_anchortable [read 0~0]
1.a977f6a7) v4 -- ?+0 0x13e3900 con 0x1408a00
2012-05-22 12:02:02.993634 7f5f3ecd1700  2 mds.0.49 boot_start 1:
opening snap table
2012-05-22 12:02:02.993671 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:4 mds_snaptable [read 0~0]
1.d90270ad) v4 -- ?+0 0x140d6c0 con 0x1408a00
2012-05-22 12:02:02.993680 7f5f3ecd1700  2 mds.0.49 boot_start 1:
opening mds log
2012-05-22 12:02:02.993700 7f5f3ecd1700  5 mds.0.log open discovering log bounds
2012-05-22 12:02:02.993792 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:5 200.00000000 [read 0~0]
1.844f3494) v4 -- ?+0 0x140d480 con 0x1408a00
2012-05-22 12:02:02.993826 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 13 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0
(311377444 0 0) 0x140b380 con 0x13ecdc0
2012-05-22 12:02:02.994043 7f5f3ecd1700  0 mds.0.49 ms_handle_connect
on 10.3.0.12:6801/1172
2012-05-22 12:02:02.995288 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 1 ==== osd_op_reply(1 mds0_inotable [read
0~29] = 0) v4 ==== 112+0+29 (3509020669 0 2080707671) 0x1407400 con
0x1408a00
2012-05-22 12:02:02.995518 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 2 ==== osd_op_reply(2 mds0_sessionmap [read
0~10591] = 0) v4 ==== 114+0+10591 (734241002 0 1640895633) 0x1407000
con 0x1408a00
2012-05-22 12:02:02.995836 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 3 ==== osd_op_reply(3 mds_anchortable [read
0~29] = 0) v4 ==== 114+0+29 (1868654419 0 218791405) 0x1407200 con
0x1408a00
2012-05-22 12:02:02.995890 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 4 ==== osd_op_reply(4 mds_snaptable [read
0~41] = 0) v4 ==== 112+0+41 (3594810684 0 2312406677) 0x1407e00 con
0x1408a00
2012-05-22 12:02:02.995940 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 5 ==== osd_op_reply(5 200.00000000 [read
0~84] = 0) v4 ==== 111+0+84 (3837001893 0 1130699602) 0x1407c00 con
0x1408a00
2012-05-22 12:02:02.996032 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:6 200.0000016d [stat 0~0]
1.6f7c9ad6 rwordered) v4 -- ?+0 0x140d480 con 0x1408a00
2012-05-22 12:02:02.996107 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:7 200.0000016e [stat 0~0]
1.9feb60a6 rwordered) v4 -- ?+0 0x140d6c0 con 0x1408a00
2012-05-22 12:02:02.996608 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 6 ==== osd_op_reply(6 200.0000016d [stat
0~0] = 0) v4 ==== 111+0+16 (1485433033 0 624451111) 0x1407a00 con
0x1408a00
2012-05-22 12:02:02.997170 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 7 ==== osd_op_reply(7 200.0000016e [stat
0~0] = 0) v4 ==== 111+0+16 (2013887546 0 3894377183) 0x1407000 con
0x1408a00
2012-05-22 12:02:02.997202 7f5f3ecd1700  2 mds.0.49 boot_start 2:
loading/discovering base inodes
2012-05-22 12:02:02.997211 7f5f3ecd1700  0 mds.0.cache creating system
inode with ino:100
2012-05-22 12:02:02.999723 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:8 100.00000000 [getxattr inode]
1.c5265ab3) v4 -- ?+0 0x13e36c0 con 0x1408a00
2012-05-22 12:02:02.999788 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:9 100.00000000.inode [read 0~0]
1.85dde07f) v4 -- ?+0 0x13e3900 con 0x1408a00
2012-05-22 12:02:02.999804 7f5f3ecd1700  0 mds.0.cache creating system
inode with ino:1
2012-05-22 12:02:02.999874 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:10 1.00000000 [getxattr inode]
1.6b2cdaff) v4 -- ?+0 0x140db40 con 0x1408a00
2012-05-22 12:02:03.000142 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:11 1.00000000.inode [read 0~0]
1.232c0e14) v4 -- ?+0 0x140d900 con 0x1408a00
2012-05-22 12:02:03.000936 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 8 ==== osd_op_reply(9 100.00000000.inode
[read 0~399] = 0) v4 ==== 117+0+399 (2159939521 0 4121830280)
0x1407400 con 0x1408a00
2012-05-22 12:02:03.001468 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 9 ==== osd_op_reply(8 100.00000000
[getxattr] = -61 (No data available)) v4 ==== 111+0+0 (4127029859 0 0)
0x1407600 con 0x1408a00
2012-05-22 12:02:03.001669 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 10 ==== osd_op_reply(10 1.00000000
[getxattr] = -61 (No data available)) v4 ==== 109+0+0 (1727840390 0 0)
0x1407800 con 0x1408a00
2012-05-22 12:02:03.001742 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 11 ==== osd_op_reply(11 1.00000000.inode
[read 0~428] = 0) v4 ==== 115+0+428 (3364705558 0 715602140) 0x1407200
con 0x1408a00
2012-05-22 12:02:03.001809 7f5f3ecd1700  2 mds.0.49 boot_start 3:
replaying mds log
2012-05-22 12:02:03.002198 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:12 200.0000014f [read
2535~4191769] 1.b79a8c0b) v4 -- ?+0 0x140d480 con 0x1408a00
2012-05-22 12:02:03.002290 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:13 200.00000150 [read
0~4194304] 1.23163b1d) v4 -- ?+0 0x13e36c0 con 0x1408a00
2012-05-22 12:02:03.004043 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:14 200.00000151 [read
0~4194304] 1.28b76431) v4 -- ?+0 0x140d6c0 con 0x1408a00
2012-05-22 12:02:03.004205 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:15 200.00000152 [read
0~4194304] 1.ae9bb683) v4 -- ?+0 0x140d480 con 0x1408a00
2012-05-22 12:02:03.004380 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:16 200.00000153 [read
0~4194304] 1.d9966785) v4 -- ?+0 0x140d900 con 0x1408a00
2012-05-22 12:02:03.004454 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:17 200.00000154 [read
0~4194304] 1.47f9bf13) v4 -- ?+0 0x140db40 con 0x1408a00
2012-05-22 12:02:03.004520 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:18 200.00000155 [read
0~4194304] 1.460264c4) v4 -- ?+0 0x13e3900 con 0x1408a00
2012-05-22 12:02:03.004582 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:19 200.00000156 [read
0~4194304] 1.cae377ec) v4 -- ?+0 0x13e36c0 con 0x1408a00
2012-05-22 12:02:03.004650 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:20 200.00000157 [read
0~4194304] 1.1c534f6b) v4 -- ?+0 0x140d6c0 con 0x1408a00
2012-05-22 12:02:03.004740 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:21 200.00000158 [read
0~4194304] 1.21f6fd52) v4 -- ?+0 0x143f900 con 0x1408a00
2012-05-22 12:02:03.031626 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 12 ==== osd_op_reply(12 200.0000014f [read
2535~4191769] = 0) v4 ==== 111+0+4191769 (86938826 0 940254350)
0x1407600 con 0x1408a00
2012-05-22 12:02:03.049765 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 13 ==== osd_op_reply(13 200.00000150 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2540809359 0 2765354474)
0x1407400 con 0x1408a00
2012-05-22 12:02:03.061435 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 14 ==== osd_op_reply(14 200.00000151 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2370055404 0 728673944)
0x1407000 con 0x1408a00
2012-05-22 12:02:03.073952 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 15 ==== osd_op_reply(15 200.00000152 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2835608127 0 4176800115)
0x1407a00 con 0x1408a00
2012-05-22 12:02:03.086138 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:22 200.00000159 [read
0~4194304] 1.11c4d2c4) v4 -- ?+0 0x143f480 con 0x1408a00
2012-05-22 12:02:03.087461 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 16 ==== osd_op_reply(16 200.00000153 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (414535972 0 2827888072)
0x1407c00 con 0x1408a00
2012-05-22 12:02:03.119482 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 17 ==== osd_op_reply(17 200.00000154 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2490448686 0 1942553121)
0x1407e00 con 0x1408a00
2012-05-22 12:02:03.119562 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 18 ==== osd_op_reply(18 200.00000155 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (1664826329 0 2280396596)
0x143ee00 con 0x1408a00
2012-05-22 12:02:03.131469 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 19 ==== osd_op_reply(19 200.00000156 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (324288200 0 2843054375)
0x143ec00 con 0x1408a00
2012-05-22 12:02:03.139410 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:23 200.0000015a [read
0~4194304] 1.d46a5844) v4 -- ?+0 0x143f000 con 0x1408a00
2012-05-22 12:02:03.145585 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 20 ==== osd_op_reply(20 200.00000157 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (542705527 0 967023793)
0x143ea00 con 0x1408a00
2012-05-22 12:02:03.156508 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 21 ==== osd_op_reply(21 200.00000158 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (3296968726 0 3798599763)
0x143e800 con 0x1408a00
2012-05-22 12:02:03.172296 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 22 ==== osd_op_reply(22 200.00000159 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2210915791 0 2737758750)
0x143e600 con 0x1408a00
2012-05-22 12:02:03.182754 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:24 200.0000015b [read
0~4194304] 1.a8a12f71) v4 -- ?+0 0x140d480 con 0x1408a00
2012-05-22 12:02:03.185018 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 23 ==== osd_op_reply(23 200.0000015a [read
0~4194304] = 0) v4 ==== 111+0+4194304 (859847618 0 611094159)
0x143e400 con 0x1408a00
2012-05-22 12:02:03.205868 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 24 ==== osd_op_reply(24 200.0000015b [read
0~4194304] = 0) v4 ==== 111+0+4194304 (541055975 0 524786588)
0x143e200 con 0x1408a00
2012-05-22 12:02:03.225090 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:25 200.0000015c [read
0~4194304] 1.941114d8) v4 -- ?+0 0x13e3900 con 0x1408a00
2012-05-22 12:02:03.242516 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 25 ==== osd_op_reply(25 200.0000015c [read
0~4194304] = 0) v4 ==== 111+0+4194304 (1141303529 0 1957418634)
0x143e200 con 0x1408a00
2012-05-22 12:02:03.263094 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:26 200.0000015d [read
0~4194304] 1.bdfe9959) v4 -- ?+0 0x143fb40 con 0x1408a00
2012-05-22 12:02:03.278274 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 26 ==== osd_op_reply(26 200.0000015d [read
0~4194304] = 0) v4 ==== 111+0+4194304 (3503869406 0 773787611)
0x143e400 con 0x1408a00
2012-05-22 12:02:03.300327 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:27 200.0000015e [read
0~4194304] 1.b8cf7266) v4 -- ?+0 0x140db40 con 0x1408a00
2012-05-22 12:02:03.316669 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 27 ==== osd_op_reply(27 200.0000015e [read
0~4194304] = 0) v4 ==== 111+0+4194304 (1073368869 0 2950585796)
0x143e600 con 0x1408a00
2012-05-22 12:02:03.336096 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:28 200.0000015f [read
0~4194304] 1.2980b721) v4 -- ?+0 0x1ba9b40 con 0x1408a00
2012-05-22 12:02:03.352131 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 28 ==== osd_op_reply(28 200.0000015f [read
0~4194304] = 0) v4 ==== 111+0+4194304 (3513547997 0 1851987047)
0x143e800 con 0x1408a00
2012-05-22 12:02:03.371923 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:29 200.00000160 [read
0~4194304] 1.574ae4ae) v4 -- ?+0 0x1ba96c0 con 0x1408a00
2012-05-22 12:02:03.388290 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 29 ==== osd_op_reply(29 200.00000160 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2670366445 0 4080047592)
0x143ea00 con 0x1408a00
2012-05-22 12:02:03.410233 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:30 200.00000161 [read
0~4194304] 1.87287887) v4 -- ?+0 0x1ba9240 con 0x1408a00
2012-05-22 12:02:03.426379 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 30 ==== osd_op_reply(30 200.00000161 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2306991298 0 2632960265)
0x143ec00 con 0x1408a00
2012-05-22 12:02:03.450460 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:31 200.00000162 [read
0~4194304] 1.f23e2d31) v4 -- ?+0 0x140db40 con 0x1408a00
2012-05-22 12:02:03.466463 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 31 ==== osd_op_reply(31 200.00000162 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (4169482190 0 1357295963)
0x143ee00 con 0x1408a00
2012-05-22 12:02:03.492735 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:32 200.00000163 [read
0~4194304] 1.ac437fe4) v4 -- ?+0 0x143fb40 con 0x1408a00
2012-05-22 12:02:03.528942 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:33 200.00000164 [read
0~4194304] 1.5a233b33) v4 -- ?+0 0x143f480 con 0x1408a00
2012-05-22 12:02:03.570058 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:34 200.00000165 [read
0~4194304] 1.1f8332c7) v4 -- ?+0 0x13e3900 con 0x1408a00
2012-05-22 12:02:03.570757 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 32 ==== osd_op_reply(32 200.00000163 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (2725338461 0 3594916529)
0x1d3ce00 con 0x1408a00
2012-05-22 12:02:03.589941 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 33 ==== osd_op_reply(33 200.00000164 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (988396741 0 694923933)
0x1d3cc00 con 0x1408a00
2012-05-22 12:02:03.594028 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 34 ==== osd_op_reply(34 200.00000165 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (894342962 0 1116835292)
0x1d3ca00 con 0x1408a00
2012-05-22 12:02:03.616937 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:35 200.00000166 [read
0~4194304] 1.8ce28c4a) v4 -- ?+0 0x1d43b40 con 0x1408a00
2012-05-22 12:02:03.633104 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 35 ==== osd_op_reply(35 200.00000166 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (34046407 0 554791290) 0x1d3c800
con 0x1408a00
2012-05-22 12:02:03.656933 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:36 200.00000167 [read
0~4194304] 1.7e245b6f) v4 -- ?+0 0x1d436c0 con 0x1408a00
2012-05-22 12:02:03.672729 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 36 ==== osd_op_reply(36 200.00000167 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (3542980681 0 3731006206)
0x1d3c600 con 0x1408a00
2012-05-22 12:02:03.694800 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:37 200.00000168 [read
0~4194304] 1.b1365376) v4 -- ?+0 0x1d43240 con 0x1408a00
2012-05-22 12:02:03.713056 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 37 ==== osd_op_reply(37 200.00000168 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (3143521032 0 2534577188)
0x1d3c400 con 0x1408a00
2012-05-22 12:02:03.732734 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:38 200.00000169 [read
0~4194304] 1.ff0405d0) v4 -- ?+0 0x1d43b40 con 0x1408a00
2012-05-22 12:02:03.751450 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 38 ==== osd_op_reply(38 200.00000169 [read
0~4194304] = 0) v4 ==== 111+0+4194304 (733149638 0 738384784)
0x1d3c200 con 0x1408a00
2012-05-22 12:02:03.770040 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:39 200.0000016a [read
0~4194304] 1.3f32f1a0) v4 -- ?+0 0x143f480 con 0x1408a00
2012-05-22 12:02:03.787041 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 39 ==== osd_op_reply(39 200.0000016a [read
0~4194304] = 0) v4 ==== 111+0+4194304 (79880031 0 463711875) 0x1d3c000
con 0x1408a00
2012-05-22 12:02:03.807842 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:40 200.0000016b [read
0~4194304] 1.c299cbd3) v4 -- ?+0 0x140db40 con 0x1408a00
2012-05-22 12:02:03.824629 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 40 ==== osd_op_reply(40 200.0000016b [read
0~4194304] = 0) v4 ==== 111+0+4194304 (707054799 0 1356117031)
0x1d3cc00 con 0x1408a00
2012-05-22 12:02:03.845870 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:41 200.0000016c [read
0~4194304] 1.41d6438b) v4 -- ?+0 0x1ba96c0 con 0x1408a00
2012-05-22 12:02:03.862209 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 41 ==== osd_op_reply(41 200.0000016c [read
0~4194304] = 0) v4 ==== 111+0+4194304 (78005920 0 1848623926)
0x1d3ce00 con 0x1408a00
2012-05-22 12:02:03.882715 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:42 200.0000016d [read
0~4194304] 1.6f7c9ad6) v4 -- ?+0 0x1d58d80 con 0x1408a00
2012-05-22 12:02:03.901124 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 42 ==== osd_op_reply(42 200.0000016d [read
0~4194304] = 0) v4 ==== 111+0+4194304 (3681601409 0 3284364392)
0x143ee00 con 0x1408a00
2012-05-22 12:02:03.921699 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:43 200.0000016e [read 0~149004]
1.9feb60a6) v4 -- ?+0 0x1d58900 con 0x1408a00
2012-05-22 12:02:03.923308 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 43 ==== osd_op_reply(43 200.0000016e [read
0~149004] = 0) v4 ==== 111+0+149004 (188018409 0 1584225691) 0x143ec00
con 0x1408a00
2012-05-22 12:02:04.270213 7f5f3c9ca700  1 mds.0.49 replay_done
2012-05-22 12:02:04.270248 7f5f3c9ca700  1 mds.0.49 making mds journal writeable
2012-05-22 12:02:04.270338 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:44 200.0000016e [zero
149004~4045300] 1.9feb60a6) v4 -- ?+0 0x1ea56c0 con 0x1408a00
2012-05-22 12:02:04.270415 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:45 200.0000016f [delete]
1.e6816672) v4 -- ?+0 0x1ea5480 con 0x1408a00
2012-05-22 12:02:04.270488 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:46 200.00000170 [delete]
1.3e8426cd) v4 -- ?+0 0x1ea5240 con 0x1408a00
2012-05-22 12:02:04.270569 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:47 200.00000171 [delete]
1.75986202) v4 -- ?+0 0x1ea5000 con 0x1408a00
2012-05-22 12:02:04.270641 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:48 200.00000172 [delete]
1.d3c077fe) v4 -- ?+0 0x1ea56c0 con 0x1408a00
2012-05-22 12:02:04.270796 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6801/1172 -- osd_op(mds.0.49:49 200.00000173 [delete]
1.3b1f2284) v4 -- ?+0 0x1d58900 con 0x1408a00
2012-05-22 12:02:04.270893 7f5f3c9ca700  2 mds.0.49 i am alone, moving
to state reconnect
2012-05-22 12:02:04.270899 7f5f3c9ca700  3 mds.0.49 request_state up:reconnect
2012-05-22 12:02:04.270916 7f5f3c9ca700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 2 v52012) v2 --
?+0 0x140c280 con 0x13ecdc0
2012-05-22 12:02:04.271925 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 44 ==== osd_op_reply(45 200.0000016f
[delete] ondisk = -2 (No such file or directory)) v4 ==== 111+0+0
(3525298709 0 0) 0x143ea00 con 0x1408a00
2012-05-22 12:02:04.272089 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 45 ==== osd_op_reply(46 200.00000170
[delete] ondisk = -2 (No such file or directory)) v4 ==== 111+0+0
(1763128336 0 0) 0x143e800 con 0x1408a00
2012-05-22 12:02:04.272129 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 46 ==== osd_op_reply(47 200.00000171
[delete] ondisk = -2 (No such file or directory)) v4 ==== 111+0+0
(1713780953 0 0) 0x143e600 con 0x1408a00
2012-05-22 12:02:04.272151 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 47 ==== osd_op_reply(48 200.00000172
[delete] ondisk = -2 (No such file or directory)) v4 ==== 111+0+0
(1956823652 0 0) 0x143e400 con 0x1408a00
2012-05-22 12:02:04.272206 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 48 ==== osd_op_reply(49 200.00000173
[delete] ondisk = -2 (No such file or directory)) v4 ==== 111+0+0
(1397982303 0 0) 0x143e200 con 0x1408a00
2012-05-22 12:02:04.274577 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
osd.2 10.3.0.12:6801/1172 49 ==== osd_op_reply(44 200.0000016e
[truncate 149004] ondisk = 0) v4 ==== 111+0+0 (623816276 0 0)
0x143ec00 con 0x1408a00
2012-05-22 12:02:04.358132 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 14 ==== mdsmap(e 52013) v1 ==== 535+0+0
(132891602 0 0) 0x13eae00 con 0x13ecdc0
2012-05-22 12:02:04.358253 7f5f3ecd1700  5 mds.0.49 handle_mds_map
epoch 52013 from mon.2
2012-05-22 12:02:04.358293 7f5f3ecd1700  1 mds.0.49 handle_mds_map i
am now mds.0.49
2012-05-22 12:02:04.358299 7f5f3ecd1700  1 mds.0.49 handle_mds_map
state change up:replay --> up:reconnect
2012-05-22 12:02:04.358314 7f5f3ecd1700  1 mds.0.49 reconnect_start
2012-05-22 12:02:04.358319 7f5f3ecd1700  1 mds.0.49 reopen_log
2012-05-22 12:02:04.358340 7f5f3ecd1700  1 mds.0.server
reconnect_clients -- 2 sessions
2012-05-22 12:02:04.358474 7f5f3ecd1700  5 mds.0.bal rebalance done
2012-05-22 12:02:08.271081 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 3 v52013) v2 --
?+0 0x13f7c80 con 0x13ecdc0
2012-05-22 12:02:08.272677 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 15 ==== mdsbeacon(6308/c up:reconnect seq 3
v52013) v2 ==== 103+0+0 (3383220134 0 0) 0x140c280 con 0x13ecdc0
2012-05-22 12:02:12.271359 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 4 v52013) v2 --
?+0 0x140c780 con 0x13ecdc0
2012-05-22 12:02:12.272900 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 16 ==== mdsbeacon(6308/c up:reconnect seq 4
v52013) v2 ==== 103+0+0 (2744505186 0 0) 0x140cc80 con 0x13ecdc0
2012-05-22 12:02:16.271690 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 5 v52013) v2 --
?+0 0x140c500 con 0x13ecdc0
2012-05-22 12:02:16.273021 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 17 ==== mdsbeacon(6308/c up:reconnect seq 5
v52013) v2 ==== 103+0+0 (3669798405 0 0) 0x140ca00 con 0x13ecdc0
2012-05-22 12:02:20.272061 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 6 v52013) v2 --
?+0 0x140cc80 con 0x13ecdc0
2012-05-22 12:02:20.273524 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 18 ==== mdsbeacon(6308/c up:reconnect seq 6
v52013) v2 ==== 103+0+0 (1372006828 0 0) 0x140c500 con 0x13ecdc0
2012-05-22 12:02:24.272320 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 7 v52013) v2 --
?+0 0x140c280 con 0x13ecdc0
2012-05-22 12:02:24.273966 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 19 ==== mdsbeacon(6308/c up:reconnect seq 7
v52013) v2 ==== 103+0+0 (686707915 0 0) 0x140c780 con 0x13ecdc0
2012-05-22 12:02:28.272580 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 8 v52013) v2 --
?+0 0x13f7c80 con 0x13ecdc0
2012-05-22 12:02:28.273963 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 20 ==== mdsbeacon(6308/c up:reconnect seq 8
v52013) v2 ==== 103+0+0 (2242060324 0 0) 0x140ca00 con 0x13ecdc0
2012-05-22 12:02:32.272813 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 9 v52013) v2 --
?+0 0x1d9c780 con 0x13ecdc0
2012-05-22 12:02:32.274428 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 21 ==== mdsbeacon(6308/c up:reconnect seq 9
v52013) v2 ==== 103+0+0 (4236926275 0 0) 0x1d9c780 con 0x13ecdc0
2012-05-22 12:02:36.273024 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 10 v52013) v2 --
?+0 0x1d9c500 con 0x13ecdc0
2012-05-22 12:02:36.274520 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 22 ==== mdsbeacon(6308/c up:reconnect seq 10
v52013) v2 ==== 103+0+0 (2012338922 0 0) 0x13f7c80 con 0x13ecdc0
2012-05-22 12:02:40.273243 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 11 v52013) v2 --
?+0 0x1d9c280 con 0x13ecdc0
2012-05-22 12:02:40.274602 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 23 ==== mdsbeacon(6308/c up:reconnect seq 11
v52013) v2 ==== 103+0+0 (249079693 0 0) 0x140c280 con 0x13ecdc0
2012-05-22 12:02:44.273455 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 12 v52013) v2 --
?+0 0x1d9c000 con 0x13ecdc0
2012-05-22 12:02:44.274917 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 24 ==== mdsbeacon(6308/c up:reconnect seq 12
v52013) v2 ==== 103+0+0 (1693098825 0 0) 0x140cc80 con 0x13ecdc0
2012-05-22 12:02:48.273651 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 13 v52013) v2 --
?+0 0x1d9ca00 con 0x13ecdc0
2012-05-22 12:02:48.275240 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 25 ==== mdsbeacon(6308/c up:reconnect seq 13
v52013) v2 ==== 103+0+0 (499375662 0 0) 0x1d9cc80 con 0x13ecdc0
2012-05-22 12:02:52.273831 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:reconnect seq 14 v52013) v2 --
?+0 0x1d9c780 con 0x13ecdc0
2012-05-22 12:02:52.275198 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 26 ==== mdsbeacon(6308/c up:reconnect seq 14
v52013) v2 ==== 103+0+0 (2528663943 0 0) 0x1d9c780 con 0x13ecdc0
2012-05-22 12:02:52.683463 7f5f3d3cd700  1 mds.0.server reconnect gave
up on client.4114 10.3.0.11:0/32195
2012-05-22 12:02:52.683505 7f5f3d3cd700  1 mds.0.server reconnect gave
up on client.5099 10.3.0.12:0/773
2012-05-22 12:02:52.683527 7f5f3d3cd700  1 mds.0.49 reconnect_done
2012-05-22 12:02:52.683541 7f5f3d3cd700  3 mds.0.49 request_state up:rejoin
2012-05-22 12:02:52.683570 7f5f3d3cd700  1 -- 10.3.0.12:6800/23071 -->
10.3.0.12:6789/0 -- mdsbeacon(6308/c up:rejoin seq 15 v52013) v2 --
?+0 0x140ca00 con 0x13ecdc0
2012-05-22 12:02:53.138079 7f5f3ecd1700  1 -- 10.3.0.12:6800/23071 <==
mon.2 10.3.0.12:6789/0 27 ==== mdsmap(e 52014) v1 ==== 535+0+0
(780151091 0 0) 0x13ea800 con 0x13ecdc0
2012-05-22 12:02:53.138146 7f5f3ecd1700  5 mds.0.49 handle_mds_map
epoch 52014 from mon.2
2012-05-22 12:02:53.138192 7f5f3ecd1700  1 mds.0.49 handle_mds_map i
am now mds.0.49
2012-05-22 12:02:53.138198 7f5f3ecd1700  1 mds.0.49 handle_mds_map
state change up:reconnect --> up:rejoin
2012-05-22 12:02:53.138204 7f5f3ecd1700  1 mds.0.49 rejoin_joint_start
2012-05-22 12:02:53.140710 7f5f3ecd1700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f5f3ecd1700

 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: ceph-mds() [0x814279]
 2: (()+0xeff0) [0x7f5f4373fff0]
 3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x4f) [0x6cb5ef]
 4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55d58b]
 5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29c) [0x572eec]
 6: (MDCache::rejoin_gather_finish()+0x90) [0x5931a0]
 7: (MDCache::rejoin_send_rejoins()+0x2c05) [0x59b9d5]
 8: (MDS::rejoin_joint_start()+0x131) [0x4a8721]
 9: (MDS::handle_mds_map(MMDSMap*)+0x2c4a) [0x4c253a]
 10: (MDS::handle_core_message(Message*)+0x913) [0x4c4513]
 11: (MDS::_dispatch(Message*)+0x2f) [0x4c45ef]
 12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
 13: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
 14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
 15: (()+0x68ca) [0x7f5f437378ca]
 16: (clone()+0x6d) [0x7f5f41fbf92d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- begin dump of recent events ---
  -194> 2012-05-22 12:02:02.673184 7f5f43b5d780  0 ceph version 0.46
(commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1), process ceph-mds,
pid 23071
  -193> 2012-05-22 12:02:02.673364 7f5f43b5d780  1 --
0.0.0.0:6800/23071 accepter.bind my_inst.addr is 0.0.0.0:6800/23071
need_addr=1
  -192> 2012-05-22 12:02:02.678094 7f5f43b5d780  1 --
0.0.0.0:6800/23071 messenger.start
  -191> 2012-05-22 12:02:02.678245 7f5f43b5d780  1 --
0.0.0.0:6800/23071 accepter.start
  -190> 2012-05-22 12:02:02.679672 7f5f43b5d780  1 --
0.0.0.0:6800/23071 --> 10.3.0.12:6789/0 -- auth(proto 0 26 bytes epoch
0) v1 -- ?+0 0x13ea400 con 0x13ecdc0
  -189> 2012-05-22 12:02:02.680105 7f5f43b5c700  1 --
10.3.0.12:6800/23071 learned my addr 10.3.0.12:6800/23071
  -188> 2012-05-22 12:02:02.680390 7f5f3ecd1700  0 mds.-1.0
ms_handle_connect on 10.3.0.12:6789/0
  -187> 2012-05-22 12:02:02.681402 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 1 ==== mon_map v1 ====
469+0+0 (56645397 0 0) 0x13eaa00 con 0x13ecdc0
  -186> 2012-05-22 12:02:02.681529 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 2 ====
auth_reply(proto 1 0 Success) v1 ==== 24+0+0 (3086217166 0 0)
0x13ea800 con 0x13ecdc0
  -185> 2012-05-22 12:02:02.681596 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 --
mon_subscribe({monmap=0+}) v2 -- ?+0 0x1404380 con 0x13ecdc0
  -184> 2012-05-22 12:02:02.681932 7f5f43b5d780  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c up:boot
seq 1 v0) v2 -- ?+0 0x13f7c80 con 0x13ecdc0
  -183> 2012-05-22 12:02:02.682060 7f5f43b5d780  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 --
mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x14041c0 con 0x13ecdc0
  -182> 2012-05-22 12:02:02.682109 7f5f43b5d780  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 --
mon_subscribe({mdsmap=0+,monmap=2+,osdmap=0}) v2 -- ?+0 0x14048c0 con
0x13ecdc0
  -181> 2012-05-22 12:02:02.682421 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 3 ==== mon_map v1 ====
469+0+0 (56645397 0 0) 0x1407600 con 0x13ecdc0
  -180> 2012-05-22 12:02:02.682554 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 4 ====
mon_subscribe_ack(300s) v1 ==== 20+0+0 (311377444 0 0) 0x1404380 con
0x13ecdc0
  -179> 2012-05-22 12:02:02.682978 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 5 ====
osd_map(234..234 src has 1..234) v3 ==== 19741+0+0 (3560848488 0 0)
0x1407400 con 0x13ecdc0
  -178> 2012-05-22 12:02:02.683587 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 6 ====
mon_subscribe_ack(300s) v1 ==== 20+0+0 (311377444 0 0) 0x1404a80 con
0x13ecdc0
  -177> 2012-05-22 12:02:02.683601 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 7 ==== mdsmap(e 52010)
v1 ==== 535+0+0 (2872786254 0 0) 0x1407200 con 0x13ecdc0
  -176> 2012-05-22 12:02:02.683631 7f5f3ecd1700  5 mds.-1.0
handle_mds_map epoch 52010 from mon.2
  -175> 2012-05-22 12:02:02.683818 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 8 ====
osd_map(234..234 src has 1..234) v3 ==== 19741+0+0 (3560848488 0 0)
0x1407000 con 0x13ecdc0
  -174> 2012-05-22 12:02:02.683932 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 9 ====
mon_subscribe_ack(300s) v1 ==== 20+0+0 (311377444 0 0) 0x14041c0 con
0x13ecdc0
  -173> 2012-05-22 12:02:02.768244 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 10 ==== mdsmap(e
52011) v1 ==== 733+0+0 (1289862739 0 0) 0x1407a00 con 0x13ecdc0
  -172> 2012-05-22 12:02:02.768364 7f5f3ecd1700  5 mds.-1.-1
handle_mds_map epoch 52011 from mon.2
  -171> 2012-05-22 12:02:02.768413 7f5f3ecd1700  1 mds.-1.0
handle_mds_map standby
  -170> 2012-05-22 12:02:02.933902 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 11 ==== mdsmap(e
52012) v1 ==== 535+0+0 (3794450792 0 0) 0x1407800 con 0x13ecdc0
  -169> 2012-05-22 12:02:02.933965 7f5f3ecd1700  5 mds.-1.0
handle_mds_map epoch 52012 from mon.2
  -168> 2012-05-22 12:02:02.934007 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 mark_down 10.3.0.12:6800/23046 -- pipe dne
  -167> 2012-05-22 12:02:02.934016 7f5f3ecd1700  1 mds.0.49
handle_mds_map i am now mds.0.49
  -166> 2012-05-22 12:02:02.934030 7f5f3ecd1700  1 mds.0.49
handle_mds_map state change up:standby --> up:replay
  -165> 2012-05-22 12:02:02.934040 7f5f3ecd1700  1 mds.0.49 replay_start
  -164> 2012-05-22 12:02:02.934052 7f5f3ecd1700  1 mds.0.49  recovery set is
  -163> 2012-05-22 12:02:02.934055 7f5f3ecd1700  1 mds.0.49  need
osdmap epoch 235, have 234
  -162> 2012-05-22 12:02:02.934058 7f5f3ecd1700  1 mds.0.49  waiting
for osdmap 235 (which blacklists prior instance)
  -161> 2012-05-22 12:02:02.934076 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 --
mon_subscribe({mdsmap=52013+,monmap=2+,osdmap=235}) v2 -- ?+0
0x14041c0 con 0x13ecdc0
  -160> 2012-05-22 12:02:02.934154 7f5f3ecd1700  1 mds.0.cache
handle_mds_failure mds.0 : recovery peers are
  -159> 2012-05-22 12:02:02.934184 7f5f3ecd1700  5 mds.0.migrator
handle_mds_failure_or_stop mds.0
  -158> 2012-05-22 12:02:02.992970 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 12 ====
osd_map(235..235 src has 1..235) v3 ==== 300+0+0 (2376656173 0 0)
0x1407600 con 0x13ecdc0
  -157> 2012-05-22 12:02:02.993145 7f5f3ecd1700  2 mds.0.49 boot_start
1: opening inotable
  -156> 2012-05-22 12:02:02.993411 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:1
mds0_inotable [read 0~0] 1.b852b893) v4 -- ?+0 0x13e36c0 con 0x1408a00
  -155> 2012-05-22 12:02:02.993471 7f5f3ecd1700  2 mds.0.49 boot_start
1: opening sessionmap
  -154> 2012-05-22 12:02:02.993553 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:2
mds0_sessionmap [read 0~0] 1.3270c60b) v4 -- ?+0 0x13e3d80 con
0x1408a00
  -153> 2012-05-22 12:02:02.993577 7f5f3ecd1700  2 mds.0.49 boot_start
1: opening anchor table
  -152> 2012-05-22 12:02:02.993625 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:3
mds_anchortable [read 0~0] 1.a977f6a7) v4 -- ?+0 0x13e3900 con
0x1408a00
  -151> 2012-05-22 12:02:02.993634 7f5f3ecd1700  2 mds.0.49 boot_start
1: opening snap table
  -150> 2012-05-22 12:02:02.993671 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:4
mds_snaptable [read 0~0] 1.d90270ad) v4 -- ?+0 0x140d6c0 con 0x1408a00
  -149> 2012-05-22 12:02:02.993680 7f5f3ecd1700  2 mds.0.49 boot_start
1: opening mds log
  -148> 2012-05-22 12:02:02.993700 7f5f3ecd1700  5 mds.0.log open
discovering log bounds
  -147> 2012-05-22 12:02:02.993792 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:5
200.00000000 [read 0~0] 1.844f3494) v4 -- ?+0 0x140d480 con 0x1408a00
  -146> 2012-05-22 12:02:02.993826 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 13 ====
mon_subscribe_ack(300s) v1 ==== 20+0+0 (311377444 0 0) 0x140b380 con
0x13ecdc0
  -145> 2012-05-22 12:02:02.994043 7f5f3ecd1700  0 mds.0.49
ms_handle_connect on 10.3.0.12:6801/1172
  -144> 2012-05-22 12:02:02.995288 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 1 ====
osd_op_reply(1 mds0_inotable [read 0~29] = 0) v4 ==== 112+0+29
(3509020669 0 2080707671) 0x1407400 con 0x1408a00
  -143> 2012-05-22 12:02:02.995518 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 2 ====
osd_op_reply(2 mds0_sessionmap [read 0~10591] = 0) v4 ==== 114+0+10591
(734241002 0 1640895633) 0x1407000 con 0x1408a00
  -142> 2012-05-22 12:02:02.995836 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 3 ====
osd_op_reply(3 mds_anchortable [read 0~29] = 0) v4 ==== 114+0+29
(1868654419 0 218791405) 0x1407200 con 0x1408a00
  -141> 2012-05-22 12:02:02.995890 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 4 ====
osd_op_reply(4 mds_snaptable [read 0~41] = 0) v4 ==== 112+0+41
(3594810684 0 2312406677) 0x1407e00 con 0x1408a00
  -140> 2012-05-22 12:02:02.995940 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 5 ====
osd_op_reply(5 200.00000000 [read 0~84] = 0) v4 ==== 111+0+84
(3837001893 0 1130699602) 0x1407c00 con 0x1408a00
  -139> 2012-05-22 12:02:02.996032 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:6
200.0000016d [stat 0~0] 1.6f7c9ad6 rwordered) v4 -- ?+0 0x140d480 con
0x1408a00
  -138> 2012-05-22 12:02:02.996107 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:7
200.0000016e [stat 0~0] 1.9feb60a6 rwordered) v4 -- ?+0 0x140d6c0 con
0x1408a00
  -137> 2012-05-22 12:02:02.996608 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 6 ====
osd_op_reply(6 200.0000016d [stat 0~0] = 0) v4 ==== 111+0+16
(1485433033 0 624451111) 0x1407a00 con 0x1408a00
  -136> 2012-05-22 12:02:02.997170 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 7 ====
osd_op_reply(7 200.0000016e [stat 0~0] = 0) v4 ==== 111+0+16
(2013887546 0 3894377183) 0x1407000 con 0x1408a00
  -135> 2012-05-22 12:02:02.997202 7f5f3ecd1700  2 mds.0.49 boot_start
2: loading/discovering base inodes
  -134> 2012-05-22 12:02:02.997211 7f5f3ecd1700  0 mds.0.cache
creating system inode with ino:100
  -133> 2012-05-22 12:02:02.999723 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:8
100.00000000 [getxattr inode] 1.c5265ab3) v4 -- ?+0 0x13e36c0 con
0x1408a00
  -132> 2012-05-22 12:02:02.999788 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:9
100.00000000.inode [read 0~0] 1.85dde07f) v4 -- ?+0 0x13e3900 con
0x1408a00
  -131> 2012-05-22 12:02:02.999804 7f5f3ecd1700  0 mds.0.cache
creating system inode with ino:1
  -130> 2012-05-22 12:02:02.999874 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:10
1.00000000 [getxattr inode] 1.6b2cdaff) v4 -- ?+0 0x140db40 con
0x1408a00
  -129> 2012-05-22 12:02:03.000142 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:11
1.00000000.inode [read 0~0] 1.232c0e14) v4 -- ?+0 0x140d900 con
0x1408a00
  -128> 2012-05-22 12:02:03.000936 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 8 ====
osd_op_reply(9 100.00000000.inode [read 0~399] = 0) v4 ==== 117+0+399
(2159939521 0 4121830280) 0x1407400 con 0x1408a00
  -127> 2012-05-22 12:02:03.001468 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 9 ====
osd_op_reply(8 100.00000000 [getxattr] = -61 (No data available)) v4
==== 111+0+0 (4127029859 0 0) 0x1407600 con 0x1408a00
  -126> 2012-05-22 12:02:03.001669 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 10 ====
osd_op_reply(10 1.00000000 [getxattr] = -61 (No data available)) v4
==== 109+0+0 (1727840390 0 0) 0x1407800 con 0x1408a00
  -125> 2012-05-22 12:02:03.001742 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 11 ====
osd_op_reply(11 1.00000000.inode [read 0~428] = 0) v4 ==== 115+0+428
(3364705558 0 715602140) 0x1407200 con 0x1408a00
  -124> 2012-05-22 12:02:03.001809 7f5f3ecd1700  2 mds.0.49 boot_start
3: replaying mds log
  -123> 2012-05-22 12:02:03.002198 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:12
200.0000014f [read 2535~4191769] 1.b79a8c0b) v4 -- ?+0 0x140d480 con
0x1408a00
  -122> 2012-05-22 12:02:03.002290 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:13
200.00000150 [read 0~4194304] 1.23163b1d) v4 -- ?+0 0x13e36c0 con
0x1408a00
  -121> 2012-05-22 12:02:03.004043 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:14
200.00000151 [read 0~4194304] 1.28b76431) v4 -- ?+0 0x140d6c0 con
0x1408a00
  -120> 2012-05-22 12:02:03.004205 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:15
200.00000152 [read 0~4194304] 1.ae9bb683) v4 -- ?+0 0x140d480 con
0x1408a00
  -119> 2012-05-22 12:02:03.004380 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:16
200.00000153 [read 0~4194304] 1.d9966785) v4 -- ?+0 0x140d900 con
0x1408a00
  -118> 2012-05-22 12:02:03.004454 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:17
200.00000154 [read 0~4194304] 1.47f9bf13) v4 -- ?+0 0x140db40 con
0x1408a00
  -117> 2012-05-22 12:02:03.004520 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:18
200.00000155 [read 0~4194304] 1.460264c4) v4 -- ?+0 0x13e3900 con
0x1408a00
  -116> 2012-05-22 12:02:03.004582 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:19
200.00000156 [read 0~4194304] 1.cae377ec) v4 -- ?+0 0x13e36c0 con
0x1408a00
  -115> 2012-05-22 12:02:03.004650 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:20
200.00000157 [read 0~4194304] 1.1c534f6b) v4 -- ?+0 0x140d6c0 con
0x1408a00
  -114> 2012-05-22 12:02:03.004740 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:21
200.00000158 [read 0~4194304] 1.21f6fd52) v4 -- ?+0 0x143f900 con
0x1408a00
  -113> 2012-05-22 12:02:03.031626 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 12 ====
osd_op_reply(12 200.0000014f [read 2535~4191769] = 0) v4 ====
111+0+4191769 (86938826 0 940254350) 0x1407600 con 0x1408a00
  -112> 2012-05-22 12:02:03.049765 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 13 ====
osd_op_reply(13 200.00000150 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2540809359 0 2765354474) 0x1407400 con 0x1408a00
  -111> 2012-05-22 12:02:03.061435 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 14 ====
osd_op_reply(14 200.00000151 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2370055404 0 728673944) 0x1407000 con 0x1408a00
  -110> 2012-05-22 12:02:03.073952 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 15 ====
osd_op_reply(15 200.00000152 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2835608127 0 4176800115) 0x1407a00 con 0x1408a00
  -109> 2012-05-22 12:02:03.086138 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:22
200.00000159 [read 0~4194304] 1.11c4d2c4) v4 -- ?+0 0x143f480 con
0x1408a00
  -108> 2012-05-22 12:02:03.087461 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 16 ====
osd_op_reply(16 200.00000153 [read 0~4194304] = 0) v4 ====
111+0+4194304 (414535972 0 2827888072) 0x1407c00 con 0x1408a00
  -107> 2012-05-22 12:02:03.119482 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 17 ====
osd_op_reply(17 200.00000154 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2490448686 0 1942553121) 0x1407e00 con 0x1408a00
  -106> 2012-05-22 12:02:03.119562 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 18 ====
osd_op_reply(18 200.00000155 [read 0~4194304] = 0) v4 ====
111+0+4194304 (1664826329 0 2280396596) 0x143ee00 con 0x1408a00
  -105> 2012-05-22 12:02:03.131469 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 19 ====
osd_op_reply(19 200.00000156 [read 0~4194304] = 0) v4 ====
111+0+4194304 (324288200 0 2843054375) 0x143ec00 con 0x1408a00
  -104> 2012-05-22 12:02:03.139410 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:23
200.0000015a [read 0~4194304] 1.d46a5844) v4 -- ?+0 0x143f000 con
0x1408a00
  -103> 2012-05-22 12:02:03.145585 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 20 ====
osd_op_reply(20 200.00000157 [read 0~4194304] = 0) v4 ====
111+0+4194304 (542705527 0 967023793) 0x143ea00 con 0x1408a00
  -102> 2012-05-22 12:02:03.156508 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 21 ====
osd_op_reply(21 200.00000158 [read 0~4194304] = 0) v4 ====
111+0+4194304 (3296968726 0 3798599763) 0x143e800 con 0x1408a00
  -101> 2012-05-22 12:02:03.172296 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 22 ====
osd_op_reply(22 200.00000159 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2210915791 0 2737758750) 0x143e600 con 0x1408a00
  -100> 2012-05-22 12:02:03.182754 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:24
200.0000015b [read 0~4194304] 1.a8a12f71) v4 -- ?+0 0x140d480 con
0x1408a00
   -99> 2012-05-22 12:02:03.185018 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 23 ====
osd_op_reply(23 200.0000015a [read 0~4194304] = 0) v4 ====
111+0+4194304 (859847618 0 611094159) 0x143e400 con 0x1408a00
   -98> 2012-05-22 12:02:03.205868 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 24 ====
osd_op_reply(24 200.0000015b [read 0~4194304] = 0) v4 ====
111+0+4194304 (541055975 0 524786588) 0x143e200 con 0x1408a00
   -97> 2012-05-22 12:02:03.225090 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:25
200.0000015c [read 0~4194304] 1.941114d8) v4 -- ?+0 0x13e3900 con
0x1408a00
   -96> 2012-05-22 12:02:03.242516 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 25 ====
osd_op_reply(25 200.0000015c [read 0~4194304] = 0) v4 ====
111+0+4194304 (1141303529 0 1957418634) 0x143e200 con 0x1408a00
   -95> 2012-05-22 12:02:03.263094 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:26
200.0000015d [read 0~4194304] 1.bdfe9959) v4 -- ?+0 0x143fb40 con
0x1408a00
   -94> 2012-05-22 12:02:03.278274 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 26 ====
osd_op_reply(26 200.0000015d [read 0~4194304] = 0) v4 ====
111+0+4194304 (3503869406 0 773787611) 0x143e400 con 0x1408a00
   -93> 2012-05-22 12:02:03.300327 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:27
200.0000015e [read 0~4194304] 1.b8cf7266) v4 -- ?+0 0x140db40 con
0x1408a00
   -92> 2012-05-22 12:02:03.316669 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 27 ====
osd_op_reply(27 200.0000015e [read 0~4194304] = 0) v4 ====
111+0+4194304 (1073368869 0 2950585796) 0x143e600 con 0x1408a00
   -91> 2012-05-22 12:02:03.336096 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:28
200.0000015f [read 0~4194304] 1.2980b721) v4 -- ?+0 0x1ba9b40 con
0x1408a00
   -90> 2012-05-22 12:02:03.352131 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 28 ====
osd_op_reply(28 200.0000015f [read 0~4194304] = 0) v4 ====
111+0+4194304 (3513547997 0 1851987047) 0x143e800 con 0x1408a00
   -89> 2012-05-22 12:02:03.371923 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:29
200.00000160 [read 0~4194304] 1.574ae4ae) v4 -- ?+0 0x1ba96c0 con
0x1408a00
   -88> 2012-05-22 12:02:03.388290 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 29 ====
osd_op_reply(29 200.00000160 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2670366445 0 4080047592) 0x143ea00 con 0x1408a00
   -87> 2012-05-22 12:02:03.410233 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:30
200.00000161 [read 0~4194304] 1.87287887) v4 -- ?+0 0x1ba9240 con
0x1408a00
   -86> 2012-05-22 12:02:03.426379 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 30 ====
osd_op_reply(30 200.00000161 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2306991298 0 2632960265) 0x143ec00 con 0x1408a00
   -85> 2012-05-22 12:02:03.450460 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:31
200.00000162 [read 0~4194304] 1.f23e2d31) v4 -- ?+0 0x140db40 con
0x1408a00
   -84> 2012-05-22 12:02:03.466463 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 31 ====
osd_op_reply(31 200.00000162 [read 0~4194304] = 0) v4 ====
111+0+4194304 (4169482190 0 1357295963) 0x143ee00 con 0x1408a00
   -83> 2012-05-22 12:02:03.492735 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:32
200.00000163 [read 0~4194304] 1.ac437fe4) v4 -- ?+0 0x143fb40 con
0x1408a00
   -82> 2012-05-22 12:02:03.528942 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:33
200.00000164 [read 0~4194304] 1.5a233b33) v4 -- ?+0 0x143f480 con
0x1408a00
   -81> 2012-05-22 12:02:03.570058 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:34
200.00000165 [read 0~4194304] 1.1f8332c7) v4 -- ?+0 0x13e3900 con
0x1408a00
   -80> 2012-05-22 12:02:03.570757 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 32 ====
osd_op_reply(32 200.00000163 [read 0~4194304] = 0) v4 ====
111+0+4194304 (2725338461 0 3594916529) 0x1d3ce00 con 0x1408a00
   -79> 2012-05-22 12:02:03.589941 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 33 ====
osd_op_reply(33 200.00000164 [read 0~4194304] = 0) v4 ====
111+0+4194304 (988396741 0 694923933) 0x1d3cc00 con 0x1408a00
   -78> 2012-05-22 12:02:03.594028 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 34 ====
osd_op_reply(34 200.00000165 [read 0~4194304] = 0) v4 ====
111+0+4194304 (894342962 0 1116835292) 0x1d3ca00 con 0x1408a00
   -77> 2012-05-22 12:02:03.616937 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:35
200.00000166 [read 0~4194304] 1.8ce28c4a) v4 -- ?+0 0x1d43b40 con
0x1408a00
   -76> 2012-05-22 12:02:03.633104 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 35 ====
osd_op_reply(35 200.00000166 [read 0~4194304] = 0) v4 ====
111+0+4194304 (34046407 0 554791290) 0x1d3c800 con 0x1408a00
   -75> 2012-05-22 12:02:03.656933 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:36
200.00000167 [read 0~4194304] 1.7e245b6f) v4 -- ?+0 0x1d436c0 con
0x1408a00
   -74> 2012-05-22 12:02:03.672729 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 36 ====
osd_op_reply(36 200.00000167 [read 0~4194304] = 0) v4 ====
111+0+4194304 (3542980681 0 3731006206) 0x1d3c600 con 0x1408a00
   -73> 2012-05-22 12:02:03.694800 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:37
200.00000168 [read 0~4194304] 1.b1365376) v4 -- ?+0 0x1d43240 con
0x1408a00
   -72> 2012-05-22 12:02:03.713056 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 37 ====
osd_op_reply(37 200.00000168 [read 0~4194304] = 0) v4 ====
111+0+4194304 (3143521032 0 2534577188) 0x1d3c400 con 0x1408a00
   -71> 2012-05-22 12:02:03.732734 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:38
200.00000169 [read 0~4194304] 1.ff0405d0) v4 -- ?+0 0x1d43b40 con
0x1408a00
   -70> 2012-05-22 12:02:03.751450 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 38 ====
osd_op_reply(38 200.00000169 [read 0~4194304] = 0) v4 ====
111+0+4194304 (733149638 0 738384784) 0x1d3c200 con 0x1408a00
   -69> 2012-05-22 12:02:03.770040 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:39
200.0000016a [read 0~4194304] 1.3f32f1a0) v4 -- ?+0 0x143f480 con
0x1408a00
   -68> 2012-05-22 12:02:03.787041 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 39 ====
osd_op_reply(39 200.0000016a [read 0~4194304] = 0) v4 ====
111+0+4194304 (79880031 0 463711875) 0x1d3c000 con 0x1408a00
   -67> 2012-05-22 12:02:03.807842 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:40
200.0000016b [read 0~4194304] 1.c299cbd3) v4 -- ?+0 0x140db40 con
0x1408a00
   -66> 2012-05-22 12:02:03.824629 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 40 ====
osd_op_reply(40 200.0000016b [read 0~4194304] = 0) v4 ====
111+0+4194304 (707054799 0 1356117031) 0x1d3cc00 con 0x1408a00
   -65> 2012-05-22 12:02:03.845870 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:41
200.0000016c [read 0~4194304] 1.41d6438b) v4 -- ?+0 0x1ba96c0 con
0x1408a00
   -64> 2012-05-22 12:02:03.862209 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 41 ====
osd_op_reply(41 200.0000016c [read 0~4194304] = 0) v4 ====
111+0+4194304 (78005920 0 1848623926) 0x1d3ce00 con 0x1408a00
   -63> 2012-05-22 12:02:03.882715 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:42
200.0000016d [read 0~4194304] 1.6f7c9ad6) v4 -- ?+0 0x1d58d80 con
0x1408a00
   -62> 2012-05-22 12:02:03.901124 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 42 ====
osd_op_reply(42 200.0000016d [read 0~4194304] = 0) v4 ====
111+0+4194304 (3681601409 0 3284364392) 0x143ee00 con 0x1408a00
   -61> 2012-05-22 12:02:03.921699 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:43
200.0000016e [read 0~149004] 1.9feb60a6) v4 -- ?+0 0x1d58900 con
0x1408a00
   -60> 2012-05-22 12:02:03.923308 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 43 ====
osd_op_reply(43 200.0000016e [read 0~149004] = 0) v4 ==== 111+0+149004
(188018409 0 1584225691) 0x143ec00 con 0x1408a00
   -59> 2012-05-22 12:02:04.270213 7f5f3c9ca700  1 mds.0.49 replay_done
   -58> 2012-05-22 12:02:04.270248 7f5f3c9ca700  1 mds.0.49 making mds
journal writeable
   -57> 2012-05-22 12:02:04.270338 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:44
200.0000016e [zero 149004~4045300] 1.9feb60a6) v4 -- ?+0 0x1ea56c0 con
0x1408a00
   -56> 2012-05-22 12:02:04.270415 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:45
200.0000016f [delete] 1.e6816672) v4 -- ?+0 0x1ea5480 con 0x1408a00
   -55> 2012-05-22 12:02:04.270488 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:46
200.00000170 [delete] 1.3e8426cd) v4 -- ?+0 0x1ea5240 con 0x1408a00
   -54> 2012-05-22 12:02:04.270569 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:47
200.00000171 [delete] 1.75986202) v4 -- ?+0 0x1ea5000 con 0x1408a00
   -53> 2012-05-22 12:02:04.270641 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:48
200.00000172 [delete] 1.d3c077fe) v4 -- ?+0 0x1ea56c0 con 0x1408a00
   -52> 2012-05-22 12:02:04.270796 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6801/1172 -- osd_op(mds.0.49:49
200.00000173 [delete] 1.3b1f2284) v4 -- ?+0 0x1d58900 con 0x1408a00
   -51> 2012-05-22 12:02:04.270893 7f5f3c9ca700  2 mds.0.49 i am
alone, moving to state reconnect
   -50> 2012-05-22 12:02:04.270899 7f5f3c9ca700  3 mds.0.49
request_state up:reconnect
   -49> 2012-05-22 12:02:04.270916 7f5f3c9ca700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 2 v52012) v2 -- ?+0 0x140c280 con 0x13ecdc0
   -48> 2012-05-22 12:02:04.271925 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 44 ====
osd_op_reply(45 200.0000016f [delete] ondisk = -2 (No such file or
directory)) v4 ==== 111+0+0 (3525298709 0 0) 0x143ea00 con 0x1408a00
   -47> 2012-05-22 12:02:04.272089 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 45 ====
osd_op_reply(46 200.00000170 [delete] ondisk = -2 (No such file or
directory)) v4 ==== 111+0+0 (1763128336 0 0) 0x143e800 con 0x1408a00
   -46> 2012-05-22 12:02:04.272129 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 46 ====
osd_op_reply(47 200.00000171 [delete] ondisk = -2 (No such file or
directory)) v4 ==== 111+0+0 (1713780953 0 0) 0x143e600 con 0x1408a00
   -45> 2012-05-22 12:02:04.272151 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 47 ====
osd_op_reply(48 200.00000172 [delete] ondisk = -2 (No such file or
directory)) v4 ==== 111+0+0 (1956823652 0 0) 0x143e400 con 0x1408a00
   -44> 2012-05-22 12:02:04.272206 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 48 ====
osd_op_reply(49 200.00000173 [delete] ondisk = -2 (No such file or
directory)) v4 ==== 111+0+0 (1397982303 0 0) 0x143e200 con 0x1408a00
   -43> 2012-05-22 12:02:04.274577 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== osd.2 10.3.0.12:6801/1172 49 ====
osd_op_reply(44 200.0000016e [truncate 149004] ondisk = 0) v4 ====
111+0+0 (623816276 0 0) 0x143ec00 con 0x1408a00
   -42> 2012-05-22 12:02:04.358132 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 14 ==== mdsmap(e
52013) v1 ==== 535+0+0 (132891602 0 0) 0x13eae00 con 0x13ecdc0
   -41> 2012-05-22 12:02:04.358253 7f5f3ecd1700  5 mds.0.49
handle_mds_map epoch 52013 from mon.2
   -40> 2012-05-22 12:02:04.358293 7f5f3ecd1700  1 mds.0.49
handle_mds_map i am now mds.0.49
   -39> 2012-05-22 12:02:04.358299 7f5f3ecd1700  1 mds.0.49
handle_mds_map state change up:replay --> up:reconnect
   -38> 2012-05-22 12:02:04.358314 7f5f3ecd1700  1 mds.0.49 reconnect_start
   -37> 2012-05-22 12:02:04.358319 7f5f3ecd1700  1 mds.0.49 reopen_log
   -36> 2012-05-22 12:02:04.358340 7f5f3ecd1700  1 mds.0.server
reconnect_clients -- 2 sessions
   -35> 2012-05-22 12:02:04.358474 7f5f3ecd1700  5 mds.0.bal rebalance done
   -34> 2012-05-22 12:02:08.271081 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 3 v52013) v2 -- ?+0 0x13f7c80 con 0x13ecdc0
   -33> 2012-05-22 12:02:08.272677 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 15 ====
mdsbeacon(6308/c up:reconnect seq 3 v52013) v2 ==== 103+0+0
(3383220134 0 0) 0x140c280 con 0x13ecdc0
   -32> 2012-05-22 12:02:12.271359 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 4 v52013) v2 -- ?+0 0x140c780 con 0x13ecdc0
   -31> 2012-05-22 12:02:12.272900 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 16 ====
mdsbeacon(6308/c up:reconnect seq 4 v52013) v2 ==== 103+0+0
(2744505186 0 0) 0x140cc80 con 0x13ecdc0
   -30> 2012-05-22 12:02:16.271690 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 5 v52013) v2 -- ?+0 0x140c500 con 0x13ecdc0
   -29> 2012-05-22 12:02:16.273021 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 17 ====
mdsbeacon(6308/c up:reconnect seq 5 v52013) v2 ==== 103+0+0
(3669798405 0 0) 0x140ca00 con 0x13ecdc0
   -28> 2012-05-22 12:02:20.272061 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 6 v52013) v2 -- ?+0 0x140cc80 con 0x13ecdc0
   -27> 2012-05-22 12:02:20.273524 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 18 ====
mdsbeacon(6308/c up:reconnect seq 6 v52013) v2 ==== 103+0+0
(1372006828 0 0) 0x140c500 con 0x13ecdc0
   -26> 2012-05-22 12:02:24.272320 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 7 v52013) v2 -- ?+0 0x140c280 con 0x13ecdc0
   -25> 2012-05-22 12:02:24.273966 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 19 ====
mdsbeacon(6308/c up:reconnect seq 7 v52013) v2 ==== 103+0+0 (686707915
0 0) 0x140c780 con 0x13ecdc0
   -24> 2012-05-22 12:02:28.272580 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 8 v52013) v2 -- ?+0 0x13f7c80 con 0x13ecdc0
   -23> 2012-05-22 12:02:28.273963 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 20 ====
mdsbeacon(6308/c up:reconnect seq 8 v52013) v2 ==== 103+0+0
(2242060324 0 0) 0x140ca00 con 0x13ecdc0
   -22> 2012-05-22 12:02:32.272813 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 9 v52013) v2 -- ?+0 0x1d9c780 con 0x13ecdc0
   -21> 2012-05-22 12:02:32.274428 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 21 ====
mdsbeacon(6308/c up:reconnect seq 9 v52013) v2 ==== 103+0+0
(4236926275 0 0) 0x1d9c780 con 0x13ecdc0
   -20> 2012-05-22 12:02:36.273024 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 10 v52013) v2 -- ?+0 0x1d9c500 con 0x13ecdc0
   -19> 2012-05-22 12:02:36.274520 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 22 ====
mdsbeacon(6308/c up:reconnect seq 10 v52013) v2 ==== 103+0+0
(2012338922 0 0) 0x13f7c80 con 0x13ecdc0
   -18> 2012-05-22 12:02:40.273243 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 11 v52013) v2 -- ?+0 0x1d9c280 con 0x13ecdc0
   -17> 2012-05-22 12:02:40.274602 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 23 ====
mdsbeacon(6308/c up:reconnect seq 11 v52013) v2 ==== 103+0+0
(249079693 0 0) 0x140c280 con 0x13ecdc0
   -16> 2012-05-22 12:02:44.273455 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 12 v52013) v2 -- ?+0 0x1d9c000 con 0x13ecdc0
   -15> 2012-05-22 12:02:44.274917 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 24 ====
mdsbeacon(6308/c up:reconnect seq 12 v52013) v2 ==== 103+0+0
(1693098825 0 0) 0x140cc80 con 0x13ecdc0
   -14> 2012-05-22 12:02:48.273651 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 13 v52013) v2 -- ?+0 0x1d9ca00 con 0x13ecdc0
   -13> 2012-05-22 12:02:48.275240 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 25 ====
mdsbeacon(6308/c up:reconnect seq 13 v52013) v2 ==== 103+0+0
(499375662 0 0) 0x1d9cc80 con 0x13ecdc0
   -12> 2012-05-22 12:02:52.273831 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:reconnect seq 14 v52013) v2 -- ?+0 0x1d9c780 con 0x13ecdc0
   -11> 2012-05-22 12:02:52.275198 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 26 ====
mdsbeacon(6308/c up:reconnect seq 14 v52013) v2 ==== 103+0+0
(2528663943 0 0) 0x1d9c780 con 0x13ecdc0
   -10> 2012-05-22 12:02:52.683463 7f5f3d3cd700  1 mds.0.server
reconnect gave up on client.4114 10.3.0.11:0/32195
    -9> 2012-05-22 12:02:52.683505 7f5f3d3cd700  1 mds.0.server
reconnect gave up on client.5099 10.3.0.12:0/773
    -8> 2012-05-22 12:02:52.683527 7f5f3d3cd700  1 mds.0.49 reconnect_done
    -7> 2012-05-22 12:02:52.683541 7f5f3d3cd700  3 mds.0.49
request_state up:rejoin
    -6> 2012-05-22 12:02:52.683570 7f5f3d3cd700  1 --
10.3.0.12:6800/23071 --> 10.3.0.12:6789/0 -- mdsbeacon(6308/c
up:rejoin seq 15 v52013) v2 -- ?+0 0x140ca00 con 0x13ecdc0
    -5> 2012-05-22 12:02:53.138079 7f5f3ecd1700  1 --
10.3.0.12:6800/23071 <== mon.2 10.3.0.12:6789/0 27 ==== mdsmap(e
52014) v1 ==== 535+0+0 (780151091 0 0) 0x13ea800 con 0x13ecdc0
    -4> 2012-05-22 12:02:53.138146 7f5f3ecd1700  5 mds.0.49
handle_mds_map epoch 52014 from mon.2
    -3> 2012-05-22 12:02:53.138192 7f5f3ecd1700  1 mds.0.49
handle_mds_map i am now mds.0.49
    -2> 2012-05-22 12:02:53.138198 7f5f3ecd1700  1 mds.0.49
handle_mds_map state change up:reconnect --> up:rejoin
    -1> 2012-05-22 12:02:53.138204 7f5f3ecd1700  1 mds.0.49 rejoin_joint_start
     0> 2012-05-22 12:02:53.140710 7f5f3ecd1700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f5f3ecd1700

 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: ceph-mds() [0x814279]
 2: (()+0xeff0) [0x7f5f4373fff0]
 3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x4f) [0x6cb5ef]
 4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55d58b]
 5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29c) [0x572eec]
 6: (MDCache::rejoin_gather_finish()+0x90) [0x5931a0]
 7: (MDCache::rejoin_send_rejoins()+0x2c05) [0x59b9d5]
 8: (MDS::rejoin_joint_start()+0x131) [0x4a8721]
 9: (MDS::handle_mds_map(MMDSMap*)+0x2c4a) [0x4c253a]
 10: (MDS::handle_core_message(Message*)+0x913) [0x4c4513]
 11: (MDS::_dispatch(Message*)+0x2f) [0x4c45ef]
 12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
 13: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
 14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
 15: (()+0x68ca) [0x7f5f437378ca]
 16: (clone()+0x6d) [0x7f5f41fbf92d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- end dump of recent events ---

snip

I am not quite sure on how to get you the coredump infos. I installed
all ceph-dbg packages and executed:

gdb /usr/bin/ceph-mds core

snip

GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-mds...Reading symbols from
/usr/lib/debug/usr/bin/ceph-mds...done.
(no debugging symbols found)...done.
[New Thread 22980]
[New Thread 22984]
[New Thread 22986]
[New Thread 22979]
[New Thread 22970]
[New Thread 22981]
[New Thread 22971]
[New Thread 22976]
[New Thread 22973]
[New Thread 22975]
[New Thread 22974]
[New Thread 22972]
[New Thread 22978]
[New Thread 22982]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libcrypto++.so.8
Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libtcmalloc.so.0
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libunwind.so.7
Core was generated by `/usr/bin/ceph-mds -i c --pid-file
/var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f10c00d2ebb in raise () from /lib/libpthread.so.0

snip



2012/5/21 Gregory Farnum <greg@inktank.com>:
> On Mon, May 21, 2012 at 5:38 AM, Felix Feinhals
> <ff@turtle-entertainment.de> wrote:
>> Hi Josh,
>>
>> i quoted the trace and some other stats in my first email, maybe it
>> got stuck in the spam filters.
>> Well next try:
>>
>> snip
>>
>> -3> 2012-05-10 14:52:29.509940 7fb1c9351700 1 mds.0.40 handle_mds_map
>>  i am now mds.0.40
>>  -2> 2012-05-10 14:52:29.509956 7fb1c9351700 1 mds.0.40 handle_mds_map
>>  state change up:reconnect --> up:rejoin
>>  -1> 2012-05-10 14:52:29.509963 7fb1c9351700 1 mds.0.40 rejoin_joint_start
>>  0> 2012-05-10 14:52:29.512503 7fb1c9351700 -1 *** Caught signal
>>  (Segmentation fault) **
>>  in thread 7fb1c9351700
>>
>> ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
>>  1: ceph-mds() [0x814279]
>>  2: (()+0xeff0) [0x7fb1cddbfff0]
>>  3: (SnapRealm::have_past_parents_open(snapid_t, snapid_t)+0x4f) [0x6cb5ef]
>>  4: (MDCache::check_realm_past_parents(SnapRealm*)+0x2b) [0x55d58b]
>>  5: (MDCache::choose_lock_states_and_reconnect_caps()+0x29c) [0x572eec]
>>  6: (MDCache::rejoin_gather_finish()+0x90) [0x5931a0]
>>  7: (MDCache::rejoin_send_rejoins()+0x2c05) [0x59b9d5]
>>  8: (MDS::rejoin_joint_start()+0x131) [0x4a8721]
>>  9: (MDS::handle_mds_map(MMDSMap*)+0x2c4a) [0x4c253a]
>>  10: (MDS::handle_core_message(Message*)+0x913) [0x4c4513]
>>  11: (MDS::_dispatch(Message*)+0x2f) [0x4c45ef]
>>  12: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
>>  13: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
>>  14: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
>>  15: (()+0x68ca) [0x7fb1cddb78ca]
>>  16: (clone()+0x6d) [0x7fb1cc63f92d]
>
> There's nothing obvious here — can you run gdb on the core and get
> another backtrace and the info from levels 3-5?
>
>> snip
>>
>> I though ceph chooses which MDS is active and which is standby, i just
>> have 3 in the cluster config:
> Yes, it does — if you don't increase the number of allowed MDSes
> you'll just get one of them active.
> -Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-22 10:12         ` Felix Feinhals
@ 2012-05-22 17:56           ` Greg Farnum
  2012-05-23 12:28             ` Felix Feinhals
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Farnum @ 2012-05-22 17:56 UTC (permalink / raw)
  To: Felix Feinhals; +Cc: ceph-devel



On Tuesday, May 22, 2012 at 3:12 AM, Felix Feinhals wrote:

> I am not quite sure on how to get you the coredump infos. I installed
> all ceph-dbg packages and executed:
> 
> gdb /usr/bin/ceph-mds core
> 
> snip
> 
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/ceph-mds...Reading symbols from
> /usr/lib/debug/usr/bin/ceph-mds...done.
> (no debugging symbols found)...done.
> [New Thread 22980]
> [New Thread 22984]
> [New Thread 22986]
> [New Thread 22979]
> [New Thread 22970]
> [New Thread 22981]
> [New Thread 22971]
> [New Thread 22976]
> [New Thread 22973]
> [New Thread 22975]
> [New Thread 22974]
> [New Thread 22972]
> [New Thread 22978]
> [New Thread 22982]
> 
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libcrypto++.so.8
> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libuuid.so.1
> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libtcmalloc.so.0
> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.6
> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libunwind.so.7
> Core was generated by `/usr/bin/ceph-mds -i c --pid-file
> /var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007f10c00d2ebb in raise () from /lib/libpthread.so.0
> 

Argh. This is finicky and annoying; don't feel bad. :) There are two possibilities here:
1) If I remember correctly, PATH and the actual debug symbol install locations often don't match up. Check out where the debug packages actually installed to, and make sure that directory is in PATH when running gdb.
2) The default thread you're getting a backtrace on doesn't look to be the one we actually care about (notice how the backtrace is through completely different parts of the code); it's conceivable that there just aren't any debug symbols for those libraries. Try running "thread apply all bt" (I think that's the right command) and looking for one that matches the backtrace in the log file. Then switch to it ("thread x" where x is the thread number) and get the backtrace of that.
-Greg


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-22 17:56           ` Greg Farnum
@ 2012-05-23 12:28             ` Felix Feinhals
  2012-05-23 21:39               ` Gregory Farnum
  0 siblings, 1 reply; 10+ messages in thread
From: Felix Feinhals @ 2012-05-23 12:28 UTC (permalink / raw)
  To: Greg Farnum; +Cc: ceph-devel

Hey,

ok i installed libc-dbg and run your commands now this comes up:

gdb /usr/bin/ceph-mds core

snip

GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-mds...Reading symbols from
/usr/lib/debug/usr/bin/ceph-mds...done.
(no debugging symbols found)...done.
[New Thread 22980]
[New Thread 22984]
[New Thread 22986]
[New Thread 22979]
[New Thread 22970]
[New Thread 22981]
[New Thread 22971]
[New Thread 22976]
[New Thread 22973]
[New Thread 22975]
[New Thread 22974]
[New Thread 22972]
[New Thread 22978]
[New Thread 22982]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libpthread.so.0...Reading symbols from
/usr/lib/debug/lib/libpthread-2.11.3.so...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libcrypto++.so.8
Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/librt.so.1...Reading symbols from
/usr/lib/debug/lib/librt-2.11.3.so...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libtcmalloc.so.0
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...Reading symbols from
/usr/lib/debug/lib/libm-2.11.3.so...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...Reading symbols from
/usr/lib/debug/lib/libc-2.11.3.so...done.
(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
from /usr/lib/debug/lib/ld-2.11.3.so...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libunwind.so.7
Core was generated by `/usr/bin/ceph-mds -i c --pid-file
/var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
41      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c

snip

Now

thread apply all bt

...

thread 1
[Switching to thread 1 (Thread 22977)]#0  0x00007f10c00d2ebb in raise
(sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c


Thread 1 (Thread 22977):
---Type <return> to continue, or q <return> to quit---
#0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1  0x000000000081469e in reraise_fatal (signum=11) at
global/signal_handler.cc:58
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  <signal handler called>
#4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
at mds/snap.cc:112

#5  0x000000000055d58b in MDCache::check_realm_past_parents
(this=0x2b49200, realm=0x0) at mds/MDCache.cc:4495
#6  0x0000000000572eec in
MDCache::choose_lock_states_and_reconnect_caps (this=0x2b49200) at
mds/MDCache.cc:4533
#7  0x00000000005931a0 in MDCache::rejoin_gather_finish
(this=0x2b49200) at mds/MDCache.cc:4444
#8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
(this=0x2b49200) at mds/MDCache.cc:3388
#9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x2b5e000) at
mds/MDS.cc:1404
#10 0x00000000004c253a in MDS::handle_mds_map (this=0x2b5e000,
m=<value optimized out>) at mds/MDS.cc:968
#11 0x00000000004c4513 in MDS::handle_core_message (this=0x2b5e000,
m=0x2b4d800) at mds/MDS.cc:1651
#12 0x00000000004c45ef in MDS::_dispatch (this=0x2b5e000, m=0x2b4d800)
at mds/MDS.cc:1790
#13 0x00000000004c628b in MDS::ms_dispatch (this=0x2b5e000,
m=0x2b4d800) at mds/MDS.cc:1602
#14 0x00000000007acb49 in Messenger::ms_deliver_dispatch
(this=0x2b41680) at msg/Messenger.h:178
#15 SimpleMessenger::dispatch_entry (this=0x2b41680) at
msg/SimpleMessenger.cc:363
#16 0x00000000007336ed in SimpleMessenger::DispatchThread::entry() ()
#17 0x00007f10c00ca8ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#18 0x00007f10be95292d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()

So i wonder is the crash because of the missing file message?

2012/5/22 Greg Farnum <greg@inktank.com>:
>
>
> On Tuesday, May 22, 2012 at 3:12 AM, Felix Feinhals wrote:
>
>> I am not quite sure on how to get you the coredump infos. I installed
>> all ceph-dbg packages and executed:
>>
>> gdb /usr/bin/ceph-mds core
>>
>> snip
>>
>> GNU gdb (GDB) 7.0.1-debian
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/bin/ceph-mds...Reading symbols from
>> /usr/lib/debug/usr/bin/ceph-mds...done.
>> (no debugging symbols found)...done.
>> [New Thread 22980]
>> [New Thread 22984]
>> [New Thread 22986]
>> [New Thread 22979]
>> [New Thread 22970]
>> [New Thread 22981]
>> [New Thread 22971]
>> [New Thread 22976]
>> [New Thread 22973]
>> [New Thread 22975]
>> [New Thread 22974]
>> [New Thread 22972]
>> [New Thread 22978]
>> [New Thread 22982]
>>
>> warning: Can't read pathname for load map: Input/output error.
>> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libpthread.so.0
>> Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libcrypto++.so.8
>> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libuuid.so.1
>> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/librt.so.1
>> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libtcmalloc.so.0
>> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libstdc++.so.6
>> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libm.so.6
>> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libgcc_s.so.1
>> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libc.so.6
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libunwind.so.7
>> Core was generated by `/usr/bin/ceph-mds -i c --pid-file
>> /var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x00007f10c00d2ebb in raise () from /lib/libpthread.so.0
>>
>
> Argh. This is finicky and annoying; don't feel bad. :) There are two possibilities here:
> 1) If I remember correctly, PATH and the actual debug symbol install locations often don't match up. Check out where the debug packages actually installed to, and make sure that directory is in PATH when running gdb.
> 2) The default thread you're getting a backtrace on doesn't look to be the one we actually care about (notice how the backtrace is through completely different parts of the code); it's conceivable that there just aren't any debug symbols for those libraries. Try running "thread apply all bt" (I think that's the right command) and looking for one that matches the backtrace in the log file. Then switch to it ("thread x" where x is the thread number) and get the backtrace of that.
> -Greg
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-23 12:28             ` Felix Feinhals
@ 2012-05-23 21:39               ` Gregory Farnum
  2012-05-24 12:29                 ` Felix Feinhals
  0 siblings, 1 reply; 10+ messages in thread
From: Gregory Farnum @ 2012-05-23 21:39 UTC (permalink / raw)
  To: Felix Feinhals; +Cc: ceph-devel

On Wed, May 23, 2012 at 5:28 AM, Felix Feinhals
<ff@turtle-entertainment.de> wrote:
> Hey,
>
> ok i installed libc-dbg and run your commands now this comes up:
>
> gdb /usr/bin/ceph-mds core
>
> snip
>
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/ceph-mds...Reading symbols from
> /usr/lib/debug/usr/bin/ceph-mds...done.
> (no debugging symbols found)...done.
> [New Thread 22980]
> [New Thread 22984]
> [New Thread 22986]
> [New Thread 22979]
> [New Thread 22970]
> [New Thread 22981]
> [New Thread 22971]
> [New Thread 22976]
> [New Thread 22973]
> [New Thread 22975]
> [New Thread 22974]
> [New Thread 22972]
> [New Thread 22978]
> [New Thread 22982]
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib/libpthread.so.0...Reading symbols from
> /usr/lib/debug/lib/libpthread-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libcrypto++.so.8
> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libuuid.so.1
> Reading symbols from /lib/librt.so.1...Reading symbols from
> /usr/lib/debug/lib/librt-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libtcmalloc.so.0
> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.6
> Reading symbols from /lib/libm.so.6...Reading symbols from
> /usr/lib/debug/lib/libm-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/libc.so.6...Reading symbols from
> /usr/lib/debug/lib/libc-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
> from /usr/lib/debug/lib/ld-2.11.3.so...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libunwind.so.7
> Core was generated by `/usr/bin/ceph-mds -i c --pid-file
> /var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> 41      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
>        in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>
> snip
>
> Now
>
> thread apply all bt
>
> ...
>
> thread 1
> [Switching to thread 1 (Thread 22977)]#0  0x00007f10c00d2ebb in raise
> (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> 41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>
>
> Thread 1 (Thread 22977):
> ---Type <return> to continue, or q <return> to quit---
> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> #1  0x000000000081469e in reraise_fatal (signum=11) at
> global/signal_handler.cc:58
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
> #3  <signal handler called>
> #4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
> at mds/snap.cc:112
>
> #5  0x000000000055d58b in MDCache::check_realm_past_parents
> (this=0x2b49200, realm=0x0) at mds/MDCache.cc:4495
> #6  0x0000000000572eec in
> MDCache::choose_lock_states_and_reconnect_caps (this=0x2b49200) at
> mds/MDCache.cc:4533
> #7  0x00000000005931a0 in MDCache::rejoin_gather_finish
> (this=0x2b49200) at mds/MDCache.cc:4444
> #8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
> (this=0x2b49200) at mds/MDCache.cc:3388
> #9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x2b5e000) at
> mds/MDS.cc:1404
> #10 0x00000000004c253a in MDS::handle_mds_map (this=0x2b5e000,
> m=<value optimized out>) at mds/MDS.cc:968
> #11 0x00000000004c4513 in MDS::handle_core_message (this=0x2b5e000,
> m=0x2b4d800) at mds/MDS.cc:1651
> #12 0x00000000004c45ef in MDS::_dispatch (this=0x2b5e000, m=0x2b4d800)
> at mds/MDS.cc:1790
> #13 0x00000000004c628b in MDS::ms_dispatch (this=0x2b5e000,
> m=0x2b4d800) at mds/MDS.cc:1602
> #14 0x00000000007acb49 in Messenger::ms_deliver_dispatch
> (this=0x2b41680) at msg/Messenger.h:178
> #15 SimpleMessenger::dispatch_entry (this=0x2b41680) at
> msg/SimpleMessenger.cc:363
> #16 0x00000000007336ed in SimpleMessenger::DispatchThread::entry() ()
> #17 0x00007f10c00ca8ca in start_thread (arg=<value optimized out>) at
> pthread_create.c:300
> #18 0x00007f10be95292d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #19 0x0000000000000000 in ?? ()
>
> So i wonder is the crash because of the missing file message?

Okay, that is what I wanted. It looks like it can't find the
snaprealm, and I have a pretty good guess why.
If you're building your own binaries, you can apply the patch below
and I bet things will work. (Let me know if they do or don't!)
-Greg


diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
index 70faeb8..becccf5 100644
--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -2130,7 +2130,7 @@ SnapRealm *CInode::find_snaprealm()
   while (!cur->snaprealm) {
     if (cur->get_parent_dn())
       cur = cur->get_parent_dn()->get_dir()->get_inode();
-    else if (get_projected_parent_dn())
+    else if (cur->get_projected_parent_dn())
       cur = cur->get_projected_parent_dn()->get_dir()->get_inode();
     else
       break;
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-23 21:39               ` Gregory Farnum
@ 2012-05-24 12:29                 ` Felix Feinhals
  2012-06-04 19:49                   ` Greg Farnum
  0 siblings, 1 reply; 10+ messages in thread
From: Felix Feinhals @ 2012-05-24 12:29 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel

Hi,

i was using the Debian Packages, but i tried now from source.
I used the same version from GIT
(cb7f1c9c7520848b0899b26440ac34a8acea58d1) and compiled it. Same crash
report.
Then i applied your patch but again the same crash, i think the
backtrace is also the same:

 (gdb) thread 1
[Switching to thread 1 (Thread 9564)]#0  0x00007f33a3e58ebb in raise
(sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
(gdb) backtrace
#0  0x00007f33a3e58ebb in raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
#1  0x000000000081423e in reraise_fatal (signum=11) at
global/signal_handler.cc:58
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
#3  <signal handler called>
#4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
at mds/snap.cc:112
#5  0x000000000055d58b in MDCache::check_realm_past_parents
(this=0x27a7200, realm=0x0)
    at mds/MDCache.cc:4495
#6  0x0000000000572eec in
MDCache::choose_lock_states_and_reconnect_caps (this=0x27a7200)
    at mds/MDCache.cc:4533
#7  0x00000000005931a0 in MDCache::rejoin_gather_finish
(this=0x27a7200) at mds/MDCache.cc:4444
#8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
(this=0x27a7200) at mds/MDCache.cc:3388
#9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x27bc000) at
mds/MDS.cc:1404
#10 0x00000000004c253a in MDS::handle_mds_map (this=0x27bc000,
m=<value optimized out>)
    at mds/MDS.cc:968
#11 0x00000000004c4513 in MDS::handle_core_message (this=0x27bc000,
m=0x27ab800) at mds/MDS.cc:1651
#12 0x00000000004c45ef in MDS::_dispatch (this=0x27bc000, m=0x27ab800)
at mds/MDS.cc:1790
#13 0x00000000004c628b in MDS::ms_dispatch (this=0x27bc000,
m=0x27ab800) at mds/MDS.cc:1602
#14 0x0000000000732609 in Messenger::ms_deliver_dispatch
(this=0x279f680) at msg/Messenger.h:178
#15 SimpleMessenger::dispatch_entry (this=0x279f680) at
msg/SimpleMessenger.cc:363
#16 0x00000000007207ad in SimpleMessenger::DispatchThread::entry() ()
#17 0x00007f33a3e508ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#18 0x00007f33a26d892d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()

Any more ideas? :)
Or can i get you more debugging output?



2012/5/23 Gregory Farnum <greg@inktank.com>:
> On Wed, May 23, 2012 at 5:28 AM, Felix Feinhals
> <ff@turtle-entertainment.de> wrote:
>> Hey,
>>
>> ok i installed libc-dbg and run your commands now this comes up:
>>
>> gdb /usr/bin/ceph-mds core
>>
>> snip
>>
>> GNU gdb (GDB) 7.0.1-debian
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/bin/ceph-mds...Reading symbols from
>> /usr/lib/debug/usr/bin/ceph-mds...done.
>> (no debugging symbols found)...done.
>> [New Thread 22980]
>> [New Thread 22984]
>> [New Thread 22986]
>> [New Thread 22979]
>> [New Thread 22970]
>> [New Thread 22981]
>> [New Thread 22971]
>> [New Thread 22976]
>> [New Thread 22973]
>> [New Thread 22975]
>> [New Thread 22974]
>> [New Thread 22972]
>> [New Thread 22978]
>> [New Thread 22982]
>>
>> warning: Can't read pathname for load map: Input/output error.
>> Reading symbols from /lib/libpthread.so.0...Reading symbols from
>> /usr/lib/debug/lib/libpthread-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/libpthread.so.0
>> Reading symbols from /usr/lib/libcrypto++.so.8...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libcrypto++.so.8
>> Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libuuid.so.1
>> Reading symbols from /lib/librt.so.1...Reading symbols from
>> /usr/lib/debug/lib/librt-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/librt.so.1
>> Reading symbols from /usr/lib/libtcmalloc.so.0...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libtcmalloc.so.0
>> Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libstdc++.so.6
>> Reading symbols from /lib/libm.so.6...Reading symbols from
>> /usr/lib/debug/lib/libm-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/libm.so.6
>> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libgcc_s.so.1
>> Reading symbols from /lib/libc.so.6...Reading symbols from
>> /usr/lib/debug/lib/libc-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib/libc.so.6
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
>> from /usr/lib/debug/lib/ld-2.11.3.so...done.
>> (no debugging symbols found)...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Reading symbols from /usr/lib/libunwind.so.7...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libunwind.so.7
>> Core was generated by `/usr/bin/ceph-mds -i c --pid-file
>> /var/run/ceph/mds.c.pid -c /etc/ceph/ceph.con'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
>> 41      ../nptl/sysdeps/unix/sysv/linux/pt-raise.c: No such file or directory.
>>        in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>>
>> snip
>>
>> Now
>>
>> thread apply all bt
>>
>> ...
>>
>> thread 1
>> [Switching to thread 1 (Thread 22977)]#0  0x00007f10c00d2ebb in raise
>> (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
>> 41      in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
>>
>>
>> Thread 1 (Thread 22977):
>> ---Type <return> to continue, or q <return> to quit---
>> #0  0x00007f10c00d2ebb in raise (sig=<value optimized out>) at
>> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
>> #1  0x000000000081469e in reraise_fatal (signum=11) at
>> global/signal_handler.cc:58
>> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:104
>> #3  <signal handler called>
>> #4  SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
>> at mds/snap.cc:112
>>
>> #5  0x000000000055d58b in MDCache::check_realm_past_parents
>> (this=0x2b49200, realm=0x0) at mds/MDCache.cc:4495
>> #6  0x0000000000572eec in
>> MDCache::choose_lock_states_and_reconnect_caps (this=0x2b49200) at
>> mds/MDCache.cc:4533
>> #7  0x00000000005931a0 in MDCache::rejoin_gather_finish
>> (this=0x2b49200) at mds/MDCache.cc:4444
>> #8  0x000000000059b9d5 in MDCache::rejoin_send_rejoins
>> (this=0x2b49200) at mds/MDCache.cc:3388
>> #9  0x00000000004a8721 in MDS::rejoin_joint_start (this=0x2b5e000) at
>> mds/MDS.cc:1404
>> #10 0x00000000004c253a in MDS::handle_mds_map (this=0x2b5e000,
>> m=<value optimized out>) at mds/MDS.cc:968
>> #11 0x00000000004c4513 in MDS::handle_core_message (this=0x2b5e000,
>> m=0x2b4d800) at mds/MDS.cc:1651
>> #12 0x00000000004c45ef in MDS::_dispatch (this=0x2b5e000, m=0x2b4d800)
>> at mds/MDS.cc:1790
>> #13 0x00000000004c628b in MDS::ms_dispatch (this=0x2b5e000,
>> m=0x2b4d800) at mds/MDS.cc:1602
>> #14 0x00000000007acb49 in Messenger::ms_deliver_dispatch
>> (this=0x2b41680) at msg/Messenger.h:178
>> #15 SimpleMessenger::dispatch_entry (this=0x2b41680) at
>> msg/SimpleMessenger.cc:363
>> #16 0x00000000007336ed in SimpleMessenger::DispatchThread::entry() ()
>> #17 0x00007f10c00ca8ca in start_thread (arg=<value optimized out>) at
>> pthread_create.c:300
>> #18 0x00007f10be95292d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #19 0x0000000000000000 in ?? ()
>>
>> So i wonder is the crash because of the missing file message?
>
> Okay, that is what I wanted. It looks like it can't find the
> snaprealm, and I have a pretty good guess why.
> If you're building your own binaries, you can apply the patch below
> and I bet things will work. (Let me know if they do or don't!)
> -Greg
>
>
> diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
> index 70faeb8..becccf5 100644
> --- a/src/mds/CInode.cc
> +++ b/src/mds/CInode.cc
> @@ -2130,7 +2130,7 @@ SnapRealm *CInode::find_snaprealm()
>   while (!cur->snaprealm) {
>     if (cur->get_parent_dn())
>       cur = cur->get_parent_dn()->get_dir()->get_inode();
> -    else if (get_projected_parent_dn())
> +    else if (cur->get_projected_parent_dn())
>       cur = cur->get_projected_parent_dn()->get_dir()->get_inode();
>     else
>       break;
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: MDS crash, wont startup again
  2012-05-24 12:29                 ` Felix Feinhals
@ 2012-06-04 19:49                   ` Greg Farnum
  0 siblings, 0 replies; 10+ messages in thread
From: Greg Farnum @ 2012-06-04 19:49 UTC (permalink / raw)
  To: Felix Feinhals; +Cc: ceph-devel

On Thursday, May 24, 2012 at 5:29 AM, Felix Feinhals wrote:
> Hi,
>  
> i was using the Debian Packages, but i tried now from source.
> I used the same version from GIT
> (cb7f1c9c7520848b0899b26440ac34a8acea58d1) and compiled it. Same crash
> report.
> Then i applied your patch but again the same crash, i think the
> backtrace is also the same:
>  
> (gdb) thread 1
> [Switching to thread 1 (Thread 9564)]#0 0x00007f33a3e58ebb in raise
> (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> 41 in ../nptl/sysdeps/unix/sysv/linux/pt-raise.c
> (gdb) backtrace
> #0 0x00007f33a3e58ebb in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:41
> #1 0x000000000081423e in reraise_fatal (signum=11) at
> global/signal_handler.cc:58 (http://signal_handler.cc:58)
> #2 handle_fatal_signal (signum=11) at global/signal_handler.cc:104 (http://signal_handler.cc:104)
> #3 <signal handler called>
> #4 SnapRealm::have_past_parents_open (this=0x0, first=..., last=...)
> at mds/snap.cc:112 (http://snap.cc:112)
> #5 0x000000000055d58b in MDCache::check_realm_past_parents
> (this=0x27a7200, realm=0x0)
> at mds/MDCache.cc:4495 (http://MDCache.cc:4495)
> #6 0x0000000000572eec in
> MDCache::choose_lock_states_and_reconnect_caps (this=0x27a7200)
> at mds/MDCache.cc:4533 (http://MDCache.cc:4533)
> #7 0x00000000005931a0 in MDCache::rejoin_gather_finish
> (this=0x27a7200) at mds/MDCache.cc:4444 (http://MDCache.cc:4444)
> #8 0x000000000059b9d5 in MDCache::rejoin_send_rejoins
> (this=0x27a7200) at mds/MDCache.cc:3388 (http://MDCache.cc:3388)
> #9 0x00000000004a8721 in MDS::rejoin_joint_start (this=0x27bc000) at
> mds/MDS.cc:1404 (http://MDS.cc:1404)
> #10 0x00000000004c253a in MDS::handle_mds_map (this=0x27bc000,
> m=<value optimized out>)
> at mds/MDS.cc:968 (http://MDS.cc:968)
> #11 0x00000000004c4513 in MDS::handle_core_message (this=0x27bc000,
> m=0x27ab800) at mds/MDS.cc:1651 (http://MDS.cc:1651)
> #12 0x00000000004c45ef in MDS::_dispatch (this=0x27bc000, m=0x27ab800)
> at mds/MDS.cc:1790 (http://MDS.cc:1790)
> #13 0x00000000004c628b in MDS::ms_dispatch (this=0x27bc000,
> m=0x27ab800) at mds/MDS.cc:1602 (http://MDS.cc:1602)
> #14 0x0000000000732609 in Messenger::ms_deliver_dispatch
> (this=0x279f680) at msg/Messenger.h:178
> #15 SimpleMessenger::dispatch_entry (this=0x279f680) at
> msg/SimpleMessenger.cc:363 (http://SimpleMessenger.cc:363)
> #16 0x00000000007207ad in SimpleMessenger::DispatchThread::entry() ()
> #17 0x00007f33a3e508ca in start_thread (arg=<value optimized out>) at
> pthread_create.c:300
> #18 0x00007f33a26d892d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #19 0x0000000000000000 in ?? ()
>  
> Any more ideas? :)
> Or can i get you more debugging output?

Sorry for the delay — I'm afraid that's a hazard of using the MDS before we're ready to support it. :(
Anyway, I haven't had a lot of time to look into this, but that makes it look like there's an actual problem, where one of the inodes can't find the "SnapRealm" which it lives in. Things that will make this easier to diagnose (in the event that somebody gets the time) include generating high-level debug logs and placing them somewhere accessible (start up the MDS with "debug mds = 20" added to the config file); if you want you could also try the below patch (which will cause the MDS to dump its full inode cache upon triggering this bug) and we can see if there's anything really obvious.
(This is a fine thing to make bug reports on at tracker.newdream.net, btw — and that allows attachments of things like log files.)
-Greg

diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index 143faca..6aa5923 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -4527,6 +4527,11 @@ void MDCache::choose_lock_states_and_reconnect_caps()
dout(15) << " chose lock states on " << *in << dendl;

SnapRealm *realm = in->find_snaprealm();
+ if (!realm) {
+ dout(0) << "serious error, could not find snaprealm for in " << *in
+ << ", triggering cache dump" << dendl;
+ dump_cache();
+ }

check_realm_past_parents(realm);



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-06-04 19:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CA+WUqKxZoh_5k5qgtUFwnsr2=gaZ4_RONazYQG7enKa75WhNVQ@mail.gmail.com>
2012-05-16  8:11 ` MDS crash, wont startup again Felix Feinhals
2012-05-17 21:38   ` Josh Durgin
2012-05-21 12:38     ` Felix Feinhals
2012-05-21 18:44       ` Gregory Farnum
2012-05-22 10:12         ` Felix Feinhals
2012-05-22 17:56           ` Greg Farnum
2012-05-23 12:28             ` Felix Feinhals
2012-05-23 21:39               ` Gregory Farnum
2012-05-24 12:29                 ` Felix Feinhals
2012-06-04 19:49                   ` Greg Farnum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.