* Problem after ceph-osd crash
From: Oliver Francke @ 2012-02-20 16:59 UTC
To: ceph-devel
Hi,
we are in trouble right now after a messy attempt to add a new
OSD node to our cluster.
We see a strange message on the console:
"libceph: corrupt inc osdmap epoch 880 off 102
(ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
The whole cluster is currently in a state like this:
2012-02-20 17:56:27.585295 pg v942504: 2046 pgs: 1348 active+clean, 43
active+recovering+degraded+remapped+backfill, 218 active+recovering, 437
active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059
GB / 29794 GB avail; 272914/1349073 degraded (20.230%)
and the ceph-osd on node0 crashes from time to time. At the time of
writing, the degraded ratio keeps shrinking and is now below 20%.
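As a sanity check on the status line above, here is a minimal Python sketch that splits it into per-state PG counts and recomputes the degraded percentage. The function name `parse_pg_status` and the regular expressions are illustrative assumptions based only on the one line quoted in this message; they are not part of any Ceph tool, and other Ceph versions may format the line differently.

```python
import re

# The status line quoted above, without its timestamp prefix.
STATUS = (
    "pg v942504: 2046 pgs: 1348 active+clean, "
    "43 active+recovering+degraded+remapped+backfill, 218 active+recovering, "
    "437 active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, "
    "26059 GB / 29794 GB avail; 272914/1349073 degraded (20.230%)"
)


def parse_pg_status(line):
    """Return (total_pgs, {state: count}, degraded_objects, total_objects).

    Hypothetical helper: it assumes the exact layout of the line above.
    """
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # The per-state counts sit between "pgs:" and the first semicolon.
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for item in states_part.split(","):
        count, state = item.split()
        states[state] = int(count)
    degraded, objects = map(
        int, re.search(r"(\d+)/(\d+) degraded", line).groups()
    )
    return total, states, degraded, objects


total, states, degraded, objects = parse_pg_status(STATUS)
degraded_pct = 100.0 * degraded / objects

print(f"PGs accounted for: {sum(states.values())} of {total}")
print(f"active+clean: {states['active+clean']} of {total}")
print(f"degraded objects: {degraded_pct:.3f}%")
```

The four per-state counts add up to the 2046 PGs reported, and the recomputed ratio agrees with the 20.230% shown in the status line.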
Any clues?
Thanks in advance,
Oliver.
--
Oliver Francke
filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh
Managing Directors: S.Grewing | J.Rehpöhler | C.Kunz
Follow us on Twitter: http://twitter.com/filoogmbh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Problem after ceph-osd crash
From: Sage Weil @ 2012-02-20 17:41 UTC
To: Oliver Francke; +Cc: ceph-devel
On Mon, 20 Feb 2012, Oliver Francke wrote:
> and the ceph-osd on node0 crashes from time to time. At the time of writing,
> the degraded ratio keeps shrinking and is now below 20%.
How did ceph-osd crash? Is there a dump in the log?
sage
* Re: Problem after ceph-osd crash
From: Oliver Francke @ 2012-02-20 17:49 UTC
To: Sage Weil; +Cc: ceph-devel
Hi Sage,
On 02/20/2012 06:41 PM, Sage Weil wrote:
> How did ceph-osd crash? Is there a dump in the log?
Of course I will provide all the logs, just a bit later: we are busy
starting all the VMs and handling the first customer tickets right now ;-)
To make the collection complete, would you be so kind as to list the
logs you need (kern.log, osdX.log, etc.)?
Thanks for the fast response,
Oliver.
* Re: Problem after ceph-osd crash
From: Sage Weil @ 2012-02-20 18:05 UTC
To: Oliver Francke; +Cc: ceph-devel
On Mon, 20 Feb 2012, Oliver Francke wrote:
> On 02/20/2012 06:41 PM, Sage Weil wrote:
> > On Mon, 20 Feb 2012, Oliver Francke wrote:
> > > We get some weird "libceph: corrupt inc osdmap epoch 880 off 102
> > > (ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
I just retested the kernel client against the new server code and I don't
see this. If you can pull the osdmap/880 file from the monitor data
directory (soon, please, the monitor will delete it once things fully
recover and move on) I can see what the data looks like.
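One way to preserve that file before the monitor trims it is to copy it out of the monitor data directory right away. The Python sketch below assumes only what is stated above, namely that the map lives at `osdmap/880` beneath the monitor data directory. The helper name `save_osdmap_epoch`, the destination file name, and the paths in the usage comment are hypothetical placeholders to adapt to the actual setup.

```python
import shutil
from pathlib import Path


def save_osdmap_epoch(mon_data_dir, epoch, dest_dir):
    """Copy <mon_data_dir>/osdmap/<epoch> into dest_dir and return the copy's path.

    Hypothetical helper: mon_data_dir is whatever directory the monitor
    stores its data in on this cluster; it is not guessed here.
    """
    src = Path(mon_data_dir) / "osdmap" / str(epoch)
    if not src.is_file():
        # The monitor may already have deleted the epoch after recovery.
        raise FileNotFoundError(f"{src} not found")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"osdmap.{epoch}"
    shutil.copy2(src, target)  # copy2 also preserves the file timestamps
    return target


# Example call (both paths are placeholders, not real defaults):
# save_osdmap_epoch("/path/to/mon/data", 880, "/path/to/safe/place")
```

Copying rather than moving leaves the monitor's own data untouched, which matters while the cluster is still recovering.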
> To make the collection complete, would you be so kind as to list the
> logs you need (kern.log, osdX.log, etc.)?
I think just the crashed osd log will be enough. It looks like the rest
of the cluster is recovering ok...
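To find the relevant part of a large OSD log quickly, something like the following Python sketch can extract the lines around a crash. The marker strings are an assumption about what a ceph-osd crash dump typically contains (a failed assertion or a caught signal); they should be adjusted to whatever the actual log shows. The function name and the window sizes are likewise hypothetical.

```python
# Assumed crash markers; adjust them to match the actual log contents.
CRASH_MARKERS = ("FAILED assert", "*** Caught signal")


def find_crash_context(lines, markers=CRASH_MARKERS, before=5, after=30):
    """Return a list of excerpts (lists of lines) around each crash marker.

    Markers that fall inside an excerpt already collected are skipped, so
    one crash that prints several markers yields a single excerpt.
    """
    lines = list(lines)
    excerpts = []
    next_free = 0  # index of the first line not covered by an excerpt yet
    for i, line in enumerate(lines):
        if i < next_free:
            continue
        if any(marker in line for marker in markers):
            end = min(len(lines), i + after + 1)
            excerpts.append(lines[max(0, i - before):end])
            next_free = end
    return excerpts


# Example use (the log path is a placeholder):
# with open("/path/to/osd.log", errors="replace") as log:
#     for excerpt in find_crash_context(log):
#         print("".join(excerpt))
```

Sending only these excerpts keeps the report small, although the full log of the crashed OSD remains the safer thing to attach.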
Are the VMs running on top of the kernel rbd client, or KVM+librbd?
sage