Problem after ceph-osd crash


* Problem after ceph-osd crash
From: Oliver Francke @ 2012-02-20 16:59 UTC
  To: ceph-devel

Hi,

We ran into trouble after a messy attempt to add a new OSD node to our
cluster.

We are getting this strange message on the console:

"libceph: corrupt inc osdmap epoch 880 off 102
(ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
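
One way to read that console message is to treat the three hex values as the current decode position followed by the start and end of the buffer holding the incremental osdmap. That reading is an assumption, not something stated in this thread, but the arithmetic supports it. A minimal sketch (the function name parse_corrupt_inc_osdmap is hypothetical):

```python
import re

# Assumption: the hex values are the decode position, then the buffer
# start and end addresses. The thread itself does not spell this out.
PATTERN = re.compile(
    r"corrupt inc osdmap epoch (?P<epoch>\d+) off (?P<off>\d+) "
    r"\((?P<pos>[0-9a-f]+) of (?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)\)"
)


def parse_corrupt_inc_osdmap(line: str) -> dict:
    """Extract the epoch, offsets, and buffer length from the message."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'corrupt inc osdmap' message")
    pos = int(match["pos"], 16)
    start = int(match["start"], 16)
    end = int(match["end"], 16)
    return {
        "epoch": int(match["epoch"]),
        "reported_offset": int(match["off"]),
        "computed_offset": pos - start,  # compare with the reported offset
        "buffer_length": end - start,
    }


message = (
    "libceph: corrupt inc osdmap epoch 880 off 102 "
    "(ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
)
info = parse_corrupt_inc_osdmap(message)
print(info)
```

Under that reading, the computed offset matches the reported 102, and decoding stopped 102 bytes into a 1290-byte incremental map for epoch 880.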
The whole system is currently in a state like this:

2012-02-20 17:56:27.585295    pg v942504: 2046 pgs: 1348 active+clean, 43
active+recovering+degraded+remapped+backfill, 218 active+recovering, 437 
active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059 
GB / 29794 GB avail; 272914/1349073 degraded (20.230%)

and the ceph-osd on node0 crashes from time to time. At the time of
writing, the degraded percentage is still shrinking and has dropped below 20%.
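
The pg status line above can be sanity-checked with a few lines of Python: the per-state counts should add up to the pg total, and the degraded ratio should reproduce the printed percentage. This is only a sketch against the one line quoted in this message; the function name summarize_pg_status and the regular expressions are assumptions about the line layout, not part of any Ceph tool.

```python
import re

# The status line from this message, without the leading timestamp.
STATUS = (
    "pg v942504: 2046 pgs: 1348 active+clean, 43 "
    "active+recovering+degraded+remapped+backfill, 218 active+recovering, 437 "
    "active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059 "
    "GB / 29794 GB avail; 272914/1349073 degraded (20.230%)"
)


def summarize_pg_status(line: str) -> dict:
    """Pull the pg total, per-state counts, and degraded ratio from the line."""
    total = int(re.search(r"(\d+) pgs:", line)[1])
    # The state list sits between "pgs:" and the first ";".
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {
        name: int(count)
        for count, name in re.findall(r"(\d+) ([a-z+]+)", states_part)
    }
    degraded, objects = map(
        int, re.search(r"(\d+)/(\d+) degraded", line).groups()
    )
    return {
        "total_pgs": total,
        "states": states,
        "states_sum": sum(states.values()),
        "degraded_pct": round(100 * degraded / objects, 3),
    }


summary = summarize_pg_status(STATUS)
print(summary)
```

For this line the four state counts sum to 2046, matching the pg total, and 272914/1349073 rounds to the 20.230% that the status line reports.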

Any clues?

Thanks in advance,

Oliver.

-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Problem after ceph-osd crash
From: Sage Weil @ 2012-02-20 17:41 UTC
  To: Oliver Francke; +Cc: ceph-devel

On Mon, 20 Feb 2012, Oliver Francke wrote:
> and the ceph-osd on node0 crashes from time to time. At the time of writing,
> the degraded percentage is still shrinking and has dropped below 20%.

How did ceph-osd crash?  Is there a dump in the log?

sage


* Re: Problem after ceph-osd crash
From: Oliver Francke @ 2012-02-20 17:49 UTC
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

On 02/20/2012 06:41 PM, Sage Weil wrote:
> How did ceph-osd crash?  Is there a dump in the log?

Of course I will provide all the logs, but a bit later: right now we are
busy starting all the VMs and handling the first customer tickets ;-)

So that I collect everything you need, would you be so kind as to list
the required files (kern.log, osdX.log, etc.)?

Thanks for the quick response,

Oliver.


* Re: Problem after ceph-osd crash
From: Sage Weil @ 2012-02-20 18:05 UTC
  To: Oliver Francke; +Cc: ceph-devel

On Mon, 20 Feb 2012, Oliver Francke wrote:
> Hi Sage,
> 
> On 02/20/2012 06:41 PM, Sage Weil wrote:
> > On Mon, 20 Feb 2012, Oliver Francke wrote:
> > > We get some weird "libceph: corrupt inc osdmap epoch 880 off 102
> > > (ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"

I just retested the kernel client against the new server code and I don't 
see this.  If you can pull the osdmap/880 file from the monitor data 
directory (soon, please, the monitor will delete it once things fully 
recover and move on) I can see what the data looks like.
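
Since the monitor may remove the file once recovery finishes, it helps to copy it somewhere safe right away. The sketch below assumes the monitor stores each incremental map as a file named after its epoch under an osdmap/ subdirectory of its data directory, as "osdmap/880" suggests; the data directory path is site-specific, and the function name save_inc_osdmap is hypothetical.

```python
import shutil
from pathlib import Path


def save_inc_osdmap(mon_data_dir: str, epoch: int, dest_dir: str) -> Path:
    """Copy <mon_data_dir>/osdmap/<epoch> into dest_dir; return the new path."""
    # Assumed layout: one file per epoch under the "osdmap" subdirectory.
    source = Path(mon_data_dir) / "osdmap" / str(epoch)
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; it may already be gone")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"osdmap.{epoch}"
    # copy2 also preserves the timestamps, which can help when debugging.
    shutil.copy2(source, target)
    return target
```

It would be called as save_inc_osdmap(mon_data_dir, 880, dest_dir), with mon_data_dir set to wherever the monitor keeps its data on that host.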

> > How did ceph-osd crash?  Is there a dump in the log?
> 
> Of course I will provide all the logs, but a bit later: right now we are
> busy starting all the VMs and handling the first customer tickets ;-)
>
> So that I collect everything you need, would you be so kind as to list
> the required files (kern.log, osdX.log, etc.)?

I think just the crashed osd log will be enough.  It looks like the rest 
of the cluster is recovering ok...

Are the VMs running on top of the kernel rbd client, or KVM+librbd?

sage


