* Problem after ceph-osd crash
@ 2012-02-20 16:59 Oliver Francke
  2012-02-20 17:41 ` Sage Weil
  0 siblings, 1 reply; 4+ messages in thread
From: Oliver Francke @ 2012-02-20 16:59 UTC
  To: ceph-devel

Hi,

we are in trouble after a messy attempt to add a new OSD node to our
cluster.

We get this odd message on the console:

"libceph: corrupt inc osdmap epoch 880 off 102
(ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"

The whole system is in a state like this:

012-02-20 17:56:27.585295    pg v942504: 2046 pgs: 1348 active+clean, 43
active+recovering+degraded+remapped+backfill, 218 active+recovering, 437
active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059 GB /
29794 GB avail; 272914/1349073 degraded (20.230%)

and the ceph-osd on node0 sometimes crashes. At the moment of writing, the
degraded percentage has continued to shrink to below 20%.

Any clues?

Thanks in advance,

Oliver.

-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
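The figures in the status line above can be cross-checked with a little arithmetic. The following is a minimal sketch, not a Ceph tool: the helper name `parse_pg_status` and the regular expressions are assumptions made for illustration, and they only target the exact line format quoted in this message. It checks that the per-state PG counts add up to the reported total and recomputes the degraded percentage from the object counts.

```python
import re

# Status line copied from the message above (timestamp omitted).
STATUS = (
    "pg v942504: 2046 pgs: 1348 active+clean, "
    "43 active+recovering+degraded+remapped+backfill, "
    "218 active+recovering, 437 active+recovering+remapped+backfill; "
    "1950 GB data, 3734 GB used, 26059 GB / 29794 GB avail; "
    "272914/1349073 degraded (20.230%)"
)


def parse_pg_status(line):
    """Hypothetical helper: split the status line into its numeric parts."""
    # Total number of placement groups, e.g. "2046 pgs:".
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # Per-state counts sit between "pgs:" and the first semicolon.
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for item in states_part.split(","):
        count, state = item.split()
        states[state] = int(count)
    # Degraded object count and total object count, e.g. "272914/1349073".
    degraded, objects = map(
        int, re.search(r"(\d+)/(\d+) degraded", line).groups()
    )
    return total, states, degraded, objects


total, states, degraded, objects = parse_pg_status(STATUS)
degraded_pct = 100.0 * degraded / objects
clean_pct = 100.0 * states["active+clean"] / total

print(f"{sum(states.values())} of {total} PGs accounted for")
print(f"active+clean: {clean_pct:.1f}% of PGs")
print(f"degraded: {degraded_pct:.3f}% of objects")
```

Note that the degraded percentage is a share of objects, while the state counts are per placement group, so the two are not expected to match each other.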
* Re: Problem after ceph-osd crash
  2012-02-20 16:59 Problem after ceph-osd crash Oliver Francke
@ 2012-02-20 17:41 ` Sage Weil
  2012-02-20 17:49   ` Oliver Francke
  0 siblings, 1 reply; 4+ messages in thread
From: Sage Weil @ 2012-02-20 17:41 UTC
  To: Oliver Francke; +Cc: ceph-devel

On Mon, 20 Feb 2012, Oliver Francke wrote:
> Hi,
>
> we are just in trouble after some mess with trying to include a new OSD-node
> into our cluster.
>
> We get some weird "libceph: corrupt inc osdmap epoch 880 off 102
> (ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
>
> on the console.
> The whole system is in a state ala:
>
> 012-02-20 17:56:27.585295    pg v942504: 2046 pgs: 1348 active+clean, 43
> active+recovering+degraded+remapped+backfill, 218 active+recovering, 437
> active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059 GB /
> 29794 GB avail; 272914/1349073 degraded (20.230%)
>
> and sometimes the ceph-osd on node0 is crashing. At the moment of writing, the
> degrading continues to shrink down below 20%.

How did ceph-osd crash?  Is there a dump in the log?

sage

>
> Any clues?
>
> Thnx in @vance,
>
> Oliver.
* Re: Problem after ceph-osd crash
  2012-02-20 17:41 ` Sage Weil
@ 2012-02-20 17:49   ` Oliver Francke
  2012-02-20 18:05     ` Sage Weil
  0 siblings, 1 reply; 4+ messages in thread
From: Oliver Francke @ 2012-02-20 17:49 UTC
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,

On 02/20/2012 06:41 PM, Sage Weil wrote:
> On Mon, 20 Feb 2012, Oliver Francke wrote:
>> Hi,
>>
>> we are just in trouble after some mess with trying to include a new OSD-node
>> into our cluster.
>>
>> We get some weird "libceph: corrupt inc osdmap epoch 880 off 102
>> (ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
>>
>> on the console.
>> The whole system is in a state ala:
>>
>> 012-02-20 17:56:27.585295    pg v942504: 2046 pgs: 1348 active+clean, 43
>> active+recovering+degraded+remapped+backfill, 218 active+recovering, 437
>> active+recovering+remapped+backfill; 1950 GB data, 3734 GB used, 26059 GB /
>> 29794 GB avail; 272914/1349073 degraded (20.230%)
>>
>> and sometimes the ceph-osd on node0 is crashing. At the moment of writing, the
>> degrading continues to shrink down below 20%.
> How did ceph-osd crash?  Is there a dump in the log?

Of course I will provide all the logs, but a bit later; we are busy starting
all the VMs and handling the first customer tickets right now ;-)

So that the collection is complete, would you be so kind as to list all
the logs you need (kern.log, osdX.log, etc.)?

Thanks for the fast reaction,

Oliver.

> sage
>
>> Any clues?
>>
>> Thnx in @vance,
>>
>> Oliver.
* Re: Problem after ceph-osd crash
  2012-02-20 17:49   ` Oliver Francke
@ 2012-02-20 18:05     ` Sage Weil
  0 siblings, 0 replies; 4+ messages in thread
From: Sage Weil @ 2012-02-20 18:05 UTC
  To: Oliver Francke; +Cc: ceph-devel

On Mon, 20 Feb 2012, Oliver Francke wrote:
> Hi Sage,
>
> On 02/20/2012 06:41 PM, Sage Weil wrote:
> > On Mon, 20 Feb 2012, Oliver Francke wrote:
> > > Hi,
> > >
> > > we are just in trouble after some mess with trying to include a new
> > > OSD-node into our cluster.
> > >
> > > We get some weird "libceph: corrupt inc osdmap epoch 880 off 102
> > > (ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"

I just retested the kernel client against the new server code and I don't
see this.  If you can pull the osdmap/880 file from the monitor data
directory (soon, please; the monitor will delete it once things fully
recover and move on), I can see what the data looks like.

> > >
> > > on the console.
> > > The whole system is in a state ala:
> > >
> > > 012-02-20 17:56:27.585295    pg v942504: 2046 pgs: 1348 active+clean, 43
> > > active+recovering+degraded+remapped+backfill, 218 active+recovering, 437
> > > active+recovering+remapped+backfill; 1950 GB data, 3734 GB used,
> > > 26059 GB / 29794 GB avail; 272914/1349073 degraded (20.230%)
> > >
> > > and sometimes the ceph-osd on node0 is crashing. At the moment of
> > > writing, the degrading continues to shrink down below 20%.
> > How did ceph-osd crash?  Is there a dump in the log?
>
> 'course I will provide all logs, uhm, a bit later, we are busy to start all
> VM's, and handle first customer-tickets right now ;-)
>
> To be most complete for the collection, would you be so kind to give a
> list of all necessary kern.log osdX.log etc.?

I think just the crashed osd log will be enough.  It looks like the rest of
the cluster is recovering ok...

Are the VMs running on top of the kernel rbd client, or KVM+librbd?

sage

>
> Thnx for the fast reaction,
>
> Oliver.
>
> > sage
> >
> > > Any clues?
> > >
> > > Thnx in @vance,
> > >
> > > Oliver.
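The numbers in the quoted libceph message are internally consistent, which can be shown with a short calculation. This is a sketch under an assumption the thread does not confirm: that the three hexadecimal values are the current decode position followed by the start and end of the buffer holding the incremental osdmap. The regular expression and the `decode` helper are hypothetical names for illustration only.

```python
import re

# Console message copied from the thread.
MSG = (
    "libceph: corrupt inc osdmap epoch 880 off 102 "
    "(ffffc9001db8990a of ffffc9001db898a4-ffffc9001db89dae)"
)

PATTERN = re.compile(
    r"corrupt inc osdmap epoch (\d+) off (\d+) "
    r"\(([0-9a-f]+) of ([0-9a-f]+)-([0-9a-f]+)\)"
)


def decode(msg):
    """Hypothetical helper: pull the numeric fields out of the message."""
    m = PATTERN.search(msg)
    epoch, off = int(m.group(1)), int(m.group(2))
    # Assumed meaning: position, buffer start, buffer end (all hex addresses).
    pos, start, end = (int(m.group(i), 16) for i in (3, 4, 5))
    return {
        "epoch": epoch,
        "off": off,
        "computed_off": pos - start,  # distance decoded before the failure
        "buffer_len": end - start,    # total size of the buffer in bytes
    }


info = decode(MSG)
print(info)
```

Under that reading, the reported `off` equals the position minus the buffer start, so decoding stopped a short way into the buffer rather than at its end.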
end of thread, other threads:[~2012-02-20 18:05 UTC | newest]

Thread overview: 4+ messages
2012-02-20 16:59 Problem after ceph-osd crash Oliver Francke
2012-02-20 17:41 ` Sage Weil
2012-02-20 17:49   ` Oliver Francke
2012-02-20 18:05     ` Sage Weil