From: "Székelyi Szabolcs" <szekelyi@niif.hu>
To: ceph-devel@vger.kernel.org
Subject: Re: OSD doesn't start
Date: Thu, 05 Jul 2012 16:12:42 +0200
Message-ID: <95834053.QbLuzMQ4OG@mranderson>
In-Reply-To: <F1FB8F95B3FA4FF19D53AE9F060D88F5@inktank.com>

On 2012. July 4. 09:34:04 Gregory Farnum wrote:
> Hrm, it looks like the OSD data directory got a little busted somehow. How
> did you perform your upgrade? (That is, how did you kill your daemons, in
> what order, and when did you bring them back up.)

Since it would be long and hard to describe in prose, I've collected the
relevant log entries, sorted by time, at http://pastebin.com/Ev3M4DQ9 . The
short story is that after seeing that the OSDs wouldn't start, I tried to
bring down the whole cluster and start it up from scratch. That didn't change
anything, so I rebooted the two machines (running all three daemons) to see
if that would help. It didn't, and I gave up.

My ceph config is available at http://pastebin.com/KKNjmiWM .

Since this is my test cluster, I'm not very concerned about the data on it.
But the other one, with the same config, is dying, I think. ceph-fuse is
eating around 75% CPU on the sole monitor ("cc") node, and the monitor about
15%. On the other two nodes, the OSD eats around 50%, the MDS 15%, and the
monitor another 10%. No Ceph filesystem activity is going on at the moment.
Blktrace reports about 1 kB/s of disk traffic on the partition hosting the
OSD data dir. The data seems to be accessible at the moment, but I'm afraid
that my production cluster will end up in a similar situation after the
upgrade, so I don't dare touch it.
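
For anyone who wants to collect comparable per-daemon CPU figures, here is a minimal sketch in Python. It is not how the numbers above were measured. It assumes a Linux `ps` that accepts `-eo pcpu,comm`, and the daemon names in `CEPH_DAEMONS` are assumptions based on the processes mentioned above. Note that the `pcpu` value reported by `ps` is an average over each process's lifetime, so it may differ from what `top` shows at a given moment.

```python
import subprocess
from collections import defaultdict

# Assumed process names, taken from the daemons mentioned in this message.
CEPH_DAEMONS = ("ceph-osd", "ceph-mon", "ceph-mds", "ceph-fuse")


def sum_cpu_by_daemon(ps_output, daemons=CEPH_DAEMONS):
    """Sum the %CPU column per daemon name from `ps -eo pcpu,comm` output."""
    totals = defaultdict(float)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        pcpu, comm = parts
        try:
            value = float(pcpu)
        except ValueError:
            continue  # header line ("%CPU COMMAND")
        comm = comm.strip()
        if comm in daemons:
            totals[comm] += value
    return dict(totals)


def sample():
    """Run ps once and return the summed %CPU for each Ceph daemon found."""
    out = subprocess.run(
        ["ps", "-eo", "pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum_cpu_by_daemon(out)
```

Calling `sample()` on each node and printing the resulting dictionary gives one line per daemon type; several OSDs on the same host are summed together.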

Do you have any suggestions as to what I should check?

Thanks,
-- 
cc

> On Wednesday, July 4, 2012 at 8:31 AM, Székelyi Szabolcs wrote:
> > Hi,
> > 
> > after upgrading to 0.48 "Argonaut", my OSDs won't start up again. This
> > problem might not be related to the upgrade, since the cluster had
> > strange behavior before, too: ceph-fuse was spinning the CPU around 70%,
> > so did the OSDs. This happened to both of my clusters. Thought that
> > upgrading might solve the problem, but it just got worse.
> > 
> > I've copied the log of the OSD run to http://pastebin.com/XYRtfFMU . I've
> > rebooted all the nodes, but they still don't work.
> > 
> > What should I do to resurrect my OSDs?
> > 
> > Thanks,
> > --
> > cc
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > (mailto:majordomo@vger.kernel.org) More majordomo info at
> > http://vger.kernel.org/majordomo-info.html

Thread overview: 7+ messages
2012-07-04 15:31 OSD doesn't start Székelyi Szabolcs
2012-07-04 16:34 ` Gregory Farnum
2012-07-05 14:12   ` Székelyi Szabolcs [this message]
2012-07-05 23:33     ` Székelyi Szabolcs
2012-07-08 18:51       ` Székelyi Szabolcs
2012-07-08 18:53   ` Székelyi Szabolcs
2012-07-09 16:18     ` Gregory Farnum
