Re: OSD doesn't start - Székelyi Szabolcs

From: "Székelyi Szabolcs" <szekelyi@niif.hu>
To: ceph-devel@vger.kernel.org
Subject: Re: OSD doesn't start
Date: Fri, 06 Jul 2012 01:33:13 +0200
Message-ID: <1680690.nczT3S6HBC@mranderson>
In-Reply-To: <95834053.QbLuzMQ4OG@mranderson>

On 2012. July 5. 16:12:42 Székelyi Szabolcs wrote:
> On 2012. July 4. 09:34:04 Gregory Farnum wrote:
> > Hrm, it looks like the OSD data directory got a little busted somehow. How
> > did you perform your upgrade? (That is, how did you kill your daemons, in
> > what order, and when did you bring them back up.)
> 
> Since it would be hard and long to describe in text, I've collected the
> relevant log entries, sorted by time at http://pastebin.com/Ev3M4DQ9 . The
> short story is that after seeing that the OSDs won't start, I tried to bring
> down the whole cluster and start it up from scratch. It didn't change
> anything, so I rebooted the two machines (running all three daemons), to
> see if it changes anything. It didn't and I gave up.
> 
> My ceph config is available at http://pastebin.com/KKNjmiWM .
> 
> Since this is my test cluster, I'm not very concerned about the data on it.
> But the other one, with the same config, is dying I think. ceph-fuse is
> eating around 75% CPU on the sole monitor ("cc") node. The monitor about
> 15%. On the other two nodes, the OSD eats around 50%, the MDS 15%, the
> monitor another 10%. No Ceph filesystem activity is going on at the moment.
> Blktrace reports about 1kB/s disk traffic on the partition hosting the OSD
> data dir. The data seems to be accessible at the moment, but I'm afraid
> that my production cluster will end up in a similar situation after
> upgrade, so I don't dare to touch it.
> 
> Do you have any suggestion what I should check?

Yes, it definitely looks like it is dying. Besides the symptoms above,
ceph-fuse burns CPU on all clients, there are unreadable files on the fs
(tar blocks on them indefinitely), and the FUSE clients emit messages like

ceph-fuse: 2012-07-05 23:21:41.583692 7f444dfd5700  0 -- client_ip:0/1181 
send_message dropped message ping v1 because of no pipe on con 0x1034000

every 5 seconds. I tried to back up the data on it, but the backup blocked
partway through. Since then I have been unable to get any data out of it,
not even after killing ceph-fuse and remounting the fs.

-- 
cc



Thread overview: 7+ messages
2012-07-04 15:31 OSD doesn't start Székelyi Szabolcs
2012-07-04 16:34 ` Gregory Farnum
2012-07-05 14:12   ` Székelyi Szabolcs
2012-07-05 23:33     ` Székelyi Szabolcs [this message]
2012-07-08 18:51       ` Székelyi Szabolcs
2012-07-08 18:53   ` Székelyi Szabolcs
2012-07-09 16:18     ` Gregory Farnum
