From: "Székelyi Szabolcs" <szekelyi@niif.hu>
To: ceph-devel@vger.kernel.org
Subject: Re: OSD doesn't start
Date: Fri, 06 Jul 2012 01:33:13 +0200
Message-ID: <1680690.nczT3S6HBC@mranderson>
In-Reply-To: <95834053.QbLuzMQ4OG@mranderson>
On 2012. July 5. 16:12:42 Székelyi Szabolcs wrote:
> On 2012. July 4. 09:34:04 Gregory Farnum wrote:
> > Hrm, it looks like the OSD data directory got a little busted somehow. How
> > did you perform your upgrade? (That is, how did you kill your daemons, in
> > what order, and when did you bring them back up.)
>
> Since it would take a long time to describe in text, I've collected the
> relevant log entries, sorted by time, at http://pastebin.com/Ev3M4DQ9 . The
> short story is that after seeing that the OSDs wouldn't start, I tried to
> bring down the whole cluster and start it up from scratch. That didn't change
> anything, so I rebooted the two machines (running all three daemons) to see
> if that would help. It didn't, and I gave up.
>
> My ceph config is available at http://pastebin.com/KKNjmiWM .
>
> Since this is my test cluster, I'm not very concerned about the data on it.
> But I think the other one, with the same config, is dying. ceph-fuse is
> eating around 75% CPU on the sole monitor ("cc") node, and the monitor about
> 15%. On the other two nodes, the OSD eats around 50%, the MDS 15%, and the
> monitor another 10%. No Ceph filesystem activity is going on at the moment.
> blktrace reports about 1 kB/s of disk traffic on the partition hosting the
> OSD data dir. The data seems to be accessible at the moment, but I'm afraid
> that my production cluster will end up in a similar situation after an
> upgrade, so I don't dare to touch it.
>
> Do you have any suggestions on what I should check?

Yes, it definitely looks like it's dying. Besides the above symptoms, ceph-fuse
burns CPU on all clients, there are unreadable files on the fs (tar blocks on
them indefinitely), and the FUSE clients emit messages like

  ceph-fuse: 2012-07-05 23:21:41.583692 7f444dfd5700 0 -- client_ip:0/1181 send_message dropped message ping v1 because of no pipe on con 0x1034000

every 5 seconds. I tried to back up the data on it, but the backup blocked
partway through. Since then I'm unable to get any data out of it, not even by
killing ceph-fuse and remounting the fs.
--
cc