From: Zenon Panoussis <oracle@provocation.net>
To: ceph-devel@vger.kernel.org
Subject: Re: Suicide
Date: Sat, 16 Apr 2011 01:29:51 +0200
Message-ID: <4DA8D4EF.8090006@provocation.net>
In-Reply-To: <161F0A2E50F1465996C6A9075A508C85@gmail.com>
On 04/16/2011 01:06 AM, Gregory Farnum wrote:
> Hmm, that timeline doesn't quite make sense -- node01 takes over the MDS
> duties at 4:33 and crashes, but then it starts up again at 4:50. But it's
> possible that node02 took over in the interval there and we just don't see
> it because the log disk was full (I had erroneously thought that a filled
> disk would hang the daemon but that turns out not to be the case). So I'd
> guess you shut everything down sometime after 5:08, and that would make
> sense.
Indeed, you're probably right.
> Unfortunately what we're really interested in is what caused the assert
> failure on node01 at 4:35 and the reasons for that aren't available in the
> logs we have. :(
> This is the second time we've seen that assert but we've not been able to
> reproduce it or figure out how the invariant that it's checking against got
> broken. If you like we can come up with a hacky fix that should let your
> cluster come back up, but it's possible that you'd lose some data and this
> is a very rare condition so if it's not a big deal I'd just re-create your
> cluster.
My data has been safe elsewhere all along and I have already re-created the
cluster. In other words, I don't need the hacky fix, but someone else might
be desperate for it in the future, so creating it could be a good idea anyway.
However, the cause of the corruption is still an open issue that ought to be
understood and solved. The most likely place to reproduce it is right here,
so if you think it's useful, I'm willing to try to crash it again.
If you want me to, let's make a plan for it. These are just test boxes and
I have no problem even giving you root on them, if that can help pinpoint
the cause of the corruption.
Z