From: Alexandre Oliva <oliva@gnu.org>
To: "Yan, Zheng" <zheng.z.yan@intel.com>
Cc: Gregory Farnum <greg@inktank.com>, Sage Weil <sage@inktank.com>,
"Yan, Zheng" <ukernel@gmail.com>,
ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH] mds: handle setxattr ceph.parent
Date: Fri, 20 Dec 2013 22:22:12 -0200 [thread overview]
Message-ID: <ord2kr3vez.fsf@livre.home> (raw)
In-Reply-To: <52B3F9DE.60301@intel.com> (Zheng Yan's message of "Fri, 20 Dec 2013 16:03:42 +0800")
On Dec 20, 2013, "Yan, Zheng" <zheng.z.yan@intel.com> wrote:
> On 12/20/2013 11:35 AM, Alexandre Oliva wrote:
>> On Dec 19, 2013, "Yan, Zheng" <zheng.z.yan@intel.com> wrote:
>>
>>> next time you encountered log corruption, please open "new issues" at http://tracker.ceph.com/
>>
>> I have saved the dumped journal, but it contains too much information
>> I'm not supposed to or willing to share, especially with a US-based
>> company in the post-Snowden era :-( (something like btrfs-image -s
>> might be good to have)
>>
>> I'd be glad to try to dig info that might be useful out of it, if told
>> how to do so, but uploading the journal as taken from the cluster is not
>> an option.
> can you send out the mds log (the latest few hundreds of lines) when the mds crashed.
It didn't quite crash. Here's what it spit out:
mds.0.cache creating system inode with ino:1
mds.0.journaler(ro) try_read_entry got 0 len entry at offset 1248878082748
mds.0.log _replay journaler got error -22, aborting
mds.0.1307 boot_start encountered an error, failing
mds.0.1307 suicide. wanted down:dne, now up:replay
Before this, the mds was brought down along with the rest of the
cluster:
mds.0.1273 *** got signal Terminated ***
mds.0.1273 suicide. wanted down:dne, now up:active
The --reset-journal session, that aborted because of the assertion
failure I mentioned in the patch, added the recent message log, but
there's nothing useful there: the mds asked osds for each one of the log
segments, got successful responses for them all, and then proceeded to
reset the journal. The only oddity I found was this:
client.411305.journaler(ro) _finish_probe_end write_pos = 1248890415494 (header had 1248890413881). recovered.
For reference, here's the complete loghead info:
client.411305.journaler(ro) _finish_read_head loghead(trim 1248761741312, expire 1248761744503, write 1248890413881). probing for end of log (from 1248890413881)...
Note how the offset at which we got the zero-len entry is not close to
either end of the (non-trimmed portion of) the log. 200.00048b1b, where
the zero-len entry was, was not only read successfully, the
corresponding file in the osds had the correct 4MB size, and the md5sum
of that file was the same on all 4 osds that contained it. So, whatever
it was that corrupted the log happened long before the mds shutdown;
200.00048b1b's timestamp is 1387397117, while 200.00048b1e's (the tail
before the journal reset) is 1387430417; the 2 other files in between
are spaced at a nearly constant rate of 10000 seconds per 4MB.
The mds log has nothing special for that time range, but from the mon
and the osd logs I can tell that was about the time in which I rolled
back many of the osds to recent snapshots thereof, from which I'd
cleaned all traces of the user.ceph._parent. I intended to roll back
only 2 out of the 4 replicas of each pg at a time, and wait for them to
recover, but I recall that during this process, two osds came down
before they were meant to, because of timeouts caused by the
metadata-heavy recursive setfattr. I guess this may have caused me to
lose changes that had already been committed, by rolling back all the
osds that had those changes.
So I think by now I'm happy to announce that it was an IO error (where
IO stands for Incompetence of the Operator ;-)
Sorry about this disturbance, and thanks for asking me to investigate it
further and find a probable cause that involves no fault of Ceph's.
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer
next prev parent reply other threads:[~2013-12-21 0:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 11:25 [PATCH] mds: handle setxattr ceph.parent Alexandre Oliva
2013-12-18 2:47 ` Yan, Zheng
2013-12-18 16:32 ` Alexandre Oliva
2013-12-18 17:09 ` Sage Weil
2013-12-18 21:10 ` Gregory Farnum
2013-12-19 8:00 ` Alexandre Oliva
2013-12-19 13:27 ` Yan, Zheng
2013-12-20 3:35 ` Alexandre Oliva
2013-12-20 8:03 ` Yan, Zheng
2013-12-21 0:22 ` Alexandre Oliva [this message]
2013-12-21 0:50 ` Alexandre Oliva
2014-01-06 17:22 ` Gregory Farnum
2014-01-07 4:15 ` Alexandre Oliva
2014-01-07 16:39 ` Gregory Farnum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ord2kr3vez.fsf@livre.home \
--to=oliva@gnu.org \
--cc=ceph-devel@vger.kernel.org \
--cc=greg@inktank.com \
--cc=sage@inktank.com \
--cc=ukernel@gmail.com \
--cc=zheng.z.yan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.