From: Dave Chinner <david@fromorbit.com>
To: Christian Theune <ct@flyingcircus.io>
Cc: linux-xfs@vger.kernel.org
Subject: Re: XFS bug?
Date: Thu, 1 Dec 2016 22:03:47 +1100
Message-ID: <20161201110347.GD11750@dastard>
In-Reply-To: <0F1FE7A8-D5C6-4FFD-9B9C-E03FB1962F3E@flyingcircus.io>
On Wed, Nov 30, 2016 at 02:07:39PM +0100, Christian Theune wrote:
> Hi there,
>
> we’re running a Ceph cluster which had a very rough outage not
> long ago[1].
>
> When updating our previous kernels from 4.1.16 (Gentoo) to 4.4.27
> (Gentoo) we encountered the following problem in our production
> environment (but not in staging or development):
Hi Christian - thanks for persevering and getting this report to
the list. :P
>
> - Properly shut down and reboot the machine running Ceph OSDs on XFS w/ kernel 4.1.16.
> - Boot with 4.4.27, let the machine mount the FS’ and start OSDs
> - Have everything run 20-30 minutes
> - Ceph OSDs start crashing. Kernel shows messages attached in kern.log
Which shouldn't happen. I'm pretty sure it's the AGFL packing change
that has caused the problem here, but I'm still paging all that
back into memory and clearing out all the other little things I need
to before digging back into this. I have a couple of ideas about how
this could occur:
> An interesting error we saw during repair was this (I can’t remember or reconstruct whether this was on the 4.1 or 4.4 kernel):
>
> bad agbno 4294967295 in agfl, agno 12
> freeblk count 7 != flcount 6 in ag 12
> sb_fdblocks 82969993, counted 82969994
Because this:
> Note, that the agbno is 2**32-1 repeatedly
is NULLAGBNO, which is what the AGFL is initialised to by mkfs, and
indicates we're accessing a slot that hasn't been filled correctly.
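
For illustration, here is a minimal sketch of why 4294967295 stands out. It assumes AG block numbers are 32-bit values with the all-ones pattern as the null marker; the helper name and the example list are hypothetical and are not taken from xfs_repair or the kernel.

```python
# NULLAGBNO is the all-ones 32-bit value, i.e. 2**32 - 1.
NULLAGBNO = 0xFFFFFFFF


def unfilled_slots(agfl):
    """Return the indexes of slots that still hold NULLAGBNO."""
    return [i for i, bno in enumerate(agfl) if bno == NULLAGBNO]


# Hypothetical free list: two real block numbers, then an unfilled slot.
example_agfl = [1024, 1025, NULLAGBNO]

print(NULLAGBNO)
print(unfilled_slots(example_agfl))
```

Any slot reported by such a check inside the active part of the free list would be one that was never written with a real block number.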
> Also interesting: the broken filesystems and xfs_repair behaved
> completely differently whether talked to from a 4.1 or 4.4 kernel,
> thus the pattern of first running xfs_repair on 4.1 and then again
> on 4.4.
Yup, I'd expect that given that xfs_repair has the same AGFL packing
issue and what it ends up with is dependent on whether the packing
matches the kernel being run or not...
> This looks similar to [2] and may be related to the already fixed
> bug referenced by Dave in [3], but in our case there was no 32/64
> bit migration involved.
That was the initial discovery vector, but looking into this again I
suspect the issue is that the packing change alters the slot
indexing. I do have a
patchset where I started trying to fix all this up automatically,
and so I need to go back to that and sort out where I was up to and
see if I was addressing this index offset problem at all. This is
where I previously got up to:
https://www.spinics.net/lists/linux-xfs/msg00445.html
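
A rough sketch of how a header packing change could shift slot indexing follows. Everything numeric in it is an assumption made for illustration: a 512 byte sector, 4 byte slots, and a header of 36 bytes when packed versus 40 bytes when padded to 8-byte alignment. The function names are hypothetical and this is not the kernel's code.

```python
AGFL_ENTRY_SIZE = 4   # assumed: each slot holds one 32-bit AG block number
SECTOR_SIZE = 512     # assumed sector size, for illustration only
PACKED_HDR = 36       # assumed header size with no trailing padding
PADDED_HDR = 40       # assumed header size when padded to 8-byte alignment


def agfl_slots(sector_size, header_size):
    """Number of free-list slots that fit after the header in one sector."""
    return (sector_size - header_size) // AGFL_ENTRY_SIZE


def next_slot(index, slots):
    """Advance a circular free-list index, wrapping at the slot count."""
    return (index + 1) % slots


packed_slots = agfl_slots(SECTOR_SIZE, PACKED_HDR)
padded_slots = agfl_slots(SECTOR_SIZE, PADDED_HDR)
last_padded = padded_slots - 1

# The two layouts disagree on how many slots exist.
print(packed_slots, padded_slots)

# The same index wraps to 0 under one layout and advances to an
# extra slot under the other.
print(next_slot(last_padded, padded_slots), next_slot(last_padded, packed_slots))
```

Under these assumptions the two layouts differ by one slot, so code using the larger count can step into a slot that code using the smaller count never filled, which would still hold the mkfs-time NULLAGBNO.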
More tomorrow once I've dug in further...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com