From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: Filippo Giunchedi <fgiunchedi@wikimedia.org>, linux-xfs@vger.kernel.org
Subject: Re: Recently-formatted XFS filesystems reporting negative used space
Date: Wed, 11 Jul 2018 08:40:26 +1000
Message-ID: <20180710224026.GV2234@dastard>
In-Reply-To: <3895eebe-3fa2-a977-4021-3bd6ef645fdd@sandeen.net>
On Tue, Jul 10, 2018 at 04:39:26PM -0500, Eric Sandeen wrote:
> On 7/10/18 8:43 AM, Filippo Giunchedi wrote:
> > Hello,
> > a little background: at Wikimedia Foundation we are running a 30-hosts
> > Openstack Swift cluster to host user media uploads, each host has 12
> > spinning disks formatted individually with xfs.
> >
> > Some of the recently-formatted filesystems have started reporting
> > negative usage upon hitting around 70% usage, though some filesystems
> > on the same host kept reporting as expected:
> >
> > /dev/sdn1 3.7T -14T 17T - /srv/swift-storage/sdn1
> > /dev/sdh1 3.7T -13T 17T - /srv/swift-storage/sdh1
> > /dev/sdc1 3.7T 3.0T 670G 83% /srv/swift-storage/sdc1
> > /dev/sdk1 3.7T 3.1T 643G 83% /srv/swift-storage/sdk1
> >
> > We have experienced this bug only on the last four machines to be put
> > in service and formatted with xfsprogs 4.9.0+nmu1 from Debian Stretch.
> > The remaining hosts were formatted in the past with xfsprogs 3.2.1 or
> > older.
> > We have also a standby cluster in another datacenter with similar
> > configuration and hosts that received write traffic only but not read
> > traffic; the standby cluster hasn't experienced the bug and all
> > filesystems report the correct usage.
> > As far as I can tell the difference in xfsprogs version used for
> > formatting means defaults have changed, (e.g. crc is enabled on the
> > affected filesystems). Have you seen this issue before and do you know
> > how to fix it?
> >
> > I would love to help debugging this issue, we've been detailing the
> > work done so far at https://phabricator.wikimedia.org/T199198
>
> What kernel are the problematic nodes running?
>
> From your repair output:
>
> root@ms-be1040:~# xfs_repair -n /dev/sde1
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> sb_fdblocks 4461713825, counted 166746529
> - found root inode chunk
>
> that sb_fdblocks really is ~17T which indicates the problem
> really is on disk.
>
> 4461713825
> 100001001111100000101100110100001
>  166746529
>      1001111100000101100110100001
>
> you have a bit flipped in the problematic value... but you're running
> with CRCs so it seems unlikely to have been some sort of bit-rot (that,
> and the fact that you're hitting the same problem on multiple nodes).
>
> Soooo not sure what to say right now other than "your bad value has an
> extra bit set for some reason."
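
The two values quoted above differ by exactly 2^32 (4461713825 - 166746529 =
4294967296), i.e. only bit 32 is different. A minimal standalone sketch of
that check follows; single_bit_diff() is a made-up helper for illustration,
not something from the kernel or xfsprogs:

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the one bit in which a and b
 * differ, or -1 if they differ in zero bits or in more than one bit.
 */
static int single_bit_diff(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;	/* set bits mark the positions that differ */
	int bit = 0;

	/* x & (x - 1) clears the lowest set bit; non-zero means >1 bit set */
	if (x == 0 || (x & (x - 1)) != 0)
		return -1;

	/* exactly one bit is set; count how far it is from bit 0 */
	while ((x >>= 1) != 0)
		bit++;
	return bit;
}
```

Called with the sb_fdblocks value and the counted value from the repair
output, this returns 32.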
Looks like the superblock verifier doesn't bounds check free block
or free/used inode counts. Perhaps we should be checking this in
the verifier so in-memory corruption like this never makes it to
disk?
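
As a rough sketch of what such a bounds check could look like (this is
standalone C, not the actual verifier code; struct sb_counters and
sb_counters_sane() are hypothetical names, and the fields only mirror the
sb_* naming):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the superblock counters. */
struct sb_counters {
	uint64_t dblocks;	/* total data blocks in the filesystem */
	uint64_t fdblocks;	/* free data blocks */
	uint64_t icount;	/* allocated inodes */
	uint64_t ifree;		/* free inodes */
};

/*
 * Assumed sanity rules: free blocks cannot exceed total blocks, and
 * free inodes cannot exceed allocated inodes.
 */
static bool sb_counters_sane(const struct sb_counters *sb)
{
	if (sb->fdblocks > sb->dblocks)
		return false;
	if (sb->ifree > sb->icount)
		return false;
	return true;
}
```

Assuming 4k blocks, a 3.7T device has roughly a billion data blocks, so an
sb_fdblocks of 4461713825 would fail the first test and the bad value
would be caught before being written out.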
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com