From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: Filippo Giunchedi <fgiunchedi@wikimedia.org>, linux-xfs@vger.kernel.org
Subject: Re: Recently-formatted XFS filesystems reporting negative used space
Date: Wed, 11 Jul 2018 08:40:26 +1000
Message-ID: <20180710224026.GV2234@dastard>
In-Reply-To: <3895eebe-3fa2-a977-4021-3bd6ef645fdd@sandeen.net>

On Tue, Jul 10, 2018 at 04:39:26PM -0500, Eric Sandeen wrote:
> On 7/10/18 8:43 AM, Filippo Giunchedi wrote:
> > Hello,
> > a little background: at Wikimedia Foundation we are running a 30-hosts
> > Openstack Swift cluster to host user media uploads, each host has 12
> > spinning disks formatted individually with xfs.
> >
> > Some of the recently-formatted filesystems have started reporting
> > negative usage upon hitting around 70% usage, though some filesystems
> > on the same host kept reporting as expected:
> >
> > /dev/sdn1 3.7T -14T 17T - /srv/swift-storage/sdn1
> > /dev/sdh1 3.7T -13T 17T - /srv/swift-storage/sdh1
> > /dev/sdc1 3.7T 3.0T 670G 83% /srv/swift-storage/sdc1
> > /dev/sdk1 3.7T 3.1T 643G 83% /srv/swift-storage/sdk1
> >
> > We have experienced this bug only on the last four machines to be put
> > in service and formatted with xfsprogs 4.9.0+nmu1 from Debian Stretch.
> > The remaining hosts were formatted in the past with xfsprogs 3.2.1 or
> > older.
> > We have also a standby cluster in another datacenter with similar
> > configuration and hosts that received write traffic only but not read
> > traffic; the standby cluster hasn't experienced the bug and all
> > filesystems report the correct usage.
> > As far as I can tell the difference in xfsprogs version used for
> > formatting means defaults have changed, (e.g. crc is enabled on the
> > affected filesystems). Have you seen this issue before and do you know
> > how to fix it?
> >
> > I would love to help debugging this issue, we've been detailing the
> > work done so far at https://phabricator.wikimedia.org/T199198
>
> What kernel are the problematic nodes running?
>
> From your repair output:
>
> root@ms-be1040:~# xfs_repair -n /dev/sde1
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> sb_fdblocks 4461713825, counted 166746529
>         - found root inode chunk
>
> that sb_fdblocks really is ~17T which indicates the problem
> really is on disk.
>
> 4461713825
> 100001001111100000101100110100001
> 166746529
>      1001111100000101100110100001
>
> you have a bit flipped in the problematic value... but you're running
> with CRCs so it seems unlikely to have been some sort of bit-rot (that,
> and the fact that you're hitting the same problem on multiple nodes).
>
> Soooo not sure what to say right now other than "your bad value has an
> extra bit set for some reason."
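
The bit comparison above can be checked mechanically. What follows is a minimal Python sketch that uses only the two values from the xfs_repair output. The helper name flipped_bits is made up for illustration, and the 4096-byte block size is an assumption (the usual mkfs.xfs default) that the thread does not state.

```python
BAD = 4461713825   # sb_fdblocks as stored in the on-disk superblock
GOOD = 166746529   # free blocks counted by xfs_repair

BLOCK_SIZE = 4096  # assumption: default XFS block size, not stated in the thread


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Print both values at the same width so the extra leading bit lines up.
print(f"{BAD:033b}")
print(f"{GOOD:033b}")

print(flipped_bits(BAD, GOOD))  # -> [32]

# Free space implied by each counter, in TiB, under the assumed block size.
print(BAD * BLOCK_SIZE / 2**40)
print(GOOD * BLOCK_SIZE / 2**40)
```

The two values differ only in bit 32, so the bad counter is exactly 2**32 blocks larger than the counted one. That is consistent with a single stray bit rather than a miscount.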

Looks like the superblock verifier doesn't bounds check free block
or free/used inode counts. Perhaps we should be checking this in
the verifier so in-memory corruption like this never makes it to
disk?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
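
The bounds check suggested in this message could look roughly like the sketch below. It is not the kernel's superblock verifier and not a proposed patch, only an illustration of the idea in Python. The field names follow the XFS superblock counters (sb_dblocks, sb_fdblocks, sb_icount, sb_ifree), while the Superblock class, the counters_in_bounds helper, and the total block and inode figures are hypothetical. Only the two fdblocks values come from the thread.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    """Illustrative stand-in for the counters held in an XFS superblock."""
    dblocks: int   # total data blocks in the filesystem
    fdblocks: int  # free data blocks
    icount: int    # allocated inodes
    ifree: int     # free inodes


def counters_in_bounds(sb):
    """Reject counters that cannot be true: more free than exist, or negative."""
    blocks_ok = 0 <= sb.fdblocks <= sb.dblocks
    inodes_ok = 0 <= sb.ifree <= sb.icount
    return blocks_ok and inodes_ok


# Hypothetical geometry for a roughly 3.7T disk with 4 KiB blocks.
TOTAL_BLOCKS = 976_000_000

bad = Superblock(dblocks=TOTAL_BLOCKS, fdblocks=4461713825, icount=6400, ifree=100)
good = Superblock(dblocks=TOTAL_BLOCKS, fdblocks=166746529, icount=6400, ifree=100)

print(counters_in_bounds(bad))   # -> False
print(counters_in_bounds(good))  # -> True
```

A check of this shape, run before the superblock is written, would flag the value reported by xfs_repair because it claims more free blocks than the filesystem contains.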