linux-xfs.vger.kernel.org archive mirror
From: Bill O'Donnell <billodo@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@sandeen.net>,
	Filippo Giunchedi <fgiunchedi@wikimedia.org>,
	linux-xfs@vger.kernel.org
Subject: Re: Recently-formatted XFS filesystems reporting negative used space
Date: Fri, 13 Jul 2018 12:44:33 -0500
Message-ID: <20180713174433.GA22284@redhat.com>
In-Reply-To: <20180710224026.GV2234@dastard>

On Wed, Jul 11, 2018 at 08:40:26AM +1000, Dave Chinner wrote:
> On Tue, Jul 10, 2018 at 04:39:26PM -0500, Eric Sandeen wrote:
> > On 7/10/18 8:43 AM, Filippo Giunchedi wrote:
> > > Hello,
> > > A little background: at the Wikimedia Foundation we run a 30-host
> > > OpenStack Swift cluster to host user media uploads; each host has 12
> > > spinning disks, each formatted individually with XFS.
> > > 
> > > Some of the recently-formatted filesystems have started reporting
> > > negative usage upon hitting around 70% usage, though some filesystems
> > > on the same host kept reporting as expected:
> > > 
> > > /dev/sdn1       3.7T  -14T   17T    - /srv/swift-storage/sdn1
> > > /dev/sdh1       3.7T  -13T   17T    - /srv/swift-storage/sdh1
> > > /dev/sdc1       3.7T  3.0T  670G  83% /srv/swift-storage/sdc1
> > > /dev/sdk1       3.7T  3.1T  643G  83% /srv/swift-storage/sdk1
> > > 
> > > We have experienced this bug only on the last four machines to be put
> > > in service and formatted with xfsprogs 4.9.0+nmu1 from Debian Stretch.
> > > The remaining hosts were formatted in the past with xfsprogs 3.2.1 or
> > > older.
> > > We also have a standby cluster in another datacenter with a similar
> > > configuration, whose hosts have received only write traffic and no
> > > read traffic; the standby cluster hasn't experienced the bug and all
> > > of its filesystems report the correct usage.
> > > As far as I can tell, the difference in the xfsprogs version used for
> > > formatting means the defaults have changed (e.g. CRCs are enabled on
> > > the affected filesystems). Have you seen this issue before, and do you
> > > know how to fix it?
> > > 
> > > I would love to help debug this issue; we have been documenting the
> > > work done so far at https://phabricator.wikimedia.org/T199198
> > 
> > What kernel are the problematic nodes running?
> > 
> > From your repair output:
> > 
> > root@ms-be1040:~# xfs_repair -n /dev/sde1
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> > sb_fdblocks 4461713825, counted 166746529
> >         - found root inode chunk
> > 
> > that sb_fdblocks really is ~17T which indicates the problem
> > really is on disk.
> > 
> > 4461713825
> > 100001001111100000101100110100001
> > 166746529
> >      1001111100000101100110100001
> > 
> > you have a bit flipped in the problematic value... but you're running
> > with CRCs so it seems unlikely to have been some sort of bit-rot (that,
> > and the fact that you're hitting the same problem on multiple nodes).
> > 
> > Soooo not sure what to say right now other than "your bad value has an
> > extra bit set for some reason."
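The comparison above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original exchange: it XORs the two counts from the xfs_repair output to locate the differing bit, then converts both counts to sizes. The 4096-byte block size is an assumption (the mkfs.xfs default); the thread does not state it.

```python
# Values copied from the xfs_repair -n output quoted above.
sb_fdblocks_on_disk = 4461713825  # free-block count stored in the superblock
fdblocks_counted = 166746529      # free-block count xfs_repair tallied itself

# XOR leaves a 1 bit only where the two values disagree.
diff = sb_fdblocks_on_disk ^ fdblocks_counted
flipped_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

# ASSUMPTION: 4096-byte filesystem blocks (the mkfs.xfs default);
# the thread does not state the block size.
BLOCK_SIZE = 4096
TIB = 2**40
GIB = 2**30

print(f"differing bit positions: {flipped_bits}")
print(f"on-disk free space: {sb_fdblocks_on_disk * BLOCK_SIZE / TIB:.1f} TiB")
print(f"counted free space: {fdblocks_counted * BLOCK_SIZE / GIB:.0f} GiB")
```

Under that assumption the two values differ only in bit 32 (counting from zero), and the on-disk count corresponds to the roughly 17T of available space that df reports for the affected filesystems.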
> 
> Looks like the superblock verifier doesn't bounds check free block
> or free/used inode counts.  Perhaps we should be checking this in
> the verifier so in-memory corruption like this never makes it to
> disk?
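The kind of bounds check suggested above can be sketched in a few lines. This is a hypothetical Python illustration of the idea only, not the kernel verifier and not the proposed patch: the function name is invented, and the total block count is an estimate derived from the 3.7T size shown by df, assuming 4096-byte blocks.

```python
def fdblocks_in_bounds(sb_fdblocks: int, sb_dblocks: int) -> bool:
    """Hypothetical check: free blocks can never exceed total data blocks."""
    return 0 <= sb_fdblocks <= sb_dblocks


# ASSUMPTION: total block count estimated from the "3.7T" size reported by df,
# with 4096-byte blocks. The real sb_dblocks value is not given in the thread.
ESTIMATED_DBLOCKS = int(3.7 * 2**40) // 4096

# Values copied from the xfs_repair -n output quoted earlier in the thread.
on_disk_ok = fdblocks_in_bounds(4461713825, ESTIMATED_DBLOCKS)
counted_ok = fdblocks_in_bounds(166746529, ESTIMATED_DBLOCKS)

print(f"on-disk sb_fdblocks within bounds: {on_disk_ok}")
print(f"counted free blocks within bounds: {counted_ok}")
```

With these inputs the on-disk value fails the check while the counted value passes, which is the property a write-time verifier could use to reject the bad counter before it reaches disk.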

A proposed patch and its discussion thread are on the list:
https://www.spinics.net/lists/linux-xfs/msg20645.html

Thanks-
Bill


> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 11+ messages
2018-07-10 13:43 Recently-formatted XFS filesystems reporting negative used space Filippo Giunchedi
2018-07-10 21:39 ` Eric Sandeen
2018-07-10 22:40   ` Dave Chinner
2018-07-13 17:44     ` Bill O'Donnell [this message]
2018-07-11  8:31   ` Filippo Giunchedi
2018-07-16  9:29     ` Filippo Giunchedi
2018-07-17  9:26       ` Carlos Maiolino
2018-07-20 10:20         ` Filippo Giunchedi
2018-07-22  0:03           ` Eric Sandeen
2018-07-30 10:02             ` Filippo Giunchedi
2018-07-30 23:42               ` Eric Sandeen
