Re: xfs filesystem reports negative usage - reoccurring problem

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Tim Smith <tim.smith@vaultcloud.com.au>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs filesystem reports negative usage - reoccurring problem
Date: Mon, 13 May 2019 10:09:46 -0400	[thread overview]
Message-ID: <20190513140943.GC61135@bfoster> (raw)
In-Reply-To: <CAHgs-5XkA5xFgxgSaX9m70gduuO1beq6fiY7UEGv1ad6bd19Hw@mail.gmail.com>

On Mon, May 13, 2019 at 11:45:26AM +1000, Tim Smith wrote:
> Hey guys,
> 
> We've got a bunch of hosts with multiple spinning disks providing file
> server duties with xfs.
> 
> Some of the filesystems will go into a state where they report
> negative used space -  e.g. available is greater than total.
> 
> This appears to be purely cosmetic, as we can still write data to (and
> read from) the filesystem, but it throws out our reporting data.
> 
> We can (temporarily) fix the issue by unmounting and running
> `xfs_repair` on the filesystem, but it soon reoccurs.
> 
> Does anybody have any ideas as to why this might be happening and how
> to prevent it? Can userspace processes affect change to the xfs
> superblock?
> 

Hmm, I feel like there have been at least a few fixes for similar
symptoms over the past few releases. It might be hard to pinpoint one
unless somebody more familiar with this problem comes across this.

FWIW, something like commit aafe12cee0 ("xfs: don't trip over negative
free space in xfs_reserve_blocks") looks like it could cause this kind
of wonky accounting, but that's just a guess from skimming the patch
log. I have no idea if you'd be affected by this.

> Example of a 'good' filesystem on the host:
> 
> $ sudo df -k /dev/sdac
> Filesystem      1K-blocks       Used  Available Use% Mounted on
> /dev/sdac      9764349952 7926794452 1837555500  82% /srv/node/sdac
> 
> $ sudo strace df -k /dev/sdac |& grep statfs
> 
> statfs("/srv/node/sdac", {f_type=0x58465342, f_bsize=4096,
> f_blocks=2441087488, f_bfree=459388875, f_bavail=459388875,
> f_files=976643648, f_ffree=922112135, f_fsid={16832, 0},
> f_namelen=255, f_frsize=4096, f_flags=3104}) = 0
> 
> $ sudo xfs_db -r /dev/sdac
> [ snip ]
> icount = 54621696
> free = 90183
> fdblocks = 459388955
> 
> Example of a 'bad' filesystem on the host:
> 
> $ sudo df -k /dev/sdad
> Filesystem      1K-blocks        Used   Available Use% Mounted on
> /dev/sdad      9764349952 -9168705440 18933055392    - /srv/node/sdad
> 
> $ sudo strace df -k /dev/sdad |& grep statfs
> statfs("/srv/node/sdad", {f_type=0x58465342, f_bsize=4096,
> f_blocks=2441087488, f_bfree=4733263848, f_bavail=4733263848,
> f_files=976643648, f_ffree=922172221, f_fsid={16848, 0},
> f_namelen=255, f_frsize=4096, f_flags=3104}) = 0
> 

It looks like you end up somehow having a huge free block count, larger
even than the total block count. The 'used' value reported by userspace
ends up being f_blocks - f_bfree, hence the negative value.

> $ sudo xfs_db -r /dev/sdad
> [ snip ]
> icount = 54657600
> ifree = 186173
> fdblocks = 4733263928
> 
> Host environment:
> $ uname -a
> Linux hostname 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15
> 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> 

Could you also include xfs_info and mount params of the filesystem(s) in
question?

Also, is this negative blocks used state persistent for any of these
filesystems? IOW, if you unmount/mount, are you right back into this
state, or does accounting start off sane and fall into this bogus state
after a period of runtime or due to some unknown operation?

If the former, the next best step might be to try a filesystem on a more
recent kernel and determine whether this problem is already fixed one
way or another. Note that this could be easily done on a
development/test system with an xfs_metadump image of the fs if you
didn't want to muck around with production systems.

Brian

> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 16.04.5 LTS
> Release: 16.04
> Codename: xenial
> 
> Thank you!
> Tim

next prev parent reply	other threads:[~2019-05-13 14:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-13  1:45 xfs filesystem reports negative usage - reoccurring problem Tim Smith
2019-05-13 14:09 ` Brian Foster [this message]
2019-05-13 15:06   ` Eric Sandeen
2019-05-20 23:39     ` Tim Smith
2019-05-21  1:43       ` Dave Chinner
2019-05-21  2:10         ` Tim Smith
2019-05-20 23:36   ` Tim Smith
2019-05-13 21:19 ` Dave Chinner
2019-05-20 23:41   ` Tim Smith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190513140943.GC61135@bfoster \
    --to=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tim.smith@vaultcloud.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.