From: Brian Foster <bfoster@redhat.com>
To: Tim Smith <tim.smith@vaultcloud.com.au>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs filesystem reports negative usage - reoccurring problem
Date: Mon, 13 May 2019 10:09:46 -0400
Message-ID: <20190513140943.GC61135@bfoster>
In-Reply-To: <CAHgs-5XkA5xFgxgSaX9m70gduuO1beq6fiY7UEGv1ad6bd19Hw@mail.gmail.com>

On Mon, May 13, 2019 at 11:45:26AM +1000, Tim Smith wrote:
> Hey guys,
>
> We've got a bunch of hosts with multiple spinning disks providing file
> server duties with xfs.
>
> Some of the filesystems will go into a state where they report
> negative used space - e.g. available is greater than total.
>
> This appears to be purely cosmetic, as we can still write data to (and
> read from) the filesystem, but it throws out our reporting data.
>
> We can (temporarily) fix the issue by unmounting and running
> `xfs_repair` on the filesystem, but it soon reoccurs.
>
> Does anybody have any ideas as to why this might be happening and how
> to prevent it? Can userspace processes affect change to the xfs
> superblock?
>

Hmm, I feel like there have been at least a few fixes for similar
symptoms over the past few releases. It might be hard to pinpoint one
unless somebody more familiar with this problem comes across this.

FWIW, something like commit aafe12cee0 ("xfs: don't trip over negative
free space in xfs_reserve_blocks") looks like it could cause this kind
of wonky accounting, but that's just a guess from skimming the patch
log. I have no idea if you'd be affected by this.

> Example of a 'good' filesystem on the host:
>
> $ sudo df -k /dev/sdac
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/sdac 9764349952 7926794452 1837555500 82% /srv/node/sdac
>
> $ sudo strace df -k /dev/sdac |& grep statfs
>
> statfs("/srv/node/sdac", {f_type=0x58465342, f_bsize=4096,
> f_blocks=2441087488, f_bfree=459388875, f_bavail=459388875,
> f_files=976643648, f_ffree=922112135, f_fsid={16832, 0},
> f_namelen=255, f_frsize=4096, f_flags=3104}) = 0
>
> $ sudo xfs_db -r /dev/sdac
> [ snip ]
> icount = 54621696
> ifree = 90183
> fdblocks = 459388955
>
> Example of a 'bad' filesystem on the host:
>
> $ sudo df -k /dev/sdad
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/sdad 9764349952 -9168705440 18933055392 - /srv/node/sdad
>
> $ sudo strace df -k /dev/sdad |& grep statfs
> statfs("/srv/node/sdad", {f_type=0x58465342, f_bsize=4096,
> f_blocks=2441087488, f_bfree=4733263848, f_bavail=4733263848,
> f_files=976643648, f_ffree=922172221, f_fsid={16848, 0},
> f_namelen=255, f_frsize=4096, f_flags=3104}) = 0
>

It looks like the free block count (f_bfree=4733263848) has somehow
ended up larger than the total block count (f_blocks=2441087488). The
'used' value reported by userspace is f_blocks - f_bfree, hence the
negative value.

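FWIW, the arithmetic can be checked against the statfs numbers quoted
in this mail. A minimal Python sketch, assuming df -k does nothing more
than scale the statfs fields by f_bsize/1024 (the helper name
df_columns_kib is made up for illustration, it is not part of df or
xfsprogs):

```python
# Sketch: derive the df -k columns from statfs(2) fields.
# Assumption: df scales each field by f_bsize / 1024 and computes
# used = f_blocks - f_bfree. The helper name is illustrative only.

def df_columns_kib(f_bsize, f_blocks, f_bfree, f_bavail):
    """Return (total, used, available) in 1 KiB units."""
    scale = f_bsize // 1024                 # 4096-byte blocks -> 4 KiB each
    total = f_blocks * scale
    used = (f_blocks - f_bfree) * scale     # negative when f_bfree > f_blocks
    available = f_bavail * scale
    return total, used, available

# Values copied from the strace output for sdac (good) and sdad (bad).
good = df_columns_kib(4096, 2441087488, 459388875, 459388875)
bad = df_columns_kib(4096, 2441087488, 4733263848, 4733263848)

print(good)  # (9764349952, 7926794452, 1837555500)
print(bad)   # (9764349952, -9168705440, 18933055392)
```

Both tuples match the 1K-blocks, Used and Available columns in the
respective df outputs, so df is just reporting what statfs hands it.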
> $ sudo xfs_db -r /dev/sdad
> [ snip ]
> icount = 54657600
> ifree = 186173
> fdblocks = 4733263928
>
> Host environment:
> $ uname -a
> Linux hostname 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15
> 16:06:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>

Could you also include the xfs_info output and mount options of the
filesystem(s) in question?

Also, is this negative blocks used state persistent for any of these
filesystems? IOW, if you unmount/mount, are you right back into this
state, or does accounting start off sane and fall into this bogus state
after a period of runtime or due to some unknown operation?

If the former (the bad state survives a remount), the next best step
might be to try the filesystem on a more recent kernel and determine
whether this problem is already fixed one way or another. Note that
this could be easily done on a development/test system with an
xfs_metadump image of the fs if you didn't want to muck around with
production systems.

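If it turns out the accounting starts off sane and goes bad at runtime,
it may help to know roughly when that happens. A rough Python sketch,
assuming it is enough to poll statvfs on each mount point and flag the
case where the free count exceeds the total (the function names and the
idea of running it from cron are my own assumptions, nothing
XFS-specific):

```python
import os
import sys
import time

def free_exceeds_total(f_blocks, f_bfree):
    """True when the free block count is larger than the total block count."""
    return f_bfree > f_blocks

def check_mount(path):
    """Return True if the filesystem mounted at path reports bogus free space."""
    st = os.statvfs(path)
    return free_exceeds_total(st.f_blocks, st.f_bfree)

if __name__ == "__main__":
    # Mount points are passed on the command line, e.g. /srv/node/sdad
    for path in sys.argv[1:]:
        if check_mount(path):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), path, "f_bfree > f_blocks")
```

Running something like this periodically against the affected mount
points would at least narrow down the window in which the counters go
wrong.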

Brian

> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 16.04.5 LTS
> Release: 16.04
> Codename: xenial
>
> Thank you!
> Tim