From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/3]: Extreme fragmentation ahoy!
Date: Wed, 6 Feb 2019 21:21:14 -0800
Message-ID: <20190207052114.GA7991@magnolia>
In-Reply-To: <20190207050813.24271-1-david@fromorbit.com>
On Thu, Feb 07, 2019 at 04:08:10PM +1100, Dave Chinner wrote:
> Hi folks,
>
> I've just finished analysing an IO trace from an application
> generating an extreme filesystem fragmentation problem that started
> with extent size hints and ended with spurious ENOSPC reports due to
> massively fragmented files and free space. While the ENOSPC issue
> looks to have previously been solved, I still wanted to understand
> how the application had so comprehensively defeated extent size
> hints as a method of avoiding file fragmentation.
>
> The key behaviour that I discovered was that specific "append write
> only" files that had extent size hints to prevent fragmentation
> weren't actually write only. The application didn't do a lot of
> writes to the file, but it kept the file open and appended to the
> file (from the traces I have) in chunks of between ~3000 bytes and
> ~160000 bytes. This didn't explain the problem. I did notice that
> the files were opened O_SYNC, however.
>
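For reproducing outside the application, the append pattern above can be approximated with a pair of bash functions. This is a sketch under assumptions: the function names are hypothetical, chunk sizes are drawn uniformly from the ~3000..~160000 byte range in the trace, and the writes are issued back to back rather than at the application's (unknown) pace.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the original scripts) that mimic the
# application's write side: appends of random size in [min, max] bytes,
# defaulting to the ~3000..~160000 byte range seen in the trace.

# Print one xfs_io "pwrite OFFSET LENGTH" command per append, with each
# write starting where the previous one ended.
gen_append_cmds()
{
	local nwrites=$1 min=${2:-3000} max=${3:-160000}
	local off=0 len i

	for ((i = 0; i < nwrites; i++)); do
		# $RANDOM is only 15 bits wide, so combine two draws into a
		# 30-bit value before reducing it to the requested range.
		len=$(( min + ((RANDOM << 15) | RANDOM) % (max - min + 1) ))
		echo "pwrite $off $len"
		off=$(( off + len ))
	done
}

# Issue all the appends from a single xfs_io instance so the file stays
# open between writes, as it did in the application; -s opens it O_SYNC.
run_append_writer()
{
	local file=$1 nwrites=$2 cmd
	local -a args=()

	while read -r cmd; do
		args+=(-c "$cmd")
	done < <(gen_append_cmds "$nwrites")
	xfs_io -f -s "${args[@]}" "$file"
}
```

Building one xfs_io command line with repeated -c options keeps a single open file descriptor across all the appends, which matters here because every close is an opportunity for xfs_release() to trim blocks beyond EOF.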
> I then found another process that, once every second, opened the
> log file O_RDONLY, read 28 bytes from offset zero, then closed the
> file. Every second. IOWs, between every appending write that would
> allocate an extent size hint worth of space beyond EOF and then
> write a small chunk of it, there were numerous open/read/close
> cycles being done on the same file.
>
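The polling process described above is easy to imitate. A minimal bash sketch, assuming dd is an acceptable stand-in for whatever the real process used (the function name is hypothetical): each dd run opens the file read-only, reads 28 bytes from offset zero and closes it.

```shell
#!/bin/bash
# Hypothetical stand-in for the polling process: open FILE read-only,
# read 28 bytes from offset zero, close it, sleep, and repeat COUNT times.
# Each dd run is a separate open()/read()/close() cycle, so on XFS every
# iteration ends with the file being released, i.e. a call to xfs_release().
poll_reader()
{
	local file=$1 count=$2 interval=${3:-1} i

	for ((i = 0; i < count; i++)); do
		dd if="$file" of=/dev/null bs=28 count=1 2>/dev/null || return 1
		sleep "$interval"
	done
}
```

Running it in the background against the file a writer is appending to approximates the once-a-second polling; Test 3 below does the same thing with xfs_io -r -c "pread 0 28" and no delay between cycles.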
> And what do we do on close()? We call xfs_release() and that can
> truncate away blocks beyond EOF. For some reason the close wasn't
> triggering the IDIRTY_RELEASE heuristic that prevents close from
> removing EOF blocks prematurely. Then I realised that O_SYNC writes
> don't leave delayed allocation blocks behind - they are always
> converted in the context of the write. That's why it wasn't
> triggering, and that meant that the open/read/close cycle was
> removing the extent size hint allocation beyond EOF prematurely.
<urk>
> Then it occurred to me that extent size hints don't use delalloc
> either, so they behave the same way as O_SYNC writes in this
> situation.
>
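To put rough numbers on why a small append with an extent size hint leaves so much space beyond EOF, here is a deliberately simplified model. It is an assumption, not the XFS allocator: the function is hypothetical and ignores existing extents, stripe and AG alignment, and free space availability.

```shell
#!/bin/bash
# Simplified model, not the real XFS allocator: a write of LEN bytes at
# offset OFF into an unallocated part of a file with an extent size hint of
# EXTSZ bytes is widened out to EXTSZ-aligned boundaries. Prints the
# allocated range "start end" and how many of those bytes lie beyond the
# end of the write, i.e. beyond EOF for an appending write.
extsz_alloc_range()
{
	local off=$1 len=$2 extsz=$3
	local start=$(( off / extsz * extsz ))
	local end=$(( (off + len + extsz - 1) / extsz * extsz ))

	echo "$start $end $(( end - (off + len) ))"
}
```

For the first 4096 byte write of Test 2 with its 16m hint, extsz_alloc_range 0 4096 16777216 prints "0 16777216 16773120": in this model nearly all of the 16 MiB allocation sits beyond EOF, which is the space a close-time EOF trim discards before the next append needs it.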
> Oh, and we remove EOF blocks on O_RDONLY file close, too. i.e. we
> modify the file without having write permissions.
Yikes!
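One way to watch this happen is to sample how much space a file has allocated past its EOF around a read-only open/close. A minimal sketch, assuming GNU coreutils stat (the helper name is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print a rough count of bytes allocated to FILE
# beyond its size (allocated bytes minus i_size, clamped at zero), using
# GNU stat's %s (size), %b (blocks allocated) and %B (bytes per %b block).
# Holes inside the file reduce the figure and the partial block at EOF
# adds to it, so treat it as an indicator rather than an exact count.
alloc_beyond_eof()
{
	local file=$1 size blocks unit extra

	read -r size blocks unit < <(stat -c '%s %b %B' "$file") || return 1
	extra=$(( blocks * unit - size ))
	if (( extra < 0 )); then
		extra=0
	fi
	echo "$extra"
}
```

Sampling it while a writer holds the file open with an extent size hint set, then again after a single read-only open/read/close of the same file (for example xfs_io -r -c "pread 0 28" as in Test 3), should show whether the post-EOF allocation survives the read-only close. The exact figures on an affected kernel have not been verified here.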
> I suspect there are more cases like this when combined with repeated
> open/<do_something>/close operations on a file that is being
> written, but the patches only address the ones I've just talked
> about. The test script to reproduce them is below. Fragmentation
> reduction results are in the commit descriptions. It's been running
> through fstests for a couple of hours now, and no issues have been
> noticed yet.
>
> FWIW, I suspect we need to have a good hard think about whether we
> should be trimming EOF blocks on close by default, or whether we
> should only be doing it in very limited situations....
>
> Comments, thoughts, flames welcome.
>
> -Dave.
>
>
> #!/bin/bash
> #
> # Test 1
Can you please turn these into fstests to cause the maintainer maximal
immediate pain^W^W^Wmake everyone pay attention^W^W^W^Westablish a basis
for regression testing and finding whatever other problems we can find
from digging deeper? :)
--D
> #
> # Write multiple files in parallel using synchronous buffered writes. Aim is to
> # interleave allocations to fragment the files. Synchronous writes defeat the
> # open/write/close heuristics in xfs_release() that prevent EOF block removal,
> # so this should fragment badly.
>
> workdir=/mnt/scratch
> nfiles=8
> wsize=4096
> wcnt=1000
>
> echo
> echo "Test 1: sync write fragmentation counts"
> echo
> write_sync_file()
> {
> 	idx=$1
>
> 	for ((cnt=0; cnt<$wcnt; cnt++)); do
> 		xfs_io -f -s -c "pwrite $((cnt * wsize)) $wsize" $workdir/file.$idx
> 	done
> }
>
> rm -f $workdir/file*
> for ((n=0; n<$nfiles; n++)); do
> 	write_sync_file $n > /dev/null 2>&1 &
> done
> wait
>
> sync
>
> for ((n=0; n<$nfiles; n++)); do
> 	echo -n "$workdir/file.$n: "
> 	xfs_bmap -vp $workdir/file.$n | wc -l
> done
>
>
> # Test 2
> #
> # Same as test 1, but instead of sync writes, use extent size hints to defeat
> # the open/write/close heuristic
>
> extent_size=16m
>
> echo
> echo "Test 2: Extent size hint fragmentation counts"
> echo
>
> write_extsz_file()
> {
> 	idx=$1
>
> 	xfs_io -f -c "extsize $extent_size" $workdir/file.$idx
> 	for ((cnt=0; cnt<$wcnt; cnt++)); do
> 		xfs_io -f -c "pwrite $((cnt * wsize)) $wsize" $workdir/file.$idx
> 	done
> }
>
> rm -f $workdir/file*
> for ((n=0; n<$nfiles; n++)); do
> 	write_extsz_file $n > /dev/null 2>&1 &
> done
> wait
>
> sync
>
> for ((n=0; n<$nfiles; n++)); do
> 	echo -n "$workdir/file.$n: "
> 	xfs_bmap -vp $workdir/file.$n | wc -l
> done
>
>
>
> # Test 3
> #
> # Same as test 2, but instead of extent size hints, use open/read/close loops
> # on the files to remove EOF blocks.
>
> echo
> echo "Test 3: Open/read/close loop fragmentation counts"
> echo
>
> write_file()
> {
> 	idx=$1
>
> 	xfs_io -f -s -c "pwrite -b 64k 0 50m" $workdir/file.$idx
> }
>
> read_file()
> {
> 	idx=$1
>
> 	for ((cnt=0; cnt<$wcnt; cnt++)); do
> 		xfs_io -f -r -c "pread 0 28" $workdir/file.$idx
> 	done
> }
>
> rm -f $workdir/file*
> for ((n=0; n<$((nfiles * 4)); n++)); do
> 	write_file $n > /dev/null 2>&1 &
> 	read_file $n > /dev/null 2>&1 &
> done
> wait
>
> sync
>
> for ((n=0; n<$nfiles; n++)); do
> 	echo -n "$workdir/file.$n: "
> 	xfs_bmap -vp $workdir/file.$n | wc -l
> done
>
>
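On the counting side, "xfs_bmap -vp ... | wc -l" in the scripts above also counts the filename line, the column header line and any holes, so the figures are offset from the real extent count. A hypothetical filter that counts only allocated extent records, assuming the usual xfs_bmap -v layout of one numbered "N: [start..end]: ..." line per extent with holes labelled "hole":

```shell
#!/bin/bash
# Hypothetical helper: count allocated extents in "xfs_bmap -v" style output
# read from stdin. Only lines whose first field is "N:" are extent records;
# the filename and column header lines are skipped, and records reported as
# holes are not counted. Prints 0 for "no extents" output.
count_extents()
{
	awk '$1 ~ /^[0-9]+:$/ && $0 !~ /hole/ { n++ } END { print n + 0 }'
}
```

It drops in where wc -l is used now, e.g. xfs_bmap -vp $workdir/file.$n | count_extents.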