From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/3]: Extreme fragmentation ahoy!
Date: Thu, 7 Feb 2019 16:39:41 +1100
Message-ID: <20190207053941.GL14116@dastard>
In-Reply-To: <20190207052114.GA7991@magnolia>
On Wed, Feb 06, 2019 at 09:21:14PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 07, 2019 at 04:08:10PM +1100, Dave Chinner wrote:
> > Hi folks,
> >
> > I've just finished analysing an IO trace from an application
> > generating an extreme filesystem fragmentation problem that started
> > with extent size hints and ended with spurious ENOSPC reports due to
> > massively fragmented files and free space. While the ENOSPC issue
> > looks to have previously been solved, I still wanted to understand
> > how the application had so comprehensively defeated extent size
> > hints as a method of avoiding file fragmentation.
> >
> > The key behaviour that I discovered was that specific "append write
> > only" files that had extent size hints to prevent fragmentation
> > weren't actually write only. The application didn't do a lot of
> > writes to the file, but it kept the file open and appended to the
> > file (from the traces I have) in chunks of between ~3000 bytes and
> > ~160000 bytes. This didn't explain the problem. I did notice that
> > the files were opened O_SYNC, however.
> >
> > What I then found was another process that, once every second, opened the
> > log file O_RDONLY, read 28 bytes from offset zero, then closed the
> > file. Every second. IOWs, between every appending write that would
> > allocate an extent size hint worth of space beyond EOF and then
> > write a small chunk of it, there were numerous open/read/close
> > cycles being done on the same file.
> >
> > And what do we do on close()? We call xfs_release() and that can
> > truncate away blocks beyond EOF. For some reason the close wasn't
> > triggering the IDIRTY_RELEASE heuristic that prevents close from
> > removing EOF blocks prematurely. Then I realised that O_SYNC writes
> > don't leave delayed allocation blocks behind - they are always
> > converted in the context of the write. That's why it wasn't
> > triggering, and that meant that the open/read/close cycle was
> > removing the extent size hint allocation beyond EOF prematurely.
>
> <urk>
>
> > Then it occurred to me that extent size hints don't use delalloc
> > either, so they behave the same way as O_SYNC writes in this
> > situation.
> >
> > Oh, and we remove EOF blocks on O_RDONLY file close, too. i.e. we
> > modify the file without having write permissions.
>
> Yikes!
>
> > I suspect there are more cases like this when combined with repeated
> > open/<do_something>/close operations on a file that is being
> > written, but the patches address only the ones I just described.
> > The test script to reproduce them is below. Fragmentation
> > reduction results are in the commit descriptions. It's been running
> > through fstests for a couple of hours now, and no issues have been
> > noticed yet.
> >
> > FWIW, I suspect we need to have a good hard think about whether we
> > should be trimming EOF blocks on close by default, or whether we
> > should only be doing it in very limited situations....
> >
> > Comments, thoughts, flames welcome.
> >
> > -Dave.
> >
> >
> > #!/bin/bash
> > #
> > # Test 1
>
> Can you please turn these into fstests to cause the maintainer maximal
> immediate pain^W^W^Wmake everyone pay attention^W^W^W^Westablish a basis
> for regression testing and finding whatever other problems we can find
> from digging deeper? :)

I will, but not today - I only understood the cause well enough to
write a prototype reproducer about 4 hours ago. The rest of the time
since then has been fixing the issues and running smoke tests. My
brain is about fried now....

FWIW, I think the scope of the problem is quite widespread -
anything that does open/something/close repeatedly on a file that is
being written to with O_DSYNC or O_DIRECT appending writes will kill
the post-eof extent size hint allocated space. That's why I suspect
we need to think about not trimming by default and trying to
enumerate only the cases that need to trim eof blocks.

e.g. I closed the O_RDONLY case, but O_RDWR/read/close in a loop
will still trigger removal of post EOF extent size hint
preallocation and hence severe fragmentation.
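
For anyone wanting to poke at this before a proper fstest exists, the
access pattern described above can be approximated with a few lines of
shell. This is only a rough sketch and not the reproducer from the
patch series: the function names, chunk size and loop counts are made
up for illustration, and it assumes GNU dd (for oflag= and
status=none).

```shell
#!/bin/bash
# Hypothetical sketch of the workload: synchronous appending writes to a
# file, interleaved with open(O_RDONLY)/read 28 bytes/close cycles.

append_sync() {
    # $1 = file, $2 = number of bytes to append
    # oflag=append,sync makes dd open the file with O_APPEND|O_SYNC;
    # conv=notrunc keeps the data that is already there.
    dd if=/dev/zero of="$1" bs="$2" count=1 \
        oflag=append,sync conv=notrunc status=none
}

read_header() {
    # $1 = file
    # One open/read/close cycle: read 28 bytes from offset zero.
    dd if="$1" of=/dev/null bs=28 count=1 status=none
}

run_workload() {
    # $1 = file, $2 = appends, $3 = read cycles per append, $4 = chunk bytes
    w=0
    while [ "$w" -lt "$2" ]; do
        append_sync "$1" "$4"
        r=0
        while [ "$r" -lt "$3" ]; do
            read_header "$1"
            r=$((r + 1))
        done
        w=$((w + 1))
    done
}
```

Something like "run_workload /mnt/scratch/logfile 1000 5 4096" (path
and numbers are placeholders) on an XFS filesystem, after setting an
extent size hint on the still-empty file with xfs_io -c "extsize 1m",
should presumably show the effect when the extent counts from
xfs_bmap -v are compared with and without the patches.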
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com