From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
chandanbabu@kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/5] xfs: don't treat append-only files as having preallocations
Date: Mon, 17 Jun 2024 15:03:28 +1000 [thread overview]
Message-ID: <Zm/DoN5npLCd+Y/n@dread.disaster.area> (raw)
In-Reply-To: <20240613082855.GA22403@lst.de>
On Thu, Jun 13, 2024 at 10:28:55AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 13, 2024 at 04:03:53PM +1000, Dave Chinner wrote:
> > I disagree, there was a very good reason for this behaviour:
> > preventing append-only log files from getting excessively fragmented
> > because speculative prealloc would get removed on close().
>
> Where is that very clear intent documented? Not in the original
> commit message (which is very sparse) and no where in any documentation
> I can find.
We've lost all the internal SGI bug databases, so there's little to
know evidence I can point at. But at the time, it was a well known
problem amongst Irix XFS engineers that append-only log files would
regularly get horribly fragmented.
There'd been several escalations over that behaviour over the years
w.r.t. large remote servers (think of facilities that "don't trust
the logs on client machines because they might be compromised"). In
general, the fixes for these applications tended to require the
loggin server application to use F_RESVSP to do the append-only log
file initialisation. That got XFS_DIFLAG_PREALLOC set on the files,
so then anything allocated by appending writes beyond EOF was left
alone. That small change was largely sufficient to mitigate worst
case log file fragmentation on Irix-XFS.
So when adding a flag on disk for Linux-XFS to say "this is an
append only file" it made lots of sense to make it behave like
XFS_DIFLAG_PREALLOC had already been set on the inode without
requring the application to do anything to set that up.
I'll note that the patches sent to the list by Ethan Benson to
originally implement XFS_DIFLAG_APPEND (and others) is not exactly
what was committed in this commit:
https://marc.info/?l=linux-xfs&m=106360278223548&w=2
The last version posted on the list was this:
https://marc.info/?l=linux-xfs&m=106109662212214&w=2
but the version committed had lots of things renamed, sysctls for
sync and nodump inheritance and other bits and pieces including
the EOF freeing changes to skip if DIFLAG_APPEND was set.
It is clear that there was internal SGI discussion, modification and
review of the original proposed patch set, and none of that internal
discussion is on open mailing lists. We might have the historical
XFS code and Linux mailing list archives, but that doesn't always
tell us what institutional knowledge was behind subtle changes to
publicly proposed patches like this....
> > i.e. applications that slowly log messages to append only files
> > with the pattern open(O_APPEND); write(a single line to the log);
> > close(); caused worst case file fragmentation because the close()
> > always removed the speculative prealloc beyond EOF.
>
> That case should be covered by the XFS_IDIRTY_RELEASE, at least
> except for O_SYNC workloads.
Ah, so I fixed the problem independently 7 or 8 years later to fix
Linux NFS server performance issues. Ok, that makes removing the
flag less bad, but I still don't see the harm in keeping it there
given that behaviour has existed for the past 20 years....
> > The fix for this pessimisitic XFS behaviour is for the application
> > to use chattr +A (like they would for ext3/4) hence triggering the
> > existence of XFS_DIFLAG_APPEND and that avoided the removal
> > speculative delalloc removed when the file is closed. hence the
> > fragmentation problems went away.
>
> For ext4 the EXT4_APPEND_FL flag does not cause any difference
> in allocation behavior.
Sure, but ext4 doesn't have speculative preallocation beyond EOF to
prevent fragmentation, either.
> For the historic ext2 driver it apparently
> did just, with an XXX comment marking this as a bug, but for ext3 it
> also never did looking back quite a bit in history.
Ditto - when the filesystem isn't allocating anything beyond EOF,
there's little point in trying to removing blocks beyond EOF that
can't exist on final close()...
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2024-06-17 5:03 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-12 17:46 [PATCHSET] xfs: random fixes for 6.10 Darrick J. Wong
2024-06-12 17:46 ` [PATCH 1/5] xfs: don't treat append-only files as having preallocations Darrick J. Wong
2024-06-13 6:03 ` Dave Chinner
2024-06-13 8:28 ` Christoph Hellwig
2024-06-17 5:03 ` Dave Chinner [this message]
2024-06-17 6:46 ` Christoph Hellwig
2024-06-17 23:28 ` Dave Chinner
2024-06-12 17:47 ` [PATCH 2/5] xfs: fix freeing speculative preallocations for preallocated files Darrick J. Wong
2024-06-12 17:47 ` [PATCH 3/5] xfs: restrict when we try to align cow fork delalloc to cowextsz hints Darrick J. Wong
2024-06-13 5:06 ` Christoph Hellwig
2024-06-14 4:13 ` Darrick J. Wong
2024-06-14 4:41 ` Christoph Hellwig
2024-06-14 5:27 ` Darrick J. Wong
2024-06-14 5:30 ` Christoph Hellwig
2024-06-12 17:47 ` [PATCH 4/5] xfs: allow unlinked symlinks and dirs with zero size Darrick J. Wong
2024-06-13 4:57 ` Christoph Hellwig
2024-06-12 17:47 ` [PATCH 5/5] xfs: verify buffer, inode, and dquot items every tx commit Darrick J. Wong
2024-06-13 5:07 ` Christoph Hellwig
2024-06-13 7:04 ` Dave Chinner
2024-06-14 3:49 ` Darrick J. Wong
2024-06-14 4:42 ` Christoph Hellwig
2024-06-14 5:23 ` Darrick J. Wong
2024-06-18 0:18 ` [PATCH v1.1 " Darrick J. Wong
2024-06-18 6:38 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zm/DoN5npLCd+Y/n@dread.disaster.area \
--to=david@fromorbit.com \
--cc=chandanbabu@kernel.org \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox