public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: John Garry <john.g.garry@oracle.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	brauner@kernel.org, hch@lst.de, viro@zeniv.linux.org.uk,
	jack@suse.cz, cem@kernel.org, linux-fsdevel@vger.kernel.org,
	dchinner@redhat.com, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, ojaswin@linux.ibm.com,
	ritesh.list@gmail.com, martin.petersen@oracle.com,
	linux-ext4@vger.kernel.org, linux-block@vger.kernel.org,
	catherine.hoang@oracle.com
Subject: Re: [PATCH v6 11/12] xfs: add xfs_compute_atomic_write_unit_max()
Date: Thu, 10 Apr 2025 08:49:08 +1000
Message-ID: <Z_b5ZK8H0pK0Saga@dread.disaster.area>
In-Reply-To: <ed53dc33-c811-4c20-8713-8d2d32cb81d7@oracle.com>

On Wed, Apr 09, 2025 at 09:15:23AM +0100, John Garry wrote:
> On 09/04/2025 06:30, Dave Chinner wrote:
> > > This is why I don't agree with adding a static 16MB limit -- we clearly
> > > don't need it to emulate current hardware, which can commit up to 64k
> > > atomically.  Future hardware can increase that by 64x and we'll still be
> > > ok with using the existing tr_write transaction type.
> > > 
> > > By contrast, adding a 16MB limit would result in a much larger minimum
> > > log size.  If we add that to struct xfs_trans_resv for all filesystems
> > > then we run the risk of some ancient filesystem with a 12M log
> > > suddenly failing to mount on a new kernel.
> > > 
> > > I don't see the point.
> > You've got stuck on the example size of 16MB I gave, not
> > the actual reason I gave that example.
> 
> You did provide a relatively large value in 16MB. When I say relative, I
> mean relative to what can be achieved with HW offload today.
> 
> The target user we see for this feature is DBs, and they want to do writes
> in the 16/32/64KB size range. Indeed, these are the sort of sizes we see
> supported in terms of disk atomic write support today.

The target user I see for RWF_ATOMIC write is applications
overwriting files safely (e.g. config files, documents, etc).

This requires an atomic write operation that is large enough to
overwrite the file entirely in one go.

i.e. we need to think about how RWF_ATOMIC is applicable to the
entire userspace ecosystem, not just a narrow database specific
niche. Databases really want atomic writes to avoid the need for
WAL, whereas application developers keep asking us for safe file
overwrite without fsync() for arbitrarily sized files and IO.

> Furthermore, they (DBs) want fast and predictable performance which HW
> offload provides. They do not want to use a slow software-based solution.
> Such a software-based solution will always be slower, as we need to deal
> with block alloc/de-alloc and extent remapping for every write.

"slow" is relative to the use case for atomic writes.

> So are there people who really want very large atomic write support and will
> tolerate slow performance, i.e. slower than what can be achieved with
> double-write buffer or some other application logging?

Large atomic write support solves the O_PONIES problem, which is
fundamentally a performance problem w.r.t. ensuring data integrity.
I'll quote myself when you asked this exact same question back about
4 months ago:

| "At this point we actually provide app developers with what they've
| been repeatedly asking kernel filesystem engineers to provide them
| for the past 20 years: a way of overwriting arbitrary file data
| safely without needing an expensive fdatasync operation on every
| file that gets modified.
| 
| Put simply: atomic writes have a huge potential to fundamentally
| change the way applications interact with Linux filesystems and to
| make it *much* simpler for applications to safely overwrite user
| data.  Hence there is an imperative here to make the foundational
| support for this technology solid and robust because atomic writes
| are going to be with us for the next few decades..."

https://lwn.net/Articles/1001770/
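For comparison, the pattern applications rely on today for a safe whole-file replacement is: write to a temporary file, fdatasync() it, then rename() it over the original. This is a minimal sketch under simple assumptions (a "<path>.tmp" name that is free to use, no EINTR retry, no fsync of the parent directory). safe_overwrite() is a hypothetical helper, not an existing API. The fdatasync() call is the per-file cost referred to in the quote above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of `path` with `buf` so that
 * a crash leaves either the old or the new file, never a torn mix.
 * Returns 0 on success, -1 on error.
 */
int safe_overwrite(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Write the new contents in full to the temporary file. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0)
            goto fail_close;
        p += n;
        left -= (size_t)n;
    }

    /* Make the data durable before it becomes visible under `path`. */
    if (fdatasync(fd) < 0)
        goto fail_close;
    if (close(fd) < 0)
        goto fail_unlink;

    /* Atomically swap the name; a fully durable version would also
     * fsync the parent directory here. */
    if (rename(tmp, path) < 0)
        goto fail_unlink;
    return 0;

fail_close:
    close(fd);
fail_unlink:
    unlink(tmp);
    return -1;
}
```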

-Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 29+ messages
2025-04-08 10:41 [PATCH v6 00/12] large atomic writes for xfs John Garry
2025-04-08 10:41 ` [PATCH v6 01/12] fs: add atomic write unit max opt to statx John Garry
2025-04-09  2:23   ` Darrick J. Wong
2025-04-09 10:45   ` Christoph Hellwig
2025-04-08 10:41 ` [PATCH v6 02/12] xfs: add helpers to compute log item overhead John Garry
2025-04-08 22:50   ` Dave Chinner
2025-04-08 23:21     ` Darrick J. Wong
2025-04-09  2:25   ` [PATCH v6.1 " Darrick J. Wong
2025-04-09  2:25   ` [PATCH v6.1 RFC 02.1/12] xfs: add helpers to compute transaction reservation for finishing intent items Darrick J. Wong
2025-04-08 10:42 ` [PATCH v6 03/12] xfs: rename xfs_inode_can_atomicwrite() -> xfs_inode_can_hw_atomicwrite() John Garry
2025-04-09  2:02   ` Darrick J. Wong
2025-04-09 10:46   ` Christoph Hellwig
2025-04-08 10:42 ` [PATCH v6 04/12] xfs: allow block allocator to take an alignment hint John Garry
2025-04-08 10:42 ` [PATCH v6 05/12] xfs: refactor xfs_reflink_end_cow_extent() John Garry
2025-04-08 10:42 ` [PATCH v6 06/12] xfs: refine atomic write size check in xfs_file_write_iter() John Garry
2025-04-08 10:42 ` [PATCH v6 07/12] xfs: add xfs_atomic_write_cow_iomap_begin() John Garry
2025-04-08 10:42 ` [PATCH v6 08/12] xfs: add large atomic writes checks in xfs_direct_write_iomap_begin() John Garry
2025-04-08 10:42 ` [PATCH v6 09/12] xfs: commit CoW-based atomic writes atomically John Garry
2025-04-08 10:42 ` [PATCH v6 10/12] xfs: add xfs_file_dio_write_atomic() John Garry
2025-04-08 10:42 ` [PATCH v6 11/12] xfs: add xfs_compute_atomic_write_unit_max() John Garry
2025-04-08 21:28   ` Darrick J. Wong
2025-04-08 22:47   ` Dave Chinner
2025-04-09  0:41     ` Darrick J. Wong
2025-04-09  5:30       ` Dave Chinner
2025-04-09  8:15         ` John Garry
2025-04-09 22:49           ` Dave Chinner [this message]
2025-04-10  8:58             ` John Garry
2025-04-09 23:46         ` Darrick J. Wong
2025-04-08 10:42 ` [PATCH v6 12/12] xfs: update atomic write limits John Garry
