Re: [PATCH 5/6] iomap: Lift blocksize restriction on atomic writes

public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	Theodore Ts'o <tytso@mit.edu>,
	John Garry <john.g.garry@oracle.com>,
	linux-ext4@vger.kernel.org, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 5/6] iomap: Lift blocksize restriction on atomic writes
Date: Mon, 4 Nov 2024 16:09:56 -0800	[thread overview]
Message-ID: <20241105000956.GJ21832@frogsfrogsfrogs> (raw)
In-Reply-To: <Zygo6nqOJMoJxYrm@dread.disaster.area>

On Mon, Nov 04, 2024 at 12:52:42PM +1100, Dave Chinner wrote:
> On Thu, Oct 31, 2024 at 02:36:40PM -0700, Darrick J. Wong wrote:
> > On Sat, Oct 26, 2024 at 10:05:44AM +0530, Ritesh Harjani wrote:
> > > > This gets me to the third and much less general solution -- only allow
> > > > untorn writes if we know that the ioend only ever has to run a single
> > > > transaction.  That's why untorn writes are limited to a single fsblock
> > > > for now -- it's a simple solution so that we can get our downstream
> > > > customers to kick the tires and start on the next iteration instead of
> > > > spending years on waterfalling.
> > > >
> > > > Did you notice that in all of these cases, the capabilities of the
> > > > filesystem's ioend processing determines the restrictions on the number
> > > > and type of mappings that ->iomap_begin can give to iomap?
> > > >
> > > > Now that we have a second system trying to hook up to the iomap support,
> > > > it's clear to me that the restrictions on mappings are specific to each
> > > > filesystem.  Therefore, the iomap directio code should not impose
> > > > restrictions on the mappings it receives unless they would prevent the
> > > > creation of the single aligned bio.
> > > >
> > > > Instead, xfs_direct_write_iomap_begin and ext4_iomap_begin should return
> > > > EINVAL or something if they look at the file mappings and discover that
> > > > they cannot perform the ioend without risking torn mapping updates.  In
> > > > the long run, ->iomap_begin is where this iomap->len <= iter->len check
> > > > really belongs, but hold that thought.
> > > >
> > > > For the multi fsblock case, the ->iomap_begin functions would have to
> > > > check that only one metadata update would be necessary in the ioend.
> > > > That's where things get murky, since ext4/xfs drop their mapping locks
> > > > between calls to ->iomap_begin.  So you'd have to check all the mappings
> > > > for unsupported mixed state every time.  Yuck.
> > > >
> > > 
> > > Thanks Darrick for taking time summarizing what all has been done
> > > and your thoughts here.
> > > 
> > > > It might be less gross to retain the restriction that iomap accepts only
> > > > one mapping for the entire file range, like Ritesh has here.
> > > 
> > > less gross :) sure. 
> > > 
> > > I would like to think of this as, being less restrictive (compared to
> > > only allowing a single fsblock) by adding a constraint on the atomic
> > > write I/O request i.e.  
> > > 
> > > "Atomic write I/O request to a region in a file is only allowed if that
> > > region has no partially allocated extents. Otherwise, the file system
> > > can fail the I/O operation by returning -EINVAL."
> > > 
> > > Essentially by adding this constraint to the I/O request, we are
> > > helping the user to prevent atomic writes from accidentally getting
> > > torned and also allowing multi-fsblock writes. So I still think that
> > > might be the right thing to do here or at least a better start. FS can
> > > later work on adding such support where we don't even need above
> > > such constraint on a given atomic write I/O request.
> > 
> > On today's ext4 call, Ted and Ritesh and I realized that there's a bit
> > more to it than this -- it's not possible to support untorn writes to a
> > mix of written/(cow,unwritten) mappings even if they all point to the
> > same physical space.  If the system fails after the storage device
> > commits the write but before any of the ioend processing is scheduled, a
> > subsequent read of the previously written blocks will produce the new
> > data, but reads to the other areas will produce the old contents (or
> > zeroes, or whatever).  That's a torn write.
> 
> I'm *really* surprised that people are only realising that IO
> completion processing for atomic writes *must be atomic*.

I've been saying for a while; it's just that I didn't realize until last
week that there were more rules than "can't do an untorn write if you
need to do more than 1 mapping update":

Untorn writes are not possible if:
1. More than 1 mapping update is needed
2. 1 mapping update is needed but there's a written block

(1) can be worked around with a log intent item for ioend processing,
but I don't think (2) can at all.  That's why I went back to saying that
untorn writes require that there can only be one mapping.

> This was the foundational constraint that the forced alignment
> proposal for XFS was intended to address. i.e. to prevent fs
> operations from violating atomic write IO constraints (e.g. punching
> sub-atomic write size holes in the file) so that the physical IO can
> be done without tearing and the IO completion processing that
> exposes the new data can be done atomically.
> 
> > Therefore, iomap ought to stick to requiring that ->iomap_begin returns
> > a single iomap to cover the entire file range for the untorn write.  For
> > an unwritten extent, the post-recovery read will see either zeroes or
> > the new contents; for a single-mapping COW it'll see old or new contents
> > but not both.
> 
> I'm pretty sure we enforced that in the XFS mapping implemention for
> atomic writes using forced alignment. i.e.  we have to return a
> correctly aligned, contiguous mapping to iomap or we have to return
> -EINVAL to indicate atomic write mapping failed.

I think you guys did, it's just the ext4 bigalloc thing that started
this back up again. :/

> Yes, we can check this in iomap, but it's really the filesystem that
> has to implement and enforce it...

I think we ought to keep the "1 mapping per untorn write" check in iomap
until someone decides that it's worth the trouble to make a filesystem
handle mixed states correctly.  Mostly as a guard against the
implementations.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

next prev parent reply	other threads:[~2024-11-05  0:09 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25  3:45 [PATCH 0/6] ext4: Add atomic write support for DIO Ritesh Harjani (IBM)
2024-10-25  3:45 ` [PATCH 1/6] ext4: Add statx support for atomic writes Ritesh Harjani (IBM)
2024-10-25  9:41   ` John Garry
2024-10-25 10:08     ` Ritesh Harjani
2024-10-25 16:09       ` Darrick J. Wong
2024-10-25 17:45         ` Ritesh Harjani
2024-10-25  3:45 ` [PATCH 2/6] ext4: Check for atomic writes support in write iter Ritesh Harjani (IBM)
2024-10-25  9:44   ` John Garry
2024-10-25 10:33     ` Ritesh Harjani
2024-10-25 16:11       ` Darrick J. Wong
2024-10-25 17:50         ` Ritesh Harjani
2024-10-25  3:45 ` [PATCH 3/6] ext4: Support setting FMODE_CAN_ATOMIC_WRITE Ritesh Harjani (IBM)
2024-10-25  3:45 ` [PATCH 4/6] ext4: Warn if we ever fallback to buffered-io for DIO atomic writes Ritesh Harjani (IBM)
2024-10-25 16:16   ` Darrick J. Wong
2024-10-25 17:51     ` Ritesh Harjani
2024-10-27 22:26   ` Dave Chinner
2024-10-28  1:09     ` Ritesh Harjani
2024-10-28  5:26       ` Dave Chinner
2024-10-28  8:43         ` Ritesh Harjani
2024-10-28 18:14         ` Ritesh Harjani
2024-10-29 22:29           ` Dave Chinner
2024-10-29 23:51             ` Ritesh Harjani
2024-10-25  3:45 ` [PATCH 5/6] iomap: Lift blocksize restriction on " Ritesh Harjani (IBM)
2024-10-25  8:52   ` John Garry
2024-10-25  9:31     ` Ritesh Harjani
2024-10-25  9:59       ` John Garry
2024-10-25 10:35         ` Ritesh Harjani
2024-10-25 11:07           ` John Garry
2024-10-25 11:19             ` Ritesh Harjani
2024-10-25 12:23               ` John Garry
2024-10-25 12:36                 ` Ritesh Harjani
2024-10-25 14:04                   ` John Garry
2024-10-25 14:13                     ` Ritesh Harjani
2024-10-25 18:28                       ` Darrick J. Wong
2024-10-26  4:35                         ` Ritesh Harjani
2024-10-31 21:36                           ` Darrick J. Wong
2024-11-04  1:52                             ` Dave Chinner
2024-11-05  0:09                               ` Darrick J. Wong [this message]
2024-10-25  3:45 ` [PATCH 6/6] ext4: Add atomic write support for bigalloc Ritesh Harjani (IBM)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241105000956.GJ21832@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox