public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: brauner@kernel.org, cem@kernel.org, dchinner@redhat.com,
	hch@lst.de, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	ojaswin@linux.ibm.com, ritesh.list@gmail.com,
	martin.petersen@oracle.com
Subject: Re: [PATCH RFC 06/10] xfs: iomap CoW-based atomic write support
Date: Thu, 6 Feb 2025 13:44:17 -0800	[thread overview]
Message-ID: <20250206214417.GW21808@frogsfrogsfrogs> (raw)
In-Reply-To: <58f630a4-3e02-451c-bd6e-22427cec5c11@oracle.com>

On Thu, Feb 06, 2025 at 11:10:40AM +0000, John Garry wrote:
> On 05/02/2025 20:05, Darrick J. Wong wrote:
> > On Tue, Feb 04, 2025 at 12:01:23PM +0000, John Garry wrote:
> > > In cases of an atomic write occurs for misaligned or discontiguous disk
> > > blocks, we will use a CoW-based method to issue the atomic write.
> > > 
> > > So, for that case, return -EAGAIN to request that the write be issued in
> > > CoW atomic write mode. The dio write path should detect this, similar to
> > > how misaligned regalar DIO writes are handled.
> > > 
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > ---
> > >   fs/xfs/xfs_iomap.c | 68 ++++++++++++++++++++++++++++++++++++++++++++--
> > >   1 file changed, 66 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index ae3755ed00e6..2c2867d728e4 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -809,9 +809,12 @@ xfs_direct_write_iomap_begin(
> > >   	struct xfs_bmbt_irec	imap, cmap;
> > >   	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > >   	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
> > > +	bool			atomic = flags & IOMAP_ATOMIC;
> > >   	int			nimaps = 1, error = 0;
> > >   	bool			shared = false;
> > > +	bool			found = false;
> > >   	u16			iomap_flags = 0;
> > > +	bool			need_alloc;
> > >   	unsigned int		lockmode;
> > >   	u64			seq;
> > > @@ -832,7 +835,7 @@ xfs_direct_write_iomap_begin(
> > >   	 * COW writes may allocate delalloc space or convert unwritten COW
> > >   	 * extents, so we need to make sure to take the lock exclusively here.
> > >   	 */
> > > -	if (xfs_is_cow_inode(ip))
> > > +	if (xfs_is_cow_inode(ip) || atomic)
> > >   		lockmode = XFS_ILOCK_EXCL;
> > >   	else
> > >   		lockmode = XFS_ILOCK_SHARED;
> > > @@ -857,12 +860,73 @@ xfs_direct_write_iomap_begin(
> > >   	if (error)
> > >   		goto out_unlock;
> > > +
> > > +	if (flags & IOMAP_ATOMIC_COW) {
> > > +		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> > > +				&lockmode,
> > > +				(flags & IOMAP_DIRECT) || IS_DAX(inode), true);
> > 
> > Weird nit not relate to this patch: Is there ever a case where
> > IS_DAX(inode) and (flags & IOMAP_DAX) disagree?  I wonder if this odd
> > construction could be simplified to:
> > 
> > 	(flags & (IOMAP_DIRECT | IOMAP_DAX))
> 
> I'm not sure. I assume that we always want to convert for DAX, and IOMAP_DAX
> may not be set always for DIO path - but I only see xfs_file_write_iter() ->
> xfs_file_dax_write() ->dax_iomap_rw(xfs_dax_write_iomap_ops), which sets
> IOMAP_DAX in iomap_iter.flags
> 
> > 
> > ?
> > 
> > > +		if (error)
> > > +			goto out_unlock;
> > > +
> > > +		end_fsb = imap.br_startoff + imap.br_blockcount;
> > > +		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
> > > +
> > > +		if (imap.br_startblock != HOLESTARTBLOCK) {
> > > +			seq = xfs_iomap_inode_sequence(ip, 0);
> > > +
> > > +			error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags,
> > > +				iomap_flags | IOMAP_F_ATOMIC_COW, seq);
> > > +			if (error)
> > > +				goto out_unlock;
> > > +		}
> > > +		seq = xfs_iomap_inode_sequence(ip, 0);
> > > +		xfs_iunlock(ip, lockmode);
> > > +		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
> > > +					iomap_flags | IOMAP_F_ATOMIC_COW, seq);
> > > +	}
> > 
> > /me wonders if this should be a separate helper so that the main
> > xfs_direct_write_iomap_begin doesn't get even longer... but otherwise
> > the logic in here looks sane.
> 
> I can do that. Maybe some code can be factored out for regular "found cow
> path".
> 
> > 
> > > +
> > > +	need_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
> > > +
> > > +	if (atomic) {
> > > +		/* Use CoW-based method if any of the following fail */
> > > +		error = -EAGAIN;
> > > +
> > > +		/*
> > > +		 * Lazily use CoW-based method for initial alloc of data.
> > > +		 * Check br_blockcount for FSes which do not support atomic
> > > +		 * writes > 1x block.
> > > +		 */
> > > +		if (need_alloc && imap.br_blockcount > 1)
> > > +			goto out_unlock;
> > > +
> > > +		/* Misaligned start block wrt size */
> > > +		if (!IS_ALIGNED(imap.br_startblock, imap.br_blockcount))
> > > +			goto out_unlock;
> > > +
> > > +		/* Discontiguous or mixed extents */
> > > +		if (!imap_spans_range(&imap, offset_fsb, end_fsb))
> > > +			goto out_unlock;
> > > +	}
> > 
> > (Same two comments here.)
> 
> ok
> 
> > 
> > > +
> > >   	if (imap_needs_cow(ip, flags, &imap, nimaps)) {
> > >   		error = -EAGAIN;
> > >   		if (flags & IOMAP_NOWAIT)
> > >   			goto out_unlock;
> > > +		if (atomic) {
> > > +			/* Detect whether we're already covered in a cow fork */
> > > +			error  = xfs_find_trim_cow_extent(ip, &imap, &cmap, &shared, &found);
> > > +			if (error)
> > > +				goto out_unlock;
> > > +
> > > +			if (shared) {
> > > +				error = -EAGAIN;
> > > +				goto out_unlock;
> > 
> > What is this checking?  That something else already created a mapping in
> > the COW fork, so we want to bail out to get rid of it?
> 
> I want to check if some data is shared. In that case, we should unshare.

Why is it necessary to unshare?  Userspace gave us a buffer of new
contents, and we're already prepared to write that out of place and
remap it.

> And I am not sure if that check is sufficient.
> 
> On the buffered write path, we may have something in a CoW fork - in that
> case it should be flushed, right?

Flushed against what?  Concurrent writeback or something?  The directio
setup should have flushed dirty pagecache, so the only things left in
the COW fork are speculative preallocations.  (IOWs, I don't understand
what needs to be flushed or why.)

--D

> 
> Thanks,
> John
> 

  reply	other threads:[~2025-02-06 21:44 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-04 12:01 [PATCH RFC 00/10] large atomic writes for xfs with CoW John Garry
2025-02-04 12:01 ` [PATCH RFC 01/10] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
2025-02-04 12:01 ` [PATCH RFC 02/10] xfs: Refactor xfs_reflink_end_cow_extent() John Garry
2025-02-05 19:50   ` Darrick J. Wong
2025-02-06 10:35     ` John Garry
2025-02-06 21:38       ` Darrick J. Wong
2025-02-04 12:01 ` [PATCH RFC 03/10] iomap: Support CoW-based atomic writes John Garry
2025-02-05 20:11   ` Darrick J. Wong
2025-02-06 11:21     ` John Garry
2025-02-06 21:40       ` Darrick J. Wong
2025-02-04 12:01 ` [PATCH RFC 04/10] xfs: Make xfs_find_trim_cow_extent() public John Garry
2025-02-04 12:01 ` [PATCH RFC 05/10] xfs: Reflink CoW-based atomic write support John Garry
2025-02-04 12:01 ` [PATCH RFC 06/10] xfs: iomap " John Garry
2025-02-05 20:05   ` Darrick J. Wong
2025-02-06 11:10     ` John Garry
2025-02-06 21:44       ` Darrick J. Wong [this message]
2025-02-07 11:48         ` John Garry
2025-02-04 12:01 ` [PATCH RFC 07/10] xfs: Add xfs_file_dio_write_atomic() John Garry
2025-02-05 19:55   ` Darrick J. Wong
2025-02-06 10:43     ` John Garry
2025-02-10 16:59   ` John Garry
2025-02-04 12:01 ` [PATCH RFC 08/10] xfs: Commit CoW-based atomic writes atomically John Garry
2025-02-05 19:47   ` Darrick J. Wong
2025-02-06 10:27     ` John Garry
2025-02-06 21:50       ` Darrick J. Wong
2025-02-07 11:52         ` John Garry
2025-02-04 12:01 ` [PATCH RFC 09/10] xfs: Update atomic write max size John Garry
2025-02-05 19:41   ` Darrick J. Wong
2025-02-06  9:15     ` John Garry
2025-02-06 21:54       ` Darrick J. Wong
2025-02-07 11:53         ` John Garry
2025-02-04 12:01 ` [PATCH RFC 10/10] xfs: Allow block allocator to take an alignment hint John Garry
2025-02-05 19:20   ` Darrick J. Wong
2025-02-06  8:10     ` John Garry
2025-02-06 21:54       ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250206214417.GW21808@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=brauner@kernel.org \
    --cc=cem@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=john.g.garry@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox