Re: [f2fs-dev] [PATCH v4 03/22] xfs: Use extent size granularity for iomap->io_block_size

From: John Garry via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: ritesh.list@gmail.com, gfs2@lists.linux.dev,
	mikulas@artax.karlin.mff.cuni.cz, hch@lst.de,
	agruenba@redhat.com, miklos@szeredi.hu,
	linux-ext4@vger.kernel.org, catherine.hoang@oracle.com,
	linux-block@vger.kernel.org, viro@zeniv.linux.org.uk,
	dchinner@redhat.com, axboe@kernel.dk, brauner@kernel.org,
	tytso@mit.edu, martin.petersen@oracle.com,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, mcgrof@kernel.org, jack@suse.com,
	linux-fsdevel@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-btrfs@vger.kernel.org, chandan.babu@oracle.com
Subject: Re: [f2fs-dev] [PATCH v4 03/22] xfs: Use extent size granularity for iomap->io_block_size
Date: Thu, 13 Jun 2024 12:13:45 +0100
Message-ID: <a7caf7f2-837d-4cfd-afd0-123a99f6fee5@oracle.com>
In-Reply-To: <20240612214729.GL2764752@frogsfrogsfrogs>

On 12/06/2024 22:47, Darrick J. Wong wrote:
> On Fri, Jun 07, 2024 at 02:39:00PM +0000, John Garry wrote:
>> Currently iomap->io_block_size is set to the i_blocksize() value for the
>> inode.
>>
>> Expand the sub-fs block size zeroing to now cover RT extents, by
>> setting iomap->io_block_size to xfs_inode_alloc_unitsize().
>>
>> In xfs_iomap_write_unwritten(), update the unwritten range fsb to cover
>> this extent granularity.
>>
>> In xfs_file_dio_write(), handle a write which is not aligned to extent
>> size granularity as unaligned. Since the extent size granularity need not
>> be a power-of-2, handle this also.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/xfs_file.c  | 24 +++++++++++++++++++-----
>>   fs/xfs/xfs_inode.c | 17 +++++++++++------
>>   fs/xfs/xfs_inode.h |  1 +
>>   fs/xfs/xfs_iomap.c |  8 +++++++-
>>   4 files changed, 38 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index b240ea5241dc..24fe3c2e03da 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -601,7 +601,7 @@ xfs_file_dio_write_aligned(
>>   }
>>   
>>   /*
>> - * Handle block unaligned direct I/O writes
>> + * Handle unaligned direct IO writes.
>>    *
>>    * In most cases direct I/O writes will be done holding IOLOCK_SHARED, allowing
>>    * them to be done in parallel with reads and other direct I/O writes.  However,
>> @@ -630,9 +630,9 @@ xfs_file_dio_write_unaligned(
>>   	ssize_t			ret;
>>   
>>   	/*
>> -	 * Extending writes need exclusivity because of the sub-block zeroing
>> -	 * that the DIO code always does for partial tail blocks beyond EOF, so
>> -	 * don't even bother trying the fast path in this case.
>> +	 * Extending writes need exclusivity because of the sub-block/extent
>> +	 * zeroing that the DIO code always does for partial tail blocks
>> +	 * beyond EOF, so don't even bother trying the fast path in this case.
> 
> Hummm.  So let's say the fsblock size is 4k, the rt extent size is 16k,
> and you want to write bytes 8192-12287 of a file.  Currently we'd use
> xfs_file_dio_write_aligned for that, but now we'd use
> xfs_file_dio_write_unaligned?  Even though we don't need zeroing or any
> of that stuff?

Right, this is something which I mentioned in response to the previous 
patch.

I am not sure whether we should do this only for atomic-write inodes, or
also for RT and forcealign-only inodes.

My impression from Dave's review of the previous version of this series
was that it should also cover RT and forcealign-only inodes.
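
To make the example above concrete, here is a minimal userspace sketch of the path selection with Darrick's numbers (4k fsblock, 16k rt extent, write of bytes 8192-12287). The names dio_path and pick_dio_path() are hypothetical and only model the mask test from the quoted hunk; they are not kernel code, and the sketch assumes a power-of-2 unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two XFS direct-write paths. */
enum dio_path { DIO_ALIGNED, DIO_UNALIGNED };

/*
 * Model of the mask test: the write is "aligned" only if both its
 * start offset and its byte count are multiples of unit.
 * Assumption: unit is a nonzero power of 2.
 */
static enum dio_path pick_dio_path(uint64_t pos, uint64_t count, uint64_t unit)
{
	assert(unit != 0 && (unit & (unit - 1)) == 0);
	return ((pos | count) & (unit - 1)) ? DIO_UNALIGNED : DIO_ALIGNED;
}
```

With unit = 4096 (the fsblock size) the write at pos 8192, count 4096 is classified as aligned; with unit = 16384 (the rt extent size) the same write is classified as unaligned.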

> 
>>   	 */
>>   	if (iocb->ki_pos > isize || iocb->ki_pos + count >= isize) {
>>   		if (iocb->ki_flags & IOCB_NOWAIT)
>> @@ -698,11 +698,25 @@ xfs_file_dio_write(
>>   	struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));
>>   	struct xfs_buftarg      *target = xfs_inode_buftarg(ip);
>>   	size_t			count = iov_iter_count(from);
>> +	bool			unaligned;
>> +	u64			unitsize;
>>   
>>   	/* direct I/O must be aligned to device logical sector size */
>>   	if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
>>   		return -EINVAL;
>> -	if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask)
>> +
>> +	unitsize = xfs_inode_alloc_unitsize(ip);
>> +	if (!is_power_of_2(unitsize)) {
>> +		if (isaligned_64(iocb->ki_pos, unitsize) &&
>> +		    isaligned_64(count, unitsize))
>> +			unaligned = false;
>> +		else
>> +			unaligned = true;
>> +	} else {
>> +		unaligned = (iocb->ki_pos | count) & (unitsize - 1);
>> +	}
> 
> Didn't I already write this?

It's from xfs_is_falloc_aligned(). Let's reuse that fully here. I did 
look at doing that before, though...
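
For reference, the check being discussed boils down to the following userspace sketch. range_aligned() is a hypothetical name, and plain % stands in for the kernel's 64-bit helpers; it is not the body of xfs_is_falloc_aligned(), only an illustration of handling a unit that need not be a power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if both pos and len are multiples of unit.
 * A power-of-2 unit can use a mask; any other unit needs a
 * remainder test on each value separately.
 */
static bool range_aligned(uint64_t pos, uint64_t len, uint64_t unit)
{
	assert(unit != 0);

	if ((unit & (unit - 1)) == 0)
		return ((pos | len) & (unit - 1)) == 0;

	return pos % unit == 0 && len % unit == 0;
}
```

For example, with a 12k unit (three 4k fsblocks), a 24k write at offset 12k is aligned, while a 4k write at offset 8k is not.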

> 
>> +	if (unaligned)
> 
> 	if (!xfs_is_falloc_aligned(ip, iocb->ki_pos, count))
> 
>>   		return xfs_file_dio_write_unaligned(ip, iocb, from);
>>   	return xfs_file_dio_write_aligned(ip, iocb, from);
>>   }
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 58fb7a5062e1..93ad442f399b 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -4264,15 +4264,20 @@ xfs_break_layouts(
>>   	return error;
>>   }
>>   
>> -/* Returns the size of fundamental allocation unit for a file, in bytes. */
> 
> Don't delete the comment, it has useful return type information.

It wasn't deleted, it is still below.

> 
> /*
>   * Returns the size of fundamental allocation unit for a file, in
>   * fsblocks.
>   */
> 
>>   unsigned int
>> -xfs_inode_alloc_unitsize(
>> +xfs_inode_alloc_unitsize_fsb(
>>   	struct xfs_inode	*ip)
>>   {
>> -	unsigned int		blocks = 1;
>> -
>>   	if (XFS_IS_REALTIME_INODE(ip))
>> -		blocks = ip->i_mount->m_sb.sb_rextsize;
>> +		return ip->i_mount->m_sb.sb_rextsize;
>> +
>> +	return 1;
>> +}
>>   
>> -	return XFS_FSB_TO_B(ip->i_mount, blocks);
>> +/* Returns the size of fundamental allocation unit for a file, in bytes. */
>> +unsigned int
>> +xfs_inode_alloc_unitsize(
>> +	struct xfs_inode	*ip)
>> +{
>> +	return XFS_FSB_TO_B(ip->i_mount, xfs_inode_alloc_unitsize_fsb(ip));
>>   }
>> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
>> index 292b90b5f2ac..90d2fa837117 100644
>> --- a/fs/xfs/xfs_inode.h
>> +++ b/fs/xfs/xfs_inode.h
>> @@ -643,6 +643,7 @@ int xfs_inode_reload_unlinked(struct xfs_inode *ip);
>>   bool xfs_ifork_zapped(const struct xfs_inode *ip, int whichfork);
>>   void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
>>   		xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
>> +unsigned int xfs_inode_alloc_unitsize_fsb(struct xfs_inode *ip);
>>   unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
>>   
>>   struct xfs_dir_update_params {
>> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>> index ecb4cae88248..fbe69f747e30 100644
>> --- a/fs/xfs/xfs_iomap.c
>> +++ b/fs/xfs/xfs_iomap.c
>> @@ -127,7 +127,7 @@ xfs_bmbt_to_iomap(
>>   	}
>>   	iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
>>   	iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
>> -	iomap->io_block_size = i_blocksize(VFS_I(ip));
>> +	iomap->io_block_size = xfs_inode_alloc_unitsize(ip);
> 
> Oh, I see.  So io_block_size causes iomap to write zeroes to the storage
> backing surrounding areas of the file range.

Yes.

> In this case, for direct
> writes to the unwritten middle 4k of an otherwise written 16k extent,
> we'll write zeroes to 0-4k and 8k-16k even though that wasn't what the
> caller asked for?

We would only do that for a newly allocated extent. We should not 
overwrite existing data.

> 
> IOWs, if you start with:
> 
> WWuW
> 
> write to the "U", then it'll write zeroes to the "W" areas?  That
> doesn't sound good...

No, that definitely should not happen.

We would zero only once, when doing a sub-extent-granule write to an
unallocated extent.

In iomap_dio_bio_iter(), we only zero for IOMAP_UNWRITTEN or IOMAP_F_NEW.
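
A rough userspace model of that condition follows. The struct and function names are hypothetical stand-ins, not the iomap definitions; the sketch only shows which byte ranges around a write would be padded with zeroes when the mapping is unwritten or newly allocated, for a given zeroing unit. It does not settle whether the WWuW case can reach this path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mapping state; not the kernel's types. */
struct mapping_model {
	bool unwritten;		/* models iomap->type == IOMAP_UNWRITTEN */
	bool newly_alloc;	/* models iomap->flags & IOMAP_F_NEW */
};

/* Byte ranges that would be zeroed before and after the write. */
struct zero_ranges {
	uint64_t head_start, head_len;
	uint64_t tail_start, tail_len;
};

static struct zero_ranges dio_zero_ranges(const struct mapping_model *m,
					  uint64_t pos, uint64_t len,
					  uint64_t zero_unit)
{
	struct zero_ranges z = { 0, 0, 0, 0 };
	uint64_t end = pos + len;

	assert(zero_unit != 0);

	/* Already-written mappings are never padded with zeroes. */
	if (!m->unwritten && !m->newly_alloc)
		return z;

	/* Pad from the start of the unit up to the write ... */
	z.head_start = pos - pos % zero_unit;
	z.head_len = pos - z.head_start;

	/* ... and from the end of the write to the end of the unit. */
	z.tail_start = end;
	z.tail_len = (end % zero_unit) ? zero_unit - end % zero_unit : 0;
	return z;
}
```

For a 4k write at offset 8k into a newly allocated 16k unit, this gives a head range of 0-8k and a tail range of 12k-16k; for a mapping that is neither unwritten nor new, both lengths are zero.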

> 
>>   	if (mapping_flags & IOMAP_DAX)
>>   		iomap->dax_dev = target->bt_daxdev;
>>   	else
>> @@ -577,11 +577,17 @@ xfs_iomap_write_unwritten(
>>   	xfs_fsize_t	i_size;
>>   	uint		resblks;
>>   	int		error;
>> +	unsigned int	rounding;
>>   
>>   	trace_xfs_unwritten_convert(ip, offset, count);
>>   
>>   	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>   	count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>> +	rounding = xfs_inode_alloc_unitsize_fsb(ip);
>> +	if (rounding > 1) {
>> +		offset_fsb = rounddown_64(offset_fsb, rounding);
>> +		count_fsb = roundup_64(count_fsb, rounding);
>> +	}
> 
> ...and then the ioend handler is supposed to be smart enough to know
> that iomap quietly wrote to other parts of the disk.

iomap_io_complete() only knows about the written data, not the zeroed
padding. I am not really changing that.

> 
> Um, does this cause unwritten extent conversion for entire rtextents
> after writeback to a rtextsize > 1fsb file?

Yes.
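
To illustrate the effect of the rounding in the hunk above, here is a small userspace sketch. The names fsb_range and unwritten_convert_range() are hypothetical; the sketch treats the second value as the end block, since the quoted line computes it from offset + count, and uses plain arithmetic in place of the XFS_B_TO_FSBT()/XFS_B_TO_FSB() and rounddown_64()/roundup_64() helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: [start_fsb, end_fsb) in filesystem blocks. */
struct fsb_range {
	uint64_t start_fsb;
	uint64_t end_fsb;
};

static struct fsb_range unwritten_convert_range(uint64_t offset, uint64_t count,
						uint32_t blocksize,
						uint32_t unit_fsb)
{
	struct fsb_range r;
	uint64_t end = offset + count;
	uint64_t rem;

	assert(blocksize != 0 && unit_fsb != 0);

	/* Truncate the start and round the end up to whole fsblocks. */
	r.start_fsb = offset / blocksize;
	r.end_fsb = (end + blocksize - 1) / blocksize;

	/* Widen to whole allocation units when a unit spans several blocks. */
	if (unit_fsb > 1) {
		r.start_fsb -= r.start_fsb % unit_fsb;
		rem = r.end_fsb % unit_fsb;
		if (rem)
			r.end_fsb += unit_fsb - rem;
	}
	return r;
}
```

With 4k blocks and a 4-block unit, a 4k write at byte offset 8192 covers fsblocks [2, 3) before rounding and [0, 4) after it, which is the whole-rtextent conversion confirmed above.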

> 
> Or am I really misunderstanding what's going on here with the io paths?

Thanks,
John


