From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: axboe@kernel.dk, tytso@mit.edu, dchinner@redhat.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.com,
chandan.babu@oracle.com, hch@lst.de, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net,
linux-fsdevel@vger.kernel.org, gfs2@lists.linux.dev,
linux-xfs@vger.kernel.org, catherine.hoang@oracle.com,
ritesh.list@gmail.com, mcgrof@kernel.org,
mikulas@artax.karlin.mff.cuni.cz, agruenba@redhat.com,
miklos@szeredi.hu, martin.petersen@oracle.com
Subject: Re: [PATCH v4 03/22] xfs: Use extent size granularity for iomap->io_block_size
Date: Wed, 12 Jun 2024 14:47:29 -0700
Message-ID: <20240612214729.GL2764752@frogsfrogsfrogs>
In-Reply-To: <20240607143919.2622319-4-john.g.garry@oracle.com>
On Fri, Jun 07, 2024 at 02:39:00PM +0000, John Garry wrote:
> Currently iomap->io_block_size is set to the i_blocksize() value for the
> inode.
>
> Expand the sub-fs block size zeroing to now cover RT extents, by
> setting iomap->io_block_size as xfs_inode_alloc_unitsize().
>
> In xfs_iomap_write_unwritten(), update the unwritten range fsb to cover
> this extent granularity.
>
> In xfs_file_dio_write(), handle a write which is not aligned to extent
> size granularity as unaligned. Since the extent size granularity need not
> be a power-of-2, handle this also.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> fs/xfs/xfs_file.c | 24 +++++++++++++++++++-----
> fs/xfs/xfs_inode.c | 17 +++++++++++------
> fs/xfs/xfs_inode.h | 1 +
> fs/xfs/xfs_iomap.c | 8 +++++++-
> 4 files changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index b240ea5241dc..24fe3c2e03da 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -601,7 +601,7 @@ xfs_file_dio_write_aligned(
> }
>
> /*
> - * Handle block unaligned direct I/O writes
> + * Handle unaligned direct IO writes.
> *
> * In most cases direct I/O writes will be done holding IOLOCK_SHARED, allowing
> * them to be done in parallel with reads and other direct I/O writes. However,
> @@ -630,9 +630,9 @@ xfs_file_dio_write_unaligned(
> ssize_t ret;
>
> /*
> - * Extending writes need exclusivity because of the sub-block zeroing
> - * that the DIO code always does for partial tail blocks beyond EOF, so
> - * don't even bother trying the fast path in this case.
> + * Extending writes need exclusivity because of the sub-block/extent
> + * zeroing that the DIO code always does for partial tail blocks
> + * beyond EOF, so don't even bother trying the fast path in this case.
Hummm. So let's say the fsblock size is 4k, the rt extent size is 16k,
and you want to write bytes 8192-12287 of a file. Currently we'd use
xfs_file_dio_write_aligned for that, but now we'd use
xfs_file_dio_write_unaligned? Even though we don't need zeroing or any
of that stuff?
> */
> if (iocb->ki_pos > isize || iocb->ki_pos + count >= isize) {
> if (iocb->ki_flags & IOCB_NOWAIT)
> @@ -698,11 +698,25 @@ xfs_file_dio_write(
> struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp));
> struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> size_t count = iov_iter_count(from);
> + bool unaligned;
> + u64 unitsize;
>
> /* direct I/O must be aligned to device logical sector size */
> if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
> return -EINVAL;
> - if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask)
> +
> + unitsize = xfs_inode_alloc_unitsize(ip);
> + if (!is_power_of_2(unitsize)) {
> + if (isaligned_64(iocb->ki_pos, unitsize) &&
> + isaligned_64(count, unitsize))
> + unaligned = false;
> + else
> + unaligned = true;
> + } else {
> + unaligned = (iocb->ki_pos | count) & (unitsize - 1);
> + }
Didn't I already write this, as xfs_is_falloc_aligned()?
> + if (unaligned)
if (!xfs_is_falloc_aligned(ip, iocb->ki_pos, count))
> return xfs_file_dio_write_unaligned(ip, iocb, from);
> return xfs_file_dio_write_aligned(ip, iocb, from);
> }
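For reference, the alignment test open-coded above can be modelled in userspace roughly as follows. This is only a sketch: range_is_aligned() and is_pow2() are hypothetical names standing in for the kernel helpers, and the unit sizes are example values. It illustrates why a 4k write at offset 8192 counts as aligned against a 4k fsblock but unaligned against a 16k rt extent, and why a non-power-of-2 unit needs a modulo rather than a mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical model of the alignment check: returns true when both
 * the file position and the length are multiples of the allocation
 * unit (in bytes).  A mask only works for power-of-2 units, so other
 * sizes fall back to a modulo.
 */
static bool range_is_aligned(uint64_t pos, uint64_t len, uint64_t unit)
{
	if (!is_pow2(unit))
		return pos % unit == 0 && len % unit == 0;

	return ((pos | len) & (unit - 1)) == 0;
}
```

Under these assumptions, range_is_aligned(8192, 4096, 4096) is true and range_is_aligned(8192, 4096, 16384) is false, so the same write moves from the aligned path to the unaligned one once the unit becomes the rt extent size.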
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 58fb7a5062e1..93ad442f399b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4264,15 +4264,20 @@ xfs_break_layouts(
> return error;
> }
>
> -/* Returns the size of fundamental allocation unit for a file, in bytes. */
Don't delete the comment; it has useful return type information. Reword
it for the fsblock variant:
/*
* Returns the size of fundamental allocation unit for a file, in
* fsblocks.
*/
> unsigned int
> -xfs_inode_alloc_unitsize(
> +xfs_inode_alloc_unitsize_fsb(
> struct xfs_inode *ip)
> {
> - unsigned int blocks = 1;
> -
> if (XFS_IS_REALTIME_INODE(ip))
> - blocks = ip->i_mount->m_sb.sb_rextsize;
> + return ip->i_mount->m_sb.sb_rextsize;
> +
> + return 1;
> +}
>
> - return XFS_FSB_TO_B(ip->i_mount, blocks);
> +/* Returns the size of fundamental allocation unit for a file, in bytes. */
> +unsigned int
> +xfs_inode_alloc_unitsize(
> + struct xfs_inode *ip)
> +{
> + return XFS_FSB_TO_B(ip->i_mount, xfs_inode_alloc_unitsize_fsb(ip));
> }
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 292b90b5f2ac..90d2fa837117 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -643,6 +643,7 @@ int xfs_inode_reload_unlinked(struct xfs_inode *ip);
> bool xfs_ifork_zapped(const struct xfs_inode *ip, int whichfork);
> void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
> xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
> +unsigned int xfs_inode_alloc_unitsize_fsb(struct xfs_inode *ip);
> unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
>
> struct xfs_dir_update_params {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index ecb4cae88248..fbe69f747e30 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -127,7 +127,7 @@ xfs_bmbt_to_iomap(
> }
> iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
> iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
> - iomap->io_block_size = i_blocksize(VFS_I(ip));
> + iomap->io_block_size = xfs_inode_alloc_unitsize(ip);
Oh, I see. So io_block_size causes iomap to write zeroes to the storage
backing surrounding areas of the file range. In this case, for direct
writes to the unwritten 4k at 8k-12k of an otherwise written 16k extent,
we'll write zeroes to 0-8k and 12k-16k even though that wasn't what the
caller asked for?
IOWs, if you start with:
WWuW
and write to the "u", then it'll write zeroes to the "W" areas? That
doesn't sound good...
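To make the arithmetic concrete, here is a rough userspace model of the head and tail padding that would be zeroed around a sub-unit write. It is a sketch under assumptions: zero_padding() and its structs are hypothetical names, not iomap code, and it assumes the zeroing simply pads the write out to io_block_size boundaries on both sides.

```c
#include <stdint.h>

/* A byte range within the file. */
struct byte_range {
	uint64_t start;
	uint64_t len;
};

/* Ranges zeroed before (head) and after (tail) the written bytes. */
struct zero_pads {
	struct byte_range head;
	struct byte_range tail;
};

/*
 * Hypothetical model: pad the write [pos, pos + len) out to
 * io_block_size boundaries.  Modulo is used so that the model also
 * covers unit sizes that are not a power of 2.
 */
static struct zero_pads zero_padding(uint64_t pos, uint64_t len,
				     uint64_t io_block_size)
{
	struct zero_pads p;
	uint64_t end = pos + len;
	uint64_t head = pos % io_block_size;
	uint64_t tail = end % io_block_size;

	p.head.start = pos - head;
	p.head.len = head;
	p.tail.start = end;
	p.tail.len = tail ? io_block_size - tail : 0;
	return p;
}
```

For the WWuW layout with 4k blocks, zero_padding(8192, 4096, 16384) gives a head of 0-8k and a tail of 12k-16k, i.e. all three "W" blocks, whereas with io_block_size set to 4096 both pads are empty.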
> if (mapping_flags & IOMAP_DAX)
> iomap->dax_dev = target->bt_daxdev;
> else
> @@ -577,11 +577,17 @@ xfs_iomap_write_unwritten(
> xfs_fsize_t i_size;
> uint resblks;
> int error;
> + unsigned int rounding;
>
> trace_xfs_unwritten_convert(ip, offset, count);
>
> offset_fsb = XFS_B_TO_FSBT(mp, offset);
> count_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
> + rounding = xfs_inode_alloc_unitsize_fsb(ip);
> + if (rounding > 1) {
> + offset_fsb = rounddown_64(offset_fsb, rounding);
> + count_fsb = roundup_64(count_fsb, rounding);
> + }
...and then the ioend handler is supposed to be smart enough to know
that iomap quietly wrote to other parts of the disk.
Um, does this cause unwritten extent conversion for entire rtextents
after writeback to a file with rtextsize > 1 fsb?
Or am I really misunderstanding what's going on here with the io paths?
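As a sanity check on what the rounding in this hunk does, here is a small userspace model of the conversion range computation. It is a sketch only: conversion_range() and struct fsb_range are hypothetical names, and plain division stands in for XFS_B_TO_FSBT(), XFS_B_TO_FSB(), rounddown_64() and roundup_64().

```c
#include <stdint.h>

/* A range expressed in filesystem blocks. */
struct fsb_range {
	uint64_t start_fsb;
	uint64_t count_fsb;
};

/*
 * Hypothetical model of the range handed to unwritten conversion.
 * offset and count are in bytes, blocksize is the fsblock size in
 * bytes, and unit_fsb is the allocation unit in fsblocks.
 */
static struct fsb_range conversion_range(uint64_t offset, uint64_t count,
					 uint32_t blocksize,
					 uint32_t unit_fsb)
{
	struct fsb_range r;
	/* First block touched (round down) and end block (round up). */
	uint64_t start = offset / blocksize;
	uint64_t end = (offset + count + blocksize - 1) / blocksize;

	if (unit_fsb > 1) {
		/* Widen the range to whole allocation units. */
		start -= start % unit_fsb;
		end = (end + unit_fsb - 1) / unit_fsb * unit_fsb;
	}

	r.start_fsb = start;
	r.count_fsb = end - start;
	return r;
}
```

With 4k blocks and a 4-block rt extent, conversion_range(8192, 4096, 4096, 4) covers blocks 0-3, while the same write with unit_fsb of 1 covers only block 2. That is the widening to whole rtextents asked about above.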
--D
> count_fsb = (xfs_filblks_t)(count_fsb - offset_fsb);
>
> /*
> --
> 2.31.1
>
>