linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Long Li <leo.lilong@huawei.com>
To: John Garry <john.g.garry@oracle.com>, Dave Chinner <david@fromorbit.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>, <chandan.babu@oracle.com>,
	<djwong@kernel.org>, <dchinner@redhat.com>, <hch@lst.de>,
	<viro@zeniv.linux.org.uk>, <brauner@kernel.org>, <jack@suse.cz>,
	<linux-xfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>, <catherine.hoang@oracle.com>,
	<martin.petersen@oracle.com>
Subject: Re: [PATCH v4 00/14] forcealign for xfs
Date: Thu, 14 Nov 2024 20:48:21 +0800	[thread overview]
Message-ID: <ZzXxlf6RWeX3e-3x@localhost.localdomain> (raw)
In-Reply-To: <1394ceeb-ce8c-4d0f-aec8-ba93bf1afb90@oracle.com>

On Wed, Sep 18, 2024 at 11:12:47AM +0100, John Garry wrote:
> On 17/09/2024 23:27, Dave Chinner wrote:
> > > # xfs_bmap -vvp  mnt/file
> > > mnt/file:
> > > EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> > >    0: [0..15]:         384..399          0 (384..399)          16 010000
> > >    1: [16..31]:        400..415          0 (400..415)          16 000000
> > >    2: [32..127]:       416..511          0 (416..511)          96 010000
> > >    3: [128..255]:      256..383          0 (256..383)         128 000000
> > > FLAG Values:
> > >     0010000 Unwritten preallocated extent
> > > 
> > > Here we have unaligned extents wrt extsize.
> > > 
> > > The sub-alloc unit zeroing would solve that - is that what you would still
> > > advocate (to solve that issue)?
> > Yes, I thought that was already implemented for force-align with the
> > DIO code via the extsize zero-around changes in the iomap code. Why
> > isn't that zero-around code ensuring the correct extent layout here?
> 
> I just have not included the extsize zero-around changes here. They were
> just grouped with the atomic writes support, as they were added specifically
> for the atomic writes support. Indeed - to me at least - it is strange that
> the DIO code changes are required for XFS forcealign implementation. And,
> even if we use extsize zero-around changes for DIO path, what about buffered
> IO?


I've been reviewing and testing the XFS atomic write patch series. Since
there haven't been any new responses to the previous discussions on this
issue, I'd like to inquire about the buffered IO problem with force-aligned
files, which is a scenario we might encounter.

Consider a case where the file supports force-alignment with a 64K extent size,
and the system page size is 4K. Take the following commands as an example:

xfs_io  -c "pwrite 64k 64k" mnt/file
xfs_io  -c "pwrite 8k 8k" mnt/file

If unaligned unwritten extents are not permitted, we need to zero out the
sub-allocation units for ranges [0, 8K] and [16K, 64K] to prevent stale
data. While this can be handled relatively easily in direct I/O scenarios,
it presents significant challenges in buffered I/O operations. The main
difficulty arises because the extent size (64K) is larger than the page
size (4K), and our current code base has substantial limitations in handling
such cases.

Any thoughts on this?

Thanks,
Long Li

> 
> BTW, I still have concern with this extsize zero-around change which I was
> making:
> 
> xfs_iomap_write_unwritten()
> {
> 	unsigned int rounding;
> 
> 	/* when converting anything unwritten, we must be spanning an 	alloc unit,
> so round up/down */
> 	if (rounding > 1) {
> 		offset_fsb = rounddown(rounding);
> 		count_fsb = roundup(rounding);
> 	}
> 
> 	...
> 	do {
> 		xfs_bmapi_write();
> 		...
> 		xfs_trans_commit();
> 	} while ();
> }
> 
> As mentioned elsewhere, it's a bit of a bodge (to do this rounding).
> 
> > 
> > > > FWIW, I also understand things are different if we are doing 128kB
> > > > atomic writes on 16kB force aligned files. However, in this
> > > > situation we are treating the 128kB atomic IO as eight individual
> > > > 16kB atomic IOs that are physically contiguous.
> > > Yes, if 16kB force aligned, userspace can only issue 16KB atomic writes.
> > Right, but the eventual goal (given the statx parameters) is to be
> > able to do 8x16kB sequential atomic writes as a single 128kB IO, yes?
> 
> No, if atomic write unit max is 16KB, then userspace can only issue a single
> 16KB atomic write.
> 
> However, some things to consider:
> a. the block layer may merge those 16KB atomic writes
> b. userspace may also merge 16KB atomic writes and issue a larger atomic
> write (if atomic write unit max is > 16KB)
> 
> I had been wondering if there is any value in a lib for helping with b.
> 
> > 
> > > > > > Again, this is different to the traditional RT file behaviour - it
> > > > > > can use unwritten extents for sub-alloc-unit alignment unmaps
> > > > > > because the RT device can align file offset to any physical offset,
> > > > > > and issue unaligned sector sized IO without any restrictions. Forced
> > > > > > alignment does not have this freedom, and when we extend forced
> > > > > > alignment to RT files, it will not have the freedom to use
> > > > > > unwritten extents for sub-alloc-unit unmapping, either.
> > > > > > 
> > > > > So how do you think that we should actually implement
> > > > > xfs_itruncate_extents_flags() properly for forcealign? Would it simply be
> > > > > like:
> > > > > 
> > > > > --- a/fs/xfs/xfs_inode.c
> > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > @@ -1050,7 +1050,7 @@ xfs_itruncate_extents_flags(
> > > > >                   WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
> > > > >                   return 0;
> > > > >           }
> > > > > +	if (xfs_inode_has_forcealign(ip))
> > > > > +	       first_unmap_block = xfs_inode_roundup_alloc_unit(ip,
> > > > > first_unmap_block);
> > > > >           error = xfs_bunmapi_range(&tp, ip, flags, first_unmap_block,
> > > > Yes, it would be something like that, except it would have to be
> > > > done before first_unmap_block is verified.
> > > > 
> > > ok, and are you still of the opinion that this does not apply to rtvol?
> > The rtvol is*not* force-aligned. It -may- have some aligned
> > allocation requirements that are similar (i.e. sb_rextsize > 1 fsb)
> > but it does*not* force-align extents, written or unwritten.
> > 
> > The moment we add force-align support to RT files (as is the plan),
> > then the force-aligned inodes on the rtvol will need to behave as
> > force aligned inodes, not "rtvol" inodes.
> 
> ok, fine
> 
> Thanks,
> John
> 
> 
> 

  reply	other threads:[~2024-11-14 12:49 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-13 16:36 [PATCH v4 00/14] forcealign for xfs John Garry
2024-08-13 16:36 ` [PATCH v4 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-08-23 16:28   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 02/14] xfs: always tail align maxlen allocations John Garry
2024-08-23 16:31   ` Darrick J. Wong
2024-08-29 17:58     ` John Garry
2024-08-29 21:34       ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 03/14] xfs: simplify extent allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 04/14] xfs: make EOF allocation simpler John Garry
2024-09-04 18:25   ` Ritesh Harjani
2024-09-05  7:51     ` John Garry
2024-08-13 16:36 ` [PATCH v4 05/14] xfs: introduce forced allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 06/14] xfs: align args->minlen for " John Garry
2024-08-13 16:36 ` [PATCH v4 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
2024-08-13 16:36 ` [PATCH v4 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 09/14] xfs: Update xfs_setattr_size() " John Garry
2024-08-13 16:36 ` [PATCH v4 10/14] xfs: Do not free EOF blocks " John Garry
2024-08-13 16:36 ` [PATCH v4 11/14] xfs: Only free full extents " John Garry
2024-08-13 16:36 ` [PATCH v4 12/14] xfs: Unmap blocks according to forcealign John Garry
2024-08-23 16:35   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 13/14] xfs: Don't revert allocated offset for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 14/14] xfs: Enable file data forcealign feature John Garry
2024-09-04 18:14 ` [PATCH v4 00/14] forcealign for xfs Ritesh Harjani
2024-09-04 23:20   ` Dave Chinner
2024-09-05  3:56     ` Ritesh Harjani
2024-09-05  6:33       ` Dave Chinner
2024-09-10  2:51         ` Ritesh Harjani
2024-09-16  6:33           ` Dave Chinner
2024-09-10 12:33         ` Ritesh Harjani
2024-09-16  7:03           ` Dave Chinner
2024-09-16 10:24             ` John Garry
2024-09-17 20:54               ` Darrick J. Wong
2024-09-17 23:34                 ` Dave Chinner
2024-09-17 22:12               ` Dave Chinner
2024-09-18  7:59                 ` John Garry
2024-09-23  2:57                   ` Dave Chinner
2024-09-23  3:33                     ` Christoph Hellwig
2024-09-23  8:16                       ` John Garry
2024-09-23 12:07                         ` Christoph Hellwig
2024-09-23 12:33                           ` John Garry
2024-09-24  6:17                             ` Christoph Hellwig
2024-09-24  9:48                               ` John Garry
2024-11-29 11:36                                 ` John Garry
2024-09-23  8:00                     ` John Garry
2024-09-05 10:15     ` John Garry
2024-09-05 21:47       ` Dave Chinner
2024-09-06 14:31         ` John Garry
2024-09-08 22:49           ` Dave Chinner
2024-09-09 16:18             ` John Garry
2024-09-16  5:25               ` Dave Chinner
2024-09-16  9:44                 ` John Garry
2024-09-17 22:27                   ` Dave Chinner
2024-09-18 10:12                     ` John Garry
2024-11-14 12:48                       ` Long Li [this message]
2024-11-14 16:22                         ` John Garry
2024-11-14 20:07                         ` Dave Chinner
2024-11-15  8:14                           ` John Garry
2024-11-15 11:20                           ` Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZzXxlf6RWeX3e-3x@localhost.localdomain \
    --to=leo.lilong@huawei.com \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ritesh.list@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).