public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: John Garry <john.g.garry@oracle.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	chandan.babu@oracle.com, djwong@kernel.org, dchinner@redhat.com,
	hch@lst.de, viro@zeniv.linux.org.uk, brauner@kernel.org,
	jack@suse.cz, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	catherine.hoang@oracle.com, martin.petersen@oracle.com
Subject: Re: [PATCH v4 00/14] forcealign for xfs
Date: Mon, 16 Sep 2024 15:25:06 +1000	[thread overview]
Message-ID: <ZufBMioqpwjSFul+@dread.disaster.area> (raw)
In-Reply-To: <84b68068-e159-4e28-bf06-767ea7858d79@oracle.com>

On Mon, Sep 09, 2024 at 05:18:43PM +0100, John Garry wrote:
> > > > Patch 10 also modifies xfs_can_free_eofblocks() to take alignment
> > > > into account for the post-eof block removal, but doesn't change
> > > > xfs_free_eofblocks() at all. i.e  it also relies on
> > > > xfs_itruncate_extents_flags() to do the right thing for force
> > > > aligned inodes.
> > > 
> > > What state should the blocks post-EOF blocks be? A simple example of
> > > partially truncating an alloc unit is:
> > > 
> > > $xfs_io -c "extsize" mnt/file
> > > [16384] mnt/file
> > > 
> > > 
> > > $xfs_bmap -vvp mnt/file
> > > mnt/file:
> > >   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> > >     0: [0..20479]:      192..20671        0 (192..20671)     20480 000000
> > > 
> > > 
> > > $truncate -s 10461184 mnt/file # 10M - 6FSB
> > > 
> > > $xfs_bmap -vvp mnt/file
> > > mnt/file:
> > >   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> > >     0: [0..20431]:      192..20623        0 (192..20623)     20432 000000
> > >     1: [20432..20447]:  20624..20639      0 (20624..20639)      16 010000
> > >   FLAG Values:
> > >      0010000 Unwritten preallocated extent
> > > 
> > > Is that incorrect state?
> > 
> > Think about it: what happens if you now truncate it back up to 10MB
> > (i.e. aligned length) and then do an aligned atomic write on it.
> > 
> > First: What happens when you truncate up?
> > 
> > ......
> > 
> > Yes, iomap_zero_range() will see the unwritten extent and skip it.
> > i.e. The unwritten extent stays as an unwritten extent, it's now
> > within EOF. That written->unwritten extent boundary is not on an
> > aligned file offset.
> 
> Right
> 
> > 
> > Second: What happens when you do a correctly aligned atomic write
> > that spans this range now?
> > 
> > ......
> > 
> > Iomap only maps a single extent at a time, so it will only map the
> > written range from the start of the IO (aligned) to the start of the
> > unwritten extent (unaligned).  Hence the atomic write will be
> > rejected because we can't do the atomic write to such an unaligned
> > extent.
> 
> It was being considered to change this handling for atomic writes. More
> below at *.

I don't think that this is something specific to atomic writes -
forced alignment means -alignment is guaranteed- regardless of what
ends up using it.

Yes, we can track unwritten extents on an -unaligned- boundary, but
that doesn't mean that we should allow it when we are trying to
guarantee logical and physical alignment of the file offset and
extent boundaries. i.e. The definition of forced alignment behaviour
is that all file offsets and extents in the file are aligned to the
same alignment.

I don't see an exception that allows for unaligned unwritten
extents in that definition.


> > That's not a bug in the atomic write path - this failure occurs
> > because of the truncate behaviour doing post-eof unwritten extent
> > conversion....
> > 
> > Yes, I agree that the entire -physical- extent is still correctly
> > aligned on disk so you could argue that the unwritten conversion
> > that xfs_bunmapi_range is doing is valid forced alignment behaviour.
> > However, the fact is that breaking the aligned physical extent into
> > two unaligned contiguous extents in different states in the BMBT
> > means that they are treated as two seperate unaligned extents, not
> > one contiguous aligned physical extent.
> 
> Right, this is problematic.
> 
> * I guess that you had not been following the recent discussion on this
> topic in the latest xfs atomic writes series @ https://lore.kernel.org/linux-xfs/20240817094800.776408-1-john.g.garry@oracle.com/
> and also mentioned earlier in
> https://lore.kernel.org/linux-xfs/20240726171358.GA27612@lst.de/
> 
> There I dropped the sub-alloc unit zeroing. The concept to iter for a single
> bio seems sane, but as Darrick mentioned, we have issue of non-atomically
> committing all the extent conversions.

Yes, I understand these problems exist.  My entire point is that the
forced alignment implemention should never allow such unaligned
extent patterns to be created in the first place. If we avoid
creating such situations in the first place, then we never have to
care about about unaligned unwritten extent conversion breaking
atomic IO.

FWIW, I also understand things are different if we are doing 128kB
atomic writes on 16kB force aligned files. However, in this
situation we are treating the 128kB atomic IO as eight individual
16kB atomic IOs that are physically contiguous. Hence in this
situation it doesn't matter if we have a mix of 16kB aligned
written/unwritten/hole extents as each 16kB chunks is independent of
the others.

What matters is that each indivudal 16kB chunk shows either the old
data or the new data - we are not guaranteeing that the entire 128kB
write is atomic. Hence in this situation we can both submit and
process each 16kB shunk as independent IOs with independent IO
compeltion transactions. All that matters is that we don't signal
completion to userspace until all the IO is complete, and we already
do that for fragmented DIO writes...

> > Again, this is different to the traditional RT file behaviour - it
> > can use unwritten extents for sub-alloc-unit alignment unmaps
> > because the RT device can align file offset to any physical offset,
> > and issue unaligned sector sized IO without any restrictions. Forced
> > alignment does not have this freedom, and when we extend forced
> > alignment to RT files, it will not have the freedom to use
> > unwritten extents for sub-alloc-unit unmapping, either.
> > 
> So how do you think that we should actually implement
> xfs_itruncate_extents_flags() properly for forcealign? Would it simply be
> like:
> 
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1050,7 +1050,7 @@ xfs_itruncate_extents_flags(
>                 WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
>                 return 0;
>         }
> +	if (xfs_inode_has_forcealign(ip))
> +	       first_unmap_block = xfs_inode_roundup_alloc_unit(ip,
> first_unmap_block);
>         error = xfs_bunmapi_range(&tp, ip, flags, first_unmap_block,

Yes, it would be something like that, except it would have to be
done before first_unmap_block is verified.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2024-09-16  5:25 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-13 16:36 [PATCH v4 00/14] forcealign for xfs John Garry
2024-08-13 16:36 ` [PATCH v4 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-08-23 16:28   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 02/14] xfs: always tail align maxlen allocations John Garry
2024-08-23 16:31   ` Darrick J. Wong
2024-08-29 17:58     ` John Garry
2024-08-29 21:34       ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 03/14] xfs: simplify extent allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 04/14] xfs: make EOF allocation simpler John Garry
2024-09-04 18:25   ` Ritesh Harjani
2024-09-05  7:51     ` John Garry
2024-08-13 16:36 ` [PATCH v4 05/14] xfs: introduce forced allocation alignment John Garry
2024-08-13 16:36 ` [PATCH v4 06/14] xfs: align args->minlen for " John Garry
2024-08-13 16:36 ` [PATCH v4 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
2024-08-13 16:36 ` [PATCH v4 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 09/14] xfs: Update xfs_setattr_size() " John Garry
2024-08-13 16:36 ` [PATCH v4 10/14] xfs: Do not free EOF blocks " John Garry
2024-08-13 16:36 ` [PATCH v4 11/14] xfs: Only free full extents " John Garry
2024-08-13 16:36 ` [PATCH v4 12/14] xfs: Unmap blocks according to forcealign John Garry
2024-08-23 16:35   ` Darrick J. Wong
2024-08-13 16:36 ` [PATCH v4 13/14] xfs: Don't revert allocated offset for forcealign John Garry
2024-08-13 16:36 ` [PATCH v4 14/14] xfs: Enable file data forcealign feature John Garry
2024-09-04 18:14 ` [PATCH v4 00/14] forcealign for xfs Ritesh Harjani
2024-09-04 23:20   ` Dave Chinner
2024-09-05  3:56     ` Ritesh Harjani
2024-09-05  6:33       ` Dave Chinner
2024-09-10  2:51         ` Ritesh Harjani
2024-09-16  6:33           ` Dave Chinner
2024-09-10 12:33         ` Ritesh Harjani
2024-09-16  7:03           ` Dave Chinner
2024-09-16 10:24             ` John Garry
2024-09-17 20:54               ` Darrick J. Wong
2024-09-17 23:34                 ` Dave Chinner
2024-09-17 22:12               ` Dave Chinner
2024-09-18  7:59                 ` John Garry
2024-09-23  2:57                   ` Dave Chinner
2024-09-23  3:33                     ` Christoph Hellwig
2024-09-23  8:16                       ` John Garry
2024-09-23 12:07                         ` Christoph Hellwig
2024-09-23 12:33                           ` John Garry
2024-09-24  6:17                             ` Christoph Hellwig
2024-09-24  9:48                               ` John Garry
2024-11-29 11:36                                 ` John Garry
2024-09-23  8:00                     ` John Garry
2024-09-05 10:15     ` John Garry
2024-09-05 21:47       ` Dave Chinner
2024-09-06 14:31         ` John Garry
2024-09-08 22:49           ` Dave Chinner
2024-09-09 16:18             ` John Garry
2024-09-16  5:25               ` Dave Chinner [this message]
2024-09-16  9:44                 ` John Garry
2024-09-17 22:27                   ` Dave Chinner
2024-09-18 10:12                     ` John Garry
2024-11-14 12:48                       ` Long Li
2024-11-14 16:22                         ` John Garry
2024-11-14 20:07                         ` Dave Chinner
2024-11-15  8:14                           ` John Garry
2024-11-15 11:20                           ` Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZufBMioqpwjSFul+@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ritesh.list@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox