linux-f2fs-devel.lists.sourceforge.net archive mirror
 help / color / mirror / Atom feed
From: John Garry via Linux-f2fs-devel <linux-f2fs-devel@lists.sourceforge.net>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: ritesh.list@gmail.com, gfs2@lists.linux.dev,
	mikulas@artax.karlin.mff.cuni.cz, hch@lst.de,
	agruenba@redhat.com, miklos@szeredi.hu,
	linux-ext4@vger.kernel.org, catherine.hoang@oracle.com,
	linux-block@vger.kernel.org, viro@zeniv.linux.org.uk,
	dchinner@redhat.com, axboe@kernel.dk, brauner@kernel.org,
	tytso@mit.edu, martin.petersen@oracle.com,
	linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-xfs@vger.kernel.org, mcgrof@kernel.org, jack@suse.com,
	linux-fsdevel@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-btrfs@vger.kernel.org, chandan.babu@oracle.com
Subject: Re: [f2fs-dev] [PATCH v4 02/22] iomap: Allow filesystems set IO block zeroing size
Date: Mon, 24 Jun 2024 14:58:40 +0100	[thread overview]
Message-ID: <644afc72-71bb-4201-8829-ccf3211d68b7@oracle.com> (raw)
In-Reply-To: <20240621211855.GY3058325@frogsfrogsfrogs>

On 21/06/2024 22:18, Darrick J. Wong wrote:
> On Thu, Jun 13, 2024 at 11:31:35AM +0100, John Garry wrote:
>> On 12/06/2024 22:32, Darrick J. Wong wrote:
>>>> unsigned int fs_block_size = i_blocksize(inode), pad;
>>>> +	u64 io_block_size = iomap->io_block_size;
>>> I wonder, should iomap be nice and not require filesystems to set
>>> io_block_size themselves unless they really need it?
>>
>> That's what I had in v3, like:
>>
>> if (iomap->io_block_size)
>> 	io_block_size = iomap->io_block_size;
>> else
>> 	io_block_size = i_block_size(inode)
>>
>> but it was suggested to change that (to like what I have here).
> 
> oh, ok.  Ignore that comment, then. :)
> 

To be clear, Dave actually suggested that change.

>>> Anyone working on
>>> an iomap port while this patchset is in progress may or may not remember
>>> to add this bit if they get their port merged after atomicwrites is
>>> merged; and you might not remember to prevent the bitrot if the reverse
>>> order happens.
>>
>> Sure, I get your point.
>>
>> However, OTOH, if we check xfs_bmbt_to_iomap(), it does set all or close to
>> all members of struct iomap, so we are just continuing that trend, i.e. it
>> is the job of the FS callback to set all these members.
>>
>>>
>>> 	u64 io_block_size = iomap->io_block_size ?: i_blocksize(inode);
>>>
>>>>    	loff_t length = iomap_length(iter);
>>>>    	loff_t pos = iter->pos;
>>>>    	blk_opf_t bio_opf;
>>>> @@ -287,6 +287,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>>>>    	int nr_pages, ret = 0;
>>>>    	size_t copied = 0;
>>>>    	size_t orig_count;
>>>> +	unsigned int pad;
>>>>    	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
>>>>    	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
>>>> @@ -355,7 +356,14 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>>>>    	if (need_zeroout) {
>>>>    		/* zero out from the start of the block to the write offset */
>>>> -		pad = pos & (fs_block_size - 1);
>>>> +		if (is_power_of_2(io_block_size)) {
>>>> +			pad = pos & (io_block_size - 1);
>>>> +		} else {
>>>> +			loff_t _pos = pos;
>>>> +
>>>> +			pad = do_div(_pos, io_block_size);
>>>> +		}
>>> Please don't opencode this twice.
>>>
>>> static unsigned int offset_in_block(loff_t pos, u64 blocksize)
>>> {
>>> 	if (likely(is_power_of_2(blocksize)))
>>> 		return pos & (blocksize - 1);
>>> 	return do_div(pos, blocksize);
>>> }
>>
>> ok, fine
>>
>>>
>>> 		pad = offset_in_block(pos, io_block_size);
>>> 		if (pad)
>>> 			...
>>>
>>> Also, what happens if pos-pad points to a byte before the mapping?
>>
>> It's the job of the FS to map in something aligned to io_block_size. Having
>> said that, I don't think we are doing that for XFS (which sets io_block_size
>>> i_block_size(inode)), so I need to check that.
> 
> <nod>  You can only play with the mapping that the fs gave you.
> If xfs doesn't give you a big enough mapping, then that's a programming
> bug to WARN_ON_ONCE about and return EIO.


I think that this is pretty easy to solve by just ensuring that for an 
atomic write inode, the call xfs_direct_write_iomap_being() -> 
xfs_bmapi_read(bno, len) is such that bno and len are extent size aligned.

>>
>> Naming is hard. Essentally we are trying to reuse the sub-fs block zeroing
>> code for sub-extent granule writes. More below.
> 
> Yeah.  For sub-fsblock zeroing we issue (chained) bios to write zeroes
> to the sectors surrounding the part we're actually writing, then we're
> issuing the write itself, and finally the ioend converts the mapping to
> unwritten.
> 
> For untorn writes we're doing the same thing, but now on the level of
> multiple fsblocks.  I guess this is all going to support a
> 
> 
> <nod> "IO granularity" ?  For untorn writes I guess you want mappings
> that are aligned to a supported untorn write granularity; for bs > ps
> filesystems I guess the IO granularity is

For LBS, it's still going to be inode block size.


>>>
>>> <still confused about why we need to do this, maybe i'll figure it out
>>> as I go along>
>>
>> This zeroing is just really required for atomic writes. The purpose is to
>> zero the extent granule for any write within a newly allocated granule.
>>
>> Consider we have uuWu, above. If the user then attempts to write the full
>> 16K as an atomic write, the iomap iter code would generate writes for sizes
>> 8k, 4k, and 4k, i.e. not a single 16K write. This is not acceptable. So the
>> idea is to zero the full extent granule when allocated, so we have ZZWZ
>> after the 4k write at offset 8192, above. As such, if we then attempt this
>> 16K atomic write, we get a single 16K BIO, i.e. there is no unwritten extent
>> conversion.
> 
> Wait, are we issuing zeroing writes for 0-8191 and 12288-16383, then
> issuing a single atomic write for 0-16383? 

When we have uuuu and attempt the first 4k write @ offset 4k, we also 
issue zeroes for 0-8191 and 12288-16383.

But this is done synchronously.  We are leveraging the existing code to 
issue the write with the exclusive IOLOCK in 
xfs_file_dio_write_unaligned(), so no one else can access that data 
while we do that initial write+zeroing to the extent.

> That won't work, because all
> the bios attached to an iomap_dio are submitted and execute
> asynchronously.  I think you need ->iomap_begin to do XFS_BMAPI_ZERO
> allocations if the writes aren't aligned to the minimum untorn write
> granularity.
> 
>> I am not sure if we should be doing this only for atomic writes inodes, or
>> also forcealign only or RT.
> 
> I think it only applies to untorn writes because the default behavior
> everywhere is is that writes can tear.
> 

ok, fine.

Thanks,
John



_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

  reply	other threads:[~2024-06-24 13:59 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-07 14:38 [f2fs-dev] [PATCH v4 00/22] block atomic writes for xfs John Garry via Linux-f2fs-devel
2024-06-07 14:38 ` [f2fs-dev] [PATCH v4 01/22] fs: Add generic_atomic_write_valid_size() John Garry via Linux-f2fs-devel
2024-06-12 21:10   ` Darrick J. Wong
2024-06-13  7:35     ` John Garry via Linux-f2fs-devel
2024-06-20 21:24       ` Darrick J. Wong
2024-06-07 14:38 ` [f2fs-dev] [PATCH v4 02/22] iomap: Allow filesystems set IO block zeroing size John Garry via Linux-f2fs-devel
2024-06-12 21:32   ` Darrick J. Wong
2024-06-13 10:31     ` John Garry via Linux-f2fs-devel
2024-06-21 21:18       ` Darrick J. Wong
2024-06-24 13:58         ` John Garry via Linux-f2fs-devel [this message]
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 03/22] xfs: Use extent size granularity for iomap->io_block_size John Garry via Linux-f2fs-devel
2024-06-12 21:47   ` Darrick J. Wong
2024-06-13 11:13     ` John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 04/22] xfs: only allow minlen allocations when near ENOSPC John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 05/22] xfs: always tail align maxlen allocations John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 06/22] xfs: simplify extent allocation alignment John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 07/22] xfs: make EOF allocation simpler John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 08/22] xfs: introduce forced allocation alignment John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 09/22] xfs: align args->minlen for " John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 10/22] xfs: Introduce FORCEALIGN inode flag John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 11/22] xfs: Do not free EOF blocks for forcealign John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 12/22] xfs: Update xfs_inode_alloc_unitsize_fsb() " John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 13/22] xfs: Unmap blocks according to forcealign John Garry via Linux-f2fs-devel
2024-06-11 10:08   ` John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 14/22] xfs: Only free full extents for forcealign John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 15/22] xfs: Don't revert allocated offset " John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 16/22] xfs: Enable file data forcealign feature John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 17/22] fs: Add FS_XFLAG_ATOMICWRITES flag John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 18/22] iomap: Atomic write support John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 19/22] xfs: Support FS_XFLAG_ATOMICWRITES for forcealign John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 20/22] xfs: Support atomic write for statx John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 21/22] xfs: Validate atomic writes John Garry via Linux-f2fs-devel
2024-06-07 14:39 ` [f2fs-dev] [PATCH v4 22/22] xfs: Support setting FMODE_CAN_ATOMIC_WRITE John Garry via Linux-f2fs-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=644afc72-71bb-4201-8829-ccf3211d68b7@oracle.com \
    --to=linux-f2fs-devel@lists.sourceforge.net \
    --cc=agruenba@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=gfs2@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=jack@suse.com \
    --cc=john.g.garry@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-erofs@lists.ozlabs.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mcgrof@kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=mikulas@artax.karlin.mff.cuni.cz \
    --cc=ritesh.list@gmail.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).