From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: John Garry <john.g.garry@oracle.com>, linux-ext4@vger.kernel.org
Cc: Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
"Darrick J . Wong" <djwong@kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Dave Chinner <david@fromorbit.com>,
linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 5/6] iomap: Lift blocksize restriction on atomic writes
Date: Fri, 25 Oct 2024 19:43:17 +0530
Message-ID: <87plnomfsy.fsf@gmail.com>
In-Reply-To: <509180f3-4cc1-4cc2-9d43-5a1e728fb718@oracle.com>
John Garry <john.g.garry@oracle.com> writes:
> On 25/10/2024 13:36, Ritesh Harjani (IBM) wrote:
>>>> So the user will anyway have to be made aware not to
>>>> attempt writes of a kind that can incur such penalties.
>>>>
>>>> As patch 6 mentions, this is base support for enabling atomic writes
>>>> on bs = ps systems using bigalloc. For now we return -EINVAL when we
>>>> can't allocate a contiguous mapping of the user-requested size, which
>>>> means it won't support operations such as an 8k write followed by a 16k write.
>>>>
>>> That's my least-preferred option.
>>>
>>> I think it would be better to always reject atomic writes that cover
>>> unwritten extents, but that boat is about to sail...
>> That's what this patch does.
>
> Not really.
>
> Currently we have two iomap restrictions:
> a. mapping length must equal fs block size
> b. bio created must equal total write size
>
> This patch just says that the mapping length must equal total write size
> (instead of a.). So quite similar to b.
>
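To illustrate the two restrictions listed above, here is a simplified userspace model in C. It takes the description of checks (a) and (b) at face value and only reproduces the length comparisons. It is not the code in fs/iomap/direct-io.c, and every function name is hypothetical. The old check (a) compares the mapping length with the fs block size. The relaxed check compares it with the total write size. Check (b) is unchanged.

```c
/*
 * Simplified model of the iomap atomic-write length checks discussed
 * in this thread. All names are hypothetical; this is not kernel code.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* (a) before this patch: the mapping must be exactly one fs block. */
static int atomic_check_mapping_old(size_t mapping_len, size_t fs_block_size)
{
	assert(fs_block_size > 0);
	return mapping_len == fs_block_size ? 0 : -EINVAL;
}

/* (a) as relaxed by this patch: the mapping must equal the whole write. */
static int atomic_check_mapping_new(size_t mapping_len, size_t write_len)
{
	assert(write_len > 0);
	return mapping_len == write_len ? 0 : -EINVAL;
}

/* (b) unchanged: the bio built for the write must cover all of it. */
static int atomic_check_bio(size_t bio_len, size_t write_len)
{
	assert(write_len > 0);
	return bio_len == write_len ? 0 : -EINVAL;
}
```

Take a 16k atomic write backed by a single 16k mapping on a 4k-block filesystem. It fails the old check (16384 != 4096) but passes the relaxed one.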
>> If, for whatever reason, we can't allocate a single contiguous
>> region of the requested size for an atomic write, then we always
>> reject the request, don't we? Or maybe I didn't understand your comment.
>
> As the simplest example, for an atomic write to an empty file, there
> should only be a single mapping returned to iomap_dio_bio_iter() and
> that would be of IOMAP_UNWRITTEN type. And we don't reject that.
>
Ok. Maybe this is what I am missing. Could you please help me understand
why such writes should be rejected?

For example, if the FS can allocate a single contiguous IOMAP_UNWRITTEN
extent of the atomic write request size, then:
1. The FS allocates an unwritten extent.
2. It writes (using submit_bio) to the unwritten extent.
3. It converts the extent from unwritten to written.

It is ok if any of the above operations fails, right? If (3) fails,
the region is still marked unwritten, so it reads back as zeroes (the old
contents). If (2) fails, no conversion happens, so it cannot expose a
partial write either. And if (1) fails, nothing is written at all.

So for such cases we can never end up with a partial write that leaves
a mix of old and new contents, right? Isn't that exactly the requirement
for atomic/untorn writes?

Sorry, am I missing something here?
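The failure analysis above can be sketched as a small userspace model in C. It makes several simplifying assumptions:

- One extent covers the whole region.
- Holes and unwritten extents read back as zeroes.
- A failure can be injected at step (1), (2) or (3).

is_untorn() checks that a read after the attempt sees either all-old (zero) or all-new data. The model does not cover mixed written/unwritten regions or real ext4/iomap code, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define REGION_LEN 16

/* Hypothetical model: one extent covers the whole region. */
enum ext_state { EXT_HOLE, EXT_UNWRITTEN, EXT_WRITTEN };

struct region {
	enum ext_state state;
	unsigned char disk[REGION_LEN];	/* raw device contents */
};

/* Where to inject a failure; FAIL_NONE means every step succeeds. */
enum fail_at { FAIL_NONE, FAIL_ALLOC, FAIL_WRITE, FAIL_CONVERT };

/* Holes and unwritten extents read back as zeroes, whatever is on disk. */
static void region_read(const struct region *r, unsigned char *out)
{
	if (r->state == EXT_WRITTEN)
		memcpy(out, r->disk, REGION_LEN);
	else
		memset(out, 0, REGION_LEN);
}

static int atomic_write(struct region *r, const unsigned char *buf,
			enum fail_at f)
{
	assert(r->state == EXT_HOLE);	/* model covers empty-file writes only */

	if (f == FAIL_ALLOC)
		return -1;		/* (1) fails: nothing allocated or written */
	r->state = EXT_UNWRITTEN;	/* (1) allocate one unwritten extent */

	if (f == FAIL_WRITE) {
		/* (2) fails after only half the data reached the device */
		memcpy(r->disk, buf, REGION_LEN / 2);
		return -1;
	}
	memcpy(r->disk, buf, REGION_LEN);	/* (2) write all the data */

	if (f == FAIL_CONVERT)
		return -1;		/* (3) fails: extent stays unwritten */
	r->state = EXT_WRITTEN;		/* (3) unwritten -> written */
	return 0;
}

/* True if a read after the attempt sees all-old (zero) or all-new data. */
static bool is_untorn(enum fail_at f)
{
	struct region r = { .state = EXT_HOLE };
	unsigned char newbuf[REGION_LEN], out[REGION_LEN];
	unsigned char zero[REGION_LEN] = { 0 };

	memset(r.disk, 0xAA, REGION_LEN);	/* stale data; must never be read back */
	memset(newbuf, 0x55, REGION_LEN);
	atomic_write(&r, newbuf, f);
	region_read(&r, out);
	return memcmp(out, zero, REGION_LEN) == 0 ||
	       memcmp(out, newbuf, REGION_LEN) == 0;
}
```

In this model is_untorn() returns true for every injection point. That matches the argument above for the single-unwritten-extent case.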
>>
>> If others prefer, we can add such a check (e.g. ext4_dio_atomic_write_checks())
>> for atomic writes in ext4_dio_write_checks(), similar to how we detect
>> the overwrite case to decide whether we need the read or write semaphore.
>> It would check whether the user-requested region has a partially
>> allocated extent and, if so, return -EINVAL from
>> ext4_dio_write_iter() itself.
>> I think this may be a better option than waiting until ->iomap_begin().
>> It would also keep all atomic write constraint checks in one
>> place, i.e. in ext4_file_write_iter() itself.
>
> Something like this can be done once we decide how atomic writing to
> regions which cover mixed unwritten and written extents is to be handled.
Mixed extent regions (written + unwritten) are a different case
altogether (they can lead to a mix of old and new contents).
But what I am suggesting here is to add the following constraint for
ext4 with bigalloc:
"Atomic writes to a region which already has a partially allocated extent are not supported."
That means we will return -EINVAL if we detect the above case in
ext4_file_write_iter(), and we can document this behavior.
In retrospect, I am not sure why we cannot add such a constraint for atomic
writes (e.g. for ext4 bigalloc) and reject such writes outright,
instead of accepting the request and silently incurring a performance
penalty by zeroing out the partial regions.
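A minimal sketch of the proposed constraint, assuming a simplified, sorted, non-overlapping extent list at fs-block granularity. The struct and function names are hypothetical and do not correspond to ext4's real extent-status or ext4_map_blocks() interfaces. The sketch also ignores bigalloc clusters and the written/unwritten state. It only shows the decision:

- Accept a range that is entirely a hole or entirely mapped.
- Return -EINVAL when the range is only partially allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical extent record: logical start and length in fs blocks. */
struct extent {
	unsigned long lblk;
	unsigned long len;
};

/* Blocks of [start, start + count) covered by sorted, non-overlapping extents. */
static unsigned long mapped_blocks(const struct extent *ext, size_t nr,
				   unsigned long start, unsigned long count)
{
	unsigned long end = start + count, mapped = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned long ext_end = ext[i].lblk + ext[i].len;
		unsigned long s = ext[i].lblk > start ? ext[i].lblk : start;
		unsigned long e = ext_end < end ? ext_end : end;

		if (s < e)
			mapped += e - s;
	}
	return mapped;
}

/*
 * Hypothetical atomic-write precheck: a range that is all hole or all
 * mapped is accepted; a partially allocated range gets -EINVAL.
 */
static int atomic_write_range_check(const struct extent *ext, size_t nr,
				    unsigned long start, unsigned long count)
{
	unsigned long mapped;

	assert(count > 0);
	mapped = mapped_blocks(ext, nr, start, count);
	if (mapped != 0 && mapped != count)
		return -EINVAL;
	return 0;
}
```

Suppose blocks 0-3 are mapped. A write covering blocks 2-5 is rejected, while writes covering blocks 0-3 or 4-7 are accepted.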
-ritesh