public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: brauner@kernel.org, djwong@kernel.org, cem@kernel.org,
	dchinner@redhat.com, hch@lst.de, ritesh.list@gmail.com,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, martin.petersen@oracle.com
Subject: Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes
Date: Thu, 5 Dec 2024 10:52:50 +0000	[thread overview]
Message-ID: <3ab6000e-030d-435a-88c3-9026171ae9f1@oracle.com> (raw)
In-Reply-To: <Z1C9IfLgB_jDCF18@dread.disaster.area>

On 04/12/2024 20:35, Dave Chinner wrote:
> On Wed, Dec 04, 2024 at 03:43:41PM +0000, John Garry wrote:
>> From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
>>
>> Filesystems like ext4 can submit writes in multiples of blocksizes.
>> But we still can't allow the writes to be split into multiple BIOs. Hence
>> let's check if the iomap_length() is same as iter->len or not.
>>
>> It is the responsibility of userspace to ensure that a write does not span
>> mixed unwritten and mapped extents (which would lead to multiple BIOs).
> 
> How is "userspace" supposed to do this?

If an atomic write spans mixed unwritten and mapped extents, then it 
should manually zero the unwritten extents beforehand.

> 
> No existing utility in userspace is aware of atomic write limits or
> rtextsize configs, so how does "userspace" ensure everything is
> laid out in a manner compatible with atomic writes?
> 
> e.g. restoring a backup (or other disaster recovery procedures) is
> going to have to lay the files out correctly for atomic writes.
> backup tools often sparsify the data set and so what gets restored
> will not have the same layout as the original data set...

I am happy to support whatever is needed to make atomic writes work over
mixed extents if that is really an expected use case and it is a pain 
for an application writer/admin to deal with this (by manually zeroing 
extents).

JFYI, I did originally support the extent pre-zeroing for this. That was 
to support a real-life scenario which we saw where we were attempting 
atomic writes over mixed extents. The mixed extents were coming from 
userspace punching holes and then attempting an atomic write over that 
space. However that was using an early experimental and buggy 
forcealign; it was buggy as it did not handle punching holes properly - 
it punched out single blocks and not only full alloc units.

> 
> Where's the documentation that outlines all the restrictions on
> userspace behaviour to prevent this sort of problem being triggered?

I would provide a man page update.

> Common operations such as truncate, hole punch,

So how would punch hole be a problem? The atomic write unit max is 
limited by the alloc unit, and we can only punch out full alloc units.

> buffered writes,
> reflinks, etc will trip over this, so application developers, users
> and admins really need to know what they should be doing to avoid
> stepping on this landmine...

If this is not a real-life scenario which we expect to see, then I don't 
see why we would add the complexity to the kernel for this.

My motivation for atomic writes support is to support atomically writing 
large database internal page size. If the database only writes at a 
fixed internal page size, then we should not see mixed mappings.

But you see potential problems elsewhere ..

> 
> Further to that, what is the correction process for users to get rid
> of this error? They'll need some help from an atomic write
> constraint aware utility that can resilver the file such that these
> failures do not occur again. Can xfs_fsr do this? Or maybe the new
> exchangr-range code? Or does none of this infrastructure yet exist?

Nothing exists yet.

> 
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> jpg: tweak commit message
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/iomap/direct-io.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> index b521eb15759e..3dd883dd77d2 100644
>> --- a/fs/iomap/direct-io.c
>> +++ b/fs/iomap/direct-io.c
>> @@ -306,7 +306,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>>   	size_t copied = 0;
>>   	size_t orig_count;
>>   
>> -	if (atomic && length != fs_block_size)
>> +	if (atomic && length != iter->len)
>>   		return -EINVAL;
> 
> Given this is now rejecting IOs that are otherwise well formed from
> userspace, this situation needs to have a different errno now. The
> userspace application has not provided an invalid set of
> IO parameters for this IO - the IO has been rejected because
> the previously set up persistent file layout was screwed up by
> something in the past.
> 
> i.e. This is not an application IO submission error anymore, hence
> EINVAL is the wrong error to be returning to userspace here.
> 

Understood, but let's come back to this (if needed..).

Thanks,
John


  parent reply	other threads:[~2024-12-05 10:53 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-04 15:43 [PATCH 0/4] large atomic writes for xfs John Garry
2024-12-04 15:43 ` [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes John Garry
2024-12-04 20:35   ` Dave Chinner
2024-12-05  6:30     ` Darrick J. Wong
2024-12-05 11:51       ` John Garry
2024-12-05 10:52     ` John Garry [this message]
2024-12-05 21:15       ` Dave Chinner
2024-12-06  9:43         ` John Garry
2024-12-12  1:34         ` Darrick J. Wong
2025-01-14  4:41           ` Dave Chinner
2025-01-14 23:57             ` Darrick J. Wong
2025-01-15  9:30               ` John Garry
2025-01-16  6:52               ` Christoph Hellwig
2025-01-17 18:49                 ` Darrick J. Wong
2025-01-22  6:42                   ` Christoph Hellwig
2025-01-22 10:45                     ` John Garry
2025-01-22 23:51                       ` Dave Chinner
2025-01-23  9:28                         ` John Garry
2025-01-17 10:26               ` John Garry
2025-01-17 18:29                 ` Darrick J. Wong
2025-01-20  8:29                   ` John Garry
2025-01-22 21:05               ` Dave Chinner
2025-01-13 21:35         ` John Garry
2025-01-14  4:43           ` Dave Chinner
2024-12-04 15:43 ` [PATCH 2/4] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
2024-12-04 15:43 ` [PATCH 3/4] xfs: Add RT atomic write unit max to xfs_mount John Garry
2024-12-04 15:43 ` [PATCH 4/4] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ab6000e-030d-435a-88c3-9026171ae9f1@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=brauner@kernel.org \
    --cc=cem@kernel.org \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ritesh.list@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox