From: Christoph Hellwig <hch@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, io-uring@vger.kernel.org,
	"Darrick J . Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, wu lei <uwydoc@gmail.com>
Subject: Re: [PATCH v2 1/1] iomap: propagate nowait to block layer
Date: Wed, 5 Mar 2025 06:10:59 -0800
Message-ID: <Z8hbc5Nzp6cFMpXO@infradead.org>
In-Reply-To: <Z8emslEolstG76A7@dread.disaster.area>

On Wed, Mar 05, 2025 at 12:19:46PM +1100, Dave Chinner wrote:
> I really don't care about what io_uring thinks or does. If the block
> layer REQ_NOWAIT semantics are unusable for non-blocking IO
> submission, then that's the problem that needs fixing. This isn't a
> problem we can (or should) try to work around in the iomap layer.

Agreed.  The problem is the block layer semantics.  iomap/xfs really
is just the messenger here.

> For example: we have RAID5 with a 64kB chunk size, so max REQ_NOWAIT
> io size is 64kB according to the queue limits. However, if we do a
> 64kB IO at a 60kB chunk offset, that bio is going to be split into a
> 4kB bio and a 60kB bio because they are issued to different physical
> devices.....
> 
> There is no way the bio submitter can know that this behaviour will
> occur, nor should they even be attempting to predict when/if such
> splitting may occur.

And for something that has a real block allocator it could also be
entirely dynamic.  But I'm not sure if dm-thinp or bcache do anything
like that at the moment.
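
To make the arithmetic in the quoted RAID5 example concrete, here is a
minimal userspace sketch, not kernel code.  The names split_at_chunks()
and struct io_piece are made up for illustration, and it assumes the
only split rule is a fixed chunk boundary; real stacking drivers can
split for other reasons.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one piece of a split I/O, in bytes. */
struct io_piece {
	uint64_t offset;
	uint64_t len;
};

/*
 * Hypothetical helper: split the byte range [offset, offset + len) at
 * every multiple of 'chunk'.  Writes at most 'max' pieces to 'out' and
 * returns how many were written.
 */
static size_t split_at_chunks(uint64_t offset, uint64_t len, uint64_t chunk,
			      struct io_piece *out, size_t max)
{
	size_t n = 0;

	assert(chunk > 0);

	while (len > 0 && n < max) {
		/* Bytes left before the next chunk boundary. */
		uint64_t room = chunk - (offset % chunk);
		uint64_t take = len < room ? len : room;

		out[n].offset = offset;
		out[n].len = take;
		n++;

		offset += take;
		len -= take;
	}
	return n;
}
```

With offset = 60kB, len = 64kB and chunk = 64kB this yields a 4kB piece
followed by a 60kB piece, which is the split described in the example
above.  The submitter only sees the 64kB request it built.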

> > Are you only concerned about the size being too restrictive or do you
> > see any other problems?
> 
> I'm concerned about the fact that REQ_NOWAIT is not usable as it
> stands. We've identified bio chaining as an issue, now bio splitting
> is an issue, and I'm sure if we look further there will be other
> cases that are issues (e.g. bounce buffers).
> 
> The underlying problem here is that bio submission errors are
> reported through bio completion mechanisms, not directly back to the
> submitting context. Fix that problem in the block layer API, and
> then iomap can use REQ_NOWAIT without having to care about what the
> block layer is doing under the covers.

Exactly.  Either they need to be reported synchronously, or maybe we
need a block layer hook in bio_endio that retries the given bio on a
workqueue without ever bubbling up to the caller.  Allowing a delayed
BLK_STS_AGAIN is going to mess up any non-trivial caller, and even
for a plain block device it will cause duplicate I/O where some
blocks have already been read/written and then get resubmitted.

I'm not sure that breaks any atomicity assumptions as we don't really
give explicit ones for block devices (except maybe for the new
RWF_ATOMIC flag?), but it certainly is unexpected and suboptimal.
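
The duplicate I/O described above can be illustrated with a toy
userspace model.  This is only a sketch under stated assumptions:
struct model, issue_piece(), retry_whole() and retry_piece() are
hypothetical names, a parent I/O is assumed to be split into two
pieces, and a failed piece is assumed never to reach the device.  It
compares resubmitting the whole I/O from the caller against retrying
only the failed piece below the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PIECES 2

/* Hypothetical model of one parent I/O split into NR_PIECES pieces. */
struct model {
	unsigned int issued[NR_PIECES];	/* times each piece hit the device */
	bool fail_once[NR_PIECES];	/* simulate one "try again" failure */
};

/* Issue piece i; false stands in for a BLK_STS_AGAIN-style failure. */
static bool issue_piece(struct model *m, size_t i)
{
	if (m->fail_once[i]) {
		m->fail_once[i] = false;
		return false;
	}
	m->issued[i]++;
	return true;
}

/*
 * Caller-level retry: the failure is only visible for the I/O as a
 * whole, so every piece is submitted again.
 */
static void retry_whole(struct model *m)
{
	bool again;

	do {
		again = false;
		for (size_t i = 0; i < NR_PIECES; i++)
			if (!issue_piece(m, i))
				again = true;
	} while (again);
}

/* Retry below the caller: only the piece that failed is reissued. */
static void retry_piece(struct model *m)
{
	for (size_t i = 0; i < NR_PIECES; i++)
		while (!issue_piece(m, i))
			;
}
```

If only the second piece fails once, retry_whole() sends the first
piece to the device twice, while retry_piece() sends each piece exactly
once.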
