From: Damien Le Moal <dlemoal@kernel.org>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: Zoned storage and BLK_STS_RESOURCE
Date: Tue, 17 Dec 2024 11:20:41 -0800
Message-ID: <3eb6ba65-daf8-4d8f-a37f-61bea129b165@kernel.org>
In-Reply-To: <96e900ed-4984-4fbe-a74d-06a15fd7f3f7@kernel.dk>

On 2024/12/17 11:07, Jens Axboe wrote:
> On 12/17/24 11:51 AM, Damien Le Moal wrote:
>> On 2024/12/17 10:46, Jens Axboe wrote:
>>>> Of note about io_uring: if writes are submitted from multiple jobs to
>>>> multiple queues, then you will see unaligned write errors, but the
>>>> same test with libaio will work just fine. The reason is that io_uring
>>>> fio engine IO submission only adds write requests to the io rings,
>>>> which will then be submitted by the kernel ring handling later. But at
>>>> that time, the ordering information is lost and if the rings are
>>>> processed in the wrong order, you'll get unaligned errors.
>>>
>>> Sorry, but this is woefully incorrect.
>>>
>>> Submissions are always in order, I suspect the main difference here is
>>> that some submissions would block, and that will certainly cause the
>>> effective issue point to be reordered, as the initial issue will get
>>> -EAGAIN. This isn't a problem on libaio as it simply blocks on
>>> submission instead. Because the actual issue is the same, and the kernel
>>> will absolutely see the submissions in order when io_uring_enter() is
>>> called, just like it would when io_submit() is called.
>>
>> I did not mean to say that the processing of requests in each
>> queue/ring is done out of order. They are not. What I meant to say is
>> that multiple queues/rings may be processed in parallel, so if
>> sequential writes are submitted to different queues, the BIOs for
>> these write IOs may end up being issued out of order to the zone. Is
>> that an incorrect assumption? Reading the io_uring code, I think
>> there is one work item per ring and these are not synchronized.
> 
> Sure, if you have multiple rings, there's no synchronization between
> them. Within each ring, reordering in terms of issue can only happen if
> the target responds with -EAGAIN to a REQ_NOWAIT request, as they are
> always issued in order. If that doesn't happen, there should be no
> difference to what the issue looks like with multiple rings or contexts
> for io_uring or libaio - any kind of ordering could be observed.

Yes. The fixes that went into rc3 addressed the REQ_NOWAIT issue. So we are good
on this front.

> Unsure of which queues you are talking about here, are these the block
> level queues?

My bad. I was talking about the io_uring rings. Not the block layer queues.

> And ditto on the io_uring question, which work items are we talking
> about? There can be any number of requests for any given ring, inflight.

I was talking about the work that takes the IOs submitted by the user from the
rings and turns them into BIOs for submission. My understanding is that these
work items are not synchronized. For a simple fio
"--zonemode=zbd --rw=randwrite --numjobs=X" with X > 1, the fio level
synchronization will serialize the calls to io_submit() for libaio, thus
delivering the BIOs to a zone in order in the kernel. With io_uring as the I/O
engine, the same fio level synchronization happens but only covers the IOs
getting into the ring. Turning the IOs into BIOs and submitting them is done
outside of the fio serialization, so the BIOs can end up being issued out of
order if multiple rings are used. At least, that is my understanding of
io_uring... Am I getting this wrong?
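
To make the concern concrete, here is a toy model of it. This is only a sketch
of the ordering argument above, not of what io_uring or fio actually do: the
Zone class, queue_round_robin() and drain() are hypothetical names, the 4096 B
write size is arbitrary, and the "rings" are plain lists. It assumes that
queueing is serialized (sequential offsets handed to the rings in turn) and
that each ring is then drained independently of the others.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Toy sequential-write-required zone: a write must start at the write pointer."""
    wp: int = 0          # current write pointer, in bytes
    unaligned: int = 0   # number of writes rejected as unaligned

    def write(self, offset: int, length: int) -> bool:
        if offset != self.wp:
            self.unaligned += 1
            return False
        self.wp += length
        return True


def queue_round_robin(n_writes: int, n_rings: int, length: int = 4096):
    """Serialized queueing step: sequential offsets are handed to the rings in turn."""
    rings = [[] for _ in range(n_rings)]
    for i in range(n_writes):
        rings[i % n_rings].append(i * length)
    return rings


def drain(rings, ring_order, length: int = 4096) -> Zone:
    """Issue every entry of each ring in ring order, one whole ring after another."""
    zone = Zone()
    for r in ring_order:
        for offset in rings[r]:
            zone.write(offset, length)
    return zone


# One ring: issue order matches queueing order, so no unaligned writes.
one_ring = drain(queue_round_robin(8, 1), [0])
print("1 ring :", one_ring.unaligned, "unaligned writes")

# Two rings drained independently: most writes miss the write pointer.
two_rings = drain(queue_round_robin(8, 2), [0, 1])
print("2 rings:", two_rings.unaligned, "unaligned writes")
```

Within each ring the entries are still issued in order here, which matches
what was said above. The unaligned writes in this model come only from the
lack of ordering between the rings.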


-- 
Damien Le Moal
Western Digital Research
