* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-01-24 11:59 [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs() Ming Lei
@ 2024-01-24 15:41 ` Keith Busch
2024-01-24 22:32 ` Damien Le Moal
2024-01-25 15:33 ` Jens Axboe
2024-01-26 14:10 ` Christoph Hellwig
2 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2024-01-24 15:41 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block, virtualization, linux-nvme
On Wed, Jan 24, 2024 at 07:59:54PM +0800, Ming Lei wrote:
> Requests are added to the plug list in reverse order, and both virtio-blk
> and nvme retrieve requests from the plug list in order, so requests end up
> being submitted to hardware in reverse order via nvme_queue_rqs() or
> virtio_queue_rqs(), see:
>
> io_uring submit_bio vdb 6302096 4096
> io_uring submit_bio vdb 12235072 4096
> io_uring submit_bio vdb 7682280 4096
> io_uring submit_bio vdb 11912464 4096
> io_uring virtio_queue_rqs vdb 11912464 4096
> io_uring virtio_queue_rqs vdb 7682280 4096
> io_uring virtio_queue_rqs vdb 12235072 4096
> io_uring virtio_queue_rqs vdb 6302096 4096
>
>
> Could this reordering be a problem for virtio-blk and nvme-pci?
For nvme, it depends. Usually it's probably not a problem, though some
PCI SSDs have optimizations for sequential I/O that might not work if
these get reordered.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-01-24 15:41 ` Keith Busch
@ 2024-01-24 22:32 ` Damien Le Moal
2024-01-25 4:23 ` Ming Lei
0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2024-01-24 22:32 UTC (permalink / raw)
To: Keith Busch, Ming Lei; +Cc: linux-block, virtualization, linux-nvme
On 1/25/24 00:41, Keith Busch wrote:
> On Wed, Jan 24, 2024 at 07:59:54PM +0800, Ming Lei wrote:
>> Requests are added to the plug list in reverse order, and both virtio-blk
>> and nvme retrieve requests from the plug list in order, so requests end up
>> being submitted to hardware in reverse order via nvme_queue_rqs() or
>> virtio_queue_rqs(), see:
>>
>> io_uring submit_bio vdb 6302096 4096
>> io_uring submit_bio vdb 12235072 4096
>> io_uring submit_bio vdb 7682280 4096
>> io_uring submit_bio vdb 11912464 4096
>> io_uring virtio_queue_rqs vdb 11912464 4096
>> io_uring virtio_queue_rqs vdb 7682280 4096
>> io_uring virtio_queue_rqs vdb 12235072 4096
>> io_uring virtio_queue_rqs vdb 6302096 4096
>>
>>
>> Could this reordering be a problem for virtio-blk and nvme-pci?
>
> For nvme, it depends. Usually it's probably not a problem, though some
> PCI SSDs have optimizations for sequential I/O that might not work if
> these get reordered.
ZNS and zoned virtio-blk drives... cannot use io_uring at the moment. But I do
not think we reliably can anyway, unless the issuer is CPU/ring aware and always
issues writes to a zone using the same ring.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-01-24 22:32 ` Damien Le Moal
@ 2024-01-25 4:23 ` Ming Lei
0 siblings, 0 replies; 8+ messages in thread
From: Ming Lei @ 2024-01-25 4:23 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Keith Busch, linux-block, virtualization, linux-nvme
On Thu, Jan 25, 2024 at 07:32:37AM +0900, Damien Le Moal wrote:
> On 1/25/24 00:41, Keith Busch wrote:
> > On Wed, Jan 24, 2024 at 07:59:54PM +0800, Ming Lei wrote:
> >> Requests are added to the plug list in reverse order, and both virtio-blk
> >> and nvme retrieve requests from the plug list in order, so requests end up
> >> being submitted to hardware in reverse order via nvme_queue_rqs() or
> >> virtio_queue_rqs(), see:
> >>
> >> io_uring submit_bio vdb 6302096 4096
> >> io_uring submit_bio vdb 12235072 4096
> >> io_uring submit_bio vdb 7682280 4096
> >> io_uring submit_bio vdb 11912464 4096
> >> io_uring virtio_queue_rqs vdb 11912464 4096
> >> io_uring virtio_queue_rqs vdb 7682280 4096
> >> io_uring virtio_queue_rqs vdb 12235072 4096
> >> io_uring virtio_queue_rqs vdb 6302096 4096
> >>
> >>
> >> Could this reordering be a problem for virtio-blk and nvme-pci?
> >
> > For nvme, it depends. Usually it's probably not a problem, though some
> > PCI SSDs have optimizations for sequential I/O that might not work if
> > these get reordered.
>
> ZNS and zoned virtio-blk drives... cannot use io_uring at the moment. But I do
> not think we reliably can anyway, unless the issuer is CPU/ring aware and always
> issues writes to a zone using the same ring.
It isn't related to io_uring.
What matters is the combination of plugging, the none scheduler, and
queue_rqs(): with the none scheduler, any I/Os in a single batch are added
to the plug list, then dispatched to hardware in reverse order via
queue_rqs().
Thanks,
Ming
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-01-24 11:59 [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs() Ming Lei
2024-01-24 15:41 ` Keith Busch
@ 2024-01-25 15:33 ` Jens Axboe
2024-01-26 14:10 ` Christoph Hellwig
2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-01-25 15:33 UTC (permalink / raw)
To: Ming Lei, linux-block, virtualization, linux-nvme
On 1/24/24 4:59 AM, Ming Lei wrote:
> Hello,
>
> Requests are added to the plug list in reverse order, and both virtio-blk
> and nvme retrieve requests from the plug list in order, so requests end up
> being submitted to hardware in reverse order via nvme_queue_rqs() or
> virtio_queue_rqs(), see:
>
> io_uring submit_bio vdb 6302096 4096
> io_uring submit_bio vdb 12235072 4096
> io_uring submit_bio vdb 7682280 4096
> io_uring submit_bio vdb 11912464 4096
> io_uring virtio_queue_rqs vdb 11912464 4096
> io_uring virtio_queue_rqs vdb 7682280 4096
> io_uring virtio_queue_rqs vdb 12235072 4096
> io_uring virtio_queue_rqs vdb 6302096 4096
>
>
> Could this reordering be a problem for virtio-blk and nvme-pci?
This is known and has been the case for a while, that requests inside
the plug list are added to the front and we dispatch in list order
(hence getting them reversed). I'm not aware of any performance issues
related to that, though I have had people be surprised by it.
It'd be trivial to just reverse the list before queue_rqs or direct
dispatch, and probably not at a huge cost as the lists will be pretty
short. See below. Or we could change the plug list to be doubly linked,
which would (I'm guessing...) likely be a higher cost. But unless this
is an actual issue, I propose we just leave it alone.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aa87fcfda1ec..ecfba42157ee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2697,6 +2697,21 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
return __blk_mq_issue_directly(hctx, rq, last);
}
+static struct request *blk_plug_reverse_order(struct blk_plug *plug)
+{
+ struct request *rq = plug->mq_list, *new_head = NULL;
+
+ while (rq) {
+ struct request *tmp = rq;
+
+ rq = rq->rq_next;
+ tmp->rq_next = new_head;
+ new_head = tmp;
+ }
+
+ return new_head;
+}
+
static void blk_mq_plug_issue_direct(struct blk_plug *plug)
{
struct blk_mq_hw_ctx *hctx = NULL;
@@ -2704,6 +2719,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
int queued = 0;
blk_status_t ret = BLK_STS_OK;
+ plug->mq_list = blk_plug_reverse_order(plug);
while ((rq = rq_list_pop(&plug->mq_list))) {
bool last = rq_list_empty(plug->mq_list);
@@ -2741,6 +2757,7 @@ static void __blk_mq_flush_plug_list(struct request_queue *q,
{
if (blk_queue_quiesced(q))
return;
+ plug->mq_list = blk_plug_reverse_order(plug);
q->mq_ops->queue_rqs(&plug->mq_list);
}
--
Jens Axboe
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-01-24 11:59 [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs() Ming Lei
2024-01-24 15:41 ` Keith Busch
2024-01-25 15:33 ` Jens Axboe
@ 2024-01-26 14:10 ` Christoph Hellwig
2024-10-07 22:39 ` Bart Van Assche
2 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2024-01-26 14:10 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block, virtualization, linux-nvme
On Wed, Jan 24, 2024 at 07:59:54PM +0800, Ming Lei wrote:
> Hello,
>
> Requests are added to the plug list in reverse order, and both virtio-blk
> and nvme retrieve requests from the plug list in order, so requests end up
> being submitted to hardware in reverse order via nvme_queue_rqs() or
> virtio_queue_rqs(), see:
> Could this reordering be a problem for virtio-blk and nvme-pci?
It isn't really a problem for the drivers, but de-serializing
I/O patterns tends not to be good. I know at least a couple of cases
where this would hurt:
- SSDs with sequential write detection
- zoned SSDs with zone append, as this now turns a sequential
user write pattern into one that is fairly random
- HDDs much prefer real sequential I/O, although until NVMe HDDs
go beyond the prototype stage that's probably not hitting this
case yet
So yes, we should fix this.
> Thanks,
> Ming
---end quoted text---
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-01-26 14:10 ` Christoph Hellwig
@ 2024-10-07 22:39 ` Bart Van Assche
2024-10-08 11:33 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2024-10-07 22:39 UTC (permalink / raw)
To: Christoph Hellwig, Ming Lei; +Cc: linux-block, virtualization, linux-nvme
On 1/26/24 6:10 AM, Christoph Hellwig wrote:
> On Wed, Jan 24, 2024 at 07:59:54PM +0800, Ming Lei wrote:
>> Hello,
>>
>> Requests are added to the plug list in reverse order, and both virtio-blk
>> and nvme retrieve requests from the plug list in order, so requests end up
>> being submitted to hardware in reverse order via nvme_queue_rqs() or
>> virtio_queue_rqs(), see:
>
>> Could this reordering be a problem for virtio-blk and nvme-pci?
>
> It isn't really a problem for the drivers, but de-serializing
> I/O patterns tends not to be good. I know at least a couple of cases
> where this would hurt:
>
> - SSDs with sequential write detection
> - zoned SSDs with zone append, as this now turns a sequential
> user write pattern into one that is fairly random
> - HDDs much prefer real sequential I/O, although until NVMe HDDs
> go beyond the prototype stage that's probably not hitting this
> case yet
>
> So yes, we should fix this.
(replying to an email from January)
For my patch series that supports pipelining for zoned writes, I need
the submission order to be preserved. Jens mentioned two possible
solutions:
- Either keep the approach that requests on plug->mq_list are in reverse
order and reverse the request order just before submitting requests.
- Or change plug->mq_list into a doubly linked list.
The second approach seems the most interesting to me. I'm concerned that
with the first approach it will be difficult to preserve the request
order if a subset of the requests on plug->mq_list are submitted, e.g.
because a queue full condition is encountered by
blk_mq_dispatch_plug_list().
Thanks,
Bart.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Report] requests are submitted to hardware in reverse order from nvme/virtio-blk queue_rqs()
2024-10-07 22:39 ` Bart Van Assche
@ 2024-10-08 11:33 ` Christoph Hellwig
0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2024-10-08 11:33 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Ming Lei, linux-block, virtualization,
linux-nvme
On Mon, Oct 07, 2024 at 03:39:42PM -0700, Bart Van Assche wrote:
> For my patch series that supports pipelining for zoned writes, I need
> the submission order to be preserved. Jens mentioned two possible
> solutions:
> - Either keep the approach that requests on plug->mq_list are in reverse
> order and reverse the request order just before submitting requests.
> - Or change plug->mq_list into a doubly linked list.
>
> The second approach seems the most interesting to me. I'm concerned that
> with the first approach it will be difficult to preserve the request
> order if a subset of the requests on plug->mq_list are submitted, e.g.
> because a queue full condition is encountered by
> blk_mq_dispatch_plug_list().
Note that you don't really need a full doubly linked list, you just need a
tail pointer in the plug, i.e. the same scheme as struct bio_list.
^ permalink raw reply [flat|nested] 8+ messages in thread