From: Klaus Jensen <its@irrelevant.dk>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, Klaus Jensen <k.jensen@samsung.com>,
qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Keith Busch <kbusch@kernel.org>
Subject: Re: [RFC PATCH 00/11] hw/nvme: reimplement all multi-aio commands with custom aiocbs
Date: Mon, 7 Jun 2021 12:00:16 +0200
Message-ID: <YL3uMOfMBKGM9KpQ@apples.localdomain>
In-Reply-To: <a9104cf3-efed-524b-803f-b49d93fd062f@virtuozzo.com>
On Jun 7 10:11, Vladimir Sementsov-Ogievskiy wrote:
>07.06.2021 09:17, Klaus Jensen wrote:
>>On Jun 7 08:14, Vladimir Sementsov-Ogievskiy wrote:
>>>04.06.2021 09:52, Klaus Jensen wrote:
>>>>
>>>>I've kept the RFC since I'm still new to using the block layer like
>>>>this. I was hoping that Stefan could find some time to look over this -
>>>>this is a huge series, so I don't expect non-nvme folks to spend a large
>>>>amount of time on it, but I would really like feedback on my approach in
>>>>the reimplementation of flush and format.
>>>
>>>If I understand your code correctly, you start the next io operation
>>>from the callback of the previous one. It works, and it is similar to
>>>how the mirror block-job operated some time ago (it did maintain
>>>several in-flight requests simultaneously, but my point is about its
>>>use of the _aio_ functions). Still, mirror no longer uses _aio_
>>>functions like this.
>>>
>>>A better approach to calling several block-layer io functions
>>>one-by-one is to create a coroutine. You may just add a coroutine
>>>function that does all your linear logic as you want, without any
>>>callbacks, like:
>>>
>>>nvme_co_flush()
>>>{
>>>    for (...) {
>>>        blk_co_flush();
>>>    }
>>>}
>>>
>>>(and you'll need qemu_coroutine_create() and qemu_coroutine_enter()
>>>to start a coroutine).
>>>
>>
>>So, this is definitely a tempting way to implement it. I must admit
>>that I did not consider it because I thought it was at the wrong level
>>of abstraction (it looked to me like something that belonged in
>>block/, not hw/). Again, I referred to the Trim implementation in
>>hw/ide as the source of inspiration for the sequential AIOCB approach.
>
>No, I think it's OK from an abstraction point of view. Everybody is
>welcome to use coroutines where appropriate, and especially for doing
>sequential IOs :)
>Actually, it's just more efficient: the way I propose, you create one
>coroutine which does all your logic as you want, whereas the blk_aio_
>functions actually create a coroutine under the hood each time (I don't
>think that noticeably affects performance, but the logic becomes more
>straightforward).
>
>The only problem is that this way we don't have a cancellation API, so
>you can't use it for cancellation anyway :(
>
Yeah, I'm not really feeling up for adding that :P
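That said, just to make sure I understand the suggestion, here is a
minimal sketch of what I imagine the coroutine-based flush would look
like (NvmeFlushCoCtx, nvme_co_flush, nvme_flush_start_co and the
completion handling are my own invention, not from the series):

/* hypothetical sketch; not part of this series */
typedef struct NvmeFlushCoCtx {
    NvmeCtrl    *n;
    NvmeRequest *req;
} NvmeFlushCoCtx;

static void coroutine_fn nvme_co_flush(void *opaque)
{
    NvmeFlushCoCtx *ctx = opaque;
    uint16_t status = NVME_SUCCESS;
    int nsid;

    /* flush each attached namespace sequentially, no callbacks */
    for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
        NvmeNamespace *ns = nvme_ns(ctx->n, nsid);
        if (!ns) {
            continue;
        }
        if (blk_co_flush(ns->blkconf.blk) < 0) {
            status = NVME_INTERNAL_DEV_ERROR;
            break;
        }
    }

    ctx->req->status = status;
    /* complete the NVMe request here (details elided) */
    g_free(ctx);
}

static void nvme_flush_start_co(NvmeCtrl *n, NvmeRequest *req)
{
    NvmeFlushCoCtx *ctx = g_new0(NvmeFlushCoCtx, 1);
    Coroutine *co;

    ctx->n = n;
    ctx->req = req;

    co = qemu_coroutine_create(nvme_co_flush, ctx);
    qemu_coroutine_enter(co);
}

It is certainly more linear, but as you say, there is nothing for
blk_aio_cancel() to grab onto.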
>>
>>>Still, I'm not sure that moving from issuing several IO commands
>>>simultaneously to issuing them sequentially is a good idea..
>>>And this way you of course can't use blk_aio_cancel.. This leads to my
>>>last doubt:
>>>
>>>One more thing I don't understand after a quick look at the series:
>>>how does cancellation work? It seems to me that you just call cancel
>>>on the nested AIOCBs produced by blk_<io_functions>, but none of them
>>>implement cancel.. I see only four implementations of the
>>>.cancel_async callback in the whole QEMU code: in iscsi, in
>>>ide/core.c, in dma-helpers.c and in thread-pool.c.. None of them seems
>>>to be related to blk_aio_flush() or the other blk_* functions you call
>>>in the series. Or what am I missing?
>>>
>>
>>Right now, cancellation is only initiated by the device when a
>>submission queue is deleted. This causes blk_aio_cancel() to be called
>>on each BlockAIOCB (NvmeRequest.aiocb) for outstanding requests. In
>>most cases this BlockAIOCB is a DMAAIOCB from softmmu/dma-helpers.c,
>>which implements .cancel_async. Prior to this patchset, Flush, DSM,
>>Copy, and so on did not have any BlockAIOCB to cancel since we did not
>>keep references to the ongoing AIOs.
>
>Hmm. Looking at flush, for example, I don't see how a DMAAIOCB comes
>into it.
>
>You do
>
> iocb->aiocb = blk_aio_flush(ns->blkconf.blk, nvme_flush_ns_cb, iocb);
>
>which calls blk_aio_prwv(), which calls blk_aio_get() with
>blk_aio_em_aiocb_info, which doesn't implement .cancel_async..
>
I meant that most I/O in the regular path (read/write) uses the dma
helpers (since it does DMA). We might use blk_aio_p{read,write} directly
when we read from/write to memory on the device (the controller memory
buffer), but that is not the common case.

You are correct that BlkAioEmAIOCB does not implement cancel, but the
"wrapper" (NvmeFlushAIOCB) *does*. This means that, from the NVMe
controller perspective, we can cancel the Flush command in between the
(un-cancellable, blk_aio_flush-initiated) flushes to the individual
namespaces.
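Roughly, the wrapper idea looks like this (heavily simplified, and the
field names below are approximations; see patch 01 for the real thing):

/* simplified sketch of the wrapper idea (approximate) */
typedef struct NvmeFlushAIOCB {
    BlockAIOCB common;
    BlockAIOCB *aiocb;  /* in-flight blk_aio_flush() for the current ns */
    int ret;
    /* iteration state over the namespaces elided */
} NvmeFlushAIOCB;

static void nvme_flush_cancel(BlockAIOCB *acb)
{
    NvmeFlushAIOCB *iocb = container_of(acb, NvmeFlushAIOCB, common);

    iocb->ret = -ECANCELED;

    if (iocb->aiocb) {
        /*
         * The nested flush itself is not cancellable, but marking
         * iocb->ret stops us from issuing flushes for the remaining
         * namespaces.
         */
        blk_aio_cancel_async(iocb->aiocb);
    }
}

static const AIOCBInfo nvme_flush_aiocb_info = {
    .aiocb_size   = sizeof(NvmeFlushAIOCB),
    .cancel_async = nvme_flush_cancel,
};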
>>
>>The blk_aio_cancel() call is synchronous, but it does call
>>bdrv_aio_cancel_async() which calls the .cancel_async callback if
>>implemented. This means that we can now cancel ongoing DSM or Copy
>>commands while they are processing their individual LBA ranges. So
>>while blk_aio_cancel() subsequently waits for the AIO to complete,
>>this may cause the commands to complete earlier than if they had run to full
>>completion (i.e. if they did not implement .cancel_async).
>>
>>There are two things I'd like to do subsequent to this patch series:
>>
>> 1. Fix the Abort command to actually do something. Currently the
>> command is a no-op (which is allowed by the spec), but I'd like it to
>> actually cancel the command that the host specified.
>>
>> 2. Make submission queue deletion asynchronous.
>>
>>The infrastructure provided by this refactor should allow this if I am
>>not mistaken.
>>
>>Overall, I think this "sequentialization" makes it easier to reason
>>about cancellation, but that might just be me ;)
>>
>
>I just don't like sequential logic simulated on top of an aio-callback
>async API, which in turn is simulated on top of a coroutine-driven
>sequential API (which is built on top of the real async block API
>(thread-based, linux-aio based, etc.)) :)
Ha! Yes, we are not exactly improving on that layering here ;)
>Still, I can't currently suggest an alternative that supports
>cancellation. But I still think that cancellation doesn't work for
>blk_aio_flush and friends either..
>
The aiocb from blk_aio_flush is considered "un-cancellable", I guess (by
design of the block layer), but the NVMe command "Flush" is cancellable
from the perspective of the NVMe controller. Or at least, that's what I
am intending to do here.
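Concretely, the per-namespace callback just bails out when the wrapper
has been cancelled, along these lines (simplified; nvme_flush_next_ns()
is a made-up stand-in for the iteration logic in the patch):

/* rough shape of the per-namespace callback (simplified) */
static void nvme_flush_ns_cb(void *opaque, int ret)
{
    NvmeFlushAIOCB *iocb = opaque;

    iocb->aiocb = NULL;

    if (iocb->ret < 0) {
        /* cancelled via the .cancel_async callback above */
        goto done;
    }

    if (ret < 0) {
        iocb->ret = ret;
        goto done;
    }

    /*
     * nvme_flush_next_ns() is a stand-in: issue blk_aio_flush() for the
     * next namespace (returning true), or return false when there are
     * none left.
     */
    if (nvme_flush_next_ns(iocb)) {
        return;
    }

done:
    iocb->common.cb(iocb->common.opaque, iocb->ret);
    qemu_aio_unref(iocb);
}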