From: Klaus Jensen <its@irrelevant.dk>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, Klaus Jensen <k.jensen@samsung.com>,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [RFC PATCH 00/11] hw/nvme: reimplement all multi-aio commands with custom aiocbs
Date: Mon, 7 Jun 2021 12:00:16 +0200
Message-ID: <YL3uMOfMBKGM9KpQ@apples.localdomain>
In-Reply-To: <a9104cf3-efed-524b-803f-b49d93fd062f@virtuozzo.com>


On Jun  7 10:11, Vladimir Sementsov-Ogievskiy wrote:
>07.06.2021 09:17, Klaus Jensen wrote:
>>On Jun  7 08:14, Vladimir Sementsov-Ogievskiy wrote:
>>>04.06.2021 09:52, Klaus Jensen wrote:
>>>>
>>>>I've kept the RFC since I'm still new to using the block layer like
>>>>this. I was hoping that Stefan could find some time to look over this -
>>>>this is a huge series, so I don't expect non-nvme folks to spend a large
>>>>amount of time on it, but I would really like feedback on my approach in
>>>>the reimplementation of flush and format.
>>>
>>>If I understand your code correctly, you start the next io operation 
>>>from the callback of the previous one. It works, and it is similar to 
>>>how the mirror block-job operated some time ago (though it maintained 
>>>several in-flight requests simultaneously; I mean the use of _aio_ 
>>>functions). Still, mirror no longer uses _aio_ functions like this.
>>>
>>>A better approach for calling several io functions of the block layer 
>>>one-by-one is to create a coroutine. You may just add a coroutine 
>>>function that does all your linear logic as you want, without any 
>>>callbacks, like:
>>>
>>>nvme_co_flush()
>>>{
>>>  for (...) {
>>>     blk_co_flush();
>>>  }
>>>}
>>>
>>>(and you'll need qemu_coroutine_create() and qemu_coroutine_enter() 
>>>to start a coroutine).
>>>
>>
>>So, this is definitely a tempting way to implement this. I must admit 
>>that I did not consider it, because I thought it was at the wrong level 
>>of abstraction (it looked to me like something that belonged in block/, 
>>not hw/). Again, I referred to the Trim implementation in hw/ide as the 
>>source of inspiration for the sequential AIOCB approach.
>
>No, I think it's OK from an abstraction point of view. Everybody is 
>welcome to use coroutines where appropriate, and especially for doing 
>sequential IOs :)
>Actually, it's just more efficient: the way I propose, you create one 
>coroutine that does all your logic as you want, whereas the blk_aio_ 
>functions create a coroutine under the hood each time (I don't think 
>that noticeably affects performance, but the logic becomes more 
>straightforward)
>
>The only problem is that this way there is no cancellation API, so you 
>can't use it for cancellation anyway :(
>

Yeah, I'm not really feeling up for adding that :P
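
For reference, here is a minimal sketch of how I read your suggestion.
The NvmeFlushCtx struct and the loop bounds are hypothetical, just for
illustration; this is not code from the series:

  /* Hypothetical context for the coroutine; not from the series. */
  typedef struct NvmeFlushCtx {
      NvmeCtrl    *n;
      NvmeRequest *req;
  } NvmeFlushCtx;

  static void coroutine_fn nvme_co_flush(void *opaque)
  {
      NvmeFlushCtx *ctx = opaque;

      /* linear logic, no callbacks: flush each namespace in turn */
      for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
          NvmeNamespace *ns = nvme_ns(ctx->n, i);
          if (ns) {
              blk_co_flush(ns->blkconf.blk);  /* error handling elided */
          }
      }

      /* ... enqueue the NVMe completion for ctx->req here ... */
  }

and then, as you say, the command handler would start it with:

  Coroutine *co = qemu_coroutine_create(nvme_co_flush, ctx);
  qemu_coroutine_enter(co);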

>>
>>>Still, I'm not sure that moving from issuing several IO commands 
>>>simultaneously to issuing them sequentially is a good idea..
>>>And this way you of course can't use blk_aio_cancel.. This leads to my 
>>>last doubt:
>>>
>>>One more thing I don't understand after a quick look at the series: how 
>>>does cancellation work? It seems to me that you just call cancel on the 
>>>nested AIOCBs produced by the blk_* io functions, but none of them 
>>>implement cancel.. I see only four implementations of the .cancel_async 
>>>callback in the whole QEMU code: in iscsi, in ide/core.c, in 
>>>dma-helpers.c and in thread-pool.c.. None of them seem related to 
>>>blk_aio_flush() and the other blk_* functions you call in the series. 
>>>Or what do I miss?
>>>
>>
>>Right now, cancellation is only initiated by the device when a 
>>submission queue is deleted. This causes blk_aio_cancel() to be called 
>>on each BlockAIOCB (NvmeRequest.aiocb) for outstanding requests. In 
>>most cases this BlockAIOCB is a DMAAIOCB from softmmu/dma-helpers.c, 
>>most cases this BlockAIOCB is a DMAAIOCB from softmmu/dma-helpers.c, 
>>which implements .cancel_async. Prior to this patchset, Flush, DSM, 
>>Copy, and so on did not have any BlockAIOCB to cancel, since we did not 
>>keep references to the ongoing AIOs.
>
>Hmm. Looking at flush for example, I don't see where a DMAAIOCB comes in.
>
>You do
>
>  iocb->aiocb = blk_aio_flush(ns->blkconf.blk, nvme_flush_ns_cb, iocb);
>
>which calls blk_aio_prwv(), which calls blk_aio_get() with 
>blk_aio_em_aiocb_info, which doesn't implement .cancel_async..
>

I meant that most I/O in the regular path (read/write) uses the dma 
helpers (since it does DMA). We might use blk_aio_p{read,write} 
directly when we read from/write to memory on the device (the controller 
memory buffer), but that is not the common case.
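
For illustration, the regular read path does roughly this (simplified
from hw/nvme):

  /* The returned aiocb is a DMAAIOCB from softmmu/dma-helpers.c, and
   * DMAAIOCB *does* implement .cancel_async. */
  req->aiocb = dma_blk_read(blk, &req->sg, data_offset,
                            BDRV_SECTOR_SIZE, nvme_rw_cb, req);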

You are correct that BlkAioEmAIOCB does not implement cancel, but the 
"wrapper" (NvmeFlushAIOCB) *does*. This means that, from the NVMe 
controller's perspective, we can cancel the flush in between the 
(un-cancellable, blk_aio_flush-initiated) flushes to the individual 
namespaces.
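
To make that concrete, the wrapper looks roughly like this (simplified
from the flush patch; iteration state omitted):

  typedef struct NvmeFlushAIOCB {
      BlockAIOCB common;
      BlockAIOCB *aiocb;   /* nested acb from blk_aio_flush() */
      int ret;
      /* ... iteration state over the namespaces ... */
  } NvmeFlushAIOCB;

  static void nvme_flush_cancel(BlockAIOCB *acb)
  {
      NvmeFlushAIOCB *iocb = container_of(acb, NvmeFlushAIOCB, common);

      iocb->ret = -ECANCELED;

      /* the in-flight flush itself runs to completion, but the
       * completion callback sees iocb->ret and issues no further
       * flushes for the remaining namespaces */
      if (iocb->aiocb) {
          blk_aio_cancel_async(iocb->aiocb);
      }
  }

  static const AIOCBInfo nvme_flush_aiocb_info = {
      .aiocb_size   = sizeof(NvmeFlushAIOCB),
      .cancel_async = nvme_flush_cancel,
  };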

>>
>>The blk_aio_cancel() call is synchronous, but it does call 
>>bdrv_aio_cancel_async(), which calls the .cancel_async callback if 
>>implemented. This means that we can now cancel ongoing DSM or Copy 
>>commands while they are processing their individual LBA ranges. So 
>>while blk_aio_cancel() subsequently waits for the AIO to complete, this 
>>may cause them to complete earlier than if they had run to full 
>>completion (i.e. than if they did not implement .cancel_async).
>>
>>There are two things I'd like to do subsequent to this patch series:
>>
>>   1. Fix the Abort command to actually do something. Currently the 
>> command is a no-op (which is allowed by the spec), but I'd like it to 
>> actually cancel the command that the host specified.
>>
>>   2. Make submission queue deletion asynchronous.
>>
>>The infrastructure provided by this refactor should allow this if I am 
>>not mistaken.
>>
>>Overall, I think this "sequentialization" makes it easier to reason 
>>about cancellation, but that might just be me ;)
>>
>
>I just don't like sequential logic simulated on top of an aio-callback 
>async API, which in turn is simulated on top of a coroutine-driven 
>sequential API (which is built on top of the real async block API 
>(thread-based or linux-aio based, etc)) :)

Ha! Yes, we are not exactly improving on that layering here ;)

> Still, I can't suggest an alternative that supports cancellation right 
>now. But I still think that cancellation doesn't work for blk_aio_flush 
>and friends either..
>

The aiocb from blk_aio_flush is considered "un-cancellable" I guess (by 
design from the block layer), but the NVMe command "Flush" is 
cancellable from the perspective of the NVMe controller. Or at least, 
that's what I am intending to do here.
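
For the record, the cancellation entry point is submission queue
deletion, which does roughly the following (simplified from
nvme_del_sq(); blk_aio_cancel() is the synchronous call that invokes
bdrv_aio_cancel_async(), and thereby our .cancel_async, before waiting
for completion):

  while (!QTAILQ_EMPTY(&sq->out_req_list)) {
      NvmeRequest *r = QTAILQ_FIRST(&sq->out_req_list);
      assert(r->aiocb);
      blk_aio_cancel(r->aiocb);
  }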


