From: Stefan Hajnoczi <stefanha@redhat.com>
To: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: qemu-devel@nongnu.org, "Peter Xu" <peterx@redhat.com>,
"Fabiano Rosas" <farosas@suse.de>,
"Jesper Devantier" <foss@defmacro.it>,
"Klaus Jensen" <its@irrelevant.dk>,
"Stéphane Graber" <stgraber@stgraber.org>,
qemu-block@nongnu.org, "Hanna Reitz" <hreitz@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Keith Busch" <kbusch@kernel.org>, "Fam Zheng" <fam@euphon.net>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Zhao Liu" <zhao1.liu@intel.com>, "Kevin Wolf" <kwolf@redhat.com>,
"Alexander Mikhalitsyn" <aleksandr.mikhalitsyn@futurfusion.io>
Subject: Re: [PATCH v5 7/8] hw/nvme: add basic live migration support
Date: Wed, 8 Apr 2026 14:27:08 -0400 [thread overview]
Message-ID: <20260408182708.GA319710@fedora> (raw)
In-Reply-To: <CAJqdLroJk_EmciJQvYPsywpaOoyFSyNf67DGpOJMGpr+MuhQFg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4037 bytes --]
On Tue, Apr 07, 2026 at 09:02:26PM +0200, Alexander Mikhalitsyn wrote:
> Am Di., 7. Apr. 2026 um 17:48 Uhr schrieb Stefan Hajnoczi <stefanha@redhat.com>:
> >
> > On Tue, Mar 17, 2026 at 11:27:07AM +0100, Alexander Mikhalitsyn wrote:
> > > + /* wait when all in-flight IO requests (except NVME_ADM_CMD_ASYNC_EV_REQ) are processed */
> > > + for (i = 0; i < n->num_queues; i++) {
> > > + NvmeRequest *req;
> > > + NvmeSQueue *sq = n->sq[i];
> > > +
> > > + if (!sq)
> > > + continue;
> > > +
> > > + trace_pci_nvme_pre_save_sq_out_req_drain_wait(n, i, sq->head, sq->tail, sq->size);
> > > +
> > > +wait_out_reqs:
> > > + QTAILQ_FOREACH(req, &sq->out_req_list, entry) {
> > > + if (req->cmd.opcode != NVME_ADM_CMD_ASYNC_EV_REQ) {
> > > + cpu_relax();
> > > + goto wait_out_reqs;
> > > + }
> > > + }
> > > +
> > > + trace_pci_nvme_pre_save_sq_out_req_drain_wait_end(n, i, sq->head, sq->tail);
> > > + }
> >
>
> Hi Stefan,
>
> > Emulated storage controllers usually do not drain requests themselves.
> > They rely on core migration code (e.g. migration_completion_precopy())
> > to stop vCPUs and call bdrv_drain_all_begin/end() to quiesce I/O. Why
> > does NVMe busy wait for requests here?
>
> I rely on core migration code to stop vCPUs and drain requests, *but*
> a challenge here is that
> a concept of "in-flight" request in NVMe is not that simple and we
> have a few different types of in-flight requests:
> - request was written in SQ (sq->head != sq->tail) -> this I don't
> even consider as in-flight, because we just stop SQ processing
> and these requests don't require any special handling during migration
> - request was taken from SQ by nvme_process_sq() and it now lives in
> sq->out_req_list - this means that
> we have also initialized req->aiocb and submitted IO for processing
> in QEMU block layer. After request is processed, completion callback
> will be called (for read/write requests it is
> nvme_rw_complete_cb()), then nvme_enqueue_req_completion() will be
> called and remove
> NvmeRequest from sq->out_req_list and put it into cq->req_list.
> I expect, that by the time when we enter nvme_ctrl_pre_save(),
> bdrv_drain_all_begin/end() were called and
> all AIO is finished and sq->out_req_list is empty (except AERs).
> *But* to be on a safe side I also added busy loop on
> sq->out_req_list.
>
> So, I tend to agree that this busy wait is probably not required, but
> I believe that we still need to verify that sq->out_req_list
> is in fact empty. Because if we messed up, then it's better to crash
> on assert() than to have silent data corruption.
>
> Then after I have a loop cq->req_list, and this time it is absolutely
> required because we need to write all NvmeRequest
> results to CQ and free NvmeRequest structure, cause I didn't want to
> deal with NvmeRequest serialization.
I don't see how the busy wait approach can work since
migration_completion_precopy() holds the Big QEMU Lock
(bql_lock()/bql_unlock()) while .pre_save() is called. The main loop
thread's event loop will not be able to make progress while .pre_save()
is busy waiting.
Based on what you and Klaus have said, how about:
1. assert(QTAILQ_EMPTY(&sq->out_req_list))
2. Call nvme_post_cqes() from .pre_save() to write CQEs back to RAM.
Whether or not cq->bh is cancelled too doesn't really matter.
3. Full CQs need to be handled. It will be necessary to serialize the
remainder of cq->req_list them. Failing live migration for this
reason is not acceptable from a user perspective.
(virtio-scsi and virtio-blk have a mechanism for serializing failed
I/O requests that need to be restarted after migration. There might
be a nicer way to do this with vmstate nowadays, but they are at
least two examples of how to serialize a storage controller's
internal request state.)
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2026-04-08 18:49 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 10:27 [PATCH v5 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 1/8] migration/vmstate: export vmstate_{load, save}_field helpers Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 2/8] migration: add VMSTATE_VARRAY_OF_POINTER_TO_STRUCT_UINT{8, 32}_ALLOC Alexander Mikhalitsyn
2026-03-17 21:39 ` Peter Xu
2026-03-17 23:30 ` Peter Xu
2026-03-18 10:06 ` Alexander Mikhalitsyn
2026-03-18 10:05 ` Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 3/8] tests/unit/test-vmstate: add tests for VMS_ARRAY_OF_POINTER_ALLOW_NULL Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 4/8] tests/functional/migration: add VM launch/configure hooks Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 5/8] hw/nvme: add migration blockers for non-supported cases Alexander Mikhalitsyn
2026-04-07 15:34 ` Stefan Hajnoczi
2026-04-07 18:39 ` Alexander Mikhalitsyn
2026-04-08 6:32 ` Klaus Jensen
2026-04-08 11:47 ` Stefan Hajnoczi
2026-04-08 11:50 ` Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 6/8] hw/nvme: split nvme_init_sq/nvme_init_cq into helpers Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 7/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
2026-04-07 15:48 ` Stefan Hajnoczi
2026-04-07 19:02 ` Alexander Mikhalitsyn
2026-04-08 6:41 ` Klaus Jensen
2026-04-08 11:31 ` Alexander Mikhalitsyn
2026-04-08 18:35 ` Stefan Hajnoczi
2026-04-08 19:59 ` Alexander Mikhalitsyn
2026-04-08 18:27 ` Stefan Hajnoczi [this message]
2026-04-08 19:55 ` Alexander Mikhalitsyn
2026-04-09 13:36 ` Stefan Hajnoczi
2026-03-17 10:27 ` [PATCH v5 8/8] tests/functional/x86_64: add migration test for NVMe device Alexander Mikhalitsyn
2026-03-30 11:38 ` [PATCH v5 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260408182708.GA319710@fedora \
--to=stefanha@redhat.com \
--cc=aleksandr.mikhalitsyn@futurfusion.io \
--cc=alexander@mihalicyn.com \
--cc=fam@euphon.net \
--cc=farosas@suse.de \
--cc=foss@defmacro.it \
--cc=hreitz@redhat.com \
--cc=its@irrelevant.dk \
--cc=kbusch@kernel.org \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stgraber@stgraber.org \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.