From: Stefan Hajnoczi <stefanha@redhat.com>
To: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: "Klaus Jensen" <its@irrelevant.dk>,
qemu-devel@nongnu.org, "Peter Xu" <peterx@redhat.com>,
"Fabiano Rosas" <farosas@suse.de>,
"Jesper Devantier" <foss@defmacro.it>,
"Stéphane Graber" <stgraber@stgraber.org>,
qemu-block@nongnu.org, "Hanna Reitz" <hreitz@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Keith Busch" <kbusch@kernel.org>, "Fam Zheng" <fam@euphon.net>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Zhao Liu" <zhao1.liu@intel.com>, "Kevin Wolf" <kwolf@redhat.com>,
"Alexander Mikhalitsyn" <aleksandr.mikhalitsyn@futurfusion.io>
Subject: Re: [PATCH v5 7/8] hw/nvme: add basic live migration support
Date: Wed, 8 Apr 2026 14:35:29 -0400 [thread overview]
Message-ID: <20260408183529.GB319710@fedora> (raw)
In-Reply-To: <CAJqdLrrBosYRyh8A75bJv9tvgQx5VWxgZuS_E1G=-28YpaBSyw@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4720 bytes --]
On Wed, Apr 08, 2026 at 01:31:33PM +0200, Alexander Mikhalitsyn wrote:
> Am Mi., 8. Apr. 2026 um 08:41 Uhr schrieb Klaus Jensen <its@irrelevant.dk>:
> >
> > On Apr 7 21:02, Alexander Mikhalitsyn wrote:
> > > Am Di., 7. Apr. 2026 um 17:48 Uhr schrieb Stefan Hajnoczi <stefanha@redhat.com>:
> > > >
> > > > On Tue, Mar 17, 2026 at 11:27:07AM +0100, Alexander Mikhalitsyn wrote:
> > > > > + /* wait when all in-flight IO requests (except NVME_ADM_CMD_ASYNC_EV_REQ) are processed */
> > > > > + for (i = 0; i < n->num_queues; i++) {
> > > > > + NvmeRequest *req;
> > > > > + NvmeSQueue *sq = n->sq[i];
> > > > > +
> > > > > + if (!sq)
> > > > > + continue;
> > > > > +
> > > > > + trace_pci_nvme_pre_save_sq_out_req_drain_wait(n, i, sq->head, sq->tail, sq->size);
> > > > > +
> > > > > +wait_out_reqs:
> > > > > + QTAILQ_FOREACH(req, &sq->out_req_list, entry) {
> > > > > + if (req->cmd.opcode != NVME_ADM_CMD_ASYNC_EV_REQ) {
> > > > > + cpu_relax();
> > > > > + goto wait_out_reqs;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + trace_pci_nvme_pre_save_sq_out_req_drain_wait_end(n, i, sq->head, sq->tail);
> > > > > + }
> > > >
> > >
> > > Hi Stefan,
> > >
> > > > Emulated storage controllers usually do not drain requests themselves.
> > > > They rely on core migration code (e.g. migration_completion_precopy())
> > > > to stop vCPUs and call bdrv_drain_all_begin/end() to quiesce I/O. Why
> > > > does NVMe busy wait for requests here?
> > >
> > > I rely on core migration code to stop vCPUs and drain requests, *but*
> > > a challenge here is that
> > > a concept of "in-flight" request in NVMe is not that simple and we
> > > have a few different types of in-flight requests:
> > > - request was written in SQ (sq->head != sq->tail) -> this I don't
> > > even consider as in-flight, because we just stop SQ processing
> > > and these requests don't require any special handling during migration
> > > - request was taken from SQ by nvme_process_sq() and it now lives in
> > > sq->out_req_list - this means that
> > > we have also initialized req->aiocb and submitted IO for processing
> > > in QEMU block layer. After request is processed, completion callback
> > > will be called (for read/write requests it is
> > > nvme_rw_complete_cb()), then nvme_enqueue_req_completion() will be
> > > called and remove
> > > NvmeRequest from sq->out_req_list and put it into cq->req_list.
> > > I expect, that by the time when we enter nvme_ctrl_pre_save(),
> > > bdrv_drain_all_begin/end() were called and
> > > all AIO is finished and sq->out_req_list is empty (except AERs).
> > > *But* to be on a safe side I also added busy loop on
> > > sq->out_req_list.
> > >
> >
> > Correct. When nvme_rw_complete_cb (and friends) returns, we still have
> > the completion laying around, not posted on the CQ. That is queued up
> > for a BH to handle to coalesce CQE posting.
> >
> > > So, I tend to agree that this busy wait is probably not required, but
> > > I believe that we still need to verify that sq->out_req_list
> > > is in fact empty. Because if we messed up, then it's better to crash
> > > on assert() than to have silent data corruption.
> > >
> > > Then after I have a loop cq->req_list, and this time it is absolutely
> > > required because we need to write all NvmeRequest
> > > results to CQ and free NvmeRequest structure, cause I didn't want to
> > > deal with NvmeRequest serialization.
> > >
>
> Hi Klaus,
>
> >
> > There is a subtle catch here. There may not be room in the CQ to post
> > all CQEs. For example, in the extreme, the host has allocated a CQ with
> > room for 1 entry (size 2) and several deep SQs all associated with the
> > same CQ. If the controller has nowhere to post CQEs, then we need to
> > either abort migration (and try again) or migrate the CQEs.
>
> Exactly. This is why I have a next loop, where I ensure that
> cq->req_list is empty
> and also check nvme_cq_full(cq) to ensure that all CQEs can be posted.
Hi Alexander,
nvme_cq_full(cq) will never become false because the migration thread
holds the Big QEMU Lock while calling .pre_save(). The busy loop will
hang forever. See my reply to an earlier email in this thread for
details.
A qtest test case would be useful here (see tests/qtest/nvme-test.c) to
fill the CQ and live migrate. It might be tricky to deterministically
run .pre_save() (via the `migrate` QMP command) before nvme_post_cqes()
gets called, but then you'd have a test case that covers the CQ full
code path.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2026-04-08 18:37 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 10:27 [PATCH v5 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 1/8] migration/vmstate: export vmstate_{load, save}_field helpers Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 2/8] migration: add VMSTATE_VARRAY_OF_POINTER_TO_STRUCT_UINT{8, 32}_ALLOC Alexander Mikhalitsyn
2026-03-17 21:39 ` Peter Xu
2026-03-17 23:30 ` Peter Xu
2026-03-18 10:06 ` Alexander Mikhalitsyn
2026-03-18 10:05 ` Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 3/8] tests/unit/test-vmstate: add tests for VMS_ARRAY_OF_POINTER_ALLOW_NULL Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 4/8] tests/functional/migration: add VM launch/configure hooks Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 5/8] hw/nvme: add migration blockers for non-supported cases Alexander Mikhalitsyn
2026-04-07 15:34 ` Stefan Hajnoczi
2026-04-07 18:39 ` Alexander Mikhalitsyn
2026-04-08 6:32 ` Klaus Jensen
2026-04-08 11:47 ` Stefan Hajnoczi
2026-04-08 11:50 ` Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 6/8] hw/nvme: split nvme_init_sq/nvme_init_cq into helpers Alexander Mikhalitsyn
2026-03-17 10:27 ` [PATCH v5 7/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
2026-04-07 15:48 ` Stefan Hajnoczi
2026-04-07 19:02 ` Alexander Mikhalitsyn
2026-04-08 6:41 ` Klaus Jensen
2026-04-08 11:31 ` Alexander Mikhalitsyn
2026-04-08 18:35 ` Stefan Hajnoczi [this message]
2026-04-08 19:59 ` Alexander Mikhalitsyn
2026-04-08 18:27 ` Stefan Hajnoczi
2026-04-08 19:55 ` Alexander Mikhalitsyn
2026-04-09 13:36 ` Stefan Hajnoczi
2026-03-17 10:27 ` [PATCH v5 8/8] tests/functional/x86_64: add migration test for NVMe device Alexander Mikhalitsyn
2026-03-30 11:38 ` [PATCH v5 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260408183529.GB319710@fedora \
--to=stefanha@redhat.com \
--cc=aleksandr.mikhalitsyn@futurfusion.io \
--cc=alexander@mihalicyn.com \
--cc=fam@euphon.net \
--cc=farosas@suse.de \
--cc=foss@defmacro.it \
--cc=hreitz@redhat.com \
--cc=its@irrelevant.dk \
--cc=kbusch@kernel.org \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stgraber@stgraber.org \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.