* [PATCH v9 0/8] hw/nvme: add basic live migration support
@ 2026-06-11 11:22 Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 1/8] tests/functional/migration: add VM launch/configure hooks Alexander Mikhalitsyn
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Dear friends,

This patchset adds basic live migration support for the
QEMU emulated NVMe device.

The implementation has some limitations:
- only one NVMe namespace is supported
- SMART counters are not preserved
- CMB is not supported
- PMR is not supported
- SPDM is not supported
- SR-IOV is not supported

I believe I can add support for these in later versions of this patchset,
or separately on demand (when a use case appears).

Testing.

This patch series was manually tested on:
- Debian 13.3 VM (kernel 6.12.69+deb13-amd64) using fio on a *non-root* NVMe disk
  (the root disk was virtio-scsi):

time fio --name=nvme-verify \
    --filename=/dev/nvme0n1 \
    --size=5G \
    --rw=randwrite \
    --bs=4k \
    --iodepth=16 \
    --numjobs=1 \
    --direct=0 \
    --ioengine=io_uring \
    --verify=crc32c \
    --verify_fatal=1

- Windows Server 2022 VM (NVMe drive was the *root* disk) with a browser open
  and playing video.

No defects were found.

Git tree:
https://gitlab.com/mihalicyn/qemu/-/commits/nvme-live-migration

Changelog for version 9:
- many trivial checkpatch fixes here and there
- disabled qtest on ppc64 (spapr bus) by adding a qpci_check_buggy_msi() check
  [ Klaus told me in a private discussion that the ppc64 tests were failing;
    while debugging I found this old patch, which was not merged for some
    reason: https://lore.kernel.org/qemu-devel/20250502030446.88310-5-npiggin@gmail.com/ ]
- I kept all Reviewed-by/Acked-by tags in place (I hope that is fine, since I haven't changed much)
- QEMU CI test results: https://gitlab.com/mihalicyn/qemu/-/pipelines/2593676152

Changelog for version 8:
- rebased
- added Acked-by from Stefan
- added Reviewed-by tag from Klaus
- added Reviewed-by tag from Peter

Changelog for version 7:
- rebased on top of recent main
- addressed review comments from Stefan Hajnoczi:
  - better incoming migration stream validation (SQ/CQids correctness)
  - endianness bugs are fixed in qtest (validated on s390x)
- added Reviewed-by tags from Klaus

Changelog for version 6:
- rebased on top of:
  https://gitlab.com/peterx/qemu/-/tree/vmstate-array-null
  (see also https://lore.kernel.org/all/20260401202844.673494-1-peterx@redhat.com)
- addressed review comments from Stefan Hajnoczi:
  - supported "full CQ" case by serializing NvmeRequest state
  - added qtest for NVMe device migration with full CQ

Changelog for version 5:
- rebased on top of https://lore.kernel.org/all/20260304212303.667141-1-vsementsov@yandex-team.ru/
  (as Peter has requested)

Changelog for version 4:
- vmstate dynamic array support was reworked as suggested by Peter Xu:
  the VMS_ARRAY_OF_POINTER_ALLOW_NULL flag was introduced and
  qtests were added
- NVMe migration blockers were reworked as Klaus requested earlier.
  Instead of a "deny list" approach, we now use a stricter pattern of
  NVMe feature filtering, which should make it harder to break migration
  when adding new NVMe features.

Changelog for version 3:
- rebased
- a simple functional test was added (in accordance with Klaus Jensen's review comment)
  $ meson test 'func-x86_64-nvme_migration' --setup thorough -C build

Changelog for version 2:
- full support for AERs (in-flight requests and queued events too)

Kind regards,
Alex

Alexander Mikhalitsyn (8):
  tests/functional/migration: add VM launch/configure hooks
  hw/nvme: add migration blockers for non-supported cases
  hw/nvme: split nvme_init_sq/nvme_init_cq into helpers
  hw/nvme: set CQE.sq_id earlier in nvme_process_sq
  hw/nvme: unmap req->sg earlier in nvme_enqueue_req_completion
  hw/nvme: add basic live migration support
  tests/functional/x86_64: add migration test for NVMe device
  tests/qtest/nvme-test: add migration test with full CQ

 MAINTAINERS                                   |    1 +
 hw/nvme/ctrl.c                                | 1035 ++++++++++++++++-
 hw/nvme/ns.c                                  |  164 +++
 hw/nvme/nvme.h                                |   12 +
 hw/nvme/trace-events                          |   10 +
 include/block/nvme.h                          |   12 +
 tests/functional/migration.py                 |   23 +-
 tests/functional/x86_64/meson.build           |    1 +
 .../functional/x86_64/test_nvme_migration.py  |  172 +++
 tests/qtest/nvme-test.c                       |  421 +++++++
 10 files changed, 1815 insertions(+), 36 deletions(-)
 create mode 100755 tests/functional/x86_64/test_nvme_migration.py

-- 
2.47.3




* [PATCH v9 1/8] tests/functional/migration: add VM launch/configure hooks
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases Alexander Mikhalitsyn
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Introduce configure_machine, launch_source_vm and assert_dest_vm
methods so that child classes can override parts of the
source/destination VM creation, start and check logic.

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
 tests/functional/migration.py | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/tests/functional/migration.py b/tests/functional/migration.py
index 3b7674af3b6..4344e03be41 100644
--- a/tests/functional/migration.py
+++ b/tests/functional/migration.py
@@ -40,19 +40,36 @@ def assert_migration(self, src_vm, dst_vm):
         self.assertEqual(dst_vm.cmd('query-status')['status'], 'running')
         self.assertEqual(src_vm.cmd('query-status')['status'],'postmigrate')
 
+    # Can be overridden by subclasses to configure both source/dest VMs.
+    def configure_machine(self, vm):
+        vm.add_args('-nodefaults')
+
+    # Can be overridden by subclasses to prepare the source VM before
+    # migration, e.g. by running some workload inside the source VM
+    # to see if it continues to run properly after migration.
+    def launch_source_vm(self, vm):
+        vm.launch()
+
+    # Can be overridden by subclasses to check the destination VM after
+    # migration, e.g. by checking if the workload is still running after
+    # migration.
+    def assert_dest_vm(self, vm):
+        pass
+
     def migrate_vms(self, dst_uri, src_uri, dst_vm, src_vm):
         dst_vm.qmp('migrate-incoming', uri=dst_uri)
         src_vm.qmp('migrate', uri=src_uri)
         self.assert_migration(src_vm, dst_vm)
+        self.assert_dest_vm(dst_vm)
 
     def migrate(self, dst_uri, src_uri=None):
         dst_vm = self.get_vm('-incoming', 'defer', name="dst-qemu")
-        dst_vm.add_args('-nodefaults')
+        self.configure_machine(dst_vm)
         dst_vm.launch()
 
         src_vm = self.get_vm(name="src-qemu")
-        src_vm.add_args('-nodefaults')
-        src_vm.launch()
+        self.configure_machine(src_vm)
+        self.launch_source_vm(src_vm)
 
         if src_uri is None:
             src_uri = dst_uri
-- 
2.47.3




* [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 1/8] tests/functional/migration: add VM launch/configure hooks Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 13:31   ` Fabiano Rosas
  2026-06-11 11:22 ` [PATCH v9 3/8] hw/nvme: split nvme_init_sq/nvme_init_cq into helpers Alexander Mikhalitsyn
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn, Klaus Jensen

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Let's block migration for cases we don't support:
- SR-IOV
- CMB
- PMR
- SPDM

No functional changes here, because NVMe migration is
not supported at all as of this commit.
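
The capability part of the check can be sketched in isolation as follows.
This is a minimal illustration under assumptions, not the patch's code: the
demo_* names and the bit layout are hypothetical. Known blockers are reported
by name and their bits cleared; any bit that then remains outside the
supported mask is reported as an unknown capability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability layout, for illustration only. */
#define DEMO_CAP_QUEUE_MASK   0xffffULL
#define DEMO_CAP_QUEUE_SHIFT  0
#define DEMO_CAP_CMB_MASK     0x1ULL
#define DEMO_CAP_CMB_SHIFT    57
#define DEMO_CAP_NEWBIT_SHIFT 60   /* a bit nobody has classified yet */

/* Bits that the (hypothetical) migration code knows how to handle. */
#define DEMO_CAP_SUPPORTED (DEMO_CAP_QUEUE_MASK << DEMO_CAP_QUEUE_SHIFT)

struct demo_cap_report {
    bool cmb;       /* a known, named blocker */
    bool unknown;   /* some bit outside the supported mask remains */
};

static struct demo_cap_report demo_check_cap(uint64_t cap)
{
    struct demo_cap_report r = { false, false };

    /* Report the known blocker by name and clear its bits... */
    if ((cap >> DEMO_CAP_CMB_SHIFT) & DEMO_CAP_CMB_MASK) {
        r.cmb = true;
        cap &= ~(DEMO_CAP_CMB_MASK << DEMO_CAP_CMB_SHIFT);
    }

    /* ...so that whatever is left outside the mask counts as unknown. */
    if (cap & ~DEMO_CAP_SUPPORTED) {
        r.unknown = true;
    }

    return r;
}
```

Clearing the named bits first keeps the error message specific where
possible, while the final mask test still catches anything overlooked.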

Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
v9:
    - trivial checkpatch fixes
---
 hw/nvme/ctrl.c       | 211 +++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/nvme.h       |   3 +
 include/block/nvme.h |  12 +++
 3 files changed, 226 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 815f39173c8..7510a9e0296 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -209,6 +209,7 @@
 #include "hw/pci/msix.h"
 #include "hw/pci/pcie_sriov.h"
 #include "system/spdm-socket.h"
+#include "migration/blocker.h"
 #include "migration/vmstate.h"
 
 #include "nvme.h"
@@ -252,6 +253,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
     [NVME_COMMAND_SET_PROFILE]      = true,
     [NVME_FDP_MODE]                 = true,
     [NVME_FDP_EVENTS]               = true,
+    /* if you add something here, please update nvme_set_migration_blockers() */
 };
 
 static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
@@ -4603,6 +4605,7 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
         return 0;
     case NVME_IOMS_MO_RUH_UPDATE:
         return nvme_io_mgmt_send_ruh_update(n, req);
+    /* if you add something here, please update nvme_set_migration_blockers() */
     default:
         return NVME_INVALID_FIELD | NVME_DNR;
     };
@@ -7522,6 +7525,10 @@ static uint16_t nvme_security_receive(NvmeCtrl *n, NvmeRequest *req)
 
 static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
 {
+    /*
+     * When adding a new dtype handling here,
+     * please also update nvme_set_migration_blockers().
+     */
     return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -9233,6 +9240,204 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     }
 }
 
+#define BLOCKER_FEATURES_MAX_LEN 256
+
+static inline void nvme_add_blocker_feature(char *blocker_features,
+                                            const char *feature)
+{
+    if (strlen(blocker_features) > 0) {
+        g_strlcat(blocker_features, ", ", BLOCKER_FEATURES_MAX_LEN);
+    }
+    g_strlcat(blocker_features, feature, BLOCKER_FEATURES_MAX_LEN);
+}
+
+static bool nvme_set_migration_blockers(NvmeCtrl *n, PCIDevice *pci_dev,
+                                        Error **errp)
+{
+    uint64_t unsupported_cap, cap = ldq_le_p(&n->bar.cap);
+    char blocker_features[BLOCKER_FEATURES_MAX_LEN] = "";
+    bool adm_cmd_security_checked = false;
+    bool cmd_io_mgmt_checked = false;
+    bool cmd_zone_checked = false;
+
+    /*
+     * The idea of this function is simple: we iterate over all Command Sets
+     * and, for each supported command, apply dedicated logic to determine
+     * whether migration must be blocked.
+     *
+     * For instance, NVME_ADM_CMD_NS_ATTACHMENT is always available to the
+     * guest. With a single namespace it is safe to allow migration; with
+     * more than one we must block it, because the migration code does not
+     * handle that case yet.
+     */
+    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.acs); opcode++) {
+        /* Is command supported? */
+        if (!n->cse.acs[opcode]) {
+            continue;
+        }
+
+        switch (opcode) {
+        case NVME_ADM_CMD_DELETE_SQ:
+        case NVME_ADM_CMD_CREATE_SQ:
+        case NVME_ADM_CMD_GET_LOG_PAGE:
+        case NVME_ADM_CMD_DELETE_CQ:
+        case NVME_ADM_CMD_CREATE_CQ:
+        case NVME_ADM_CMD_IDENTIFY:
+        case NVME_ADM_CMD_ABORT:
+        case NVME_ADM_CMD_SET_FEATURES:
+        case NVME_ADM_CMD_GET_FEATURES:
+        case NVME_ADM_CMD_ASYNC_EV_REQ:
+        case NVME_ADM_CMD_DBBUF_CONFIG:
+        case NVME_ADM_CMD_FORMAT_NVM:
+        case NVME_ADM_CMD_DIRECTIVE_SEND:
+        case NVME_ADM_CMD_DIRECTIVE_RECV:
+            break;
+        case NVME_ADM_CMD_NS_ATTACHMENT:
+            int namespaces_num = 0;
+            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
+                if (!ns) {
+                    continue;
+                }
+
+                namespaces_num++;
+            }
+
+            if (namespaces_num > 1) {
+                nvme_add_blocker_feature(blocker_features,
+                                         "Namespace Attachment");
+            }
+
+            break;
+        case NVME_ADM_CMD_VIRT_MNGMT:
+            if (n->params.sriov_max_vfs) {
+                nvme_add_blocker_feature(blocker_features, "SR-IOV");
+            }
+
+            break;
+        case NVME_ADM_CMD_SECURITY_SEND:
+        case NVME_ADM_CMD_SECURITY_RECV:
+            if (adm_cmd_security_checked) {
+                break;
+            }
+
+            if (pci_dev->spdm_port) {
+                nvme_add_blocker_feature(blocker_features, "SPDM");
+            }
+
+            adm_cmd_security_checked = true;
+
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.nvm); opcode++) {
+        if (!n->cse.iocs.nvm[opcode]) {
+            continue;
+        }
+
+        switch (opcode) {
+        case NVME_CMD_FLUSH:
+        case NVME_CMD_WRITE:
+        case NVME_CMD_READ:
+        case NVME_CMD_COMPARE:
+        case NVME_CMD_WRITE_ZEROES:
+        case NVME_CMD_DSM:
+        case NVME_CMD_VERIFY:
+        case NVME_CMD_COPY:
+            break;
+        case NVME_CMD_IO_MGMT_RECV:
+        case NVME_CMD_IO_MGMT_SEND:
+            if (cmd_io_mgmt_checked) {
+                break;
+            }
+
+            /* check for NVME_IOMS_MO_RUH_UPDATE */
+            if (n->subsys->params.fdp.enabled) {
+                nvme_add_blocker_feature(blocker_features, "FDP");
+            }
+
+            cmd_io_mgmt_checked = true;
+
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.zoned); opcode++) {
+        /*
+         * If command isn't supported or we have the same command
+         * in n->cse.iocs.nvm, then we can skip it here.
+         */
+        if (!n->cse.iocs.zoned[opcode] || n->cse.iocs.nvm[opcode]) {
+            continue;
+        }
+
+        switch (opcode) {
+        case NVME_CMD_ZONE_APPEND:
+        case NVME_CMD_ZONE_MGMT_SEND:
+        case NVME_CMD_ZONE_MGMT_RECV:
+            if (cmd_zone_checked) {
+                break;
+            }
+
+            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
+                if (!ns) {
+                    continue;
+                }
+
+                if (ns->params.zoned) {
+                    nvme_add_blocker_feature(blocker_features,
+                                             "Zoned Namespace");
+                    break;
+                }
+            }
+
+            cmd_zone_checked = true;
+
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    /*
+     * Do our best to explicitly detect every unsupported capability,
+     * so that users learn which features cause migration to be blocked.
+     * Anything we fail to handle here is still covered by the
+     * unsupported_cap check below.
+     */
+    if (NVME_CAP_CMBS(cap)) {
+        nvme_add_blocker_feature(blocker_features, "CMB");
+        cap &= ~((uint64_t)CAP_CMBS_MASK << CAP_CMBS_SHIFT);
+    }
+
+    if (NVME_CAP_PMRS(cap)) {
+        nvme_add_blocker_feature(blocker_features, "PMR");
+        cap &= ~((uint64_t)CAP_PMRS_MASK << CAP_PMRS_SHIFT);
+    }
+
+    unsupported_cap = cap & ~NVME_MIGRATION_SUPPORTED_CAP_BITS;
+    if (unsupported_cap) {
+        nvme_add_blocker_feature(blocker_features, "unknown capability");
+    }
+
+    assert(n->migration_blocker == NULL);
+    if (strlen(blocker_features) > 0) {
+        error_setg(&n->migration_blocker,
+                   "Migration is not supported for %s", blocker_features);
+        if (migrate_add_blocker(&n->migration_blocker, errp) < 0) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
 {
     int cntlid;
@@ -9338,6 +9543,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 
         n->subsys->namespaces[ns->params.nsid] = ns;
     }
+
+    if (!nvme_set_migration_blockers(n, pci_dev, errp)) {
+        return;
+    }
 }
 
 static void nvme_exit(PCIDevice *pci_dev)
@@ -9390,6 +9599,8 @@ static void nvme_exit(PCIDevice *pci_dev)
     }
 
     memory_region_del_subregion(&n->bar0, &n->iomem);
+
+    migrate_del_blocker(&n->migration_blocker);
 }
 
 static const Property nvme_props[] = {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5ef3ebee29e..05aee24a15c 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -668,6 +668,9 @@ typedef struct NvmeCtrl {
 
     /* Socket mapping to SPDM over NVMe Security In/Out commands */
     int spdm_socket;
+
+    /* Migration-related stuff */
+    Error *migration_blocker;
 } NvmeCtrl;
 
 typedef enum NvmeResetType {
diff --git a/include/block/nvme.h b/include/block/nvme.h
index e4e7be51205..17a7c7818d7 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -141,6 +141,18 @@ enum NvmeCapMask {
 #define NVME_CAP_SET_CMBS(cap, val)   \
     ((cap) |= (uint64_t)((val) & CAP_CMBS_MASK)   << CAP_CMBS_SHIFT)
 
+#define NVME_MIGRATION_SUPPORTED_CAP_BITS ( \
+      ((uint64_t)CAP_MQES_MASK   << CAP_MQES_SHIFT)   \
+    | ((uint64_t)CAP_CQR_MASK    << CAP_CQR_SHIFT)    \
+    | ((uint64_t)CAP_AMS_MASK    << CAP_AMS_SHIFT)    \
+    | ((uint64_t)CAP_TO_MASK     << CAP_TO_SHIFT)     \
+    | ((uint64_t)CAP_DSTRD_MASK  << CAP_DSTRD_SHIFT)  \
+    | ((uint64_t)CAP_NSSRS_MASK  << CAP_NSSRS_SHIFT)  \
+    | ((uint64_t)CAP_CSS_MASK    << CAP_CSS_SHIFT)    \
+    | ((uint64_t)CAP_MPSMIN_MASK << CAP_MPSMIN_SHIFT) \
+    | ((uint64_t)CAP_MPSMAX_MASK << CAP_MPSMAX_SHIFT) \
+)
+
 enum NvmeCapCss {
     NVME_CAP_CSS_NCSS    = 1 << 0,
     NVME_CAP_CSS_IOCSS   = 1 << 6,
-- 
2.47.3




* [PATCH v9 3/8] hw/nvme: split nvme_init_sq/nvme_init_cq into helpers
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 1/8] tests/functional/migration: add VM launch/configure hooks Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 4/8] hw/nvme: set CQE.sq_id earlier in nvme_process_sq Alexander Mikhalitsyn
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn, Klaus Jensen

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Later patches in this series will benefit from this split.
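
The shape of the split can be sketched on a toy queue. This is a standalone
illustration, not code from the series: struct demo_queue and the demo_*
functions are hypothetical. Field assignment stays in the public init
function, while the helper only builds state derived from fields that are
already set, so a caller that obtained the fields elsewhere can reuse it.

```c
#include <stdint.h>
#include <stdlib.h>

struct demo_queue {
    uint16_t id;
    uint16_t size;
    uint16_t head, tail;
    uint32_t *slots;     /* derived state, rebuilt rather than copied */
};

/* Helper: build derived state from fields that are already set. */
static int demo_queue_setup(struct demo_queue *q)
{
    q->slots = calloc(q->size, sizeof(*q->slots));
    return q->slots ? 0 : -1;
}

/* Fresh creation: assign the fields, reset the pointers, run the helper. */
static int demo_queue_create(struct demo_queue *q, uint16_t id, uint16_t size)
{
    q->id = id;
    q->size = size;
    q->head = q->tail = 0;
    return demo_queue_setup(q);
}

/* Restore: the fields were filled in by someone else; run only the helper. */
static int demo_queue_restore(struct demo_queue *q)
{
    return demo_queue_setup(q);
}

static void demo_queue_destroy(struct demo_queue *q)
{
    free(q->slots);
    q->slots = NULL;
}
```

The restore path keeps head and tail as they were, which a single combined
init function that always resets them to zero could not do.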

Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
 hw/nvme/ctrl.c | 59 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7510a9e0296..26bb4b52d4d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4856,18 +4856,14 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
-                         uint16_t sqid, uint16_t cqid, uint16_t size)
+static void __nvme_init_sq(NvmeSQueue *sq)
 {
+    NvmeCtrl *n = sq->ctrl;
+    uint16_t sqid = sq->sqid;
+    uint16_t cqid = sq->cqid;
     int i;
     NvmeCQueue *cq;
 
-    sq->ctrl = n;
-    sq->dma_addr = dma_addr;
-    sq->sqid = sqid;
-    sq->size = size;
-    sq->cqid = cqid;
-    sq->head = sq->tail = 0;
     sq->io_req = g_new0(NvmeRequest, sq->size);
 
     QTAILQ_INIT(&sq->req_list);
@@ -4897,6 +4893,18 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
     n->sq[sqid] = sq;
 }
 
+static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
+                         uint16_t sqid, uint16_t cqid, uint16_t size)
+{
+    sq->ctrl = n;
+    sq->dma_addr = dma_addr;
+    sq->sqid = sqid;
+    sq->size = size;
+    sq->cqid = cqid;
+    sq->head = sq->tail = 0;
+    __nvme_init_sq(sq);
+}
+
 static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeSQueue *sq;
@@ -5557,25 +5565,16 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
     return NVME_SUCCESS;
 }
 
-static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
-                         uint16_t cqid, uint16_t vector, uint16_t size,
-                         uint16_t irq_enabled)
+static void __nvme_init_cq(NvmeCQueue *cq)
 {
+    NvmeCtrl *n = cq->ctrl;
     PCIDevice *pci = PCI_DEVICE(n);
+    uint16_t cqid = cq->cqid;
 
-    if (msix_present(pci) && irq_enabled) {
-        msix_vector_use(pci, vector);
+    if (msix_present(pci) && cq->irq_enabled) {
+        msix_vector_use(pci, cq->vector);
     }
 
-    cq->ctrl = n;
-    cq->cqid = cqid;
-    cq->size = size;
-    cq->dma_addr = dma_addr;
-    cq->phase = 1;
-    cq->irq_enabled = irq_enabled;
-    cq->vector = vector;
-    cq->head = cq->tail = 0;
-    QTAILQ_INIT(&cq->req_list);
     QTAILQ_INIT(&cq->sq_list);
     if (n->dbbuf_enabled) {
         cq->db_addr = n->dbbuf_dbs + (cqid << 3) + (1 << 2);
@@ -5592,6 +5591,22 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
                                  &DEVICE(cq->ctrl)->mem_reentrancy_guard);
 }
 
+static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
+                         uint16_t cqid, uint16_t vector, uint16_t size,
+                         uint16_t irq_enabled)
+{
+    cq->ctrl = n;
+    cq->cqid = cqid;
+    cq->size = size;
+    cq->dma_addr = dma_addr;
+    cq->phase = 1;
+    cq->irq_enabled = irq_enabled;
+    cq->vector = vector;
+    cq->head = cq->tail = 0;
+    QTAILQ_INIT(&cq->req_list);
+    __nvme_init_cq(cq);
+}
+
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeCQueue *cq;
-- 
2.47.3




* [PATCH v9 4/8] hw/nvme: set CQE.sq_id earlier in nvme_process_sq
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
                   ` (2 preceding siblings ...)
  2026-06-11 11:22 ` [PATCH v9 3/8] hw/nvme: split nvme_init_sq/nvme_init_cq into helpers Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 5/8] hw/nvme: unmap req->sg earlier in nvme_enqueue_req_completion Alexander Mikhalitsyn
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn, Klaus Jensen

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Instead of filling req->cqe.sq_id in nvme_post_cqes, let's set it earlier
in nvme_process_sq.

This shouldn't cause any issues, because req->cqe.sq_id never changes
during the lifetime of req.

This will help us with migration support.
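
A standalone sketch of the reordering, under assumptions: the demo_* types
and functions below are hypothetical and simplified, not the QEMU ones.
Fields that are known at submission time are written into the CQE right
away, and only the fields that depend on queue state are filled in when the
CQE is posted. One plausible reading of why this helps migration (an
assumption, not stated in the commit message) is that a request captured
between the two steps then already carries its sq_id.

```c
#include <stdint.h>
#include <string.h>

struct demo_cqe {
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t cid;
    uint16_t status;
};

struct demo_req {
    struct demo_cqe cqe;
    uint16_t status;
};

/* Portable host-to-little-endian conversion for 16-bit values. */
static uint16_t demo_cpu_to_le16(uint16_t v)
{
    const uint8_t bytes[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
    uint16_t out;

    memcpy(&out, bytes, sizeof(out));
    return out;
}

/* Submission time: everything that is already known goes into the CQE. */
static void demo_submit(struct demo_req *req, uint16_t sqid, uint16_t cid)
{
    memset(req, 0, sizeof(*req));
    req->cqe.sq_id = demo_cpu_to_le16(sqid);
    req->cqe.cid = cid;   /* copied unchanged from the guest's command */
}

/* Posting time: only the fields that depend on queue state are filled. */
static void demo_post(struct demo_req *req, uint16_t sq_head, uint8_t phase)
{
    req->cqe.status = demo_cpu_to_le16((uint16_t)((req->status << 1) | phase));
    req->cqe.sq_head = demo_cpu_to_le16(sq_head);
}
```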

Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 26bb4b52d4d..5569e6872d6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1522,7 +1522,6 @@ static void nvme_post_cqes(void *opaque)
 
         sq = req->sq;
         req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
-        req->cqe.sq_id = cpu_to_le16(sq->sqid);
         req->cqe.sq_head = cpu_to_le16(sq->head);
         addr = cq->dma_addr + (cq->tail << NVME_CQES);
         ret = pci_dma_write(PCI_DEVICE(n), addr, (void *)&req->cqe,
@@ -7852,6 +7851,7 @@ static void nvme_process_sq(void *opaque)
         QTAILQ_REMOVE(&sq->req_list, req, entry);
         QTAILQ_INSERT_TAIL(&sq->out_req_list, req, entry);
         nvme_req_clear(req);
+        req->cqe.sq_id = cpu_to_le16(sq->sqid);
         req->cqe.cid = cmd.cid;
         memcpy(&req->cmd, &cmd, sizeof(NvmeCmd));
 
-- 
2.47.3




* [PATCH v9 5/8] hw/nvme: unmap req->sg earlier in nvme_enqueue_req_completion
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
                   ` (3 preceding siblings ...)
  2026-06-11 11:22 ` [PATCH v9 4/8] hw/nvme: set CQE.sq_id earlier in nvme_process_sq Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 6/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn, Klaus Jensen

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Instead of unmapping req->sg in nvme_post_cqes(), we can do it earlier in
nvme_enqueue_req_completion(). Once the req completion is enqueued, we no
longer need to access req->sg. We only care about req->sq, req->cqe and
req->status.

Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
 hw/nvme/ctrl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 5569e6872d6..c65b43a04ca 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1536,7 +1536,6 @@ static void nvme_post_cqes(void *opaque)
         QTAILQ_REMOVE(&cq->req_list, req, entry);
 
         nvme_inc_cq_tail(cq);
-        nvme_sg_unmap(&req->sg);
 
         if (QTAILQ_EMPTY(&sq->req_list) && !nvme_sq_empty(sq)) {
             qemu_bh_schedule(sq->bh);
@@ -1566,6 +1565,8 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
                                       req->status, req->cmd.opcode);
     }
 
+    nvme_sg_unmap(&req->sg);
+
     QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
     QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
 
-- 
2.47.3




* [PATCH v9 6/8] hw/nvme: add basic live migration support
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
                   ` (4 preceding siblings ...)
  2026-06-11 11:22 ` [PATCH v9 5/8] hw/nvme: unmap req->sg earlier in nvme_enqueue_req_completion Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 7/8] tests/functional/x86_64: add migration test for NVMe device Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 8/8] tests/qtest/nvme-test: add migration test with full CQ Alexander Mikhalitsyn
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

This implementation has some limitations:
- only one NVMe namespace is supported
- SMART counters are not preserved
- CMB is not supported
- PMR is not supported
- SPDM is not supported
- SR-IOV is not supported

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
v2:
- AERs are now fully supported
v6:
- handle full CQ case
v7:
- renamed copy_cq_req_list to move_cq_req_list
- validate incoming migration stream better
v9:
- trivial checkpatch fixes
---
 hw/nvme/ctrl.c       | 760 ++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/ns.c         | 164 ++++++++++
 hw/nvme/nvme.h       |   9 +
 hw/nvme/trace-events |  10 +
 4 files changed, 934 insertions(+), 9 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index c65b43a04ca..ced394276d3 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -210,6 +210,7 @@
 #include "hw/pci/pcie_sriov.h"
 #include "system/spdm-socket.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file-types.h"
 #include "migration/vmstate.h"
 
 #include "nvme.h"
@@ -1520,6 +1521,18 @@ static void nvme_post_cqes(void *opaque)
             break;
         }
 
+        /*
+         * Here we take the following fields from NvmeRequest structure
+         * and write cqe to the guest RAM based on them:
+         * - req->sq
+         * - req->status
+         * - req->cqe
+         *
+         * If you change this code and more fields from NvmeRequest are
+         * used, please make sure that you have handled this in:
+         * nvme_vmstate_request and nvme_ctrl_pre_save().
+         */
+
         sq = req->sq;
         req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
         req->cqe.sq_head = cpu_to_le16(sq->head);
@@ -4905,6 +4918,25 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
     __nvme_init_sq(sq);
 }
 
+static void nvme_restore_sq(NvmeSQueue *sq_from)
+{
+    NvmeCtrl *n = sq_from->ctrl;
+    NvmeSQueue *sq = sq_from;
+
+    if (sq_from->sqid == 0) {
+        sq = &n->admin_sq;
+        sq->ctrl = n;
+        sq->dma_addr = sq_from->dma_addr;
+        sq->sqid = sq_from->sqid;
+        sq->size = sq_from->size;
+        sq->cqid = sq_from->cqid;
+        sq->head = sq_from->head;
+        sq->tail = sq_from->tail;
+    }
+
+    __nvme_init_sq(sq);
+}
+
 static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeSQueue *sq;
@@ -5607,6 +5639,39 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     __nvme_init_cq(cq);
 }
 
+static void move_cq_req_list(NvmeCQueue *cq_to, NvmeCQueue *cq_from)
+{
+    NvmeRequest *req, *next;
+
+    QTAILQ_FOREACH_SAFE(req, &cq_from->req_list, entry, next) {
+        QTAILQ_REMOVE(&cq_from->req_list, req, entry);
+        QTAILQ_INSERT_TAIL(&cq_to->req_list, req, entry);
+    }
+}
+
+static void nvme_restore_cq(NvmeCQueue *cq_from)
+{
+    NvmeCtrl *n = cq_from->ctrl;
+    NvmeCQueue *cq = cq_from;
+
+    if (cq_from->cqid == 0) {
+        cq = &n->admin_cq;
+        cq->ctrl = n;
+        cq->cqid = cq_from->cqid;
+        cq->size = cq_from->size;
+        cq->dma_addr = cq_from->dma_addr;
+        cq->phase = cq_from->phase;
+        cq->irq_enabled = cq_from->irq_enabled;
+        cq->vector = cq_from->vector;
+        cq->head = cq_from->head;
+        cq->tail = cq_from->tail;
+        QTAILQ_INIT(&cq->req_list);
+        move_cq_req_list(cq, cq_from);
+    }
+
+    __nvme_init_cq(cq);
+}
+
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeCQueue *cq;
@@ -7297,7 +7362,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req)
     n->dbbuf_eis = eis_addr;
     n->dbbuf_enabled = true;
 
-    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
+    for (i = 0; i < n->num_queues; i++) {
         NvmeSQueue *sq = n->sq[i];
         NvmeCQueue *cq = n->cq[i];
 
@@ -7741,7 +7806,7 @@ static int nvme_atomic_write_check(NvmeCtrl *n, NvmeCmd *cmd,
     /*
      * Walk the queues to see if there are any atomic conflicts.
      */
-    for (i = 1; i < n->params.max_ioqpairs + 1; i++) {
+    for (i = 1; i < n->num_queues; i++) {
         NvmeSQueue *sq;
         NvmeRequest *req;
         NvmeRwCmd *req_rw;
@@ -7811,6 +7876,12 @@ static void nvme_process_sq(void *opaque)
     NvmeCmd cmd;
     NvmeRequest *req;
 
+    /*
+     * We don't want to have a race with nvme_ctrl_pre_save().
+     * What implicitly protects us from this is BQL.
+     */
+    assert(bql_locked());
+
     if (n->dbbuf_enabled) {
         nvme_update_sq_tail(sq);
     }
@@ -7928,12 +7999,12 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
         nvme_ns_drain(ns);
     }
 
-    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
+    for (i = 0; i < n->num_queues; i++) {
         if (n->sq[i] != NULL) {
             nvme_free_sq(n->sq[i], n);
         }
     }
-    for (i = 0; i < n->params.max_ioqpairs + 1; i++) {
+    for (i = 0; i < n->num_queues; i++) {
         if (n->cq[i] != NULL) {
             nvme_free_cq(n->cq[i], n);
         }
@@ -8603,6 +8674,8 @@ static bool nvme_check_params(NvmeCtrl *n, Error **errp)
         params->max_ioqpairs = params->num_queues - 1;
     }
 
+    n->num_queues = params->max_ioqpairs + 1;
+
     if (n->namespace.blkconf.blk && n->subsys) {
         error_setg(errp, "subsystem support is unavailable with legacy "
                    "namespace ('drive' property)");
@@ -8776,8 +8849,8 @@ static void nvme_init_state(NvmeCtrl *n)
         n->conf_msix_qsize = n->params.msix_qsize;
     }
 
-    n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
-    n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
+    n->sq = g_new0(NvmeSQueue *, n->num_queues);
+    n->cq = g_new0(NvmeCQueue *, n->num_queues);
     n->temperature = NVME_TEMPERATURE;
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
@@ -9012,7 +9085,7 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
     }
 
     if (n->params.msix_exclusive_bar && !pci_is_vf(pci_dev)) {
-        bar_size = nvme_mbar_size(n->params.max_ioqpairs + 1, 0, NULL, NULL);
+        bar_size = nvme_mbar_size(n->num_queues, 0, NULL, NULL);
         memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
                               bar_size);
         pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
@@ -9024,7 +9097,7 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         /* add one to max_ioqpairs to account for the admin queue pair */
         if (!pci_is_vf(pci_dev)) {
             nr_vectors = n->params.msix_qsize;
-            bar_size = nvme_mbar_size(n->params.max_ioqpairs + 1,
+            bar_size = nvme_mbar_size(n->num_queues,
                                       nr_vectors, &msix_table_offset,
                                       &msix_pba_offset);
         } else {
@@ -9756,9 +9829,678 @@ static uint32_t nvme_pci_read_config(PCIDevice *dev, uint32_t address, int len)
     return pci_default_read_config(dev, address, len);
 }
 
+static const VMStateDescription nvme_vmstate_cqe = {
+    .name = "nvme-cqe",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT32(result, NvmeCqe),
+        VMSTATE_UINT32(dw1, NvmeCqe),
+        VMSTATE_UINT16(sq_head, NvmeCqe),
+        VMSTATE_UINT16(sq_id, NvmeCqe),
+        VMSTATE_UINT16(cid, NvmeCqe),
+        VMSTATE_UINT16(status, NvmeCqe),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_cmd_dptr_sgl = {
+    .name = "nvme-request-cmd-dptr-sgl",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(addr, NvmeSglDescriptor),
+        VMSTATE_UINT32(len, NvmeSglDescriptor),
+        VMSTATE_UINT8_ARRAY(rsvd, NvmeSglDescriptor, 3),
+        VMSTATE_UINT8(type, NvmeSglDescriptor),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_cmd_dptr = {
+    .name = "nvme-request-cmd-dptr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(prp1, NvmeCmdDptr),
+        VMSTATE_UINT64(prp2, NvmeCmdDptr),
+        VMSTATE_STRUCT(sgl, NvmeCmdDptr, 0,
+                       nvme_vmstate_cmd_dptr_sgl, NvmeSglDescriptor),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_cmd = {
+    .name = "nvme-request-cmd",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT8(opcode, NvmeCmd),
+        VMSTATE_UINT8(flags, NvmeCmd),
+        VMSTATE_UINT16(cid, NvmeCmd),
+        VMSTATE_UINT32(nsid, NvmeCmd),
+        VMSTATE_UINT64(res1, NvmeCmd),
+        VMSTATE_UINT64(mptr, NvmeCmd),
+        VMSTATE_STRUCT(dptr, NvmeCmd, 0, nvme_vmstate_cmd_dptr, NvmeCmdDptr),
+        VMSTATE_UINT32(cdw10, NvmeCmd),
+        VMSTATE_UINT32(cdw11, NvmeCmd),
+        VMSTATE_UINT32(cdw12, NvmeCmd),
+        VMSTATE_UINT32(cdw13, NvmeCmd),
+        VMSTATE_UINT32(cdw14, NvmeCmd),
+        VMSTATE_UINT32(cdw15, NvmeCmd),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool nvme_req_pre_load(void *opaque, Error **errp)
+{
+    memset(opaque, 0x0, sizeof(NvmeRequest));
+    return true;
+}
+
+static const VMStateDescription nvme_vmstate_request = {
+    .name = "nvme-request",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .pre_load_errp = nvme_req_pre_load,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT16(status, NvmeRequest),
+        VMSTATE_STRUCT(cqe, NvmeRequest, 0, nvme_vmstate_cqe, NvmeCqe),
+        VMSTATE_STRUCT(cmd, NvmeRequest, 0, nvme_vmstate_cmd, NvmeCmd),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_bar = {
+    .name = "nvme-bar",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(cap, NvmeBar),
+        VMSTATE_UINT32(vs, NvmeBar),
+        VMSTATE_UINT32(intms, NvmeBar),
+        VMSTATE_UINT32(intmc, NvmeBar),
+        VMSTATE_UINT32(cc, NvmeBar),
+        VMSTATE_UINT8_ARRAY(rsvd24, NvmeBar, 4),
+        VMSTATE_UINT32(csts, NvmeBar),
+        VMSTATE_UINT32(nssr, NvmeBar),
+        VMSTATE_UINT32(aqa, NvmeBar),
+        VMSTATE_UINT64(asq, NvmeBar),
+        VMSTATE_UINT64(acq, NvmeBar),
+        VMSTATE_UINT32(cmbloc, NvmeBar),
+        VMSTATE_UINT32(cmbsz, NvmeBar),
+        VMSTATE_UINT32(bpinfo, NvmeBar),
+        VMSTATE_UINT32(bprsel, NvmeBar),
+        VMSTATE_UINT64(bpmbl, NvmeBar),
+        VMSTATE_UINT64(cmbmsc, NvmeBar),
+        VMSTATE_UINT32(cmbsts, NvmeBar),
+        VMSTATE_UINT8_ARRAY(rsvd92, NvmeBar, 3492),
+        VMSTATE_UINT32(pmrcap, NvmeBar),
+        VMSTATE_UINT32(pmrctl, NvmeBar),
+        VMSTATE_UINT32(pmrsts, NvmeBar),
+        VMSTATE_UINT32(pmrebs, NvmeBar),
+        VMSTATE_UINT32(pmrswtp, NvmeBar),
+        VMSTATE_UINT32(pmrmscl, NvmeBar),
+        VMSTATE_UINT32(pmrmscu, NvmeBar),
+        VMSTATE_UINT8_ARRAY(css, NvmeBar, 484),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool nvme_cqueue_pre_load(void *opaque, Error **errp)
+{
+    NvmeCQueue *cq = opaque;
+
+    QTAILQ_INIT(&cq->req_list);
+    return true;
+}
+
+static const VMStateDescription nvme_vmstate_cqueue = {
+    .name = "nvme-cq",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .pre_load_errp = nvme_cqueue_pre_load,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT8(phase, NvmeCQueue),
+        VMSTATE_UINT16(cqid, NvmeCQueue),
+        VMSTATE_UINT16(irq_enabled, NvmeCQueue),
+        VMSTATE_UINT32(head, NvmeCQueue),
+        VMSTATE_UINT32(tail, NvmeCQueue),
+        VMSTATE_UINT32(vector, NvmeCQueue),
+        VMSTATE_UINT32(size, NvmeCQueue),
+        VMSTATE_UINT64(dma_addr, NvmeCQueue),
+
+        VMSTATE_QTAILQ_V(req_list, NvmeCQueue, 1, nvme_vmstate_request,
+                         NvmeRequest, entry),
+
+        /* db_addr, ei_addr, etc will be recalculated */
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_squeue = {
+    .name = "nvme-sq",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT16(sqid, NvmeSQueue),
+        VMSTATE_UINT16(cqid, NvmeSQueue),
+        VMSTATE_UINT32(head, NvmeSQueue),
+        VMSTATE_UINT32(tail, NvmeSQueue),
+        VMSTATE_UINT32(size, NvmeSQueue),
+        VMSTATE_UINT64(dma_addr, NvmeSQueue),
+        /* db_addr, ei_addr, etc will be recalculated */
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_async_event_result = {
+    .name = "nvme-async-event-result",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT8(event_type, NvmeAerResult),
+        VMSTATE_UINT8(event_info, NvmeAerResult),
+        VMSTATE_UINT8(log_page, NvmeAerResult),
+        VMSTATE_UINT8(resv, NvmeAerResult),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_async_event = {
+    .name = "nvme-async-event",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_STRUCT(result, NvmeAsyncEvent, 0,
+                       nvme_vmstate_async_event_result, NvmeAerResult),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_hbs = {
+    .name = "nvme-hbs",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT8(acre, NvmeHostBehaviorSupport),
+        VMSTATE_UINT8(etdas, NvmeHostBehaviorSupport),
+        VMSTATE_UINT8(lbafee, NvmeHostBehaviorSupport),
+        VMSTATE_UINT8(rsvd3, NvmeHostBehaviorSupport),
+        VMSTATE_UINT16(cdfe, NvmeHostBehaviorSupport),
+        VMSTATE_UINT8_ARRAY(rsvd6, NvmeHostBehaviorSupport, 506),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+const VMStateDescription nvme_vmstate_atomic = {
+    .name = "nvme-atomic",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT32(atomic_max_write_size, NvmeAtomic),
+        VMSTATE_UINT64(atomic_boundary, NvmeAtomic),
+        VMSTATE_UINT64(atomic_nabo, NvmeAtomic),
+        VMSTATE_BOOL(atomic_writes, NvmeAtomic),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool pre_save_validate_aer_req(NvmeRequest *req, Error **errp)
+{
+    /*
+     * Can't use assert() here, because we don't want
+     * to just crash QEMU when a user requests a migration.
+     */
+    if (!(req->cmd.opcode == NVME_ADM_CMD_ASYNC_EV_REQ)) {
+        error_setg(errp, "req->cmd.opcode (%u) != NVME_ADM_CMD_ASYNC_EV_REQ",
+                   req->cmd.opcode);
+        return false;
+    }
+
+    if (!(req->ns == NULL)) {
+        error_setg(errp, "req->ns != NULL");
+        return false;
+    }
+
+    if (!(req->sq == &req->sq->ctrl->admin_sq)) {
+        error_setg(errp, "req->sq != &req->sq->ctrl->admin_sq");
+        return false;
+    }
+
+    if (!(req->aiocb == NULL)) {
+        error_setg(errp, "req->aiocb != NULL");
+        return false;
+    }
+
+    if (!(req->opaque == NULL)) {
+        error_setg(errp, "req->opaque != NULL");
+        return false;
+    }
+
+    if (!(req->atomic_write == false)) {
+        error_setg(errp, "req->atomic_write != false");
+        return false;
+    }
+
+    if (req->sg.flags & NVME_SG_ALLOC) {
+        error_setg(errp, "unexpected NVME_SG_ALLOC flag in req->sg.flags");
+        return false;
+    }
+
+    return true;
+}
+
+static bool pre_save_validate_cq_req(NvmeRequest *req, Error **errp)
+{
+    if (!(req->ns == NULL)) {
+        error_setg(errp, "req->ns != NULL");
+        return false;
+    }
+
+    if (!(req->aiocb == NULL)) {
+        error_setg(errp, "req->aiocb != NULL");
+        return false;
+    }
+
+    if (!(req->opaque == NULL)) {
+        error_setg(errp, "req->opaque != NULL");
+        return false;
+    }
+
+    if (!(req->atomic_write == false)) {
+        error_setg(errp, "req->atomic_write != false");
+        return false;
+    }
+
+    if (req->sg.flags & NVME_SG_ALLOC) {
+        error_setg(errp, "unexpected NVME_SG_ALLOC flag in req->sg.flags");
+        return false;
+    }
+
+    return true;
+}
+
+static bool nvme_ctrl_pre_save(void *opaque, Error **errp)
+{
+    NvmeCtrl *n = opaque;
+    int i;
+
+    trace_pci_nvme_pre_save_enter(n);
+
+    /*
+     * We don't want to have a race with nvme_process_sq().
+     * What implicitly protects us from this is BQL.
+     */
+    assert(bql_locked());
+
+    /* cancel all SQ processing BHs */
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeSQueue *sq = n->sq[i];
+
+        if (!sq) {
+            continue;
+        }
+
+        qemu_bh_cancel(sq->bh);
+    }
+
+    /* drain all IO */
+    for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+        NvmeNamespace *ns;
+
+        ns = nvme_ns(n, i);
+        if (!ns) {
+            continue;
+        }
+
+        trace_pci_nvme_pre_save_ns_drain(n, i);
+        nvme_ns_drain(ns);
+    }
+
+    /*
+     * Now, we should take care of AERs.
+     *
+     * 1. Save all queued events (n->aer_queue).
+     *    This is done automatically, see nvme_vmstate VMStateDescription.
+     *    Here we only need to print them for debugging purposes.
+     * 2. Go over outstanding AER requests (n->aer_reqs) and check that all
+     *    have the expected opcode (NVME_ADM_CMD_ASYNC_EV_REQ) and other fields.
+     *
+     * We must be really careful here, because in case of further
+     * QEMU NVMe changes, we may break migration without noticing it, or worse,
+     * introduce silent data corruption during migration.
+     */
+    if (n->aer_queued) {
+        NvmeAsyncEvent *event;
+
+        QTAILQ_FOREACH(event, &n->aer_queue, entry) {
+            trace_pci_nvme_pre_save_aer(event->result.event_type,
+                                        event->result.event_info,
+                                        event->result.log_page);
+        }
+    }
+
+    for (i = 0; i < n->outstanding_aers; i++) {
+        NvmeRequest *req = n->aer_reqs[i];
+
+        if (!pre_save_validate_aer_req(req, errp)) {
+            return false;
+        }
+    }
+
+    /*
+     * Make sure that all in-flight IO requests
+     * (except NVME_ADM_CMD_ASYNC_EV_REQ) are processed.
+     */
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeRequest *req;
+        NvmeSQueue *sq = n->sq[i];
+
+        if (!sq) {
+            continue;
+        }
+
+        trace_pci_nvme_pre_save_sq_out_req_check(n, i,
+                                                 sq->head, sq->tail, sq->size);
+
+        QTAILQ_FOREACH(req, &sq->out_req_list, entry) {
+            assert(req->cmd.opcode == NVME_ADM_CMD_ASYNC_EV_REQ);
+        }
+    }
+
+    /* wait until all IO request completions are written to guest memory */
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        qemu_bh_cancel(cq->bh);
+        /* this should empty cq->req_list unless CQ is full */
+        nvme_post_cqes(cq);
+
+        trace_pci_nvme_pre_save_cq_req_check(n, i,
+                                             cq->head, cq->tail, cq->size);
+
+        if (!QTAILQ_EMPTY(&cq->req_list)) {
+            NvmeRequest *req;
+
+            assert(nvme_cq_full(cq));
+
+            QTAILQ_FOREACH(req, &cq->req_list, entry) {
+                trace_pci_nvme_pre_save_cq_unposted_cqe(
+                                            n, i, nvme_cid(req),
+                                            nvme_nsid(req->ns),
+                                            le32_to_cpu(req->cqe.result),
+                                            le32_to_cpu(req->cqe.dw1),
+                                            req->status, req->cmd.opcode);
+                if (!pre_save_validate_cq_req(req, errp)) {
+                    return false;
+                }
+            }
+        }
+    }
+
+    for (uint32_t nsid = 0; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+        NvmeNamespace *ns = n->namespaces[nsid];
+
+        if (!ns) {
+            continue;
+        }
+
+        if (ns != &n->namespace) {
+            error_setg(errp,
+                       "only one NVMe namespace is supported for migration");
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static bool nvme_ctrl_post_load(void *opaque, int version_id, Error **errp)
+{
+    NvmeCtrl *n = opaque;
+    int i;
+
+    trace_pci_nvme_post_load_enter(n);
+
+    /* restore CQs first */
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->cqid != i) {
+            error_setg(errp, "inconsistent migration stream (cq->cqid != i)");
+            return false;
+        }
+
+        cq->ctrl = n;
+        nvme_restore_cq(cq);
+        trace_pci_nvme_post_load_restore_cq(n, i, cq->head, cq->tail, cq->size);
+
+        if (i == 0) {
+            /*
+             * Admin CQ lives in n->admin_cq, we don't need
+             * memory allocated for it in get_ptrs_array_entry() anymore.
+             *
+             * nvme_restore_cq() also takes care of:
+             * n->cq[0] = &n->admin_cq;
+             * so n->cq[0] remains valid.
+             */
+            g_free(cq);
+        }
+    }
+
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeSQueue *sq = n->sq[i];
+
+        if (!sq) {
+            continue;
+        }
+
+        if (sq->sqid != i) {
+            error_setg(errp, "inconsistent migration stream (sq->sqid != i)");
+            return false;
+        }
+
+        if (!n->cq[sq->cqid]) {
+            error_setg(errp,
+                       "inconsistent migration stream (n->cq[sq->cqid] is NULL)");
+            return false;
+        }
+
+        sq->ctrl = n;
+        nvme_restore_sq(sq);
+        trace_pci_nvme_post_load_restore_sq(n, i, sq->head, sq->tail, sq->size);
+
+        if (i == 0) {
+            /* same as for CQ */
+            g_free(sq);
+        }
+    }
+
+    /* restore cq->req_list-s */
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeRequest *req_from, *next;
+        typeof_field(NvmeCQueue, req_list) req_list;
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq || QTAILQ_EMPTY(&cq->req_list)) {
+            continue;
+        }
+
+        /*
+         * We use nvme_vmstate_request VMStateDescription to save/restore
+         * NvmeRequest structures, but the tricky thing here is that
+         * memory for each cq->req_list item is allocated separately
+         * during restore. It doesn't work for us. We need to take
+         * an existing NvmeRequest structure from SQ's req_list pool
+         * and fill it with data from the newly allocated one (req_from).
+         * Then, we can safely release allocated memory for it.
+         */
+
+        /* make a copy of cq->req_list (QTAILQ head) and clean cq->req_list */
+        QTAILQ_INIT(&req_list);
+        QTAILQ_FOREACH_SAFE(req_from, &cq->req_list, entry, next) {
+            QTAILQ_REMOVE(&cq->req_list, req_from, entry);
+            QTAILQ_INSERT_TAIL(&req_list, req_from, entry);
+        }
+        QTAILQ_INIT(&cq->req_list);
+
+        QTAILQ_FOREACH_SAFE(req_from, &req_list, entry, next) {
+            uint16_t sqid = le16_to_cpu(req_from->cqe.sq_id);
+            NvmeRequest *req;
+            NvmeSQueue *sq;
+
+            assert(!nvme_check_sqid(n, sqid));
+            sq = n->sq[sqid];
+
+            req = QTAILQ_FIRST(&sq->req_list);
+            QTAILQ_REMOVE(&sq->req_list, req, entry);
+            QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
+            nvme_req_clear(req);
+
+            /* copy data from the source NvmeRequest */
+            req->status = req_from->status;
+            memcpy(&req->cqe, &req_from->cqe, sizeof(NvmeCqe));
+            memcpy(&req->cmd, &req_from->cmd, sizeof(NvmeCmd));
+
+            QTAILQ_REMOVE(&req_list, req_from, entry);
+            g_free(req_from);
+        }
+
+        qemu_bh_schedule(cq->bh);
+    }
+
+    if (n->aer_queued) {
+        NvmeAsyncEvent *event;
+
+        QTAILQ_FOREACH(event, &n->aer_queue, entry) {
+            trace_pci_nvme_post_load_aer(event->result.event_type,
+                                         event->result.event_info,
+                                         event->result.log_page);
+        }
+    }
+
+    for (i = 0; i < n->outstanding_aers; i++) {
+        NvmeSQueue *sq = &n->admin_sq;
+        NvmeRequest *req_from = n->aer_reqs[i];
+        NvmeRequest *req;
+
+        /* The idea here is the same as for the "restore cq->req_list-s" step */
+
+        /* take an NvmeRequest struct from SQ */
+        req = QTAILQ_FIRST(&sq->req_list);
+        QTAILQ_REMOVE(&sq->req_list, req, entry);
+        QTAILQ_INSERT_TAIL(&sq->out_req_list, req, entry);
+        nvme_req_clear(req);
+
+        /* copy data from the source NvmeRequest */
+        req->status = req_from->status;
+        memcpy(&req->cqe, &req_from->cqe, sizeof(NvmeCqe));
+        memcpy(&req->cmd, &req_from->cmd, sizeof(NvmeCmd));
+
+        n->aer_reqs[i] = req;
+        g_free(req_from);
+    }
+
+    /*
+     * We need to attach namespaces (currently, only one namespace is
+     * supported for migration).
+     * This logic comes from nvme_start_ctrl().
+     */
+    for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+        NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
+
+        if (!ns || (!ns->params.shared && ns->ctrl != n)) {
+            continue;
+        }
+
+        if (nvme_csi_supported(n, ns->csi) && !ns->params.detached) {
+            if (!ns->attached || ns->params.shared) {
+                nvme_attach_ns(n, ns);
+            }
+        }
+    }
+
+    /* schedule SQ processing */
+    for (i = 0; i < n->num_queues; i++) {
+        NvmeSQueue *sq = n->sq[i];
+
+        if (!sq) {
+            continue;
+        }
+
+        qemu_bh_schedule(sq->bh);
+    }
+
+    return true;
+}
+
 static const VMStateDescription nvme_vmstate = {
     .name = "nvme",
-    .unmigratable = 1,
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .pre_save_errp = nvme_ctrl_pre_save,
+    .post_load_errp = nvme_ctrl_post_load,
+    .fields = (const VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_obj, NvmeCtrl),
+        VMSTATE_MSIX(parent_obj, NvmeCtrl),
+        VMSTATE_STRUCT(bar, NvmeCtrl, 0, nvme_vmstate_bar, NvmeBar),
+
+        VMSTATE_BOOL(qs_created, NvmeCtrl),
+        VMSTATE_UINT32(page_size, NvmeCtrl),
+        VMSTATE_UINT16(page_bits, NvmeCtrl),
+        VMSTATE_UINT16(max_prp_ents, NvmeCtrl),
+        VMSTATE_UINT32(max_q_ents, NvmeCtrl),
+        VMSTATE_UINT8(outstanding_aers, NvmeCtrl),
+        VMSTATE_UINT32(irq_status, NvmeCtrl),
+        VMSTATE_INT32(cq_pending, NvmeCtrl),
+
+        VMSTATE_UINT64(host_timestamp, NvmeCtrl),
+        VMSTATE_UINT64(timestamp_set_qemu_clock_ms, NvmeCtrl),
+        VMSTATE_UINT64(starttime_ms, NvmeCtrl),
+        VMSTATE_UINT16(temperature, NvmeCtrl),
+        VMSTATE_UINT8(smart_critical_warning, NvmeCtrl),
+
+        VMSTATE_UINT32(conf_msix_qsize, NvmeCtrl),
+        VMSTATE_UINT32(conf_ioqpairs, NvmeCtrl),
+        VMSTATE_UINT64(dbbuf_dbs, NvmeCtrl),
+        VMSTATE_UINT64(dbbuf_eis, NvmeCtrl),
+        VMSTATE_BOOL(dbbuf_enabled, NvmeCtrl),
+
+        VMSTATE_UINT8(aer_mask, NvmeCtrl),
+        VMSTATE_VARRAY_OF_POINTER_TO_STRUCT_UINT8_ALLOC(
+            aer_reqs, NvmeCtrl, outstanding_aers, 0,
+            nvme_vmstate_request, NvmeRequest),
+        VMSTATE_QTAILQ_V(aer_queue, NvmeCtrl, 1, nvme_vmstate_async_event,
+                         NvmeAsyncEvent, entry),
+        VMSTATE_INT32(aer_queued, NvmeCtrl),
+
+        VMSTATE_STRUCT(namespace, NvmeCtrl, 0, nvme_vmstate_ns, NvmeNamespace),
+
+        VMSTATE_VARRAY_OF_POINTER_TO_STRUCT_UINT32_ALLOC(
+            sq, NvmeCtrl, num_queues, 0, nvme_vmstate_squeue, NvmeSQueue),
+        VMSTATE_VARRAY_OF_POINTER_TO_STRUCT_UINT32_ALLOC(
+            cq, NvmeCtrl, num_queues, 0, nvme_vmstate_cqueue, NvmeCQueue),
+
+        VMSTATE_UINT16(features.temp_thresh_hi, NvmeCtrl),
+        VMSTATE_UINT16(features.temp_thresh_low, NvmeCtrl),
+        VMSTATE_UINT32(features.async_config, NvmeCtrl),
+        VMSTATE_STRUCT(features.hbs, NvmeCtrl, 0,
+                       nvme_vmstate_hbs, NvmeHostBehaviorSupport),
+
+        VMSTATE_UINT32(dn, NvmeCtrl),
+        VMSTATE_STRUCT(atomic, NvmeCtrl, 0, nvme_vmstate_atomic, NvmeAtomic),
+
+        VMSTATE_END_OF_LIST()
+    },
 };
 
 static void nvme_class_init(ObjectClass *oc, const void *data)
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index b0106eaa5c8..4caab590977 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -20,6 +20,7 @@
 #include "qemu/bitops.h"
 #include "system/system.h"
 #include "system/block-backend.h"
+#include "migration/vmstate.h"
 
 #include "nvme.h"
 #include "trace.h"
@@ -886,6 +887,168 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
     }
 }
 
+static const VMStateDescription nvme_vmstate_lbaf = {
+    .name = "nvme_lbaf",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT16(ms, NvmeLBAF),
+        VMSTATE_UINT8(ds, NvmeLBAF),
+        VMSTATE_UINT8(rp, NvmeLBAF),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_id_ns = {
+    .name = "nvme_id_ns",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(nsze, NvmeIdNs),
+        VMSTATE_UINT64(ncap, NvmeIdNs),
+        VMSTATE_UINT64(nuse, NvmeIdNs),
+        VMSTATE_UINT8(nsfeat, NvmeIdNs),
+        VMSTATE_UINT8(nlbaf, NvmeIdNs),
+        VMSTATE_UINT8(flbas, NvmeIdNs),
+        VMSTATE_UINT8(mc, NvmeIdNs),
+        VMSTATE_UINT8(dpc, NvmeIdNs),
+        VMSTATE_UINT8(dps, NvmeIdNs),
+        VMSTATE_UINT8(nmic, NvmeIdNs),
+        VMSTATE_UINT8(rescap, NvmeIdNs),
+        VMSTATE_UINT8(fpi, NvmeIdNs),
+        VMSTATE_UINT8(dlfeat, NvmeIdNs),
+        VMSTATE_UINT16(nawun, NvmeIdNs),
+        VMSTATE_UINT16(nawupf, NvmeIdNs),
+        VMSTATE_UINT16(nacwu, NvmeIdNs),
+        VMSTATE_UINT16(nabsn, NvmeIdNs),
+        VMSTATE_UINT16(nabo, NvmeIdNs),
+        VMSTATE_UINT16(nabspf, NvmeIdNs),
+        VMSTATE_UINT16(noiob, NvmeIdNs),
+        VMSTATE_UINT8_ARRAY(nvmcap, NvmeIdNs, 16),
+        VMSTATE_UINT16(npwg, NvmeIdNs),
+        VMSTATE_UINT16(npwa, NvmeIdNs),
+        VMSTATE_UINT16(npdg, NvmeIdNs),
+        VMSTATE_UINT16(npda, NvmeIdNs),
+        VMSTATE_UINT16(nows, NvmeIdNs),
+        VMSTATE_UINT16(mssrl, NvmeIdNs),
+        VMSTATE_UINT32(mcl, NvmeIdNs),
+        VMSTATE_UINT8(msrc, NvmeIdNs),
+        VMSTATE_UINT8_ARRAY(rsvd81, NvmeIdNs, 18),
+        VMSTATE_UINT8(nsattr, NvmeIdNs),
+        VMSTATE_UINT16(nvmsetid, NvmeIdNs),
+        VMSTATE_UINT16(endgid, NvmeIdNs),
+        VMSTATE_UINT8_ARRAY(nguid, NvmeIdNs, 16),
+        VMSTATE_UINT64(eui64, NvmeIdNs),
+        VMSTATE_STRUCT_ARRAY(lbaf, NvmeIdNs, NVME_MAX_NLBAF, 1,
+                             nvme_vmstate_lbaf, NvmeLBAF),
+        VMSTATE_UINT8_ARRAY(vs, NvmeIdNs, 3712),
+
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_id_ns_nvm = {
+    .name = "nvme_id_ns_nvm",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(lbstm, NvmeIdNsNvm),
+        VMSTATE_UINT8(pic, NvmeIdNsNvm),
+        VMSTATE_UINT8_ARRAY(rsvd9, NvmeIdNsNvm, 3),
+        VMSTATE_UINT32_ARRAY(elbaf, NvmeIdNsNvm, NVME_MAX_NLBAF),
+        VMSTATE_UINT32(npdgl, NvmeIdNsNvm),
+        VMSTATE_UINT32(nprg, NvmeIdNsNvm),
+        VMSTATE_UINT32(npra, NvmeIdNsNvm),
+        VMSTATE_UINT32(nors, NvmeIdNsNvm),
+        VMSTATE_UINT32(npdal, NvmeIdNsNvm),
+        VMSTATE_UINT8_ARRAY(rsvd288, NvmeIdNsNvm, 3808),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription nvme_vmstate_id_ns_ind = {
+    .name = "nvme_id_ns_ind",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT8(nsfeat, NvmeIdNsInd),
+        VMSTATE_UINT8(nmic, NvmeIdNsInd),
+        VMSTATE_UINT8(rescap, NvmeIdNsInd),
+        VMSTATE_UINT8(fpi, NvmeIdNsInd),
+        VMSTATE_UINT32(anagrpid, NvmeIdNsInd),
+        VMSTATE_UINT8(nsattr, NvmeIdNsInd),
+        VMSTATE_UINT8(rsvd9, NvmeIdNsInd),
+        VMSTATE_UINT16(nvmsetid, NvmeIdNsInd),
+        VMSTATE_UINT16(endgrpid, NvmeIdNsInd),
+        VMSTATE_UINT8(nstat, NvmeIdNsInd),
+        VMSTATE_UINT8_ARRAY(rsvd15, NvmeIdNsInd, 4081),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+typedef struct TmpNvmeNamespace {
+    NvmeNamespace *parent;
+    bool enable_write_cache;
+} TmpNvmeNamespace;
+
+static bool nvme_ns_tmp_pre_save(void *opaque, Error **errp)
+{
+    struct TmpNvmeNamespace *tns = opaque;
+
+    tns->enable_write_cache = blk_enable_write_cache(tns->parent->blkconf.blk);
+
+    return true;
+}
+
+static bool nvme_ns_tmp_post_load(void *opaque, int version_id, Error **errp)
+{
+    struct TmpNvmeNamespace *tns = opaque;
+
+    blk_set_enable_write_cache(tns->parent->blkconf.blk,
+                               tns->enable_write_cache);
+
+    return true;
+}
+
+static const VMStateDescription nvme_vmstate_ns_tmp = {
+    .name = "nvme_ns_tmp",
+    .pre_save_errp = nvme_ns_tmp_pre_save,
+    .post_load_errp = nvme_ns_tmp_post_load,
+    .fields = (const VMStateField[]) {
+        VMSTATE_BOOL(enable_write_cache, TmpNvmeNamespace),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+const VMStateDescription nvme_vmstate_ns = {
+    .name = "nvme_ns",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_WITH_TMP(NvmeNamespace, TmpNvmeNamespace, nvme_vmstate_ns_tmp),
+
+        VMSTATE_STRUCT(id_ns, NvmeNamespace, 0, nvme_vmstate_id_ns, NvmeIdNs),
+        VMSTATE_STRUCT(id_ns_nvm, NvmeNamespace, 0,
+                       nvme_vmstate_id_ns_nvm, NvmeIdNsNvm),
+        VMSTATE_STRUCT(id_ns_ind, NvmeNamespace, 0,
+                       nvme_vmstate_id_ns_ind, NvmeIdNsInd),
+        VMSTATE_STRUCT(lbaf, NvmeNamespace, 0, nvme_vmstate_lbaf, NvmeLBAF),
+        VMSTATE_UINT32(nlbaf, NvmeNamespace),
+        VMSTATE_UINT8(csi, NvmeNamespace),
+        VMSTATE_UINT16(status, NvmeNamespace),
+        VMSTATE_UINT8(pif, NvmeNamespace),
+
+        VMSTATE_UINT16(zns.zrwas, NvmeNamespace),
+        VMSTATE_UINT16(zns.zrwafg, NvmeNamespace),
+        VMSTATE_UINT32(zns.numzrwa, NvmeNamespace),
+
+        VMSTATE_UINT32(features.err_rec, NvmeNamespace),
+        VMSTATE_STRUCT(atomic, NvmeNamespace, 0,
+                       nvme_vmstate_atomic, NvmeAtomic),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const Property nvme_ns_props[] = {
     DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
     DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
@@ -937,6 +1100,7 @@ static void nvme_ns_class_init(ObjectClass *oc, const void *data)
     dc->bus_type = TYPE_NVME_BUS;
     dc->realize = nvme_ns_realize;
     dc->unrealize = nvme_ns_unrealize;
+    dc->vmsd = &nvme_vmstate_ns;
     device_class_set_props(dc, nvme_ns_props);
     dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 05aee24a15c..78a6eaa1774 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -444,6 +444,11 @@ typedef struct NvmeRequest {
     NvmeSg                  sg;
     bool                    atomic_write;
     QTAILQ_ENTRY(NvmeRequest)entry;
+    /*
+     * If you add a new field here, please make sure to update
+     * nvme_vmstate_request, pre_save_validate_aer_req() and
+     * pre_save_validate_cq_req().
+     */
 } NvmeRequest;
 
 typedef struct NvmeBounceContext {
@@ -640,6 +645,7 @@ typedef struct NvmeCtrl {
 
     NvmeNamespace   namespace;
     NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
+    uint32_t        num_queues;
     NvmeSQueue      **sq;
     NvmeCQueue      **cq;
     NvmeSQueue      admin_sq;
@@ -751,4 +757,7 @@ void nvme_atomic_configure_max_write_size(bool dn, uint16_t awun,
 void nvme_ns_atomic_configure_boundary(bool dn, uint16_t nabsn,
                                        uint16_t nabspf, NvmeAtomic *atomic);
 
+extern const VMStateDescription nvme_vmstate_atomic;
+extern const VMStateDescription nvme_vmstate_ns;
+
 #endif /* HW_NVME_NVME_H */
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 6be0bfa1c1f..f97a6a11f36 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -7,6 +7,16 @@ pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
 pci_nvme_map_addr_cmb(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
 pci_nvme_map_prp(uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
+pci_nvme_pre_save_enter(void *n) "n=%p"
+pci_nvme_pre_save_ns_drain(void *n, int i) "n=%p i=%d"
+pci_nvme_pre_save_sq_out_req_check(void *n, int i, uint32_t head, uint32_t tail, uint32_t size) "n=%p i=%d head=0x%"PRIx32" tail=0x%"PRIx32" size=0x%"PRIx32""
+pci_nvme_pre_save_cq_req_check(void *n, int i, uint32_t head, uint32_t tail, uint32_t size) "n=%p i=%d head=0x%"PRIx32" tail=0x%"PRIx32" size=0x%"PRIx32""
+pci_nvme_pre_save_cq_unposted_cqe(void *n, int i, uint16_t cid, uint32_t nsid, uint32_t dw0, uint32_t dw1, uint16_t status, uint8_t opc) "n=%p i=%d cid %"PRIu16" nsid %"PRIu32" dw0 0x%"PRIx32" dw1 0x%"PRIx32" status 0x%"PRIx16" opc 0x%"PRIx8""
+pci_nvme_pre_save_aer(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_post_load_enter(void *n) "n=%p"
+pci_nvme_post_load_restore_cq(void *n, int i, uint32_t head, uint32_t tail, uint32_t size) "n=%p i=%d head=0x%"PRIx32" tail=0x%"PRIx32" size=0x%"PRIx32""
+pci_nvme_post_load_restore_sq(void *n, int i, uint32_t head, uint32_t tail, uint32_t size) "n=%p i=%d head=0x%"PRIx32" tail=0x%"PRIx32" size=0x%"PRIx32""
+pci_nvme_post_load_aer(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 pci_nvme_map_sgl(uint8_t typ, uint64_t len) "type 0x%"PRIx8" len %"PRIu64""
 pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" nsid 0x%"PRIx32" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'"
 pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'"
-- 
2.47.3




* [PATCH v9 7/8] tests/functional/x86_64: add migration test for NVMe device
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
                   ` (5 preceding siblings ...)
  2026-06-11 11:22 ` [PATCH v9 6/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  2026-06-11 11:22 ` [PATCH v9 8/8] tests/qtest/nvme-test: add migration test with full CQ Alexander Mikhalitsyn
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

Introduce a very simple test to ensure that NVMe device
migration works fine.

Test plan is simple:
1. prepare VM with NVMe device
2. run workload that produces relatively heavy IO on the device
3. migrate VM
4. ensure that workload is alive and finishes without errors
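
Steps 2 and 4 can be sketched on the host as follows. This is only an
illustration of the shape of the guest workload, not the test code itself:
the function names are hypothetical, and zlib.crc32 stands in for cksum
(its values differ from POSIX cksum output).

```python
import os
import zlib


def checksum_pass(root):
    """One workload iteration: checksum every regular file under root.

    Returns (checksums, errors). zlib.crc32 is a stand-in for cksum.
    """
    checksums, errors = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    checksums[path] = zlib.crc32(f.read())
            except OSError as exc:
                # Read failures are what a broken disk would produce.
                errors.append(f"{path}: {exc}")
    return checksums, errors


def run_workload(root, stop_file, max_iters=1000):
    """Repeat checksum passes until stop_file exists.

    max_iters is an assumption added so the sketch always terminates.
    Returns (iterations_done, all_errors).
    """
    all_errors = []
    iters = 0
    while not os.path.exists(stop_file) and iters < max_iters:
        _sums, errors = checksum_pass(root)
        all_errors.extend(errors)
        iters += 1
    return iters, all_errors
```

The pass criterion in step 4 then corresponds to the error list being empty
once the stop file has been created.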

The test can be run with:
$ meson test 'func-x86_64-nvme_migration' --setup thorough -C build

In the future we can extend this approach and introduce some
fio-based tests. It probably also makes sense to apply this test
not only to the NVMe device, but also to virtio-{blk,scsi},
ide, sata and other migratable devices.

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
v9:
- check-patch fixes
---
 MAINTAINERS                                   |   1 +
 tests/functional/x86_64/meson.build           |   1 +
 .../functional/x86_64/test_nvme_migration.py  | 172 ++++++++++++++++++
 3 files changed, 174 insertions(+)
 create mode 100755 tests/functional/x86_64/test_nvme_migration.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 2b5b581e173..d705f5c8e0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2622,6 +2622,7 @@ S: Supported
 F: hw/nvme/*
 F: include/block/nvme.h
 F: tests/qtest/nvme-test.c
+F: tests/functional/x86_64/test_nvme_migration.py
 F: docs/system/devices/nvme.rst
 T: git git://git.infradead.org/qemu-nvme.git nvme-next
 
diff --git a/tests/functional/x86_64/meson.build b/tests/functional/x86_64/meson.build
index 1ed10ad6c29..fd77f19d726 100644
--- a/tests/functional/x86_64/meson.build
+++ b/tests/functional/x86_64/meson.build
@@ -37,6 +37,7 @@ tests_x86_64_system_thorough = [
   'linux_initrd',
   'multiprocess',
   'netdev_ethtool',
+  'nvme_migration',
   'replay',
   'reverse_debug',
   'tuxrun',
diff --git a/tests/functional/x86_64/test_nvme_migration.py b/tests/functional/x86_64/test_nvme_migration.py
new file mode 100755
index 00000000000..890f0aab6d6
--- /dev/null
+++ b/tests/functional/x86_64/test_nvme_migration.py
@@ -0,0 +1,172 @@
+#!/usr/bin/env python3
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# x86_64 NVMe migration test
+
+from migration import MigrationTest
+from qemu_test import QemuSystemTest, Asset
+from qemu_test import wait_for_console_pattern
+from qemu_test import exec_command, exec_command_and_wait_for_pattern
+
+
+class X8664NVMeMigrationTest(MigrationTest):
+    ASSET_KERNEL = Asset(
+        ('https://archives.fedoraproject.org/pub/archive/fedora/linux/releases'
+         '/31/Server/x86_64/os/images/pxeboot/vmlinuz'),
+        'd4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129')
+
+    ASSET_INITRD = Asset(
+        ('https://archives.fedoraproject.org/pub/archive/fedora/linux/releases'
+         '/31/Server/x86_64/os/images/pxeboot/initrd.img'),
+        '277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b')
+
+    ASSET_DISKIMAGE = Asset(
+        ('https://archives.fedoraproject.org/pub/archive/fedora/linux/releases'
+         '/31/Cloud/x86_64/images/Fedora-Cloud-Base-31-1.9.x86_64.qcow2'),
+        'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
+
+    DEFAULT_KERNEL_PARAMS = ('root=/dev/nvme0n1p1 console=ttyS0 net.ifnames=0 '
+                             'rd.rescue quiet')
+
+    def wait_for_console_pattern(self, success_message, vm):
+        wait_for_console_pattern(
+            self,
+            success_message,
+            failure_message="Kernel panic - not syncing",
+            vm=vm,
+        )
+
+    def exec_command_and_check(self, command, vm):
+        prompt = '# '
+        exec_command_and_wait_for_pattern(self,
+                                        f"{command} && echo OK || echo FAIL",
+                                        'FAIL', vm=vm)
+        # Note that commands we send to the console are echoed back,
+        # so if the command itself contains the word "FAIL", we should
+        # expect to see it once.
+        wait_for_console_pattern(self, 'OK', failure_message="FAIL", vm=vm)
+        self.wait_for_console_pattern(prompt, vm)
+
+    def configure_machine(self, vm):
+        kernel_path = self.ASSET_KERNEL.fetch()
+        initrd_path = self.ASSET_INITRD.fetch()
+        diskimage_path = self.ASSET_DISKIMAGE.fetch()
+
+        vm.set_console()
+        vm.add_args("-cpu", "max")
+        vm.add_args("-m", "2G")
+        vm.add_args("-accel", "kvm")
+
+        vm.add_args('-drive',
+                         f'file={diskimage_path},if=none,id=drv0,snapshot=on')
+        vm.add_args('-device', 'nvme,bus=pcie.0,' +
+                    'drive=drv0,id=nvme-disk0,serial=nvmemigtest,bootindex=1')
+
+        vm.add_args(
+            "-kernel",
+            kernel_path,
+            "-initrd",
+            initrd_path,
+            "-append",
+            self.DEFAULT_KERNEL_PARAMS
+        )
+
+    def launch_source_vm(self, vm):
+        vm.launch()
+
+        self.wait_for_console_pattern('Entering emergency mode.', vm)
+        prompt = '# '
+        self.wait_for_console_pattern(prompt, vm)
+
+        # Synchronize on NVMe driver creating the root device
+        exec_command_and_wait_for_pattern(self,
+                        "while ! (dmesg -c | grep nvme0n1:) ; do sleep 1 ; done",
+                        "nvme0n1", vm=vm)
+        self.wait_for_console_pattern(prompt, vm)
+
+        # prepare system
+        exec_command_and_wait_for_pattern(self, 'mount /dev/nvme0n1p1 /sysroot',
+                                          prompt, vm=vm)
+        exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
+                                          prompt, vm=vm)
+        exec_command_and_wait_for_pattern(self, 'mount -t proc proc /proc',
+                                          prompt, vm=vm)
+        exec_command_and_wait_for_pattern(self, 'mount -t sysfs sysfs /sys',
+                                          prompt, vm=vm)
+
+        # Run workload before migration to check if it continues
+        # to run properly after migration.
+        #
+        # Workload is simple: it continuously calculates checksums of
+        # all files in /usr/bin to generate some I/O load on
+        # the NVMe disk and at the same time it drops caches to
+        # make sure that we have some read I/O on the disk as well.
+        # If there are any issues with the migration of the NVMe device,
+        # we should see errors in dmesg and consequently in the workload log.
+        exec_command_and_wait_for_pattern(self,
+            "(while [ ! -f /tmp/test_nvme_mig_workload.stop ]; do \
+                rm -f /tmp/test_nvme_mig_workload.iter_finished; \
+                echo 3 > /proc/sys/vm/drop_caches; \
+                find /usr/bin -type f -exec cksum {} \\;; \
+                touch /tmp/test_nvme_mig_workload.iter_finished; \
+            done) > /dev/null 2> /tmp/test_nvme_mig_workload.errors &",
+            prompt, vm=vm)
+        exec_command_and_wait_for_pattern(self,
+            'echo $! > /tmp/test_nvme_mig_workload.pid',
+            prompt, vm=vm)
+
+        # check if process is alive and running
+        self.exec_command_and_check(
+            "kill -0 $(cat /tmp/test_nvme_mig_workload.pid)", vm)
+
+    def assert_dest_vm(self, vm):
+        prompt = '# '
+
+        # check if process is alive and running after migration,
+        # if not - fail the test
+        self.exec_command_and_check(
+            "kill -0 $(cat /tmp/test_nvme_mig_workload.pid)", vm)
+
+        # signal workload to stop
+        exec_command_and_wait_for_pattern(self,
+            'touch /tmp/test_nvme_mig_workload.stop',
+            prompt, vm=vm)
+
+        # wait for the workload to finish, because we want to examine
+        # the log to see if there are any errors
+        exec_command_and_wait_for_pattern(self,
+            "while [ ! -f /tmp/test_nvme_mig_workload.iter_finished ]; do \
+                sleep 1; \
+            done;",
+            prompt, vm=vm)
+
+        exec_command_and_wait_for_pattern(self,
+            'cat /tmp/test_nvme_mig_workload.errors',
+            prompt, vm=vm)
+
+        # fail the test if non-empty
+        self.exec_command_and_check(
+            "[ ! -s /tmp/test_nvme_mig_workload.errors ]", vm)
+
+    def test_migration_with_tcp_localhost(self):
+        self.set_machine('q35')
+        self.require_accelerator("kvm")
+
+        self.migration_with_tcp_localhost()
+
+    def test_migration_with_unix(self):
+        self.set_machine('q35')
+        self.require_accelerator("kvm")
+
+        self.migration_with_unix()
+
+    def test_migration_with_exec(self):
+        self.set_machine('q35')
+        self.require_accelerator("kvm")
+
+        self.migration_with_exec()
+
+
+if __name__ == '__main__':
+    MigrationTest.main()
-- 
2.47.3




* [PATCH v9 8/8] tests/qtest/nvme-test: add migration test with full CQ
  2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
                   ` (6 preceding siblings ...)
  2026-06-11 11:22 ` [PATCH v9 7/8] tests/functional/x86_64: add migration test for NVMe device Alexander Mikhalitsyn
@ 2026-06-11 11:22 ` Alexander Mikhalitsyn
  7 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 11:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Fabiano Rosas, Zhao Liu,
	Alexander Mikhalitsyn, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé, Alexander Mikhalitsyn

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>

As suggested by Stefan [1], let's add a migration test to cover
the rare scenario where migration happens while the CQ is full of
unprocessed CQEs.
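
To make the "full CQ" condition concrete, here is a toy model of a
completion queue ring with a phase tag. It is a sketch, not QEMU code: the
class and method names are hypothetical, and it assumes the usual ring rule
that one slot always stays empty, so a queue of size 2 holds a single CQE
and further completions wait on the controller side.

```python
class CompletionQueue:
    """Toy model of an NVMe completion queue ring with a phase tag."""

    def __init__(self, size):
        self.size = size
        self.slots = [(None, 0)] * size  # (cid, phase bit) per slot
        self.head = 0        # host (consumer) index
        self.tail = 0        # controller (producer) index
        self.ctrl_phase = 1  # phase bit the controller writes
        self.host_phase = 1  # phase bit the host expects
        self.backlog = []    # completions waiting for a free slot

    def full(self):
        return (self.tail + 1) % self.size == self.head

    def complete(self, cid):
        """Controller side: queue a completion and post what fits."""
        self.backlog.append(cid)
        self._post()

    def _post(self):
        while self.backlog and not self.full():
            self.slots[self.tail] = (self.backlog.pop(0), self.ctrl_phase)
            self.tail = (self.tail + 1) % self.size
            if self.tail == 0:
                self.ctrl_phase ^= 1  # wrapped around: flip phase

    def pending(self):
        """Host side: is there a new CQE at head?"""
        return self.slots[self.head][1] == self.host_phase

    def consume(self):
        """Host side: take one CQE, advance head, 'ring the doorbell'."""
        assert self.pending()
        cid, _phase = self.slots[self.head]
        self.head = (self.head + 1) % self.size
        if self.head == 0:
            self.host_phase ^= 1
        self._post()  # the doorbell write lets the controller post more
        return cid


cq = CompletionQueue(2)
cq.complete(123)
first = cq.consume()       # handled before "migration"
for cid in (456, 300, 333):
    cq.complete(cid)       # left unhandled: one posted, two held back
```

In this model the state that has to survive migration is the ring indices,
the phase, and the backlog of completions that could not be posted yet.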

To run this test:
$ meson test -C build 'qtest-x86_64/qos-test'

Link: https://lore.kernel.org/qemu-devel/20260408183529.GB319710@fedora/ [1]
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
---
v9:
    - check-patch fixes
    - added qpci_check_buggy_msi() check to skip test on ppc64 / spapr pci bus
v7:
    - fixed endianness bugs (and tested on s390x machine)
    - code style changes (don't use ptr type for physical addresses)
v6:
    - test added
---
 tests/qtest/nvme-test.c | 421 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 421 insertions(+)

diff --git a/tests/qtest/nvme-test.c b/tests/qtest/nvme-test.c
index 4aec1651e6e..7053f387fec 100644
--- a/tests/qtest/nvme-test.c
+++ b/tests/qtest/nvme-test.c
@@ -8,9 +8,12 @@
  */
 
 #include "qemu/osdep.h"
+#include <glib/gstdio.h>
+#include "qemu/bswap.h"
 #include "qemu/module.h"
 #include "qemu/units.h"
 #include "libqtest.h"
+#include "libqtest-single.h"
 #include "libqos/qgraph.h"
 #include "libqos/pci.h"
 #include "block/nvme.h"
@@ -142,6 +145,422 @@ static void nvmetest_pmr_reg_test(void *obj, void *data, QGuestAllocator *alloc)
     qpci_iounmap(pdev, pmr_bar);
 }
 
+#define PAGE_SIZE 4096
+
+typedef struct nvme_ctrl nvme_ctrl;
+
+typedef struct nvme_queue {
+    nvme_ctrl *ctrl;
+    uint64_t doorbell;
+    uint32_t size;
+} nvme_queue;
+
+typedef struct nvme_cq {
+    nvme_queue common;
+    uint64_t phys_cqe; /* NvmeCqe* */
+    uint16_t head;
+    uint8_t phase;
+} nvme_cq;
+
+typedef struct nvme_sq {
+    nvme_queue common;
+    uint64_t phys_sqe; /* NvmeCmd* */
+    nvme_cq *cq;
+    uint16_t head;
+    uint16_t tail;
+} nvme_sq;
+
+struct nvme_ctrl {
+    QGuestAllocator *alloc;
+    QPCIDevice *pdev;
+    QPCIBar bar;
+
+    uint32_t db_stride;
+
+    nvme_sq admin_sq;
+    nvme_cq admin_cq;
+};
+
+#define PHYS_ADDR_OF_FIELD(T, base_phys_addr, field) \
+    ((uint64_t)&((T *)(base_phys_addr))->field)
+
+#define PHYS_ADDR_OF(T, base_phys_addr, accessor) \
+    ((uint64_t)&((T *)(base_phys_addr))accessor)
+
+static void nvme_init_queue_common(nvme_ctrl *ctrl, nvme_queue *q,
+                                   uint16_t db_idx, uint32_t size)
+{
+    q->ctrl = ctrl;
+    q->doorbell = (sizeof(NvmeBar) + db_idx * ctrl->db_stride);
+    g_test_message("q %p db_idx %u doorbell %" PRIx64, q, db_idx, q->doorbell);
+    q->size = size;
+}
+
+static void nvme_init_sq(nvme_ctrl *ctrl, nvme_sq *sq, uint16_t db_idx,
+                         uint32_t size, nvme_cq *cq)
+{
+    nvme_init_queue_common(ctrl, &sq->common, db_idx, size);
+
+    sq->phys_sqe = guest_alloc(ctrl->alloc, PAGE_SIZE);
+    g_assert(sq->phys_sqe);
+
+    g_test_message("sq %p db_idx %u sqe 0x%" PRIx64, sq, db_idx, sq->phys_sqe);
+    sq->cq = cq;
+    sq->head = 0;
+    sq->tail = 0;
+}
+
+static void nvme_init_cq(nvme_ctrl *ctrl, nvme_cq *cq, uint16_t db_idx,
+                         uint32_t size)
+{
+    nvme_init_queue_common(ctrl, &cq->common, db_idx, size);
+
+    cq->phys_cqe = guest_alloc(ctrl->alloc, PAGE_SIZE);
+    g_assert(cq->phys_cqe);
+
+    g_test_message("cq %p db_idx %u cqe 0x%" PRIx64, cq, db_idx, cq->phys_cqe);
+    cq->head = 0;
+    cq->phase = 1;
+}
+
+static int nvme_cqe_pending(nvme_cq *cq)
+{
+    uint16_t status = qtest_readw(
+        cq->common.ctrl->pdev->bus->qts,
+        PHYS_ADDR_OF(NvmeCqe, cq->phys_cqe, [cq->head].status));
+    return (status & 1) == cq->phase;
+}
+
+static int nvme_is_cqe_success(NvmeCqe *cqe)
+{
+    return (le16_to_cpu(cqe->status) >> 1) == NVME_SUCCESS;
+}
+
+static NvmeCqe nvme_handle_cqe(nvme_sq *sq)
+{
+    nvme_cq *cq = sq->cq;
+    uint64_t phys_cqe = PHYS_ADDR_OF(
+                            NvmeCqe, cq->phys_cqe, [cq->head]); /* NvmeCqe* */
+    NvmeCqe cqe;
+    uint16_t cq_next_head;
+
+    g_assert(nvme_cqe_pending(cq));
+
+    qtest_memread(sq->common.ctrl->pdev->bus->qts, phys_cqe, &cqe, sizeof(cqe));
+
+    cq_next_head = (cq->head + 1) % cq->common.size;
+    g_test_message("cq %p head %u -> %u", cq, cq->head, cq_next_head);
+    if (cq_next_head < cq->head) {
+        cq->phase ^= 1;
+    }
+    cq->head = cq_next_head;
+
+    if (le16_to_cpu(cqe.sq_head) != sq->head) {
+        sq->head = le16_to_cpu(cqe.sq_head);
+        g_test_message("sq %p head = %u", sq, sq->head);
+    }
+
+    qpci_io_writel(cq->common.ctrl->pdev, cq->common.ctrl->bar,
+                   cq->common.doorbell, cq->head);
+
+    return cqe;
+}
+
+static NvmeCqe nvme_wait(nvme_sq *sq)
+{
+    int i;
+    bool ready = false;
+
+    for (i = 0; i < 10; i++) {
+        if (nvme_cqe_pending(sq->cq)) {
+            ready = true;
+            break;
+        }
+
+        g_usleep(1000);
+    }
+
+    g_assert(ready);
+
+    return nvme_handle_cqe(sq);
+}
+
+static uint64_t nvme_get_next_sqe(nvme_sq *sq, uint8_t opcode,
+                                  uint16_t cid, uint64_t prp1)
+{
+    uint64_t phys_sqe = PHYS_ADDR_OF(NvmeCmd, sq->phys_sqe, [sq->tail]);
+
+    if (((sq->tail + 1) % sq->common.size) == sq->head) {
+        /* no space in SQ */
+        g_test_message("%s head %d tail %d", __func__, sq->head, sq->tail);
+        g_assert_not_reached();
+        return 0;
+    }
+
+    qtest_memset(sq->common.ctrl->pdev->bus->qts,
+                 phys_sqe, 0, sizeof(NvmeCmd));
+
+    #define GUEST_MEM_WRITE(fn, phys_addr, val) \
+        fn(sq->common.ctrl->pdev->bus->qts, phys_addr, (val))
+
+    GUEST_MEM_WRITE(qtest_writeb,
+                    PHYS_ADDR_OF_FIELD(NvmeCmd, phys_sqe, opcode), opcode);
+    GUEST_MEM_WRITE(qtest_writew,
+                    PHYS_ADDR_OF_FIELD(NvmeCmd, phys_sqe, cid), cid);
+    GUEST_MEM_WRITE(qtest_writeq,
+                    PHYS_ADDR_OF_FIELD(NvmeCmd, phys_sqe, dptr.prp1), prp1);
+
+    #undef GUEST_MEM_WRITE
+
+    g_test_message("sq %p next_sqe %u sqe 0x%" PRIx64, sq, sq->tail, phys_sqe);
+    return phys_sqe;
+}
+
+static void nvme_commit_sqe(nvme_sq *sq)
+{
+    g_test_message("sq %p commit sqe tail %u", sq, sq->tail);
+    sq->tail = (sq->tail + 1) % sq->common.size;
+    qpci_io_writel(sq->common.ctrl->pdev, sq->common.ctrl->bar,
+                   sq->common.doorbell, sq->tail);
+}
+
+static uint64_t nvme_admin_identify_ctrl(nvme_ctrl *ctrl,
+                                         uint16_t cid, bool no_wait)
+{
+    uint64_t phys_cmd_identify; /* NvmeCmd* */
+    uint64_t phys_identify; /* NvmeIdCtrl* */
+    NvmeCqe cqe;
+
+    g_test_message("sending req cid %u no_wait %d", cid, no_wait);
+
+    phys_identify = guest_alloc(ctrl->alloc, PAGE_SIZE);
+    g_assert(phys_identify);
+
+    phys_cmd_identify = nvme_get_next_sqe(&ctrl->admin_sq,
+                                          NVME_ADM_CMD_IDENTIFY, cid,
+                                          phys_identify);
+    g_assert(phys_cmd_identify);
+
+    #define GUEST_MEM_WRITE(fn, phys_addr, val) \
+        fn(ctrl->pdev->bus->qts, phys_addr, (val))
+
+    GUEST_MEM_WRITE(qtest_writel,
+                    PHYS_ADDR_OF_FIELD(NvmeCmd, phys_cmd_identify, nsid), 0);
+    GUEST_MEM_WRITE(qtest_writel,
+                    PHYS_ADDR_OF_FIELD(NvmeIdentify, phys_cmd_identify, cns),
+                    NVME_ID_CNS_CTRL);
+
+    #undef GUEST_MEM_WRITE
+
+    nvme_commit_sqe(&ctrl->admin_sq);
+
+    if (no_wait) {
+        return phys_identify;
+    }
+
+    cqe = nvme_wait(&ctrl->admin_sq);
+    g_assert(nvme_is_cqe_success(&cqe));
+    g_assert(le16_to_cpu(cqe.cid) == cid);
+
+    return phys_identify;
+}
+
+static void nvme_wait_ready(nvme_ctrl *ctrl, int val)
+{
+    int i;
+
+    for (i = 0; i < 10; i++) {
+        uint32_t csts = qpci_io_readl(ctrl->pdev, ctrl->bar, NVME_REG_CSTS);
+        g_test_message("%s: csts %x", __func__, csts);
+
+        if (NVME_CSTS_RDY(csts) == val) {
+            return;
+        }
+
+        g_usleep(1000);
+    }
+
+    g_assert_not_reached();
+}
+
+static void test_migrate_setup_nvme_ctrl(nvme_ctrl *ctrl)
+{
+    uint64_t cap;
+
+    /* disable controller */
+    qpci_io_writel(ctrl->pdev, ctrl->bar, NVME_REG_CC, 0);
+    nvme_wait_ready(ctrl, 0);
+
+    cap = qpci_io_readq(ctrl->pdev, ctrl->bar, NVME_REG_CAP);
+    ctrl->db_stride = 4 << NVME_CAP_DSTRD(cap);
+
+    nvme_init_cq(ctrl, &ctrl->admin_cq, 1, 2 /* CQEs num */);
+    nvme_init_sq(ctrl, &ctrl->admin_sq, 0, 4 /* SQEs num */, &ctrl->admin_cq);
+
+    qpci_io_writel(ctrl->pdev, ctrl->bar, NVME_REG_AQA,
+        ((ctrl->admin_cq.common.size - 1) << AQA_ACQS_SHIFT) |
+        ((ctrl->admin_sq.common.size - 1) << AQA_ASQS_SHIFT)
+    );
+
+    qpci_io_writeq(ctrl->pdev, ctrl->bar,
+                   NVME_REG_ASQ, (uint64_t)ctrl->admin_sq.phys_sqe);
+    qpci_io_writeq(ctrl->pdev, ctrl->bar,
+                   NVME_REG_ACQ, (uint64_t)ctrl->admin_cq.phys_cqe);
+
+    /* enable controller */
+    {
+        uint32_t cc = 0;
+        NVME_SET_CC_EN(cc, 1);
+        qpci_io_writel(ctrl->pdev, ctrl->bar, NVME_REG_CC, cc);
+    }
+
+    nvme_wait_ready(ctrl, 1);
+}
+
+typedef struct test_migrate_req {
+    uint16_t cid;
+    bool handle_cqe;
+    uint64_t phys_identify; /* NvmeIdCtrl* */
+} test_migrate_req;
+
+static void test_migrate_send_nvme_reqs(nvme_ctrl *ctrl, test_migrate_req *reqs,
+                                        int num)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        reqs[i].phys_identify = nvme_admin_identify_ctrl(ctrl, reqs[i].cid,
+                                                         !reqs[i].handle_cqe);
+        g_assert(reqs[i].phys_identify);
+
+        if (reqs[i].handle_cqe) {
+            guest_free(ctrl->alloc, reqs[i].phys_identify);
+        }
+    }
+}
+
+static void test_migrate_check_nvme(nvme_ctrl *ctrl,
+                                    test_migrate_req *reqs, int num)
+{
+    int i;
+
+    for (i = 0; i < num; i++) {
+        NvmeCqe cqe;
+
+        if (reqs[i].handle_cqe) {
+            continue;
+        }
+
+        cqe = nvme_wait(&ctrl->admin_sq);
+        g_assert(nvme_is_cqe_success(&cqe));
+
+        g_assert_cmpint(le16_to_cpu(cqe.cid), ==, reqs[i].cid);
+
+        #define GUEST_MEM_READB(phys_addr) \
+                    qtest_readb(ctrl->pdev->bus->qts, (phys_addr))
+
+        g_assert_cmpint(GUEST_MEM_READB(
+            PHYS_ADDR_OF_FIELD(NvmeIdCtrl, reqs[i].phys_identify, ieee[0])),
+            ==, 0x0);
+        g_assert_cmpint(GUEST_MEM_READB(
+            PHYS_ADDR_OF_FIELD(NvmeIdCtrl, reqs[i].phys_identify, ieee[1])),
+            ==, 0x54);
+        g_assert_cmpint(GUEST_MEM_READB(
+            PHYS_ADDR_OF_FIELD(NvmeIdCtrl, reqs[i].phys_identify, ieee[2])),
+            ==, 0x52);
+
+        #undef GUEST_MEM_READB
+
+        guest_free(ctrl->alloc, reqs[i].phys_identify);
+    }
+}
+
+static void test_migrate(void *obj, void *data, QGuestAllocator *alloc)
+{
+    g_autofree gchar *tmpfs = NULL;
+    GError *err = NULL;
+    g_autofree gchar *mig_path = NULL;
+    g_autofree gchar *uri = NULL;
+    GString *dest_cmdline;
+    QTestState *to;
+    QDict *rsp;
+    QNvme *nvme = obj;
+    QPCIDevice *pdev = &nvme->dev;
+    g_autofree nvme_ctrl *ctrl = NULL;
+    test_migrate_req test_reqs[] = {
+        { 123, true },
+        { 456, false },
+        { 300, false },
+        { 333, false }
+    };
+
+    if (qpci_check_buggy_msi(pdev)) {
+        return;
+    }
+
+    /* create temporary dir and prepare unix socket path for migration */
+    tmpfs = g_dir_make_tmp("nvme-test-XXXXXX", &err);
+    if (!tmpfs) {
+        g_test_message("Can't create temporary directory in %s: %s",
+                       g_get_tmp_dir(), err->message);
+        g_error_free(err);
+    }
+    g_assert(tmpfs);
+
+    mig_path = g_strdup_printf("%s/socket.mig", tmpfs);
+    uri = g_strdup_printf("unix:%s", mig_path);
+
+    /* enable NVMe PCI device */
+    qpci_device_enable(pdev);
+
+    ctrl = g_malloc0(sizeof(*ctrl));
+    ctrl->alloc = alloc;
+    ctrl->pdev = pdev;
+    ctrl->bar = qpci_iomap(ctrl->pdev, 0, NULL);
+    g_assert(pdev->bus->qts == global_qtest);
+
+    test_migrate_setup_nvme_ctrl(ctrl);
+    test_migrate_send_nvme_reqs(ctrl, test_reqs, ARRAY_SIZE(test_reqs));
+
+    qpci_iounmap(ctrl->pdev, ctrl->bar);
+
+    dest_cmdline = g_string_new(qos_get_current_command_line());
+    g_string_append_printf(dest_cmdline, " -incoming %s", uri);
+
+    /* Create destination VM */
+    to = qtest_init(dest_cmdline->str);
+
+    /* Get access to PCI device from destination VM */
+    nvme = qos_allocate_objects(to, &ctrl->alloc);
+    pdev = &nvme->dev;
+    ctrl->pdev = pdev;
+    ctrl->bar = qpci_iomap(ctrl->pdev, 0, NULL);
+    g_assert(pdev->bus->qts == to);
+
+    /* Migrate VM */
+    rsp = qmp("{ 'execute': 'migrate', 'arguments': { 'uri': %s } }", uri);
+    g_assert(qdict_haskey(rsp, "return"));
+    qobject_unref(rsp);
+
+    /* Wait until the source VM is stopped */
+    qmp_eventwait("STOP");
+
+    /* Copy guest physical memory allocator state */
+    migrate_allocator(alloc, ctrl->alloc);
+
+    /* Wait for destination VM to become alive */
+    qtest_qmp_eventwait(to, "RESUME");
+
+    test_migrate_check_nvme(ctrl, test_reqs, ARRAY_SIZE(test_reqs));
+
+    qpci_iounmap(ctrl->pdev, ctrl->bar);
+
+    qtest_quit(to);
+    g_unlink(mig_path);
+    g_rmdir(tmpfs);
+    g_string_free(dest_cmdline, true);
+}
+
 static void nvme_register_nodes(void)
 {
     QOSGraphEdgeOptions opts = {
@@ -168,6 +587,8 @@ static void nvme_register_nodes(void)
     });
 
     qos_add_test("reg-read", "nvme", nvmetest_reg_read_test, NULL);
+
+    qos_add_test("migrate", "nvme", test_migrate, NULL);
 }
 
 libqos_init(nvme_register_nodes);
-- 
2.47.3




* Re: [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases
  2026-06-11 11:22 ` [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases Alexander Mikhalitsyn
@ 2026-06-11 13:31   ` Fabiano Rosas
  2026-06-11 13:53     ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2026-06-11 13:31 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, qemu-devel
  Cc: Klaus Jensen, Stefan Hajnoczi, Jesper Devantier, qemu-block,
	Hanna Reitz, Fam Zheng, Stéphane Graber, Kevin Wolf,
	Keith Busch, Laurent Vivier, Zhao Liu, Alexander Mikhalitsyn,
	Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé,
	Alexander Mikhalitsyn, Klaus Jensen

Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:

> From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
>
> Let's block migration for cases we don't support:
> - SR-IOV
> - CMB
> - PMR
> - SPDM
>
> No functional changes here, because NVMe migration is
> not supported at all as of this commit.
>
> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> ---
> v9:
>     - check-patch trivial fixes
> ---
>  hw/nvme/ctrl.c       | 211 +++++++++++++++++++++++++++++++++++++++++++
>  hw/nvme/nvme.h       |   3 +
>  include/block/nvme.h |  12 +++
>  3 files changed, 226 insertions(+)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 815f39173c8..7510a9e0296 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -209,6 +209,7 @@
>  #include "hw/pci/msix.h"
>  #include "hw/pci/pcie_sriov.h"
>  #include "system/spdm-socket.h"
> +#include "migration/blocker.h"
>  #include "migration/vmstate.h"
>  
>  #include "nvme.h"
> @@ -252,6 +253,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
>      [NVME_COMMAND_SET_PROFILE]      = true,
>      [NVME_FDP_MODE]                 = true,
>      [NVME_FDP_EVENTS]               = true,
> +    /* if you add something here, please update nvme_set_migration_blockers() */
>  };
>  
>  static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
> @@ -4603,6 +4605,7 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
>          return 0;
>      case NVME_IOMS_MO_RUH_UPDATE:
>          return nvme_io_mgmt_send_ruh_update(n, req);
> +    /* if you add something here, please update nvme_set_migration_blockers() */
>      default:
>          return NVME_INVALID_FIELD | NVME_DNR;
>      };
> @@ -7522,6 +7525,10 @@ static uint16_t nvme_security_receive(NvmeCtrl *n, NvmeRequest *req)
>  
>  static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
>  {
> +    /*
> +     * When adding a new dtype handling here,
> +     * please also update nvme_set_migration_blockers().
> +     */
>      return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> @@ -9233,6 +9240,204 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
>      }
>  }
>  
> +#define BLOCKER_FEATURES_MAX_LEN 256
> +
> +static inline void nvme_add_blocker_feature(char *blocker_features,
> +                                            const char *feature)
> +{
> +    if (strlen(blocker_features) > 0) {
> +        g_strlcat(blocker_features, ", ", BLOCKER_FEATURES_MAX_LEN);
> +    }
> +    g_strlcat(blocker_features, feature, BLOCKER_FEATURES_MAX_LEN);
> +}
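
A side note on the helper above: g_strlcat() truncates once the buffer is full, and the helper ignores its return value, so an overlong feature list would be cut off without notice. With 256 bytes and a handful of short names that is unlikely in practice. Purely as a sketch of the same bounded-append idea that also reports truncation, in plain C so it builds without GLib (add_feature() and FEATURES_MAX_LEN are hypothetical names, not from the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FEATURES_MAX_LEN 256

/*
 * Append `feature` to the comma-separated list held in `buf`.
 * Returns 1 if everything fit, 0 if the result was truncated.
 * `buf` always stays NUL-terminated.
 */
static int add_feature(char *buf, size_t buflen, const char *feature)
{
    size_t used;
    int n;

    assert(buf != NULL && feature != NULL && buflen > 0);

    used = strlen(buf);
    assert(used < buflen);

    /* snprintf() writes at most (buflen - used) bytes, including the NUL. */
    n = snprintf(buf + used, buflen - used, "%s%s",
                 used ? ", " : "", feature);

    return n >= 0 && (size_t)n < buflen - used;
}
```

A caller could use the return value to append a marker such as "..." or to log that the list is incomplete.
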
> +
> +static bool nvme_set_migration_blockers(NvmeCtrl *n, PCIDevice *pci_dev,
> +                                        Error **errp)
> +{
> +    uint64_t unsupported_cap, cap = ldq_le_p(&n->bar.cap);
> +    char blocker_features[BLOCKER_FEATURES_MAX_LEN] = "";
> +    bool adm_cmd_security_checked = false;
> +    bool cmd_io_mgmt_checked = false;
> +    bool cmd_zone_checked = false;
> +
> +    /*
> +     * Idea of this function is simple, we iterate over all Command Sets and
> +     * for each supported command we provide a special handling logic to
> +     * determine if we should block migration or not.
> +     *
> +     * For instance, we have NVME_ADM_CMD_NS_ATTACHMENT and it is always
> +     * available to the guest, but if there is only 1 namespace, then it is
> +     * safe to allow migration, but if there are more, then we need to block
> +     * migration because we don't handle this in migration code yet.
> +     */
> +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.acs); opcode++) {
> +        /* Is command supported? */
> +        if (!n->cse.acs[opcode]) {
> +            continue;
> +        }
> +
> +        switch (opcode) {
> +        case NVME_ADM_CMD_DELETE_SQ:
> +        case NVME_ADM_CMD_CREATE_SQ:
> +        case NVME_ADM_CMD_GET_LOG_PAGE:
> +        case NVME_ADM_CMD_DELETE_CQ:
> +        case NVME_ADM_CMD_CREATE_CQ:
> +        case NVME_ADM_CMD_IDENTIFY:
> +        case NVME_ADM_CMD_ABORT:
> +        case NVME_ADM_CMD_SET_FEATURES:
> +        case NVME_ADM_CMD_GET_FEATURES:
> +        case NVME_ADM_CMD_ASYNC_EV_REQ:
> +        case NVME_ADM_CMD_DBBUF_CONFIG:
> +        case NVME_ADM_CMD_FORMAT_NVM:
> +        case NVME_ADM_CMD_DIRECTIVE_SEND:
> +        case NVME_ADM_CMD_DIRECTIVE_RECV:
> +            break;
> +        case NVME_ADM_CMD_NS_ATTACHMENT:
> +            int namespaces_num = 0;

Hi, clang flags this:

../hw/nvme/ctrl.c:9385:13: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
 9385 |             int namespaces_num = 0;
      |             ^
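
For reference: before C23, a label (a case label included) has to be followed by a statement, and a declaration is not a statement. One common fix is to wrap the case body in braces so the declaration lives inside a compound statement. A minimal sketch of that, where the opcodes and the function name are made up for illustration and are not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_PLAIN = 0, OP_COUNTED = 1 };

/*
 * Returns the number of non-zero entries in `slots` for OP_COUNTED,
 * 0 for OP_PLAIN and -1 for anything else.
 */
static int count_for_opcode(int opcode, const int *slots, int nslots)
{
    int result = 0;

    assert(slots != NULL);

    switch (opcode) {
    case OP_PLAIN:
        break;
    case OP_COUNTED: {
        /* The label is now followed by a block, not by a declaration. */
        int used = 0;

        for (int i = 0; i < nslots; i++) {
            if (slots[i]) {
                used++;
            }
        }
        result = used;
        break;
    }
    default:
        result = -1;
        break;
    }

    return result;
}
```

Hoisting the declaration to the top of the function would work just as well; which of the two to pick is a matter of taste.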


> +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> +                if (!ns) {
> +                    continue;
> +                }
> +
> +                namespaces_num++;
> +            }
> +
> +            if (namespaces_num > 1) {
> +                nvme_add_blocker_feature(blocker_features,
> +                                         "Namespace Attachment");
> +            }
> +
> +            break;
> +        case NVME_ADM_CMD_VIRT_MNGMT:
> +            if (n->params.sriov_max_vfs) {
> +                nvme_add_blocker_feature(blocker_features, "SR-IOV");
> +            }
> +
> +            break;
> +        case NVME_ADM_CMD_SECURITY_SEND:
> +        case NVME_ADM_CMD_SECURITY_RECV:
> +            if (adm_cmd_security_checked) {
> +                break;
> +            }
> +
> +            if (pci_dev->spdm_port) {
> +                nvme_add_blocker_feature(blocker_features, "SPDM");
> +            }
> +
> +            adm_cmd_security_checked = true;
> +
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    }
> +
> +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.nvm); opcode++) {
> +        if (!n->cse.iocs.nvm[opcode]) {
> +            continue;
> +        }
> +
> +        switch (opcode) {
> +        case NVME_CMD_FLUSH:
> +        case NVME_CMD_WRITE:
> +        case NVME_CMD_READ:
> +        case NVME_CMD_COMPARE:
> +        case NVME_CMD_WRITE_ZEROES:
> +        case NVME_CMD_DSM:
> +        case NVME_CMD_VERIFY:
> +        case NVME_CMD_COPY:
> +            break;
> +        case NVME_CMD_IO_MGMT_RECV:
> +        case NVME_CMD_IO_MGMT_SEND:
> +            if (cmd_io_mgmt_checked) {
> +                break;
> +            }
> +
> +            /* check for NVME_IOMS_MO_RUH_UPDATE */
> +            if (n->subsys->params.fdp.enabled) {
> +                nvme_add_blocker_feature(blocker_features, "FDP");
> +            }
> +
> +            cmd_io_mgmt_checked = true;
> +
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    }
> +
> +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.zoned); opcode++) {
> +        /*
> +         * If command isn't supported or we have the same command
> +         * in n->cse.iocs.nvm, then we can skip it here.
> +         */
> +        if (!n->cse.iocs.zoned[opcode] || n->cse.iocs.nvm[opcode]) {
> +            continue;
> +        }
> +
> +        switch (opcode) {
> +        case NVME_CMD_ZONE_APPEND:
> +        case NVME_CMD_ZONE_MGMT_SEND:
> +        case NVME_CMD_ZONE_MGMT_RECV:
> +            if (cmd_zone_checked) {
> +                break;
> +            }
> +
> +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> +                if (!ns) {
> +                    continue;
> +                }
> +
> +                if (ns->params.zoned) {
> +                    nvme_add_blocker_feature(blocker_features,
> +                                             "Zoned Namespace");
> +                    break;
> +                }
> +            }
> +
> +            cmd_zone_checked = true;
> +
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    }
> +
> +    /*
> +     * Try our best to explicitly detect all not supported caps,
> +     * to let users know what features cause migration to be blocked,
> +     * but in case we miss handling here, everything else will be
> +     * covered by unsupported_cap check.
> +     */
> +    if (NVME_CAP_CMBS(cap)) {
> +        nvme_add_blocker_feature(blocker_features, "CMB");
> +        cap &= ~((uint64_t)CAP_CMBS_MASK << CAP_CMBS_SHIFT);
> +    }
> +
> +    if (NVME_CAP_PMRS(cap)) {
> +        nvme_add_blocker_feature(blocker_features, "PMR");
> +        cap &= ~((uint64_t)CAP_PMRS_MASK << CAP_PMRS_SHIFT);
> +    }
> +
> +    unsupported_cap = cap & ~NVME_MIGRATION_SUPPORTED_CAP_BITS;
> +    if (unsupported_cap) {
> +        nvme_add_blocker_feature(blocker_features, "unknown capability");
> +    }
> +
> +    assert(n->migration_blocker == NULL);
> +    if (strlen(blocker_features) > 0) {
> +        error_setg(&n->migration_blocker,
> +                   "Migration is not supported for %s", blocker_features);
> +        if (migrate_add_blocker(&n->migration_blocker, errp) < 0) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
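
To spell out the CAP handling above for other readers: the known-unsupported fields are named and then cleared, and whatever is left is compared against an allowlist, so a capability nobody thought of still blocks migration. A small standalone sketch of that pattern, assuming a made-up register layout (the DEMO_* masks and demo_classify_cap() are hypothetical and do not match the real NVMe CAP fields):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, NOT the real NVMe CAP fields. */
#define DEMO_QUEUE_MASK    (0xffffULL << 0)   /* migratable */
#define DEMO_TIMEOUT_MASK  (0xffULL << 24)    /* migratable */
#define DEMO_CMB_MASK      (0x1ULL << 57)     /* known, not migratable */
#define DEMO_SUPPORTED     (DEMO_QUEUE_MASK | DEMO_TIMEOUT_MASK)

#define DEMO_BLOCK_KNOWN   (1u << 0)
#define DEMO_BLOCK_UNKNOWN (1u << 1)

/* Returns a bitmask of DEMO_BLOCK_* reasons, or 0 if nothing blocks. */
static unsigned demo_classify_cap(uint64_t cap)
{
    unsigned verdict = 0;

    /* A named blocker must never be part of the allowlist. */
    assert((DEMO_SUPPORTED & DEMO_CMB_MASK) == 0);

    /* Name the blocker we know about, then clear it from the value. */
    if (cap & DEMO_CMB_MASK) {
        verdict |= DEMO_BLOCK_KNOWN;
        cap &= ~DEMO_CMB_MASK;
    }

    /* Anything still set outside the allowlist is an unknown blocker. */
    if (cap & ~DEMO_SUPPORTED) {
        verdict |= DEMO_BLOCK_UNKNOWN;
    }

    return verdict;
}
```

The point of clearing the named bits first is that the fallback message only shows up for bits that have no dedicated name.
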
> +
>  static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
>  {
>      int cntlid;
> @@ -9338,6 +9543,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>  
>          n->subsys->namespaces[ns->params.nsid] = ns;
>      }
> +
> +    if (!nvme_set_migration_blockers(n, pci_dev, errp)) {
> +        return;
> +    }
>  }
>  
>  static void nvme_exit(PCIDevice *pci_dev)
> @@ -9390,6 +9599,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>      }
>  
>      memory_region_del_subregion(&n->bar0, &n->iomem);
> +
> +    migrate_del_blocker(&n->migration_blocker);
>  }
>  
>  static const Property nvme_props[] = {
> diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> index 5ef3ebee29e..05aee24a15c 100644
> --- a/hw/nvme/nvme.h
> +++ b/hw/nvme/nvme.h
> @@ -668,6 +668,9 @@ typedef struct NvmeCtrl {
>  
>      /* Socket mapping to SPDM over NVMe Security In/Out commands */
>      int spdm_socket;
> +
> +    /* Migration-related stuff */
> +    Error *migration_blocker;
>  } NvmeCtrl;
>  
>  typedef enum NvmeResetType {
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index e4e7be51205..17a7c7818d7 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -141,6 +141,18 @@ enum NvmeCapMask {
>  #define NVME_CAP_SET_CMBS(cap, val)   \
>      ((cap) |= (uint64_t)((val) & CAP_CMBS_MASK)   << CAP_CMBS_SHIFT)
>  
> +#define NVME_MIGRATION_SUPPORTED_CAP_BITS ( \
> +      ((uint64_t)CAP_MQES_MASK   << CAP_MQES_SHIFT)   \
> +    | ((uint64_t)CAP_CQR_MASK    << CAP_CQR_SHIFT)    \
> +    | ((uint64_t)CAP_AMS_MASK    << CAP_AMS_SHIFT)    \
> +    | ((uint64_t)CAP_TO_MASK     << CAP_TO_SHIFT)     \
> +    | ((uint64_t)CAP_DSTRD_MASK  << CAP_DSTRD_SHIFT)  \
> +    | ((uint64_t)CAP_NSSRS_MASK  << CAP_NSSRS_SHIFT)  \
> +    | ((uint64_t)CAP_CSS_MASK    << CAP_CSS_SHIFT)    \
> +    | ((uint64_t)CAP_MPSMIN_MASK << CAP_MPSMIN_SHIFT) \
> +    | ((uint64_t)CAP_MPSMAX_MASK << CAP_MPSMAX_SHIFT) \
> +)
> +
>  enum NvmeCapCss {
>      NVME_CAP_CSS_NCSS    = 1 << 0,
>      NVME_CAP_CSS_IOCSS   = 1 << 6,



* Re: [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases
  2026-06-11 13:31   ` Fabiano Rosas
@ 2026-06-11 13:53     ` Alexander Mikhalitsyn
  2026-06-11 14:24       ` Fabiano Rosas
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 13:53 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Klaus Jensen, Stefan Hajnoczi, Jesper Devantier,
	qemu-block, Hanna Reitz, Fam Zheng, Stéphane Graber,
	Kevin Wolf, Keith Busch, Laurent Vivier, Zhao Liu, Paolo Bonzini,
	Peter Xu, Philippe Mathieu-Daudé, Alexander Mikhalitsyn,
	Klaus Jensen

On Thu, Jun 11, 2026 at 15:31, Fabiano Rosas <farosas@suse.de> wrote:
>
> Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:
>
> > From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> >
> > Let's block migration for cases we don't support:
> > - SR-IOV
> > - CMB
> > - PMR
> > - SPDM
> >
> > No functional changes here, because NVMe migration is
> > not supported at all as of this commit.
> >
> > Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> > ---
> > v9:
> >     - check-patch trivial fixes
> > ---
> >  hw/nvme/ctrl.c       | 211 +++++++++++++++++++++++++++++++++++++++++++
> >  hw/nvme/nvme.h       |   3 +
> >  include/block/nvme.h |  12 +++
> >  3 files changed, 226 insertions(+)
> >
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index 815f39173c8..7510a9e0296 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -209,6 +209,7 @@
> >  #include "hw/pci/msix.h"
> >  #include "hw/pci/pcie_sriov.h"
> >  #include "system/spdm-socket.h"
> > +#include "migration/blocker.h"
> >  #include "migration/vmstate.h"
> >
> >  #include "nvme.h"
> > @@ -252,6 +253,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
> >      [NVME_COMMAND_SET_PROFILE]      = true,
> >      [NVME_FDP_MODE]                 = true,
> >      [NVME_FDP_EVENTS]               = true,
> > +    /* if you add something here, please update nvme_set_migration_blockers() */
> >  };
> >
> >  static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
> > @@ -4603,6 +4605,7 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
> >          return 0;
> >      case NVME_IOMS_MO_RUH_UPDATE:
> >          return nvme_io_mgmt_send_ruh_update(n, req);
> > +    /* if you add something here, please update nvme_set_migration_blockers() */
> >      default:
> >          return NVME_INVALID_FIELD | NVME_DNR;
> >      };
> > @@ -7522,6 +7525,10 @@ static uint16_t nvme_security_receive(NvmeCtrl *n, NvmeRequest *req)
> >
> >  static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
> >  {
> > +    /*
> > +     * When adding a new dtype handling here,
> > +     * please also update nvme_set_migration_blockers().
> > +     */
> >      return NVME_INVALID_FIELD | NVME_DNR;
> >  }
> >
> > @@ -9233,6 +9240,204 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> >      }
> >  }
> >
> > +#define BLOCKER_FEATURES_MAX_LEN 256
> > +
> > +static inline void nvme_add_blocker_feature(char *blocker_features,
> > +                                            const char *feature)
> > +{
> > +    if (strlen(blocker_features) > 0) {
> > +        g_strlcat(blocker_features, ", ", BLOCKER_FEATURES_MAX_LEN);
> > +    }
> > +    g_strlcat(blocker_features, feature, BLOCKER_FEATURES_MAX_LEN);
> > +}
> > +
> > +static bool nvme_set_migration_blockers(NvmeCtrl *n, PCIDevice *pci_dev,
> > +                                        Error **errp)
> > +{
> > +    uint64_t unsupported_cap, cap = ldq_le_p(&n->bar.cap);
> > +    char blocker_features[BLOCKER_FEATURES_MAX_LEN] = "";
> > +    bool adm_cmd_security_checked = false;
> > +    bool cmd_io_mgmt_checked = false;
> > +    bool cmd_zone_checked = false;
> > +
> > +    /*
> > +     * Idea of this function is simple, we iterate over all Command Sets and
> > +     * for each supported command we provide a special handling logic to
> > +     * determine if we should block migration or not.
> > +     *
> > +     * For instance, we have NVME_ADM_CMD_NS_ATTACHMENT and it is always
> > +     * available to the guest, but if there is only 1 namespace, then it is
> > +     * safe to allow migration, but if there are more, then we need to block
> > +     * migration because we don't handle this in migration code yet.
> > +     */
> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.acs); opcode++) {
> > +        /* Is command supported? */
> > +        if (!n->cse.acs[opcode]) {
> > +            continue;
> > +        }
> > +
> > +        switch (opcode) {
> > +        case NVME_ADM_CMD_DELETE_SQ:
> > +        case NVME_ADM_CMD_CREATE_SQ:
> > +        case NVME_ADM_CMD_GET_LOG_PAGE:
> > +        case NVME_ADM_CMD_DELETE_CQ:
> > +        case NVME_ADM_CMD_CREATE_CQ:
> > +        case NVME_ADM_CMD_IDENTIFY:
> > +        case NVME_ADM_CMD_ABORT:
> > +        case NVME_ADM_CMD_SET_FEATURES:
> > +        case NVME_ADM_CMD_GET_FEATURES:
> > +        case NVME_ADM_CMD_ASYNC_EV_REQ:
> > +        case NVME_ADM_CMD_DBBUF_CONFIG:
> > +        case NVME_ADM_CMD_FORMAT_NVM:
> > +        case NVME_ADM_CMD_DIRECTIVE_SEND:
> > +        case NVME_ADM_CMD_DIRECTIVE_RECV:
> > +            break;
> > +        case NVME_ADM_CMD_NS_ATTACHMENT:
> > +            int namespaces_num = 0;
>
> Hi, clang flags this:
>
> ../hw/nvme/ctrl.c:9385:13: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
>  9385 |             int namespaces_num = 0;
>       |             ^
>

Hi Fabiano,

Ugh, I'll handle this. Let me then burn even more CI minutes and run
every single test in the matrix :)

Sorry for the noise caused by submitting with such a silly mistake :(

Kind regards,
Alex

>
> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> > +                if (!ns) {
> > +                    continue;
> > +                }
> > +
> > +                namespaces_num++;
> > +            }
> > +
> > +            if (namespaces_num > 1) {
> > +                nvme_add_blocker_feature(blocker_features,
> > +                                         "Namespace Attachment");
> > +            }
> > +
> > +            break;
> > +        case NVME_ADM_CMD_VIRT_MNGMT:
> > +            if (n->params.sriov_max_vfs) {
> > +                nvme_add_blocker_feature(blocker_features, "SR-IOV");
> > +            }
> > +
> > +            break;
> > +        case NVME_ADM_CMD_SECURITY_SEND:
> > +        case NVME_ADM_CMD_SECURITY_RECV:
> > +            if (adm_cmd_security_checked) {
> > +                break;
> > +            }
> > +
> > +            if (pci_dev->spdm_port) {
> > +                nvme_add_blocker_feature(blocker_features, "SPDM");
> > +            }
> > +
> > +            adm_cmd_security_checked = true;
> > +
> > +            break;
> > +        default:
> > +            g_assert_not_reached();
> > +        }
> > +    }
> > +
> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.nvm); opcode++) {
> > +        if (!n->cse.iocs.nvm[opcode]) {
> > +            continue;
> > +        }
> > +
> > +        switch (opcode) {
> > +        case NVME_CMD_FLUSH:
> > +        case NVME_CMD_WRITE:
> > +        case NVME_CMD_READ:
> > +        case NVME_CMD_COMPARE:
> > +        case NVME_CMD_WRITE_ZEROES:
> > +        case NVME_CMD_DSM:
> > +        case NVME_CMD_VERIFY:
> > +        case NVME_CMD_COPY:
> > +            break;
> > +        case NVME_CMD_IO_MGMT_RECV:
> > +        case NVME_CMD_IO_MGMT_SEND:
> > +            if (cmd_io_mgmt_checked) {
> > +                break;
> > +            }
> > +
> > +            /* check for NVME_IOMS_MO_RUH_UPDATE */
> > +            if (n->subsys->params.fdp.enabled) {
> > +                nvme_add_blocker_feature(blocker_features, "FDP");
> > +            }
> > +
> > +            cmd_io_mgmt_checked = true;
> > +
> > +            break;
> > +        default:
> > +            g_assert_not_reached();
> > +        }
> > +    }
> > +
> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.zoned); opcode++) {
> > +        /*
> > +         * If command isn't supported or we have the same command
> > +         * in n->cse.iocs.nvm, then we can skip it here.
> > +         */
> > +        if (!n->cse.iocs.zoned[opcode] || n->cse.iocs.nvm[opcode]) {
> > +            continue;
> > +        }
> > +
> > +        switch (opcode) {
> > +        case NVME_CMD_ZONE_APPEND:
> > +        case NVME_CMD_ZONE_MGMT_SEND:
> > +        case NVME_CMD_ZONE_MGMT_RECV:
> > +            if (cmd_zone_checked) {
> > +                break;
> > +            }
> > +
> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> > +                if (!ns) {
> > +                    continue;
> > +                }
> > +
> > +                if (ns->params.zoned) {
> > +                    nvme_add_blocker_feature(blocker_features,
> > +                                             "Zoned Namespace");
> > +                    break;
> > +                }
> > +            }
> > +
> > +            cmd_zone_checked = true;
> > +
> > +            break;
> > +        default:
> > +            g_assert_not_reached();
> > +        }
> > +    }
> > +
> > +    /*
> > +     * Try our best to explicitly detect all not supported caps,
> > +     * to let users know what features cause migration to be blocked,
> > +     * but in case we miss handling here, everything else will be
> > +     * covered by unsupported_cap check.
> > +     */
> > +    if (NVME_CAP_CMBS(cap)) {
> > +        nvme_add_blocker_feature(blocker_features, "CMB");
> > +        cap &= ~((uint64_t)CAP_CMBS_MASK << CAP_CMBS_SHIFT);
> > +    }
> > +
> > +    if (NVME_CAP_PMRS(cap)) {
> > +        nvme_add_blocker_feature(blocker_features, "PMR");
> > +        cap &= ~((uint64_t)CAP_PMRS_MASK << CAP_PMRS_SHIFT);
> > +    }
> > +
> > +    unsupported_cap = cap & ~NVME_MIGRATION_SUPPORTED_CAP_BITS;
> > +    if (unsupported_cap) {
> > +        nvme_add_blocker_feature(blocker_features, "unknown capability");
> > +    }
> > +
> > +    assert(n->migration_blocker == NULL);
> > +    if (strlen(blocker_features) > 0) {
> > +        error_setg(&n->migration_blocker,
> > +                   "Migration is not supported for %s", blocker_features);
> > +        if (migrate_add_blocker(&n->migration_blocker, errp) < 0) {
> > +            return false;
> > +        }
> > +    }
> > +
> > +    return true;
> > +}
> > +
> >  static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
> >  {
> >      int cntlid;
> > @@ -9338,6 +9543,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >
> >          n->subsys->namespaces[ns->params.nsid] = ns;
> >      }
> > +
> > +    if (!nvme_set_migration_blockers(n, pci_dev, errp)) {
> > +        return;
> > +    }
> >  }
> >
> >  static void nvme_exit(PCIDevice *pci_dev)
> > @@ -9390,6 +9599,8 @@ static void nvme_exit(PCIDevice *pci_dev)
> >      }
> >
> >      memory_region_del_subregion(&n->bar0, &n->iomem);
> > +
> > +    migrate_del_blocker(&n->migration_blocker);
> >  }
> >
> >  static const Property nvme_props[] = {
> > diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> > index 5ef3ebee29e..05aee24a15c 100644
> > --- a/hw/nvme/nvme.h
> > +++ b/hw/nvme/nvme.h
> > @@ -668,6 +668,9 @@ typedef struct NvmeCtrl {
> >
> >      /* Socket mapping to SPDM over NVMe Security In/Out commands */
> >      int spdm_socket;
> > +
> > +    /* Migration-related stuff */
> > +    Error *migration_blocker;
> >  } NvmeCtrl;
> >
> >  typedef enum NvmeResetType {
> > diff --git a/include/block/nvme.h b/include/block/nvme.h
> > index e4e7be51205..17a7c7818d7 100644
> > --- a/include/block/nvme.h
> > +++ b/include/block/nvme.h
> > @@ -141,6 +141,18 @@ enum NvmeCapMask {
> >  #define NVME_CAP_SET_CMBS(cap, val)   \
> >      ((cap) |= (uint64_t)((val) & CAP_CMBS_MASK)   << CAP_CMBS_SHIFT)
> >
> > +#define NVME_MIGRATION_SUPPORTED_CAP_BITS ( \
> > +      ((uint64_t)CAP_MQES_MASK   << CAP_MQES_SHIFT)   \
> > +    | ((uint64_t)CAP_CQR_MASK    << CAP_CQR_SHIFT)    \
> > +    | ((uint64_t)CAP_AMS_MASK    << CAP_AMS_SHIFT)    \
> > +    | ((uint64_t)CAP_TO_MASK     << CAP_TO_SHIFT)     \
> > +    | ((uint64_t)CAP_DSTRD_MASK  << CAP_DSTRD_SHIFT)  \
> > +    | ((uint64_t)CAP_NSSRS_MASK  << CAP_NSSRS_SHIFT)  \
> > +    | ((uint64_t)CAP_CSS_MASK    << CAP_CSS_SHIFT)    \
> > +    | ((uint64_t)CAP_MPSMIN_MASK << CAP_MPSMIN_SHIFT) \
> > +    | ((uint64_t)CAP_MPSMAX_MASK << CAP_MPSMAX_SHIFT) \
> > +)
> > +
> >  enum NvmeCapCss {
> >      NVME_CAP_CSS_NCSS    = 1 << 0,
> >      NVME_CAP_CSS_IOCSS   = 1 << 6,



* Re: [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases
  2026-06-11 13:53     ` Alexander Mikhalitsyn
@ 2026-06-11 14:24       ` Fabiano Rosas
  2026-06-11 14:32         ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2026-06-11 14:24 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: qemu-devel, Klaus Jensen, Stefan Hajnoczi, Jesper Devantier,
	qemu-block, Hanna Reitz, Fam Zheng, Stéphane Graber,
	Kevin Wolf, Keith Busch, Laurent Vivier, Zhao Liu, Paolo Bonzini,
	Peter Xu, Philippe Mathieu-Daudé, Alexander Mikhalitsyn,
	Klaus Jensen

Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:

> On Thu, Jun 11, 2026 at 15:31, Fabiano Rosas <farosas@suse.de> wrote:
>>
>> Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:
>>
>> > From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
>> >
>> > Let's block migration for cases we don't support:
>> > - SR-IOV
>> > - CMB
>> > - PMR
>> > - SPDM
>> >
>> > No functional changes here, because NVMe migration is
>> > not supported at all as of this commit.
>> >
>> > Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
>> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
>> > ---
>> > v9:
>> >     - check-patch trivial fixes
>> > ---
>> >  hw/nvme/ctrl.c       | 211 +++++++++++++++++++++++++++++++++++++++++++
>> >  hw/nvme/nvme.h       |   3 +
>> >  include/block/nvme.h |  12 +++
>> >  3 files changed, 226 insertions(+)
>> >
>> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>> > index 815f39173c8..7510a9e0296 100644
>> > --- a/hw/nvme/ctrl.c
>> > +++ b/hw/nvme/ctrl.c
>> > @@ -209,6 +209,7 @@
>> >  #include "hw/pci/msix.h"
>> >  #include "hw/pci/pcie_sriov.h"
>> >  #include "system/spdm-socket.h"
>> > +#include "migration/blocker.h"
>> >  #include "migration/vmstate.h"
>> >
>> >  #include "nvme.h"
>> > @@ -252,6 +253,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
>> >      [NVME_COMMAND_SET_PROFILE]      = true,
>> >      [NVME_FDP_MODE]                 = true,
>> >      [NVME_FDP_EVENTS]               = true,
>> > +    /* if you add something here, please update nvme_set_migration_blockers() */
>> >  };
>> >
>> >  static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
>> > @@ -4603,6 +4605,7 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
>> >          return 0;
>> >      case NVME_IOMS_MO_RUH_UPDATE:
>> >          return nvme_io_mgmt_send_ruh_update(n, req);
>> > +    /* if you add something here, please update nvme_set_migration_blockers() */
>> >      default:
>> >          return NVME_INVALID_FIELD | NVME_DNR;
>> >      };
>> > @@ -7522,6 +7525,10 @@ static uint16_t nvme_security_receive(NvmeCtrl *n, NvmeRequest *req)
>> >
>> >  static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
>> >  {
>> > +    /*
>> > +     * When adding a new dtype handling here,
>> > +     * please also update nvme_set_migration_blockers().
>> > +     */
>> >      return NVME_INVALID_FIELD | NVME_DNR;
>> >  }
>> >
>> > @@ -9233,6 +9240,204 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
>> >      }
>> >  }
>> >
>> > +#define BLOCKER_FEATURES_MAX_LEN 256
>> > +
>> > +static inline void nvme_add_blocker_feature(char *blocker_features,
>> > +                                            const char *feature)
>> > +{
>> > +    if (strlen(blocker_features) > 0) {
>> > +        g_strlcat(blocker_features, ", ", BLOCKER_FEATURES_MAX_LEN);
>> > +    }
>> > +    g_strlcat(blocker_features, feature, BLOCKER_FEATURES_MAX_LEN);
>> > +}
>> > +
>> > +static bool nvme_set_migration_blockers(NvmeCtrl *n, PCIDevice *pci_dev,
>> > +                                        Error **errp)
>> > +{
>> > +    uint64_t unsupported_cap, cap = ldq_le_p(&n->bar.cap);
>> > +    char blocker_features[BLOCKER_FEATURES_MAX_LEN] = "";
>> > +    bool adm_cmd_security_checked = false;
>> > +    bool cmd_io_mgmt_checked = false;
>> > +    bool cmd_zone_checked = false;
>> > +
>> > +    /*
>> > +     * Idea of this function is simple, we iterate over all Command Sets and
>> > +     * for each supported command we provide a special handling logic to
>> > +     * determine if we should block migration or not.
>> > +     *
>> > +     * For instance, we have NVME_ADM_CMD_NS_ATTACHMENT and it is always
>> > +     * available to the guest, but if there is only 1 namespace, then it is
>> > +     * safe to allow migration, but if there are more, then we need to block
>> > +     * migration because we don't handle this in migration code yet.
>> > +     */
>> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.acs); opcode++) {
>> > +        /* Is command supported? */
>> > +        if (!n->cse.acs[opcode]) {
>> > +            continue;
>> > +        }
>> > +
>> > +        switch (opcode) {
>> > +        case NVME_ADM_CMD_DELETE_SQ:
>> > +        case NVME_ADM_CMD_CREATE_SQ:
>> > +        case NVME_ADM_CMD_GET_LOG_PAGE:
>> > +        case NVME_ADM_CMD_DELETE_CQ:
>> > +        case NVME_ADM_CMD_CREATE_CQ:
>> > +        case NVME_ADM_CMD_IDENTIFY:
>> > +        case NVME_ADM_CMD_ABORT:
>> > +        case NVME_ADM_CMD_SET_FEATURES:
>> > +        case NVME_ADM_CMD_GET_FEATURES:
>> > +        case NVME_ADM_CMD_ASYNC_EV_REQ:
>> > +        case NVME_ADM_CMD_DBBUF_CONFIG:
>> > +        case NVME_ADM_CMD_FORMAT_NVM:
>> > +        case NVME_ADM_CMD_DIRECTIVE_SEND:
>> > +        case NVME_ADM_CMD_DIRECTIVE_RECV:
>> > +            break;
>> > +        case NVME_ADM_CMD_NS_ATTACHMENT:
>> > +            int namespaces_num = 0;
>>
>> Hi, clang flags this:
>>
>> ../hw/nvme/ctrl.c:9385:13: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
>>  9385 |             int namespaces_num = 0;
>>       |             ^
>>
>
> Hi Fabiano,
>
> Ugh, I'll handle this. Let me then burn even more CI minutes and run
> every single test in the matrix :)
>

I'm not sure the CI would catch this; I run a different set of tests
locally.

> Sorry for the noise caused by submitting with such a silly mistake :(
>

It's completely fine, don't worry. We're close to merging this, I was
just running a final set of tests to give my ack. It's good that we
caught it before merge.

> Kind regards,
> Alex
>
>>
>> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
>> > +                if (!ns) {
>> > +                    continue;
>> > +                }
>> > +
>> > +                namespaces_num++;
>> > +            }
>> > +
>> > +            if (namespaces_num > 1) {
>> > +                nvme_add_blocker_feature(blocker_features,
>> > +                                         "Namespace Attachment");
>> > +            }
>> > +
>> > +            break;
>> > +        case NVME_ADM_CMD_VIRT_MNGMT:
>> > +            if (n->params.sriov_max_vfs) {
>> > +                nvme_add_blocker_feature(blocker_features, "SR-IOV");
>> > +            }
>> > +
>> > +            break;
>> > +        case NVME_ADM_CMD_SECURITY_SEND:
>> > +        case NVME_ADM_CMD_SECURITY_RECV:
>> > +            if (adm_cmd_security_checked) {
>> > +                break;
>> > +            }
>> > +
>> > +            if (pci_dev->spdm_port) {
>> > +                nvme_add_blocker_feature(blocker_features, "SPDM");
>> > +            }
>> > +
>> > +            adm_cmd_security_checked = true;
>> > +
>> > +            break;
>> > +        default:
>> > +            g_assert_not_reached();
>> > +        }
>> > +    }
>> > +
>> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.nvm); opcode++) {
>> > +        if (!n->cse.iocs.nvm[opcode]) {
>> > +            continue;
>> > +        }
>> > +
>> > +        switch (opcode) {
>> > +        case NVME_CMD_FLUSH:
>> > +        case NVME_CMD_WRITE:
>> > +        case NVME_CMD_READ:
>> > +        case NVME_CMD_COMPARE:
>> > +        case NVME_CMD_WRITE_ZEROES:
>> > +        case NVME_CMD_DSM:
>> > +        case NVME_CMD_VERIFY:
>> > +        case NVME_CMD_COPY:
>> > +            break;
>> > +        case NVME_CMD_IO_MGMT_RECV:
>> > +        case NVME_CMD_IO_MGMT_SEND:
>> > +            if (cmd_io_mgmt_checked) {
>> > +                break;
>> > +            }
>> > +
>> > +            /* check for NVME_IOMS_MO_RUH_UPDATE */
>> > +            if (n->subsys->params.fdp.enabled) {
>> > +                nvme_add_blocker_feature(blocker_features, "FDP");
>> > +            }
>> > +
>> > +            cmd_io_mgmt_checked = true;
>> > +
>> > +            break;
>> > +        default:
>> > +            g_assert_not_reached();
>> > +        }
>> > +    }
>> > +
>> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.zoned); opcode++) {
>> > +        /*
>> > +         * If command isn't supported or we have the same command
>> > +         * in n->cse.iocs.nvm, then we can skip it here.
>> > +         */
>> > +        if (!n->cse.iocs.zoned[opcode] || n->cse.iocs.nvm[opcode]) {
>> > +            continue;
>> > +        }
>> > +
>> > +        switch (opcode) {
>> > +        case NVME_CMD_ZONE_APPEND:
>> > +        case NVME_CMD_ZONE_MGMT_SEND:
>> > +        case NVME_CMD_ZONE_MGMT_RECV:
>> > +            if (cmd_zone_checked) {
>> > +                break;
>> > +            }
>> > +
>> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
>> > +                if (!ns) {
>> > +                    continue;
>> > +                }
>> > +
>> > +                if (ns->params.zoned) {
>> > +                    nvme_add_blocker_feature(blocker_features,
>> > +                                             "Zoned Namespace");
>> > +                    break;
>> > +                }
>> > +            }
>> > +
>> > +            cmd_zone_checked = true;
>> > +
>> > +            break;
>> > +        default:
>> > +            g_assert_not_reached();
>> > +        }
>> > +    }
>> > +
>> > +    /*
>> > +     * Try our best to explicitly detect all not supported caps,
>> > +     * to let users know what features cause migration to be blocked,
>> > +     * but in case we miss handling here, everything else will be
>> > +     * covered by unsupported_cap check.
>> > +     */
>> > +    if (NVME_CAP_CMBS(cap)) {
>> > +        nvme_add_blocker_feature(blocker_features, "CMB");
>> > +        cap &= ~((uint64_t)CAP_CMBS_MASK << CAP_CMBS_SHIFT);
>> > +    }
>> > +
>> > +    if (NVME_CAP_PMRS(cap)) {
>> > +        nvme_add_blocker_feature(blocker_features, "PMR");
>> > +        cap &= ~((uint64_t)CAP_PMRS_MASK << CAP_PMRS_SHIFT);
>> > +    }
>> > +
>> > +    unsupported_cap = cap & ~NVME_MIGRATION_SUPPORTED_CAP_BITS;
>> > +    if (unsupported_cap) {
>> > +        nvme_add_blocker_feature(blocker_features, "unknown capability");
>> > +    }
>> > +
>> > +    assert(n->migration_blocker == NULL);
>> > +    if (strlen(blocker_features) > 0) {
>> > +        error_setg(&n->migration_blocker,
>> > +                   "Migration is not supported for %s", blocker_features);
>> > +        if (migrate_add_blocker(&n->migration_blocker, errp) < 0) {
>> > +            return false;
>> > +        }
>> > +    }
>> > +
>> > +    return true;
>> > +}
>> > +
>> >  static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
>> >  {
>> >      int cntlid;
>> > @@ -9338,6 +9543,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>> >
>> >          n->subsys->namespaces[ns->params.nsid] = ns;
>> >      }
>> > +
>> > +    if (!nvme_set_migration_blockers(n, pci_dev, errp)) {
>> > +        return;
>> > +    }
>> >  }
>> >
>> >  static void nvme_exit(PCIDevice *pci_dev)
>> > @@ -9390,6 +9599,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>> >      }
>> >
>> >      memory_region_del_subregion(&n->bar0, &n->iomem);
>> > +
>> > +    migrate_del_blocker(&n->migration_blocker);
>> >  }
>> >
>> >  static const Property nvme_props[] = {
>> > diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>> > index 5ef3ebee29e..05aee24a15c 100644
>> > --- a/hw/nvme/nvme.h
>> > +++ b/hw/nvme/nvme.h
>> > @@ -668,6 +668,9 @@ typedef struct NvmeCtrl {
>> >
>> >      /* Socket mapping to SPDM over NVMe Security In/Out commands */
>> >      int spdm_socket;
>> > +
>> > +    /* Migration-related stuff */
>> > +    Error *migration_blocker;
>> >  } NvmeCtrl;
>> >
>> >  typedef enum NvmeResetType {
>> > diff --git a/include/block/nvme.h b/include/block/nvme.h
>> > index e4e7be51205..17a7c7818d7 100644
>> > --- a/include/block/nvme.h
>> > +++ b/include/block/nvme.h
>> > @@ -141,6 +141,18 @@ enum NvmeCapMask {
>> >  #define NVME_CAP_SET_CMBS(cap, val)   \
>> >      ((cap) |= (uint64_t)((val) & CAP_CMBS_MASK)   << CAP_CMBS_SHIFT)
>> >
>> > +#define NVME_MIGRATION_SUPPORTED_CAP_BITS ( \
>> > +      ((uint64_t)CAP_MQES_MASK   << CAP_MQES_SHIFT)   \
>> > +    | ((uint64_t)CAP_CQR_MASK    << CAP_CQR_SHIFT)    \
>> > +    | ((uint64_t)CAP_AMS_MASK    << CAP_AMS_SHIFT)    \
>> > +    | ((uint64_t)CAP_TO_MASK     << CAP_TO_SHIFT)     \
>> > +    | ((uint64_t)CAP_DSTRD_MASK  << CAP_DSTRD_SHIFT)  \
>> > +    | ((uint64_t)CAP_NSSRS_MASK  << CAP_NSSRS_SHIFT)  \
>> > +    | ((uint64_t)CAP_CSS_MASK    << CAP_CSS_SHIFT)    \
>> > +    | ((uint64_t)CAP_MPSMIN_MASK << CAP_MPSMIN_SHIFT) \
>> > +    | ((uint64_t)CAP_MPSMAX_MASK << CAP_MPSMAX_SHIFT) \
>> > +)
>> > +
>> >  enum NvmeCapCss {
>> >      NVME_CAP_CSS_NCSS    = 1 << 0,
>> >      NVME_CAP_CSS_IOCSS   = 1 << 6,
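
For readers following the CAP handling in the quoted patch: named unsupported fields (CMB, PMR) are reported and cleared first, and whatever remains outside the supported mask is reported as "unknown capability". Below is a minimal sketch of that pattern. The DEMO_* masks and demo_classify_cap() are hypothetical names with a made-up bit layout, not the real NVMe CAP register layout or QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout for illustration; NOT the real NVMe CAP layout. */
#define DEMO_CAP_KNOWN_A   (UINT64_C(0xffff) << 0)  /* supported field */
#define DEMO_CAP_KNOWN_B   (UINT64_C(0xf) << 32)    /* supported field */
#define DEMO_CAP_BLOCKED   (UINT64_C(1) << 57)      /* named, unsupported */
#define DEMO_CAP_SUPPORTED (DEMO_CAP_KNOWN_A | DEMO_CAP_KNOWN_B)

enum { DEMO_BLOCK_NAMED = 1 << 0, DEMO_BLOCK_UNKNOWN = 1 << 1 };

/* Returns a bitmask of DEMO_BLOCK_* reasons; 0 means nothing blocks. */
int demo_classify_cap(uint64_t cap)
{
    int result = 0;

    /* Report the named feature, then clear its bits so that it is not
     * counted a second time by the catch-all check below. */
    if (cap & DEMO_CAP_BLOCKED) {
        result |= DEMO_BLOCK_NAMED;
        cap &= ~DEMO_CAP_BLOCKED;
    }

    /* Anything still set outside the supported mask is unknown. */
    if (cap & ~DEMO_CAP_SUPPORTED) {
        result |= DEMO_BLOCK_UNKNOWN;
    }

    return result;
}
```

Clearing the named bits before the final mask test keeps a single feature from producing both a specific and a generic message.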



* Re: [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases
  2026-06-11 14:24       ` Fabiano Rosas
@ 2026-06-11 14:32         ` Alexander Mikhalitsyn
  2026-06-11 18:11           ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 14:32 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Klaus Jensen, Stefan Hajnoczi, Jesper Devantier,
	qemu-block, Hanna Reitz, Fam Zheng, Stéphane Graber,
	Kevin Wolf, Keith Busch, Laurent Vivier, Zhao Liu, Paolo Bonzini,
	Peter Xu, Philippe Mathieu-Daudé, Alexander Mikhalitsyn,
	Klaus Jensen

On Thu, 11 June 2026 at 16:24, Fabiano Rosas <farosas@suse.de> wrote:
>
> Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:
>
> > On Thu, 11 June 2026 at 15:31, Fabiano Rosas <farosas@suse.de> wrote:
> >>
> >> Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:
> >>
> >> > From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> >> >
> >> > Let's block migration for cases we don't support:
> >> > - SR-IOV
> >> > - CMB
> >> > - PMR
> >> > - SPDM
> >> >
> >> > No functional changes here, because NVMe migration is
> >> > not supported at all as of this commit.
> >> >
> >> > Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> >> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> >> > ---
> >> > v9:
> >> >     - check-patch trivial fixes
> >> > ---
> >> >  hw/nvme/ctrl.c       | 211 +++++++++++++++++++++++++++++++++++++++++++
> >> >  hw/nvme/nvme.h       |   3 +
> >> >  include/block/nvme.h |  12 +++
> >> >  3 files changed, 226 insertions(+)
> >> >
> >> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> >> > index 815f39173c8..7510a9e0296 100644
> >> > --- a/hw/nvme/ctrl.c
> >> > +++ b/hw/nvme/ctrl.c
> >> > @@ -209,6 +209,7 @@
> >> >  #include "hw/pci/msix.h"
> >> >  #include "hw/pci/pcie_sriov.h"
> >> >  #include "system/spdm-socket.h"
> >> > +#include "migration/blocker.h"
> >> >  #include "migration/vmstate.h"
> >> >
> >> >  #include "nvme.h"
> >> > @@ -252,6 +253,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
> >> >      [NVME_COMMAND_SET_PROFILE]      = true,
> >> >      [NVME_FDP_MODE]                 = true,
> >> >      [NVME_FDP_EVENTS]               = true,
> >> > +    /* if you add something here, please update nvme_set_migration_blockers() */
> >> >  };
> >> >
> >> >  static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
> >> > @@ -4603,6 +4605,7 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
> >> >          return 0;
> >> >      case NVME_IOMS_MO_RUH_UPDATE:
> >> >          return nvme_io_mgmt_send_ruh_update(n, req);
> >> > +    /* if you add something here, please update nvme_set_migration_blockers() */
> >> >      default:
> >> >          return NVME_INVALID_FIELD | NVME_DNR;
> >> >      };
> >> > @@ -7522,6 +7525,10 @@ static uint16_t nvme_security_receive(NvmeCtrl *n, NvmeRequest *req)
> >> >
> >> >  static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
> >> >  {
> >> > +    /*
> >> > +     * When adding a new dtype handling here,
> >> > +     * please also update nvme_set_migration_blockers().
> >> > +     */
> >> >      return NVME_INVALID_FIELD | NVME_DNR;
> >> >  }
> >> >
> >> > @@ -9233,6 +9240,204 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> >> >      }
> >> >  }
> >> >
> >> > +#define BLOCKER_FEATURES_MAX_LEN 256
> >> > +
> >> > +static inline void nvme_add_blocker_feature(char *blocker_features,
> >> > +                                            const char *feature)
> >> > +{
> >> > +    if (strlen(blocker_features) > 0) {
> >> > +        g_strlcat(blocker_features, ", ", BLOCKER_FEATURES_MAX_LEN);
> >> > +    }
> >> > +    g_strlcat(blocker_features, feature, BLOCKER_FEATURES_MAX_LEN);
> >> > +}
> >> > +
> >> > +static bool nvme_set_migration_blockers(NvmeCtrl *n, PCIDevice *pci_dev,
> >> > +                                        Error **errp)
> >> > +{
> >> > +    uint64_t unsupported_cap, cap = ldq_le_p(&n->bar.cap);
> >> > +    char blocker_features[BLOCKER_FEATURES_MAX_LEN] = "";
> >> > +    bool adm_cmd_security_checked = false;
> >> > +    bool cmd_io_mgmt_checked = false;
> >> > +    bool cmd_zone_checked = false;
> >> > +
> >> > +    /*
> >> > +     * Idea of this function is simple, we iterate over all Command Sets and
> >> > +     * for each supported command we provide a special handling logic to
> >> > +     * determine if we should block migration or not.
> >> > +     *
> >> > +     * For instance, we have NVME_ADM_CMD_NS_ATTACHMENT and it is always
> >> > +     * available to the guest, but if there is only 1 namespace, then it is
> >> > +     * safe to allow migration, but if there are more, then we need to block
> >> > +     * migration because we don't handle this in migration code yet.
> >> > +     */
> >> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.acs); opcode++) {
> >> > +        /* Is command supported? */
> >> > +        if (!n->cse.acs[opcode]) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        switch (opcode) {
> >> > +        case NVME_ADM_CMD_DELETE_SQ:
> >> > +        case NVME_ADM_CMD_CREATE_SQ:
> >> > +        case NVME_ADM_CMD_GET_LOG_PAGE:
> >> > +        case NVME_ADM_CMD_DELETE_CQ:
> >> > +        case NVME_ADM_CMD_CREATE_CQ:
> >> > +        case NVME_ADM_CMD_IDENTIFY:
> >> > +        case NVME_ADM_CMD_ABORT:
> >> > +        case NVME_ADM_CMD_SET_FEATURES:
> >> > +        case NVME_ADM_CMD_GET_FEATURES:
> >> > +        case NVME_ADM_CMD_ASYNC_EV_REQ:
> >> > +        case NVME_ADM_CMD_DBBUF_CONFIG:
> >> > +        case NVME_ADM_CMD_FORMAT_NVM:
> >> > +        case NVME_ADM_CMD_DIRECTIVE_SEND:
> >> > +        case NVME_ADM_CMD_DIRECTIVE_RECV:
> >> > +            break;
> >> > +        case NVME_ADM_CMD_NS_ATTACHMENT:
> >> > +            int namespaces_num = 0;
> >>
> >> Hi, clang flags this:
> >>
> >> ../hw/nvme/ctrl.c:9385:13: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
> >>  9385 |             int namespaces_num = 0;
> >>       |             ^
> >>
> >
> > Hi Fabiano,
> >
> > ugh, I'll handle this. Let me then burn even more CI minutes and run
> > every single test in the matrix :)
> >
>
> I'm not sure the CI would catch this; I run different tests
> locally.
>
> > Sorry for the noise from submissions with silly mistakes :(
> >
>
> It's completely fine, don't worry. We're close to merging this, I was
> just running a final set of tests to give my ack. It's good that we
> caught it before merge.

Thank you! ;)

I've also just caught something on Alpine (a PAGE_SIZE macro redefinition
error). I've fixed it and am now running the tests to make sure
everything is fine.

I'll get back to this thread once I get everything green on CI.
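
For reference, a minimal sketch of one common way to address the clang diagnostic quoted above: before C23 a case label must be followed by a statement, so wrapping the case body in braces makes the declaration part of a compound statement. The names below (DEMO_CMD_*, demo_blocks_migration) are hypothetical and only mirror the shape of the NS_ATTACHMENT case; this is not necessarily what the next revision does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only; not QEMU's values. */
enum { DEMO_CMD_IDENTIFY = 1, DEMO_CMD_NS_ATTACHMENT = 2 };

/*
 * Returns 1 if the (hypothetical) command should block migration.
 * ns_present[i] is nonzero when namespace slot i is populated.
 */
int demo_blocks_migration(int opcode, const int *ns_present, size_t n)
{
    assert(ns_present != NULL || n == 0);

    switch (opcode) {
    case DEMO_CMD_IDENTIFY:
        return 0;
    case DEMO_CMD_NS_ATTACHMENT: {
        /*
         * The braces open a compound statement, so the label is followed
         * by a statement and the declaration is valid before C23.
         */
        int namespaces_num = 0;

        for (size_t i = 0; i < n; i++) {
            if (ns_present[i]) {
                namespaces_num++;
            }
        }

        /* Block only when more than one namespace is present. */
        return namespaces_num > 1;
    }
    default:
        return 0;
    }
}
```

An alternative with the same effect is to hoist the declaration to the top of the function, which avoids the extra block scope.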

>
> > Kind regards,
> > Alex
> >
> >>
> >> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> >> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> >> > +                if (!ns) {
> >> > +                    continue;
> >> > +                }
> >> > +
> >> > +                namespaces_num++;
> >> > +            }
> >> > +
> >> > +            if (namespaces_num > 1) {
> >> > +                nvme_add_blocker_feature(blocker_features,
> >> > +                                         "Namespace Attachment");
> >> > +            }
> >> > +
> >> > +            break;
> >> > +        case NVME_ADM_CMD_VIRT_MNGMT:
> >> > +            if (n->params.sriov_max_vfs) {
> >> > +                nvme_add_blocker_feature(blocker_features, "SR-IOV");
> >> > +            }
> >> > +
> >> > +            break;
> >> > +        case NVME_ADM_CMD_SECURITY_SEND:
> >> > +        case NVME_ADM_CMD_SECURITY_RECV:
> >> > +            if (adm_cmd_security_checked) {
> >> > +                break;
> >> > +            }
> >> > +
> >> > +            if (pci_dev->spdm_port) {
> >> > +                nvme_add_blocker_feature(blocker_features, "SPDM");
> >> > +            }
> >> > +
> >> > +            adm_cmd_security_checked = true;
> >> > +
> >> > +            break;
> >> > +        default:
> >> > +            g_assert_not_reached();
> >> > +        }
> >> > +    }
> >> > +
> >> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.nvm); opcode++) {
> >> > +        if (!n->cse.iocs.nvm[opcode]) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        switch (opcode) {
> >> > +        case NVME_CMD_FLUSH:
> >> > +        case NVME_CMD_WRITE:
> >> > +        case NVME_CMD_READ:
> >> > +        case NVME_CMD_COMPARE:
> >> > +        case NVME_CMD_WRITE_ZEROES:
> >> > +        case NVME_CMD_DSM:
> >> > +        case NVME_CMD_VERIFY:
> >> > +        case NVME_CMD_COPY:
> >> > +            break;
> >> > +        case NVME_CMD_IO_MGMT_RECV:
> >> > +        case NVME_CMD_IO_MGMT_SEND:
> >> > +            if (cmd_io_mgmt_checked) {
> >> > +                break;
> >> > +            }
> >> > +
> >> > +            /* check for NVME_IOMS_MO_RUH_UPDATE */
> >> > +            if (n->subsys->params.fdp.enabled) {
> >> > +                nvme_add_blocker_feature(blocker_features, "FDP");
> >> > +            }
> >> > +
> >> > +            cmd_io_mgmt_checked = true;
> >> > +
> >> > +            break;
> >> > +        default:
> >> > +            g_assert_not_reached();
> >> > +        }
> >> > +    }
> >> > +
> >> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.zoned); opcode++) {
> >> > +        /*
> >> > +         * If command isn't supported or we have the same command
> >> > +         * in n->cse.iocs.nvm, then we can skip it here.
> >> > +         */
> >> > +        if (!n->cse.iocs.zoned[opcode] || n->cse.iocs.nvm[opcode]) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        switch (opcode) {
> >> > +        case NVME_CMD_ZONE_APPEND:
> >> > +        case NVME_CMD_ZONE_MGMT_SEND:
> >> > +        case NVME_CMD_ZONE_MGMT_RECV:
> >> > +            if (cmd_zone_checked) {
> >> > +                break;
> >> > +            }
> >> > +
> >> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> >> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> >> > +                if (!ns) {
> >> > +                    continue;
> >> > +                }
> >> > +
> >> > +                if (ns->params.zoned) {
> >> > +                    nvme_add_blocker_feature(blocker_features,
> >> > +                                             "Zoned Namespace");
> >> > +                    break;
> >> > +                }
> >> > +            }
> >> > +
> >> > +            cmd_zone_checked = true;
> >> > +
> >> > +            break;
> >> > +        default:
> >> > +            g_assert_not_reached();
> >> > +        }
> >> > +    }
> >> > +
> >> > +    /*
> >> > +     * Try our best to explicitly detect all not supported caps,
> >> > +     * to let users know what features cause migration to be blocked,
> >> > +     * but in case we miss handling here, everything else will be
> >> > +     * covered by unsupported_cap check.
> >> > +     */
> >> > +    if (NVME_CAP_CMBS(cap)) {
> >> > +        nvme_add_blocker_feature(blocker_features, "CMB");
> >> > +        cap &= ~((uint64_t)CAP_CMBS_MASK << CAP_CMBS_SHIFT);
> >> > +    }
> >> > +
> >> > +    if (NVME_CAP_PMRS(cap)) {
> >> > +        nvme_add_blocker_feature(blocker_features, "PMR");
> >> > +        cap &= ~((uint64_t)CAP_PMRS_MASK << CAP_PMRS_SHIFT);
> >> > +    }
> >> > +
> >> > +    unsupported_cap = cap & ~NVME_MIGRATION_SUPPORTED_CAP_BITS;
> >> > +    if (unsupported_cap) {
> >> > +        nvme_add_blocker_feature(blocker_features, "unknown capability");
> >> > +    }
> >> > +
> >> > +    assert(n->migration_blocker == NULL);
> >> > +    if (strlen(blocker_features) > 0) {
> >> > +        error_setg(&n->migration_blocker,
> >> > +                   "Migration is not supported for %s", blocker_features);
> >> > +        if (migrate_add_blocker(&n->migration_blocker, errp) < 0) {
> >> > +            return false;
> >> > +        }
> >> > +    }
> >> > +
> >> > +    return true;
> >> > +}
> >> > +
> >> >  static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
> >> >  {
> >> >      int cntlid;
> >> > @@ -9338,6 +9543,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >> >
> >> >          n->subsys->namespaces[ns->params.nsid] = ns;
> >> >      }
> >> > +
> >> > +    if (!nvme_set_migration_blockers(n, pci_dev, errp)) {
> >> > +        return;
> >> > +    }
> >> >  }
> >> >
> >> >  static void nvme_exit(PCIDevice *pci_dev)
> >> > @@ -9390,6 +9599,8 @@ static void nvme_exit(PCIDevice *pci_dev)
> >> >      }
> >> >
> >> >      memory_region_del_subregion(&n->bar0, &n->iomem);
> >> > +
> >> > +    migrate_del_blocker(&n->migration_blocker);
> >> >  }
> >> >
> >> >  static const Property nvme_props[] = {
> >> > diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> >> > index 5ef3ebee29e..05aee24a15c 100644
> >> > --- a/hw/nvme/nvme.h
> >> > +++ b/hw/nvme/nvme.h
> >> > @@ -668,6 +668,9 @@ typedef struct NvmeCtrl {
> >> >
> >> >      /* Socket mapping to SPDM over NVMe Security In/Out commands */
> >> >      int spdm_socket;
> >> > +
> >> > +    /* Migration-related stuff */
> >> > +    Error *migration_blocker;
> >> >  } NvmeCtrl;
> >> >
> >> >  typedef enum NvmeResetType {
> >> > diff --git a/include/block/nvme.h b/include/block/nvme.h
> >> > index e4e7be51205..17a7c7818d7 100644
> >> > --- a/include/block/nvme.h
> >> > +++ b/include/block/nvme.h
> >> > @@ -141,6 +141,18 @@ enum NvmeCapMask {
> >> >  #define NVME_CAP_SET_CMBS(cap, val)   \
> >> >      ((cap) |= (uint64_t)((val) & CAP_CMBS_MASK)   << CAP_CMBS_SHIFT)
> >> >
> >> > +#define NVME_MIGRATION_SUPPORTED_CAP_BITS ( \
> >> > +      ((uint64_t)CAP_MQES_MASK   << CAP_MQES_SHIFT)   \
> >> > +    | ((uint64_t)CAP_CQR_MASK    << CAP_CQR_SHIFT)    \
> >> > +    | ((uint64_t)CAP_AMS_MASK    << CAP_AMS_SHIFT)    \
> >> > +    | ((uint64_t)CAP_TO_MASK     << CAP_TO_SHIFT)     \
> >> > +    | ((uint64_t)CAP_DSTRD_MASK  << CAP_DSTRD_SHIFT)  \
> >> > +    | ((uint64_t)CAP_NSSRS_MASK  << CAP_NSSRS_SHIFT)  \
> >> > +    | ((uint64_t)CAP_CSS_MASK    << CAP_CSS_SHIFT)    \
> >> > +    | ((uint64_t)CAP_MPSMIN_MASK << CAP_MPSMIN_SHIFT) \
> >> > +    | ((uint64_t)CAP_MPSMAX_MASK << CAP_MPSMAX_SHIFT) \
> >> > +)
> >> > +
> >> >  enum NvmeCapCss {
> >> >      NVME_CAP_CSS_NCSS    = 1 << 0,
> >> >      NVME_CAP_CSS_IOCSS   = 1 << 6,
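
As an aside on the quoted nvme_add_blocker_feature() helper: it accumulates a comma-separated feature list in a fixed-size buffer using GLib's g_strlcat(). A rough standard-C sketch of the same idea follows, using snprintf() instead so that it builds without GLib. demo_add_feature() and DEMO_FEATURES_MAX_LEN are hypothetical names, and long input is truncated rather than overflowing the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_FEATURES_MAX_LEN 256

/*
 * Append `feature` to the comma-separated list stored in `list`, a
 * NUL-terminated buffer of `size` bytes. Output that does not fit is
 * truncated; the buffer always stays NUL-terminated.
 */
void demo_add_feature(char *list, size_t size, const char *feature)
{
    size_t used = strlen(list);

    assert(used < size);

    /* Prefix a separator only when the list already has an entry. */
    snprintf(list + used, size - used, "%s%s",
             used > 0 ? ", " : "", feature);
}
```

Starting from an empty DEMO_FEATURES_MAX_LEN-byte buffer, adding "CMB" and then "PMR" leaves "CMB, PMR" in the list, which is the shape of text the patch passes to error_setg().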



* Re: [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases
  2026-06-11 14:32         ` Alexander Mikhalitsyn
@ 2026-06-11 18:11           ` Alexander Mikhalitsyn
  0 siblings, 0 replies; 14+ messages in thread
From: Alexander Mikhalitsyn @ 2026-06-11 18:11 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Klaus Jensen, Stefan Hajnoczi, Jesper Devantier,
	qemu-block, Hanna Reitz, Fam Zheng, Stéphane Graber,
	Kevin Wolf, Keith Busch, Laurent Vivier, Zhao Liu, Paolo Bonzini,
	Peter Xu, Philippe Mathieu-Daudé, Alexander Mikhalitsyn,
	Klaus Jensen

On Thu, 11 June 2026 at 16:32, Alexander Mikhalitsyn
<alexander@mihalicyn.com> wrote:
>
> On Thu, 11 June 2026 at 16:24, Fabiano Rosas <farosas@suse.de> wrote:
> >
> > Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:
> >
> > > On Thu, 11 June 2026 at 15:31, Fabiano Rosas <farosas@suse.de> wrote:
> > >>
> > >> Alexander Mikhalitsyn <alexander@mihalicyn.com> writes:
> > >>
> > >> > From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> > >> >
> > >> > Let's block migration for cases we don't support:
> > >> > - SR-IOV
> > >> > - CMB
> > >> > - PMR
> > >> > - SPDM
> > >> >
> > >> > No functional changes here, because NVMe migration is
> > >> > not supported at all as of this commit.
> > >> >
> > >> > Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> > >> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
> > >> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> > >> > ---
> > >> > v9:
> > >> >     - check-patch trivial fixes
> > >> > ---
> > >> >  hw/nvme/ctrl.c       | 211 +++++++++++++++++++++++++++++++++++++++++++
> > >> >  hw/nvme/nvme.h       |   3 +
> > >> >  include/block/nvme.h |  12 +++
> > >> >  3 files changed, 226 insertions(+)
> > >> >
> > >> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > >> > index 815f39173c8..7510a9e0296 100644
> > >> > --- a/hw/nvme/ctrl.c
> > >> > +++ b/hw/nvme/ctrl.c
> > >> > @@ -209,6 +209,7 @@
> > >> >  #include "hw/pci/msix.h"
> > >> >  #include "hw/pci/pcie_sriov.h"
> > >> >  #include "system/spdm-socket.h"
> > >> > +#include "migration/blocker.h"
> > >> >  #include "migration/vmstate.h"
> > >> >
> > >> >  #include "nvme.h"
> > >> > @@ -252,6 +253,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
> > >> >      [NVME_COMMAND_SET_PROFILE]      = true,
> > >> >      [NVME_FDP_MODE]                 = true,
> > >> >      [NVME_FDP_EVENTS]               = true,
> > >> > +    /* if you add something here, please update nvme_set_migration_blockers() */
> > >> >  };
> > >> >
> > >> >  static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
> > >> > @@ -4603,6 +4605,7 @@ static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
> > >> >          return 0;
> > >> >      case NVME_IOMS_MO_RUH_UPDATE:
> > >> >          return nvme_io_mgmt_send_ruh_update(n, req);
> > >> > +    /* if you add something here, please update nvme_set_migration_blockers() */
> > >> >      default:
> > >> >          return NVME_INVALID_FIELD | NVME_DNR;
> > >> >      };
> > >> > @@ -7522,6 +7525,10 @@ static uint16_t nvme_security_receive(NvmeCtrl *n, NvmeRequest *req)
> > >> >
> > >> >  static uint16_t nvme_directive_send(NvmeCtrl *n, NvmeRequest *req)
> > >> >  {
> > >> > +    /*
> > >> > +     * When adding a new dtype handling here,
> > >> > +     * please also update nvme_set_migration_blockers().
> > >> > +     */
> > >> >      return NVME_INVALID_FIELD | NVME_DNR;
> > >> >  }
> > >> >
> > >> > @@ -9233,6 +9240,204 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> > >> >      }
> > >> >  }
> > >> >
> > >> > +#define BLOCKER_FEATURES_MAX_LEN 256
> > >> > +
> > >> > +static inline void nvme_add_blocker_feature(char *blocker_features,
> > >> > +                                            const char *feature)
> > >> > +{
> > >> > +    if (strlen(blocker_features) > 0) {
> > >> > +        g_strlcat(blocker_features, ", ", BLOCKER_FEATURES_MAX_LEN);
> > >> > +    }
> > >> > +    g_strlcat(blocker_features, feature, BLOCKER_FEATURES_MAX_LEN);
> > >> > +}
> > >> > +
> > >> > +static bool nvme_set_migration_blockers(NvmeCtrl *n, PCIDevice *pci_dev,
> > >> > +                                        Error **errp)
> > >> > +{
> > >> > +    uint64_t unsupported_cap, cap = ldq_le_p(&n->bar.cap);
> > >> > +    char blocker_features[BLOCKER_FEATURES_MAX_LEN] = "";
> > >> > +    bool adm_cmd_security_checked = false;
> > >> > +    bool cmd_io_mgmt_checked = false;
> > >> > +    bool cmd_zone_checked = false;
> > >> > +
> > >> > +    /*
> > >> > +     * Idea of this function is simple, we iterate over all Command Sets and
> > >> > +     * for each supported command we provide a special handling logic to
> > >> > +     * determine if we should block migration or not.
> > >> > +     *
> > >> > +     * For instance, we have NVME_ADM_CMD_NS_ATTACHMENT and it is always
> > >> > +     * available to the guest, but if there is only 1 namespace, then it is
> > >> > +     * safe to allow migration, but if there are more, then we need to block
> > >> > +     * migration because we don't handle this in migration code yet.
> > >> > +     */
> > >> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.acs); opcode++) {
> > >> > +        /* Is command supported? */
> > >> > +        if (!n->cse.acs[opcode]) {
> > >> > +            continue;
> > >> > +        }
> > >> > +
> > >> > +        switch (opcode) {
> > >> > +        case NVME_ADM_CMD_DELETE_SQ:
> > >> > +        case NVME_ADM_CMD_CREATE_SQ:
> > >> > +        case NVME_ADM_CMD_GET_LOG_PAGE:
> > >> > +        case NVME_ADM_CMD_DELETE_CQ:
> > >> > +        case NVME_ADM_CMD_CREATE_CQ:
> > >> > +        case NVME_ADM_CMD_IDENTIFY:
> > >> > +        case NVME_ADM_CMD_ABORT:
> > >> > +        case NVME_ADM_CMD_SET_FEATURES:
> > >> > +        case NVME_ADM_CMD_GET_FEATURES:
> > >> > +        case NVME_ADM_CMD_ASYNC_EV_REQ:
> > >> > +        case NVME_ADM_CMD_DBBUF_CONFIG:
> > >> > +        case NVME_ADM_CMD_FORMAT_NVM:
> > >> > +        case NVME_ADM_CMD_DIRECTIVE_SEND:
> > >> > +        case NVME_ADM_CMD_DIRECTIVE_RECV:
> > >> > +            break;
> > >> > +        case NVME_ADM_CMD_NS_ATTACHMENT:
> > >> > +            int namespaces_num = 0;
> > >>
> > >> Hi, clang flags this:
> > >>
> > >> ../hw/nvme/ctrl.c:9385:13: error: label followed by a declaration is a C23 extension [-Werror,-Wc23-extensions]
> > >>  9385 |             int namespaces_num = 0;
> > >>       |             ^
> > >>
> > >
> > > Hi Fabiano,
> > >
> > > ugh, I'll handle this. Let me then burn even more CI minutes and run
> > > every single test in the matrix :)
> > >
> >
> > I'm not sure the CI would catch this; I run different tests
> > locally.
> >
> > > Sorry for the noise from submissions with silly mistakes :(
> > >
> >
> > It's completely fine, don't worry. We're close to merging this, I was
> > just running a final set of tests to give my ack. It's good that we
> > caught it before merge.
>
> Thank you! ;)
>
> I've also just caught something on Alpine (a PAGE_SIZE macro redefinition
> error). I've fixed it and am now running the tests to make sure
> everything is fine.
>
> I'll get back to this thread once I get everything green on CI.

All fixed now: https://gitlab.com/mihalicyn/qemu/-/pipelines/2594705360

I've just sent revision 10
(https://lore.kernel.org/qemu-devel/20260611180842.6390-1-alexander@mihalicyn.com/).

Kind regards,
Alex

>
> >
> > > Kind regards,
> > > Alex
> > >
> > >>
> > >> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > >> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> > >> > +                if (!ns) {
> > >> > +                    continue;
> > >> > +                }
> > >> > +
> > >> > +                namespaces_num++;
> > >> > +            }
> > >> > +
> > >> > +            if (namespaces_num > 1) {
> > >> > +                nvme_add_blocker_feature(blocker_features,
> > >> > +                                         "Namespace Attachment");
> > >> > +            }
> > >> > +
> > >> > +            break;
> > >> > +        case NVME_ADM_CMD_VIRT_MNGMT:
> > >> > +            if (n->params.sriov_max_vfs) {
> > >> > +                nvme_add_blocker_feature(blocker_features, "SR-IOV");
> > >> > +            }
> > >> > +
> > >> > +            break;
> > >> > +        case NVME_ADM_CMD_SECURITY_SEND:
> > >> > +        case NVME_ADM_CMD_SECURITY_RECV:
> > >> > +            if (adm_cmd_security_checked) {
> > >> > +                break;
> > >> > +            }
> > >> > +
> > >> > +            if (pci_dev->spdm_port) {
> > >> > +                nvme_add_blocker_feature(blocker_features, "SPDM");
> > >> > +            }
> > >> > +
> > >> > +            adm_cmd_security_checked = true;
> > >> > +
> > >> > +            break;
> > >> > +        default:
> > >> > +            g_assert_not_reached();
> > >> > +        }
> > >> > +    }
> > >> > +
> > >> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.nvm); opcode++) {
> > >> > +        if (!n->cse.iocs.nvm[opcode]) {
> > >> > +            continue;
> > >> > +        }
> > >> > +
> > >> > +        switch (opcode) {
> > >> > +        case NVME_CMD_FLUSH:
> > >> > +        case NVME_CMD_WRITE:
> > >> > +        case NVME_CMD_READ:
> > >> > +        case NVME_CMD_COMPARE:
> > >> > +        case NVME_CMD_WRITE_ZEROES:
> > >> > +        case NVME_CMD_DSM:
> > >> > +        case NVME_CMD_VERIFY:
> > >> > +        case NVME_CMD_COPY:
> > >> > +            break;
> > >> > +        case NVME_CMD_IO_MGMT_RECV:
> > >> > +        case NVME_CMD_IO_MGMT_SEND:
> > >> > +            if (cmd_io_mgmt_checked) {
> > >> > +                break;
> > >> > +            }
> > >> > +
> > >> > +            /* check for NVME_IOMS_MO_RUH_UPDATE */
> > >> > +            if (n->subsys->params.fdp.enabled) {
> > >> > +                nvme_add_blocker_feature(blocker_features, "FDP");
> > >> > +            }
> > >> > +
> > >> > +            cmd_io_mgmt_checked = true;
> > >> > +
> > >> > +            break;
> > >> > +        default:
> > >> > +            g_assert_not_reached();
> > >> > +        }
> > >> > +    }
> > >> > +
> > >> > +    for (int opcode = 0; opcode < ARRAY_SIZE(n->cse.iocs.zoned); opcode++) {
> > >> > +        /*
> > >> > +         * If command isn't supported or we have the same command
> > >> > +         * in n->cse.iocs.nvm, then we can skip it here.
> > >> > +         */
> > >> > +        if (!n->cse.iocs.zoned[opcode] || n->cse.iocs.nvm[opcode]) {
> > >> > +            continue;
> > >> > +        }
> > >> > +
> > >> > +        switch (opcode) {
> > >> > +        case NVME_CMD_ZONE_APPEND:
> > >> > +        case NVME_CMD_ZONE_MGMT_SEND:
> > >> > +        case NVME_CMD_ZONE_MGMT_RECV:
> > >> > +            if (cmd_zone_checked) {
> > >> > +                break;
> > >> > +            }
> > >> > +
> > >> > +            for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > >> > +                NvmeNamespace *ns = nvme_subsys_ns(n->subsys, i);
> > >> > +                if (!ns) {
> > >> > +                    continue;
> > >> > +                }
> > >> > +
> > >> > +                if (ns->params.zoned) {
> > >> > +                    nvme_add_blocker_feature(blocker_features,
> > >> > +                                             "Zoned Namespace");
> > >> > +                    break;
> > >> > +                }
> > >> > +            }
> > >> > +
> > >> > +            cmd_zone_checked = true;
> > >> > +
> > >> > +            break;
> > >> > +        default:
> > >> > +            g_assert_not_reached();
> > >> > +        }
> > >> > +    }
> > >> > +
> > >> > +    /*
> > >> > +     * Try our best to explicitly detect all not supported caps,
> > >> > +     * to let users know what features cause migration to be blocked,
> > >> > +     * but in case we miss handling here, everything else will be
> > >> > +     * covered by unsupported_cap check.
> > >> > +     */
> > >> > +    if (NVME_CAP_CMBS(cap)) {
> > >> > +        nvme_add_blocker_feature(blocker_features, "CMB");
> > >> > +        cap &= ~((uint64_t)CAP_CMBS_MASK << CAP_CMBS_SHIFT);
> > >> > +    }
> > >> > +
> > >> > +    if (NVME_CAP_PMRS(cap)) {
> > >> > +        nvme_add_blocker_feature(blocker_features, "PMR");
> > >> > +        cap &= ~((uint64_t)CAP_PMRS_MASK << CAP_PMRS_SHIFT);
> > >> > +    }
> > >> > +
> > >> > +    unsupported_cap = cap & ~NVME_MIGRATION_SUPPORTED_CAP_BITS;
> > >> > +    if (unsupported_cap) {
> > >> > +        nvme_add_blocker_feature(blocker_features, "unknown capability");
> > >> > +    }
> > >> > +
> > >> > +    assert(n->migration_blocker == NULL);
> > >> > +    if (strlen(blocker_features) > 0) {
> > >> > +        error_setg(&n->migration_blocker,
> > >> > +                   "Migration is not supported for %s", blocker_features);
> > >> > +        if (migrate_add_blocker(&n->migration_blocker, errp) < 0) {
> > >> > +            return false;
> > >> > +        }
> > >> > +    }
> > >> > +
> > >> > +    return true;
> > >> > +}
> > >> > +
> > >> >  static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
> > >> >  {
> > >> >      int cntlid;
> > >> > @@ -9338,6 +9543,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> > >> >
> > >> >          n->subsys->namespaces[ns->params.nsid] = ns;
> > >> >      }
> > >> > +
> > >> > +    if (!nvme_set_migration_blockers(n, pci_dev, errp)) {
> > >> > +        return;
> > >> > +    }
> > >> >  }
> > >> >
> > >> >  static void nvme_exit(PCIDevice *pci_dev)
> > >> > @@ -9390,6 +9599,8 @@ static void nvme_exit(PCIDevice *pci_dev)
> > >> >      }
> > >> >
> > >> >      memory_region_del_subregion(&n->bar0, &n->iomem);
> > >> > +
> > >> > +    migrate_del_blocker(&n->migration_blocker);
> > >> >  }
> > >> >
> > >> >  static const Property nvme_props[] = {
> > >> > diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> > >> > index 5ef3ebee29e..05aee24a15c 100644
> > >> > --- a/hw/nvme/nvme.h
> > >> > +++ b/hw/nvme/nvme.h
> > >> > @@ -668,6 +668,9 @@ typedef struct NvmeCtrl {
> > >> >
> > >> >      /* Socket mapping to SPDM over NVMe Security In/Out commands */
> > >> >      int spdm_socket;
> > >> > +
> > >> > +    /* Migration-related stuff */
> > >> > +    Error *migration_blocker;
> > >> >  } NvmeCtrl;
> > >> >
> > >> >  typedef enum NvmeResetType {
> > >> > diff --git a/include/block/nvme.h b/include/block/nvme.h
> > >> > index e4e7be51205..17a7c7818d7 100644
> > >> > --- a/include/block/nvme.h
> > >> > +++ b/include/block/nvme.h
> > >> > @@ -141,6 +141,18 @@ enum NvmeCapMask {
> > >> >  #define NVME_CAP_SET_CMBS(cap, val)   \
> > >> >      ((cap) |= (uint64_t)((val) & CAP_CMBS_MASK)   << CAP_CMBS_SHIFT)
> > >> >
> > >> > +#define NVME_MIGRATION_SUPPORTED_CAP_BITS ( \
> > >> > +      ((uint64_t)CAP_MQES_MASK   << CAP_MQES_SHIFT)   \
> > >> > +    | ((uint64_t)CAP_CQR_MASK    << CAP_CQR_SHIFT)    \
> > >> > +    | ((uint64_t)CAP_AMS_MASK    << CAP_AMS_SHIFT)    \
> > >> > +    | ((uint64_t)CAP_TO_MASK     << CAP_TO_SHIFT)     \
> > >> > +    | ((uint64_t)CAP_DSTRD_MASK  << CAP_DSTRD_SHIFT)  \
> > >> > +    | ((uint64_t)CAP_NSSRS_MASK  << CAP_NSSRS_SHIFT)  \
> > >> > +    | ((uint64_t)CAP_CSS_MASK    << CAP_CSS_SHIFT)    \
> > >> > +    | ((uint64_t)CAP_MPSMIN_MASK << CAP_MPSMIN_SHIFT) \
> > >> > +    | ((uint64_t)CAP_MPSMAX_MASK << CAP_MPSMAX_SHIFT) \
> > >> > +)
> > >> > +
> > >> >  enum NvmeCapCss {
> > >> >      NVME_CAP_CSS_NCSS    = 1 << 0,
> > >> >      NVME_CAP_CSS_IOCSS   = 1 << 6,
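
For readers following the quoted hunk, the CAP handling above can be sketched in isolation: explicitly known blockers (CMB, PMR) are reported and cleared from the value first, and whatever still falls outside the supported mask is reported generically. This is only a minimal sketch, not the patch's code. All EX_* names and ex_* helpers are hypothetical, the supported mask is deliberately reduced to three fields, the bit positions reflect my reading of the NVMe CAP register layout and should be checked against include/block/nvme.h, and the comma-separated list format is an assumption about what nvme_add_blocker_feature() produces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced field layout (assumption: verify against nvme.h). */
#define EX_CAP_MQES  (UINT64_C(0xffff) << 0)
#define EX_CAP_CQR   (UINT64_C(0x1) << 16)
#define EX_CAP_TO    (UINT64_C(0xff) << 24)
#define EX_CAP_PMRS  (UINT64_C(0x1) << 56)
#define EX_CAP_CMBS  (UINT64_C(0x1) << 57)

/* Fields this sketch treats as safe to migrate. */
#define EX_SUPPORTED (EX_CAP_MQES | EX_CAP_CQR | EX_CAP_TO)

/* Append a feature name to buf, comma-separated (format is an assumption). */
static void ex_add_feature(char *buf, size_t len, const char *name)
{
    size_t used = strlen(buf);

    snprintf(buf + used, len - used, "%s%s", used ? ", " : "", name);
}

/*
 * Fill buf with the names of features that would block migration.
 * Returns 1 if at least one blocker was found, 0 otherwise.
 */
static int ex_collect_blockers(uint64_t cap, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    /* Named blockers: report them, then clear the bit so it is not
     * reported a second time by the catch-all check below. */
    if (cap & EX_CAP_CMBS) {
        ex_add_feature(buf, len, "CMB");
        cap &= ~EX_CAP_CMBS;
    }
    if (cap & EX_CAP_PMRS) {
        ex_add_feature(buf, len, "PMR");
        cap &= ~EX_CAP_PMRS;
    }

    /* Catch-all: any remaining bit outside the supported mask. */
    if (cap & ~EX_SUPPORTED) {
        ex_add_feature(buf, len, "unknown capability");
    }

    return buf[0] != '\0';
}
```

Clearing each named bit before the final mask test is what keeps a CMB-only controller from also being reported as having an "unknown capability".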



Thread overview: 14+ messages
2026-06-11 11:22 [PATCH v9 0/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 1/8] tests/functional/migration: add VM launch/configure hooks Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 2/8] hw/nvme: add migration blockers for non-supported cases Alexander Mikhalitsyn
2026-06-11 13:31   ` Fabiano Rosas
2026-06-11 13:53     ` Alexander Mikhalitsyn
2026-06-11 14:24       ` Fabiano Rosas
2026-06-11 14:32         ` Alexander Mikhalitsyn
2026-06-11 18:11           ` Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 3/8] hw/nvme: split nvme_init_sq/nvme_init_cq into helpers Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 4/8] hw/nvme: set CQE.sq_id earlier in nvme_process_sq Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 5/8] hw/nvme: unmap req->sg earlier in nvme_enqueue_req_completion Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 6/8] hw/nvme: add basic live migration support Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 7/8] tests/functional/x86_64: add migration test for NVMe device Alexander Mikhalitsyn
2026-06-11 11:22 ` [PATCH v9 8/8] tests/qtest/nvme-test: add migration test with full CQ Alexander Mikhalitsyn
