* [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: mr-083 @ 2026-04-09 7:01 UTC
To: qemu-devel, qemu-block; +Cc: its, kbusch, stefanha, mr-083
This series adds two features that together enable transparent NVMe disk
hot-swap simulation in QEMU, matching the behavior of physical NVMe
drives being pulled and reinserted in the same PCIe slot.
Problem:
Currently, hot-swapping an NVMe disk in QEMU requires removing the
entire NVMe controller via device_del, which causes the Linux guest to
assign a new controller number on re-add (e.g. nvme2 becomes nvme4).
This breaks storage software that tracks drives by device name.
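For example, with the current controller-level approach (a sketch; IDs
are illustrative and assume the controller sits on a hotplug-capable
PCIe slot):

  device_del nvme2                                # guest loses /dev/nvme2n1
  device_add nvme,id=nvme2,serial=xyz,drive=drv0  # disk returns as /dev/nvme4n1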
Solution:
Patch 1 adds hotplug support for nvme-ns devices on the NvmeBus,
raising a Namespace Attribute Changed Asynchronous Event Notification
(AEN) so the guest kernel detects namespace changes. This allows
namespace-level hot-swap without removing the NVMe controller.
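With Patch 1 applied, the same cycle can instead happen at namespace
granularity from the monitor (a sketch; IDs are illustrative, and bus=
is assumed to name the controller's NvmeBus):

  device_del ns1                                  # detach, guest receives AEN
  device_add nvme-ns,id=ns1,bus=nvme2,nsid=1,drive=drv0
                                                  # re-attach, guest rescans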
Patch 2 adds a drive_insert HMP command that reconnects a host block
device file to an existing guest device after drive_del. It is the
re-insert counterpart to drive_del for non-removable devices, where
blockdev-change-medium cannot be used.
The recommended hot-swap sequence is:
1. drive_del <drive-id>              # disconnect backing store
2. drive_insert <device> <file>      # reconnect backing store
3. pcie_aer_inject_error <port> SDN  # trigger controller reset
After this sequence, the guest sees the same controller and namespace
names (e.g. /dev/nvme2n1 remains /dev/nvme2n1), and the NVMe driver
recovers transparently via the standard AER recovery path.
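For reference, a minimal device topology for exercising this sequence
(a sketch; IDs, serial number, and chassis number are illustrative):

  -device pcie-root-port,id=rp0,chassis=1
  -device nvme-subsys,id=subsys0
  -device nvme,id=nvme2,serial=deadbeef,subsys=subsys0,bus=rp0
  -drive file=disk.qcow2,if=none,id=drv0
  -device nvme-ns,drive=drv0,nsid=1

The SDN injection in step 3 then targets the root port (rp0 here).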
Tested with:
- Linux 6.1 guest on QEMU aarch64 with HVF (macOS)
- NVMe subsystem model with multipath disabled
- DirectPV and MinIO AIStor storage stack
mr-083 (2):
hw/nvme: add namespace hotplug support
block/monitor: add drive_insert HMP command
 block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
 hmp-commands.hx                | 18 ++++++++
 hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
 hw/nvme/ns.c                   |  1 +
 hw/nvme/subsys.c               |  2 +
 include/block/block-hmp-cmds.h |  1 +
6 files changed, 166 insertions(+)
--
2.50.1 (Apple Git-155)
* [PATCH 1/2] hw/nvme: add namespace hotplug support
From: mr-083 @ 2026-04-09 7:01 UTC
To: qemu-devel, qemu-block; +Cc: its, kbusch, stefanha, mr-083

Add hotplug support for nvme-ns devices on the NvmeBus. This enables
namespace-level hot-swap without removing the NVMe controller, which is
how physical NVMe drives behave when hot-swapped in the same PCIe slot.

Mark nvme-ns devices as hotpluggable and register the NvmeBus as a
hotplug handler with proper plug and unplug callbacks:

- plug: attach the namespace to all started controllers and send an
  Asynchronous Event Notification (AEN) with NS_ATTR_CHANGED so the
  guest kernel rescans namespaces
- unplug: detach from all controllers, send the AEN, remove the
  namespace from the subsystem, then unrealize the device

The plug handler skips controllers that haven't started yet
(qs_created == false) to avoid interfering with boot-time namespace
attachment in nvme_start_ctrl().

Both the controller bus and the subsystem bus are configured as hotplug
handlers via qbus_set_bus_hotplug_handler(), since nvme-ns devices may
reparent to the subsystem bus during realize.

Signed-off-by: Matthieu Receveur <matthieu@min.io>
---
 hw/nvme/ctrl.c   | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/ns.c     |  1 +
 hw/nvme/subsys.c |  2 ++
 3 files changed, 88 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index be6c7028cb..5502e4ea2b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -206,6 +206,7 @@
 #include "system/hostmem.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pcie_sriov.h"
+#include "hw/core/qdev.h"
 #include "system/spdm-socket.h"
 #include "migration/vmstate.h"

@@ -9293,6 +9294,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     }

     qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id);
+    qbus_set_bus_hotplug_handler(BUS(&n->bus));

     if (nvme_init_subsys(n, errp)) {
         return;
@@ -9553,10 +9555,93 @@ static const TypeInfo nvme_info = {
     },
 };

+static void nvme_ns_hot_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                             Error **errp)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    NvmeSubsystem *subsys = ns->subsys;
+    uint32_t nsid = ns->params.nsid;
+    int i;
+
+    /*
+     * Attach to all started controllers and notify via AEN.
+     * Skip controllers that haven't started yet (boot-time realize);
+     * nvme_start_ctrl() will attach namespaces during controller init.
+     */
+    for (i = 0; i < NVME_MAX_CONTROLLERS; i++) {
+        NvmeCtrl *ctrl = nvme_subsys_ctrl(subsys, i);
+        if (!ctrl || !ctrl->qs_created) {
+            continue;
+        }
+
+        if (nvme_csi_supported(ctrl, ns->csi) && !ns->params.detached) {
+            nvme_attach_ns(ctrl, ns);
+            nvme_update_dsm_limits(ctrl, ns);
+
+            if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
+                nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
+                                   NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                                   NVME_LOG_CHANGED_NSLIST);
+            }
+        }
+    }
+}
+
+static void nvme_ns_hot_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                               Error **errp)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    NvmeSubsystem *subsys = ns->subsys;
+    uint32_t nsid = ns->params.nsid;
+    int i;
+
+    /*
+     * Detach from all controllers and notify the guest via AEN.
+     * Must happen before unrealize to avoid use-after-free when the
+     * guest sends I/O to a freed namespace.
+     */
+    for (i = 0; i < NVME_MAX_CONTROLLERS; i++) {
+        NvmeCtrl *ctrl = nvme_subsys_ctrl(subsys, i);
+        if (!ctrl || !nvme_ns(ctrl, nsid)) {
+            continue;
+        }
+
+        nvme_detach_ns(ctrl, ns);
+        nvme_update_dsm_limits(ctrl, NULL);
+
+        if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
+            nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
+                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                               NVME_LOG_CHANGED_NSLIST);
+        }
+    }
+
+    /* Remove from subsystem namespace list. */
+    subsys->namespaces[nsid] = NULL;
+
+    /*
+     * Unrealize: drain I/O, flush, clean up structures, remove from QOM.
+     * nvme_ns_unrealize() handles drain/shutdown/cleanup internally.
+     */
+    qdev_unrealize(dev);
+}
+
+static void nvme_bus_class_init(ObjectClass *klass, const void *data)
+{
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+    hc->plug = nvme_ns_hot_plug;
+    hc->unplug = nvme_ns_hot_unplug;
+}
+
 static const TypeInfo nvme_bus_info = {
     .name = TYPE_NVME_BUS,
     .parent = TYPE_BUS,
     .instance_size = sizeof(NvmeBus),
+    .class_init = nvme_bus_class_init,
+    .interfaces = (const InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    },
 };

 static void nvme_register_types(void)
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index b0106eaa5c..eb628c0734 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -937,6 +937,7 @@ static void nvme_ns_class_init(ObjectClass *oc, const void *data)
     dc->bus_type = TYPE_NVME_BUS;
     dc->realize = nvme_ns_realize;
     dc->unrealize = nvme_ns_unrealize;
+    dc->hotpluggable = true;
     device_class_set_props(dc, nvme_ns_props);
     dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 777e1c620f..fa35055d3c 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -9,6 +9,7 @@
 #include "qemu/osdep.h"
 #include "qemu/units.h"
 #include "qapi/error.h"
+#include "hw/core/qdev.h"

 #include "nvme.h"

@@ -205,6 +206,7 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
     NvmeSubsystem *subsys = NVME_SUBSYS(dev);

     qbus_init(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id);
+    qbus_set_bus_hotplug_handler(BUS(&subsys->bus));

     nvme_subsys_setup(subsys, errp);
 }

--
2.50.1 (Apple Git-155)
* [PATCH 2/2] block/monitor: add drive_insert HMP command
From: mr-083 @ 2026-04-09 7:01 UTC
To: qemu-devel, qemu-block; +Cc: its, kbusch, stefanha, mr-083

Add a drive_insert HMP command that reconnects a host block device file
to an existing guest device whose backing store was previously removed
with drive_del.

After drive_del, the BlockBackend remains attached to the guest device
but has no BlockDriverState (shown as "[not inserted]" in info block).
drive_insert opens the specified file, finds the device's BlockBackend
by iterating all backends and matching the attached device ID, then
calls blk_insert_bs() to reconnect the backing store.

This complements drive_del for non-removable devices (such as NVMe
namespaces) where blockdev-change-medium cannot be used. Combined with
PCIe AER Surprise Down error injection to trigger a controller reset,
this enables complete NVMe disk hot-swap simulation where the guest
sees the same device names throughout.

Example usage:

  drive_del drv0                 # remove backing store
  drive_insert ns0 disk.qcow2    # reconnect backing
  pcie_aer_inject_error rp0 SDN  # trigger controller reset

Signed-off-by: Matthieu Receveur <matthieu@min.io>
---
 block/monitor/block-hmp-cmds.c | 59 ++++++++++++++++++++++++++++++++++
 hmp-commands.hx                | 18 +++++++++++
 include/block/block-hmp-cmds.h |  1 +
 3 files changed, 78 insertions(+)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 1fd28d59eb..77e9662ead 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -38,7 +38,9 @@
 #include "qemu/osdep.h"
 #include "hw/core/boards.h"
 #include "system/block-backend.h"
+#include "system/block-backend-global-state.h"
 #include "system/blockdev.h"
+#include "block/block-global-state.h"
 #include "qapi/qapi-commands-block.h"
 #include "qapi/qapi-commands-block-export.h"
 #include "qobject/qdict.h"
@@ -195,6 +197,63 @@ unlock:
     hmp_handle_error(mon, err);
 }

+void hmp_drive_insert(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    const char *filename = qdict_get_str(qdict, "filename");
+    BlockBackend *blk = NULL;
+    BlockBackend *iter;
+    BlockDriverState *bs;
+    Error *err = NULL;
+
+    GLOBAL_STATE_CODE();
+
+    /*
+     * After drive_del, the BlockBackend is removed from the monitor name
+     * registry but still attached to the device. Find it by iterating all
+     * BlockBackends and matching by the device ID shown in "info block".
+     */
+    for (iter = blk_all_next(NULL); iter; iter = blk_all_next(iter)) {
+        DeviceState *dev = blk_get_attached_dev(iter);
+        if (dev && dev->id && strcmp(dev->id, id) == 0) {
+            blk = iter;
+            break;
+        }
+    }
+
+    if (!blk) {
+        /* Fallback: try by block backend name */
+        blk = blk_by_name(id);
+    }
+
+    if (!blk) {
+        error_setg(&err, "Device '%s' not found", id);
+        goto out;
+    }
+
+    if (blk_bs(blk)) {
+        error_setg(&err, "Device '%s' already has a medium inserted", id);
+        goto out;
+    }
+
+    bs = bdrv_open(filename, NULL, NULL, BDRV_O_RDWR, &err);
+    if (!bs) {
+        goto out;
+    }
+
+    if (blk_insert_bs(blk, bs, &err) < 0) {
+        bdrv_unref(bs);
+        goto out;
+    }
+
+    bdrv_unref(bs);
+    monitor_printf(mon, "OK\n");
+    return;
+
+out:
+    hmp_handle_error(mon, err);
+}
+
 void hmp_commit(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 5cc4788f12..79af8e8988 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -207,6 +207,24 @@ SRST
     actions (drive options rerror, werror).
 ERST

+    {
+        .name       = "drive_insert",
+        .args_type  = "id:B,filename:F",
+        .params     = "device filename",
+        .help       = "insert a host block device into an empty drive",
+        .cmd        = hmp_drive_insert,
+    },
+
+SRST
+``drive_insert`` *device* *filename*
+  Insert a host block device file into a drive that has been emptied by
+  ``drive_del``. This reconnects the backing store without removing the
+  guest device, enabling transparent disk hot-swap for non-removable
+  devices such as NVMe namespaces. Combined with PCIe AER Surprise Down
+  error injection (``pcie_aer_inject_error`` *device* ``SDN``), this
+  enables complete NVMe disk hot-swap simulation.
+ERST
+
     {
         .name       = "change",
         .args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
diff --git a/include/block/block-hmp-cmds.h b/include/block/block-hmp-cmds.h
index 71113cd7ef..73c9607402 100644
--- a/include/block/block-hmp-cmds.h
+++ b/include/block/block-hmp-cmds.h
@@ -21,6 +21,7 @@
 void hmp_drive_add(Monitor *mon, const QDict *qdict);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
+void hmp_drive_insert(Monitor *mon, const QDict *qdict);
 void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_drive_backup(Monitor *mon, const QDict *qdict);

--
2.50.1 (Apple Git-155)
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-09 21:00 UTC
To: mr-083; +Cc: qemu-devel, qemu-block, its, kbusch, mr-083

On Thu, Apr 09, 2026 at 09:01:09AM +0200, mr-083 wrote:
> This series adds two features that together enable transparent NVMe disk
> hot-swap simulation in QEMU, matching the behavior of physical NVMe
> drives being pulled and reinserted in the same PCIe slot.
>
> Problem:
> Currently, hot-swapping an NVMe disk in QEMU requires removing the
> entire NVMe controller via device_del, which causes the Linux guest to
> assign a new controller number on re-add (e.g. nvme2 becomes nvme4).
> This breaks storage software that tracks drives by device name.

Hi mr-083,
Neat, I was looking for something like this recently!

> Solution:
> Patch 1 adds hotplug support for nvme-ns devices on the NvmeBus, with
> proper Asynchronous Event Notification (AEN) so the guest kernel detects
> namespace changes. This allows namespace-level hot-swap without removing
> the NVMe controller.
>
> Patch 2 adds a drive_insert HMP command that reconnects a host block
> device file to an existing guest device after drive_del. This is the
> counterpart to drive_del for non-removable devices where
> blockdev-change-medium cannot be used.
>
> The recommended hot-swap sequence is:
> 1. drive_del <drive-id>              # disconnect backing store
> 2. drive_insert <device> <file>      # reconnect backing store

Is it possible to achieve this with device_del + device_add instead of
introducing a new monitor command?

  device_del nvme-ns2
  blockdev-del nvme-ns2-blk (or drive_del)
  ...
  blockdev-add nvme-ns2-blk,... (or drive_add)
  device_add nvme-ns,id=nvme-ns2,nsid=2,drive=nvme-ns2-blk

> 3. pcie_aer_inject_error <port> SDN  # trigger controller reset

Is NVMe AEN insufficient to get the guest to recognize the Namespace
change? I looked at the Linux NVMe driver code recently and got the
impression it would process changes to the Namespace list upon receiving
the NVMe AEN.

> After this sequence, the guest sees the same controller and namespace
> names (e.g. /dev/nvme2n1 remains /dev/nvme2n1), and the NVMe driver
> recovers transparently via the standard AER recovery path.
>
> Tested with:
> - Linux 6.1 guest on QEMU aarch64 with HVF (macOS)
> - NVMe subsystem model with multipath disabled
> - DirectPV and MinIO AIStor storage stack
>
> mr-083 (2):
>   hw/nvme: add namespace hotplug support
>   block/monitor: add drive_insert HMP command
>
>  block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
>  hmp-commands.hx                | 18 ++++++++
>  hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
>  hw/nvme/ns.c                   |  1 +
>  hw/nvme/subsys.c               |  2 +
>  include/block/block-hmp-cmds.h |  1 +
>  6 files changed, 166 insertions(+)
>
> --
> 2.50.1 (Apple Git-155)
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-10 0:49 UTC
To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, its, kbusch, mr-083

Thanks for the review!

> Is it possible to achieve this with device_del + device_add instead of
> introducing a new monitor command?

Yes, device_del + device_add works. I tested it and the AEN properly
notifies the guest kernel, which rescans and adds/removes the block
device.

However, when filesystems (XFS via DirectPV in our case) are mounted on
the namespace, the old block device number is not reused on re-add. The
kernel's IDA allocator only frees the ID when all references to the
namespace head are released (nvme_free_ns_head), but the stale XFS
mount holds a reference indefinitely. Without mounted filesystems, the
ID is reused correctly (/dev/nvme0n1 stays nvme0n1).

> Is NVMe AEN insufficient to get the guest to recognize the Namespace
> change?

You're right, AEN is sufficient. I confirmed that the Linux NVMe driver
processes NVME_AER_NOTICE_NS_CHANGED and rescans automatically. The SDN
was unnecessary.

I dropped Patch 2 (drive_insert) and sent v2 with just the namespace
hotplug support. The commit message now documents the correct
device_del + device_add flow. Here is the link:
https://mail.gnu.org/archive/html/qemu-devel/2026-04/msg01507.html

Thanks
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Klaus Jensen @ 2026-04-13 17:17 UTC
To: mr-083; +Cc: qemu-devel, qemu-block, kbusch, stefanha, mr-083

On Apr 9 08:01, mr-083 wrote:
> This series adds two features that together enable transparent NVMe disk
> hot-swap simulation in QEMU, matching the behavior of physical NVMe
> drives being pulled and reinserted in the same PCIe slot.
>

I don't understand this. From an NVMe perspective you can't hotplug a
namespace. You can hotplug a PCIe-based NVM Subsystem.
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-14 12:42 UTC
To: Klaus Jensen
Cc: mr-083, qemu-devel, qemu-block, kbusch, mr-083, John Meneghini

On Mon, Apr 13, 2026 at 07:17:37PM +0200, Klaus Jensen wrote:
> On Apr 9 08:01, mr-083 wrote:
> > This series adds two features that together enable transparent NVMe disk
> > hot-swap simulation in QEMU, matching the behavior of physical NVMe
> > drives being pulled and reinserted in the same PCIe slot.
> >
>
> I don't understand this. From an NVMe perspective you can't hotplug a
> namespace. You can hotplug a PCIe-based NVM Subsystem.

Hi Klaus,
It would be great if someone with more NVMe experience than myself can
find a definite answer, but I think the Namespace List can change
asynchronously even on an NVMe PCIe controller as long as it supports
Namespace Management commands.

There are instances in the NVM Express Base Specification 2.0b like:

- 8.3.1 Capacity Management Overview
  "a Namespace Attribute Changed event is generated for hosts other
  than the host which issued the Capacity Management command"
- 8.11 Namespace Management
  "If Namespace Attribute Notices are enabled, any controller(s) not
  processing the Namespace Management command that was attached to the
  namespace reports a Namespace Attribute Changed asynchronous event to
  the host."

I imagine this functionality would be useful in storage offload cards
(IPUs/DPUs) that present as NVMe PCIe controllers instead of as
NVMe-over-Fabrics. This makes sense when the host is not supposed to
manage the storage itself. When the card's control plane configures a
new volume, the NVMe Namespace List changes and the host is notified.

Linux and Windows NVMe PCI drivers support this according to the
testing that Matthieu and I have done.

Thanks,
Stefan
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-14 13:36 UTC
To: Stefan Hajnoczi
Cc: Klaus Jensen, qemu-devel, qemu-block, kbusch, mr-083, John Meneghini

Thanks for testing Windows, Stefan! Great to have confirmation on both
Linux and Windows.

Regarding drive_insert, I found that device_del + device_add works well
when no filesystem is mounted on the namespace.

However, when XFS is mounted (e.g. via DirectPV/CSI), the Linux kernel
doesn't reuse the block device number (nvme0n1 becomes nvme0n2) because
the stale mount holds a reference to the old nvme_ns_head, preventing
ida_free().

This causes XFS "duplicate UUID" errors on remount.

drive_insert avoids this by keeping the namespace device alive, which
means no IDA cycle and the same block device name.

Should I send it as a separate follow-up patch, or keep it in this
series?

Matthieu
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Keith Busch @ 2026-04-14 18:09 UTC
To: Matthieu Rolla
Cc: Stefan Hajnoczi, Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 03:36:19PM +0200, Matthieu Rolla wrote:
> Regarding drive_insert, I found that device_del + device_add works well when no filesystem is mounted on the namespace.
>
> However, when XFS is mounted (e.g. via DirectPV/CSI), the Linux kernel doesn't reuse the block device number (nvme0n1 becomes nvme0n2) because the stale mount holds a reference to the old nvme_ns_head, preventing ida_free().
>
> This causes XFS "duplicate UUID" errors on remount.
>
> drive_insert avoids this by keeping the namespace device alive, which means no IDA cycle and the same block device name.

Are you attempting some kind of covert way to swap out the backend
without the host knowing you did that? Isn't that just going to confuse
the filesystem that's actively using the previous backend when its
in-memory context no longer aligns with the on-disk format?
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-14 18:10 UTC
To: Matthieu Rolla
Cc: Klaus Jensen, qemu-devel, qemu-block, kbusch, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 03:36:19PM +0200, Matthieu Rolla wrote:
> Regarding drive_insert, I found that device_del + device_add works well when no filesystem is mounted on the namespace.
>
> However, when XFS is mounted (e.g. via DirectPV/CSI), the Linux kernel doesn't reuse the block device number (nvme0n1 becomes nvme0n2) because the stale mount holds a reference to the old nvme_ns_head, preventing ida_free().

Can you use the stable device names in /dev/disk/by-*/ instead of the
/dev/nvmeCnN names to access the new namespace? Then it won't matter
that ida_free() hasn't been called yet.

> This causes XFS "duplicate UUID" errors on remount.

(I have to admit that using stable device names doesn't solve this
because the guest kernel still potentially has multiple XFS mounts for
the file system.)

> drive_insert avoids this by keeping the namespace device alive, which means no IDA cycle and the same block device name.

Are you sure this is safe? Even if PCIe AER somehow kills the old XFS
mount, then there is still a race condition between drive_insert and
PCIe AER injection when the guest kernel sees the new underlying
storage through the old XFS mount.

Getting this wrong could cause data corruption, so it needs to be well
understood. I don't really understand and would need to look at the
guest kernel code path. Can you describe what happens to the guest
kernel blkdev and the XFS mount in the drive_insert workflow?

Thanks,
Stefan
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-14 18:14 UTC
To: kbusch
Cc: Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini, Stefan Hajnoczi

Hello Keith,

To clarify, we're not swapping to a different backend. It's the same
disk file being disconnected and reconnected, simulating a physical
drive being pulled and reinserted. The sequence is:

  drive_del    -> disconnect the backing (simulates drive pull)
  (user does whatever they need: test failure handling, etc.)
  drive_insert -> reconnect the same backing file (simulates drive
                  reinsertion)
  SDN          -> reset controller so guest resumes I/O

The filesystem on disk is unchanged: same data, same UUID, same format.
The guest's in-memory state realigns with the on-disk state after the
controller reset, just like it would after a physical drive reinsertion
on real hardware.

The use case is a storage integration lab where we need to simulate
disk failures and recoveries without the guest block device being
renamed, which is what happens with device_del + device_add due to the
kernel's ida_alloc behavior.

Thank you.

Matthieu
www.min.io
matthieu@min.io
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-15 12:45 UTC
To: Matthieu Rolla
Cc: kbusch, Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 08:14:16PM +0200, Matthieu Rolla wrote:
> To clarify, we're not swapping to a different backend. It's the same disk file being disconnected and reconnected, simulating a physical drive being pulled and reinserted.

Is it necessary to drive_del to simulate PCIe Surprise Down? Can you
perform just the PCIe actions without removing the drive from the NVMe
device? That way the drive_insert command is not necessary.

Stefan
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-15 17:39 UTC
To: mr-083
Cc: Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini, kbusch, Stefan Hajnoczi, "Daniel P. Berrangé"

Hello,

Thanks everyone for the reviews. I just sent v4 of the namespace
hotplug patch (Series 1) with the I/O drain fix and nvme_ns_unrealize
symmetry as discussed.

As suggested by Stefan, the backend reassociation is sent as a separate
series (Series 2). Per Daniel's feedback, it is implemented as a QMP
command (blockdev-attach) that pairs with the existing blockdev-add,
with an HMP wrapper. This allows reconnecting a block node to a
non-removable device's backend after drive_del, without the
removable-media restriction of blockdev-insert-medium.

Both patches tested with Linux 6.1 guest under DirectPV/MinIO AIStor
storage stack. Scenarios covered:

- Namespace attach/detach via device_del + device_add (Series 1)
- Backend disconnect/reconnect via drive_del + blockdev-add +
  blockdev-attach + PCIe AER SDN (Series 2)
- Same device name preserved across detach/attach cycles
- Detach under heavy I/O (warp benchmark, 16 concurrent uploads)
- Short disconnect (<3s): XFS mounts intact, DirectPV Ready, MinIO 12/12
- Long disconnect (60s+): XFS journal shutdown, recovery via
  kubectl directpv repair, full 12/12 recovery (MinIO triggers healing
  on the disk)
- Multiple disks across multiple nodes (6 disks, 3 nodes)

Matthieu
matthieu@min.io
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: John Meneghini @ 2026-04-14 14:04 UTC
To: Stefan Hajnoczi, Klaus Jensen, Nilay Shroff
Cc: mr-083, qemu-devel, qemu-block, kbusch, mr-083

Adding Nilay, who has done a lot of work on nvme hot plug.

Nilay, please take a look at these patches and let us know if they can
work on powerpc.

I'll set up a test bed and try this out with x86_64.

John A. Meneghini
Senior Principal Platform Storage Engineer
RHEL SST - Platform Storage Group
jmeneghi@redhat.com
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Nilay Shroff @ 2026-04-16 10:11 UTC
To: John Meneghini, Stefan Hajnoczi, Klaus Jensen
Cc: mr-083, qemu-devel, qemu-block, kbusch, mr-083

Hi John,

On 4/14/26 7:34 PM, John Meneghini wrote:
> Adding Nilay, who has done a lot of work on nvme hot plug.
>
> Nilay, please take a look at these patches and let us know if they can
> work on powerpc.
>
> I'll set up a test bed and try this out with x86_64.

Thanks for looping me in.

I tested this patch series on pseries QEMU, and overall it works as
expected. For the first patch (NVMe namespace hotplug), the
functionality behaves correctly and achieves its intended goal. That
said, from an NVMe specification perspective, the operation appears
closer to a namespace attach/detach rather than a traditional
“hotplug.” I understand that in the QEMU device model, this is framed
as a hotplug event, which is likely why the terminology is used here,
but it may still be somewhat confusing when viewed through the NVMe
spec lens.

For the second patch (drive_insert), the implementation also works as
intended on pseries. However, I have a concern regarding how the
backend is handled. The flow effectively removes the backing storage
using drive_del and later reattaches it using drive_insert. While the
expectation is to reconnect the same backing store, there is currently
no enforcement of this. As a result, it is possible—perhaps
unintentionally—to reattach a different backing file. If this happens,
it may lead to inconsistencies with the in-memory state maintained by
the kernel (e.g., page cache or filesystem metadata), especially if the
original device was already in use or mounted. This may potentially
result in data corruption or undefined behavior from the guest’s
perspective. It might be worth considering whether some form of
validation or restriction should be added to ensure that the same
backing store is reattached, or at least to make this behavior more
explicit.

Overall, both patches are functional on pseries, but the above points
may be worth addressing.

Thanks,
--Nilay
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-16 12:33 UTC
To: Nilay Shroff
Cc: John Meneghini, Stefan Hajnoczi, Klaus Jensen, qemu-devel, qemu-block, kbusch, mr-083

Thanks, Nilay, for testing on pseries!

On the terminology: agreed. v4 of the namespace patch uses "out-of-band
namespace attach/detach" wording, as Klaus suggested.

On the backend concern: the drive_insert patch has been replaced by a
new series implementing a QMP blockdev-attach command (per Daniel's
feedback). The ability to attach a different backing file is
intentional; it allows simulating disk replacement where a failed drive
is swapped for a new one. The guest sees the same device name but with
fresh storage. This mirrors what happens on real hardware when you
replace a failed disk in the same slot.

The risk you describe (stale page cache / filesystem metadata) is
expected and handled at the guest level: the filesystem detects the
inconsistency and the storage stack (e.g. MinIO) heals the data via
erasure coding.

Link to v4 patch (series 1):
https://lists.nongnu.org/archive/html/qemu-devel/2026-04/msg02612.html
Link to new patch (series 2):
https://lists.nongnu.org/archive/html/qemu-devel/2026-04/msg02613.html

Thanks again for your time.

Matthieu
www.min.io
matthieu@min.io
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Keith Busch @ 2026-04-14 14:42 UTC
To: Stefan Hajnoczi
Cc: Klaus Jensen, mr-083, qemu-devel, qemu-block, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 08:42:21AM -0400, Stefan Hajnoczi wrote:
> On Mon, Apr 13, 2026 at 07:17:37PM +0200, Klaus Jensen wrote:
> > On Apr 9 08:01, mr-083 wrote:
> > > This series adds two features that together enable transparent NVMe disk
> > > hot-swap simulation in QEMU, matching the behavior of physical NVMe
> > > drives being pulled and reinserted in the same PCIe slot.
> > >
> >
> > I don't understand this. From an NVMe perspective you can't hotplug a
> > namespace. You can hotplug a PCIe-based NVM Subsystem.
>
> Hi Klaus,
> It would be great if someone with more NVMe experience than myself can
> find a definite answer, but I think the Namespace List can change
> asynchronously even on an NVMe PCIe controller as long as it supports
> Namespace Management commands.

I think there's some clash in terminology. From the nvme protocol side,
hotplug refers to bus events detected by the host, so something like
PCIe slot capabilities defines how that works. This series is doing
something behind the scenes from the host-controller interface
visibility, so it's just coincidence that the framework is also called
"hotplug".

From the nvme protocol perspective, this patch looks like a
qemu-specific out-of-band method for namespace "attach/detach" via the
QMP interface. Sounds fine to me: the nvme namespace events are not
strictly tied to the spec defined in-band attachment status.