* [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: mr-083 @ 2026-04-09 7:01 UTC
To: qemu-devel, qemu-block; +Cc: its, kbusch, stefanha, mr-083
This series adds two features that together enable transparent NVMe disk
hot-swap simulation in QEMU, matching the behavior of physical NVMe
drives being pulled and reinserted in the same PCIe slot.
Problem:
Currently, hot-swapping an NVMe disk in QEMU requires removing the
entire NVMe controller via device_del, which causes the Linux guest to
assign a new controller number on re-add (e.g. nvme2 becomes nvme4).
This breaks storage software that tracks drives by device name.
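For example, with the current controller-level approach (a sketch; IDs
are illustrative and assume the controller sits on a hotplug-capable
PCIe slot):

  device_del nvme2                                # guest loses /dev/nvme2n1
  device_add nvme,id=nvme2,serial=xyz,drive=drv0  # disk returns as /dev/nvme4n1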
Solution:
Patch 1 adds hotplug support for nvme-ns devices on the NvmeBus,
raising a Namespace Attribute Changed Asynchronous Event Notification
(AEN) so the guest kernel detects namespace changes. This allows
namespace-level hot-swap without removing the NVMe controller.
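With Patch 1 applied, the same cycle can instead happen at namespace
granularity from the monitor (a sketch; IDs are illustrative, and bus=
is assumed to name the controller's NvmeBus):

  device_del ns1                                  # detach, guest receives AEN
  device_add nvme-ns,id=ns1,bus=nvme2,nsid=1,drive=drv0
                                                  # re-attach, guest rescans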
Patch 2 adds a drive_insert HMP command that reconnects a host block
device file to an existing guest device after drive_del. It is the
re-insert counterpart to drive_del for non-removable devices, where
blockdev-change-medium cannot be used.
The recommended hot-swap sequence is:
1. drive_del <drive-id>              # disconnect backing store
2. drive_insert <device> <file>      # reconnect backing store
3. pcie_aer_inject_error <port> SDN  # trigger controller reset
After this sequence, the guest sees the same controller and namespace
names (e.g. /dev/nvme2n1 remains /dev/nvme2n1), and the NVMe driver
recovers transparently via the standard AER recovery path.
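For reference, a minimal device topology for exercising this sequence
(a sketch; IDs, serial number, and chassis number are illustrative):

  -device pcie-root-port,id=rp0,chassis=1
  -device nvme-subsys,id=subsys0
  -device nvme,id=nvme2,serial=deadbeef,subsys=subsys0,bus=rp0
  -drive file=disk.qcow2,if=none,id=drv0
  -device nvme-ns,drive=drv0,nsid=1

The SDN injection in step 3 then targets the root port (rp0 here).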
Tested with:
- Linux 6.1 guest on QEMU aarch64 with HVF (macOS)
- NVMe subsystem model with multipath disabled
- DirectPV and MinIO AIStor storage stack
mr-083 (2):
hw/nvme: add namespace hotplug support
block/monitor: add drive_insert HMP command
 block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
 hmp-commands.hx                | 18 ++++++++
 hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
 hw/nvme/ns.c                   |  1 +
 hw/nvme/subsys.c               |  2 +
 include/block/block-hmp-cmds.h |  1 +
6 files changed, 166 insertions(+)
--
2.50.1 (Apple Git-155)
* [PATCH 1/2] hw/nvme: add namespace hotplug support
From: mr-083 @ 2026-04-09 7:01 UTC
To: qemu-devel, qemu-block; +Cc: its, kbusch, stefanha, mr-083

Add hotplug support for nvme-ns devices on the NvmeBus. This enables
namespace-level hot-swap without removing the NVMe controller, which is
how physical NVMe drives behave when hot-swapped in the same PCIe slot.

Mark nvme-ns devices as hotpluggable and register the NvmeBus as a
hotplug handler with proper plug and unplug callbacks:

- plug: attach the namespace to all started controllers and send an
  Asynchronous Event Notification (AEN) with NS_ATTR_CHANGED so the
  guest kernel rescans namespaces
- unplug: detach from all controllers, send the AEN, remove the
  namespace from the subsystem, then unrealize the device

The plug handler skips controllers that haven't started yet
(qs_created == false) to avoid interfering with boot-time namespace
attachment in nvme_start_ctrl().

Both the controller bus and the subsystem bus are configured as hotplug
handlers via qbus_set_bus_hotplug_handler(), since nvme-ns devices may
reparent to the subsystem bus during realize.

Signed-off-by: Matthieu Receveur <matthieu@min.io>
---
 hw/nvme/ctrl.c   | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/ns.c     |  1 +
 hw/nvme/subsys.c |  2 ++
 3 files changed, 88 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index be6c7028cb..5502e4ea2b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -206,6 +206,7 @@
 #include "system/hostmem.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pcie_sriov.h"
+#include "hw/core/qdev.h"
 #include "system/spdm-socket.h"
 #include "migration/vmstate.h"

@@ -9293,6 +9294,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     }

     qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id);
+    qbus_set_bus_hotplug_handler(BUS(&n->bus));

     if (nvme_init_subsys(n, errp)) {
         return;
@@ -9553,10 +9555,93 @@ static const TypeInfo nvme_info = {
     },
 };

+static void nvme_ns_hot_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                             Error **errp)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    NvmeSubsystem *subsys = ns->subsys;
+    uint32_t nsid = ns->params.nsid;
+    int i;
+
+    /*
+     * Attach to all started controllers and notify via AEN.
+     * Skip controllers that haven't started yet (boot-time realize);
+     * nvme_start_ctrl() will attach namespaces during controller init.
+     */
+    for (i = 0; i < NVME_MAX_CONTROLLERS; i++) {
+        NvmeCtrl *ctrl = nvme_subsys_ctrl(subsys, i);
+        if (!ctrl || !ctrl->qs_created) {
+            continue;
+        }
+
+        if (nvme_csi_supported(ctrl, ns->csi) && !ns->params.detached) {
+            nvme_attach_ns(ctrl, ns);
+            nvme_update_dsm_limits(ctrl, ns);
+
+            if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
+                nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
+                                   NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                                   NVME_LOG_CHANGED_NSLIST);
+            }
+        }
+    }
+}
+
+static void nvme_ns_hot_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                               Error **errp)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+    NvmeSubsystem *subsys = ns->subsys;
+    uint32_t nsid = ns->params.nsid;
+    int i;
+
+    /*
+     * Detach from all controllers and notify the guest via AEN.
+     * Must happen before unrealize to avoid use-after-free when the
+     * guest sends I/O to a freed namespace.
+     */
+    for (i = 0; i < NVME_MAX_CONTROLLERS; i++) {
+        NvmeCtrl *ctrl = nvme_subsys_ctrl(subsys, i);
+        if (!ctrl || !nvme_ns(ctrl, nsid)) {
+            continue;
+        }
+
+        nvme_detach_ns(ctrl, ns);
+        nvme_update_dsm_limits(ctrl, NULL);
+
+        if (!test_and_set_bit(nsid, ctrl->changed_nsids)) {
+            nvme_enqueue_event(ctrl, NVME_AER_TYPE_NOTICE,
+                               NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED,
+                               NVME_LOG_CHANGED_NSLIST);
+        }
+    }
+
+    /* Remove from subsystem namespace list. */
+    subsys->namespaces[nsid] = NULL;
+
+    /*
+     * Unrealize: drain I/O, flush, clean up structures, remove from QOM.
+     * nvme_ns_unrealize() handles drain/shutdown/cleanup internally.
+     */
+    qdev_unrealize(dev);
+}
+
+static void nvme_bus_class_init(ObjectClass *klass, const void *data)
+{
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+    hc->plug = nvme_ns_hot_plug;
+    hc->unplug = nvme_ns_hot_unplug;
+}
+
 static const TypeInfo nvme_bus_info = {
     .name = TYPE_NVME_BUS,
     .parent = TYPE_BUS,
     .instance_size = sizeof(NvmeBus),
+    .class_init = nvme_bus_class_init,
+    .interfaces = (const InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    },
 };

 static void nvme_register_types(void)
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index b0106eaa5c..eb628c0734 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -937,6 +937,7 @@ static void nvme_ns_class_init(ObjectClass *oc, const void *data)
     dc->bus_type = TYPE_NVME_BUS;
     dc->realize = nvme_ns_realize;
     dc->unrealize = nvme_ns_unrealize;
+    dc->hotpluggable = true;
     device_class_set_props(dc, nvme_ns_props);
     dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 777e1c620f..fa35055d3c 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -9,6 +9,7 @@
 #include "qemu/osdep.h"
 #include "qemu/units.h"
 #include "qapi/error.h"
+#include "hw/core/qdev.h"

 #include "nvme.h"

@@ -205,6 +206,7 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
     NvmeSubsystem *subsys = NVME_SUBSYS(dev);

     qbus_init(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id);
+    qbus_set_bus_hotplug_handler(BUS(&subsys->bus));

     nvme_subsys_setup(subsys, errp);
 }

--
2.50.1 (Apple Git-155)
* [PATCH 2/2] block/monitor: add drive_insert HMP command
From: mr-083 @ 2026-04-09 7:01 UTC
To: qemu-devel, qemu-block; +Cc: its, kbusch, stefanha, mr-083

Add a drive_insert HMP command that reconnects a host block device file
to an existing guest device whose backing store was previously removed
with drive_del.

After drive_del, the BlockBackend remains attached to the guest device
but has no BlockDriverState (shown as "[not inserted]" in info block).
drive_insert opens the specified file, finds the device's BlockBackend
by iterating all backends and matching the attached device ID, then
calls blk_insert_bs() to reconnect the backing store.

This complements drive_del for non-removable devices (such as NVMe
namespaces) where blockdev-change-medium cannot be used. Combined with
PCIe AER Surprise Down error injection to trigger a controller reset,
this enables complete NVMe disk hot-swap simulation where the guest
sees the same device names throughout.

Example usage:

  drive_del drv0                 # remove backing store
  drive_insert ns0 disk.qcow2    # reconnect backing
  pcie_aer_inject_error rp0 SDN  # trigger controller reset

Signed-off-by: Matthieu Receveur <matthieu@min.io>
---
 block/monitor/block-hmp-cmds.c | 59 ++++++++++++++++++++++++++++++++++
 hmp-commands.hx                | 18 +++++++++++
 include/block/block-hmp-cmds.h |  1 +
 3 files changed, 78 insertions(+)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 1fd28d59eb..77e9662ead 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -38,7 +38,9 @@
 #include "qemu/osdep.h"
 #include "hw/core/boards.h"
 #include "system/block-backend.h"
+#include "system/block-backend-global-state.h"
 #include "system/blockdev.h"
+#include "block/block-global-state.h"
 #include "qapi/qapi-commands-block.h"
 #include "qapi/qapi-commands-block-export.h"
 #include "qobject/qdict.h"
@@ -195,6 +197,63 @@ unlock:
     hmp_handle_error(mon, err);
 }

+void hmp_drive_insert(Monitor *mon, const QDict *qdict)
+{
+    const char *id = qdict_get_str(qdict, "id");
+    const char *filename = qdict_get_str(qdict, "filename");
+    BlockBackend *blk = NULL;
+    BlockBackend *iter;
+    BlockDriverState *bs;
+    Error *err = NULL;
+
+    GLOBAL_STATE_CODE();
+
+    /*
+     * After drive_del, the BlockBackend is removed from the monitor name
+     * registry but still attached to the device. Find it by iterating all
+     * BlockBackends and matching by the device ID shown in "info block".
+     */
+    for (iter = blk_all_next(NULL); iter; iter = blk_all_next(iter)) {
+        DeviceState *dev = blk_get_attached_dev(iter);
+        if (dev && dev->id && strcmp(dev->id, id) == 0) {
+            blk = iter;
+            break;
+        }
+    }
+
+    if (!blk) {
+        /* Fallback: try by block backend name */
+        blk = blk_by_name(id);
+    }
+
+    if (!blk) {
+        error_setg(&err, "Device '%s' not found", id);
+        goto out;
+    }
+
+    if (blk_bs(blk)) {
+        error_setg(&err, "Device '%s' already has a medium inserted", id);
+        goto out;
+    }
+
+    bs = bdrv_open(filename, NULL, NULL, BDRV_O_RDWR, &err);
+    if (!bs) {
+        goto out;
+    }
+
+    if (blk_insert_bs(blk, bs, &err) < 0) {
+        bdrv_unref(bs);
+        goto out;
+    }
+
+    bdrv_unref(bs);
+    monitor_printf(mon, "OK\n");
+    return;
+
+out:
+    hmp_handle_error(mon, err);
+}
+
 void hmp_commit(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 5cc4788f12..79af8e8988 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -207,6 +207,24 @@ SRST
     actions (drive options rerror, werror).
 ERST

+    {
+        .name       = "drive_insert",
+        .args_type  = "id:B,filename:F",
+        .params     = "device filename",
+        .help       = "insert a host block device into an empty drive",
+        .cmd        = hmp_drive_insert,
+    },
+
+SRST
+``drive_insert`` *device* *filename*
+  Insert a host block device file into a drive that has been emptied by
+  ``drive_del``. This reconnects the backing store without removing the
+  guest device, enabling transparent disk hot-swap for non-removable
+  devices such as NVMe namespaces. Combined with PCIe AER Surprise Down
+  error injection (``pcie_aer_inject_error`` *device* ``SDN``), this
+  enables complete NVMe disk hot-swap simulation.
+ERST
+
     {
         .name       = "change",
         .args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
diff --git a/include/block/block-hmp-cmds.h b/include/block/block-hmp-cmds.h
index 71113cd7ef..73c9607402 100644
--- a/include/block/block-hmp-cmds.h
+++ b/include/block/block-hmp-cmds.h
@@ -21,6 +21,7 @@
 void hmp_drive_add(Monitor *mon, const QDict *qdict);
 void hmp_commit(Monitor *mon, const QDict *qdict);
 void hmp_drive_del(Monitor *mon, const QDict *qdict);
+void hmp_drive_insert(Monitor *mon, const QDict *qdict);
 void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_drive_backup(Monitor *mon, const QDict *qdict);

--
2.50.1 (Apple Git-155)
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-09 21:00 UTC
To: mr-083; +Cc: qemu-devel, qemu-block, its, kbusch, mr-083

On Thu, Apr 09, 2026 at 09:01:09AM +0200, mr-083 wrote:
> This series adds two features that together enable transparent NVMe disk
> hot-swap simulation in QEMU, matching the behavior of physical NVMe
> drives being pulled and reinserted in the same PCIe slot.
>
> Problem:
> Currently, hot-swapping an NVMe disk in QEMU requires removing the
> entire NVMe controller via device_del, which causes the Linux guest to
> assign a new controller number on re-add (e.g. nvme2 becomes nvme4).
> This breaks storage software that tracks drives by device name.

Hi mr-083,
Neat, I was looking for something like this recently!

> Solution:
> Patch 1 adds hotplug support for nvme-ns devices on the NvmeBus, with
> proper Asynchronous Event Notification (AEN) so the guest kernel detects
> namespace changes. This allows namespace-level hot-swap without removing
> the NVMe controller.
>
> Patch 2 adds a drive_insert HMP command that reconnects a host block
> device file to an existing guest device after drive_del. This is the
> counterpart to drive_del for non-removable devices where
> blockdev-change-medium cannot be used.
>
> The recommended hot-swap sequence is:
> 1. drive_del <drive-id>              # disconnect backing store
> 2. drive_insert <device> <file>      # reconnect backing store

Is it possible to achieve this with device_del + device_add instead of
introducing a new monitor command?

  device_del nvme-ns2
  blockdev-del nvme-ns2-blk (or drive_del)
  ...
  blockdev-add nvme-ns2-blk,... (or drive_add)
  device_add nvme-ns,id=nvme-ns2,nsid=2,drive=nvme-ns2-blk

> 3. pcie_aer_inject_error <port> SDN  # trigger controller reset

Is NVMe AEN insufficient to get the guest to recognize the Namespace
change? I looked at the Linux NVMe driver code recently and got the
impression it would process changes to the Namespace list upon receiving
the NVMe AEN.

> After this sequence, the guest sees the same controller and namespace
> names (e.g. /dev/nvme2n1 remains /dev/nvme2n1), and the NVMe driver
> recovers transparently via the standard AER recovery path.
>
> Tested with:
> - Linux 6.1 guest on QEMU aarch64 with HVF (macOS)
> - NVMe subsystem model with multipath disabled
> - DirectPV and MinIO AIStor storage stack
>
> mr-083 (2):
>   hw/nvme: add namespace hotplug support
>   block/monitor: add drive_insert HMP command
>
>  block/monitor/block-hmp-cmds.c | 59 +++++++++++++++++++++++
>  hmp-commands.hx                | 18 ++++++++
>  hw/nvme/ctrl.c                 | 85 ++++++++++++++++++++++++++++++++++
>  hw/nvme/ns.c                   |  1 +
>  hw/nvme/subsys.c               |  2 +
>  include/block/block-hmp-cmds.h |  1 +
>  6 files changed, 166 insertions(+)
>
> --
> 2.50.1 (Apple Git-155)
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-10 0:49 UTC
To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, its, kbusch, mr-083

Thanks for the review!

> Is it possible to achieve this with device_del + device_add instead of
> introducing a new monitor command?

Yes, device_del + device_add works. I tested it and the AEN properly
notifies the guest kernel, which rescans and adds/removes the block
device.

However, when filesystems (XFS via DirectPV in our case) are mounted on
the namespace, the old block device number is not reused on re-add. The
kernel's IDA allocator only frees the ID when all references to the
namespace head are released (nvme_free_ns_head), but the stale XFS
mount holds a reference indefinitely. Without mounted filesystems, the
ID is reused correctly (/dev/nvme0n1 stays nvme0n1).

> Is NVMe AEN insufficient to get the guest to recognize the Namespace
> change?

You're right, AEN is sufficient. I confirmed that the Linux NVMe driver
processes NVME_AER_NOTICE_NS_CHANGED and rescans automatically. The SDN
was unnecessary.

I dropped Patch 2 (drive_insert) and sent v2 with just the namespace
hotplug support. The commit message now documents the correct
device_del + device_add flow. Here is the link:
https://mail.gnu.org/archive/html/qemu-devel/2026-04/msg01507.html

Thanks
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Klaus Jensen @ 2026-04-13 17:17 UTC
To: mr-083; +Cc: qemu-devel, qemu-block, kbusch, stefanha, mr-083

On Apr 9 08:01, mr-083 wrote:
> This series adds two features that together enable transparent NVMe disk
> hot-swap simulation in QEMU, matching the behavior of physical NVMe
> drives being pulled and reinserted in the same PCIe slot.
>

I don't understand this. From an NVMe perspective you can't hotplug a
namespace. You can hotplug a PCIe-based NVM Subsystem.
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-14 12:42 UTC
To: Klaus Jensen
Cc: mr-083, qemu-devel, qemu-block, kbusch, mr-083, John Meneghini

On Mon, Apr 13, 2026 at 07:17:37PM +0200, Klaus Jensen wrote:
> On Apr 9 08:01, mr-083 wrote:
> > This series adds two features that together enable transparent NVMe disk
> > hot-swap simulation in QEMU, matching the behavior of physical NVMe
> > drives being pulled and reinserted in the same PCIe slot.
> >
>
> I don't understand this. From an NVMe perspective you can't hotplug a
> namespace. You can hotplug a PCIe-based NVM Subsystem.

Hi Klaus,
It would be great if someone with more NVMe experience than myself can
find a definite answer, but I think the Namespace List can change
asynchronously even on an NVMe PCIe controller as long as it supports
Namespace Management commands.

There are instances in the NVM Express Base Specification 2.0b like:

- 8.3.1 Capacity Management Overview
  "a Namespace Attribute Changed event is generated for hosts other
  than the host which issued the Capacity Management command"
- 8.11 Namespace Management
  "If Namespace Attribute Notices are enabled, any controller(s) not
  processing the Namespace Management command that was attached to the
  namespace reports a Namespace Attribute Changed asynchronous event to
  the host."

I imagine this functionality would be useful in storage offload cards
(IPUs/DPUs) that present as NVMe PCIe controllers instead of as
NVMe-over-Fabrics. This makes sense when the host is not supposed to
manage the storage itself. When the card's control plane configures a
new volume, the NVMe Namespace List changes and the host is notified.

Linux and Windows NVMe PCI drivers support this according to the
testing that Matthieu and I have done.

Thanks,
Stefan
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-14 13:36 UTC
To: Stefan Hajnoczi
Cc: Klaus Jensen, qemu-devel, qemu-block, kbusch, mr-083, John Meneghini

Thanks for testing Windows, Stefan! Great to have confirmation on both
Linux and Windows.

Regarding drive_insert, I found that device_del + device_add works well
when no filesystem is mounted on the namespace.

However, when XFS is mounted (e.g. via DirectPV/CSI), the Linux kernel
doesn't reuse the block device number (nvme0n1 becomes nvme0n2) because
the stale mount holds a reference to the old nvme_ns_head, preventing
ida_free().

This causes XFS "duplicate UUID" errors on remount.

drive_insert avoids this by keeping the namespace device alive, which
means no IDA cycle and the same block device name.

Should I send it as a separate follow-up patch, or keep it in this
series?

Matthieu
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Keith Busch @ 2026-04-14 18:09 UTC
To: Matthieu Rolla
Cc: Stefan Hajnoczi, Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 03:36:19PM +0200, Matthieu Rolla wrote:
> Regarding drive_insert, I found that device_del + device_add works well when no filesystem is mounted on the namespace.
>
> However, when XFS is mounted (e.g. via DirectPV/CSI), the Linux kernel doesn't reuse the block device number (nvme0n1 becomes nvme0n2) because the stale mount holds a reference to the old nvme_ns_head, preventing ida_free().
>
> This causes XFS "duplicate UUID" errors on remount.
>
> drive_insert avoids this by keeping the namespace device alive, which means no IDA cycle and the same block device name.

Are you attempting some kind of covert way to swap out the backend
without the host knowing you did that? Isn't that just going to confuse
the filesystem that's actively using the previous backend when its
in-memory context no longer aligns with the on-disk format?
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-14 18:10 UTC
To: Matthieu Rolla
Cc: Klaus Jensen, qemu-devel, qemu-block, kbusch, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 03:36:19PM +0200, Matthieu Rolla wrote:
> Regarding drive_insert, I found that device_del + device_add works well when no filesystem is mounted on the namespace.
>
> However, when XFS is mounted (e.g. via DirectPV/CSI), the Linux kernel doesn't reuse the block device number (nvme0n1 becomes nvme0n2) because the stale mount holds a reference to the old nvme_ns_head, preventing ida_free().

Can you use the stable device names in /dev/disk/by-*/ instead of the
/dev/nvmeCnN names to access the new namespace? Then it won't matter
that ida_free() hasn't been called yet.

> This causes XFS "duplicate UUID" errors on remount.

(I have to admit that using stable device names doesn't solve this
because the guest kernel still potentially has multiple XFS mounts for
the file system.)

> drive_insert avoids this by keeping the namespace device alive, which means no IDA cycle and the same block device name.

Are you sure this is safe? Even if PCIe AER somehow kills the old XFS
mount, then there is still a race condition between drive_insert and
PCIe AER injection when the guest kernel sees the new underlying
storage through the old XFS mount.

Getting this wrong could cause data corruption, so it needs to be well
understood. I don't really understand and would need to look at the
guest kernel code path. Can you describe what happens to the guest
kernel blkdev and the XFS mount in the drive_insert workflow?

Thanks,
Stefan
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-14 18:14 UTC
To: kbusch
Cc: Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini, Stefan Hajnoczi

Hello Keith,

To clarify, we're not swapping to a different backend. It's the same
disk file being disconnected and reconnected, simulating a physical
drive being pulled and reinserted. The sequence is:

  drive_del    -> disconnect the backing (simulates drive pull)
  (user does whatever they need: test failure handling, etc.)
  drive_insert -> reconnect the same backing file (simulates drive
                  reinsertion)
  SDN          -> reset controller so guest resumes I/O

The filesystem on disk is unchanged: same data, same UUID, same format.
The guest's in-memory state realigns with the on-disk state after the
controller reset, just like it would after a physical drive reinsertion
on real hardware.

The use case is a storage integration lab where we need to simulate
disk failures and recoveries without the guest block device being
renamed, which is what happens with device_del + device_add due to the
kernel's ida_alloc behavior.

Thank you.

Matthieu
www.min.io
matthieu@min.io
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Stefan Hajnoczi @ 2026-04-15 12:45 UTC
To: Matthieu Rolla
Cc: kbusch, Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 08:14:16PM +0200, Matthieu Rolla wrote:
> To clarify, we're not swapping to a different backend. It's the same disk file being disconnected and reconnected, simulating a physical drive being pulled and reinserted.

Is it necessary to drive_del to simulate PCIe Surprise Down? Can you
perform just the PCIe actions without removing the drive from the NVMe
device? That way the drive_insert command is not necessary.

Stefan
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-15 17:39 UTC
To: mr-083
Cc: Klaus Jensen, qemu-devel, qemu-block, mr-083, John Meneghini, kbusch, Stefan Hajnoczi, "Daniel P. Berrangé"

Hello,

Thanks everyone for the reviews. I just sent v4 of the namespace
hotplug patch (Series 1) with the I/O drain fix and nvme_ns_unrealize
symmetry as discussed.

As suggested by Stefan, the backend reassociation is sent as a separate
series (Series 2). Per Daniel's feedback, it is implemented as a QMP
command (blockdev-attach) that pairs with the existing blockdev-add,
with an HMP wrapper. This allows reconnecting a block node to a
non-removable device's backend after drive_del, without the
removable-media restriction of blockdev-insert-medium.

Both patches tested with Linux 6.1 guest under DirectPV/MinIO AIStor
storage stack. Scenarios covered:

- Namespace attach/detach via device_del + device_add (Series 1)
- Backend disconnect/reconnect via drive_del + blockdev-add +
  blockdev-attach + PCIe AER SDN (Series 2)
- Same device name preserved across detach/attach cycles
- Detach under heavy I/O (warp benchmark, 16 concurrent uploads)
- Short disconnect (<3s): XFS mounts intact, DirectPV Ready, MinIO 12/12
- Long disconnect (60s+): XFS journal shutdown, recovery via
  kubectl directpv repair, full 12/12 recovery (MinIO triggers healing
  on the disk)
- Multiple disks across multiple nodes (6 disks, 3 nodes)

Matthieu
matthieu@min.io
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: John Meneghini @ 2026-04-14 14:04 UTC
To: Stefan Hajnoczi, Klaus Jensen, Nilay Shroff
Cc: mr-083, qemu-devel, qemu-block, kbusch, mr-083

Adding Nilay, who has done a lot of work on nvme hot plug.

Nilay, please take a look at these patches and let us know if they can
work on powerpc.

I'll set up a test bed and try this out with x86_64.

John A. Meneghini
Senior Principal Platform Storage Engineer
RHEL SST - Platform Storage Group
jmeneghi@redhat.com
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Nilay Shroff @ 2026-04-16 10:11 UTC
To: John Meneghini, Stefan Hajnoczi, Klaus Jensen
Cc: mr-083, qemu-devel, qemu-block, kbusch, mr-083

Hi John,

On 4/14/26 7:34 PM, John Meneghini wrote:
> Adding Nilay, who has done a lot of work on nvme hot plug.
>
> Nilay, please take a look at these patches and let us know if they can
> work on powerpc.
>
> I'll set up a test bed and try this out with x86_64.

Thanks for looping me in.

I tested this patch series on pseries QEMU, and overall it works as
expected. For the first patch (NVMe namespace hotplug), the
functionality behaves correctly and achieves its intended goal. That
said, from an NVMe specification perspective, the operation appears
closer to a namespace attach/detach rather than a traditional
“hotplug.” I understand that in the QEMU device model, this is framed
as a hotplug event, which is likely why the terminology is used here,
but it may still be somewhat confusing when viewed through the NVMe
spec lens.

For the second patch (drive_insert), the implementation also works as
intended on pseries. However, I have a concern regarding how the
backend is handled. The flow effectively removes the backing storage
using drive_del and later reattaches it using drive_insert. While the
expectation is to reconnect the same backing store, there is currently
no enforcement of this. As a result, it is possible—perhaps
unintentionally—to reattach a different backing file. If this happens,
it may lead to inconsistencies with the in-memory state maintained by
the kernel (e.g., page cache or filesystem metadata), especially if the
original device was already in use or mounted. This may potentially
result in data corruption or undefined behavior from the guest’s
perspective. It might be worth considering whether some form of
validation or restriction should be added to ensure that the same
backing store is reattached, or at least to make this behavior more
explicit.

Overall, both patches are functional on pseries, but the above points
may be worth addressing.

Thanks,
--Nilay
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Matthieu Rolla @ 2026-04-16 12:33 UTC
To: Nilay Shroff
Cc: John Meneghini, Stefan Hajnoczi, Klaus Jensen, qemu-devel, qemu-block, kbusch, mr-083

Thanks, Nilay, for testing on pseries!

On the terminology: agreed. v4 of the namespace patch uses "out-of-band
namespace attach/detach" wording, as Klaus suggested.

On the backend concern: the drive_insert patch has been replaced by a
new series implementing a QMP blockdev-attach command (per Daniel's
feedback). The ability to attach a different backing file is
intentional; it allows simulating disk replacement where a failed drive
is swapped for a new one. The guest sees the same device name but with
fresh storage. This mirrors what happens on real hardware when you
replace a failed disk in the same slot.

The risk you describe (stale page cache / filesystem metadata) is
expected and handled at the guest level: the filesystem detects the
inconsistency and the storage stack (e.g. MinIO) heals the data via
erasure coding.

Link to v4 patch (series 1):
https://lists.nongnu.org/archive/html/qemu-devel/2026-04/msg02612.html
Link to new patch (series 2):
https://lists.nongnu.org/archive/html/qemu-devel/2026-04/msg02613.html

Thanks again for your time.

Matthieu
www.min.io
matthieu@min.io
* Re: [PATCH 0/2] NVMe namespace hotplug and drive reconnection support
From: Keith Busch @ 2026-04-14 14:42 UTC
To: Stefan Hajnoczi
Cc: Klaus Jensen, mr-083, qemu-devel, qemu-block, mr-083, John Meneghini

On Tue, Apr 14, 2026 at 08:42:21AM -0400, Stefan Hajnoczi wrote:
> On Mon, Apr 13, 2026 at 07:17:37PM +0200, Klaus Jensen wrote:
> > On Apr 9 08:01, mr-083 wrote:
> > > This series adds two features that together enable transparent NVMe disk
> > > hot-swap simulation in QEMU, matching the behavior of physical NVMe
> > > drives being pulled and reinserted in the same PCIe slot.
> > >
> >
> > I don't understand this. From an NVMe perspective you can't hotplug a
> > namespace. You can hotplug a PCIe-based NVM Subsystem.
>
> Hi Klaus,
> It would be great if someone with more NVMe experience than myself can
> find a definite answer, but I think the Namespace List can change
> asynchronously even on an NVMe PCIe controller as long as it supports
> Namespace Management commands.

I think there's some clash in terminology. From the nvme protocol side,
hotplug refers to bus events detected by the host, so something like
PCIe slot capabilities defines how that works. This series is doing
something behind the scenes from the host-controller interface
visibility, so it's just coincidence that the framework is also called
"hotplug".

From the nvme protocol perspective, this patch looks like a
qemu-specific out-of-band method for namespace "attach/detach" via the
QMP interface. Sounds fine to me: the nvme namespace events are not
strictly tied to the spec defined in-band attachment status.