All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS
@ 2022-12-28 19:41 Jonathan Derrick
  2022-12-28 19:41 ` [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns Jonathan Derrick
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jonathan Derrick @ 2022-12-28 19:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Kropaczek, qemu-block, Keith Busch, Klaus Jensen,
	Kevin Wolf, Hanna Reitz

From: Michael Kropaczek <michael.kropaczek@solidigm.com>

Description:

Currently namespaces could be configured as follows:
1. Legacy Namespace - just one namespace within Nvme controller's
   where the back-end was specified for nvme device by -drive parameter
   pointing directly to the image file.
2. Additional Namespaces - specified by nvme-ns devices each having its
   own back-end. To have multiple namespaces each needed to be specified
   at Qemu's command line being associated with the most recently defined
   nvme-bus from nvme device.
   If a such additional namespace should be attached and/or detached by the
   guest OS, nvme controller has to be linked with another device nvme-subsys.

All that have a static nature, all need to be specified at the Qemu's 
command line, all specified virtual nvme entities will be processed during
Qemu's start-up then created and provided to the guest OS.

To have a support for nvme create-ns and delete-ns commands with specified
parameters a different approach is needed.
Virtual devices representing namespaces need to be created and/or deleted 
during Qemu's running session, at anytime. The back-end image sizes for a
namespace must accommodate the payload size and size of metadata resulted
from specified parameters. The total capacity of the nvme controller
altogether with un-allocated capacity needs to be taken into account and
updated according to nvme create-ns and delete-ns commands respectively.

Here is the approach:
The nvme device will get new parameter:
 - auto-ns-path, which specifies the path to the storage area where back-end
   image and necessary config files located stored.

The virtual devices representing namespaces will be created dynamically during
the Qemu running session following issuance of nvme create-ns and delete-ns
commands from the guest OS. QOM classes and instances will be created utilizing
existing configuration scheme used during Qemu's start-up. Back-end image files
will be neither created nor deleted during Qemu's startup or running session.
Instead a set of back-end image files and relevant config will be created by
qemu-img tool with createns sub-command prior to Qemu's session.
Required parameters are: -S serial number which must match serial parameter of
qemu-system-xx -device nvme command line specification, -C total capacity, and
optional -N that will set a maximal limit on number of allowed
namespaces (default 256) which will be followed by path name pointing to
storage location corresponding to auto-ns-path of qemu-system-xx -device nvme
parameter.

Those created back-end image files will be pre-loaded during Qemu's start-up
and then during running Qemu's session will be associated or disassociated with
QOM namespaces virtual instances, as a result of issuing nvme create-ns or
delete-ns commands. The associated back-end image file for relevant namespace
will be re-sized as follows: delete-ns command will truncate image file to the
size of 0, whereas create-ns command will re-size the image file to the size
provided by nvme create-ns command parameters. Truncating/re-sizing is a result
of blk_truncate() API which utilizes co-routines and should not block Qemu main
thread while scheduling AIO operations. It is assured that all settings will
retain over Qemu's start-ups and shutdowns. The implementation makes it
possible to combine the existing "Additional Namespace" implementation with the
new "Managed Namespaces". Those will coexist with obvious restrictions, like
both will share the same NsIds space, "static" namespaces cannot be deleted or
if its NsId specified at Qemu's command line will conflicts with previously
created one by nvme create-ns (and retained), this will lead to an abort of
Qemu at its start up.

More than one of NVMe controllers associated with NVMe subsystem are supported.
This feature requires that parameters serial= and subsys= of additional
controllers must match those of the primary controller and auto-ns-path=
must not be specified.

Michael Kropaczek (2):
  hw/nvme: Support for Namespaces Management from guest OS - create-ns
  hw/nvme: Support for Namespaces Management from guest OS - delete-ns

 docs/system/devices/nvme.rst |  60 +++++-
 hw/nvme/cfg_key_checker.c    |  51 +++++
 hw/nvme/ctrl-cfg.c           | 224 +++++++++++++++++++++
 hw/nvme/ctrl.c               | 313 +++++++++++++++++++++++++++++-
 hw/nvme/meson.build          |   2 +-
 hw/nvme/ns-backend.c         | 288 +++++++++++++++++++++++++++
 hw/nvme/ns.c                 | 365 +++++++++++++++++++++++++++++++----
 hw/nvme/nvme.h               |  32 ++-
 hw/nvme/subsys.c             |  11 +-
 hw/nvme/trace-events         |   3 +
 include/block/nvme.h         |  31 +++
 include/hw/nvme/ctrl-cfg.h   |  24 +++
 include/hw/nvme/ns-cfg.h     |  28 +++
 include/hw/nvme/nvme-cfg.h   | 188 ++++++++++++++++++
 qemu-img-cmds.hx             |   6 +
 qemu-img.c                   | 132 +++++++++++++
 16 files changed, 1704 insertions(+), 54 deletions(-)
 create mode 100644 hw/nvme/cfg_key_checker.c
 create mode 100644 hw/nvme/ctrl-cfg.c
 create mode 100644 hw/nvme/ns-backend.c
 create mode 100644 include/hw/nvme/ctrl-cfg.h
 create mode 100644 include/hw/nvme/ns-cfg.h
 create mode 100644 include/hw/nvme/nvme-cfg.h

-- 
2.37.3



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns
  2022-12-28 19:41 [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Jonathan Derrick
@ 2022-12-28 19:41 ` Jonathan Derrick
  2022-12-28 20:10   ` Keith Busch
  2022-12-28 19:41 ` [PATCH v4 2/2] hw/nvme: Support for Namespaces Management from guest OS - delete-ns Jonathan Derrick
  2023-01-09 13:29 ` [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Kevin Wolf
  2 siblings, 1 reply; 6+ messages in thread
From: Jonathan Derrick @ 2022-12-28 19:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Kropaczek, qemu-block, Keith Busch, Klaus Jensen,
	Kevin Wolf, Hanna Reitz

From: Michael Kropaczek <michael.kropaczek@solidigm.com>

Added support for NVMEe NameSpaces Mangement allowing the guest OS to
create namespaces by issuing nvme create-ns command.
It is an extension to currently implemented Qemu nvme virtual device.
Virtual devices representing namespaces will be created and/or deleted
during Qemu's running session, at anytime.

Signed-off-by: Michael Kropaczek <michael.kropaczek@solidigm.com>
---
 docs/system/devices/nvme.rst |  59 ++++++-
 hw/nvme/cfg_key_checker.c    |  51 ++++++
 hw/nvme/ctrl-cfg.c           | 224 ++++++++++++++++++++++++++
 hw/nvme/ctrl.c               | 244 ++++++++++++++++++++++++++++-
 hw/nvme/meson.build          |   2 +-
 hw/nvme/ns-backend.c         | 283 +++++++++++++++++++++++++++++++++
 hw/nvme/ns.c                 | 293 ++++++++++++++++++++++++++++++-----
 hw/nvme/nvme.h               |  31 +++-
 hw/nvme/subsys.c             |  11 +-
 hw/nvme/trace-events         |   2 +
 include/block/nvme.h         |  30 ++++
 include/hw/nvme/ctrl-cfg.h   |  24 +++
 include/hw/nvme/ns-cfg.h     |  28 ++++
 include/hw/nvme/nvme-cfg.h   | 188 ++++++++++++++++++++++
 qemu-img-cmds.hx             |   6 +
 qemu-img.c                   | 132 ++++++++++++++++
 16 files changed, 1554 insertions(+), 54 deletions(-)
 create mode 100644 hw/nvme/cfg_key_checker.c
 create mode 100644 hw/nvme/ctrl-cfg.c
 create mode 100644 hw/nvme/ns-backend.c
 create mode 100644 include/hw/nvme/ctrl-cfg.h
 create mode 100644 include/hw/nvme/ns-cfg.h
 create mode 100644 include/hw/nvme/nvme-cfg.h

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 30f841ef62..6b3bee5e5d 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -92,6 +92,63 @@ There are a number of parameters available:
   attach the namespace to a specific ``nvme`` device (identified by an ``id``
   parameter on the controller device).
 
+Additional Namespaces managed by guest OS Namespaces Management
+---------------------------------------------------------------------
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl,serial=1234,subsys=nvme-subsys,auto-ns-path=path
+
+Parameters:
+
+``auto-ns-path=<path to nvme storage location>``
+  If specified indicates a support for dynamic management of nvme namespaces
+  by means of nvme create-ns command. This path points
+  to the storage area for backend images must exist. Additionally it requires
+  that parameter `ns-subsys` must be specified whereas parameter `drive`
+  must not. The legacy namespace backend is disabled, instead, a pair of
+  files 'nvme_<ctrl SN>_ns_<NS-ID>.cfg' and 'nvme_<ctrl SN>_ns_<NS-ID>.img'
+  will refer to respective namespaces. The create-ns, attach-ns
+  and detach-ns commands, issued at the guest side, will make changes to
+  those files accordingly.
+  For each namespace exists an image file in raw format and a config file
+  containing namespace parameters and state of the attachment allowing QEMU
+  to configure namespaces accordingly during start up. If for instance an
+  image file has a size of 0 bytes this will be interpreted as non existent
+  namespace. Issuing create-ns command will change the status in the config
+  files and and will re-size the image file accordingly so the image file
+  will be associated with the respective namespace. The main config file
+  nvme_<ctrl SN>_ctrl.cfg keeps the track of allocated capacity to the
+  namespaces within the nvme controller.
+  As it is the case of a typical hard drive, backend images together with
+  config files need to be created. For this reason the qemu-img tool has
+  been extended by adding createns command.
+
+   qemu-img createns {-S <serial> -C <total NVMe capacity>}
+                     [-N <NsId max>] {<path>}
+
+  Parameters:
+  -S and -C and <path> are mandatory, `-S` must match `serial` parameter
+  and <path> must match `auto-ns-path` parameter of "-device nvme,..."
+  specification.
+  -N is optional, if specified it will set a limit to the number of potential
+  namespaces and will reduce the number of backend images and config files
+  accordingly. As a default, a set of images of 0 bytes size and default
+  config files for 256 namespaces will be created, a total of 513 files.
+
+Please note that ``nvme-ns`` device is not required to support of dynamic
+namespaces management feature. It is not prohibited to assign a such device to
+``nvme`` device specified to support dynamic namespace management if one has
+an use case to do so, however, it will only coexist and be out of the scope of
+Namespaces Management. NsIds will be consistently managed, creation (create-ns)
+of a namespace will not allocate the NsId already being taken. If ``nvme-ns``
+device conflicts with previously created one by create-ns (the same NsId),
+it will break QEMU's start up.
+More than one of NVMe controllers associated with NVMe subsystem are supported.
+This feature requires that parameters ``serial=`` and ``subsys=`` of additional
+controllers must match those of the primary controller and ``auto-ns-path=``
+must not be specified.
+
 NVM Subsystems
 --------------
 
@@ -320,4 +377,4 @@ controller are:
 
 .. code-block:: console
 
-   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file
+   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
diff --git a/hw/nvme/cfg_key_checker.c b/hw/nvme/cfg_key_checker.c
new file mode 100644
index 0000000000..ed50117801
--- /dev/null
+++ b/hw/nvme/cfg_key_checker.c
@@ -0,0 +1,51 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *  Michael Kropaczek      <michael.kropaczek@solidigm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qbool.h"
+#include "qapi/error.h"
+#include "block/qdict.h"
+
+#include "nvme.h"
+
+/* Here is a need for wrapping of original Qemu dictionary retrieval
+ * APIs. In rare cases, when nvme cfg files were tampered with or the
+ * Qemu version was upgraded and a new key is expected to be existent,
+ * but is missing, it will cause segfault crash.
+ * Builtin assert statements are not covering sufficiently such cases
+ * and additionally a possibility of error handling is lacking */
+#define NVME_KEY_CHECK_ERROR_FMT "key[%s] is expected to be existent"
+int64_t qdict_get_int_chkd(const QDict *qdict, const char *key, Error **errp)
+{
+    QObject *qobject = qdict_get(qdict, key);
+    if (qobject) {
+        return qnum_get_int(qobject_to(QNum, qobject));
+    }
+
+    error_setg(errp, NVME_KEY_CHECK_ERROR_FMT, key);
+    return 0;
+}
+
+QList *qdict_get_qlist_chkd(const QDict *qdict, const char *key, Error **errp)
+{
+    QObject *qobject = qdict_get(qdict, key);
+    if (qobject) {
+        return qobject_to(QList, qobject);
+    }
+
+    error_setg(errp, NVME_KEY_CHECK_ERROR_FMT, key);
+    return NULL;
+}
diff --git a/hw/nvme/ctrl-cfg.c b/hw/nvme/ctrl-cfg.c
new file mode 100644
index 0000000000..0835a55932
--- /dev/null
+++ b/hw/nvme/ctrl-cfg.c
@@ -0,0 +1,224 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *  Michael Kropaczek      <michael.kropaczek@solidigm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qlist.h"
+#include "sysemu/block-backend.h"
+#include "block/qdict.h"
+#include "qemu/int128.h"
+#include "hw/nvme/nvme-cfg.h"
+
+#include "nvme.h"
+#include "trace.h"
+
+static char *nvme_create_cfg_name(NvmeCtrl *n, Error **errp)
+{
+    return c_create_cfg_name(n->params.ns_directory, n->params.serial, errp);
+}
+
+static NvmeIdCtrl *get_nvme_id_ctrl(NvmeCtrl *n)
+{
+    NvmeCtrl *ctrl;
+    NvmeSubsystem *subsys = n->subsys;
+
+    ctrl = nvme_subsys_ctrl(subsys, 0);
+    return ctrl ? &ctrl->id_ctrl : &n->id_ctrl;
+}
+
+int nvme_cfg_save(NvmeCtrl *n)
+{
+    NvmeIdCtrl *id = get_nvme_id_ctrl(n);
+    QDict *nvme_cfg = NULL;
+    Int128  tnvmcap128;
+    Int128  unvmcap128;
+
+    nvme_cfg = qdict_new();
+
+    memcpy(&tnvmcap128, id->tnvmcap, sizeof(tnvmcap128));
+    memcpy(&unvmcap128, id->unvmcap, sizeof(unvmcap128));
+
+#define CTRL_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(nvme_cfg, key, value);
+#include "hw/nvme/ctrl-cfg.h"
+#undef CTRL_CFG_DEF
+
+    return c_cfg_save(n->params.ns_directory, n->params.serial, nvme_cfg);
+}
+
+int nvme_cfg_update(NvmeCtrl *n, uint64_t amount, NvmeNsAllocAction action)
+{
+    int ret = 0;
+    NvmeCtrl *ctrl;
+    NvmeIdCtrl *id = get_nvme_id_ctrl(n);
+    NvmeSubsystem *subsys = n->subsys;
+    Int128  tnvmcap128;
+    Int128  unvmcap128;
+    Int128  amount128 = int128_make64(amount);
+    int i;
+
+    memcpy(&tnvmcap128, id->tnvmcap, sizeof(tnvmcap128));
+    memcpy(&unvmcap128, id->unvmcap, sizeof(unvmcap128));
+
+    switch (action) {
+    case NVME_NS_ALLOC_CHK:
+        if (int128_ge(unvmcap128, amount128)) {
+            return 0;   /* no update */
+        } else {
+            ret = -1;
+        }
+        break;
+    case NVME_NS_ALLOC:
+        if (int128_ge(unvmcap128, amount128)) {
+            unvmcap128 = int128_sub(unvmcap128, amount128);
+        } else {
+            ret = -1;
+        }
+        break;
+    case NVME_NS_DEALLOC:
+        unvmcap128 = int128_add(unvmcap128, amount128);
+        if (int128_ge(unvmcap128, tnvmcap128)) {
+            unvmcap128 = tnvmcap128;
+        }
+        break;
+    default:;
+    }
+
+    if (ret == 0) {
+        if (subsys) {
+            for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                ctrl = nvme_subsys_ctrl(subsys, i);
+                if (ctrl) {
+                    id = &ctrl->id_ctrl;
+                    memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap));
+                }
+            }
+        } else {
+            memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap));
+        }
+    }
+
+    return ret;
+}
+
+/* Note: id->tnvmcap and id->unvmcap are pointing to 16 bytes arrays,
+ *       but those are interpreted as 128bits int objects.
+ *       It is OK here to use Int128 because backend's namespace images cannot
+ *       exceed size of 64bit max value */
+static int nvme_cfg_validate(NvmeCtrl *n, uint64_t tnvmcap, uint64_t unvmcap,
+                             Error **errp)
+{
+    int ret = 0;
+    NvmeIdCtrl *id = &n->id_ctrl;
+    Int128  tnvmcap128;
+    Int128  unvmcap128;
+
+    if (unvmcap > tnvmcap) {
+        error_setg(errp, "nvme-cfg file is corrupted, free to allocate[%"PRIu64
+                   "] > total capacity[%"PRIu64"]",
+                   unvmcap, tnvmcap);
+        ret = -1;
+    } else if (tnvmcap == (uint64_t) 0) {
+        error_setg(errp, "nvme-cfg file error: total capacity cannot be zero");
+        ret = -1;
+    } else {
+        tnvmcap128 = int128_make64(tnvmcap);
+        unvmcap128 = int128_make64(unvmcap);
+        memcpy(id->tnvmcap, &tnvmcap128, sizeof(id->tnvmcap));
+        memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap));
+    }
+
+    return ret;
+}
+
+int nvme_cfg_load(NvmeCtrl *n)
+{
+    QObject *nvme_cfg_obj = NULL;
+    QDict *nvme_cfg = NULL;
+    NvmeIdCtrl *id_p = get_nvme_id_ctrl(n);
+    NvmeIdCtrl *id = &n->id_ctrl;
+    int ret = 0;
+    char *filename = NULL;
+    uint64_t tnvmcap;
+    uint64_t unvmcap;
+    FILE *fp = NULL;
+    char buf[NVME_CFG_MAXSIZE] = {};
+    Error *local_err = NULL;
+
+    if (n->cntlid) {    /* secondary controller */
+        memcpy(id->tnvmcap, id_p->tnvmcap, sizeof(id->tnvmcap));
+        memcpy(id->unvmcap, id_p->unvmcap, sizeof(id->unvmcap));
+        goto fail2;
+    }
+
+    filename = nvme_create_cfg_name(n, &local_err);
+    if (local_err) {
+        goto fail2;
+    }
+
+    if (access(filename, F_OK)) {
+        error_setg(&local_err, "Missing nvme-cfg file");
+        goto fail2;
+    }
+
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        error_setg(&local_err, "open %s: %s", filename,
+                     strerror(errno));
+        goto fail2;
+    }
+
+    if (fread(buf,  sizeof(buf), 1, fp)) {
+        error_setg(&local_err, "Could not read nvme-cfg");
+        goto fail1;
+    }
+
+    nvme_cfg_obj = qobject_from_json(buf, NULL);
+    if (!nvme_cfg_obj) {
+        error_setg(&local_err, "Could not parse the JSON for nvme-cfg");
+        goto fail1;
+    }
+
+    nvme_cfg = qobject_to(QDict, nvme_cfg_obj);
+    qdict_flatten(nvme_cfg);
+
+    tnvmcap = qdict_get_int_chkd(nvme_cfg, "tnvmcap", &local_err);
+    if (local_err) {
+        goto fail1;
+    }
+
+    unvmcap = qdict_get_int_chkd(nvme_cfg, "unvmcap", &local_err);
+    if (local_err) {
+        goto fail1;
+    }
+
+    nvme_cfg_validate(n, tnvmcap, unvmcap, &local_err);
+
+fail1:
+    fclose(fp);
+
+fail2:
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    qobject_unref(nvme_cfg_obj);
+    g_free(filename);
+
+    return ret;
+}
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 4a0c51a947..6155c2013e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -40,7 +40,9 @@
  *              sriov_vi_flexible=<N[optional]> \
  *              sriov_max_vi_per_vf=<N[optional]> \
  *              sriov_max_vq_per_vf=<N[optional]> \
- *              subsys=<subsys_id>
+ *              subsys=<subsys_id>, \
+ *              auto-ns-path=<path to ns storage[optional]>
+ *
  *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
  *              zoned=<true|false[optional]>, \
  *              subsys=<subsys_id>,detached=<true|false[optional]>
@@ -140,6 +142,65 @@
  *   a secondary controller. The default 0 resolves to
  *   `(sriov_vq_flexible / sriov_max_vfs)`.
  *
+ * - `auto-ns-path`
+ *   If specified indicates a support for dynamic management of nvme namespaces
+ *   by means of nvme create-ns command. This path pointing
+ *   to a storage area for backend images must exist. Additionally it requires
+ *   that parameter `ns-subsys` must be specified whereas parameter `drive`
+ *   must not. The legacy namespace backend is disabled, instead, a pair of
+ *   files 'nvme_<ctrl SN>_ns_<NS-ID>.cfg' and 'nvme_<ctrl SN>_ns_<NS-ID>.img'
+ *   will refer to respective namespaces. The create-ns, attach-ns
+ *   and detach-ns commands, issued at the guest side, will make changes to
+ *   those files accordingly.
+ *   For each namespace exists an image file in raw format and a config file
+ *   containing namespace parameters and a state of the attachment allowing QEMU
+ *   to configure namespace during its start up accordingly. If for instance an
+ *   image file has a size of 0 bytes, this will be interpreted as non existent
+ *   namespace. Issuing create-ns command will change the status in the config
+ *   files and and will re-size the image file accordingly so the image file
+ *   will be associated with the respective namespace. The main config file
+ *   nvme_<ctrl SN>_ctrl.cfg keeps the track of allocated capacity to the
+ *   namespaces within the nvme controller.
+ *   As it is the case of a typical hard drive, backend images together with
+ *   config files need to be created. For this reason the qemu-img tool has
+ *   been extended by adding createns command.
+ *
+ *    qemu-img createns {-S <serial> -C <total NVMe capacity>}
+ *                      [-N <NsId max>] {<path>}
+ *
+ *   Parameters:
+ *   -S and -C and <path> are mandatory, `-S` must match `serial` parameter
+ *   and <path> must match `auto-ns-path` parameter of "-device nvme,..."
+ *   specification.
+ *   -N is optional, if specified, it will set a limit to the number of potential
+ *   namespaces and will reduce the number of backend images and config files
+ *   accordingly. As a default, a set of images of 0 bytes size and default
+ *   config files for 256 namespaces will be created, a total of 513 files.
+ *
+ *   Note 1:
+ *         If the main "-drive" is not specified with 'if=virtio', then SeaBIOS
+ *         must be built with disabled "Parallelize hardware init" to allow
+ *         a proper boot. Without it, it is probable that non deterministic
+ *         order of collecting of potential block devices for a boot will not
+ *         catch that one with guest OS. Deterministic order however will fill
+ *         up the list of potential boot devices starting with a typical ATA
+ *         devices usually containing guest OS.
+ *         SeaBIOS has a limited space to store all potential boot block devices
+ *         if there are more than 11 namespaces. (other types require less
+ *         memory so the number of 11 does not apply universally)
+ *         (above Note refers to SeaBIOS rel-1.16.0)
+ *   Note 2:
+ *         If the main "-drive" referring to guest OS is specified with
+ *         'if=virtio', then there is no need to build SeaBIOS with disabled
+ *          "Parallelize hardware init".
+ *         Block boot device 'Virtio disk PCI:xx:xx.x" will appear as a first
+ *         listed instead of an ATA device.
+ *   Note 3:
+ *         More than one of NVMe controllers associated with NVMe subsystem are
+ *         supported. This feature requires that parameters 'serial=' and
+ *         'subsys=' of additional controllers must match those of the primary
+ *         controller and 'auto-ns-path=' must not be specified.
+ *
  * nvme namespace device parameters
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * - `shared`
@@ -262,6 +323,7 @@ static const uint32_t nvme_cse_acs[256] = {
     [NVME_ADM_CMD_SET_FEATURES]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_GET_FEATURES]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_ASYNC_EV_REQ]     = NVME_CMD_EFF_CSUPP,
+    [NVME_ADM_CMD_NS_MGMT]          = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
     [NVME_ADM_CMD_NS_ATTACHMENT]    = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
     [NVME_ADM_CMD_VIRT_MNGMT]       = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_DBBUF_CONFIG]     = NVME_CMD_EFF_CSUPP,
@@ -5577,6 +5639,133 @@ static void nvme_select_iocs_ns(NvmeCtrl *n, NvmeNamespace *ns)
     }
 }
 
+static NvmeNamespace *nvme_ns_mgmt_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp)
+{
+    NvmeNamespace *ns = NULL;
+    Error *local_err = NULL;
+
+    if (!n->params.ns_directory) {
+        error_setg(&local_err, "create-ns not supported if 'auto-ns-path' is not specified");
+    } else if (n->namespace.blkconf.blk) {
+        error_setg(&local_err, "create-ns not supported if 'drive' is specified");
+    } else {
+        ns = nvme_ns_create(n, nsid, id_ns, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ns = NULL;
+    }
+
+    return ns;
+}
+
+static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
+{
+    NvmeCtrl *n_p = NULL;     /* primary controller */
+    NvmeIdCtrl *id = &n->id_ctrl;
+    NvmeNamespace *ns;
+    NvmeIdNsMgmt id_ns = {};
+    uint8_t flags = req->cmd.flags;
+    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+    uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
+    uint8_t sel = dw10 & 0xf;
+    uint8_t csi = (dw11 >> 24) & 0xf;
+    uint16_t i;
+    uint16_t ret;
+    Error *local_err = NULL;
+
+    trace_pci_nvme_ns_mgmt(nvme_cid(req), nsid, sel, csi, NVME_CMD_FLAGS_PSDT(flags));
+
+    if (!(le16_to_cpu(id->oacs) & NVME_OACS_NS_MGMT)) {
+        return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR;
+    }
+
+    if (n->cntlid && !n->subsys) {
+        error_setg(&local_err, "Secondary controller without subsystem");
+        return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR;
+    }
+
+    n_p = n->subsys->ctrls[0];
+
+    switch (sel) {
+    case NVME_NS_MANAGEMENT_CREATE:
+        switch (csi) {
+        case NVME_CSI_NVM:
+            if (nsid) {
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            ret = nvme_h2c(n, (uint8_t *)&id_ns, sizeof(id_ns), req);
+            if (ret) {
+                return ret;
+            }
+
+            uint64_t nsze = le64_to_cpu(id_ns.nsze);
+            uint64_t ncap = le64_to_cpu(id_ns.ncap);
+
+            if (ncap > nsze) {
+                return NVME_INVALID_FIELD | NVME_DNR;
+            } else if (ncap != nsze) {
+                return NVME_THIN_PROVISION_NOTSPRD | NVME_DNR;
+            }
+
+            nvme_validate_flbas(id_ns.flbas, &local_err);
+            if (local_err) {
+                error_report_err(local_err);
+                return NVME_INVALID_FORMAT | NVME_DNR;
+            }
+
+            for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+                if (nvme_ns(n_p, (uint32_t)i) || nvme_subsys_ns(n_p->subsys, (uint32_t)i)) {
+                    continue;
+                }
+                break;
+            }
+
+
+            if (i >  le32_to_cpu(n_p->id_ctrl.nn) || i >  NVME_MAX_NAMESPACES) {
+               return NVME_NS_IDNTIFIER_UNAVAIL | NVME_DNR;
+            }
+            nsid = i;
+
+            /* create ns here */
+            ns = nvme_ns_mgmt_create(n, nsid, &id_ns, &local_err);
+            if (!ns || local_err) {
+                if (local_err) {
+                    error_report_err(local_err);
+                }
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) {
+                /* place for delete-ns */
+                error_setg(&local_err, "Insufficient capacity, an orphaned ns[%"PRIu32"] will be left behind", nsid);
+                return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR;
+            }
+            (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC);
+            if (nvme_cfg_save(n)) {
+                (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC);
+                /* place for delete-ns */
+                error_setg(&local_err, "Cannot save conf file, an orphaned ns[%"PRIu32"] will be left behind", nsid);
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+            req->cqe.result = cpu_to_le32(nsid);
+            break;
+        case NVME_CSI_ZONED:
+            /* fall through for now */
+        default:
+            return NVME_INVALID_FIELD | NVME_DNR;
+	    }
+        break;
+    default:
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    return NVME_SUCCESS;
+}
+
 static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeNamespace *ns;
@@ -5589,6 +5778,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
     uint16_t *ids = &list[1];
     uint16_t ret;
     int i;
+    Error *local_err;
 
     trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf);
 
@@ -5627,6 +5817,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
                 return NVME_NS_PRIVATE | NVME_DNR;
             }
 
+            ns->params.detached = false;
+
             nvme_attach_ns(ctrl, ns);
             nvme_select_iocs_ns(ctrl, ns);
 
@@ -5639,6 +5831,11 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 
             ctrl->namespaces[nsid] = NULL;
             ns->attached--;
+            if (ns->attached) {
+                ns->params.detached = false;
+            } else {
+                ns->params.detached = true;
+            }
 
             nvme_update_dmrsl(ctrl);
 
@@ -5659,6 +5856,12 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
         }
     }
 
+    if (ns_cfg_save(n, ns, nsid) == -1) {           /* save ns cfg */
+        error_setg(&local_err, "Unable to save ns-cnf");
+        error_report_err(local_err);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
     return NVME_SUCCESS;
 }
 
@@ -6124,6 +6327,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
         return nvme_get_feature(n, req);
     case NVME_ADM_CMD_ASYNC_EV_REQ:
         return nvme_aer(n, req);
+    case NVME_ADM_CMD_NS_MGMT:
+        return nvme_ns_mgmt(n, req);
     case NVME_ADM_CMD_NS_ATTACHMENT:
         return nvme_ns_attachment(n, req);
     case NVME_ADM_CMD_VIRT_MNGMT:
@@ -6966,7 +7171,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
         params->max_ioqpairs = params->num_queues - 1;
     }
 
-    if (n->namespace.blkconf.blk && n->subsys) {
+    if (n->namespace.blkconf.blk && n->subsys && !params->ns_directory) {
         error_setg(errp, "subsystem support is unavailable with legacy "
                    "namespace ('drive' property)");
         return;
@@ -7480,12 +7685,14 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
                             BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
 }
 
+static void nvme_add_bootindex(Object *obj);
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
     NvmeCtrl *n = NVME(pci_dev);
     NvmeNamespace *ns;
     Error *local_err = NULL;
     NvmeCtrl *pn = NVME(pcie_sriov_get_pf(pci_dev));
+    NvmeCtrl *ctrl = NULL;
 
     if (pci_is_vf(pci_dev)) {
         /*
@@ -7505,6 +7712,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS,
               &pci_dev->qdev, n->parent_obj.qdev.id);
 
+    ctrl = nvme_subsys_ctrl(n->subsys, 0);
+
+    /* check if secondary controller, if so take over the ns_directory */
+    if (ctrl && ctrl->params.ns_directory && !n->params.ns_directory) {
+        n->params.ns_directory = g_strdup(ctrl->params.ns_directory);
+    }
+
+    nvme_add_bootindex(OBJECT(n));
+
     if (nvme_init_subsys(n, errp)) {
         error_propagate(errp, local_err);
         return;
@@ -7516,7 +7732,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     nvme_init_ctrl(n, pci_dev);
 
     /* setup a namespace if the controller drive property was given */
-    if (n->namespace.blkconf.blk) {
+    if (n->namespace.blkconf.blk && !n->params.ns_directory) {
         ns = &n->namespace;
         ns->params.nsid = 1;
 
@@ -7525,6 +7741,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         }
 
         nvme_attach_ns(n, ns);
+    } else if (!n->namespace.blkconf.blk && n->params.ns_directory) {
+        if (nvme_cfg_load(n)) {
+            error_setg(errp, "Could not process nvme-cfg");
+            return;
+        }
+        if (nvme_ns_backend_setup(n, errp)) {
+            return;
+        }
     }
 }
 
@@ -7569,6 +7793,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 
 static Property nvme_props[] = {
     DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
+    DEFINE_PROP_STRING("auto-ns-path", NvmeCtrl,params.ns_directory),
     DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
                      HostMemoryBackend *),
     DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
@@ -7706,14 +7931,19 @@ static void nvme_class_init(ObjectClass *oc, void *data)
     dc->reset = nvme_pci_reset;
 }
 
-static void nvme_instance_init(Object *obj)
+static void nvme_add_bootindex(Object *obj)
 {
     NvmeCtrl *n = NVME(obj);
 
-    device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
-                                  "bootindex", "/namespace@1,0",
-                                  DEVICE(obj));
+    if (!n->params.ns_directory) {
+        device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex,
+                                      "bootindex", "/namespace@1,0",
+                                      DEVICE(obj));
+    }
+}
 
+static void nvme_instance_init(Object *obj)
+{
     object_property_add(obj, "smart_critical_warning", "uint8",
                         nvme_get_smart_warning,
                         nvme_set_smart_warning, NULL, NULL);
diff --git a/hw/nvme/meson.build b/hw/nvme/meson.build
index 3cf40046ee..8900831701 100644
--- a/hw/nvme/meson.build
+++ b/hw/nvme/meson.build
@@ -1 +1 @@
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', 'ns.c', 'subsys.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', 'ns.c', 'subsys.c', 'ns-backend.c', 'cfg_key_checker.c', 'ctrl-cfg.c'))
diff --git a/hw/nvme/ns-backend.c b/hw/nvme/ns-backend.c
new file mode 100644
index 0000000000..06de8f262c
--- /dev/null
+++ b/hw/nvme/ns-backend.c
@@ -0,0 +1,283 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *  Michael Kropaczek      <michael.kropaczek@solidigm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qlist.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
+#include "block/qdict.h"
+#include "hw/nvme/nvme-cfg.h"
+
+#include "nvme.h"
+#include "trace.h"
+
+/* caller will take ownership */
+static QDict *ns_get_bs_default_opts(bool read_only)
+{
+    QDict *bs_opts = qdict_new();
+
+    qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_DIRECT, "off");
+    qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_NO_FLUSH, "off");
+    qdict_set_default_str(bs_opts, BDRV_OPT_READ_ONLY,
+                          read_only ? "on" : "off");
+    qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on");
+    qdict_set_default_str(bs_opts, "driver", "raw");
+
+    return bs_opts;
+}
+
+BlockBackend *ns_blockdev_init(const char *file, Error **errp)
+{
+    BlockBackend *blk = NULL;
+    bool read_only = false;
+    QDict *bs_opts;
+
+    if (access(file, F_OK)) {
+        error_setg(errp, "%s not found, please create one", file);
+    } else {
+        bs_opts = ns_get_bs_default_opts(read_only);
+        blk = blk_new_open(file, NULL, bs_opts, BDRV_O_RDWR | BDRV_O_RESIZE, errp);
+    }
+
+    return blk;
+}
+
+void ns_blockdev_activate(BlockBackend *blk,  uint64_t image_size, Error **errp)
+{
+    int ret;
+
+    ret = blk_set_perm(blk, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_WRITE_UNCHANGED,  errp);
+    if (ret < 0) {
+        return;
+    }
+    ret = blk_truncate(blk, image_size, false, PREALLOC_MODE_OFF, 0,
+                       errp);
+}
+
+int ns_storage_path_check(NvmeCtrl *n, Error **errp)
+{
+    return storage_path_check(n->params.ns_directory,  n->params.serial, errp);
+}
+
+/* caller will take ownership */
+char *ns_create_image_name(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    return create_image_name(n->params.ns_directory, n->params.serial, nsid, errp);
+}
+
+static char *ns_create_cfg_name(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    return create_cfg_name(n->params.ns_directory, n->params.serial, nsid, errp);
+}
+
+int ns_auto_check(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    int ret = 0;
+    BlockBackend *blk = ns->blkconf.blk;
+    char *file_name_img = NULL;
+
+    if (!blk) {
+        return 0;
+    }
+
+    file_name_img = ns_create_image_name(n, nsid, NULL);
+
+    if (!file_name_img || strcmp(blk_bs(blk)->filename, file_name_img)) {
+        ret = -1;
+    }
+
+    g_free(file_name_img);
+
+    return ret;
+}
+
+void ns_cfg_clear(NvmeNamespace *ns)
+{
+    ns->params.pi = 0;
+    ns->lbasz = 0;
+    ns->id_ns.nsze = 0;
+    ns->id_ns.ncap = 0;
+    ns->id_ns.nuse = 0;
+    ns->id_ns.nsfeat = 0;
+    ns->id_ns.flbas = 0;
+    ns->id_ns.nmic= 0;
+    ns->size = 0;
+}
+
+int ns_cfg_save(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    NvmeSubsystem *subsys = n->subsys;
+    QDict *ns_cfg = NULL;
+    QList *ctrl_qlist = NULL;
+    Error *local_err = NULL;
+    int i;
+
+    if (ns_auto_check(n, ns, nsid)) {
+        error_setg(&local_err, "ns-cfg not saved: ns[%"PRIu32"] configured via '-device nvme-ns'", nsid);
+        error_report_err(local_err);
+        return 1;       /* not an error */
+    }
+
+    ctrl_qlist = qlist_new();
+    ns_cfg = qdict_new();
+
+    if (subsys) {
+        for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+            NvmeCtrl *ctrl = subsys->ctrls[i];
+
+            if (ctrl && nvme_ns(ctrl, nsid)) {
+                qlist_append_int(ctrl_qlist, i);
+            }
+        }
+    }
+
+#define NS_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(ns_cfg, key, value);
+#include "hw/nvme/ns-cfg.h"
+#undef NS_CFG_DEF
+
+    return nsid_cfg_save(n->params.ns_directory, n->params.serial, ns_cfg, nsid);
+}
+
+static bool glist_exists_int(QList *qlist, int64_t value)
+{
+    QListEntry *entry;
+
+    QLIST_FOREACH_ENTRY(qlist, entry) {
+        if (qnum_get_int(qobject_to(QNum, entry->value)) == value) {
+            return true;
+        }
+    }
+    return false;
+}
+
+
+int ns_cfg_load(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    QObject *ns_cfg_obj = NULL;
+    QDict *ns_cfg = NULL;
+    QList *ctrl_qlist = NULL;
+    int ret = 0;
+    char *filename = NULL;
+    FILE *fp = NULL;
+    char buf[NS_CFG_MAXSIZE] = {};
+    Error *local_err = NULL;
+
+    if (ns_auto_check(n, ns, nsid)) {
+        error_setg(&local_err, "ns-cfg not loaded: ns[%"PRIu32"] configured via '-device nvme-ns'", nsid);
+        ret = 1;    /* not an error */
+        goto fail2;
+    }
+
+    filename = ns_create_cfg_name(n, nsid, &local_err);
+    if (local_err) {
+        goto fail2;
+    }
+
+    if (access(filename, F_OK)) {
+        error_setg(&local_err, "Missing ns-cfg file");
+        goto fail2;
+    }
+
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        error_setg(&local_err, "open %s: %s", filename,
+                     strerror(errno));
+        goto fail2;
+    }
+
+    if (fread(buf,  sizeof(buf), 1, fp)) {
+        error_setg(&local_err, "Could not read ns-cfg");
+        goto fail1;
+    }
+
+    ns_cfg_obj = qobject_from_json(buf, NULL);
+    if (!ns_cfg_obj) {
+        error_setg(&local_err, "Could not parse the JSON for ns-cfg");
+        goto fail1;
+    }
+
+    ns_cfg = qobject_to(QDict, ns_cfg_obj);
+
+    ns->params.nsid = (uint32_t)qdict_get_int_chkd(ns_cfg, "params.nsid", &local_err);  /* (uint32_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ctrl_qlist = qdict_get_qlist_chkd(ns_cfg, "attached_ctrls", &local_err);            /* (QList) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->params.detached = !glist_exists_int(ctrl_qlist, n->cntlid);
+    ns->params.pi = (uint8_t)qdict_get_int_chkd(ns_cfg, "params.pi", &local_err);       /* (uint8_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->lbasz = (size_t)qdict_get_int_chkd(ns_cfg, "lbasz", &local_err);                /* (size_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.nsze = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.nsze", &local_err)); /* (uint64_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.ncap = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.ncap", &local_err)); /* (uint64_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.nuse = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.nuse", &local_err)); /* (uint64_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.nsfeat = (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.nsfeat", &local_err); /* (uint8_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.flbas = (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.flbas", &local_err);   /* (uint8_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.nmic = (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.nmic", &local_err);     /* (uint8_t) */
+    if (local_err) {
+        goto fail1;
+    }
+
+    /* ns->size below will be overwritten after nvme_ns_backend_sanity_chk() */
+    ns->size = qdict_get_int_chkd(ns_cfg, "ns_size", &local_err);                       /* (uint64_t) */
+    if (local_err) {
+        goto fail1;
+    }
+
+    /* it is expected that ns-cfg file is consistent with paired ns-img file
+     * here is a simple check preventing against a crash */
+    nvme_validate_flbas(ns->id_ns.flbas, &local_err);
+
+fail1:
+    fclose(fp);
+
+fail2:
+    if (local_err) {
+        error_report_err(local_err);
+        ret = !ret ? -1: ret;
+    }
+
+    qobject_unref(ns_cfg_obj);
+    g_free(filename);
+    return ret;
+}
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..6e518f97be 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -3,9 +3,11 @@
  *
  * Copyright (c) 2019 CNEX Labs
  * Copyright (c) 2020 Samsung Electronics
+ * Copyright (c) 2022 Solidigm
  *
  * Authors:
  *  Klaus Jensen      <k.jensen@samsung.com>
+ *  Michael Kropaczek <michael.kropaczek@solidigm.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2. See the
  * COPYING file in the top-level directory.
@@ -55,6 +57,26 @@ void nvme_ns_init_format(NvmeNamespace *ns)
     id_ns->npda = id_ns->npdg = npdg - 1;
 }
 
+#define NVME_LBAF_DFLT_CNT 8
+#define NVME_LBAF_DFLT_SIZE 16
+static unsigned int ns_get_default_lbafs(void *lbafp)
+{
+    static const NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE] = {
+        [0] = { .ds =  9           },
+        [1] = { .ds =  9, .ms =  8 },
+        [2] = { .ds =  9, .ms = 16 },
+        [3] = { .ds =  9, .ms = 64 },
+        [4] = { .ds = 12           },
+        [5] = { .ds = 12, .ms =  8 },
+        [6] = { .ds = 12, .ms = 16 },
+        [7] = { .ds = 12, .ms = 64 },
+    };
+
+    memcpy(lbafp, &lbaf[0], sizeof(lbaf));
+
+    return NVME_LBAF_DFLT_CNT;
+}
+
 static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 {
     static uint64_t ns_count;
@@ -64,6 +86,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     uint16_t ms;
     int i;
 
+    ms = ns->params.ms;
+    if (ms && NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)) {
+        return -1;
+    }
+
     ns->csi = NVME_CSI_NVM;
     ns->status = 0x0;
 
@@ -89,7 +116,6 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     id_ns->eui64 = cpu_to_be64(ns->params.eui64);
 
     ds = 31 - clz32(ns->blkconf.logical_block_size);
-    ms = ns->params.ms;
 
     id_ns->mc = NVME_ID_NS_MC_EXTENDED | NVME_ID_NS_MC_SEPARATE;
 
@@ -105,39 +131,25 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
     ns->pif = ns->params.pif;
 
-    static const NvmeLBAF lbaf[16] = {
-        [0] = { .ds =  9           },
-        [1] = { .ds =  9, .ms =  8 },
-        [2] = { .ds =  9, .ms = 16 },
-        [3] = { .ds =  9, .ms = 64 },
-        [4] = { .ds = 12           },
-        [5] = { .ds = 12, .ms =  8 },
-        [6] = { .ds = 12, .ms = 16 },
-        [7] = { .ds = 12, .ms = 64 },
-    };
-
-    ns->nlbaf = 8;
+    ns->nlbaf = ns_get_default_lbafs(&id_ns->lbaf);
 
-    memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-
-    for (i = 0; i < ns->nlbaf; i++) {
-        NvmeLBAF *lbaf = &id_ns->lbaf[i];
-        if (lbaf->ds == ds) {
-            if (lbaf->ms == ms) {
-                id_ns->flbas |= i;
-                goto lbaf_found;
+    if (ms) { /* ms from params */
+        for (i = 0; i < ns->nlbaf; i++) {
+            NvmeLBAF *lbaf = &id_ns->lbaf[i];
+            if (lbaf->ds == ds && lbaf->ms == ms) {
+                    id_ns->flbas |= i;
+                    goto lbaf_found;
             }
         }
+        /* add non-standard lba format */
+        id_ns->lbaf[ns->nlbaf].ds = ds;
+        id_ns->lbaf[ns->nlbaf].ms = ms;
+        ns->nlbaf++;
+        id_ns->flbas |= i;
+    } else {
+        i = NVME_ID_NS_FLBAS_INDEX(id_ns->flbas);
     }
 
-    /* add non-standard lba format */
-    id_ns->lbaf[ns->nlbaf].ds = ds;
-    id_ns->lbaf[ns->nlbaf].ms = ms;
-    ns->nlbaf++;
-
-    id_ns->flbas |= i;
-
-
 lbaf_found:
     id_ns_nvm->elbaf[i] = (ns->pif & 0x3) << 7;
     id_ns->nlbaf = ns->nlbaf - 1;
@@ -482,6 +494,120 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
+static void  nvme_ns_backend_sanity_chk(NvmeNamespace *ns, BlockBackend *blk, Error **errp)
+{
+    uint64_t ns_size_img = ns->size;
+    uint64_t ns_size_cfg = blk_getlength(blk);
+
+    if (ns_size_cfg != ns_size_img) {
+        error_setg(errp, "ns-backend sanity check for nsid [%"PRIu32"] failed", ns->params.nsid);
+    }
+}
+
+void nvme_validate_flbas(uint8_t flbas,  Error **errp)
+{
+    uint8_t nlbaf;
+    NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE];
+
+    nlbaf = ns_get_default_lbafs(&lbaf[0]);
+    flbas = NVME_ID_NS_FLBAS_INDEX(flbas);
+    if (flbas >= nlbaf) {
+        error_setg(errp, "FLBA size index is out of range, max supported [%"PRIu8"]", nlbaf - 1);
+    }
+}
+
+NvmeNamespace * nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp)
+{
+    NvmeCtrl *ctrl = NULL;     /* primary controller */
+    NvmeSubsystem *subsys = n->subsys;
+    NvmeNamespace *ns = NULL;
+    DeviceState *dev = NULL;
+    uint64_t nsze = le64_to_cpu(id_ns->nsze);
+    uint64_t ncap = le64_to_cpu(id_ns->ncap);
+    uint8_t flbas = id_ns->flbas;
+    uint8_t dps = id_ns->dps;
+    uint8_t nmic = id_ns->nmic;
+    NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE];
+    size_t lbasz;
+    uint64_t image_size;
+    Error *local_err = NULL;
+    BlockBackend *blk = NULL;
+
+    trace_pci_nvme_ns_create(nsid, nsze, ncap, flbas);
+
+    flbas = NVME_ID_NS_FLBAS_INDEX(flbas);
+
+    ns_get_default_lbafs(&lbaf[0]);
+    lbasz = 1 << lbaf[flbas].ds;
+    image_size = (lbasz + lbaf[flbas].ms) * nsze;
+
+    dev = qdev_try_new(TYPE_NVME_NS);
+    if (!dev) {
+        error_setg(&local_err, "Unable to allocate ns QOM (dev)");
+        goto fail;
+    }
+
+    if (n->cntlid > 0 && !subsys) {
+        error_setg(&local_err, "Secondary controller without subsystem");
+        goto fail2;
+    }
+
+    ctrl = nvme_subsys_ctrl(subsys, 0);
+
+    if (!ctrl) {
+        error_setg(&local_err, "Missing reference to primary controller");
+        goto fail2;
+    }
+
+    ns = NVME_NS(dev);
+    if (ns) {
+        ns->params.nsid = nsid;
+        ns->params.detached = true;
+        ns->params.pi = dps;
+        ns->id_ns.nsfeat = 0x0;     /* reporting no support for THINP */
+        ns->lbasz = lbasz;
+        ns->id_ns.flbas = id_ns->flbas;
+        ns->id_ns.nsze = cpu_to_le64(nsze);
+        ns->id_ns.ncap = cpu_to_le64(ncap);
+        ns->id_ns.nuse = cpu_to_le64(ncap);     /* at this time no usage recording */
+        ns->id_ns.nmic = nmic;
+
+        blk = ctrl->preloaded_blk[nsid];
+        if (blk) {
+            ns_blockdev_activate(blk, image_size, &local_err);
+            if (local_err) {
+                goto fail2;
+            }
+            ns->blkconf.blk = blk;
+            qdev_realize_and_unref(dev, &n->bus.parent_bus, &local_err);    /* causes by extension
+                                                                             * a call to
+                                                                             * nvme_ns_realize() */
+            if (local_err) {
+                goto fail2;
+            }
+        } else {
+            error_setg(&local_err, "Unable to find preloaded back-end reference");
+            goto fail2;
+        }
+
+        if (ns_cfg_save(n, ns, nsid)) {               /* save ns cfg */
+            error_setg(&local_err, "Unable to save ns-cnf, an orphaned ns[%"PRIu32"] will be left behind", nsid);
+            goto fail;
+        }
+    }
+    return ns;
+
+fail2:
+    if (blk) {
+        ctrl->preloaded_blk[nsid] = NULL;
+    }
+    object_unref(OBJECT(dev));
+
+fail:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
@@ -505,6 +631,87 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
+static void nvme_ns_add_bootindex(Object *obj);
+int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp)
+{
+    DeviceState *dev = NULL;
+    NvmeCtrl *ctrl = NULL;
+    NvmeSubsystem *subsys = n->subsys;
+    BlockBackend *blk;
+    NvmeNamespace *ns;
+    NvmeNamespace *ns_p = NULL;
+    uint16_t i;
+    int ret = 0;
+    char *exact_filename;
+    Error *local_err = NULL;
+
+    if (n->cntlid > 0) {    /* secondary controller */
+        ctrl = nvme_subsys_ctrl(subsys, 0);
+    }
+
+    for (i = 1; i <= NVME_MAX_NAMESPACES && !local_err; i++ ) {
+        blk = NULL;
+        exact_filename = ns_create_image_name(n, i, &local_err);
+        if (local_err) {
+            break;
+        }
+        if (access(exact_filename, F_OK)) { /* skip if not found */
+            g_free(exact_filename);
+            continue;
+        }
+
+        if (ctrl) {    /* reference from subsys to primary controller */
+            ns_p = nvme_subsys_ns(subsys, i);
+            if (!ns_p) {
+                continue;
+            }
+            blk = ctrl->preloaded_blk[i];
+        } else {
+            blk = ns_blockdev_init(exact_filename, &local_err);
+        }
+
+        g_free(exact_filename);
+
+        if (blk_getlength(blk)) {       /* namespace was created in a previous QEMU session */
+            dev = qdev_try_new(TYPE_NVME_NS);
+            if (!dev) {
+                error_setg(&local_err, "Unable to create a new device entry");
+                break;
+            }
+
+            ns = NVME_NS(dev);
+            if (ns) {
+                if (ns_cfg_load(n, ns, i) == -1) {     /* load ns cfg */
+                    error_setg(&local_err, "Unable to load ns-cfg for ns [%"PRIu16"]", i);
+                    break;
+                }
+                nvme_ns_backend_sanity_chk(ns, blk, &local_err);
+                if (local_err) {
+                    break;
+                }
+                ns->blkconf.blk = blk;
+                qdev_realize_and_unref(dev, &n->bus.parent_bus, &local_err);    /* causes by extension
+                                                                                 * a call to
+                                                                                 * nvme_ns_realize() */
+            }
+        }
+
+        if (!ctrl) {    /* if primary controller */
+            n->preloaded_blk[i] = blk;
+            n->id_ctrl.nn = cpu_to_le32(i);
+        }
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -1;
+    } else if (ctrl) {
+        n->id_ctrl.nn = ctrl->id_ctrl.nn;
+    }
+
+    return ret;
+}
+
 void nvme_ns_drain(NvmeNamespace *ns)
 {
     blk_drain(ns->blkconf.blk);
@@ -563,11 +770,13 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
         }
     }
 
+    nvme_ns_add_bootindex(OBJECT(dev));
+
     if (nvme_ns_setup(ns, errp)) {
         return;
     }
 
-    if (!nsid) {
+    if (!nsid) {    /* legacy implementation without params.nsid specified */
         for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
             if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
                 continue;
@@ -582,7 +791,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
             return;
         }
     } else {
-        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
+        if (nvme_ns(n, nsid) || (!n->params.ns_directory && nvme_subsys_ns(subsys, nsid))) {
             error_setg(errp, "namespace id '%d' already allocated", nsid);
             return;
         }
@@ -596,14 +805,18 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
         }
 
         if (ns->params.shared) {
-            for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
-                NvmeCtrl *ctrl = subsys->ctrls[i];
-
-                if (ctrl && ctrl != SUBSYS_SLOT_RSVD) {
-                    nvme_attach_ns(ctrl, ns);
+            if (!n->params.ns_directory) {
+                /* legacy implementation */
+                for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                    NvmeCtrl *ctrl = subsys->ctrls[i];
+
+                    if (ctrl && ctrl != SUBSYS_SLOT_RSVD) {
+                        nvme_attach_ns(ctrl, ns);
+                    }
                 }
+            } else {
+                nvme_attach_ns(n, ns);
             }
-
             return;
         }
     }
@@ -660,7 +873,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)
     dc->desc = "Virtual NVMe namespace";
 }
 
-static void nvme_ns_instance_init(Object *obj)
+static void nvme_ns_add_bootindex(Object *obj)
 {
     NvmeNamespace *ns = NVME_NS(obj);
     char *bootindex = g_strdup_printf("/namespace@%d,0", ns->params.nsid);
@@ -671,6 +884,10 @@ static void nvme_ns_instance_init(Object *obj)
     g_free(bootindex);
 }
 
+static void nvme_ns_instance_init(Object *obj)
+{
+}
+
 static const TypeInfo nvme_ns_info = {
     .name = TYPE_NVME_NS,
     .parent = TYPE_DEVICE,
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 7adf042ec3..15b790cbdd 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -23,9 +23,8 @@
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
+#include "hw/nvme/ctrl-cfg.h"
 
-#define NVME_MAX_CONTROLLERS 256
-#define NVME_MAX_NAMESPACES  256
 #define NVME_EUI64_DEFAULT ((uint64_t)0x5254000000000000)
 
 QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
@@ -279,6 +278,8 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
+void nvme_validate_flbas(uint8_t flbas,  Error **errp);
+NvmeNamespace * nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp);
 
 typedef struct NvmeAsyncEvent {
     QTAILQ_ENTRY(NvmeAsyncEvent) entry;
@@ -339,6 +340,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
     case NVME_ADM_CMD_SET_FEATURES:     return "NVME_ADM_CMD_SET_FEATURES";
     case NVME_ADM_CMD_GET_FEATURES:     return "NVME_ADM_CMD_GET_FEATURES";
     case NVME_ADM_CMD_ASYNC_EV_REQ:     return "NVME_ADM_CMD_ASYNC_EV_REQ";
+    case NVME_ADM_CMD_NS_MGMT:          return "NVME_ADM_CMD_NS_MGMT";
     case NVME_ADM_CMD_NS_ATTACHMENT:    return "NVME_ADM_CMD_NS_ATTACHMENT";
     case NVME_ADM_CMD_VIRT_MNGMT:       return "NVME_ADM_CMD_VIRT_MNGMT";
     case NVME_ADM_CMD_DBBUF_CONFIG:     return "NVME_ADM_CMD_DBBUF_CONFIG";
@@ -427,6 +429,7 @@ typedef struct NvmeParams {
     uint16_t sriov_vi_flexible;
     uint8_t  sriov_max_vq_per_vf;
     uint8_t  sriov_max_vi_per_vf;
+    char     *ns_directory;     /* if empty (default) one legacy ns will be created */
 } NvmeParams;
 
 typedef struct NvmeCtrl {
@@ -485,8 +488,9 @@ typedef struct NvmeCtrl {
 
     NvmeSubsystem   *subsys;
 
-    NvmeNamespace   namespace;
+    NvmeNamespace   namespace;                  /* if ns_directory is empty this will be used */
     NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
+    BlockBackend    *preloaded_blk[NVME_MAX_NAMESPACES + 1];
     NvmeSQueue      **sq;
     NvmeCQueue      **cq;
     NvmeSQueue      admin_sq;
@@ -509,6 +513,7 @@ typedef struct NvmeCtrl {
         uint16_t    vqrfap;
         uint16_t    virfap;
     } next_pri_ctrl_cap;    /* These override pri_ctrl_cap after reset */
+    uint16_t nsidMax;
 } NvmeCtrl;
 
 typedef enum NvmeResetType {
@@ -575,6 +580,9 @@ static inline NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(NvmeCtrl *n,
     return NULL;
 }
 
+BlockBackend *ns_blockdev_init(const char *file, Error **errp);
+void ns_blockdev_activate(BlockBackend *blk,  uint64_t image_size, Error **errp);
+int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp);
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len,
                           NvmeTxDirection dir, NvmeRequest *req);
@@ -583,5 +591,22 @@ uint16_t nvme_bounce_mdata(NvmeCtrl *n, void *ptr, uint32_t len,
 void nvme_rw_complete_cb(void *opaque, int ret);
 uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
                        NvmeCmd *cmd);
+char *ns_create_image_name(NvmeCtrl *n, uint32_t nsid, Error **errp);
+int ns_storage_path_check(NvmeCtrl *n, Error **errp);
+int ns_auto_check(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid);
+int ns_cfg_save(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid);
+int ns_cfg_load(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid);
+int64_t qdict_get_int_chkd(const QDict *qdict, const char *key, Error **errp);
+QList *qdict_get_qlist_chkd(const QDict *qdict, const char *key, Error **errp);
+void ns_cfg_clear(NvmeNamespace *ns);
+int nvme_cfg_save(NvmeCtrl *n);
+int nvme_cfg_load(NvmeCtrl *n);
+
+typedef enum NvmeNsAllocAction {
+    NVME_NS_ALLOC_CHK,
+    NVME_NS_ALLOC,
+    NVME_NS_DEALLOC,
+} NvmeNsAllocAction;
+int nvme_cfg_update(NvmeCtrl *n, uint64_t ammount, NvmeNsAllocAction action);
 
 #endif /* HW_NVME_NVME_H */
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 9d2643678b..5f4bb0e6a2 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -87,10 +87,13 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 
     subsys->ctrls[cntlid] = n;
 
-    for (nsid = 1; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {
-        NvmeNamespace *ns = subsys->namespaces[nsid];
-        if (ns && ns->params.shared && !ns->params.detached) {
-            nvme_attach_ns(n, ns);
+    if (!n->params.ns_directory) {
+        /* legacy implementation */
+        for (nsid = 1; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {
+            NvmeNamespace *ns = subsys->namespaces[nsid];
+            if (ns && ns->params.shared && !ns->params.detached) {
+                nvme_attach_ns(n, ns);
+            }
         }
     }
 
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..28b025ac42 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -77,6 +77,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
 pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_ns_mgmt(uint16_t cid, uint32_t nsid, uint8_t sel, uint8_t csi, uint8_t psdt) "cid %"PRIu16", nsid=%"PRIu32", sel=0x%"PRIx8", csi=0x%"PRIx8", psdt=0x%"PRIx8""
+pci_nvme_ns_create(uint16_t nsid, uint64_t nsze, uint64_t ncap, uint8_t flbas) "nsid %"PRIu16", nsze=%"PRIu64", ncap=%"PRIu64", flbas=%"PRIu8""
 pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8""
 pci_nvme_ns_attachment_attach(uint16_t cntlid, uint32_t nsid) "cntlid=0x%"PRIx16", nsid=0x%"PRIx32""
 pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8027b7126b..9d2e121f1a 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -592,6 +592,7 @@ enum NvmeAdminCommands {
     NVME_ADM_CMD_SET_FEATURES   = 0x09,
     NVME_ADM_CMD_GET_FEATURES   = 0x0a,
     NVME_ADM_CMD_ASYNC_EV_REQ   = 0x0c,
+    NVME_ADM_CMD_NS_MGMT        = 0x0d,
     NVME_ADM_CMD_ACTIVATE_FW    = 0x10,
     NVME_ADM_CMD_DOWNLOAD_FW    = 0x11,
     NVME_ADM_CMD_NS_ATTACHMENT  = 0x15,
@@ -897,14 +898,18 @@ enum NvmeStatusCodes {
     NVME_FEAT_NOT_CHANGEABLE    = 0x010e,
     NVME_FEAT_NOT_NS_SPEC       = 0x010f,
     NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
+    NVME_NS_INSUFFICIENT_CAPAC  = 0x0115,
+    NVME_NS_IDNTIFIER_UNAVAIL   = 0x0116,
     NVME_NS_ALREADY_ATTACHED    = 0x0118,
     NVME_NS_PRIVATE             = 0x0119,
     NVME_NS_NOT_ATTACHED        = 0x011a,
+    NVME_THIN_PROVISION_NOTSPRD = 0x011b,
     NVME_NS_CTRL_LIST_INVALID   = 0x011c,
     NVME_INVALID_CTRL_ID        = 0x011f,
     NVME_INVALID_SEC_CTRL_STATE = 0x0120,
     NVME_INVALID_NUM_RESOURCES  = 0x0121,
     NVME_INVALID_RESOURCE_ID    = 0x0122,
+    NVME_NS_ATTACH_MGMT_NOTSPRD = 0x0129,
     NVME_CONFLICTING_ATTRS      = 0x0180,
     NVME_INVALID_PROT_INFO      = 0x0181,
     NVME_WRITE_TO_RO            = 0x0182,
@@ -1184,6 +1189,10 @@ enum NvmeIdCtrlCmic {
     NVME_CMIC_MULTI_CTRL    = 1 << 1,
 };
 
+enum NvmeNsManagementOperation {
+    NVME_NS_MANAGEMENT_CREATE = 0x0,
+};
+
 enum NvmeNsAttachmentOperation {
     NVME_NS_ATTACHMENT_ATTACH = 0x0,
     NVME_NS_ATTACHMENT_DETACH = 0x1,
@@ -1345,6 +1354,26 @@ typedef struct QEMU_PACKED NvmeIdNs {
     uint8_t     vs[3712];
 } NvmeIdNs;
 
+typedef struct QEMU_PACKED NvmeIdNsMgmt {
+    uint64_t    nsze;
+    uint64_t    ncap;
+    uint8_t     rsvd16[10];
+    uint8_t     flbas;
+    uint8_t     rsvd27[2];
+    uint8_t     dps;
+    uint8_t     nmic;
+    uint8_t     rsvd31[61];
+    uint32_t    anagrpid;
+    uint8_t     rsvd96[4];
+    uint16_t    nvmsetid;
+    uint16_t    endgid;
+    uint8_t     rsvd104[280];
+    uint64_t    lbstm;
+    uint8_t     rsvd392[120];
+    uint8_t     rsvd512[512];
+    uint8_t     vs[3072];
+} NvmeIdNsMgmt;
+
 #define NVME_ID_NS_NVM_ELBAF_PIF(elbaf) (((elbaf) >> 7) & 0x3)
 
 typedef struct QEMU_PACKED NvmeIdNsNvm {
@@ -1646,6 +1675,7 @@ static inline void _nvme_check_size(void)
     QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) != 4);
     QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsMgmt) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZoned) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
diff --git a/include/hw/nvme/ctrl-cfg.h b/include/hw/nvme/ctrl-cfg.h
new file mode 100644
index 0000000000..1be44cb8df
--- /dev/null
+++ b/include/hw/nvme/ctrl-cfg.h
@@ -0,0 +1,24 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ * Common configuration handling for qemu-img tool and and qemu-system-xx
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *  Michael Kropaczek      <michael.kropaczek@solidigm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef CTRL_CFG_DEF
+#define NVME_STR_(s) #s
+#define NVME_STRINGIFY(s) NVME_STR_(s)
+#define NVME_MAX_NAMESPACES  256
+#define NVME_MAX_CONTROLLERS 256
+#else
+CTRL_CFG_DEF(int, "tnvmcap", int128_get64(tnvmcap128), tnvmcap64)
+CTRL_CFG_DEF(int, "unvmcap", int128_get64(unvmcap128), unvmcap64)
+#endif
diff --git a/include/hw/nvme/ns-cfg.h b/include/hw/nvme/ns-cfg.h
new file mode 100644
index 0000000000..5f27a6cece
--- /dev/null
+++ b/include/hw/nvme/ns-cfg.h
@@ -0,0 +1,28 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ * Common configuration handling for qemu-img tool and qemu-system-xx
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *  Michael Kropaczek      <michael.kropaczek@solidigm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#ifdef NS_CFG_DEF
+NS_CFG_DEF(int, "params.nsid", (int64_t)ns->params.nsid, nsid)
+NS_CFG_DEF(obj, "attached_ctrls", QOBJECT(ctrl_qlist), QOBJECT(ctrl_qlist))
+NS_CFG_DEF(int, "params.pi", (int64_t)ns->params.pi, 0)
+NS_CFG_DEF(int, "lbasz", (int64_t)ns->lbasz, 0)
+NS_CFG_DEF(int, "id_ns.nsze", le64_to_cpu(ns->id_ns.nsze), 0)
+NS_CFG_DEF(int, "id_ns.ncap", le64_to_cpu(ns->id_ns.ncap), 0)
+NS_CFG_DEF(int, "id_ns.nuse", le64_to_cpu(ns->id_ns.nuse), 0)
+NS_CFG_DEF(int, "id_ns.nsfeat", (int64_t)ns->id_ns.nsfeat, 0)
+NS_CFG_DEF(int, "id_ns.flbas", (int64_t)ns->id_ns.flbas, 0)
+NS_CFG_DEF(int, "id_ns.nmic", (int64_t)ns->id_ns.nmic, 0)
+NS_CFG_DEF(int, "ns_size", ns->size, 0)
+#endif
diff --git a/include/hw/nvme/nvme-cfg.h b/include/hw/nvme/nvme-cfg.h
new file mode 100644
index 0000000000..b3e61e7c21
--- /dev/null
+++ b/include/hw/nvme/nvme-cfg.h
@@ -0,0 +1,188 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ * Common configuration handling for qemu-img tool and qemu-system-xx
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *  Michael Kropaczek      <michael.kropaczek@solidigm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/nvme/ctrl-cfg.h"
+
+#define NS_CFG_MAXSIZE 1024
+#define NS_FILE_FMT "%s/nvme_%s_ns_%03d"
+#define NS_IMG_EXT ".img"
+#define NS_CFG_EXT ".cfg"
+
+#define NVME_FILE_FMT "%s/nvme_%s_ctrl"
+#define NVME_CFG_EXT ".cfg"
+
+#define NVME_CFG_MAXSIZE 512
+static inline int storage_path_check(char *ns_directory, char *serial, Error **errp)
+{
+    int ret = 0;
+
+    if (access(ns_directory, F_OK)) {
+        error_setg(errp,
+                         "Path '%s' to nvme controller's storage area with serial no: '%s' must exist",
+                          ns_directory, serial);
+        ret = -1;
+    }
+
+    return ret;
+}
+
+
+static inline char *c_create_cfg_name(char *ns_directory, char *serial, Error **errp)
+{
+    char *file_name = NULL;
+
+    if (!storage_path_check(ns_directory, serial, errp)) {
+        file_name = g_strdup_printf(NVME_FILE_FMT NVME_CFG_EXT,
+                                   ns_directory, serial);
+    }
+
+    return file_name;
+}
+
+static inline char *create_fmt_name(const char *fmt, char *ns_directory, char *serial, uint32_t nsid, Error **errp)
+{
+    char *file_name = NULL;
+
+    if (!storage_path_check(ns_directory, serial, errp)) {
+        file_name = g_strdup_printf(fmt, ns_directory, serial, nsid);
+    }
+
+    return file_name;
+}
+
+static inline char *create_cfg_name(char *ns_directory, char *serial, uint32_t nsid, Error **errp)
+{
+    return create_fmt_name(NS_FILE_FMT NS_CFG_EXT, ns_directory, serial, nsid, errp);
+}
+
+
+static inline char *create_image_name(char *ns_directory, char *serial, uint32_t nsid, Error **errp)
+{
+    return create_fmt_name(NS_FILE_FMT NS_IMG_EXT, ns_directory, serial, nsid, errp);
+}
+
+static inline int nsid_cfg_save(char *ns_directory, char *serial, QDict *ns_cfg, uint32_t nsid)
+{
+    GString *json = NULL;
+    char *filename = NULL;
+    FILE *fp;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    json = qobject_to_json_pretty(QOBJECT(ns_cfg), false);
+
+    if (strlen(json->str) + 2 /* '\n'+'\0' */ > NS_CFG_MAXSIZE) {
+        error_setg(&local_err, "ns-cfg allowed max size %d exceeded", NS_CFG_MAXSIZE);
+        goto fail;
+    }
+
+    filename = create_cfg_name(ns_directory, serial, nsid, &local_err);
+    if (filename) {
+        fp = fopen(filename, "w");
+        if (fp == NULL) {
+            error_setg(&local_err, "open %s: %s", filename,
+                         strerror(errno));
+        } else {
+            chmod(filename, 0644);
+            if (!fprintf(fp, "%s\n", json->str)) {
+                error_setg(&local_err, "could not write ns-cfg %s: %s", filename,
+                             strerror(errno));
+            }
+            fclose(fp);
+        }
+    }
+
+fail:
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    g_string_free(json, true);
+    g_free(filename);
+    qobject_unref(ns_cfg);
+
+    return ret;
+}
+
+static inline int ns_cfg_default_save(char *ns_directory, char *serial, uint32_t nsid)
+{
+    QDict *ns_cfg = qdict_new();
+    QList *ctrl_qlist = qlist_new();
+
+#define NS_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(ns_cfg, key, default);
+#include "hw/nvme/ns-cfg.h"
+#undef NS_CFG_DEF
+
+    return nsid_cfg_save(ns_directory, serial, ns_cfg, nsid);
+}
+
+static inline int c_cfg_save(char *ns_directory, char *serial, QDict *nvme_cfg)
+{
+    GString *json = NULL;
+    char *filename = NULL;
+    FILE *fp;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    json = qobject_to_json_pretty(QOBJECT(nvme_cfg), false);
+
+    if (strlen(json->str) + 2 /* '\n'+'\0' */ > NVME_CFG_MAXSIZE) {
+        error_setg(&local_err, "ctrl-cfg allowed max size %d exceeded",
+                    NVME_CFG_MAXSIZE);
+        goto fail;
+    }
+
+    filename = c_create_cfg_name(ns_directory, serial, &local_err);
+    if (!local_err && filename) {
+        fp = fopen(filename, "w");
+        if (fp == NULL) {
+            error_setg(&local_err, "open %s: %s", filename,
+                         strerror(errno));
+        } else {
+            chmod(filename, 0644);
+            if (!fprintf(fp, "%s\n", json->str)) {
+                error_setg(&local_err, "could not write ctrl-cfg %s: %s",
+                            filename, strerror(errno));
+            }
+            fclose(fp);
+        }
+    }
+
+fail:
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    g_string_free(json, true);
+    g_free(filename);
+    qobject_unref(nvme_cfg);
+
+    return ret;
+}
+
+static inline int c_cfg_default_save(char *ns_directory, char *serial, uint64_t tnvmcap64, uint64_t unvmcap64)
+{
+    QDict *nvme_cfg = qdict_new();
+
+#define CTRL_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(nvme_cfg, key, default);
+#include "hw/nvme/ctrl-cfg.h"
+#undef CTRL_CFG_DEF
+
+    return c_cfg_save(ns_directory, serial, nvme_cfg);
+}
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..9aacb88fc9 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -57,6 +57,12 @@ SRST
 .. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE [-F BACKING_FMT]] [-u] [-o OPTIONS] FILENAME [SIZE]
 ERST
 
+DEF("createns", nsimgs_create,
+    "createns -S nvme_ctrl_serial_number -C nvme_ctrl_total_capacity [-N <NsId_max>] pathname")
+SRST
+.. option:: createns -S SERIAL_NUMBER -C TOTAL_CAPACITY [-N NSID_MAX] PATHNAME
+ERST
+
 DEF("dd", img_dd,
     "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] if=input of=output")
 SRST
diff --git a/qemu-img.c b/qemu-img.c
index 439d8de1e3..d54f86942f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -34,6 +34,7 @@
 #include "qapi/qobject-output-visitor.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
 #include "qemu/cutils.h"
 #include "qemu/config-file.h"
 #include "qemu/option.h"
@@ -49,10 +50,12 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "block/qapi.h"
+#include "block/qdict.h"
 #include "crypto/init.h"
 #include "trace/control.h"
 #include "qemu/throttle.h"
 #include "block/throttle-groups.h"
+#include "hw/nvme/nvme-cfg.h"
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \
                           "\n" QEMU_COPYRIGHT "\n"
@@ -219,6 +222,14 @@ void help(void)
            "  '-F' second image format\n"
            "  '-s' run in Strict mode - fail on different image size or sector allocation\n"
            "\n"
+           "Parameters to createns subcommand:\n"
+           "  'pathname' points to the storage area for namespaces backend images and must exist,\n"
+           "    and must match the -device nvme 'auto-ns-path=...' of the qemu-system-xx command\n"
+           "  '-S' indicates NVMe serial number, must match the -device nvme 'serial=...' of the qemu-system-xx command\n"
+           "  '-C' indicates NVMe total capacity\n"
+           "  '-N' sets a limit on possible NVMe namespaces number associated with NVMe controller,\n"
+           "        the default maximal value is " NVME_STRINGIFY(NVME_MAX_NAMESPACES) " and cannot be exceeded\n"
+           "\n"
            "Parameters to dd subcommand:\n"
            "  'bs=BYTES' read and write up to BYTES bytes at a time "
            "(default: 512)\n"
@@ -603,6 +614,127 @@ fail:
     return 1;
 }
 
+static int nsimgs_create(int argc, char **argv)
+{
+    int c;
+    char *auto_ns_path = NULL;
+    char *serial = NULL;
+    char *nsidMax = NULL;
+    char *tnvmcap = NULL;
+    uint64_t tnvmcap64 = 0L;
+    unsigned int nsidMaxi = NVME_MAX_NAMESPACES;
+    char *filename = NULL;
+    uint32_t i;
+    Error *local_err = NULL;
+
+    for(;;) {
+        static const struct option long_options[] = {
+            {"help", no_argument, 0, 'h'},
+            {"serial", required_argument, 0, 'S'},
+            {"tnvmcap", required_argument, 0, 'C'},
+            {"nsidmax", required_argument, 0, 'N'},
+            {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, "S:C:N:",
+                        long_options, NULL);
+        if (c == -1) {
+            break;
+        }
+        switch(c) {
+        case ':':
+            missing_argument(argv[optind - 1]);
+            break;
+        case '?':
+            unrecognized_option(argv[optind - 1]);
+            break;
+        case 'h':
+            help();
+            break;
+        case 'S':
+            serial = optarg;
+            break;
+        case 'N':
+            nsidMax = optarg;
+            break;
+        case 'C':
+            tnvmcap = optarg;
+            break;
+        }
+    }
+
+    if (optind >= argc) {
+        error_exit("Expecting path name");
+    }
+
+    if (!serial || !tnvmcap) {
+        error_exit("Both -S and -C must be specified");
+    }
+
+    tnvmcap64 = cvtnum_full("tnvmcap", tnvmcap, 0, INT64_MAX);
+
+    if (nsidMax && (qemu_strtoui(nsidMax, NULL, 0, &nsidMaxi) < 0 ||
+                                 nsidMaxi > NVME_MAX_NAMESPACES)) {
+        error_exit("-N 'NsIdMax' must be numeric and cannot exceed %d",
+                   NVME_MAX_NAMESPACES);
+    }
+
+    auto_ns_path = (optind < argc) ? argv[optind] : NULL;
+
+    /* create backend images and config flles for namespaces */
+    for (i = 1; !local_err && i <= NVME_MAX_NAMESPACES; i++) {
+        filename = create_image_name(auto_ns_path, serial, i, &local_err);
+        if (local_err) {
+            break;
+        }
+
+        /* calling bdrv_img_create() in both cases if i <= nsidMaxi and othewise,
+         * it checks shared resize permission, if likely locked by qemu-system-xx
+         * it will abort */
+        bdrv_img_create(filename, "raw", NULL, NULL, NULL,
+                                    0, BDRV_O_RDWR | BDRV_O_RESIZE,
+                                    true, &local_err);
+        if (local_err) {
+            break;
+        }
+
+        if (i <= nsidMaxi) { /* backend image file was created */
+            if (ns_cfg_default_save(auto_ns_path, serial, i)) { /* create
+                                                                 * namespace
+                                                                 * config file */
+                break;
+            }
+        } else if (filename && !access(filename, F_OK)) { /* reducing the number of files
+                                                           * if i > nsidMaxi */
+            unlink(filename);
+            g_free(filename);
+            filename = create_cfg_name(auto_ns_path, serial, i, &local_err);
+            if (local_err) {
+                break;
+            }
+            unlink(filename);
+        }
+        g_free(filename);
+        filename = NULL;
+    }
+
+    if (local_err && filename) {
+        error_reportf_err(local_err, "Could not create ns-image [%s] ",
+                          filename);
+        g_free(filename);
+        return 1;
+    } else if (c_cfg_default_save(auto_ns_path, serial,
+                                  tnvmcap64, tnvmcap64)) { /* create controller
+                                                            * config file */
+        error_reportf_err(local_err, "Could not create nvme-cfg ");
+        return 1;
+    } else if (local_err) {
+            error_report_err(local_err);
+        return 1;
+    }
+
+    return 0;
+}
+
 static void dump_json_image_check(ImageCheck *check, bool quiet)
 {
     GString *str;
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v4 2/2] hw/nvme: Support for Namespaces Management from guest OS - delete-ns
  2022-12-28 19:41 [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Jonathan Derrick
  2022-12-28 19:41 ` [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns Jonathan Derrick
@ 2022-12-28 19:41 ` Jonathan Derrick
  2023-01-09 13:29 ` [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Kevin Wolf
  2 siblings, 0 replies; 6+ messages in thread
From: Jonathan Derrick @ 2022-12-28 19:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Kropaczek, qemu-block, Keith Busch, Klaus Jensen,
	Kevin Wolf, Hanna Reitz

From: Michael Kropaczek <michael.kropaczek@solidigm.com>

Added support for NVMEe NameSpaces Mangement allowing the guest OS to
delete namespaces by issuing nvme delete-ns command.
It is an extension to currently implemented Qemu nvme virtual device.
Virtual devices representing namespaces will be created and/or deleted
during Qemu's running session, at anytime.

Signed-off-by: Michael Kropaczek <michael.kropaczek@solidigm.com>
---
 docs/system/devices/nvme.rst |  9 ++--
 hw/nvme/ctrl.c               | 81 +++++++++++++++++++++++++++++++++---
 hw/nvme/ns-backend.c         |  5 +++
 hw/nvme/ns.c                 | 72 ++++++++++++++++++++++++++++++++
 hw/nvme/nvme.h               |  3 +-
 hw/nvme/trace-events         |  1 +
 include/block/nvme.h         |  1 +
 7 files changed, 161 insertions(+), 11 deletions(-)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 6b3bee5e5d..f19072f1bc 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -103,12 +103,12 @@ Parameters:
 
 ``auto-ns-path=<path to nvme storage location>``
   If specified indicates a support for dynamic management of nvme namespaces
-  by means of nvme create-ns command. This path points
+  by means of nvme create-ns and nvme delete-ns commands. This path points
   to the storage area for backend images must exist. Additionally it requires
   that parameter `ns-subsys` must be specified whereas parameter `drive`
   must not. The legacy namespace backend is disabled, instead, a pair of
   files 'nvme_<ctrl SN>_ns_<NS-ID>.cfg' and 'nvme_<ctrl SN>_ns_<NS-ID>.img'
-  will refer to respective namespaces. The create-ns, attach-ns
+  will refer to respective namespaces. The create-ns, delete-ns, attach-ns
   and detach-ns commands, issued at the guest side, will make changes to
   those files accordingly.
   For each namespace exists an image file in raw format and a config file
@@ -140,8 +140,9 @@ Please note that ``nvme-ns`` device is not required to support of dynamic
 namespaces management feature. It is not prohibited to assign a such device to
 ``nvme`` device specified to support dynamic namespace management if one has
 an use case to do so, however, it will only coexist and be out of the scope of
-Namespaces Management. NsIds will be consistently managed, creation (create-ns)
-of a namespace will not allocate the NsId already being taken. If ``nvme-ns``
+Namespaces Management. Deletion (delete-ns) will render an error for this
+namespace. NsIds will be consistently managed, creation (create-ns) of
+a namespace will not allocate the NsId already being taken. If ``nvme-ns``
 device conflicts with previously created one by create-ns (the same NsId),
 it will break QEMU's start up.
 More than one of NVMe controllers associated with NVMe subsystem are supported.
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6155c2013e..6b93911f01 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -144,12 +144,12 @@
  *
  * - `auto-ns-path`
  *   If specified indicates a support for dynamic management of nvme namespaces
- *   by means of nvme create-ns command. This path pointing
+ *   by means of nvme create-ns and nvme delete-ns commands. This path pointing
  *   to a storage area for backend images must exist. Additionally it requires
  *   that parameter `ns-subsys` must be specified whereas parameter `drive`
  *   must not. The legacy namespace backend is disabled, instead, a pair of
  *   files 'nvme_<ctrl SN>_ns_<NS-ID>.cfg' and 'nvme_<ctrl SN>_ns_<NS-ID>.img'
- *   will refer to respective namespaces. The create-ns, attach-ns
+ *   will refer to respective namespaces. The create-ns, delete-ns, attach-ns
  *   and detach-ns commands, issued at the guest side, will make changes to
  *   those files accordingly.
  *   For each namespace exists an image file in raw format and a config file
@@ -5660,6 +5660,23 @@ static NvmeNamespace *nvme_ns_mgmt_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMg
     return ns;
 }
 
+static void nvme_ns_mgmt_delete(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (!n->params.ns_directory) {
+        error_setg(&local_err, "delete-ns not supported if 'auto-ns-path' is not specified");
+    } else if (n->namespace.blkconf.blk) {
+        error_setg(&local_err, "delete-ns not supported if 'drive' is specified");
+    } else {
+        nvme_ns_delete(n, nsid, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
 static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeCtrl *n_p = NULL;     /* primary controller */
@@ -5673,6 +5690,9 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
     uint8_t sel = dw10 & 0xf;
     uint8_t csi = (dw11 >> 24) & 0xf;
     uint16_t i;
+    uint16_t first = nsid;
+    uint16_t last = nsid;;
+    uint64_t image_size;
     uint16_t ret;
     Error *local_err = NULL;
 
@@ -5739,16 +5759,15 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
                 return NVME_INVALID_FIELD | NVME_DNR;
             }
 
+            /* ns->size is the real image size after creation */
             if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) {
-                /* place for delete-ns */
-                error_setg(&local_err, "Insufficient capacity, an orphaned ns[%"PRIu32"] will be left behind", nsid);
+                nvme_ns_mgmt_delete(n, nsid, NULL);
                 return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR;
             }
             (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC);
             if (nvme_cfg_save(n)) {
                 (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC);
-                /* place for delete-ns */
-                error_setg(&local_err, "Cannot save conf file, an orphaned ns[%"PRIu32"] will be left behind", nsid);
+                nvme_ns_mgmt_delete(n, nsid, NULL);
                 return NVME_INVALID_FIELD | NVME_DNR;
             }
             req->cqe.result = cpu_to_le32(nsid);
@@ -5759,6 +5778,56 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
             return NVME_INVALID_FIELD | NVME_DNR;
 	    }
         break;
+    case NVME_NS_MANAGEMENT_DELETE:
+        switch (csi) {
+        case NVME_CSI_NVM:
+            if (!nsid) {
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            if (nsid == NVME_NSID_BROADCAST) {
+                first = 1;
+                last = NVME_MAX_NAMESPACES;
+            }
+
+            for (i = first; i <= last; i++) {
+                ns = nvme_subsys_ns(n->subsys, (uint32_t)i);
+                if (n->params.ns_directory && ns && ns_auto_check(n, ns, (uint32_t)i)) {
+                    error_setg(&local_err, "ns[%"PRIu32"] cannot be deleted, configured via '-device nvme-ns...'", i);
+                    error_report_err(local_err);
+                    if (first != last) {
+                        local_err = NULL;       /* we are skipping */
+                    }
+                } else if (ns) {
+                    image_size = ns->size;
+                    nvme_ns_mgmt_delete(n, (uint16_t)i, &local_err);
+                    if (local_err) {
+                        goto fail;
+                    }
+                    (void)nvme_cfg_update(n, image_size, NVME_NS_DEALLOC);
+                    if (nvme_cfg_save(n)) {
+                        error_setg(&local_err, "Could not save nvme-cfg");
+                        goto fail;
+                    }
+                } else if (first == last) {
+                    return NVME_INVALID_FIELD | NVME_DNR;
+                }
+            }
+
+fail:
+            if (local_err) {
+                error_report_err(local_err);
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            nvme_update_dmrsl(n);
+            break;
+        case NVME_CSI_ZONED:
+            /* fall through for now */
+        default:
+            return NVME_INVALID_FIELD | NVME_DNR;
+	    }
+        break;
     default:
         return NVME_INVALID_FIELD | NVME_DNR;
     }
diff --git a/hw/nvme/ns-backend.c b/hw/nvme/ns-backend.c
index 06de8f262c..57d0b695fa 100644
--- a/hw/nvme/ns-backend.c
+++ b/hw/nvme/ns-backend.c
@@ -71,6 +71,11 @@ void ns_blockdev_activate(BlockBackend *blk,  uint64_t image_size, Error **errp)
                        errp);
 }
 
+void ns_blockdev_deactivate(BlockBackend *blk, Error **errp)
+{
+    ns_blockdev_activate(blk, 0, errp);
+}
+
 int ns_storage_path_check(NvmeCtrl *n, Error **errp)
 {
     return storage_path_check(n->params.ns_directory,  n->params.serial, errp);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 6e518f97be..e1dfd83054 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -608,6 +608,78 @@ fail:
     return NULL;
 }
 
+static void nvme_ns_unrealize(DeviceState *dev);
+
+void nvme_ns_delete(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    NvmeNamespace *ns = NULL;
+    NvmeSubsystem *subsys = n->subsys;
+    int i;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    trace_pci_nvme_ns_delete(nsid);
+
+    if (n->cntlid > 0 && !n->subsys) {
+        error_setg(&local_err, "Secondary controller without subsystem ");
+        return;
+    }
+
+    if (subsys) {
+        ns = nvme_subsys_ns(subsys, (uint32_t)nsid);
+        if (ns) {
+            if (ns->params.shared) {
+                for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                    NvmeCtrl *ctrl = subsys->ctrls[i];
+
+                    if (ctrl && ctrl->namespaces[nsid]) {
+                        ctrl->namespaces[nsid] = NULL;
+                        ns->attached--;
+                    }
+                }
+            }
+            subsys->namespaces[nsid] = NULL;
+        }
+    }
+
+    if (!ns) {
+        ns = nvme_ns(n, (uint32_t)nsid);
+        if (ns) {
+            n->namespaces[nsid] = NULL;
+            ns->attached--;
+        } else {
+            error_setg(errp, "Namespace %d does not exist", nsid);
+            return;
+        }
+    }
+
+    if (ns->attached > 0) {
+        error_setg(errp, "Could not detach all ns references for ns[%d], still %d left", nsid, ns->attached);
+        return;
+    }
+
+    /* here is actual deletion */
+    nvme_ns_unrealize(&ns->parent_obj);
+    qdev_unrealize(&ns->parent_obj);
+    ns_blockdev_deactivate(ns->blkconf.blk, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ns_cfg_clear(ns);
+    ret = ns_cfg_save(n, ns, nsid);
+    if (ret == -1) {
+        error_setg(errp, "Unable to save ns-cnf");
+        return;
+    } else if (ret == 1) {  /* should not occur here, check and error message prior to call to nvme_ns_delete() */
+        return;
+    }
+
+    /* disassociating refernces to the back-end and keeping it as preloaded */
+    ns->blkconf.blk = NULL;
+}
+
 int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 15b790cbdd..2c3ed4a57e 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -280,6 +280,7 @@ void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
 void nvme_validate_flbas(uint8_t flbas,  Error **errp);
 NvmeNamespace * nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp);
+void nvme_ns_delete(NvmeCtrl *n, uint32_t nsid, Error **errp);
 
 typedef struct NvmeAsyncEvent {
     QTAILQ_ENTRY(NvmeAsyncEvent) entry;
@@ -513,7 +514,6 @@ typedef struct NvmeCtrl {
         uint16_t    vqrfap;
         uint16_t    virfap;
     } next_pri_ctrl_cap;    /* These override pri_ctrl_cap after reset */
-    uint16_t nsidMax;
 } NvmeCtrl;
 
 typedef enum NvmeResetType {
@@ -582,6 +582,7 @@ static inline NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(NvmeCtrl *n,
 
 BlockBackend *ns_blockdev_init(const char *file, Error **errp);
 void ns_blockdev_activate(BlockBackend *blk,  uint64_t image_size, Error **errp);
+void ns_blockdev_deactivate(BlockBackend *blk, Error **errp);
 int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp);
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len,
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 28b025ac42..0dd0c23208 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -79,6 +79,7 @@ pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8"
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 pci_nvme_ns_mgmt(uint16_t cid, uint32_t nsid, uint8_t sel, uint8_t csi, uint8_t psdt) "cid %"PRIu16", nsid=%"PRIu32", sel=0x%"PRIx8", csi=0x%"PRIx8", psdt=0x%"PRIx8""
 pci_nvme_ns_create(uint16_t nsid, uint64_t nsze, uint64_t ncap, uint8_t flbas) "nsid %"PRIu16", nsze=%"PRIu64", ncap=%"PRIu64", flbas=%"PRIu8""
+pci_nvme_ns_delete(uint16_t nsid) "nsid %"PRIu16""
 pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8""
 pci_nvme_ns_attachment_attach(uint16_t cntlid, uint32_t nsid) "cntlid=0x%"PRIx16", nsid=0x%"PRIx32""
 pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 9d2e121f1a..0fe7fe9bb1 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1191,6 +1191,7 @@ enum NvmeIdCtrlCmic {
 
 enum NvmeNsManagementOperation {
     NVME_NS_MANAGEMENT_CREATE = 0x0,
+    NVME_NS_MANAGEMENT_DELETE = 0x1,
 };
 
 enum NvmeNsAttachmentOperation {
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns
  2022-12-28 19:41 ` [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns Jonathan Derrick
@ 2022-12-28 20:10   ` Keith Busch
  2022-12-29  0:08     ` Michael Kropaczek (CW)
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2022-12-28 20:10 UTC (permalink / raw)
  To: Jonathan Derrick
  Cc: qemu-devel, Michael Kropaczek, qemu-block, Klaus Jensen,
	Kevin Wolf, Hanna Reitz

On Wed, Dec 28, 2022 at 01:41:40PM -0600, Jonathan Derrick wrote:
> +static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
> +{
> +    NvmeCtrl *n_p = NULL;     /* primary controller */
> +    NvmeIdCtrl *id = &n->id_ctrl;
> +    NvmeNamespace *ns;
> +    NvmeIdNsMgmt id_ns = {};
> +    uint8_t flags = req->cmd.flags;
> +    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
> +    uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
> +    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
> +    uint8_t sel = dw10 & 0xf;
> +    uint8_t csi = (dw11 >> 24) & 0xf;
> +    uint16_t i;
> +    uint16_t ret;
> +    Error *local_err = NULL;
> +
> +    trace_pci_nvme_ns_mgmt(nvme_cid(req), nsid, sel, csi, NVME_CMD_FLAGS_PSDT(flags));
> +
> +    if (!(le16_to_cpu(id->oacs) & NVME_OACS_NS_MGMT)) {
> +        return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR;
> +    }
> +
> +    if (n->cntlid && !n->subsys) {
> +        error_setg(&local_err, "Secondary controller without subsystem");
> +        return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR;

Leaking local_err. Any time you call error_setg(), the error needs to be
reported or freed at some point.

> +    }
> +
> +    n_p = n->subsys->ctrls[0];
> +
> +    switch (sel) {
> +    case NVME_NS_MANAGEMENT_CREATE:
> +        switch (csi) {
> +        case NVME_CSI_NVM:

The following case is sufficiently large enough that the implementation
should be its own function.

> +            if (nsid) {
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            }
> +
> +            ret = nvme_h2c(n, (uint8_t *)&id_ns, sizeof(id_ns), req);
> +            if (ret) {
> +                return ret;
> +            }
> +
> +            uint64_t nsze = le64_to_cpu(id_ns.nsze);
> +            uint64_t ncap = le64_to_cpu(id_ns.ncap);

Please don't mix declarations with code; declare these local variables
at the top of the scope.

> +
> +            if (ncap > nsze) {
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            } else if (ncap != nsze) {
> +                return NVME_THIN_PROVISION_NOTSPRD | NVME_DNR;
> +            }
> +
> +            nvme_validate_flbas(id_ns.flbas, &local_err);
> +            if (local_err) {
> +                error_report_err(local_err);
> +                return NVME_INVALID_FORMAT | NVME_DNR;
> +            }
> +
> +            for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> +                if (nvme_ns(n_p, (uint32_t)i) || nvme_subsys_ns(n_p->subsys, (uint32_t)i)) {
> +                    continue;
> +                }
> +                break;
> +            }
> +
> +
> +            if (i >  le32_to_cpu(n_p->id_ctrl.nn) || i >  NVME_MAX_NAMESPACES) {
> +               return NVME_NS_IDNTIFIER_UNAVAIL | NVME_DNR;
> +            }
> +            nsid = i;
> +
> +            /* create ns here */
> +            ns = nvme_ns_mgmt_create(n, nsid, &id_ns, &local_err);
> +            if (!ns || local_err) {
> +                if (local_err) {
> +                    error_report_err(local_err);
> +                }
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            }
> +
> +            if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) {
> +                /* place for delete-ns */
> +                error_setg(&local_err, "Insufficient capacity, an orphaned ns[%"PRIu32"] will be left behind", nsid);
> +                return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR;

Leaked local_err.

> +            }
> +            (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC);
> +            if (nvme_cfg_save(n)) {
> +                (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC);
> +                /* place for delete-ns */
> +                error_setg(&local_err, "Cannot save conf file, an orphaned ns[%"PRIu32"] will be left behind", nsid);
> +                return NVME_INVALID_FIELD | NVME_DNR;

Another leaked local_err.

> +            }
> +            req->cqe.result = cpu_to_le32(nsid);
> +            break;
> +        case NVME_CSI_ZONED:
> +            /* fall through for now */
> +        default:
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +	    }
> +        break;
> +    default:
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    return NVME_SUCCESS;
> +}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns
  2022-12-28 20:10   ` Keith Busch
@ 2022-12-29  0:08     ` Michael Kropaczek (CW)
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Kropaczek (CW) @ 2022-12-29  0:08 UTC (permalink / raw)
  To: Keith Busch, Jonathan Derrick
  Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Klaus Jensen,
	Kevin Wolf, Hanna Reitz

Hi Keith,

Thank you for the review and I agree with you, the issues will be addressed.

Regards,
  Michael

-----Original Message-----
From: Keith Busch <kbusch@kernel.org> 
Sent: Wednesday, December 28, 2022 12:10 PM
To: Jonathan Derrick <jonathan.derrick@linux.dev>
Cc: qemu-devel@nongnu.org; Michael Kropaczek (CW) <Michael.Kropaczek@solidigm.com>; qemu-block@nongnu.org; Klaus Jensen <its@irrelevant.dk>; Kevin Wolf <kwolf@redhat.com>; Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns

[You don't often get email from kbusch@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]

Caution: External Email


On Wed, Dec 28, 2022 at 01:41:40PM -0600, Jonathan Derrick wrote:
> +static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req) {
> +    NvmeCtrl *n_p = NULL;     /* primary controller */
> +    NvmeIdCtrl *id = &n->id_ctrl;
> +    NvmeNamespace *ns;
> +    NvmeIdNsMgmt id_ns = {};
> +    uint8_t flags = req->cmd.flags;
> +    uint32_t nsid = le32_to_cpu(req->cmd.nsid);
> +    uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
> +    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
> +    uint8_t sel = dw10 & 0xf;
> +    uint8_t csi = (dw11 >> 24) & 0xf;
> +    uint16_t i;
> +    uint16_t ret;
> +    Error *local_err = NULL;
> +
> +    trace_pci_nvme_ns_mgmt(nvme_cid(req), nsid, sel, csi, 
> + NVME_CMD_FLAGS_PSDT(flags));
> +
> +    if (!(le16_to_cpu(id->oacs) & NVME_OACS_NS_MGMT)) {
> +        return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR;
> +    }
> +
> +    if (n->cntlid && !n->subsys) {
> +        error_setg(&local_err, "Secondary controller without subsystem");
> +        return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR;

Leaking local_err. Any time you call error_setg(), the error needs to be reported or freed at some point.

> +    }
> +
> +    n_p = n->subsys->ctrls[0];
> +
> +    switch (sel) {
> +    case NVME_NS_MANAGEMENT_CREATE:
> +        switch (csi) {
> +        case NVME_CSI_NVM:

The following case is sufficiently large enough that the implementation should be its own function.

> +            if (nsid) {
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            }
> +
> +            ret = nvme_h2c(n, (uint8_t *)&id_ns, sizeof(id_ns), req);
> +            if (ret) {
> +                return ret;
> +            }
> +
> +            uint64_t nsze = le64_to_cpu(id_ns.nsze);
> +            uint64_t ncap = le64_to_cpu(id_ns.ncap);

Please don't mix declarations with code; declare these local variables at the top of the scope.

> +
> +            if (ncap > nsze) {
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            } else if (ncap != nsze) {
> +                return NVME_THIN_PROVISION_NOTSPRD | NVME_DNR;
> +            }
> +
> +            nvme_validate_flbas(id_ns.flbas, &local_err);
> +            if (local_err) {
> +                error_report_err(local_err);
> +                return NVME_INVALID_FORMAT | NVME_DNR;
> +            }
> +
> +            for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> +                if (nvme_ns(n_p, (uint32_t)i) || nvme_subsys_ns(n_p->subsys, (uint32_t)i)) {
> +                    continue;
> +                }
> +                break;
> +            }
> +
> +
> +            if (i >  le32_to_cpu(n_p->id_ctrl.nn) || i >  NVME_MAX_NAMESPACES) {
> +               return NVME_NS_IDNTIFIER_UNAVAIL | NVME_DNR;
> +            }
> +            nsid = i;
> +
> +            /* create ns here */
> +            ns = nvme_ns_mgmt_create(n, nsid, &id_ns, &local_err);
> +            if (!ns || local_err) {
> +                if (local_err) {
> +                    error_report_err(local_err);
> +                }
> +                return NVME_INVALID_FIELD | NVME_DNR;
> +            }
> +
> +            if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) {
> +                /* place for delete-ns */
> +                error_setg(&local_err, "Insufficient capacity, an orphaned ns[%"PRIu32"] will be left behind", nsid);
> +                return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR;

Leaked local_err.

> +            }
> +            (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC);
> +            if (nvme_cfg_save(n)) {
> +                (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC);
> +                /* place for delete-ns */
> +                error_setg(&local_err, "Cannot save conf file, an orphaned ns[%"PRIu32"] will be left behind", nsid);
> +                return NVME_INVALID_FIELD | NVME_DNR;

Another leaked local_err.

> +            }
> +            req->cqe.result = cpu_to_le32(nsid);
> +            break;
> +        case NVME_CSI_ZONED:
> +            /* fall through for now */
> +        default:
> +            return NVME_INVALID_FIELD | NVME_DNR;
> +         }
> +        break;
> +    default:
> +        return NVME_INVALID_FIELD | NVME_DNR;
> +    }
> +
> +    return NVME_SUCCESS;
> +}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS
  2022-12-28 19:41 [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Jonathan Derrick
  2022-12-28 19:41 ` [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns Jonathan Derrick
  2022-12-28 19:41 ` [PATCH v4 2/2] hw/nvme: Support for Namespaces Management from guest OS - delete-ns Jonathan Derrick
@ 2023-01-09 13:29 ` Kevin Wolf
  2 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2023-01-09 13:29 UTC (permalink / raw)
  To: Jonathan Derrick
  Cc: qemu-devel, Michael Kropaczek, qemu-block, Keith Busch,
	Klaus Jensen, Hanna Reitz, pkrempa, armbru

Am 28.12.2022 um 20:41 hat Jonathan Derrick geschrieben:
> Here is the approach:
> The nvme device will get new parameter:
>  - auto-ns-path, which specifies the path to the storage area where back-end
>    image and necessary config files located stored.
> 
> The virtual devices representing namespaces will be created dynamically during
> the Qemu running session following issuance of nvme create-ns and delete-ns
> commands from the guest OS. QOM classes and instances will be created utilizing
> existing configuration scheme used during Qemu's start-up. Back-end image files
> will be neither created nor deleted during Qemu's startup or running session.
> Instead a set of back-end image files and relevant config will be created by
> qemu-img tool with createns sub-command prior to Qemu's session.
> Required parameters are: -S serial number which must match serial parameter of
> qemu-system-xx -device nvme command line specification, -C total capacity, and
> optional -N that will set a maximal limit on number of allowed
> namespaces (default 256) which will be followed by path name pointing to
> storage location corresponding to auto-ns-path of qemu-system-xx -device nvme
> parameter.
> 
> Those created back-end image files will be pre-loaded during Qemu's start-up
> and then during running Qemu's session will be associated or disassociated with
> QOM namespaces virtual instances, as a result of issuing nvme create-ns or
> delete-ns commands. The associated back-end image file for relevant namespace
> will be re-sized as follows: delete-ns command will truncate image file to the
> size of 0, whereas create-ns command will re-size the image file to the size
> provided by nvme create-ns command parameters. Truncating/re-sizing is a result
> of blk_truncate() API which utilizes co-routines and should not block Qemu main
> thread while scheduling AIO operations. It is assured that all settings will
> retain over Qemu's start-ups and shutdowns. The implementation makes it
> possible to combine the existing "Additional Namespace" implementation with the
> new "Managed Namespaces". Those will coexist with obvious restrictions, like
> both will share the same NsIds space, "static" namespaces cannot be deleted or
> if its NsId specified at Qemu's command line will conflicts with previously
> created one by nvme create-ns (and retained), this will lead to an abort of
> Qemu at its start up.

This looks like a valid approach for a proof of concept, but from a
backend perspective, I'm concerned that this approach might be too
limiting and we won't have a good path forward.

For example, how can we integrate this with snapshots? You expect a
specific filename for the image, but taking an external snapshot means
creating an overlay image with a different name.

How do we migrate storage like this? If the management tool (probably
libvirt) knows about all the namespace images and the config file (!),
it can possibly migrate them individually, but note that while a mirror
job is active, images can't be resized any more.

What if we don't want to use a directory on the local filesystem to
store the images, but some network protocol?

It seems to me that we should define proper block layer APIs for
handling namespaces, and then we can have your implementation as one
possible image driver that supports these APIs, for which we can accept
these limitations for now. At least this would already avoid having
backend logic in the device implementation, and allow us to replace it
with something better later without having to change the design of the
device emulation code.

Eventually, I think, if we want to have dynamic namespaces properly
supported, they need to be a feature on the image format level, so that
you could keep all namespaces in a single qcow2 file.

Kevin



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-09 13:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-28 19:41 [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Jonathan Derrick
2022-12-28 19:41 ` [PATCH v4 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns Jonathan Derrick
2022-12-28 20:10   ` Keith Busch
2022-12-29  0:08     ` Michael Kropaczek (CW)
2022-12-28 19:41 ` [PATCH v4 2/2] hw/nvme: Support for Namespaces Management from guest OS - delete-ns Jonathan Derrick
2023-01-09 13:29 ` [PATCH v4 0/2] hw/nvme: Support for Namespaces Management from guest OS Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.