qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/1] hw/nvme: add atomic write support
@ 2024-09-20  0:07 Alan Adamson
  2024-09-20  0:07 ` [PATCH v2 1/1] " Alan Adamson
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Adamson @ 2024-09-20  0:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: alan.adamson, kbusch, its, qemu-block

Changelog:

v2:	- Include changes suggested by Klaus
	- Check for READ/WRITE commands when walking SQs.
	- Updated the "cover-letter" below with new fio example.

=====================================================================================

Since there is work in the Linux NVMe Driver community to add Atomic Write
support, it would be desirable to be able to test it with qemu nvme emulation.
 
This patch will focus on supporting NVMe controller atomic write parameters (AWUN and
AWUPF) but can be extended to support Namespace parameters (NAWUN and NAWUPF)
and Boundaries (NABSN, NABO, and NABSPF).
 
Atomic Write Parameters for NVMe QEMU
-------------------------------------
New NVMe QEMU Parameters (See NVMe Specification for details):
        atomic.dn (default off) - Set the value of Disable Normal.
        atomic.awun=UINT16 (default: 0)
        atomic.awupf=UINT16 (default: 0)
 
qemu command line example:
        qemu-system-x86_64 -cpu host --enable-kvm -smp cpus=4 -no-reboot -m 8192M -drive file=./disk.img,if=ide \
        -boot c -device e1000,netdev=net0,mac=DE:CC:CC:EF:99:88 -netdev tap,id=net0 \
        -device nvme,id=nvme-ctrl-0,serial=nvme-1,atomic.dn=off,atomic.awun=15,atomic.awupf=7 \
        -drive file=./nvme.img,if=none,id=nvm-1 -device nvme-ns,drive=nvm-1,bus=nvme-ctrl-0
 
Making Writes Atomic:
---------------------
Currently, as the nvme emulator walks through the Submission Queue (SQ)
(nvme_process_sq()), it takes each request (read/write/etc.) off the SQ, starts its
execution, and then continues with the next SQ entry until all entries are started. It
is likely that multiple requests (from multiple SQs) will be executing in parallel and
acting on a common LBA range, which prevents writes from completing atomically. When a
write completes atomically, either all or none of its LBAs are committed to media.  This
means writes to a common LBA range cannot be done in parallel if they are to be atomic.
The nvme emulator does not currently guarantee this, and LBAs from multiple overlapping
requests may get committed.  The fio test shown below confirms this.
 
Prior to taking a command off of a SQ, a check needs to be done to determine if it
conflicts atomically with a currently executing command.
 
int nvme_atomic_write_check() - Checks an NVMe command to determine if it can be started,
or if it conflicts atomically with a currently executing command.
 
Returns:   NVME_ATOMIC_NO_START - The command atomically conflicts with a currently
           executing command and cannot be started.
 
           NVME_ATOMIC_START_ATOMIC  - The command is an atomic write, does not
           conflict atomically with a currently executing command, and can be started.
 
           NVME_ATOMIC_START_NONATOMIC - The command is not an atomic write, but it
           can be started.

If a command is blocked from being started, nvme_process_sq() needs to be rescheduled.
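
As a rough illustration of how a caller might act on the three return values
described above, here is a minimal standalone sketch. It is not the patch code:
the enum names, struct sq_step, and handle_check_result() are hypothetical and
only model the decision (start, mark as atomic, or reschedule).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names mirroring the three outcomes listed above. */
enum atomic_check_result {
    ATOMIC_NO_START = 0,        /* conflicts with an executing command */
    ATOMIC_START_ATOMIC = 1,    /* atomic write, no conflict */
    ATOMIC_START_NONATOMIC = 2, /* not an atomic write, may start */
};

/* Hypothetical summary of what the SQ loop should do next. */
struct sq_step {
    bool started;      /* command is taken off the SQ */
    bool atomic_write; /* request is tracked as an atomic write */
    bool reschedule;   /* SQ processing stops and is retried later */
};

static struct sq_step handle_check_result(enum atomic_check_result res)
{
    struct sq_step step = { false, false, false };

    assert(res == ATOMIC_NO_START || res == ATOMIC_START_ATOMIC ||
           res == ATOMIC_START_NONATOMIC);

    switch (res) {
    case ATOMIC_NO_START:
        /* Leave the command on the SQ and try again later. */
        step.reschedule = true;
        break;
    case ATOMIC_START_ATOMIC:
        step.started = true;
        step.atomic_write = true;
        break;
    case ATOMIC_START_NONATOMIC:
        step.started = true;
        break;
    }
    return step;
}
```

Only the NO_START case leaves the command on the queue; both START cases
proceed, differing only in whether the request is remembered as atomic.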
 
Implementation:
---------------
Each SQ maintains a list of executing requests (sq->out_req_list). When a command is
taken off the SQ to start executing it, it is placed on out_req_list and removed when
the command completes and placed on the Completion Queue (CQ). When nvme_process_sq()
is executing and looking to take a command off the SQ, nvme_atomic_write_check() is
called to determine if it is atomically safe to start executing the command. If it is
safe, nvme_atomic_write_check() will return NVME_ATOMIC_START_ATOMIC or
NVME_ATOMIC_START_NONATOMIC. nvme_process_sq() then pulls the command off the SQ and
places an associated request onto out_req_list. If it is not atomically safe
(nvme_atomic_write_check() returns NVME_ATOMIC_NO_START), the command remains on the SQ,
processing of that SQ stops, and nvme_process_sq() is rescheduled.
When nvme_atomic_write_check() is called, the out_req_list for each SQ is walked and the
LBA range of the command to be started is compared with each executing request.
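
The range comparison described above can be sketched in isolation as follows.
This is a simplified model, not the patch itself: struct inflight,
ranges_overlap(), and can_start() are hypothetical names, and a flat array
stands in for the per-SQ out_req_list. It assumes NLB is zero-based, so a
request covers the inclusive range [slba, slba + nlb].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an executing read/write request. */
struct inflight {
    uint64_t slba;     /* starting LBA */
    uint16_t nlb;      /* number of logical blocks, zero-based */
    bool atomic_write; /* request was started as an atomic write */
};

/* True if the inclusive LBA ranges of two commands share any block. */
static bool ranges_overlap(uint64_t slba_a, uint16_t nlb_a,
                           uint64_t slba_b, uint16_t nlb_b)
{
    uint64_t elba_a = slba_a + nlb_a;
    uint64_t elba_b = slba_b + nlb_b;

    return elba_a >= slba_b && slba_a <= elba_b;
}

/*
 * True if the new command may start. An atomic write is blocked by any
 * overlapping request; any other command is blocked only by an overlapping
 * atomic write.
 */
static bool can_start(const struct inflight *reqs, size_t n,
                      uint64_t slba, uint16_t nlb, bool new_is_atomic)
{
    assert(reqs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!ranges_overlap(slba, nlb, reqs[i].slba, reqs[i].nlb)) {
            continue;
        }
        if (new_is_atomic || reqs[i].atomic_write) {
            return false;
        }
    }
    return true;
}
```

Two non-atomic commands on the same range are still allowed to run in
parallel; only a pairing that involves at least one atomic write is serialized.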

What is the Maximum Atomic Write Size?
--------------------------------------
By default, the qemu parameter atomic.awun specifies the maximum atomic write size.
If Disable Normal is set to true with the qemu parameter atomic.dn or with the
SET FEATURE command, the atomic.awupf value specifies the maximum atomic write size
instead.
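
A small sketch of this selection, assuming (as in the NVMe specification) that
AWUN and AWUPF are zero-based block counts. The helper names
max_atomic_write_blocks() and atomic_writes_enabled() are illustrative only and
do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Maximum atomic write size in logical blocks. AWUN/AWUPF are zero-based,
 * so a value of 15 means 16 blocks.
 */
static uint32_t max_atomic_write_blocks(uint16_t awun, uint16_t awupf, bool dn)
{
    return (uint32_t)(dn ? awupf : awun) + 1;
}

/*
 * A limit of a single block is treated as "no atomic write handling",
 * since the selected parameter was left at zero.
 */
static bool atomic_writes_enabled(uint32_t max_blocks)
{
    assert(max_blocks >= 1);
    return max_blocks > 1;
}
```

With atomic.awun=15 and atomic.awupf=7, the limit is 16 blocks while Disable
Normal is clear and 8 blocks once it is set.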

Testing
-------
NVMe QEMU Parameters used: atomic.dn=off,atomic.awun=15,atomic.awupf=7
 
# nvme id-ctrl /dev/nvme0 | grep awun
awun      : 15
# nvme id-ctrl /dev/nvme0 | grep awupf
awupf     : 7
# nvme id-ctrl /dev/nvme0 | grep acwu
acwu      : 0    < Since qemu-nvme doesn't support Compare and Write, this is always zero
# nvme get-feature /dev/nvme0  -f 0xa
get-feature:0x0a (Write Atomicity Normal), Current value:00000000
#

fio testing - using upstream version fio-3.37-124 (includes atomic write support) 
---------------------------------------------------------------------------------
# fio --filename=/dev/nvme0n1 --direct=1 --rw=randwrite --bs=8k --iodepth=256 --name=iops --numjobs=50 --ioengine=libaio --loops=10 --verify=crc64 --verify_write_sequence=0
Since the block size passed into fio is 8k, which is <= the maximum atomic block size (awun=15 (8k)), this test will always succeed.

# fio --filename=/dev/nvme0n1 --direct=1 --rw=randwrite --bs=64k --iodepth=256 --name=iops --numjobs=50 --ioengine=libaio --loops=10 --verify=crc64 --verify_write_sequence=0
Since the block size passed into fio is 64k, which is > the maximum atomic block size (awun=15 (8k)), this test will eventually fail with:
crc64: verify failed at file /dev/nvme0n1 offset 347799552, length 65536 (requested block: offset=347799552, length=65536, flags=88)
       Expected CRC: d54d5f50d2569c94
       Received CRC: 691e1aed4669ba33 
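
The arithmetic behind the two fio runs can be checked with a short sketch. It
assumes a 512-byte logical block size, which is what "awun=15 (8k)" implies
(16 blocks * 512 bytes = 8192 bytes); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest atomic write in bytes for a zero-based AWUN and a given LBA size. */
static uint64_t atomic_limit_bytes(uint16_t awun, uint32_t lba_size)
{
    assert(lba_size > 0);
    return ((uint64_t)awun + 1) * lba_size;
}

/* True if an I/O of io_bytes is within the atomic write limit. */
static bool fits_atomic_limit(uint64_t io_bytes, uint16_t awun,
                              uint32_t lba_size)
{
    return io_bytes <= atomic_limit_bytes(awun, lba_size);
}
```

Under that assumption, --bs=8k fits the limit exactly, while --bs=64k exceeds
it, so those writes are handled as non-atomic.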

Future Work
-----------
- Namespace support (NAWUN, NAWUPF and NACWU)
- Namespace Boundary support (NABSN, NABO, and NABSPF)
- Atomic Compare and Write Unit (ACWU)

Alan Adamson (1):
  hw/nvme: add atomic write support

 hw/nvme/ctrl.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h |  12 ++++
 2 files changed, 175 insertions(+), 1 deletion(-)

-- 
2.43.5




* [PATCH v2 1/1] hw/nvme: add atomic write support
  2024-09-20  0:07 [PATCH v2 0/1] hw/nvme: add atomic write support Alan Adamson
@ 2024-09-20  0:07 ` Alan Adamson
  2024-09-24 12:15   ` Klaus Jensen
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Adamson @ 2024-09-20  0:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: alan.adamson, kbusch, its, qemu-block

Adds support for the controller atomic parameters: AWUN and AWUPF. Atomic
Compare and Write Unit (ACWU) is not currently supported.

Writes that adhere to the AWUN and AWUPF parameters are guaranteed to be atomic.

New NVMe QEMU Parameters (See NVMe Specification for details):
       atomic.dn (default off) - Set the value of Disable Normal.
       atomic.awun=UINT16 (default: 0)
       atomic.awupf=UINT16 (default: 0)

By default (Disable Normal set to zero), the maximum atomic write size is
set to the AWUN value.  If Disable Normal is set, the maximum atomic write
size is set to AWUPF.

Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/ctrl.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h |  12 ++++
 2 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 9e94a2405407..0af46c57ee86 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -40,6 +40,9 @@
  *              sriov_vi_flexible=<N[optional]> \
  *              sriov_max_vi_per_vf=<N[optional]> \
  *              sriov_max_vq_per_vf=<N[optional]> \
+ *              atomic.dn=<on|off[optional]>, \
+ *              atomic.awun<N[optional]>, \
+ *              atomic.awupf<N[optional]>, \
  *              subsys=<subsys_id>
  *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
  *              zoned=<true|false[optional]>, \
@@ -254,6 +257,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
     [NVME_ERROR_RECOVERY]           = NVME_FEAT_CAP_CHANGE | NVME_FEAT_CAP_NS,
     [NVME_VOLATILE_WRITE_CACHE]     = NVME_FEAT_CAP_CHANGE,
     [NVME_NUMBER_OF_QUEUES]         = NVME_FEAT_CAP_CHANGE,
+    [NVME_WRITE_ATOMICITY]          = NVME_FEAT_CAP_CHANGE,
     [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
     [NVME_TIMESTAMP]                = NVME_FEAT_CAP_CHANGE,
     [NVME_HOST_BEHAVIOR_SUPPORT]    = NVME_FEAT_CAP_CHANGE,
@@ -6293,8 +6297,10 @@ defaults:
         if (ret) {
             return ret;
         }
-        goto out;
+        break;
 
+    case NVME_WRITE_ATOMICITY:
+        result = n->dn;
         break;
     default:
         result = nvme_feature_default[fid];
@@ -6378,6 +6384,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
     uint8_t save = NVME_SETFEAT_SAVE(dw10);
     uint16_t status;
     int i;
+    NvmeIdCtrl *id = &n->id_ctrl;
+    NvmeAtomic *atomic = &n->atomic;
 
     trace_pci_nvme_setfeat(nvme_cid(req), nsid, fid, save, dw11);
 
@@ -6530,6 +6538,22 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
         return NVME_CMD_SEQ_ERROR | NVME_DNR;
     case NVME_FDP_EVENTS:
         return nvme_set_feature_fdp_events(n, ns, req);
+    case NVME_WRITE_ATOMICITY:
+
+        n->dn = 0x1 & dw11;
+
+        if (n->dn) {
+            atomic->atomic_max_write_size = id->awupf + 1;
+        } else {
+            atomic->atomic_max_write_size = id->awun + 1;
+        }
+
+        if (atomic->atomic_max_write_size == 1) {
+            atomic->atomic_writes = 0;
+        } else {
+            atomic->atomic_writes = 1;
+        }
+        break;
     default:
         return NVME_FEAT_NOT_CHANGEABLE | NVME_DNR;
     }
@@ -7227,6 +7251,80 @@ static void nvme_update_sq_tail(NvmeSQueue *sq)
     trace_pci_nvme_update_sq_tail(sq->sqid, sq->tail);
 }
 
+#define NVME_ATOMIC_NO_START        0
+#define NVME_ATOMIC_START_ATOMIC    1
+#define NVME_ATOMIC_START_NONATOMIC 2
+
+static int nvme_atomic_write_check(NvmeCtrl *n, NvmeCmd *cmd,
+    NvmeAtomic *atomic)
+{
+    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
+    uint64_t slba = le64_to_cpu(rw->slba);
+    uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb);
+    uint64_t elba = slba + nlb;
+    bool cmd_atomic_wr = true;
+    int i;
+
+    if ((cmd->opcode == NVME_CMD_READ) || ((cmd->opcode == NVME_CMD_WRITE) &&
+        ((rw->nlb + 1) > atomic->atomic_max_write_size))) {
+        cmd_atomic_wr = false;
+    }
+
+    /*
+     * Walk the queues to see if there are any atomic conflicts.
+     */
+    for (i = 1; i < n->params.max_ioqpairs + 1; i++) {
+        NvmeSQueue *sq;
+        NvmeRequest *req;
+        NvmeRwCmd *req_rw;
+        uint64_t req_slba;
+        uint32_t req_nlb;
+        uint64_t req_elba;
+
+        sq = n->sq[i];
+        if (!sq) {
+            break;
+        }
+
+        /*
+         * Walk all the requests on a given queue.
+         */
+        QTAILQ_FOREACH(req, &sq->out_req_list, entry) {
+            req_rw = (NvmeRwCmd *)&req->cmd;
+
+            if (((req_rw->opcode == NVME_CMD_WRITE) || (req_rw->opcode == NVME_CMD_READ)) &&
+                (cmd->nsid == req->ns->params.nsid)) {
+                req_slba = le64_to_cpu(req_rw->slba);
+                req_nlb = (uint32_t)le16_to_cpu(req_rw->nlb);
+                req_elba = req_slba + req_nlb;
+
+                if (cmd_atomic_wr) {
+                    if ((elba >= req_slba) && (slba <= req_elba)) {
+                        return NVME_ATOMIC_NO_START;
+                    }
+                } else {
+                    if (req->atomic_write && ((elba >= req_slba) &&
+                        (slba <= req_elba))) {
+                        return NVME_ATOMIC_NO_START;
+                    }
+                }
+            }
+        }
+    }
+    if (cmd_atomic_wr) {
+        return NVME_ATOMIC_START_ATOMIC;
+    }
+    return NVME_ATOMIC_START_NONATOMIC;
+}
+
+static NvmeAtomic *nvme_get_atomic(NvmeCtrl *n, NvmeCmd *cmd)
+{
+    if (n->atomic.atomic_writes) {
+        return &n->atomic;
+    }
+    return NULL;
+}
+
 static void nvme_process_sq(void *opaque)
 {
     NvmeSQueue *sq = opaque;
@@ -7243,6 +7341,9 @@ static void nvme_process_sq(void *opaque)
     }
 
     while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
+        NvmeAtomic *atomic;
+        bool cmd_is_atomic;
+
         addr = sq->dma_addr + (sq->head << NVME_SQES);
         if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
             trace_pci_nvme_err_addr_read(addr);
@@ -7250,6 +7351,28 @@ static void nvme_process_sq(void *opaque)
             stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
             break;
         }
+
+        atomic = nvme_get_atomic(n, &cmd);
+
+        cmd_is_atomic = false;
+        if (sq->sqid && atomic) {
+            int ret;
+
+            qemu_mutex_lock(&atomic->atomic_lock);
+            ret = nvme_atomic_write_check(n, &cmd, atomic);
+            switch (ret) {
+            case NVME_ATOMIC_NO_START:
+                qemu_bh_schedule(sq->bh);
+                qemu_mutex_unlock(&atomic->atomic_lock);
+                return;
+            case NVME_ATOMIC_START_ATOMIC:
+                cmd_is_atomic = true;
+                break;
+            case NVME_ATOMIC_START_NONATOMIC:
+            default:
+                break;
+            }
+        }
         nvme_inc_sq_head(sq);
 
         req = QTAILQ_FIRST(&sq->req_list);
@@ -7259,6 +7382,11 @@ static void nvme_process_sq(void *opaque)
         req->cqe.cid = cmd.cid;
         memcpy(&req->cmd, &cmd, sizeof(NvmeCmd));
 
+        if (sq->sqid && atomic) {
+            req->atomic_write = cmd_is_atomic;
+            qemu_mutex_unlock(&atomic->atomic_lock);
+        }
+
         status = sq->sqid ? nvme_io_cmd(n, req) :
             nvme_admin_cmd(n, req);
         if (status != NVME_NO_COMPLETE) {
@@ -7362,6 +7490,8 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
     n->outstanding_aers = 0;
     n->qs_created = false;
 
+    n->dn = n->params.atomic_dn; /* Set Disable Normal */
+
     nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
     if (pci_is_vf(pci_dev)) {
@@ -8138,6 +8268,8 @@ static void nvme_init_state(NvmeCtrl *n)
     NvmeSecCtrlEntry *list = n->sec_ctrl_list;
     NvmeSecCtrlEntry *sctrl;
     PCIDevice *pci = PCI_DEVICE(n);
+    NvmeAtomic *atomic = &n->atomic;
+    NvmeIdCtrl *id = &n->id_ctrl;
     uint8_t max_vfs;
     int i;
 
@@ -8195,6 +8327,31 @@ static void nvme_init_state(NvmeCtrl *n)
                         cpu_to_le16(n->params.sriov_max_vi_per_vf) :
                         cap->vifrt / MAX(max_vfs, 1);
     }
+
+    /* Atomic Write */
+    id->awun = n->params.atomic_awun;
+    id->awupf = n->params.atomic_awupf;
+    n->dn = n->params.atomic_dn;
+
+    qemu_mutex_init(&atomic->atomic_lock);
+
+    if (id->awun || id->awupf) {
+        if (id->awupf > id->awun) {
+            id->awupf = 0;
+        }
+
+        if (n->dn) {
+            atomic->atomic_max_write_size = id->awupf + 1;
+        } else {
+            atomic->atomic_max_write_size = id->awun + 1;
+        }
+
+        if (atomic->atomic_max_write_size == 1) {
+            atomic->atomic_writes = 0;
+        } else {
+            atomic->atomic_writes = 1;
+        }
+    }
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -8675,6 +8832,8 @@ static void nvme_exit(PCIDevice *pci_dev)
         nvme_subsys_unregister_ctrl(n->subsys, n);
     }
 
+    qemu_mutex_destroy(&n->atomic.atomic_lock);
+
     g_free(n->cq);
     g_free(n->sq);
     g_free(n->aer_reqs);
@@ -8734,6 +8893,9 @@ static Property nvme_props[] = {
                      false),
     DEFINE_PROP_UINT16("mqes", NvmeCtrl, params.mqes, 0x7ff),
     DEFINE_PROP_UINT16("spdm_port", PCIDevice, spdm_port, 0),
+    DEFINE_PROP_BOOL("atomic.dn", NvmeCtrl, params.atomic_dn, 0),
+    DEFINE_PROP_UINT16("atomic.awun", NvmeCtrl, params.atomic_awun, 0),
+    DEFINE_PROP_UINT16("atomic.awupf", NvmeCtrl, params.atomic_awupf, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 781985754d0d..4d8582e6f2a5 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -220,6 +220,12 @@ typedef struct NvmeNamespaceParams {
     } fdp;
 } NvmeNamespaceParams;
 
+typedef struct NvmeAtomic {
+    uint32_t    atomic_max_write_size;
+    QemuMutex   atomic_lock;
+    bool        atomic_writes;
+} NvmeAtomic;
+
 typedef struct NvmeNamespace {
     DeviceState  parent_obj;
     BlockConf    blkconf;
@@ -421,6 +427,7 @@ typedef struct NvmeRequest {
     NvmeCmd                 cmd;
     BlockAcctCookie         acct;
     NvmeSg                  sg;
+    bool                    atomic_write;
     QTAILQ_ENTRY(NvmeRequest)entry;
 } NvmeRequest;
 
@@ -538,6 +545,9 @@ typedef struct NvmeParams {
     uint32_t  sriov_max_vq_per_vf;
     uint32_t  sriov_max_vi_per_vf;
     bool     msix_exclusive_bar;
+    uint16_t atomic_awun;
+    uint16_t atomic_awupf;
+    bool     atomic_dn;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
@@ -619,6 +629,8 @@ typedef struct NvmeCtrl {
         uint16_t    vqrfap;
         uint16_t    virfap;
     } next_pri_ctrl_cap;    /* These override pri_ctrl_cap after reset */
+    uint32_t    dn; /* Disable Normal */
+    NvmeAtomic  atomic;
 } NvmeCtrl;
 
 typedef enum NvmeResetType {
-- 
2.43.5




* Re: [PATCH v2 1/1] hw/nvme: add atomic write support
  2024-09-20  0:07 ` [PATCH v2 1/1] " Alan Adamson
@ 2024-09-24 12:15   ` Klaus Jensen
  2024-09-26 17:21     ` alan.adamson
  0 siblings, 1 reply; 4+ messages in thread
From: Klaus Jensen @ 2024-09-24 12:15 UTC (permalink / raw)
  To: Alan Adamson; +Cc: qemu-devel, kbusch, qemu-block

On Sep 19 17:07, Alan Adamson wrote:
> Adds support for the controller atomic parameters: AWUN and AWUPF. Atomic
> Compare and Write Unit (ACWU) is not currently supported.
> 
> Writes that adhere to the ACWU and AWUPF parameters are guaranteed to be atomic.
> 
> New NVMe QEMU Parameters (See NVMe Specification for details):
>        atomic.dn (default off) - Set the value of Disable Normal.
>        atomic.awun=UINT16 (default: 0)
>        atomic.awupf=UINT16 (default: 0)
> 
> By default (Disable Normal set to zero), the maximum atomic write size is
> set to the AWUN value.  If Disable Normal is set, the maximum atomic write
> size is set to AWUPF.
> 
> Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/nvme/ctrl.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/nvme/nvme.h |  12 ++++
>  2 files changed, 175 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 9e94a2405407..0af46c57ee86 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -40,6 +40,9 @@
>   *              sriov_vi_flexible=<N[optional]> \
>   *              sriov_max_vi_per_vf=<N[optional]> \
>   *              sriov_max_vq_per_vf=<N[optional]> \
> + *              atomic.dn=<on|off[optional]>, \
> + *              atomic.awun<N[optional]>, \
> + *              atomic.awupf<N[optional]>, \
>   *              subsys=<subsys_id>
>   *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
>   *              zoned=<true|false[optional]>, \
> @@ -254,6 +257,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
>      [NVME_ERROR_RECOVERY]           = NVME_FEAT_CAP_CHANGE | NVME_FEAT_CAP_NS,
>      [NVME_VOLATILE_WRITE_CACHE]     = NVME_FEAT_CAP_CHANGE,
>      [NVME_NUMBER_OF_QUEUES]         = NVME_FEAT_CAP_CHANGE,
> +    [NVME_WRITE_ATOMICITY]          = NVME_FEAT_CAP_CHANGE,
>      [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
>      [NVME_TIMESTAMP]                = NVME_FEAT_CAP_CHANGE,
>      [NVME_HOST_BEHAVIOR_SUPPORT]    = NVME_FEAT_CAP_CHANGE,
> @@ -6293,8 +6297,10 @@ defaults:
>          if (ret) {
>              return ret;
>          }
> -        goto out;
> +        break;
>  
> +    case NVME_WRITE_ATOMICITY:
> +        result = n->dn;
>          break;
>      default:
>          result = nvme_feature_default[fid];
> @@ -6378,6 +6384,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
>      uint8_t save = NVME_SETFEAT_SAVE(dw10);
>      uint16_t status;
>      int i;
> +    NvmeIdCtrl *id = &n->id_ctrl;
> +    NvmeAtomic *atomic = &n->atomic;
>  
>      trace_pci_nvme_setfeat(nvme_cid(req), nsid, fid, save, dw11);
>  
> @@ -6530,6 +6538,22 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
>          return NVME_CMD_SEQ_ERROR | NVME_DNR;
>      case NVME_FDP_EVENTS:
>          return nvme_set_feature_fdp_events(n, ns, req);
> +    case NVME_WRITE_ATOMICITY:
> +
> +        n->dn = 0x1 & dw11;
> +
> +        if (n->dn) {
> +            atomic->atomic_max_write_size = id->awupf + 1;
> +        } else {
> +            atomic->atomic_max_write_size = id->awun + 1;
> +        }

le16_to_cpu()'s needed here.

> +
> +        if (atomic->atomic_max_write_size == 1) {
> +            atomic->atomic_writes = 0;
> +        } else {
> +            atomic->atomic_writes = 1;
> +        }
> +        break;
>      default:
>          return NVME_FEAT_NOT_CHANGEABLE | NVME_DNR;
>      }
> @@ -7227,6 +7251,80 @@ static void nvme_update_sq_tail(NvmeSQueue *sq)
>      trace_pci_nvme_update_sq_tail(sq->sqid, sq->tail);
>  }
>  
> +#define NVME_ATOMIC_NO_START        0
> +#define NVME_ATOMIC_START_ATOMIC    1
> +#define NVME_ATOMIC_START_NONATOMIC 2
> +
> +static int nvme_atomic_write_check(NvmeCtrl *n, NvmeCmd *cmd,
> +    NvmeAtomic *atomic)
> +{
> +    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
> +    uint64_t slba = le64_to_cpu(rw->slba);
> +    uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb);
> +    uint64_t elba = slba + nlb;
> +    bool cmd_atomic_wr = true;
> +    int i;
> +
> +    if ((cmd->opcode == NVME_CMD_READ) || ((cmd->opcode == NVME_CMD_WRITE) &&
> +        ((rw->nlb + 1) > atomic->atomic_max_write_size))) {
> +        cmd_atomic_wr = false;
> +    }
> +
> +    /*
> +     * Walk the queues to see if there are any atomic conflicts.
> +     */
> +    for (i = 1; i < n->params.max_ioqpairs + 1; i++) {
> +        NvmeSQueue *sq;
> +        NvmeRequest *req;
> +        NvmeRwCmd *req_rw;
> +        uint64_t req_slba;
> +        uint32_t req_nlb;
> +        uint64_t req_elba;
> +
> +        sq = n->sq[i];
> +        if (!sq) {
> +            break;

This needs to be a `continue`.

> +        }
> +
> +        /*
> +         * Walk all the requests on a given queue.
> +         */
> +        QTAILQ_FOREACH(req, &sq->out_req_list, entry) {
> +            req_rw = (NvmeRwCmd *)&req->cmd;
> +
> +            if (((req_rw->opcode == NVME_CMD_WRITE) || (req_rw->opcode == NVME_CMD_READ)) &&
> +                (cmd->nsid == req->ns->params.nsid)) {
> +                req_slba = le64_to_cpu(req_rw->slba);
> +                req_nlb = (uint32_t)le16_to_cpu(req_rw->nlb);
> +                req_elba = req_slba + req_nlb;
> +
> +                if (cmd_atomic_wr) {
> +                    if ((elba >= req_slba) && (slba <= req_elba)) {
> +                        return NVME_ATOMIC_NO_START;
> +                    }
> +                } else {
> +                    if (req->atomic_write && ((elba >= req_slba) &&
> +                        (slba <= req_elba))) {
> +                        return NVME_ATOMIC_NO_START;
> +                    }
> +                }
> +            }
> +        }
> +    }
> +    if (cmd_atomic_wr) {
> +        return NVME_ATOMIC_START_ATOMIC;
> +    }
> +    return NVME_ATOMIC_START_NONATOMIC;
> +}
> +
> +static NvmeAtomic *nvme_get_atomic(NvmeCtrl *n, NvmeCmd *cmd)
> +{
> +    if (n->atomic.atomic_writes) {
> +        return &n->atomic;
> +    }
> +    return NULL;
> +}
> +
>  static void nvme_process_sq(void *opaque)
>  {
>      NvmeSQueue *sq = opaque;
> @@ -7243,6 +7341,9 @@ static void nvme_process_sq(void *opaque)
>      }
>  
>      while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
> +        NvmeAtomic *atomic;
> +        bool cmd_is_atomic;
> +
>          addr = sq->dma_addr + (sq->head << NVME_SQES);
>          if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
>              trace_pci_nvme_err_addr_read(addr);
> @@ -7250,6 +7351,28 @@ static void nvme_process_sq(void *opaque)
>              stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
>              break;
>          }
> +
> +        atomic = nvme_get_atomic(n, &cmd);
> +
> +        cmd_is_atomic = false;
> +        if (sq->sqid && atomic) {
> +            int ret;
> +
> +            qemu_mutex_lock(&atomic->atomic_lock);

I don't think this needs to be protected by a lock. The nvme emulation
is running in the main loop, so a Set Feature cannot be processed at the
same time as this. I think that is what we are expecting to guard
against?

If I/O queues were processed from an iothread, this would be needed, but
then we also need to take the lock when processing the feature and a
bunch of other stuff might become more complicated.

For now, I think it can just be dropped since if we enable the user to
attach an iothread, my intention is to reduce such complexity by
disabling all the "faked" features of the device.

> +            ret = nvme_atomic_write_check(n, &cmd, atomic);
> +            switch (ret) {
> +            case NVME_ATOMIC_NO_START:
> +                qemu_bh_schedule(sq->bh);
> +                qemu_mutex_unlock(&atomic->atomic_lock);
> +                return;
> +            case NVME_ATOMIC_START_ATOMIC:
> +                cmd_is_atomic = true;
> +                break;
> +            case NVME_ATOMIC_START_NONATOMIC:
> +            default:
> +                break;
> +            }
> +        }
>          nvme_inc_sq_head(sq);
>  
>          req = QTAILQ_FIRST(&sq->req_list);
> @@ -7259,6 +7382,11 @@ static void nvme_process_sq(void *opaque)
>          req->cqe.cid = cmd.cid;
>          memcpy(&req->cmd, &cmd, sizeof(NvmeCmd));
>  
> +        if (sq->sqid && atomic) {
> +            req->atomic_write = cmd_is_atomic;
> +            qemu_mutex_unlock(&atomic->atomic_lock);
> +        }
> +
>          status = sq->sqid ? nvme_io_cmd(n, req) :
>              nvme_admin_cmd(n, req);
>          if (status != NVME_NO_COMPLETE) {
> @@ -7362,6 +7490,8 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
>      n->outstanding_aers = 0;
>      n->qs_created = false;
>  
> +    n->dn = n->params.atomic_dn; /* Set Disable Normal */
> +
>      nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
>  
>      if (pci_is_vf(pci_dev)) {
> @@ -8138,6 +8268,8 @@ static void nvme_init_state(NvmeCtrl *n)
>      NvmeSecCtrlEntry *list = n->sec_ctrl_list;
>      NvmeSecCtrlEntry *sctrl;
>      PCIDevice *pci = PCI_DEVICE(n);
> +    NvmeAtomic *atomic = &n->atomic;
> +    NvmeIdCtrl *id = &n->id_ctrl;
>      uint8_t max_vfs;
>      int i;
>  
> @@ -8195,6 +8327,31 @@ static void nvme_init_state(NvmeCtrl *n)
>                          cpu_to_le16(n->params.sriov_max_vi_per_vf) :
>                          cap->vifrt / MAX(max_vfs, 1);
>      }
> +
> +    /* Atomic Write */
> +    id->awun = n->params.atomic_awun;
> +    id->awupf = n->params.atomic_awupf;

This is missing cpu_to_le16()'s.

> +    n->dn = n->params.atomic_dn;
> +
> +    qemu_mutex_init(&atomic->atomic_lock);
> +
> +    if (id->awun || id->awupf) {
> +        if (id->awupf > id->awun) {
> +            id->awupf = 0;
> +        }
> +
> +        if (n->dn) {
> +            atomic->atomic_max_write_size = id->awupf + 1;
> +        } else {
> +            atomic->atomic_max_write_size = id->awun + 1;
> +        }
> +
> +        if (atomic->atomic_max_write_size == 1) {
> +            atomic->atomic_writes = 0;
> +        } else {
> +            atomic->atomic_writes = 1;
> +        }
> +    }
>  }
>  
>  static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
> @@ -8675,6 +8832,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>          nvme_subsys_unregister_ctrl(n->subsys, n);
>      }
>  
> +    qemu_mutex_destroy(&n->atomic.atomic_lock);
> +
>      g_free(n->cq);
>      g_free(n->sq);
>      g_free(n->aer_reqs);
> @@ -8734,6 +8893,9 @@ static Property nvme_props[] = {
>                       false),
>      DEFINE_PROP_UINT16("mqes", NvmeCtrl, params.mqes, 0x7ff),
>      DEFINE_PROP_UINT16("spdm_port", PCIDevice, spdm_port, 0),
> +    DEFINE_PROP_BOOL("atomic.dn", NvmeCtrl, params.atomic_dn, 0),
> +    DEFINE_PROP_UINT16("atomic.awun", NvmeCtrl, params.atomic_awun, 0),
> +    DEFINE_PROP_UINT16("atomic.awupf", NvmeCtrl, params.atomic_awupf, 0),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> index 781985754d0d..4d8582e6f2a5 100644
> --- a/hw/nvme/nvme.h
> +++ b/hw/nvme/nvme.h
> @@ -220,6 +220,12 @@ typedef struct NvmeNamespaceParams {
>      } fdp;
>  } NvmeNamespaceParams;
>  
> +typedef struct NvmeAtomic {
> +    uint32_t    atomic_max_write_size;
> +    QemuMutex   atomic_lock;
> +    bool        atomic_writes;
> +} NvmeAtomic;
> +
>  typedef struct NvmeNamespace {
>      DeviceState  parent_obj;
>      BlockConf    blkconf;
> @@ -421,6 +427,7 @@ typedef struct NvmeRequest {
>      NvmeCmd                 cmd;
>      BlockAcctCookie         acct;
>      NvmeSg                  sg;
> +    bool                    atomic_write;
>      QTAILQ_ENTRY(NvmeRequest)entry;
>  } NvmeRequest;
>  
> @@ -538,6 +545,9 @@ typedef struct NvmeParams {
>      uint32_t  sriov_max_vq_per_vf;
>      uint32_t  sriov_max_vi_per_vf;
>      bool     msix_exclusive_bar;
> +    uint16_t atomic_awun;
> +    uint16_t atomic_awupf;
> +    bool     atomic_dn;
>  } NvmeParams;
>  
>  typedef struct NvmeCtrl {
> @@ -619,6 +629,8 @@ typedef struct NvmeCtrl {
>          uint16_t    vqrfap;
>          uint16_t    virfap;
>      } next_pri_ctrl_cap;    /* These override pri_ctrl_cap after reset */
> +    uint32_t    dn; /* Disable Normal */
> +    NvmeAtomic  atomic;
>  } NvmeCtrl;
>  
>  typedef enum NvmeResetType {
> -- 
> 2.43.5
> 
> 



* Re: [PATCH v2 1/1] hw/nvme: add atomic write support
  2024-09-24 12:15   ` Klaus Jensen
@ 2024-09-26 17:21     ` alan.adamson
  0 siblings, 0 replies; 4+ messages in thread
From: alan.adamson @ 2024-09-26 17:21 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: qemu-devel, kbusch, qemu-block


On 9/24/24 5:15 AM, Klaus Jensen wrote:
> On Sep 19 17:07, Alan Adamson wrote:
>> Adds support for the controller atomic parameters: AWUN and AWUPF. Atomic
>> Compare and Write Unit (ACWU) is not currently supported.
>>
>> Writes that adhere to the ACWU and AWUPF parameters are guaranteed to be atomic.
>>
>> New NVMe QEMU Parameters (See NVMe Specification for details):
>>         atomic.dn (default off) - Set the value of Disable Normal.
>>         atomic.awun=UINT16 (default: 0)
>>         atomic.awupf=UINT16 (default: 0)
>>
>> By default (Disable Normal set to zero), the maximum atomic write size is
>> set to the AWUN value.  If Disable Normal is set, the maximum atomic write
>> size is set to AWUPF.
>>
>> Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
>> Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
>> ---
>>   hw/nvme/ctrl.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   hw/nvme/nvme.h |  12 ++++
>>   2 files changed, 175 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>> index 9e94a2405407..0af46c57ee86 100644
>> --- a/hw/nvme/ctrl.c
>> +++ b/hw/nvme/ctrl.c
>> @@ -40,6 +40,9 @@
>>    *              sriov_vi_flexible=<N[optional]> \
>>    *              sriov_max_vi_per_vf=<N[optional]> \
>>    *              sriov_max_vq_per_vf=<N[optional]> \
>> + *              atomic.dn=<on|off[optional]>, \
>> + *              atomic.awun=<N[optional]>, \
>> + *              atomic.awupf=<N[optional]>, \
>>    *              subsys=<subsys_id>
>>    *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
>>    *              zoned=<true|false[optional]>, \
>> @@ -254,6 +257,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
>>       [NVME_ERROR_RECOVERY]           = NVME_FEAT_CAP_CHANGE | NVME_FEAT_CAP_NS,
>>       [NVME_VOLATILE_WRITE_CACHE]     = NVME_FEAT_CAP_CHANGE,
>>       [NVME_NUMBER_OF_QUEUES]         = NVME_FEAT_CAP_CHANGE,
>> +    [NVME_WRITE_ATOMICITY]          = NVME_FEAT_CAP_CHANGE,
>>       [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
>>       [NVME_TIMESTAMP]                = NVME_FEAT_CAP_CHANGE,
>>       [NVME_HOST_BEHAVIOR_SUPPORT]    = NVME_FEAT_CAP_CHANGE,
>> @@ -6293,8 +6297,10 @@ defaults:
>>           if (ret) {
>>               return ret;
>>           }
>> -        goto out;
>> +        break;
>>   
>> +    case NVME_WRITE_ATOMICITY:
>> +        result = n->dn;
>>           break;
>>       default:
>>           result = nvme_feature_default[fid];
>> @@ -6378,6 +6384,8 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
>>       uint8_t save = NVME_SETFEAT_SAVE(dw10);
>>       uint16_t status;
>>       int i;
>> +    NvmeIdCtrl *id = &n->id_ctrl;
>> +    NvmeAtomic *atomic = &n->atomic;
>>   
>>       trace_pci_nvme_setfeat(nvme_cid(req), nsid, fid, save, dw11);
>>   
>> @@ -6530,6 +6538,22 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
>>           return NVME_CMD_SEQ_ERROR | NVME_DNR;
>>       case NVME_FDP_EVENTS:
>>           return nvme_set_feature_fdp_events(n, ns, req);
>> +    case NVME_WRITE_ATOMICITY:
>> +
>> +        n->dn = 0x1 & dw11;
>> +
>> +        if (n->dn) {
>> +            atomic->atomic_max_write_size = id->awupf + 1;
>> +        } else {
>> +            atomic->atomic_max_write_size = id->awun + 1;
>> +        }
> le16_to_cpu()'s needed here.
>
>> +
>> +        if (atomic->atomic_max_write_size == 1) {
>> +            atomic->atomic_writes = 0;
>> +        } else {
>> +            atomic->atomic_writes = 1;
>> +        }
>> +        break;
>>       default:
>>           return NVME_FEAT_NOT_CHANGEABLE | NVME_DNR;
>>       }
>> @@ -7227,6 +7251,80 @@ static void nvme_update_sq_tail(NvmeSQueue *sq)
>>       trace_pci_nvme_update_sq_tail(sq->sqid, sq->tail);
>>   }
>>   
>> +#define NVME_ATOMIC_NO_START        0
>> +#define NVME_ATOMIC_START_ATOMIC    1
>> +#define NVME_ATOMIC_START_NONATOMIC 2
>> +
>> +static int nvme_atomic_write_check(NvmeCtrl *n, NvmeCmd *cmd,
>> +    NvmeAtomic *atomic)
>> +{
>> +    NvmeRwCmd *rw = (NvmeRwCmd *)cmd;
>> +    uint64_t slba = le64_to_cpu(rw->slba);
>> +    uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb);
>> +    uint64_t elba = slba + nlb;
>> +    bool cmd_atomic_wr = true;
>> +    int i;
>> +
>> +    if ((cmd->opcode == NVME_CMD_READ) || ((cmd->opcode == NVME_CMD_WRITE) &&
>> +        ((rw->nlb + 1) > atomic->atomic_max_write_size))) {
>> +        cmd_atomic_wr = false;
>> +    }
>> +
>> +    /*
>> +     * Walk the queues to see if there are any atomic conflicts.
>> +     */
>> +    for (i = 1; i < n->params.max_ioqpairs + 1; i++) {
>> +        NvmeSQueue *sq;
>> +        NvmeRequest *req;
>> +        NvmeRwCmd *req_rw;
>> +        uint64_t req_slba;
>> +        uint32_t req_nlb;
>> +        uint64_t req_elba;
>> +
>> +        sq = n->sq[i];
>> +        if (!sq) {
>> +            break;
> This needs to be a `continue`.
>
>> +        }
>> +
>> +        /*
>> +         * Walk all the requests on a given queue.
>> +         */
>> +        QTAILQ_FOREACH(req, &sq->out_req_list, entry) {
>> +            req_rw = (NvmeRwCmd *)&req->cmd;
>> +
>> +            if (((req_rw->opcode == NVME_CMD_WRITE) || (req_rw->opcode == NVME_CMD_READ)) &&
>> +                (cmd->nsid == req->ns->params.nsid)) {
>> +                req_slba = le64_to_cpu(req_rw->slba);
>> +                req_nlb = (uint32_t)le16_to_cpu(req_rw->nlb);
>> +                req_elba = req_slba + req_nlb;
>> +
>> +                if (cmd_atomic_wr) {
>> +                    if ((elba >= req_slba) && (slba <= req_elba)) {
>> +                        return NVME_ATOMIC_NO_START;
>> +                    }
>> +                } else {
>> +                    if (req->atomic_write && ((elba >= req_slba) &&
>> +                        (slba <= req_elba))) {
>> +                        return NVME_ATOMIC_NO_START;
>> +                    }
>> +                }
>> +            }
>> +        }
>> +    }
>> +    if (cmd_atomic_wr) {
>> +        return NVME_ATOMIC_START_ATOMIC;
>> +    }
>> +    return NVME_ATOMIC_START_NONATOMIC;
>> +}
>> +
>> +static NvmeAtomic *nvme_get_atomic(NvmeCtrl *n, NvmeCmd *cmd)
>> +{
>> +    if (n->atomic.atomic_writes) {
>> +        return &n->atomic;
>> +    }
>> +    return NULL;
>> +}
>> +
>>   static void nvme_process_sq(void *opaque)
>>   {
>>       NvmeSQueue *sq = opaque;
>> @@ -7243,6 +7341,9 @@ static void nvme_process_sq(void *opaque)
>>       }
>>   
>>       while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
>> +        NvmeAtomic *atomic;
>> +        bool cmd_is_atomic;
>> +
>>           addr = sq->dma_addr + (sq->head << NVME_SQES);
>>           if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
>>               trace_pci_nvme_err_addr_read(addr);
>> @@ -7250,6 +7351,28 @@ static void nvme_process_sq(void *opaque)
>>               stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
>>               break;
>>           }
>> +
>> +        atomic = nvme_get_atomic(n, &cmd);
>> +
>> +        cmd_is_atomic = false;
>> +        if (sq->sqid && atomic) {
>> +            int ret;
>> +
>> +            qemu_mutex_lock(&atomic->atomic_lock);
> I don't think this needs to be protected by a lock. The nvme emulation
> is running in the main loop, so a Set Feature cannot be processed at the
> same time as this. I think that is what we are expecting to guard
> against?
>
> If I/O queues were processed from an iothread, this would be needed, but
> then we also need to take the lock when processing the feature and a
> bunch of other stuff might become more complicated.
>
> For now, I think it can just be dropped since if we enable the user to
> attach an iothread, my intention is to reduce such complexity by
> disabling all the "faked" features of the device.

I verified that removing the locks doesn't cause any issues.  I'll include
this and the other requested changes in v3.
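
For reference, here is a rough, standalone sketch of how the two other
review comments above (the le16_to_cpu() conversions and the `continue`
when a queue slot is empty) fit together.  It is not the v3 code: every
sketch_* name and the Sketch* types are hypothetical stand-ins for the
QEMU helpers and structures, and the only things taken from the patch are
the "+ 1" on the 0's based AWUN/AWUPF fields and the inclusive range
comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's le16_to_cpu(): decode a little-endian
 * 16-bit field from its raw bytes, independent of host byte order. */
static uint16_t sketch_le16_to_cpu(const uint8_t raw[2])
{
    return (uint16_t)(raw[0] | (raw[1] << 8));
}

/* AWUN and AWUPF are 0's based, so the limit in blocks is field + 1.
 * Disable Normal (dn) selects AWUPF, otherwise AWUN applies. */
static uint32_t sketch_atomic_max_write_size(bool dn,
                                             const uint8_t awun_le[2],
                                             const uint8_t awupf_le[2])
{
    uint16_t field = dn ? sketch_le16_to_cpu(awupf_le)
                        : sketch_le16_to_cpu(awun_le);
    return (uint32_t)field + 1;
}

/* Hypothetical in-flight request: an inclusive LBA range [slba, elba]. */
typedef struct SketchReq {
    uint64_t slba;
    uint64_t elba;
} SketchReq;

/* Hypothetical submission queue: just its outstanding requests. */
typedef struct SketchQueue {
    const SketchReq *reqs;
    size_t nreqs;
} SketchQueue;

/* Inclusive overlap test, the same comparison the patch uses. */
static bool sketch_lba_overlap(uint64_t slba, uint64_t elba,
                               const SketchReq *req)
{
    return elba >= req->slba && slba <= req->elba;
}

/* Walk every queue slot.  An empty slot is skipped with `continue`,
 * because a later slot may still hold a live queue; `break` would stop
 * the scan early and miss its requests. */
static bool sketch_has_conflict(SketchQueue *const *sqs, size_t nsqs,
                                uint64_t slba, uint64_t elba)
{
    for (size_t i = 0; i < nsqs; i++) {
        const SketchQueue *sq = sqs[i];

        if (!sq) {
            continue;
        }
        for (size_t j = 0; j < sq->nreqs; j++) {
            if (sketch_lba_overlap(slba, elba, &sq->reqs[j])) {
                return true;
            }
        }
    }
    return false;
}
```

With atomic.awun=15 and atomic.awupf=7 as in the cover letter example,
this gives a limit of 16 blocks with Disable Normal clear and 8 blocks
with it set.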

What's the plan for iothread support: a single thread per controller
(for all queues), or an iothread per queue?

Thanks,

Alan



end of thread, other threads:[~2024-09-26 17:22 UTC | newest]

Thread overview: 4+ messages
2024-09-20  0:07 [PATCH v2 0/1] hw/nvme: add atomic write support Alan Adamson
2024-09-20  0:07 ` [PATCH v2 1/1] " Alan Adamson
2024-09-24 12:15   ` Klaus Jensen
2024-09-26 17:21     ` alan.adamson