* [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-08 15:07 ` Minwoo Im
2021-02-07 21:49 ` [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes Klaus Jensen
` (6 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
The 'len' member of the nvme_compare_ctx struct is redundant since the
same information is available in the 'iov' member.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
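A quick illustration of why 'len' is redundant (not part of the patch;
'bounce' and 'data_len' stand in for the values used in nvme_compare()):
QEMUIOVector keeps the total number of bytes added to it in its 'size'
member, so the context can use that directly.

QEMUIOVector iov;

qemu_iovec_init(&iov, 1);
qemu_iovec_add(&iov, bounce, data_len);

assert(iov.size == data_len); /* same information as the old ctx->len */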
hw/block/nvme.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b84e34843f5..6b46925ddd18 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1656,7 +1656,6 @@ static void nvme_aio_copy_in_cb(void *opaque, int ret)
struct nvme_compare_ctx {
QEMUIOVector iov;
uint8_t *bounce;
- size_t len;
};
static void nvme_compare_cb(void *opaque, int ret)
@@ -1677,16 +1676,16 @@ static void nvme_compare_cb(void *opaque, int ret)
goto out;
}
- buf = g_malloc(ctx->len);
+ buf = g_malloc(ctx->iov.size);
- status = nvme_dma(nvme_ctrl(req), buf, ctx->len, DMA_DIRECTION_TO_DEVICE,
- req);
+ status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
+ DMA_DIRECTION_TO_DEVICE, req);
if (status) {
req->status = status;
goto out;
}
- if (memcmp(buf, ctx->bounce, ctx->len)) {
+ if (memcmp(buf, ctx->bounce, ctx->iov.size)) {
req->status = NVME_CMP_FAILURE;
}
@@ -1924,7 +1923,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
ctx = g_new(struct nvme_compare_ctx, 1);
ctx->bounce = bounce;
- ctx->len = len;
req->opaque = ctx;
--
2.30.0
* Re: [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context
2021-02-07 21:49 ` [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context Klaus Jensen
@ 2021-02-08 15:07 ` Minwoo Im
0 siblings, 0 replies; 14+ messages in thread
From: Minwoo Im @ 2021-02-08 15:07 UTC (permalink / raw)
To: Klaus Jensen
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, qemu-devel, Max Reitz, Stefan Hajnoczi,
Keith Busch
On 21-02-07 22:49:33, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
>
> The 'len' member of the nvme_compare_ctx struct is redundant since the
> same information is available in the 'iov' member.
>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
* [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-08 15:07 ` Minwoo Im
2021-02-07 21:49 ` [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing Klaus Jensen
` (5 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
A Write Zeroes command should not be counted in either the 'Data Units
Written' or the 'Host Write Commands' field of the SMART/Health Information Log page.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
hw/block/nvme.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b46925ddd18..e4a01cf9edc5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2088,7 +2088,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
nvme_rw_cb, req);
}
} else {
- block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE);
req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
req);
--
2.30.0
* Re: [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes
2021-02-07 21:49 ` [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes Klaus Jensen
@ 2021-02-08 15:07 ` Minwoo Im
0 siblings, 0 replies; 14+ messages in thread
From: Minwoo Im @ 2021-02-08 15:07 UTC (permalink / raw)
To: Klaus Jensen
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, qemu-devel, Max Reitz, Stefan Hajnoczi,
Keith Busch
On 21-02-07 22:49:34, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
>
> A Write Zeroes command should not be counted in either the 'Data Units
> Written' or the 'Host Write Commands' field of the SMART/Health Information Log page.
>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
* [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 1/8] hw/block/nvme: remove redundant len member in compare context Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 2/8] hw/block/nvme: remove block accounting for write zeroes Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-08 15:13 ` Minwoo Im
2021-02-07 21:49 ` [PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality Klaus Jensen
` (4 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
Fix missing sign inversion.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
hw/block/nvme.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e4a01cf9edc5..29902038d618 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1150,7 +1150,7 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
break;
}
- trace_pci_nvme_err_aio(nvme_cid(req), strerror(ret), status);
+ trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
error_setg_errno(&local_err, -ret, "aio failed");
error_report_err(local_err);
--
2.30.0
* Re: [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing
2021-02-07 21:49 ` [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing Klaus Jensen
@ 2021-02-08 15:13 ` Minwoo Im
0 siblings, 0 replies; 14+ messages in thread
From: Minwoo Im @ 2021-02-08 15:13 UTC (permalink / raw)
To: Klaus Jensen
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, qemu-devel, Max Reitz, Stefan Hajnoczi,
Keith Busch
On 21-02-07 22:49:35, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
>
> Fix missing sign inversion.
>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
* [PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
` (2 preceding siblings ...)
2021-02-07 21:49 ` [PATCH RFC v2 3/8] hw/block/nvme: fix strerror printing Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-08 15:45 ` Keith Busch
2021-02-07 21:49 ` [PATCH RFC v2 5/8] hw/block/nvme: remove the req dependency in map functions Klaus Jensen
` (3 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
haunts all the memory-related functions.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
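As a rough sketch of the resulting NvmeSg lifecycle (illustrative only;
the helpers are the ones introduced below, 'addr' and 'len' are
placeholders):

NvmeSg sg;
uint16_t status;

nvme_sg_init(n, &sg);                      /* initializes both qsg and iov */

status = nvme_map_addr(n, &sg, addr, len); /* adds to sg.qsg for host memory,
                                            * to sg.iov for CMB/PMR */
if (status) {
    nvme_sg_unmap(&sg);
    return status;
}

/* ... dma_blk_read()/blk_aio_preadv() via nvme_blk_read(), etc. ... */

nvme_sg_unmap(&sg);                        /* on completion (nvme_post_cqes) */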
hw/block/nvme.h | 8 ++-
hw/block/nvme.c | 171 ++++++++++++++++++++++++------------------------
2 files changed, 90 insertions(+), 89 deletions(-)
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index cb2b5175f1a1..0e4fbd6990ad 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -29,6 +29,11 @@ typedef struct NvmeAsyncEvent {
NvmeAerResult result;
} NvmeAsyncEvent;
+typedef struct NvmeSg {
+ QEMUSGList qsg;
+ QEMUIOVector iov;
+} NvmeSg;
+
typedef struct NvmeRequest {
struct NvmeSQueue *sq;
struct NvmeNamespace *ns;
@@ -38,8 +43,7 @@ typedef struct NvmeRequest {
NvmeCqe cqe;
NvmeCmd cmd;
BlockAcctCookie acct;
- QEMUSGList qsg;
- QEMUIOVector iov;
+ NvmeSg sg;
QTAILQ_ENTRY(NvmeRequest)entry;
} NvmeRequest;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 29902038d618..a0009c057f1e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -428,14 +428,20 @@ static void nvme_req_clear(NvmeRequest *req)
req->status = NVME_SUCCESS;
}
-static void nvme_req_exit(NvmeRequest *req)
+static inline void nvme_sg_init(NvmeCtrl *n, NvmeSg *sg)
{
- if (req->qsg.sg) {
- qemu_sglist_destroy(&req->qsg);
+ pci_dma_sglist_init(&sg->qsg, &n->parent_obj, 0);
+ qemu_iovec_init(&sg->iov, 0);
+}
+
+static inline void nvme_sg_unmap(NvmeSg *sg)
+{
+ if (sg->qsg.sg) {
+ qemu_sglist_destroy(&sg->qsg);
}
- if (req->iov.iov) {
- qemu_iovec_destroy(&req->iov);
+ if (sg->iov.iov) {
+ qemu_iovec_destroy(&sg->iov);
}
}
@@ -473,8 +479,7 @@ static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
return NVME_SUCCESS;
}
-static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
- hwaddr addr, size_t len)
+static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len)
{
bool cmb = false, pmr = false;
@@ -491,34 +496,22 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
}
if (cmb || pmr) {
- if (qsg && qsg->sg) {
+ if (sg->qsg.nsg) {
return NVME_INVALID_USE_OF_CMB | NVME_DNR;
}
- assert(iov);
-
- if (!iov->iov) {
- qemu_iovec_init(iov, 1);
- }
-
if (cmb) {
- return nvme_map_addr_cmb(n, iov, addr, len);
+ return nvme_map_addr_cmb(n, &sg->iov, addr, len);
} else {
- return nvme_map_addr_pmr(n, iov, addr, len);
+ return nvme_map_addr_pmr(n, &sg->iov, addr, len);
}
}
- if (iov && iov->iov) {
+ if (sg->iov.niov) {
return NVME_INVALID_USE_OF_CMB | NVME_DNR;
}
- assert(qsg);
-
- if (!qsg->sg) {
- pci_dma_sglist_init(qsg, &n->parent_obj, 1);
- }
-
- qemu_sglist_add(qsg, addr, len);
+ qemu_sglist_add(&sg->qsg, addr, len);
return NVME_SUCCESS;
}
@@ -532,20 +525,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
uint16_t status;
int ret;
- QEMUSGList *qsg = &req->qsg;
- QEMUIOVector *iov = &req->iov;
-
trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
- if (nvme_addr_is_cmb(n, prp1) || (nvme_addr_is_pmr(n, prp1))) {
- qemu_iovec_init(iov, num_prps);
- } else {
- pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
- }
+ nvme_sg_init(n, &req->sg);
- status = nvme_map_addr(n, qsg, iov, prp1, trans_len);
+ status = nvme_map_addr(n, &req->sg, prp1, trans_len);
if (status) {
- return status;
+ goto unmap;
}
len -= trans_len;
@@ -560,7 +546,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
if (ret) {
trace_pci_nvme_err_addr_read(prp2);
- return NVME_DATA_TRAS_ERROR;
+ status = NVME_DATA_TRAS_ERROR;
+ goto unmap;
}
while (len != 0) {
uint64_t prp_ent = le64_to_cpu(prp_list[i]);
@@ -568,7 +555,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
if (i == n->max_prp_ents - 1 && len > n->page_size) {
if (unlikely(prp_ent & (n->page_size - 1))) {
trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
- return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+ status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
+ goto unmap;
}
i = 0;
@@ -578,20 +566,22 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
prp_trans);
if (ret) {
trace_pci_nvme_err_addr_read(prp_ent);
- return NVME_DATA_TRAS_ERROR;
+ status = NVME_DATA_TRAS_ERROR;
+ goto unmap;
}
prp_ent = le64_to_cpu(prp_list[i]);
}
if (unlikely(prp_ent & (n->page_size - 1))) {
trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
- return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+ status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
+ goto unmap;
}
trans_len = MIN(len, n->page_size);
- status = nvme_map_addr(n, qsg, iov, prp_ent, trans_len);
+ status = nvme_map_addr(n, &req->sg, prp_ent, trans_len);
if (status) {
- return status;
+ goto unmap;
}
len -= trans_len;
@@ -600,24 +590,28 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
} else {
if (unlikely(prp2 & (n->page_size - 1))) {
trace_pci_nvme_err_invalid_prp2_align(prp2);
- return NVME_INVALID_PRP_OFFSET | NVME_DNR;
+ status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
+ goto unmap;
}
- status = nvme_map_addr(n, qsg, iov, prp2, len);
+ status = nvme_map_addr(n, &req->sg, prp2, len);
if (status) {
- return status;
+ goto unmap;
}
}
}
return NVME_SUCCESS;
+
+unmap:
+ nvme_sg_unmap(&req->sg);
+ return status;
}
/*
* Map 'nsgld' data descriptors from 'segment'. The function will subtract the
* number of bytes mapped in len.
*/
-static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
- QEMUIOVector *iov,
+static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
NvmeSglDescriptor *segment, uint64_t nsgld,
size_t *len, NvmeRequest *req)
{
@@ -675,7 +669,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
}
- status = nvme_map_addr(n, qsg, iov, addr, trans_len);
+ status = nvme_map_addr(n, sg, addr, trans_len);
if (status) {
return status;
}
@@ -687,9 +681,8 @@ next:
return NVME_SUCCESS;
}
-static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
- NvmeSglDescriptor sgl, size_t len,
- NvmeRequest *req)
+static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
+ size_t len, NvmeRequest *req)
{
/*
* Read the segment in chunks of 256 descriptors (one 4k page) to avoid
@@ -707,6 +700,8 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
hwaddr addr;
int ret;
+ nvme_sg_init(n, sg);
+
sgld = &sgl;
addr = le64_to_cpu(sgl.addr);
@@ -717,7 +712,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
* be mapped directly.
*/
if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
- status = nvme_map_sgl_data(n, qsg, iov, sgld, 1, &len, req);
+ status = nvme_map_sgl_data(n, sg, sgld, 1, &len, req);
if (status) {
goto unmap;
}
@@ -755,7 +750,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
goto unmap;
}
- status = nvme_map_sgl_data(n, qsg, iov, segment, SEG_CHUNK_SIZE,
+ status = nvme_map_sgl_data(n, sg, segment, SEG_CHUNK_SIZE,
&len, req);
if (status) {
goto unmap;
@@ -782,7 +777,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
switch (NVME_SGL_TYPE(last_sgld->type)) {
case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
- status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld, &len, req);
+ status = nvme_map_sgl_data(n, sg, segment, nsgld, &len, req);
if (status) {
goto unmap;
}
@@ -809,7 +804,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
* Do not map the last descriptor; it will be a Segment or Last Segment
* descriptor and is handled by the next iteration.
*/
- status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld - 1, &len, req);
+ status = nvme_map_sgl_data(n, sg, segment, nsgld - 1, &len, req);
if (status) {
goto unmap;
}
@@ -825,14 +820,7 @@ out:
return NVME_SUCCESS;
unmap:
- if (iov->iov) {
- qemu_iovec_destroy(iov);
- }
-
- if (qsg->sg) {
- qemu_sglist_destroy(qsg);
- }
-
+ nvme_sg_unmap(sg);
return status;
}
@@ -853,8 +841,7 @@ static uint16_t nvme_map_dptr(NvmeCtrl *n, size_t len, NvmeRequest *req)
return NVME_INVALID_FIELD | NVME_DNR;
}
- return nvme_map_sgl(n, &req->qsg, &req->iov, req->cmd.dptr.sgl, len,
- req);
+ return nvme_map_sgl(n, &req->sg, req->cmd.dptr.sgl, len, req);
default:
return NVME_INVALID_FIELD;
}
@@ -871,15 +858,15 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
}
/* assert that only one of qsg and iov carries data */
- assert((req->qsg.nsg > 0) != (req->iov.niov > 0));
+ assert((req->sg.qsg.nsg > 0) != (req->sg.iov.niov > 0));
- if (req->qsg.nsg > 0) {
+ if (req->sg.qsg.nsg > 0) {
uint64_t residual;
if (dir == DMA_DIRECTION_TO_DEVICE) {
- residual = dma_buf_write(ptr, len, &req->qsg);
+ residual = dma_buf_write(ptr, len, &req->sg.qsg);
} else {
- residual = dma_buf_read(ptr, len, &req->qsg);
+ residual = dma_buf_read(ptr, len, &req->sg.qsg);
}
if (unlikely(residual)) {
@@ -890,9 +877,9 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
size_t bytes;
if (dir == DMA_DIRECTION_TO_DEVICE) {
- bytes = qemu_iovec_to_buf(&req->iov, 0, ptr, len);
+ bytes = qemu_iovec_to_buf(&req->sg.iov, 0, ptr, len);
} else {
- bytes = qemu_iovec_from_buf(&req->iov, 0, ptr, len);
+ bytes = qemu_iovec_from_buf(&req->sg.iov, 0, ptr, len);
}
if (unlikely(bytes != len)) {
@@ -904,6 +891,28 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
return status;
}
+static inline void nvme_blk_read(BlockBackend *blk, int64_t offset,
+ BlockCompletionFunc *cb, NvmeRequest *req)
+{
+ if (req->sg.qsg.nsg) {
+ req->aiocb = dma_blk_read(blk, &req->sg.qsg, offset, BDRV_SECTOR_SIZE,
+ cb, req);
+ } else {
+ req->aiocb = blk_aio_preadv(blk, offset, &req->sg.iov, 0, cb, req);
+ }
+}
+
+static inline void nvme_blk_write(BlockBackend *blk, int64_t offset,
+ BlockCompletionFunc *cb, NvmeRequest *req)
+{
+ if (req->sg.qsg.nsg) {
+ req->aiocb = dma_blk_write(blk, &req->sg.qsg, offset, BDRV_SECTOR_SIZE,
+ cb, req);
+ } else {
+ req->aiocb = blk_aio_pwritev(blk, offset, &req->sg.iov, 0, cb, req);
+ }
+}
+
static void nvme_post_cqes(void *opaque)
{
NvmeCQueue *cq = opaque;
@@ -934,7 +943,7 @@ static void nvme_post_cqes(void *opaque)
}
QTAILQ_REMOVE(&cq->req_list, req, entry);
nvme_inc_cq_tail(cq);
- nvme_req_exit(req);
+ nvme_sg_unmap(&req->sg);
QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
}
if (cq->tail != cq->head) {
@@ -1597,14 +1606,14 @@ static void nvme_copy_in_complete(NvmeRequest *req)
zone->w_ptr += ctx->nlb;
}
- qemu_iovec_init(&req->iov, 1);
- qemu_iovec_add(&req->iov, ctx->bounce, nvme_l2b(ns, ctx->nlb));
+ qemu_iovec_init(&req->sg.iov, 1);
+ qemu_iovec_add(&req->sg.iov, ctx->bounce, nvme_l2b(ns, ctx->nlb));
block_acct_start(blk_get_stats(ns->blkconf.blk), &req->acct, 0,
BLOCK_ACCT_WRITE);
req->aiocb = blk_aio_pwritev(ns->blkconf.blk, nvme_l2b(ns, sdlba),
- &req->iov, 0, nvme_copy_cb, req);
+ &req->sg.iov, 0, nvme_copy_cb, req);
return;
@@ -1992,13 +2001,7 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
block_acct_start(blk_get_stats(blk), &req->acct, data_size,
BLOCK_ACCT_READ);
- if (req->qsg.sg) {
- req->aiocb = dma_blk_read(blk, &req->qsg, data_offset,
- BDRV_SECTOR_SIZE, nvme_rw_cb, req);
- } else {
- req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0,
- nvme_rw_cb, req);
- }
+ nvme_blk_read(blk, data_offset, nvme_rw_cb, req);
return NVME_NO_COMPLETE;
invalid:
@@ -2080,13 +2083,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
block_acct_start(blk_get_stats(blk), &req->acct, data_size,
BLOCK_ACCT_WRITE);
- if (req->qsg.sg) {
- req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
- BDRV_SECTOR_SIZE, nvme_rw_cb, req);
- } else {
- req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
- nvme_rw_cb, req);
- }
+ nvme_blk_write(blk, data_offset, nvme_rw_cb, req);
} else {
req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
--
2.30.0
* Re: [PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality
2021-02-07 21:49 ` [PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality Klaus Jensen
@ 2021-02-08 15:45 ` Keith Busch
2021-02-08 15:47 ` Klaus Jensen
0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2021-02-08 15:45 UTC (permalink / raw)
To: Klaus Jensen
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, qemu-devel, Max Reitz, Stefan Hajnoczi
On Sun, Feb 07, 2021 at 10:49:36PM +0100, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
>
> Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
> haunts all the memory-related functions.
>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
> hw/block/nvme.h | 8 ++-
> hw/block/nvme.c | 171 ++++++++++++++++++++++++------------------------
> 2 files changed, 90 insertions(+), 89 deletions(-)
>
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index cb2b5175f1a1..0e4fbd6990ad 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -29,6 +29,11 @@ typedef struct NvmeAsyncEvent {
> NvmeAerResult result;
> } NvmeAsyncEvent;
>
> +typedef struct NvmeSg {
> + QEMUSGList qsg;
> + QEMUIOVector iov;
> +} NvmeSg;
It's always one or the other, right? If so, this could be a 'union'
type, and then you'd just need an indicator field to say which type it
is. There may be a meaningful memory savings that way since we
potentially allocate thousands of these.
* Re: [PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality
2021-02-08 15:45 ` Keith Busch
@ 2021-02-08 15:47 ` Klaus Jensen
0 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-02-08 15:47 UTC (permalink / raw)
To: Keith Busch
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, qemu-devel, Max Reitz, Stefan Hajnoczi
On Feb 9 00:45, Keith Busch wrote:
> On Sun, Feb 07, 2021 at 10:49:36PM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> >
> > Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
> > haunts all the memory-related functions.
> >
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> > hw/block/nvme.h | 8 ++-
> > hw/block/nvme.c | 171 ++++++++++++++++++++++++------------------------
> > 2 files changed, 90 insertions(+), 89 deletions(-)
> >
> > diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> > index cb2b5175f1a1..0e4fbd6990ad 100644
> > --- a/hw/block/nvme.h
> > +++ b/hw/block/nvme.h
> > @@ -29,6 +29,11 @@ typedef struct NvmeAsyncEvent {
> > NvmeAerResult result;
> > } NvmeAsyncEvent;
> >
> > +typedef struct NvmeSg {
> > + QEMUSGList qsg;
> > + QEMUIOVector iov;
> > +} NvmeSg;
>
> It's always one or the other, right? If so, this could be a 'union'
> type, and then you'd just need an indicator field to say which type it
> is. There may be a meaningful memory savings that way since we
> potentially allocate thousands of these.
Yes, that's true.
I'll do that for v3!
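Something along these lines, perhaps (the flag names below are just a
sketch, nothing final):

typedef enum NvmeSgFlags {
    NVME_SG_ALLOC = 1 << 0, /* the sg has been initialized */
    NVME_SG_DMA   = 1 << 1, /* qsg carries the data, otherwise iov does */
} NvmeSgFlags;

typedef struct NvmeSg {
    int flags;

    union {
        QEMUSGList   qsg;
        QEMUIOVector iov;
    };
} NvmeSg;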
* [PATCH RFC v2 5/8] hw/block/nvme: remove the req dependency in map functions
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
` (3 preceding siblings ...)
2021-02-07 21:49 ` [PATCH RFC v2 4/8] hw/block/nvme: try to deal with the iov/qsg duality Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 6/8] hw/block/nvme: refactor nvme_dma Klaus Jensen
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
The PRP and SGL mapping functions do not have any particular need for
the entire NvmeRequest as a parameter. Clean it up.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
hw/block/nvme.c | 61 ++++++++++++++++++++++---------------------
hw/block/trace-events | 4 +--
2 files changed, 33 insertions(+), 32 deletions(-)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a0009c057f1e..24156699b035 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -516,8 +516,8 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len)
return NVME_SUCCESS;
}
-static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
- uint32_t len, NvmeRequest *req)
+static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
+ uint64_t prp2, uint32_t len)
{
hwaddr trans_len = n->page_size - (prp1 % n->page_size);
trans_len = MIN(len, trans_len);
@@ -527,9 +527,9 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
- nvme_sg_init(n, &req->sg);
+ nvme_sg_init(n, sg);
- status = nvme_map_addr(n, &req->sg, prp1, trans_len);
+ status = nvme_map_addr(n, sg, prp1, trans_len);
if (status) {
goto unmap;
}
@@ -579,7 +579,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
}
trans_len = MIN(len, n->page_size);
- status = nvme_map_addr(n, &req->sg, prp_ent, trans_len);
+ status = nvme_map_addr(n, sg, prp_ent, trans_len);
if (status) {
goto unmap;
}
@@ -593,7 +593,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
goto unmap;
}
- status = nvme_map_addr(n, &req->sg, prp2, len);
+ status = nvme_map_addr(n, sg, prp2, len);
if (status) {
goto unmap;
}
@@ -603,7 +603,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
return NVME_SUCCESS;
unmap:
- nvme_sg_unmap(&req->sg);
+ nvme_sg_unmap(sg);
return status;
}
@@ -613,7 +613,7 @@ unmap:
*/
static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
NvmeSglDescriptor *segment, uint64_t nsgld,
- size_t *len, NvmeRequest *req)
+ size_t *len, NvmeCmd *cmd)
{
dma_addr_t addr, trans_len;
uint32_t dlen;
@@ -624,7 +624,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
switch (type) {
case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
- if (req->cmd.opcode == NVME_CMD_WRITE) {
+ if (cmd->opcode == NVME_CMD_WRITE) {
continue;
}
case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
@@ -653,7 +653,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
break;
}
- trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req));
+ trace_pci_nvme_err_invalid_sgl_excess_length(dlen);
return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
}
@@ -682,7 +682,7 @@ next:
}
static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
- size_t len, NvmeRequest *req)
+ size_t len, NvmeCmd *cmd)
{
/*
* Read the segment in chunks of 256 descriptors (one 4k page) to avoid
@@ -705,14 +705,14 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
sgld = &sgl;
addr = le64_to_cpu(sgl.addr);
- trace_pci_nvme_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), len);
+ trace_pci_nvme_map_sgl(NVME_SGL_TYPE(sgl.type), len);
/*
* If the entire transfer can be described with a single data block it can
* be mapped directly.
*/
if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
- status = nvme_map_sgl_data(n, sg, sgld, 1, &len, req);
+ status = nvme_map_sgl_data(n, sg, sgld, 1, &len, cmd);
if (status) {
goto unmap;
}
@@ -751,7 +751,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
}
status = nvme_map_sgl_data(n, sg, segment, SEG_CHUNK_SIZE,
- &len, req);
+ &len, cmd);
if (status) {
goto unmap;
}
@@ -777,7 +777,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
switch (NVME_SGL_TYPE(last_sgld->type)) {
case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
- status = nvme_map_sgl_data(n, sg, segment, nsgld, &len, req);
+ status = nvme_map_sgl_data(n, sg, segment, nsgld, &len, cmd);
if (status) {
goto unmap;
}
@@ -804,7 +804,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
* Do not map the last descriptor; it will be a Segment or Last Segment
* descriptor and is handled by the next iteration.
*/
- status = nvme_map_sgl_data(n, sg, segment, nsgld - 1, &len, req);
+ status = nvme_map_sgl_data(n, sg, segment, nsgld - 1, &len, cmd);
if (status) {
goto unmap;
}
@@ -824,24 +824,20 @@ unmap:
return status;
}
-static uint16_t nvme_map_dptr(NvmeCtrl *n, size_t len, NvmeRequest *req)
+static uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
+ NvmeCmd *cmd)
{
uint64_t prp1, prp2;
- switch (NVME_CMD_FLAGS_PSDT(req->cmd.flags)) {
+ switch (NVME_CMD_FLAGS_PSDT(cmd->flags)) {
case NVME_PSDT_PRP:
- prp1 = le64_to_cpu(req->cmd.dptr.prp1);
- prp2 = le64_to_cpu(req->cmd.dptr.prp2);
+ prp1 = le64_to_cpu(cmd->dptr.prp1);
+ prp2 = le64_to_cpu(cmd->dptr.prp2);
- return nvme_map_prp(n, prp1, prp2, len, req);
+ return nvme_map_prp(n, sg, prp1, prp2, len);
case NVME_PSDT_SGL_MPTR_CONTIGUOUS:
case NVME_PSDT_SGL_MPTR_SGL:
- /* SGLs shall not be used for Admin commands in NVMe over PCIe */
- if (!req->sq->sqid) {
- return NVME_INVALID_FIELD | NVME_DNR;
- }
-
- return nvme_map_sgl(n, &req->sg, req->cmd.dptr.sgl, len, req);
+ return nvme_map_sgl(n, sg, cmd->dptr.sgl, len, cmd);
default:
return NVME_INVALID_FIELD;
}
@@ -852,7 +848,7 @@ static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
{
uint16_t status = NVME_SUCCESS;
- status = nvme_map_dptr(n, len, req);
+ status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
if (status) {
return status;
}
@@ -1985,7 +1981,7 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
}
}
- status = nvme_map_dptr(n, data_size, req);
+ status = nvme_map_dptr(n, &req->sg, data_size, &req->cmd);
if (status) {
goto invalid;
}
@@ -2076,7 +2072,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
data_offset = nvme_l2b(ns, slba);
if (!wrz) {
- status = nvme_map_dptr(n, data_size, req);
+ status = nvme_map_dptr(n, &req->sg, data_size, &req->cmd);
if (status) {
goto invalid;
}
@@ -3732,6 +3728,11 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
return NVME_INVALID_OPCODE | NVME_DNR;
}
+ /* SGLs shall not be used for Admin commands in NVMe over PCIe */
+ if (NVME_CMD_FLAGS_PSDT(req->cmd.flags) != NVME_PSDT_PRP) {
+ return NVME_INVALID_FIELD | NVME_DNR;
+ }
+
switch (req->cmd.opcode) {
case NVME_ADM_CMD_DELETE_SQ:
return nvme_del_sq(n, req);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index b6e972d733a6..fe4a6c24306d 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -37,7 +37,7 @@ pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2
pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
pci_nvme_map_addr_cmb(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
pci_nvme_map_prp(uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t prp2, int num_prps) "trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 0x%"PRIx64" num_prps %d"
-pci_nvme_map_sgl(uint16_t cid, uint8_t typ, uint64_t len) "cid %"PRIu16" type 0x%"PRIx8" len %"PRIu64""
+pci_nvme_map_sgl(uint8_t typ, uint64_t len) "type 0x%"PRIx8" len %"PRIu64""
pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'"
pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opname) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'"
pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
@@ -121,7 +121,7 @@ pci_nvme_err_aio(uint16_t cid, const char *errname, uint16_t status) "cid %"PRIu
pci_nvme_err_copy_invalid_format(uint8_t format) "format 0x%"PRIx8""
pci_nvme_err_invalid_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
pci_nvme_err_invalid_num_sgld(uint16_t cid, uint8_t typ) "cid %"PRIu16" type 0x%"PRIx8""
-pci_nvme_err_invalid_sgl_excess_length(uint16_t cid) "cid %"PRIu16""
+pci_nvme_err_invalid_sgl_excess_length(uint32_t residual) "residual %"PRIu32""
pci_nvme_err_invalid_dma(void) "PRP/SGL is too small for transfer size"
pci_nvme_err_invalid_prplist_ent(uint64_t prplist) "PRP list entry is not page aligned: 0x%"PRIx64""
pci_nvme_err_invalid_prp2_align(uint64_t prp2) "PRP2 is not page aligned: 0x%"PRIx64""
--
2.30.0
* [PATCH RFC v2 6/8] hw/block/nvme: refactor nvme_dma
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
` (4 preceding siblings ...)
2021-02-07 21:49 ` [PATCH RFC v2 5/8] hw/block/nvme: remove the req dependency in map functions Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 7/8] hw/block/nvme: add metadata support Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 8/8] hw/block/nvme: end-to-end data protection Klaus Jensen
7 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
The nvme_dma function doesn't just do DMA (QEMUSGList-based) memory transfers;
it also handles QEMUIOVector copies.
Introduce the NvmeTxDirection enum and rename the function to nvme_tx. Remove mapping
of PRPs/SGLs from nvme_tx and instead assert that they have been mapped
previously. This allows more fine-grained use in subsequent patches.
Add new (better named) helpers, nvme_{c2h,h2c}, that do both PRP/SGL
mapping and transfer.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
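To illustrate the intended call patterns (only the function names are
from this patch; 'buf' and 'len' are placeholders and the surrounding
code is elided):

/* convenience helpers: map the DPTR and do the transfer in one call */
status = nvme_h2c(n, buf, len, req);

/* fine-grained use: map once, transfer separately */
status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
if (!status) {
    status = nvme_tx(n, &req->sg, buf, len, NVME_TX_DIRECTION_FROM_DEVICE);
}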
hw/block/nvme.c | 143 ++++++++++++++++++++++++++----------------------
1 file changed, 77 insertions(+), 66 deletions(-)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 24156699b035..2752e5d8572a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -843,48 +843,72 @@ static uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
}
}
-static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
- DMADirection dir, NvmeRequest *req)
+typedef enum NvmeTxDirection {
+ NVME_TX_DIRECTION_TO_DEVICE = 0,
+ NVME_TX_DIRECTION_FROM_DEVICE = 1,
+} NvmeTxDirection;
+
+static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t *ptr, uint32_t len,
+ NvmeTxDirection dir)
{
- uint16_t status = NVME_SUCCESS;
+ /* assert that exactly one of qsg and iov carries data */
+ assert((sg->qsg.nsg > 0) != (sg->iov.niov > 0));
+
+ if (sg->qsg.nsg > 0) {
+ uint64_t residual;
+
+ if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+ residual = dma_buf_write(ptr, len, &sg->qsg);
+ } else {
+ residual = dma_buf_read(ptr, len, &sg->qsg);
+ }
+
+ if (unlikely(residual)) {
+ trace_pci_nvme_err_invalid_dma();
+ return NVME_INVALID_FIELD | NVME_DNR;
+ }
+ } else {
+ size_t bytes;
+
+ if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+ bytes = qemu_iovec_to_buf(&sg->iov, 0, ptr, len);
+ } else {
+ bytes = qemu_iovec_from_buf(&sg->iov, 0, ptr, len);
+ }
+
+ if (unlikely(bytes != len)) {
+ trace_pci_nvme_err_invalid_dma();
+ return NVME_INVALID_FIELD | NVME_DNR;
+ }
+ }
+
+ return NVME_SUCCESS;
+}
+
+static inline uint16_t nvme_c2h(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+ NvmeRequest *req)
+{
+ uint16_t status;
status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
if (status) {
return status;
}
- /* assert that only one of qsg and iov carries data */
- assert((req->sg.qsg.nsg > 0) != (req->sg.iov.niov > 0));
+ return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_FROM_DEVICE);
+}
- if (req->sg.qsg.nsg > 0) {
- uint64_t residual;
+static inline uint16_t nvme_h2c(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+ NvmeRequest *req)
+{
+ uint16_t status;
- if (dir == DMA_DIRECTION_TO_DEVICE) {
- residual = dma_buf_write(ptr, len, &req->sg.qsg);
- } else {
- residual = dma_buf_read(ptr, len, &req->sg.qsg);
- }
-
- if (unlikely(residual)) {
- trace_pci_nvme_err_invalid_dma();
- status = NVME_INVALID_FIELD | NVME_DNR;
- }
- } else {
- size_t bytes;
-
- if (dir == DMA_DIRECTION_TO_DEVICE) {
- bytes = qemu_iovec_to_buf(&req->sg.iov, 0, ptr, len);
- } else {
- bytes = qemu_iovec_from_buf(&req->sg.iov, 0, ptr, len);
- }
-
- if (unlikely(bytes != len)) {
- trace_pci_nvme_err_invalid_dma();
- status = NVME_INVALID_FIELD | NVME_DNR;
- }
+ status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
+ if (status) {
+ return status;
}
- return status;
+ return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_TO_DEVICE);
}
static inline void nvme_blk_read(BlockBackend *blk, int64_t offset,
@@ -1683,8 +1707,7 @@ static void nvme_compare_cb(void *opaque, int ret)
buf = g_malloc(ctx->iov.size);
- status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
- DMA_DIRECTION_TO_DEVICE, req);
+ status = nvme_h2c(nvme_ctrl(req), buf, ctx->iov.size, req);
if (status) {
req->status = status;
goto out;
@@ -1720,8 +1743,7 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
NvmeDsmRange range[nr];
uintptr_t *discards = (uintptr_t *)&req->opaque;
- status = nvme_dma(n, (uint8_t *)range, sizeof(range),
- DMA_DIRECTION_TO_DEVICE, req);
+ status = nvme_h2c(n, (uint8_t *)range, sizeof(range), req);
if (status) {
return status;
}
@@ -1803,8 +1825,8 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
range = g_new(NvmeCopySourceRange, nr);
- status = nvme_dma(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
- DMA_DIRECTION_TO_DEVICE, req);
+ status = nvme_h2c(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
+ req);
if (status) {
return status;
}
@@ -2413,8 +2435,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
return NVME_INVALID_FIELD | NVME_DNR;
}
zd_ext = nvme_get_zd_extension(ns, zone_idx);
- status = nvme_dma(n, zd_ext, ns->params.zd_extension_size,
- DMA_DIRECTION_TO_DEVICE, req);
+ status = nvme_h2c(n, zd_ext, ns->params.zd_extension_size, req);
if (status) {
trace_pci_nvme_err_zd_extension_map_error(zone_idx);
return status;
@@ -2569,8 +2590,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
}
}
- status = nvme_dma(n, (uint8_t *)buf, data_size,
- DMA_DIRECTION_FROM_DEVICE, req);
+ status = nvme_c2h(n, (uint8_t *)buf, data_size, req);
g_free(buf);
@@ -2814,8 +2834,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
nvme_clear_events(n, NVME_AER_TYPE_SMART);
}
- return nvme_dma(n, (uint8_t *) &smart + off, trans_len,
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *) &smart + off, trans_len, req);
}
static uint16_t nvme_fw_log_info(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
@@ -2833,8 +2852,7 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
trans_len = MIN(sizeof(fw_log) - off, buf_len);
- return nvme_dma(n, (uint8_t *) &fw_log + off, trans_len,
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *) &fw_log + off, trans_len, req);
}
static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
@@ -2854,8 +2872,7 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
memset(&errlog, 0x0, sizeof(errlog));
trans_len = MIN(sizeof(errlog) - off, buf_len);
- return nvme_dma(n, (uint8_t *)&errlog, trans_len,
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *)&errlog, trans_len, req);
}
static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t csi, uint32_t buf_len,
@@ -2895,8 +2912,7 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t csi, uint32_t buf_len,
trans_len = MIN(sizeof(log) - off, buf_len);
- return nvme_dma(n, ((uint8_t *)&log) + off, trans_len,
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, ((uint8_t *)&log) + off, trans_len, req);
}
static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
@@ -3066,7 +3082,7 @@ static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req)
{
uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {};
- return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, id, sizeof(id), req);
}
static inline bool nvme_csi_has_nvm_support(NvmeNamespace *ns)
@@ -3083,8 +3099,7 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req)
{
trace_pci_nvme_identify_ctrl();
- return nvme_dma(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl),
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *)&n->id_ctrl, sizeof(n->id_ctrl), req);
}
static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
@@ -3100,8 +3115,7 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
if (n->params.zasl_bs) {
id.zasl = n->zasl;
}
- return nvme_dma(n, (uint8_t *)&id, sizeof(id),
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *)&id, sizeof(id), req);
}
return NVME_INVALID_FIELD | NVME_DNR;
@@ -3125,8 +3139,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
}
if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
- return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs),
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
}
return NVME_INVALID_CMD_SET | NVME_DNR;
@@ -3152,8 +3165,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
return nvme_rpt_empty_id_struct(n, req);
} else if (c->csi == NVME_CSI_ZONED && ns->csi == NVME_CSI_ZONED) {
- return nvme_dma(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned),
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned),
+ req);
}
return NVME_INVALID_FIELD | NVME_DNR;
@@ -3195,7 +3208,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
}
}
- return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, list, data_len, req);
}
static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
@@ -3235,7 +3248,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
}
}
- return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, list, data_len, req);
}
static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
@@ -3282,7 +3295,7 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
ns_descrs->csi.hdr.nidl = NVME_NIDL_CSI;
ns_descrs->csi.v = ns->csi;
- return nvme_dma(n, list, sizeof(list), DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, list, sizeof(list), req);
}
static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req)
@@ -3295,7 +3308,7 @@ static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req)
NVME_SET_CSI(*list, NVME_CSI_NVM);
NVME_SET_CSI(*list, NVME_CSI_ZONED);
- return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, list, data_len, req);
}
static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
@@ -3384,8 +3397,7 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl *n, NvmeRequest *req)
{
uint64_t timestamp = nvme_get_timestamp(n);
- return nvme_dma(n, (uint8_t *)&timestamp, sizeof(timestamp),
- DMA_DIRECTION_FROM_DEVICE, req);
+ return nvme_c2h(n, (uint8_t *)&timestamp, sizeof(timestamp), req);
}
static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req)
@@ -3545,8 +3557,7 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeRequest *req)
uint16_t ret;
uint64_t timestamp;
- ret = nvme_dma(n, (uint8_t *)&timestamp, sizeof(timestamp),
- DMA_DIRECTION_TO_DEVICE, req);
+ ret = nvme_h2c(n, (uint8_t *)&timestamp, sizeof(timestamp), req);
if (ret) {
return ret;
}
--
2.30.0
* [PATCH RFC v2 7/8] hw/block/nvme: add metadata support
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
` (5 preceding siblings ...)
2021-02-07 21:49 ` [PATCH RFC v2 6/8] hw/block/nvme: refactor nvme_dma Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
2021-02-07 21:49 ` [PATCH RFC v2 8/8] hw/block/nvme: end-to-end data protection Klaus Jensen
7 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
Add support for metadata in the form of extended logical blocks as well
as a separate buffer of data. The new `ms` nvme-ns device parameter
specifies the size of metadata per logical block in bytes. The `mset`
nvme-ns device parameter controls whether metadata is transfered as part
of an extended lba (set to '1') or in a separate buffer (set to '0',
the default).
Regardless of the scheme chosen with `mset`, metadata is stored at the
end of the namespace backing block device. This requires the
user-provided PRP/SGLs to be walked and "split" into data and metadata
scatter/gather lists if the extended logical block scheme is used, but
has the advantage of not breaking the deallocated blocks support.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
---
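A worked example of the resulting layout (the numbers are illustrative
only): with a 1 GiB backing image, 512 byte logical blocks and 8 bytes
of metadata per block,

size_t   lsize = 512;                      /* 1 << lbads */
size_t   msize = 8;                        /* ms */
int64_t  size  = 1024LL * 1024 * 1024;     /* size of the backing image */

uint64_t nlbas        = size / (lsize + msize); /* 2064888 blocks */
int64_t  mdata_offset = nlbas * lsize;          /* data area ends here */

/* metadata for LBA x is stored at mdata_offset + x * msize */

which mirrors nvme_ns_nlbas() and the mdata_offset calculation in
nvme_ns_init().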
hw/block/nvme-ns.h | 39 ++-
hw/block/nvme-ns.c | 18 +-
hw/block/nvme.c | 652 ++++++++++++++++++++++++++++++++++++------
hw/block/trace-events | 4 +-
4 files changed, 618 insertions(+), 95 deletions(-)
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7af6884862b5..2281fd39930a 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,9 @@ typedef struct NvmeNamespaceParams {
uint32_t nsid;
QemuUUID uuid;
+ uint16_t ms;
+ uint8_t mset;
+
uint16_t mssrl;
uint32_t mcl;
uint8_t msrc;
@@ -47,6 +50,7 @@ typedef struct NvmeNamespace {
BlockConf blkconf;
int32_t bootindex;
int64_t size;
+ int64_t mdata_offset;
NvmeIdNs id_ns;
const uint32_t *iocs;
uint8_t csi;
@@ -99,18 +103,41 @@ static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
return nvme_ns_lbaf(ns)->ds;
}
-/* calculate the number of LBAs that the namespace can accomodate */
-static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
-{
- return ns->size >> nvme_ns_lbads(ns);
-}
-
/* convert an LBA to the equivalent in bytes */
static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
{
return lba << nvme_ns_lbads(ns);
}
+static inline size_t nvme_lsize(NvmeNamespace *ns)
+{
+ return 1 << nvme_ns_lbads(ns);
+}
+
+static inline uint16_t nvme_msize(NvmeNamespace *ns)
+{
+ return nvme_ns_lbaf(ns)->ms;
+}
+
+static inline size_t nvme_m2b(NvmeNamespace *ns, uint64_t lba)
+{
+ return nvme_msize(ns) * lba;
+}
+
+static inline bool nvme_ns_ext(NvmeNamespace *ns)
+{
+ return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas);
+}
+
+/* calculate the number of LBAs that the namespace can accomodate */
+static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
+{
+ if (ns->params.ms) {
+ return ns->size / (nvme_lsize(ns) + nvme_msize(ns));
+ }
+ return ns->size >> nvme_ns_lbads(ns);
+}
+
typedef struct NvmeCtrl NvmeCtrl;
static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone)
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c3b513b0fc78..7a662c170428 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -37,13 +37,25 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
BlockDriverInfo bdi;
NvmeIdNs *id_ns = &ns->id_ns;
int lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
- int npdg;
+ int npdg, nlbas;
ns->id_ns.dlfeat = 0x9;
id_ns->lbaf[lba_index].ds = 31 - clz32(ns->blkconf.logical_block_size);
+ id_ns->lbaf[lba_index].ms = ns->params.ms;
- id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
+ if (ns->params.ms) {
+ id_ns->mc = 0x3;
+
+ if (ns->params.mset) {
+ id_ns->flbas |= 0x10;
+ }
+ }
+
+ nlbas = nvme_ns_nlbas(ns);
+
+ id_ns->nsze = cpu_to_le64(nlbas);
+ ns->mdata_offset = nvme_l2b(ns, nlbas);
ns->csi = NVME_CSI_NVM;
@@ -395,6 +407,8 @@ static Property nvme_ns_props[] = {
NvmeSubsystem *),
DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+ DEFINE_PROP_UINT16("ms", NvmeNamespace, params.ms, 0),
+ DEFINE_PROP_UINT8("mset", NvmeNamespace, params.mset, 0),
DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 2752e5d8572a..8aa892ec3106 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -339,6 +339,26 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
return pci_dma_read(&n->parent_obj, addr, buf, size);
}
+static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+{
+ hwaddr hi = addr + size - 1;
+ if (hi < addr) {
+ return 1;
+ }
+
+ if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
+ memcpy(nvme_addr_to_cmb(n, addr), buf, size);
+ return 0;
+ }
+
+ if (nvme_addr_is_pmr(n, addr) && nvme_addr_is_pmr(n, hi)) {
+ memcpy(nvme_addr_to_pmr(n, addr), buf, size);
+ return 0;
+ }
+
+ return pci_dma_write(&n->parent_obj, addr, buf, size);
+}
+
static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
{
return nsid && (nsid == NVME_NSID_BROADCAST || nsid <= n->num_namespaces);
@@ -445,6 +465,64 @@ static inline void nvme_sg_unmap(NvmeSg *sg)
}
}
+/*
+ * When metadata is transfered as extended LBAs, the DPTR mapped into `sg`
+ * holds both data and metadata. This function splits the data and metadata
+ * into two separate QSG/IOVs.
+ */
+static void nvme_sg_split(NvmeSg *sg, NvmeNamespace *ns, NvmeSg *data,
+ NvmeSg *mdata)
+{
+ NvmeSg *dst = data;
+ size_t size = nvme_lsize(ns);
+ size_t msize = nvme_msize(ns);
+ uint32_t trans_len, count = size;
+ uint64_t offset = 0;
+ bool dma = sg->qsg.nsg > 0;
+ size_t sge_len;
+ size_t sg_len = dma ? sg->qsg.size : sg->iov.size;
+ int sg_idx = 0;
+ hwaddr addr;
+
+ /* assert that exactly one of qsg and iov carries data */
+ assert((sg->qsg.nsg > 0) != (sg->iov.niov > 0));
+
+ while (sg_len) {
+ sge_len = dma ? sg->qsg.sg[sg_idx].len : sg->iov.iov[sg_idx].iov_len;
+
+ trans_len = MIN(sg_len, count);
+ trans_len = MIN(trans_len, sge_len - offset);
+
+ if (dma) {
+ addr = sg->qsg.sg[sg_idx].base + offset;
+ } else {
+ addr = (hwaddr)sg->iov.iov[sg_idx].iov_base + offset;
+ }
+
+ if (dst) {
+ if (dma) {
+ qemu_sglist_add(&dst->qsg, addr, trans_len);
+ } else {
+ qemu_iovec_add(&dst->iov, (void *)addr, trans_len);
+ }
+ }
+
+ sg_len -= trans_len;
+ count -= trans_len;
+ offset += trans_len;
+
+ if (count == 0) {
+ dst = (dst == data) ? mdata : data;
+ count = (dst == data) ? size : msize;
+ }
+
+ if (sge_len == offset) {
+ offset = 0;
+ sg_idx++;
+ }
+ }
+}
+
static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
size_t len)
{
@@ -843,11 +921,157 @@ static uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
}
}
+static uint16_t nvme_map_mptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
+ NvmeCmd *cmd)
+{
+ int psdt = NVME_CMD_FLAGS_PSDT(cmd->flags);
+ hwaddr mptr = le64_to_cpu(cmd->mptr);
+ uint16_t status;
+
+ if (psdt == NVME_PSDT_SGL_MPTR_SGL) {
+ NvmeSglDescriptor sgl;
+
+ if (nvme_addr_read(n, mptr, &sgl, sizeof(sgl))) {
+ return NVME_DATA_TRAS_ERROR;
+ }
+
+ status = nvme_map_sgl(n, sg, sgl, len, cmd);
+ if (status && (status & 0x7ff) == NVME_DATA_SGL_LEN_INVALID) {
+ status = NVME_MD_SGL_LEN_INVALID | NVME_DNR;
+ }
+
+ return status;
+ }
+
+ nvme_sg_init(n, sg);
+ status = nvme_map_addr(n, sg, mptr, len);
+ if (status) {
+ nvme_sg_unmap(sg);
+ }
+
+ return status;
+}
+
+static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t nlb, NvmeRequest *req)
+{
+ NvmeNamespace *ns = req->ns;
+ size_t len = nvme_l2b(ns, nlb);
+ uint16_t status;
+
+ if (nvme_ns_ext(ns)) {
+ NvmeSg sg;
+
+ len += nvme_m2b(ns, nlb);
+
+ status = nvme_map_dptr(n, &sg, len, &req->cmd);
+ if (status) {
+ return status;
+ }
+
+ nvme_sg_init(n, &req->sg);
+ nvme_sg_split(&sg, ns, &req->sg, NULL);
+ nvme_sg_unmap(&sg);
+
+ return NVME_SUCCESS;
+ }
+
+ return nvme_map_dptr(n, &req->sg, len, &req->cmd);
+}
+
+static uint16_t nvme_map_mdata(NvmeCtrl *n, uint32_t nlb, NvmeRequest *req)
+{
+ NvmeNamespace *ns = req->ns;
+ size_t len = nvme_m2b(ns, nlb);
+ uint16_t status;
+
+ if (nvme_ns_ext(ns)) {
+ NvmeSg sg;
+
+ len += nvme_l2b(ns, nlb);
+
+ status = nvme_map_dptr(n, &sg, len, &req->cmd);
+ if (status) {
+ return status;
+ }
+
+ nvme_sg_init(n, &req->sg);
+ nvme_sg_split(&sg, ns, NULL, &req->sg);
+ nvme_sg_unmap(&sg);
+
+ return NVME_SUCCESS;
+ }
+
+ return nvme_map_mptr(n, &req->sg, len, &req->cmd);
+}
+
typedef enum NvmeTxDirection {
NVME_TX_DIRECTION_TO_DEVICE = 0,
NVME_TX_DIRECTION_FROM_DEVICE = 1,
} NvmeTxDirection;
+static uint16_t nvme_tx_interleaved(NvmeCtrl *n, NvmeSg *sg, uint8_t *ptr,
+ uint32_t len, uint32_t bytes,
+ int32_t skip_bytes, int64_t offset,
+ NvmeTxDirection dir)
+{
+ hwaddr addr;
+ uint32_t trans_len, count = bytes;
+ bool dma = sg->qsg.nsg > 0;
+ int64_t sge_len;
+ int sg_idx = 0;
+ int ret;
+
+ /* assert that exactly one of qsg and iov carries data */
+ assert((sg->qsg.nsg > 0) != (sg->iov.niov > 0));
+
+ while (len) {
+ sge_len = dma ? sg->qsg.sg[sg_idx].len : sg->iov.iov[sg_idx].iov_len;
+
+ if (sge_len - offset < 0) {
+ offset -= sge_len;
+ sg_idx++;
+ continue;
+ }
+
+ if (sge_len == offset) {
+ offset = 0;
+ sg_idx++;
+ continue;
+ }
+
+ trans_len = MIN(len, count);
+ trans_len = MIN(trans_len, sge_len - offset);
+
+ if (dma) {
+ addr = sg->qsg.sg[sg_idx].base + offset;
+ } else {
+ addr = (hwaddr)sg->iov.iov[sg_idx].iov_base + offset;
+ }
+
+ if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+ ret = nvme_addr_read(n, addr, ptr, trans_len);
+ } else {
+ ret = nvme_addr_write(n, addr, ptr, trans_len);
+ }
+
+ if (ret) {
+ return NVME_DATA_TRAS_ERROR;
+ }
+
+ ptr += trans_len;
+ len -= trans_len;
+ count -= trans_len;
+ offset += trans_len;
+
+ if (count == 0) {
+ count = bytes;
+ offset += skip_bytes;
+ }
+ }
+
+ return NVME_SUCCESS;
+}
+
static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t *ptr, uint32_t len,
NvmeTxDirection dir)
{
@@ -911,6 +1135,46 @@ static inline uint16_t nvme_h2c(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_TO_DEVICE);
}
+static uint16_t nvme_bounce_data(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+ NvmeTxDirection dir, NvmeRequest *req)
+{
+ NvmeNamespace *ns = req->ns;
+
+ if (nvme_ns_ext(ns)) {
+ size_t lsize = nvme_lsize(ns);
+ size_t msize = nvme_msize(ns);
+
+ return nvme_tx_interleaved(n, &req->sg, ptr, len, lsize, msize, 0,
+ dir);
+ }
+
+ return nvme_tx(n, &req->sg, ptr, len, dir);
+}
+
+static uint16_t nvme_bounce_mdata(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+ NvmeTxDirection dir, NvmeRequest *req)
+{
+ NvmeNamespace *ns = req->ns;
+ uint16_t status;
+
+ if (nvme_ns_ext(ns)) {
+ size_t lsize = nvme_lsize(ns);
+ size_t msize = nvme_msize(ns);
+
+ return nvme_tx_interleaved(n, &req->sg, ptr, len, msize, lsize, lsize,
+ dir);
+ }
+
+ nvme_sg_unmap(&req->sg);
+
+ status = nvme_map_mptr(n, &req->sg, len, &req->cmd);
+ if (status) {
+ return status;
+ }
+
+ return nvme_tx(n, &req->sg, ptr, len, dir);
+}
+
static inline void nvme_blk_read(BlockBackend *blk, int64_t offset,
BlockCompletionFunc *cb, NvmeRequest *req)
{
@@ -1460,29 +1724,78 @@ static inline bool nvme_is_write(NvmeRequest *req)
rw->opcode == NVME_CMD_WRITE_ZEROES;
}
+static void nvme_rw_complete_cb(void *opaque, int ret)
+{
+ NvmeRequest *req = opaque;
+ NvmeNamespace *ns = req->ns;
+ BlockBackend *blk = ns->blkconf.blk;
+ BlockAcctCookie *acct = &req->acct;
+ BlockAcctStats *stats = blk_get_stats(blk);
+
+ trace_pci_nvme_rw_complete_cb(nvme_cid(req), blk_name(blk));
+
+ if (ret) {
+ block_acct_failed(stats, acct);
+ nvme_aio_err(req, ret);
+ } else {
+ block_acct_done(stats, acct);
+ }
+
+ if (ns->params.zoned && nvme_is_write(req)) {
+ nvme_finalize_zoned_write(ns, req);
+ }
+
+ nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
static void nvme_rw_cb(void *opaque, int ret)
{
NvmeRequest *req = opaque;
NvmeNamespace *ns = req->ns;
BlockBackend *blk = ns->blkconf.blk;
- BlockAcctCookie *acct = &req->acct;
- BlockAcctStats *stats = blk_get_stats(blk);
trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
- if (ns->params.zoned && nvme_is_write(req)) {
- nvme_finalize_zoned_write(ns, req);
+ if (ret) {
+ goto out;
}
- if (!ret) {
- block_acct_done(stats, acct);
- } else {
- block_acct_failed(stats, acct);
- nvme_aio_err(req, ret);
+ if (nvme_msize(ns)) {
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint64_t slba = le64_to_cpu(rw->slba);
+ uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+ uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+
+ if (req->cmd.opcode == NVME_CMD_WRITE_ZEROES) {
+ size_t mlen = nvme_m2b(ns, nlb);
+
+ req->aiocb = blk_aio_pwrite_zeroes(blk, offset, mlen,
+ BDRV_REQ_MAY_UNMAP,
+ nvme_rw_complete_cb, req);
+ return;
+ }
+
+ if (nvme_ns_ext(ns) || req->cmd.mptr) {
+ uint16_t status;
+
+ nvme_sg_unmap(&req->sg);
+ status = nvme_map_mdata(nvme_ctrl(req), nlb, req);
+ if (status) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ if (req->cmd.opcode == NVME_CMD_READ) {
+ return nvme_blk_read(blk, offset, nvme_rw_complete_cb, req);
+ }
+
+ return nvme_blk_write(blk, offset, nvme_rw_complete_cb, req);
+ }
}
- nvme_enqueue_req_completion(nvme_cq(req), req);
+out:
+ nvme_rw_complete_cb(req, ret);
}
static void nvme_aio_discard_cb(void *opaque, int ret)
@@ -1510,7 +1823,7 @@ struct nvme_zone_reset_ctx {
NvmeZone *zone;
};
-static void nvme_aio_zone_reset_cb(void *opaque, int ret)
+static void nvme_aio_zone_reset_complete_cb(void *opaque, int ret)
{
struct nvme_zone_reset_ctx *ctx = opaque;
NvmeRequest *req = ctx->req;
@@ -1518,31 +1831,31 @@ static void nvme_aio_zone_reset_cb(void *opaque, int ret)
NvmeZone *zone = ctx->zone;
uintptr_t *resets = (uintptr_t *)&req->opaque;
- g_free(ctx);
-
- trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), zone->d.zslba);
-
- if (!ret) {
- switch (nvme_get_zone_state(zone)) {
- case NVME_ZONE_STATE_EXPLICITLY_OPEN:
- case NVME_ZONE_STATE_IMPLICITLY_OPEN:
- nvme_aor_dec_open(ns);
- /* fall through */
- case NVME_ZONE_STATE_CLOSED:
- nvme_aor_dec_active(ns);
- /* fall through */
- case NVME_ZONE_STATE_FULL:
- zone->w_ptr = zone->d.zslba;
- zone->d.wp = zone->w_ptr;
- nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY);
- /* fall through */
- default:
- break;
- }
- } else {
+ if (ret) {
nvme_aio_err(req, ret);
+ goto out;
}
+ switch (nvme_get_zone_state(zone)) {
+ case NVME_ZONE_STATE_EXPLICITLY_OPEN:
+ case NVME_ZONE_STATE_IMPLICITLY_OPEN:
+ nvme_aor_dec_open(ns);
+ /* fall through */
+ case NVME_ZONE_STATE_CLOSED:
+ nvme_aor_dec_active(ns);
+ /* fall through */
+ case NVME_ZONE_STATE_FULL:
+ zone->w_ptr = zone->d.zslba;
+ zone->d.wp = zone->w_ptr;
+ nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY);
+ /* fall through */
+ default:
+ break;
+ }
+
+out:
+ g_free(ctx);
+
(*resets)--;
if (*resets) {
@@ -1552,9 +1865,36 @@ static void nvme_aio_zone_reset_cb(void *opaque, int ret)
nvme_enqueue_req_completion(nvme_cq(req), req);
}
+static void nvme_aio_zone_reset_cb(void *opaque, int ret)
+{
+ struct nvme_zone_reset_ctx *ctx = opaque;
+ NvmeRequest *req = ctx->req;
+ NvmeNamespace *ns = req->ns;
+ NvmeZone *zone = ctx->zone;
+
+ trace_pci_nvme_aio_zone_reset_cb(nvme_cid(req), zone->d.zslba);
+
+ if (ret) {
+ goto out;
+ }
+
+ if (nvme_msize(ns)) {
+ int64_t offset = ns->mdata_offset + nvme_m2b(ns, zone->d.zslba);
+
+ blk_aio_pwrite_zeroes(ns->blkconf.blk, offset,
+ nvme_m2b(ns, ns->zone_size), BDRV_REQ_MAY_UNMAP,
+ nvme_aio_zone_reset_complete_cb, ctx);
+ return;
+ }
+
+out:
+ nvme_aio_zone_reset_complete_cb(opaque, ret);
+}
+
struct nvme_copy_ctx {
int copies;
uint8_t *bounce;
+ uint8_t *mbounce;
uint32_t nlb;
};
@@ -1563,6 +1903,36 @@ struct nvme_copy_in_ctx {
QEMUIOVector iov;
};
+static void nvme_copy_complete_cb(void *opaque, int ret)
+{
+ NvmeRequest *req = opaque;
+ NvmeNamespace *ns = req->ns;
+ struct nvme_copy_ctx *ctx = req->opaque;
+
+ if (ret) {
+ block_acct_failed(blk_get_stats(ns->blkconf.blk), &req->acct);
+ nvme_aio_err(req, ret);
+ goto out;
+ }
+
+ block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct);
+
+out:
+ if (ns->params.zoned) {
+ NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
+ uint64_t sdlba = le64_to_cpu(copy->sdlba);
+ NvmeZone *zone = nvme_get_zone_by_slba(ns, sdlba);
+
+ __nvme_advance_zone_wp(ns, zone, ctx->nlb);
+ }
+
+ g_free(ctx->bounce);
+ g_free(ctx->mbounce);
+ g_free(ctx);
+
+ nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
static void nvme_copy_cb(void *opaque, int ret)
{
NvmeRequest *req = opaque;
@@ -1571,25 +1941,25 @@ static void nvme_copy_cb(void *opaque, int ret)
trace_pci_nvme_copy_cb(nvme_cid(req));
- if (ns->params.zoned) {
+ if (ret) {
+ goto out;
+ }
+
+ if (nvme_msize(ns)) {
NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
uint64_t sdlba = le64_to_cpu(copy->sdlba);
- NvmeZone *zone = nvme_get_zone_by_slba(ns, sdlba);
+ int64_t offset = ns->mdata_offset + nvme_m2b(ns, sdlba);
- __nvme_advance_zone_wp(ns, zone, ctx->nlb);
+ qemu_iovec_reset(&req->sg.iov);
+ qemu_iovec_add(&req->sg.iov, ctx->mbounce, nvme_m2b(ns, ctx->nlb));
+
+ req->aiocb = blk_aio_pwritev(ns->blkconf.blk, offset, &req->sg.iov, 0,
+ nvme_copy_complete_cb, req);
+ return;
}
- if (!ret) {
- block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct);
- } else {
- block_acct_failed(blk_get_stats(ns->blkconf.blk), &req->acct);
- nvme_aio_err(req, ret);
- }
-
- g_free(ctx->bounce);
- g_free(ctx);
-
- nvme_enqueue_req_completion(nvme_cq(req), req);
+out:
+ nvme_copy_complete_cb(opaque, ret);
}
static void nvme_copy_in_complete(NvmeRequest *req)
@@ -1672,6 +2042,7 @@ static void nvme_aio_copy_in_cb(void *opaque, int ret)
block_acct_failed(blk_get_stats(ns->blkconf.blk), &req->acct);
g_free(ctx->bounce);
+ g_free(ctx->mbounce);
g_free(ctx);
nvme_enqueue_req_completion(nvme_cq(req), req);
@@ -1683,43 +2054,110 @@ static void nvme_aio_copy_in_cb(void *opaque, int ret)
}
struct nvme_compare_ctx {
- QEMUIOVector iov;
- uint8_t *bounce;
+ struct {
+ QEMUIOVector iov;
+ uint8_t *bounce;
+ } data;
+
+ struct {
+ QEMUIOVector iov;
+ uint8_t *bounce;
+ } mdata;
};
-static void nvme_compare_cb(void *opaque, int ret)
+static void nvme_compare_mdata_cb(void *opaque, int ret)
{
NvmeRequest *req = opaque;
- NvmeNamespace *ns = req->ns;
+ NvmeCtrl *n = nvme_ctrl(req);
struct nvme_compare_ctx *ctx = req->opaque;
g_autofree uint8_t *buf = NULL;
- uint16_t status;
+ uint16_t status = NVME_SUCCESS;
- trace_pci_nvme_compare_cb(nvme_cid(req));
+ trace_pci_nvme_compare_mdata_cb(nvme_cid(req));
- if (!ret) {
- block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct);
- } else {
- block_acct_failed(blk_get_stats(ns->blkconf.blk), &req->acct);
- nvme_aio_err(req, ret);
- goto out;
- }
+ buf = g_malloc(ctx->mdata.iov.size);
- buf = g_malloc(ctx->iov.size);
-
- status = nvme_h2c(nvme_ctrl(req), buf, ctx->iov.size, req);
+ status = nvme_bounce_mdata(n, buf, ctx->mdata.iov.size,
+ NVME_TX_DIRECTION_TO_DEVICE, req);
if (status) {
req->status = status;
goto out;
}
- if (memcmp(buf, ctx->bounce, ctx->iov.size)) {
+ if (memcmp(buf, ctx->mdata.bounce, ctx->mdata.iov.size)) {
req->status = NVME_CMP_FAILURE;
+ goto out;
}
out:
- qemu_iovec_destroy(&ctx->iov);
- g_free(ctx->bounce);
+ qemu_iovec_destroy(&ctx->data.iov);
+ g_free(ctx->data.bounce);
+
+ qemu_iovec_destroy(&ctx->mdata.iov);
+ g_free(ctx->mdata.bounce);
+
+ g_free(ctx);
+
+ nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
+static void nvme_compare_data_cb(void *opaque, int ret)
+{
+ NvmeRequest *req = opaque;
+ NvmeCtrl *n = nvme_ctrl(req);
+ NvmeNamespace *ns = req->ns;
+ BlockBackend *blk = ns->blkconf.blk;
+ BlockAcctCookie *acct = &req->acct;
+ BlockAcctStats *stats = blk_get_stats(blk);
+
+ struct nvme_compare_ctx *ctx = req->opaque;
+ g_autofree uint8_t *buf = NULL;
+ uint16_t status;
+
+ trace_pci_nvme_compare_data_cb(nvme_cid(req));
+
+ if (ret) {
+ block_acct_failed(stats, acct);
+ nvme_aio_err(req, ret);
+ goto out;
+ }
+
+ buf = g_malloc(ctx->data.iov.size);
+
+ status = nvme_bounce_data(n, buf, ctx->data.iov.size,
+ NVME_TX_DIRECTION_TO_DEVICE, req);
+ if (status) {
+ req->status = status;
+ goto out;
+ }
+
+ if (memcmp(buf, ctx->data.bounce, ctx->data.iov.size)) {
+ req->status = NVME_CMP_FAILURE;
+ goto out;
+ }
+
+ if (nvme_msize(ns)) {
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint64_t slba = le64_to_cpu(rw->slba);
+ uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
+ size_t mlen = nvme_m2b(ns, nlb);
+ uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+
+ ctx->mdata.bounce = g_malloc(mlen);
+
+ qemu_iovec_init(&ctx->mdata.iov, 1);
+ qemu_iovec_add(&ctx->mdata.iov, ctx->mdata.bounce, mlen);
+
+ req->aiocb = blk_aio_preadv(blk, offset, &ctx->mdata.iov, 0,
+ nvme_compare_mdata_cb, req);
+ return;
+ }
+
+ block_acct_done(stats, acct);
+
+out:
+ qemu_iovec_destroy(&ctx->data.iov);
+ g_free(ctx->data.bounce);
g_free(ctx);
nvme_enqueue_req_completion(nvme_cq(req), req);
@@ -1808,6 +2246,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
uint32_t nlb = 0;
uint8_t *bounce = NULL, *bouncep = NULL;
+ uint8_t *mbounce = NULL, *mbouncep = NULL;
struct nvme_copy_ctx *ctx;
uint16_t status;
int i;
@@ -1867,6 +2306,9 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
}
bounce = bouncep = g_malloc(nvme_l2b(ns, nlb));
+ if (nvme_msize(ns)) {
+ mbounce = mbouncep = g_malloc(nvme_m2b(ns, nlb));
+ }
block_acct_start(blk_get_stats(ns->blkconf.blk), &req->acct, 0,
BLOCK_ACCT_READ);
@@ -1874,6 +2316,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
ctx = g_new(struct nvme_copy_ctx, 1);
ctx->bounce = bounce;
+ ctx->mbounce = mbounce;
ctx->nlb = nlb;
ctx->copies = 1;
@@ -1900,6 +2343,24 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
nvme_aio_copy_in_cb, in_ctx);
bouncep += len;
+
+ if (nvme_msize(ns)) {
+ len = nvme_m2b(ns, nlb);
+ offset = ns->mdata_offset + nvme_m2b(ns, slba);
+
+ in_ctx = g_new(struct nvme_copy_in_ctx, 1);
+ in_ctx->req = req;
+
+ qemu_iovec_init(&in_ctx->iov, 1);
+ qemu_iovec_add(&in_ctx->iov, mbouncep, len);
+
+ ctx->copies++;
+
+ blk_aio_preadv(ns->blkconf.blk, offset, &in_ctx->iov, 0,
+ nvme_aio_copy_in_cb, in_ctx);
+
+ mbouncep += len;
+ }
}
/* account for the 1-initialization */
@@ -1919,14 +2380,18 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
BlockBackend *blk = ns->blkconf.blk;
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
- size_t len = nvme_l2b(ns, nlb);
+ size_t data_len = nvme_l2b(ns, nlb);
+ size_t len = data_len;
int64_t offset = nvme_l2b(ns, slba);
- uint8_t *bounce = NULL;
struct nvme_compare_ctx *ctx = NULL;
uint16_t status;
trace_pci_nvme_compare(nvme_cid(req), nvme_nsid(ns), slba, nlb);
+ if (nvme_ns_ext(ns)) {
+ len += nvme_m2b(ns, nlb);
+ }
+
status = nvme_check_mdts(n, len);
if (status) {
trace_pci_nvme_err_mdts(nvme_cid(req), len);
@@ -1946,18 +2411,22 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
}
}
- bounce = g_malloc(len);
+ status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
+ if (status) {
+ return status;
+ }
ctx = g_new(struct nvme_compare_ctx, 1);
- ctx->bounce = bounce;
+ ctx->data.bounce = g_malloc(data_len);
req->opaque = ctx;
- qemu_iovec_init(&ctx->iov, 1);
- qemu_iovec_add(&ctx->iov, bounce, len);
+ qemu_iovec_init(&ctx->data.iov, 1);
+ qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, data_len);
- block_acct_start(blk_get_stats(blk), &req->acct, len, BLOCK_ACCT_READ);
- blk_aio_preadv(blk, offset, &ctx->iov, 0, nvme_compare_cb, req);
+ block_acct_start(blk_get_stats(blk), &req->acct, data_len,
+ BLOCK_ACCT_READ);
+ blk_aio_preadv(blk, offset, &ctx->data.iov, 0, nvme_compare_data_cb, req);
return NVME_NO_COMPLETE;
}
@@ -1977,15 +2446,20 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
uint64_t data_size = nvme_l2b(ns, nlb);
+ uint64_t mapped_size = data_size;
uint64_t data_offset;
BlockBackend *blk = ns->blkconf.blk;
uint16_t status;
- trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, data_size, slba);
+ if (nvme_ns_ext(ns)) {
+ mapped_size += nvme_m2b(ns, nlb);
+ }
- status = nvme_check_mdts(n, data_size);
+ trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, mapped_size, slba);
+
+ status = nvme_check_mdts(n, mapped_size);
if (status) {
- trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
+ trace_pci_nvme_err_mdts(nvme_cid(req), mapped_size);
goto invalid;
}
@@ -2003,11 +2477,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
}
}
- status = nvme_map_dptr(n, &req->sg, data_size, &req->cmd);
- if (status) {
- goto invalid;
- }
-
if (NVME_ERR_REC_DULBE(ns->features.err_rec)) {
status = nvme_check_dulbe(ns, slba, nlb);
if (status) {
@@ -2015,6 +2484,11 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
}
}
+ status = nvme_map_data(n, nlb, req);
+ if (status) {
+ goto invalid;
+ }
+
data_offset = nvme_l2b(ns, slba);
block_acct_start(blk_get_stats(blk), &req->acct, data_size,
@@ -2035,19 +2509,24 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
uint64_t data_size = nvme_l2b(ns, nlb);
+ uint64_t mapped_size = data_size;
uint64_t data_offset;
NvmeZone *zone;
NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe;
BlockBackend *blk = ns->blkconf.blk;
uint16_t status;
+ if (nvme_ns_ext(ns)) {
+ mapped_size += nvme_m2b(ns, nlb);
+ }
+
trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode),
- nvme_nsid(ns), nlb, data_size, slba);
+ nvme_nsid(ns), nlb, mapped_size, slba);
if (!wrz) {
- status = nvme_check_mdts(n, data_size);
+ status = nvme_check_mdts(n, mapped_size);
if (status) {
- trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
+ trace_pci_nvme_err_mdts(nvme_cid(req), mapped_size);
goto invalid;
}
}
@@ -2094,7 +2573,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
data_offset = nvme_l2b(ns, slba);
if (!wrz) {
- status = nvme_map_dptr(n, &req->sg, data_size, &req->cmd);
+ status = nvme_map_data(n, nlb, req);
if (status) {
goto invalid;
}
@@ -2107,6 +2586,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
req);
}
+
return NVME_NO_COMPLETE;
invalid:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index fe4a6c24306d..1c76c9583fd5 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -47,11 +47,13 @@ pci_nvme_copy(uint16_t cid, uint32_t nsid, uint16_t nr, uint8_t format) "cid %"P
pci_nvme_copy_source_range(uint64_t slba, uint32_t nlb) "slba 0x%"PRIx64" nlb %"PRIu32""
pci_nvme_copy_in_complete(uint16_t cid) "cid %"PRIu16""
pci_nvme_copy_cb(uint16_t cid) "cid %"PRIu16""
+pci_nvme_rw_complete_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
pci_nvme_block_status(int64_t offset, int64_t bytes, int64_t pnum, int ret, bool zeroed) "offset %"PRId64" bytes %"PRId64" pnum %"PRId64" ret 0x%x zeroed %d"
pci_nvme_dsm(uint16_t cid, uint32_t nsid, uint32_t nr, uint32_t attr) "cid %"PRIu16" nsid %"PRIu32" nr %"PRIu32" attr 0x%"PRIx32""
pci_nvme_dsm_deallocate(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
pci_nvme_compare(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32""
-pci_nvme_compare_cb(uint16_t cid) "cid %"PRIu16""
+pci_nvme_compare_data_cb(uint16_t cid) "cid %"PRIu16""
+pci_nvme_compare_mdata_cb(uint16_t cid) "cid %"PRIu16""
pci_nvme_aio_discard_cb(uint16_t cid) "cid %"PRIu16""
pci_nvme_aio_copy_in_cb(uint16_t cid) "cid %"PRIu16""
pci_nvme_aio_zone_reset_cb(uint16_t cid, uint64_t zslba) "cid %"PRIu16" zslba 0x%"PRIx64""
--
2.30.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH RFC v2 8/8] hw/block/nvme: end-to-end data protection
2021-02-07 21:49 [PATCH RFC v2 0/8] hw/block/nvme: metadata and end-to-end data protection support Klaus Jensen
` (6 preceding siblings ...)
2021-02-07 21:49 ` [PATCH RFC v2 7/8] hw/block/nvme: add metadata support Klaus Jensen
@ 2021-02-07 21:49 ` Klaus Jensen
7 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, Fam Zheng, qemu-block, Klaus Jensen,
Gollu Appalanaidu, Max Reitz, Klaus Jensen, Stefan Hajnoczi,
Keith Busch
From: Klaus Jensen <k.jensen@samsung.com>
Add support for namespaces formatted with protection information. The
type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is
selected with the `pi` nvme-ns device parameter. If the number of
metadata bytes is larger than 8, the `pil` nvme-ns device parameter may
be used to control the location of the 8-byte DIF tuple. The default
`pil` value of '0' causes the DIF tuple to be transferred as the last
8 bytes of the metadata; set it to '1' to store the tuple in the first
8 bytes instead.
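As a rough illustration (not part of this patch; the image file, drive
id and serial number are placeholders), a namespace with 16 bytes of
metadata, Type 1 protection and the DIF tuple in the first 8 bytes
could be configured along these lines:

  -drive id=nvm-1,file=nvm-1.img,format=raw,if=none \
  -device nvme,id=nvme0,serial=deadbeef \
  -device nvme-ns,bus=nvme0,drive=nvm-1,nsid=1,ms=16,pi=1,pil=1
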
This patch is based on work by Gollu Appalanaidu.
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com>
---
hw/block/nvme-ns.h | 2 +
hw/block/nvme.h | 36 +++
include/block/nvme.h | 26 +-
hw/block/nvme-ns.c | 11 +
hw/block/nvme.c | 731 ++++++++++++++++++++++++++++++++++++++++--
hw/block/trace-events | 11 +
6 files changed, 793 insertions(+), 24 deletions(-)
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2281fd39930a..e537bfba18b8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,8 @@ typedef struct NvmeNamespaceParams {
uint16_t ms;
uint8_t mset;
+ uint8_t pi;
+ uint8_t pil;
uint16_t mssrl;
uint32_t mcl;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 0e4fbd6990ad..f81580353868 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -212,6 +212,42 @@ static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
return sq->ctrl;
}
+/* from Linux kernel (crypto/crct10dif_common.c) */
+static const uint16_t t10_dif_crc_table[256] = {
+ 0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+ 0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+ 0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+ 0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+ 0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+ 0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+ 0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+ 0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+ 0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+ 0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+ 0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+ 0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+ 0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+ 0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+ 0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+ 0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+ 0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+ 0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+ 0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+ 0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+ 0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+ 0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+ 0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+ 0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+ 0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+ 0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+ 0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+ 0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+ 0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+ 0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+ 0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+ 0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
#endif /* HW_NVME_H */
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f82b5ffc2c1d..b6a3fb5e1f0f 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -695,12 +695,17 @@ enum {
NVME_RW_DSM_LATENCY_LOW = 3 << 4,
NVME_RW_DSM_SEQ_REQ = 1 << 6,
NVME_RW_DSM_COMPRESSED = 1 << 7,
+ NVME_RW_PIREMAP = 1 << 9,
NVME_RW_PRINFO_PRACT = 1 << 13,
NVME_RW_PRINFO_PRCHK_GUARD = 1 << 12,
NVME_RW_PRINFO_PRCHK_APP = 1 << 11,
NVME_RW_PRINFO_PRCHK_REF = 1 << 10,
+ NVME_RW_PRINFO_PRCHK_MASK = 7 << 10,
+
};
+#define NVME_RW_PRINFO(control) ((control >> 10) & 0xf)
+
typedef struct QEMU_PACKED NvmeDsmCmd {
uint8_t opcode;
uint8_t flags;
@@ -1292,14 +1297,22 @@ typedef struct QEMU_PACKED NvmeIdNsZoned {
#define NVME_ID_NS_DPC_TYPE_MASK 0x7
enum NvmeIdNsDps {
- DPS_TYPE_NONE = 0,
- DPS_TYPE_1 = 1,
- DPS_TYPE_2 = 2,
- DPS_TYPE_3 = 3,
- DPS_TYPE_MASK = 0x7,
- DPS_FIRST_EIGHT = 8,
+ NVME_ID_NS_DPS_TYPE_NONE = 0,
+ NVME_ID_NS_DPS_TYPE_1 = 1,
+ NVME_ID_NS_DPS_TYPE_2 = 2,
+ NVME_ID_NS_DPS_TYPE_3 = 3,
+ NVME_ID_NS_DPS_TYPE_MASK = 0x7,
+ NVME_ID_NS_DPS_FIRST_EIGHT = 8,
};
+#define NVME_ID_NS_DPS_TYPE(dps) (dps & NVME_ID_NS_DPS_TYPE_MASK)
+
+typedef struct NvmeDifTuple {
+ uint16_t guard;
+ uint16_t apptag;
+ uint32_t reftag;
+} NvmeDifTuple;
+
enum NvmeZoneAttr {
NVME_ZA_FINISHED_BY_CTLR = 1 << 0,
NVME_ZA_FINISH_RECOMMENDED = 1 << 1,
@@ -1395,5 +1408,6 @@ static inline void _nvme_check_size(void)
QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4);
QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescr) != 64);
+ QEMU_BUILD_BUG_ON(sizeof(NvmeDifTuple) != 8);
}
#endif
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7a662c170428..c7664926a6ca 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -50,6 +50,9 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
if (ns->params.mset) {
id_ns->flbas |= 0x10;
}
+
+ id_ns->dpc = 0x1f;
+ id_ns->dps = (ns->params.pil << 3) | ns->params.pi;
}
nlbas = nvme_ns_nlbas(ns);
@@ -325,6 +328,12 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
return -1;
}
+ if (ns->params.pi && !ns->params.ms) {
+ error_setg(errp, "at least 8 bytes of metadata required to enable "
+ "protection information");
+ return -1;
+ }
+
return 0;
}
@@ -409,6 +418,8 @@ static Property nvme_ns_props[] = {
DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
DEFINE_PROP_UINT16("ms", NvmeNamespace, params.ms, 0),
DEFINE_PROP_UINT8("mset", NvmeNamespace, params.mset, 0),
+ DEFINE_PROP_UINT8("pi", NvmeNamespace, params.pi, 0),
+ DEFINE_PROP_UINT8("pil", NvmeNamespace, params.pil, 0),
DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 8aa892ec3106..bc2218ca523e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -955,9 +955,16 @@ static uint16_t nvme_map_mptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t nlb, NvmeRequest *req)
{
NvmeNamespace *ns = req->ns;
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint16_t ctrl = le16_to_cpu(rw->control);
size_t len = nvme_l2b(ns, nlb);
uint16_t status;
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) &&
+ (ctrl & NVME_RW_PRINFO_PRACT && nvme_msize(ns) == 8)) {
+ goto out;
+ }
+
if (nvme_ns_ext(ns)) {
NvmeSg sg;
@@ -975,6 +982,7 @@ static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t nlb, NvmeRequest *req)
return NVME_SUCCESS;
}
+out:
return nvme_map_dptr(n, &req->sg, len, &req->cmd);
}
@@ -1139,8 +1147,11 @@ static uint16_t nvme_bounce_data(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
NvmeTxDirection dir, NvmeRequest *req)
{
NvmeNamespace *ns = req->ns;
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint16_t ctrl = le16_to_cpu(rw->control);
- if (nvme_ns_ext(ns)) {
+ if (nvme_ns_ext(ns) &&
+ !(ctrl & NVME_RW_PRINFO_PRACT && nvme_msize(ns) == 8)) {
size_t lsize = nvme_lsize(ns);
size_t msize = nvme_msize(ns);
@@ -1423,6 +1434,161 @@ static uint16_t nvme_check_dulbe(NvmeNamespace *ns, uint64_t slba,
return NVME_SUCCESS;
}
+static uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint16_t ctrl,
+ uint64_t slba, uint32_t reftag)
+{
+ if ((NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) == NVME_ID_NS_DPS_TYPE_1) &&
+ (slba & 0xffffffff) != reftag) {
+ return NVME_INVALID_PROT_INFO | NVME_DNR;
+ }
+
+ return NVME_SUCCESS;
+}
+
+/* from Linux kernel (crypto/crct10dif_common.c) */
+static uint16_t crc_t10dif(uint16_t crc, const unsigned char *buffer,
+ size_t len)
+{
+ unsigned int i;
+
+ for (i = 0; i < len; i++) {
+ crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
+ }
+
+ return crc;
+}
+
+static void nvme_e2e_pract_generate_dif(NvmeNamespace *ns, uint8_t *buf,
+ size_t len, uint8_t *mbuf, size_t mlen,
+ uint16_t apptag, uint32_t reftag)
+{
+ uint8_t *end = buf + len;
+ size_t lsize = nvme_lsize(ns);
+ size_t msize = nvme_msize(ns);
+ int16_t pil = 0;
+
+ if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
+ pil = nvme_msize(ns) - sizeof(NvmeDifTuple);
+ }
+
+ trace_pci_nvme_e2e_pract_generate_dif(len, lsize, lsize + pil, apptag,
+ reftag);
+
+ for (; buf < end; buf += lsize, mbuf += msize) {
+ NvmeDifTuple *dif = (NvmeDifTuple *)(mbuf + pil);
+ uint16_t crc = crc_t10dif(0x0, buf, lsize);
+
+ if (pil) {
+ crc = crc_t10dif(crc, mbuf, pil);
+ }
+
+ dif->guard = cpu_to_be16(crc);
+ dif->apptag = cpu_to_be16(apptag);
+ dif->reftag = cpu_to_be32(reftag);
+
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) != NVME_ID_NS_DPS_TYPE_3) {
+ reftag++;
+ }
+ }
+}
+
+static uint16_t nvme_e2e_prchk(NvmeNamespace *ns, NvmeDifTuple *dif,
+ uint8_t *buf, uint8_t *mbuf, size_t pil,
+ uint16_t ctrl, uint16_t apptag,
+ uint16_t appmask, uint32_t reftag)
+{
+ switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ case NVME_ID_NS_DPS_TYPE_3:
+ if (be32_to_cpu(dif->reftag) != 0xffffffff) {
+ break;
+ }
+
+ /* fallthrough */
+ case NVME_ID_NS_DPS_TYPE_1:
+ case NVME_ID_NS_DPS_TYPE_2:
+ if (be16_to_cpu(dif->apptag) != 0xffff) {
+ break;
+ }
+
+ trace_pci_nvme_e2e_prchk_disabled(be16_to_cpu(dif->apptag),
+ be32_to_cpu(dif->reftag));
+
+ return NVME_SUCCESS;
+ }
+
+ if (ctrl & NVME_RW_PRINFO_PRCHK_GUARD) {
+ uint16_t crc = crc_t10dif(0x0, buf, nvme_lsize(ns));
+
+ if (pil) {
+ crc = crc_t10dif(crc, mbuf, pil);
+ }
+
+ trace_pci_nvme_e2e_prchk_guard(be16_to_cpu(dif->guard), crc);
+
+ if (be16_to_cpu(dif->guard) != crc) {
+ return NVME_E2E_GUARD_ERROR;
+ }
+ }
+
+ if (ctrl & NVME_RW_PRINFO_PRCHK_APP) {
+ trace_pci_nvme_e2e_prchk_apptag(be16_to_cpu(dif->apptag), apptag,
+ appmask);
+
+ if ((be16_to_cpu(dif->apptag) & appmask) != (apptag & appmask)) {
+ return NVME_E2E_APP_ERROR;
+ }
+ }
+
+ if (ctrl & NVME_RW_PRINFO_PRCHK_REF) {
+ trace_pci_nvme_e2e_prchk_reftag(be32_to_cpu(dif->reftag), reftag);
+
+ if (be32_to_cpu(dif->reftag) != reftag) {
+ return NVME_E2E_REF_ERROR;
+ }
+ }
+
+ return NVME_SUCCESS;
+}
+
+static uint16_t nvme_e2e_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
+ uint8_t *mbuf, size_t mlen, uint16_t ctrl,
+ uint64_t slba, uint16_t apptag,
+ uint16_t appmask, uint32_t reftag)
+{
+ uint8_t *end = buf + len;
+ size_t lsize = nvme_lsize(ns);
+ size_t msize = nvme_msize(ns);
+ int16_t pil = 0;
+ uint16_t status;
+
+ status = nvme_check_prinfo(ns, ctrl, slba, reftag);
+ if (status) {
+ return status;
+ }
+
+ if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
+ pil = nvme_msize(ns) - sizeof(NvmeDifTuple);
+ }
+
+ trace_pci_nvme_e2e_check(NVME_RW_PRINFO(ctrl), lsize + pil);
+
+ for (; buf < end; buf += lsize, mbuf += msize) {
+ NvmeDifTuple *dif = (NvmeDifTuple *)(mbuf + pil);
+
+ status = nvme_e2e_prchk(ns, dif, buf, mbuf, pil, ctrl, apptag,
+ appmask, reftag);
+ if (status) {
+ return status;
+ }
+
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) != NVME_ID_NS_DPS_TYPE_3) {
+ reftag++;
+ }
+ }
+
+ return NVME_SUCCESS;
+}
+
static void nvme_aio_err(NvmeRequest *req, int ret)
{
uint16_t status = NVME_SUCCESS;
@@ -1459,6 +1625,20 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
req->status = status;
}
+struct nvme_e2e_ctx {
+ NvmeRequest *req;
+
+ struct {
+ QEMUIOVector iov;
+ uint8_t *bounce;
+ } data;
+
+ struct {
+ QEMUIOVector iov;
+ uint8_t *bounce;
+ } mdata;
+};
+
static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba)
{
return ns->zone_size_log2 > 0 ? slba >> ns->zone_size_log2 :
@@ -1748,6 +1928,26 @@ static void nvme_rw_complete_cb(void *opaque, int ret)
nvme_enqueue_req_completion(nvme_cq(req), req);
}
+static void nvme_e2e_rw_cb(void *opaque, int ret)
+{
+ struct nvme_e2e_ctx *ctx = opaque;
+ NvmeRequest *req = ctx->req;
+ NvmeNamespace *ns = req->ns;
+ BlockBackend *blk = ns->blkconf.blk;
+
+ trace_pci_nvme_e2e_rw_cb(nvme_cid(req), blk_name(blk));
+
+ qemu_iovec_destroy(&ctx->data.iov);
+ g_free(ctx->data.bounce);
+
+ qemu_iovec_destroy(&ctx->mdata.iov);
+ g_free(ctx->mdata.bounce);
+
+ g_free(ctx);
+
+ nvme_rw_complete_cb(req, ret);
+}
+
static void nvme_rw_cb(void *opaque, int ret)
{
NvmeRequest *req = opaque;
@@ -1798,6 +1998,158 @@ out:
nvme_rw_complete_cb(req, ret);
}
+static void nvme_e2e_rw_check_cb(void *opaque, int ret)
+{
+ struct nvme_e2e_ctx *ctx = opaque;
+ NvmeRequest *req = ctx->req;
+ NvmeNamespace *ns = req->ns;
+ BlockDriverState *bs = blk_bs(ns->blkconf.blk);
+ NvmeCtrl *n = nvme_ctrl(req);
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint64_t slba = le64_to_cpu(rw->slba);
+ int64_t offset = nvme_l2b(ns, slba);
+ uint16_t ctrl = le16_to_cpu(rw->control);
+ uint16_t apptag = le16_to_cpu(rw->apptag);
+ uint16_t appmask = le16_to_cpu(rw->appmask);
+ uint32_t reftag = le32_to_cpu(rw->reftag);
+ int64_t pnum, bytes;
+ bool zeroed;
+ int16_t pil = 0;
+ uint16_t status;
+
+ Error *err = NULL;
+
+ trace_pci_nvme_e2e_rw_check_cb(nvme_cid(req), NVME_RW_PRINFO(ctrl), apptag,
+ appmask, reftag);
+
+ if (ret) {
+ goto out;
+ }
+
+ if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
+ pil = nvme_msize(ns) - sizeof(NvmeDifTuple);
+ }
+
+ pnum = 0;
+ bytes = ctx->data.iov.size;
+
+ do {
+ int ret;
+
+ bytes -= pnum;
+
+ ret = bdrv_block_status(bs, offset, bytes, &pnum, NULL, NULL);
+ if (ret < 0) {
+ error_setg_errno(&err, -ret, "unable to get block status");
+ error_report_err(err);
+
+ status = NVME_INTERNAL_DEV_ERROR;
+ goto out;
+ }
+
+ zeroed = !!(ret & BDRV_BLOCK_ZERO);
+
+ trace_pci_nvme_block_status(offset, bytes, pnum, ret, zeroed);
+
+ if (zeroed) {
+ size_t msize = nvme_msize(ns);
+ uint8_t *mbuf = ctx->mdata.bounce +
+ ((offset - nvme_l2b(ns, slba)) / nvme_lsize(ns)) * msize;
+ size_t mlen = (pnum / nvme_lsize(ns)) * msize;
+ uint8_t *end = mbuf + mlen;
+
+ for (; mbuf < end; mbuf += msize) {
+ memset(mbuf + pil, 0xff, sizeof(NvmeDifTuple));
+ }
+ }
+
+ offset += pnum;
+ } while (pnum != bytes);
+
+ status = nvme_e2e_check(ns, ctx->data.bounce, ctx->data.iov.size,
+ ctx->mdata.bounce, ctx->mdata.iov.size, ctrl,
+ slba, apptag, appmask, reftag);
+ if (status) {
+ req->status = status;
+ goto out;
+ }
+
+ status = nvme_bounce_data(n, ctx->data.bounce, ctx->data.iov.size,
+ NVME_TX_DIRECTION_FROM_DEVICE, req);
+ if (status) {
+ req->status = status;
+ goto out;
+ }
+
+ if (ctrl & NVME_RW_PRINFO_PRACT && nvme_msize(ns) == 8) {
+ goto out;
+ }
+
+ status = nvme_bounce_mdata(n, ctx->mdata.bounce, ctx->mdata.iov.size,
+ NVME_TX_DIRECTION_FROM_DEVICE, req);
+ if (status) {
+ req->status = status;
+ }
+
+out:
+ nvme_e2e_rw_cb(ctx, ret);
+}
+
+static void nvme_e2e_rw_mdata_in_cb(void *opaque, int ret)
+{
+ struct nvme_e2e_ctx *ctx = opaque;
+ NvmeRequest *req = ctx->req;
+ NvmeNamespace *ns = req->ns;
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint64_t slba = le64_to_cpu(rw->slba);
+ uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
+ size_t mlen = nvme_m2b(ns, nlb);
+ uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+ BlockBackend *blk = ns->blkconf.blk;
+
+ trace_pci_nvme_e2e_rw_mdata_in_cb(nvme_cid(req), blk_name(blk));
+
+ if (ret) {
+ goto out;
+ }
+
+ ctx->mdata.bounce = g_malloc(mlen);
+
+ qemu_iovec_reset(&ctx->mdata.iov);
+ qemu_iovec_add(&ctx->mdata.iov, ctx->mdata.bounce, mlen);
+
+ req->aiocb = blk_aio_preadv(blk, offset, &ctx->mdata.iov, 0,
+ nvme_e2e_rw_check_cb, ctx);
+ return;
+
+out:
+ nvme_e2e_rw_cb(ctx, ret);
+}
+
+static void nvme_e2e_rw_mdata_out_cb(void *opaque, int ret)
+{
+ struct nvme_e2e_ctx *ctx = opaque;
+ NvmeRequest *req = ctx->req;
+ NvmeNamespace *ns = req->ns;
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint64_t slba = le64_to_cpu(rw->slba);
+ uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+ BlockBackend *blk = ns->blkconf.blk;
+
+ trace_pci_nvme_e2e_rw_mdata_out_cb(nvme_cid(req), blk_name(blk));
+
+ if (ret) {
+ goto out;
+ }
+
+ req->aiocb = blk_aio_pwritev(blk, offset, &ctx->mdata.iov, 0,
+ nvme_e2e_rw_cb, ctx);
+ return;
+
+out:
+ nvme_e2e_rw_cb(ctx, ret);
+}
+
static void nvme_aio_discard_cb(void *opaque, int ret)
{
NvmeRequest *req = opaque;
@@ -1896,11 +2248,13 @@ struct nvme_copy_ctx {
uint8_t *bounce;
uint8_t *mbounce;
uint32_t nlb;
+ NvmeCopySourceRange *ranges;
};
struct nvme_copy_in_ctx {
NvmeRequest *req;
QEMUIOVector iov;
+ NvmeCopySourceRange *range;
};
static void nvme_copy_complete_cb(void *opaque, int ret)
@@ -1974,6 +2328,70 @@ static void nvme_copy_in_complete(NvmeRequest *req)
block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct);
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ uint16_t prinfor = (copy->control[0] >> 4) & 0xf;
+ uint16_t prinfow = (copy->control[2] >> 2) & 0xf;
+ uint16_t nr = copy->nr + 1;
+ NvmeCopySourceRange *range;
+ uint64_t slba;
+ uint32_t nlb;
+ uint16_t apptag, appmask;
+ uint32_t reftag;
+ uint8_t *buf = ctx->bounce, *mbuf = ctx->mbounce;
+ size_t len, mlen;
+ int i;
+
+ /*
+ * The e2e helpers expect prinfo to be similar to the control field of
+ * the NvmeRwCmd, so shift by 10 to fake it.
+ */
+ prinfor = prinfor << 10;
+ prinfow = prinfow << 10;
+
+ for (i = 0; i < nr; i++) {
+ range = &ctx->ranges[i];
+ slba = le64_to_cpu(range->slba);
+ nlb = le16_to_cpu(range->nlb) + 1;
+ len = nvme_l2b(ns, nlb);
+ mlen = nvme_m2b(ns, nlb);
+ apptag = le16_to_cpu(range->apptag);
+ appmask = le16_to_cpu(range->appmask);
+ reftag = le32_to_cpu(range->reftag);
+
+ status = nvme_e2e_check(ns, buf, len, mbuf, mlen, prinfor, slba,
+ apptag, appmask, reftag);
+ if (status) {
+ goto invalid;
+ }
+
+ buf += len;
+ mbuf += mlen;
+ }
+
+ apptag = le16_to_cpu(copy->apptag);
+ appmask = le16_to_cpu(copy->appmask);
+ reftag = le32_to_cpu(copy->reftag);
+
+ if (prinfow & NVME_RW_PRINFO_PRACT) {
+ size_t len = nvme_l2b(ns, ctx->nlb);
+ size_t mlen = nvme_m2b(ns, ctx->nlb);
+
+ status = nvme_check_prinfo(ns, prinfow, slba, reftag);
+ if (status) {
+ goto invalid;
+ }
+
+ nvme_e2e_pract_generate_dif(ns, ctx->bounce, len, ctx->mbounce,
+ mlen, apptag, reftag);
+ } else {
+ status = nvme_e2e_check(ns, ctx->bounce, len, ctx->mbounce, mlen,
+ prinfow, sdlba, apptag, appmask, reftag);
+ if (status) {
+ goto invalid;
+ }
+ }
+ }
+
status = nvme_check_bounds(ns, sdlba, ctx->nlb);
if (status) {
trace_pci_nvme_err_invalid_lba_range(sdlba, ctx->nlb, ns->id_ns.nsze);
@@ -2068,7 +2486,13 @@ struct nvme_compare_ctx {
static void nvme_compare_mdata_cb(void *opaque, int ret)
{
NvmeRequest *req = opaque;
+ NvmeNamespace *ns = req->ns;
NvmeCtrl *n = nvme_ctrl(req);
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ uint16_t ctrl = le16_to_cpu(rw->control);
+ uint16_t apptag = le16_to_cpu(rw->apptag);
+ uint16_t appmask = le16_to_cpu(rw->appmask);
+ uint32_t reftag = le32_to_cpu(rw->reftag);
struct nvme_compare_ctx *ctx = req->opaque;
g_autofree uint8_t *buf = NULL;
uint16_t status = NVME_SUCCESS;
@@ -2084,6 +2508,40 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
goto out;
}
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ uint64_t slba = le64_to_cpu(rw->slba);
+ uint8_t *bufp;
+ uint8_t *mbufp = ctx->mdata.bounce;
+ uint8_t *end = mbufp + ctx->mdata.iov.size;
+ size_t msize = nvme_msize(ns);
+ int16_t pil = 0;
+
+ status = nvme_e2e_check(ns, ctx->data.bounce, ctx->data.iov.size,
+ ctx->mdata.bounce, ctx->mdata.iov.size, ctrl,
+ slba, apptag, appmask, reftag);
+ if (status) {
+ req->status = status;
+ goto out;
+ }
+
+ /*
+ * When formatted with protection information, do not compare the DIF
+ * tuple.
+ */
+ if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
+ pil = nvme_msize(ns) - sizeof(NvmeDifTuple);
+ }
+
+ for (bufp = buf; mbufp < end; bufp += msize, mbufp += msize) {
+ if (memcmp(bufp + pil, mbufp + pil, msize - pil)) {
+ req->status = NVME_CMP_FAILURE;
+ goto out;
+ }
+ }
+
+ goto out;
+ }
+
if (memcmp(buf, ctx->mdata.bounce, ctx->mdata.iov.size)) {
req->status = NVME_CMP_FAILURE;
goto out;
@@ -2239,12 +2697,18 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
{
NvmeNamespace *ns = req->ns;
NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
- g_autofree NvmeCopySourceRange *range = NULL;
uint16_t nr = copy->nr + 1;
uint8_t format = copy->control[0] & 0xf;
- uint32_t nlb = 0;
+ /*
+ * Shift the PRINFOR/PRINFOW values by 10 to allow reusing the
+ * NVME_RW_PRINFO constants.
+ */
+ uint16_t prinfor = ((copy->control[0] >> 4) & 0xf) << 10;
+ uint16_t prinfow = ((copy->control[2] >> 2) & 0xf) << 10;
+
+ uint32_t nlb = 0;
uint8_t *bounce = NULL, *bouncep = NULL;
uint8_t *mbounce = NULL, *mbouncep = NULL;
struct nvme_copy_ctx *ctx;
@@ -2253,6 +2717,11 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
trace_pci_nvme_copy(nvme_cid(req), nvme_nsid(ns), nr, format);
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) &&
+ ((prinfor & NVME_RW_PRINFO_PRACT) != (prinfow & NVME_RW_PRINFO_PRACT))) {
+ return NVME_INVALID_FIELD | NVME_DNR;
+ }
+
if (!(n->id_ctrl.ocfs & (1 << format))) {
trace_pci_nvme_err_copy_invalid_format(format);
return NVME_INVALID_FIELD | NVME_DNR;
@@ -2262,39 +2731,41 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
return NVME_CMD_SIZE_LIMIT | NVME_DNR;
}
- range = g_new(NvmeCopySourceRange, nr);
+ ctx = g_new(struct nvme_copy_ctx, 1);
+ ctx->ranges = g_new(NvmeCopySourceRange, nr);
- status = nvme_h2c(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
- req);
+ status = nvme_h2c(n, (uint8_t *)ctx->ranges,
+ nr * sizeof(NvmeCopySourceRange), req);
if (status) {
- return status;
+ goto out;
}
for (i = 0; i < nr; i++) {
- uint64_t slba = le64_to_cpu(range[i].slba);
- uint32_t _nlb = le16_to_cpu(range[i].nlb) + 1;
+ uint64_t slba = le64_to_cpu(ctx->ranges[i].slba);
+ uint32_t _nlb = le16_to_cpu(ctx->ranges[i].nlb) + 1;
if (_nlb > le16_to_cpu(ns->id_ns.mssrl)) {
- return NVME_CMD_SIZE_LIMIT | NVME_DNR;
+ status = NVME_CMD_SIZE_LIMIT | NVME_DNR;
+ goto out;
}
status = nvme_check_bounds(ns, slba, _nlb);
if (status) {
trace_pci_nvme_err_invalid_lba_range(slba, _nlb, ns->id_ns.nsze);
- return status;
+ goto out;
}
if (NVME_ERR_REC_DULBE(ns->features.err_rec)) {
status = nvme_check_dulbe(ns, slba, _nlb);
if (status) {
- return status;
+ goto out;
}
}
if (ns->params.zoned) {
status = nvme_check_zone_read(ns, slba, _nlb);
if (status) {
- return status;
+ goto out;
}
}
@@ -2302,7 +2773,8 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
}
if (nlb > le32_to_cpu(ns->id_ns.mcl)) {
- return NVME_CMD_SIZE_LIMIT | NVME_DNR;
+ status = NVME_CMD_SIZE_LIMIT | NVME_DNR;
+ goto out;
}
bounce = bouncep = g_malloc(nvme_l2b(ns, nlb));
@@ -2313,8 +2785,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
block_acct_start(blk_get_stats(ns->blkconf.blk), &req->acct, 0,
BLOCK_ACCT_READ);
- ctx = g_new(struct nvme_copy_ctx, 1);
-
ctx->bounce = bounce;
ctx->mbounce = mbounce;
ctx->nlb = nlb;
@@ -2323,8 +2793,8 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
req->opaque = ctx;
for (i = 0; i < nr; i++) {
- uint64_t slba = le64_to_cpu(range[i].slba);
- uint32_t nlb = le16_to_cpu(range[i].nlb) + 1;
+ uint64_t slba = le64_to_cpu(ctx->ranges[i].slba);
+ uint32_t nlb = le16_to_cpu(ctx->ranges[i].nlb) + 1;
size_t len = nvme_l2b(ns, nlb);
int64_t offset = nvme_l2b(ns, slba);
@@ -2371,6 +2841,12 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
}
return NVME_NO_COMPLETE;
+
+out:
+ g_free(ctx->ranges);
+ g_free(ctx);
+
+ return status;
}
static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
@@ -2380,6 +2856,7 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
BlockBackend *blk = ns->blkconf.blk;
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
+ uint16_t ctrl = le16_to_cpu(rw->control);
size_t data_len = nvme_l2b(ns, nlb);
size_t len = data_len;
int64_t offset = nvme_l2b(ns, slba);
@@ -2388,6 +2865,10 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
trace_pci_nvme_compare(nvme_cid(req), nvme_nsid(ns), slba, nlb);
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps) && (ctrl & NVME_RW_PRINFO_PRACT)) {
+ return NVME_INVALID_PROT_INFO | NVME_DNR;
+ }
+
if (nvme_ns_ext(ns)) {
len += nvme_m2b(ns, nlb);
}
@@ -2439,12 +2920,175 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
return NVME_NO_COMPLETE;
}
+static uint16_t nvme_e2e_rw(NvmeCtrl *n, NvmeRequest *req)
+{
+ NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+ NvmeNamespace *ns = req->ns;
+ BlockBackend *blk = ns->blkconf.blk;
+ bool wrz = rw->opcode == NVME_CMD_WRITE_ZEROES;
+ uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
+ uint64_t slba = le64_to_cpu(rw->slba);
+ size_t len = nvme_l2b(ns, nlb);
+ size_t mlen = nvme_m2b(ns, nlb);
+ size_t mapped_len = len;
+ int64_t offset = nvme_l2b(ns, slba);
+ uint16_t ctrl = le16_to_cpu(rw->control);
+ uint16_t apptag = le16_to_cpu(rw->apptag);
+ uint16_t appmask = le16_to_cpu(rw->appmask);
+ uint32_t reftag = le32_to_cpu(rw->reftag);
+ bool pract = !!(ctrl & NVME_RW_PRINFO_PRACT);
+ struct nvme_e2e_ctx *ctx;
+ uint16_t status;
+
+ trace_pci_nvme_e2e_rw(pract);
+
+ ctx = g_new0(struct nvme_e2e_ctx, 1);
+ ctx->req = req;
+
+ if (wrz) {
+ BdrvRequestFlags flags = BDRV_REQ_MAY_UNMAP;
+
+ if (ctrl & NVME_RW_PRINFO_PRCHK_MASK) {
+ status = NVME_INVALID_PROT_INFO | NVME_DNR;
+ goto err;
+ }
+
+ if (pract) {
+ uint8_t *mbuf, *end;
+ size_t msize = nvme_msize(ns);
+ int16_t pil = msize - sizeof(NvmeDifTuple);
+
+ status = nvme_check_prinfo(ns, ctrl, slba, reftag);
+ if (status) {
+ goto err;
+ }
+
+ flags = 0;
+
+ ctx->mdata.bounce = g_malloc0(mlen);
+
+ qemu_iovec_init(&ctx->mdata.iov, 1);
+ qemu_iovec_add(&ctx->mdata.iov, ctx->mdata.bounce, mlen);
+
+ mbuf = ctx->mdata.bounce;
+ end = mbuf + mlen;
+
+ if (ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT) {
+ pil = 0;
+ }
+
+ for (; mbuf < end; mbuf += msize) {
+ NvmeDifTuple *dif = (NvmeDifTuple *)(mbuf + pil);
+
+ dif->apptag = cpu_to_be16(apptag);
+ dif->reftag = cpu_to_be32(reftag);
+
+ switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ case NVME_ID_NS_DPS_TYPE_1:
+ case NVME_ID_NS_DPS_TYPE_2:
+ reftag++;
+ }
+ }
+ }
+
+ req->aiocb = blk_aio_pwrite_zeroes(blk, offset, len, flags,
+ nvme_e2e_rw_mdata_out_cb, ctx);
+ return NVME_NO_COMPLETE;
+ }
+
+ if (nvme_ns_ext(ns) && !(pract && nvme_msize(ns) == 8)) {
+ mapped_len += mlen;
+ }
+
+ status = nvme_map_dptr(n, &req->sg, mapped_len, &req->cmd);
+ if (status) {
+ return status;
+ }
+
+ ctx->data.bounce = g_malloc(len);
+
+ qemu_iovec_init(&ctx->data.iov, 1);
+ qemu_iovec_add(&ctx->data.iov, ctx->data.bounce, len);
+
+ if (req->cmd.opcode == NVME_CMD_READ) {
+ block_acct_start(blk_get_stats(blk), &req->acct, ctx->data.iov.size,
+ BLOCK_ACCT_READ);
+
+ req->aiocb = blk_aio_preadv(ns->blkconf.blk, offset, &ctx->data.iov, 0,
+ nvme_e2e_rw_mdata_in_cb, ctx);
+ return NVME_NO_COMPLETE;
+ }
+
+ status = nvme_bounce_data(n, ctx->data.bounce, ctx->data.iov.size,
+ NVME_TX_DIRECTION_TO_DEVICE, req);
+ if (status) {
+ goto err;
+ }
+
+ ctx->mdata.bounce = g_malloc(mlen);
+
+ qemu_iovec_init(&ctx->mdata.iov, 1);
+ qemu_iovec_add(&ctx->mdata.iov, ctx->mdata.bounce, mlen);
+
+ if (!(pract && nvme_msize(ns) == 8)) {
+ status = nvme_bounce_mdata(n, ctx->mdata.bounce, ctx->mdata.iov.size,
+ NVME_TX_DIRECTION_TO_DEVICE, req);
+ if (status) {
+ goto err;
+ }
+ }
+
+ status = nvme_check_prinfo(ns, ctrl, slba, reftag);
+ if (status) {
+ goto err;
+ }
+
+ if (pract) {
+ status = nvme_check_prinfo(ns, ctrl, slba, reftag);
+ if (status) {
+ goto err;
+ }
+
+ /* splice generated protection information into the buffer */
+ nvme_e2e_pract_generate_dif(ns, ctx->data.bounce, ctx->data.iov.size,
+ ctx->mdata.bounce, ctx->mdata.iov.size,
+ apptag, reftag);
+ } else {
+ status = nvme_e2e_check(ns, ctx->data.bounce, ctx->data.iov.size,
+ ctx->mdata.bounce, ctx->mdata.iov.size, ctrl,
+ slba, apptag, appmask, reftag);
+ if (status) {
+ goto err;
+ }
+ }
+
+ block_acct_start(blk_get_stats(blk), &req->acct, ctx->data.iov.size,
+ BLOCK_ACCT_WRITE);
+
+ req->aiocb = blk_aio_pwritev(ns->blkconf.blk, offset, &ctx->data.iov, 0,
+ nvme_e2e_rw_mdata_out_cb, ctx);
+
+ return NVME_NO_COMPLETE;
+
+err:
+ qemu_iovec_destroy(&ctx->data.iov);
+ g_free(ctx->data.bounce);
+
+ qemu_iovec_destroy(&ctx->mdata.iov);
+ g_free(ctx->mdata.bounce);
+
+ g_free(ctx);
+
+ return status;
+}
+
static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
{
NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
NvmeNamespace *ns = req->ns;
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+ uint16_t ctrl = le16_to_cpu(rw->control);
uint64_t data_size = nvme_l2b(ns, nlb);
uint64_t mapped_size = data_size;
uint64_t data_offset;
@@ -2453,6 +3097,14 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
if (nvme_ns_ext(ns)) {
mapped_size += nvme_m2b(ns, nlb);
+
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ bool pract = ctrl & NVME_RW_PRINFO_PRACT;
+
+ if (pract && nvme_msize(ns) == 8) {
+ mapped_size = data_size;
+ }
+ }
}
trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, mapped_size, slba);
@@ -2484,6 +3136,10 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
}
}
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ return nvme_e2e_rw(n, req);
+ }
+
status = nvme_map_data(n, nlb, req);
if (status) {
goto invalid;
@@ -2508,6 +3164,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
NvmeNamespace *ns = req->ns;
uint64_t slba = le64_to_cpu(rw->slba);
uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+ uint16_t ctrl = le16_to_cpu(rw->control);
uint64_t data_size = nvme_l2b(ns, nlb);
uint64_t mapped_size = data_size;
uint64_t data_offset;
@@ -2518,6 +3175,14 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
if (nvme_ns_ext(ns)) {
mapped_size += nvme_m2b(ns, nlb);
+
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ bool pract = ctrl & NVME_RW_PRINFO_PRACT;
+
+ if (pract && nvme_msize(ns) == 8) {
+ mapped_size -= nvme_m2b(ns, nlb);
+ }
+ }
}
trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode),
@@ -2541,6 +3206,8 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
zone = nvme_get_zone_by_slba(ns, slba);
if (append) {
+ bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+
if (unlikely(slba != zone->d.zslba)) {
trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba);
status = NVME_INVALID_FIELD;
@@ -2555,6 +3222,30 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
slba = zone->w_ptr;
res->slba = cpu_to_le64(slba);
+
+ switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ case NVME_ID_NS_DPS_TYPE_1:
+ if (!piremap) {
+ return NVME_INVALID_PROT_INFO | NVME_DNR;
+ }
+
+ /* fallthrough */
+
+ case NVME_ID_NS_DPS_TYPE_2:
+ if (piremap) {
+ uint32_t reftag = le32_to_cpu(rw->reftag);
+ rw->reftag = cpu_to_le32(reftag + (slba - zone->d.zslba));
+ }
+
+ break;
+
+ case NVME_ID_NS_DPS_TYPE_3:
+ if (piremap) {
+ return NVME_INVALID_PROT_INFO | NVME_DNR;
+ }
+
+ break;
+ }
}
status = nvme_check_zone_write(ns, zone, slba, nlb);
@@ -2572,6 +3263,10 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append,
data_offset = nvme_l2b(ns, slba);
+ if (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
+ return nvme_e2e_rw(n, req);
+ }
+
if (!wrz) {
status = nvme_map_data(n, nlb, req);
if (status) {
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 1c76c9583fd5..093662218627 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -43,6 +43,17 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opna
pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
+pci_nvme_e2e_rw(uint8_t pract) "pract 0x%"PRIx8""
+pci_nvme_e2e_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
+pci_nvme_e2e_rw_mdata_in_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
+pci_nvme_e2e_rw_mdata_out_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
+pci_nvme_e2e_rw_check_cb(uint16_t cid, uint8_t prinfo, uint16_t apptag, uint16_t appmask, uint32_t reftag) "cid %"PRIu16" prinfo 0x%"PRIx8" apptag 0x%"PRIx16" appmask 0x%"PRIx16" reftag 0x%"PRIx32""
+pci_nvme_e2e_pract_generate_dif(size_t len, size_t lba_size, size_t chksum_len, uint16_t apptag, uint32_t reftag) "len %zu lba_size %zu chksum_len %zu apptag 0x%"PRIx16" reftag 0x%"PRIx32""
+pci_nvme_e2e_check(uint8_t prinfo, uint16_t chksum_len) "prinfo 0x%"PRIx8" chksum_len %"PRIu16""
+pci_nvme_e2e_prchk_disabled(uint16_t apptag, uint32_t reftag) "apptag 0x%"PRIx16" reftag 0x%"PRIx32""
+pci_nvme_e2e_prchk_guard(uint16_t guard, uint16_t crc) "guard 0x%"PRIx16" crc 0x%"PRIx16""
+pci_nvme_e2e_prchk_apptag(uint16_t apptag, uint16_t elbat, uint16_t elbatm) "apptag 0x%"PRIx16" elbat 0x%"PRIx16" elbatm 0x%"PRIx16""
+pci_nvme_e2e_prchk_reftag(uint32_t reftag, uint32_t elbrt) "reftag 0x%"PRIx32" elbrt 0x%"PRIx32""
pci_nvme_copy(uint16_t cid, uint32_t nsid, uint16_t nr, uint8_t format) "cid %"PRIu16" nsid %"PRIu32" nr %"PRIu16" format 0x%"PRIx8""
pci_nvme_copy_source_range(uint64_t slba, uint32_t nlb) "slba 0x%"PRIx64" nlb %"PRIu32""
pci_nvme_copy_in_complete(uint16_t cid) "cid %"PRIu16""
--
2.30.0
^ permalink raw reply related [flat|nested] 14+ messages in thread