* [Qemu-devel] [PATCH 0/2] two more NVMe commands @ 2017-01-30 18:13 Christoph Hellwig 2017-01-30 18:13 ` [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command Christoph Hellwig 2017-01-30 18:13 ` [Qemu-devel] [PATCH 2/2] nvme: Implement Write Zeroes Christoph Hellwig 0 siblings, 2 replies; 9+ messages in thread From: Christoph Hellwig @ 2017-01-30 18:13 UTC (permalink / raw) To: qemu-devel; +Cc: keith.busch, qemu-block Hi all, this series implements two more NVMe commands: DSM and Write Zeroes. Both trace their lineage to Keith's qemu-nvme.git repository, and while the Write Zeroes one is taken from there almost literally the DSM one has seen a major rewrite to not block the main thread as well as various other small improvements. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command 2017-01-30 18:13 [Qemu-devel] [PATCH 0/2] two more NVMe commands Christoph Hellwig @ 2017-01-30 18:13 ` Christoph Hellwig 2017-01-30 18:55 ` Keith Busch 2017-02-01 16:40 ` Stefan Hajnoczi 2017-01-30 18:13 ` [Qemu-devel] [PATCH 2/2] nvme: Implement Write Zeroes Christoph Hellwig 1 sibling, 2 replies; 9+ messages in thread From: Christoph Hellwig @ 2017-01-30 18:13 UTC (permalink / raw) To: qemu-devel; +Cc: keith.busch, qemu-block Support deallocating of LBAs using the DSM command by wiring it up to the qemu discard implementation. The other DSM operations which are purely advisory are ignored for now. Based on an implementation by Keith Busch in the qemu-nvme.git repository, but rewritten to use the qemu AIO infrastructure properly to not block the main thread on discard requests, and cleaned up a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> --- hw/block/nvme.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ hw/block/nvme.h | 1 + 2 files changed, 88 insertions(+) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index d479fd2..d2f1d9a 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -227,6 +227,90 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, return NVME_NO_COMPLETE; } +static void nvme_discard_cb(void *opaque, int ret) +{ + NvmeRequest *req = opaque; + NvmeSQueue *sq = req->sq; + NvmeCtrl *n = sq->ctrl; + NvmeCQueue *cq = n->cq[sq->cqid]; + + if (ret) { + req->status = NVME_INTERNAL_DEV_ERROR; + } + + if (--req->aio_inflight > 0) { + return; + } + + nvme_enqueue_req_completion(cq, req); +} + + +static uint16_t nvme_dsm_discard(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + uint16_t nr = (le32_to_cpu(cmd->cdw10) & 0xff) + 1; + uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); + uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds - BDRV_SECTOR_BITS; + NvmeDsmRange *range; + QEMUSGList qsg; + int i; + + range = g_new(NvmeDsmRange, nr); + + if (nvme_map_prp(&qsg, le64_to_cpu(cmd->prp1), le64_to_cpu(cmd->prp2), + sizeof(range), n)) { + goto out_free_range; + } + + if (dma_buf_write((uint8_t *)range, sizeof(range), &qsg)) { + goto out_destroy_qsg; + } + + qemu_sglist_destroy(&qsg); + + req->status = NVME_SUCCESS; + req->has_sg = false; + req->aio_inflight = 1; + + for (i = 0; i < nr; i++) { + uint64_t slba = le64_to_cpu(range[i].slba); + uint32_t nlb = le32_to_cpu(range[i].nlb); + + if (slba + nlb > le64_to_cpu(ns->id_ns.nsze)) { + return NVME_LBA_RANGE | NVME_DNR; + } + + req->aio_inflight++; + req->aiocb = blk_aio_pdiscard(n->conf.blk, slba << data_shift, + nlb << data_shift, nvme_discard_cb, req); + } + + g_free(range); + + if (--req->aio_inflight > 0) { + return NVME_NO_COMPLETE; + } + + return NVME_SUCCESS; + +out_destroy_qsg: + qemu_sglist_destroy(&qsg); +out_free_range: + g_free(range); + return NVME_INVALID_FIELD | NVME_DNR; +} + +static uint16_t nvme_dsm(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + if (cmd->cdw11 & cpu_to_le32(NVME_DSMGMT_AD)) { + return nvme_dsm_discard(n, ns, cmd, req); + } else { + return NVME_SUCCESS; + } +} + static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, NvmeRequest *req) { @@ -279,6 +363,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) switch (cmd->opcode) { case NVME_CMD_FLUSH: return nvme_flush(n, ns, cmd, req); + case NVME_CMD_DSM: + return nvme_dsm(n, ns, cmd, req); case NVME_CMD_WRITE: case NVME_CMD_READ: return nvme_rw(n, ns, cmd, req); @@ -889,6 +975,7 @@ static int nvme_init(PCIDevice *pci_dev) id->sqes = (0x6 << 4) | 0x6; id->cqes = (0x4 << 4) | 0x4; id->nn = cpu_to_le32(n->num_namespaces); + id->oncs = cpu_to_le16(NVME_ONCS_DSM); id->psd[0].mp = cpu_to_le16(0x9c4); id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 8fb0c10..e64a758 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -640,6 +640,7 @@ typedef struct NvmeRequest { BlockAIOCB *aiocb; uint16_t status; bool has_sg; + uint32_t aio_inflight; NvmeCqe cqe; BlockAcctCookie acct; QEMUSGList qsg; -- 2.1.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command 2017-01-30 18:13 ` [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command Christoph Hellwig @ 2017-01-30 18:55 ` Keith Busch 2017-02-01 16:40 ` Stefan Hajnoczi 1 sibling, 0 replies; 9+ messages in thread From: Keith Busch @ 2017-01-30 18:55 UTC (permalink / raw) To: Christoph Hellwig; +Cc: qemu-devel, qemu-block On Mon, Jan 30, 2017 at 07:13:51PM +0100, Christoph Hellwig wrote: > Support deallocating of LBAs using the DSM command by wiring it up to > the qemu discard implementation. The other DSM operations which are > purely advisory are ignored for now. > > Based on an implementation by Keith Busch in the qemu-nvme.git repository, > but rewritten to use the qemu AIO infrastructure properly to not block > the main thread on discard requests, and cleaned up a little bit. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Thanks for doing this. I've one comment below, but I don't think it should block this unless you see a good way to fix it. Reviewed-by: Keith Busch <keith.busch@intel.com> > + for (i = 0; i < nr; i++) { > + uint64_t slba = le64_to_cpu(range[i].slba); > + uint32_t nlb = le32_to_cpu(range[i].nlb); > + > + if (slba + nlb > le64_to_cpu(ns->id_ns.nsze)) { > + return NVME_LBA_RANGE | NVME_DNR; > + } > + > + req->aio_inflight++; > + req->aiocb = blk_aio_pdiscard(n->conf.blk, slba << data_shift, > + nlb << data_shift, nvme_discard_cb, req); Overwriting the request's aiocb here for a multiple ranges is potentially a problem if we have to cancel the request later. That scenario is extremely unlikely, though, so I don't think it's worth addressing. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command 2017-01-30 18:13 ` [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command Christoph Hellwig 2017-01-30 18:55 ` Keith Busch @ 2017-02-01 16:40 ` Stefan Hajnoczi 2017-02-01 20:29 ` Paolo Bonzini 1 sibling, 1 reply; 9+ messages in thread From: Stefan Hajnoczi @ 2017-02-01 16:40 UTC (permalink / raw) To: Christoph Hellwig; +Cc: qemu-devel, keith.busch, qemu-block [-- Attachment #1: Type: text/plain, Size: 1897 bytes --] On Mon, Jan 30, 2017 at 07:13:51PM +0100, Christoph Hellwig wrote: > +static uint16_t nvme_dsm_discard(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, > + NvmeRequest *req) > +{ > + uint16_t nr = (le32_to_cpu(cmd->cdw10) & 0xff) + 1; > + uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); > + uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds - BDRV_SECTOR_BITS; > + NvmeDsmRange *range; > + QEMUSGList qsg; > + int i; > + > + range = g_new(NvmeDsmRange, nr); > + > + if (nvme_map_prp(&qsg, le64_to_cpu(cmd->prp1), le64_to_cpu(cmd->prp2), > + sizeof(range), n)) { > + goto out_free_range; > + } > + > + if (dma_buf_write((uint8_t *)range, sizeof(range), &qsg)) { Did you mean sizeof(*range) * nr? > + goto out_destroy_qsg; > + } > + > + qemu_sglist_destroy(&qsg); > + > + req->status = NVME_SUCCESS; > + req->has_sg = false; > + req->aio_inflight = 1; > + > + for (i = 0; i < nr; i++) { > + uint64_t slba = le64_to_cpu(range[i].slba); > + uint32_t nlb = le32_to_cpu(range[i].nlb); > + > + if (slba + nlb > le64_to_cpu(ns->id_ns.nsze)) { > + return NVME_LBA_RANGE | NVME_DNR; aio requests are potentially in flight, range is still allocated, and req->aio_inflight still needs to be decremented once. Is there cleanup code missing here at least so range will be freed? > + } > + > + req->aio_inflight++; > + req->aiocb = blk_aio_pdiscard(n->conf.blk, slba << data_shift, > + nlb << data_shift, nvme_discard_cb, req); > + } > + > + g_free(range); > + > + if (--req->aio_inflight > 0) { > + return NVME_NO_COMPLETE; > + } > + > + return NVME_SUCCESS; > + > +out_destroy_qsg: > + qemu_sglist_destroy(&qsg); > +out_free_range: > + g_free(range); > + return NVME_INVALID_FIELD | NVME_DNR; > +} [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command 2017-02-01 16:40 ` Stefan Hajnoczi @ 2017-02-01 20:29 ` Paolo Bonzini 2017-02-02 10:17 ` Stefan Hajnoczi 0 siblings, 1 reply; 9+ messages in thread From: Paolo Bonzini @ 2017-02-01 20:29 UTC (permalink / raw) To: Stefan Hajnoczi, Christoph Hellwig; +Cc: keith.busch, qemu-devel, qemu-block On 01/02/2017 08:40, Stefan Hajnoczi wrote: >> + range = g_new(NvmeDsmRange, nr); >> + >> + if (nvme_map_prp(&qsg, le64_to_cpu(cmd->prp1), le64_to_cpu(cmd->prp2), >> + sizeof(range), n)) { This should be sizeof(*range) * nr, like the DMA below. >> + goto out_free_range; >> + } >> + >> + if (dma_buf_write((uint8_t *)range, sizeof(range), &qsg)) { > > Did you mean sizeof(*range) * nr? Did you also mean dma_buf_read (you want to read from device to range)? Paolo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command 2017-02-01 20:29 ` Paolo Bonzini @ 2017-02-02 10:17 ` Stefan Hajnoczi 2017-02-02 12:04 ` Paolo Bonzini 0 siblings, 1 reply; 9+ messages in thread From: Stefan Hajnoczi @ 2017-02-02 10:17 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Christoph Hellwig, keith.busch, qemu-devel, qemu-block [-- Attachment #1: Type: text/plain, Size: 543 bytes --] On Wed, Feb 01, 2017 at 12:29:22PM -0800, Paolo Bonzini wrote: > On 01/02/2017 08:40, Stefan Hajnoczi wrote: > >> + goto out_free_range; > >> + } > >> + > >> + if (dma_buf_write((uint8_t *)range, sizeof(range), &qsg)) { > > > > Did you mean sizeof(*range) * nr? > > Did you also mean dma_buf_read (you want to read from device to range)? uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg) { return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE); } DMA_DIRECTION_TO_DEVICE seems correct. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command 2017-02-02 10:17 ` Stefan Hajnoczi @ 2017-02-02 12:04 ` Paolo Bonzini 0 siblings, 0 replies; 9+ messages in thread From: Paolo Bonzini @ 2017-02-02 12:04 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Christoph Hellwig, keith busch, qemu-devel, qemu-block > On Wed, Feb 01, 2017 at 12:29:22PM -0800, Paolo Bonzini wrote: > > On 01/02/2017 08:40, Stefan Hajnoczi wrote: > > >> + goto out_free_range; > > >> + } > > >> + > > >> + if (dma_buf_write((uint8_t *)range, sizeof(range), &qsg)) { > > > > > > Did you mean sizeof(*range) * nr? > > > > Did you also mean dma_buf_read (you want to read from device to range)? > > uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg) > { > return dma_buf_rw(ptr, len, sg, DMA_DIRECTION_TO_DEVICE); > } > > DMA_DIRECTION_TO_DEVICE seems correct. Yes, it is. Paolo ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 2/2] nvme: Implement Write Zeroes 2017-01-30 18:13 [Qemu-devel] [PATCH 0/2] two more NVMe commands Christoph Hellwig 2017-01-30 18:13 ` [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command Christoph Hellwig @ 2017-01-30 18:13 ` Christoph Hellwig 2017-02-01 16:41 ` Stefan Hajnoczi 1 sibling, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2017-01-30 18:13 UTC (permalink / raw) To: qemu-devel; +Cc: keith.busch, qemu-block From: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: ported over from qemu-nvme.git to mainline] Signed-off-by: Christoph Hellwig <hch@lst.de> --- hw/block/nvme.c | 27 ++++++++++++++++++++++++++- hw/block/nvme.h | 1 + 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index d2f1d9a..6dac9ce 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -311,6 +311,29 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, } } +static uint16_t nvme_write_zeros(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, + NvmeRequest *req) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)cmd; + const uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); + const uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds; + uint64_t slba = le64_to_cpu(rw->slba); + uint32_t nlb = le16_to_cpu(rw->nlb) + 1; + uint64_t aio_slba = slba << (data_shift - BDRV_SECTOR_BITS); + uint32_t aio_nlb = nlb << (data_shift - BDRV_SECTOR_BITS); + + if (slba + nlb > ns->id_ns.nsze) { + return NVME_LBA_RANGE | NVME_DNR; + } + + req->has_sg = false; + block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0, + BLOCK_ACCT_WRITE); + req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, aio_slba, aio_nlb, 0, + nvme_rw_cb, req); + return NVME_NO_COMPLETE; +} + static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd, NvmeRequest *req) { @@ -365,6 +388,8 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req) return nvme_flush(n, ns, cmd, req); case NVME_CMD_DSM: return nvme_dsm(n, ns, cmd, req); + case NVME_CMD_WRITE_ZEROS: + return nvme_write_zeros(n, ns, cmd, req); case NVME_CMD_WRITE: case NVME_CMD_READ: return nvme_rw(n, ns, cmd, req); @@ -975,7 +1000,7 @@ static int nvme_init(PCIDevice *pci_dev) id->sqes = (0x6 << 4) | 0x6; id->cqes = (0x4 << 4) | 0x4; id->nn = cpu_to_le32(n->num_namespaces); - id->oncs = cpu_to_le16(NVME_ONCS_DSM); + id->oncs = cpu_to_le16(NVME_ONCS_DSM | NVME_ONCS_WRITE_ZEROS); id->psd[0].mp = cpu_to_le16(0x9c4); id->psd[0].enlat = cpu_to_le32(0x10); id->psd[0].exlat = cpu_to_le32(0x4); diff --git a/hw/block/nvme.h b/hw/block/nvme.h index e64a758..f3054b3 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -179,6 +179,7 @@ enum NvmeIoCommands { NVME_CMD_READ = 0x02, NVME_CMD_WRITE_UNCOR = 0x04, NVME_CMD_COMPARE = 0x05, + NVME_CMD_WRITE_ZEROS = 0x08, NVME_CMD_DSM = 0x09, }; -- 2.1.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] nvme: Implement Write Zeroes 2017-01-30 18:13 ` [Qemu-devel] [PATCH 2/2] nvme: Implement Write Zeroes Christoph Hellwig @ 2017-02-01 16:41 ` Stefan Hajnoczi 0 siblings, 0 replies; 9+ messages in thread From: Stefan Hajnoczi @ 2017-02-01 16:41 UTC (permalink / raw) To: Christoph Hellwig; +Cc: qemu-devel, keith.busch, qemu-block [-- Attachment #1: Type: text/plain, Size: 467 bytes --] On Mon, Jan 30, 2017 at 07:13:52PM +0100, Christoph Hellwig wrote: > From: Keith Busch <keith.busch@intel.com> > > Signed-off-by: Keith Busch <keith.busch@intel.com> > [hch: ported over from qemu-nvme.git to mainline] > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > hw/block/nvme.c | 27 ++++++++++++++++++++++++++- > hw/block/nvme.h | 1 + > 2 files changed, 27 insertions(+), 1 deletion(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-02-02 12:04 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-01-30 18:13 [Qemu-devel] [PATCH 0/2] two more NVMe commands Christoph Hellwig 2017-01-30 18:13 ` [Qemu-devel] [PATCH 1/2] nvme: implement the DSM command Christoph Hellwig 2017-01-30 18:55 ` Keith Busch 2017-02-01 16:40 ` Stefan Hajnoczi 2017-02-01 20:29 ` Paolo Bonzini 2017-02-02 10:17 ` Stefan Hajnoczi 2017-02-02 12:04 ` Paolo Bonzini 2017-01-30 18:13 ` [Qemu-devel] [PATCH 2/2] nvme: Implement Write Zeroes Christoph Hellwig 2017-02-01 16:41 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).