* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-10-31 8:55 UTC
We have two holes in nvme-rdma when completing requests:

1. We never wait for the send work request to complete before completing
a request. It is possible that the HCA retries a send operation (due
to a dropped ack) after the nvme cqe has already arrived back at the host.
If we unmap the host buffer upon reception of the cqe, the HCA might
get iommu errors when attempting to access an unmapped host buffer.
We must also wait for the send completion before completing a request;
most of the time it will arrive before the nvme cqe does, so we only
pay for the extra cq entry processing.

2. We don't wait for the request memory region to be fully invalidated
in case the target didn't invalidate remotely. We must wait for the local
invalidation to complete before completing the request.
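
Put differently, a request may only be completed once nothing can still
touch its buffers: the send completion has been seen, the nvme cqe has
been seen, and the rkey is no longer registered. A rough sketch of the
gating condition (the helper below is illustrative only; the flag names
match the ones introduced in patches 2 and 3):

	/*
	 * Sketch: complete the request only when nothing can still touch
	 * its buffers. Illustrative helper, not the exact patch code.
	 */
	static void nvme_rdma_maybe_end_request(struct nvme_rdma_request *req)
	{
		struct request *rq = blk_mq_rq_from_pdu(req);

		if (!req->send_completed)	/* send WR may still be retried */
			return;
		if (!req->resp_completed)	/* nvme cqe not here yet */
			return;
		if (req->mr->need_inval)	/* rkey still valid, wait for LOCAL_INV */
			return;

		nvme_end_request(rq, req->cqe.status, req->cqe.result);
	}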
Sagi Grimberg (3):
nvme-rdma: don't suppress send completions
nvme-rdma: don't complete requests before a send work request has
completed
nvme-rdma: wait for local invalidation before completing a request
drivers/nvme/host/rdma.c | 101 +++++++++++++++++++++++------------------------
1 file changed, 50 insertions(+), 51 deletions(-)
--
2.7.4
* [PATCH 1/3] nvme-rdma: don't suppress send completions
From: Sagi Grimberg @ 2017-10-31 8:55 UTC

The entire completion suppression mechanism is currently broken because
the HCA might retry a send operation (due to a dropped ack) after the
nvme transaction has already completed.

In order to handle this, we signal all send completions (besides the
async event, which does not race with anything).

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/rdma.c | 40 ++++------------------------------------
 1 file changed, 4 insertions(+), 36 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 53ad241493a1..ccbae327fe72 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1001,21 +1001,9 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 		nvme_rdma_wr_error(cq, wc, "SEND");
 }
 
-/*
- * We want to signal completion at least every queue depth/2. This returns the
- * largest power of two that is not above half of (queue size + 1) to optimize
- * (avoid divisions).
- */
-static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
-{
-	int limit = 1 << ilog2((queue->queue_size + 1) / 2);
-
-	return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
-}
-
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
-		struct ib_send_wr *first, bool flush)
+		struct ib_send_wr *first)
 {
 	struct ib_send_wr wr, *bad_wr;
 	int ret;
@@ -1031,24 +1019,7 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 	wr.sg_list = sge;
 	wr.num_sge = num_sge;
 	wr.opcode = IB_WR_SEND;
-	wr.send_flags = 0;
-
-	/*
-	 * Unsignalled send completions are another giant desaster in the
-	 * IB Verbs spec: If we don't regularly post signalled sends
-	 * the send queue will fill up and only a QP reset will rescue us.
-	 * Would have been way to obvious to handle this in hardware or
-	 * at least the RDMA stack..
-	 *
-	 * Always signal the flushes. The magic request used for the flush
-	 * sequencer is not allocated in our driver's tagset and it's
-	 * triggered to be freed by blk_cleanup_queue(). So we need to
-	 * always mark it as signaled to ensure that the "wr_cqe", which is
-	 * embedded in request's payload, is not freed when __ib_process_cq()
-	 * calls wr_cqe->done().
-	 */
-	if (nvme_rdma_queue_sig_limit(queue) || flush)
-		wr.send_flags |= IB_SEND_SIGNALED;
+	wr.send_flags = IB_SEND_SIGNALED;
 
 	if (first)
 		first->next = &wr;
@@ -1122,7 +1093,7 @@ static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx)
 	ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(*cmd),
 			DMA_TO_DEVICE);
 
-	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL, false);
+	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL);
 	WARN_ON_ONCE(ret);
 }
 
@@ -1407,7 +1378,6 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_qe *sqe = &req->sqe;
 	struct nvme_command *c = sqe->data;
-	bool flush = false;
 	struct ib_device *dev;
 	blk_status_t ret;
 	int err;
@@ -1439,10 +1409,8 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	ib_dma_sync_single_for_device(dev, sqe->dma,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 
-	if (req_op(rq) == REQ_OP_FLUSH)
-		flush = true;
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
-			req->mr->need_inval ? &req->reg_wr.wr : NULL, flush);
+			req->mr->need_inval ? &req->reg_wr.wr : NULL);
 	if (unlikely(err)) {
 		nvme_rdma_unmap_data(queue, rq);
 		goto err;
--
2.7.4
* [PATCH 2/3] nvme-rdma: don't complete requests before a send work request has completed
From: Sagi Grimberg @ 2017-10-31 8:55 UTC

In order to guarantee that the HCA will never get an access violation
(either from an invalidated rkey or from the iommu) when retrying a send
operation, we must complete a request only when both the send completion
and the nvme cqe have arrived. Only then is it safe to invalidate the
rkey (if needed), unmap the host buffers, and complete the I/O.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/rdma.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ccbae327fe72..ae1fb66358f7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -67,6 +67,9 @@ struct nvme_rdma_request {
 	struct nvme_request req;
 	struct ib_mr *mr;
 	struct nvme_rdma_qe sqe;
+	struct nvme_completion cqe;
+	bool send_completed;
+	bool resp_completed;
 	struct ib_sge sge[1 + NVME_RDMA_MAX_INLINE_SEGMENTS];
 	u32 num_sge;
 	int nents;
@@ -961,6 +964,8 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	req->num_sge = 1;
 	req->inline_data = false;
 	req->mr->need_inval = false;
+	req->send_completed = false;
+	req->resp_completed = false;
 
 	c->common.flags |= NVME_CMD_SGL_METABUF;
 
@@ -997,13 +1002,25 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 
 static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 {
-	if (unlikely(wc->status != IB_WC_SUCCESS))
+	struct nvme_rdma_qe *qe =
+		container_of(wc->wr_cqe, struct nvme_rdma_qe, cqe);
+	struct nvme_rdma_request *req =
+		container_of(qe, struct nvme_rdma_request, sqe);
+	struct request *rq = blk_mq_rq_from_pdu(req);
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
 		nvme_rdma_wr_error(cq, wc, "SEND");
+		return;
+	}
+
+	req->send_completed = true;
+	if (req->resp_completed)
+		nvme_end_request(rq, req->cqe.status, req->cqe.result);
 }
 
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
-		struct ib_send_wr *first)
+		struct ib_send_wr *first, bool signal)
 {
 	struct ib_send_wr wr, *bad_wr;
 	int ret;
@@ -1019,7 +1036,7 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 	wr.sg_list = sge;
 	wr.num_sge = num_sge;
 	wr.opcode = IB_WR_SEND;
-	wr.send_flags = IB_SEND_SIGNALED;
+	wr.send_flags = signal ? IB_SEND_SIGNALED : 0;
 
 	if (first)
 		first->next = &wr;
@@ -1093,7 +1110,7 @@ static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx)
 	ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(*cmd),
 			DMA_TO_DEVICE);
 
-	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL);
+	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL, false);
 	WARN_ON_ONCE(ret);
 }
 
@@ -1117,11 +1134,17 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 	if (rq->tag == tag)
 		ret = 1;
 
+	req->cqe.status = cqe->status;
+	req->cqe.result = cqe->result;
+	req->resp_completed = true;
+
 	if ((wc->wc_flags & IB_WC_WITH_INVALIDATE) &&
 	    wc->ex.invalidate_rkey == req->mr->rkey)
 		req->mr->need_inval = false;
 
-	nvme_end_request(rq, cqe->status, cqe->result);
+	if (req->send_completed)
+		nvme_end_request(rq, req->cqe.status, req->cqe.result);
+
 	return ret;
 }
 
@@ -1410,7 +1433,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
-			req->mr->need_inval ? &req->reg_wr.wr : NULL);
+			req->mr->need_inval ? &req->reg_wr.wr : NULL, true);
 	if (unlikely(err)) {
 		nvme_rdma_unmap_data(queue, rq);
 		goto err;
--
2.7.4
* [PATCH 3/3] nvme-rdma: wait for local invalidation before completing a request
From: Sagi Grimberg @ 2017-10-31 8:55 UTC

We must not complete a request before the host memory region is
invalidated. Luckily, the protocol supports send with invalidate, so we
usually don't need to perform a local invalidation ourselves; but if the
target did not invalidate the memory region for us, we must wait for the
local invalidation to complete before unmapping host memory and
completing the I/O.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/rdma.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ae1fb66358f7..b7e0fb0fe913 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -818,8 +818,19 @@ static void nvme_rdma_memreg_done(struct ib_cq *cq, struct ib_wc *wc)
 
 static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
 {
-	if (unlikely(wc->status != IB_WC_SUCCESS))
+	struct nvme_rdma_request *req =
+		container_of(wc->wr_cqe, struct nvme_rdma_request, reg_cqe);
+	struct request *rq = blk_mq_rq_from_pdu(req);
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
 		nvme_rdma_wr_error(cq, wc, "LOCAL_INV");
+		return;
+	}
+
+	req->mr->need_inval = false;
+	if (req->resp_completed && req->send_completed)
+		nvme_end_request(rq, req->cqe.status, req->cqe.result);
+
 }
 
 static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue,
@@ -830,7 +841,7 @@ static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue,
 		.opcode = IB_WR_LOCAL_INV,
 		.next = NULL,
 		.num_sge = 0,
-		.send_flags = 0,
+		.send_flags = IB_SEND_SIGNALED,
 		.ex.invalidate_rkey = req->mr->rkey,
 	};
 
@@ -844,24 +855,12 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 		struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
-	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
-	int res;
 
 	if (!blk_rq_bytes(rq))
 		return;
 
-	if (req->mr->need_inval && test_bit(NVME_RDMA_Q_LIVE, &req->queue->flags)) {
-		res = nvme_rdma_inv_rkey(queue, req);
-		if (unlikely(res < 0)) {
-			dev_err(ctrl->ctrl.device,
-				"Queueing INV WR for rkey %#x failed (%d)\n",
-				req->mr->rkey, res);
-			nvmf_error_recovery(&queue->ctrl->ctrl);
-		}
-	}
-
 	ib_dma_unmap_sg(ibdev, req->sg_table.sgl, req->nents,
 			rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
 
@@ -1014,7 +1013,7 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 	}
 
 	req->send_completed = true;
-	if (req->resp_completed)
+	if (req->resp_completed && !req->mr->need_inval)
 		nvme_end_request(rq, req->cqe.status, req->cqe.result);
 }
 
@@ -1139,10 +1138,19 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 	req->resp_completed = true;
 
 	if ((wc->wc_flags & IB_WC_WITH_INVALIDATE) &&
-	    wc->ex.invalidate_rkey == req->mr->rkey)
+	    wc->ex.invalidate_rkey == req->mr->rkey) {
 		req->mr->need_inval = false;
+	} else if (req->mr->need_inval) {
+		ret = nvme_rdma_inv_rkey(queue, req);
+		if (unlikely(ret < 0)) {
+			dev_err(queue->ctrl->ctrl.device,
+				"Queueing INV WR for rkey %#x failed (%d)\n",
+				req->mr->rkey, ret);
+			nvmf_error_recovery(&queue->ctrl->ctrl);
+		}
+	}
 
-	if (req->send_completed)
+	if (req->send_completed && !req->mr->need_inval)
 		nvme_end_request(rq, req->cqe.status, req->cqe.result);
 
 	return ret;
--
2.7.4
* [PATCH 0/3] Fix request completion holes
From: Max Gurtovoy @ 2017-10-31 9:38 UTC

On 10/31/2017 10:55 AM, Sagi Grimberg wrote:
> We have two holes in nvme-rdma when completing request.

Thanks Sagi.

We'll run these patches in our lab, together with the performance
improvement patches we sent a few days ago, and update the results.
I guess I'll need to run it with my cq moderation patches to get less
of a performance penalty.
* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-10-31 11:10 UTC

> Thanks Sagi.
> We'll run these patches in our lab + our patches for the performance
> improvments that we sent few days ago and update the results.

Thanks,

> I guess I'll need to run it with my cq moderation patches to get less
> penalty in performance.

Fine, but just a heads up: any cq moderation that is not adaptive is
not something that we'd like to pursue. I have a generic adaptive
moderation library in the works, so any ideas should be integrated
into that (or something equivalent).
* [PATCH 0/3] Fix request completion holes
From: idanb @ 2017-11-01 16:02 UTC

Sagi,

Thanks for your patches. Max and I ran them, adding completion
signaling for both the send (command capsule) and the local
invalidation request.

The result was a significant reduction in IOPs due to the extra
interrupts and completion processing. Below are our READ/WRITE IOPs
results (512B, 64 jobs):

(1) No send signal and no invalidation signal w/ moderation   7.4M / 11.9M
(2) No send signal and no invalidation signal w/o moderation  7.1M / 8.7M
(3) No send signal with invalidation signal w/o moderation    5.2M / 8.3M
(4) Send signal with invalidation signal w/o moderation       3.4M / 4.6M
(5) Send signal with invalidation signal w/ moderation        6.2M / 11.8M

I understand that (1), (2) and (3) are invalid; the performance
reduction induced by mode (4) should motivate us to enable cqe/event
moderation as soon as possible.

On 10/31/2017 1:10 PM, Sagi Grimberg wrote:
>> Thanks Sagi.
>> We'll run these patches in our lab + our patches for the performance
>> improvments that we sent few days ago and update the results.
>
> Thanks,
>
>> I guess I'll need to run it with my cq moderation patches to get less
>> penalty in performance.
>
> Fine, but just a heads up, any cq moderation that is not adaptive is
> not something that we'd like to pursue. I have a generic adaptive
> moderation library in the works, so any ideas should be integrated
> into that (or something equivalent).
* [PATCH 0/3] Fix request completion holes
From: Max Gurtovoy @ 2017-11-01 16:09 UTC

On 11/1/2017 6:02 PM, idanb wrote:
> Sagi,
>
> Thanks for you patches, Max and me ran your patches adding completion
> signaling for both send (command capsule) and local invalidation request.
>
> The results were significant reduction of IOPs due to interrupts and
> completion processing, see below our results for READ/WRITE IOPs (512B,
> 64 Jobs):
>
> (1) No send signal and no invalidation signal w/ moderation 7.4M / 11.9M
> (2) No send signal and no invalidation signal w/o moderation 7.1M / 8.7M
> (3) No send signal with invalidation signal w/o moderation 5.2M / 8.3M
> (4) Send signal with invalidation signal w/o moderation 3.4M / 4.6M
> (5) Send signal with invalidation signal w/ moderation 6.2M / 11.8M
>
> I understand that (1), (2) and (3) are invalid, the performance
> reduction induce by mode (4) need to motivate us to enable cqe/event
> moderation as soon as possible.

Attached is the full report we ran with Sagi's patches (plus cq
moderation cases with cq_completions=32, cq_timeout=16, and the old
send signaling [once every queue_size/2]).

-------------- next part --------------
READ: Signal with moderation
+++++++++++++++++++++++++++++++
read (512B, 32 CPUs, 32 jobs, iodepth=128): io=61146MB, bw=3057.2MB/s, iops=6261.9K, runt= 20001msec
read (1k, 32 CPUs, 32 jobs, iodepth=128): io=110943MB, bw=5546.1MB/s, iops=5680.2K, runt= 20001msec
read (2k, 32 CPUs, 32 jobs, iodepth=128): io=163192MB, bw=8159.3MB/s, iops=4177.6K, runt= 20001msec
read (4k, 32 CPUs, 32 jobs, iodepth=128): io=177120MB, bw=8855.2MB/s, iops=2266.1K, runt= 20002msec
read (8k, 32 CPUs, 32 jobs, iodepth=128): io=187884MB, bw=9392.9MB/s, iops=1202.3K, runt= 20003msec
read (16k, 32 CPUs, 32 jobs, iodepth=128): io=190151MB, bw=9505.7MB/s, iops=608359, runt= 20004msec
read (32k, 32 CPUs, 32 jobs, iodepth=128): io=191209MB, bw=9556.7MB/s, iops=305812, runt= 20008msec
read (64k, 32 CPUs, 32 jobs, iodepth=128): io=192417MB, bw=9611.8MB/s, iops=153787, runt= 20019msec
read (512B, 32 CPUs, 64 jobs, iodepth=128): io=34871MB, bw=1743.5MB/s, iops=3570.7K, runt= 20001msec
read (1k, 32 CPUs, 64 jobs, iodepth=128): io=106594MB, bw=5329.2MB/s, iops=5457.8K, runt= 20002msec
read (2k, 32 CPUs, 64 jobs, iodepth=128): io=153952MB, bw=7696.9MB/s, iops=3940.8K, runt= 20002msec
read (4k, 32 CPUs, 64 jobs, iodepth=128): io=171229MB, bw=8560.2MB/s, iops=2191.4K, runt= 20003msec
read (8k, 32 CPUs, 64 jobs, iodepth=128): io=178982MB, bw=8946.9MB/s, iops=1145.2K, runt= 20005msec
read (16k, 32 CPUs, 64 jobs, iodepth=128): io=182606MB, bw=9126.2MB/s, iops=584076, runt= 20009msec
read (32k, 32 CPUs, 64 jobs, iodepth=128): io=183271MB, bw=9157.2MB/s, iops=293028, runt= 20014msec
read (64k, 32 CPUs, 64 jobs, iodepth=128): io=169174MB, bw=8447.3MB/s, iops=135156, runt= 20027msec

READ: Signal without moderation
++++++++++++++++++++++++++++++++
read : io=33410MB, bw=1670.5MB/s, iops=3420.1K, runt= 20001msec
read : io=66287MB, bw=3314.2MB/s, iops=3393.8K, runt= 20001msec
read : io=128976MB, bw=6448.5MB/s, iops=3301.7K, runt= 20001msec
read : io=172651MB, bw=8631.8MB/s, iops=2209.8K, runt= 20002msec
read : io=181532MB, bw=9076.2MB/s, iops=1161.8K, runt= 20001msec
read : io=194935MB, bw=9744.9MB/s, iops=623667, runt= 20004msec
read : io=192299MB, bw=9610.7MB/s, iops=307540, runt= 20009msec
read : io=192919MB, bw=9635.9MB/s, iops=154173, runt= 20021msec
read : io=31905MB, bw=1595.2MB/s, iops=3266.1K, runt= 20001msec
read : io=69284MB, bw=3464.2MB/s, iops=3547.2K, runt= 20001msec
read : io=146078MB, bw=7303.2MB/s, iops=3739.3K, runt= 20002msec
read : io=168585MB, bw=8428.4MB/s, iops=2157.6K, runt= 20003msec
read : io=181224MB, bw=9058.1MB/s, iops=1159.6K, runt= 20005msec
read : io=184970MB, bw=9244.4MB/s, iops=591638, runt= 20009msec
read : io=184823MB, bw=9234.8MB/s, iops=295510, runt= 20014msec
read : io=175313MB, bw=8753.4MB/s, iops=140054, runt= 20028msec

READ: No SEND Signal with moderation (with signal on local invalidation)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
read : io=61968MB, bw=3098.3MB/s, iops=6345.3K, runt= 20001msec
read : io=112583MB, bw=5628.9MB/s, iops=5763.1K, runt= 20001msec
read : io=167024MB, bw=8350.9MB/s, iops=4275.7K, runt= 20001msec
read : io=181638MB, bw=9081.1MB/s, iops=2324.8K, runt= 20002msec
read : io=185373MB, bw=9267.3MB/s, iops=1186.3K, runt= 20003msec
read : io=191548MB, bw=9575.6MB/s, iops=612832, runt= 20004msec
read : io=190085MB, bw=9499.2MB/s, iops=303999, runt= 20009msec
read : io=192519MB, bw=9616.9MB/s, iops=153868, runt= 20019msec
read : io=58495MB, bw=2924.6MB/s, iops=5989.6K, runt= 20001msec
read : io=109628MB, bw=5480.9MB/s, iops=5612.4K, runt= 20002msec
read : io=156339MB, bw=7816.6MB/s, iops=4002.8K, runt= 20001msec
read : io=171421MB, bw=8569.9MB/s, iops=2193.9K, runt= 20003msec
read : io=179565MB, bw=8976.3MB/s, iops=1148.1K, runt= 20005msec
read : io=182728MB, bw=9132.8MB/s, iops=584495, runt= 20008msec
read : io=184840MB, bw=9235.2MB/s, iops=295551, runt= 20013msec
read : io=180544MB, bw=9015.8MB/s, iops=144240, runt= 20027msec

READ: No SEND Signal without moderation (with signal on local invalidation)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
read : io=40289MB, bw=2014.4MB/s, iops=4125.5K, runt= 20001msec
read : io=78689MB, bw=3934.3MB/s, iops=4028.7K, runt= 20001msec
read : io=143805MB, bw=7189.1MB/s, iops=3681.3K, runt= 20001msec
read : io=173558MB, bw=8677.4MB/s, iops=2221.4K, runt= 20002msec
read : io=186197MB, bw=9308.5MB/s, iops=1191.5K, runt= 20003msec
read : io=192793MB, bw=9637.8MB/s, iops=616812, runt= 20004msec
read : io=192399MB, bw=9616.1MB/s, iops=307714, runt= 20008msec
read : io=194790MB, bw=9731.8MB/s, iops=155707, runt= 20016msec
read : io=50862MB, bw=2542.2MB/s, iops=5208.3K, runt= 20001msec
read : io=103616MB, bw=5180.6MB/s, iops=5304.9K, runt= 20001msec
read : io=150399MB, bw=7519.3MB/s, iops=3849.9K, runt= 20002msec
read : io=173678MB, bw=8682.2MB/s, iops=2222.7K, runt= 20004msec
read : io=181481MB, bw=9071.9MB/s, iops=1161.2K, runt= 20005msec
read : io=186601MB, bw=9326.4MB/s, iops=596884, runt= 20008msec
read : io=187268MB, bw=9356.9MB/s, iops=299419, runt= 20014msec
read : io=176620MB, bw=8817.4MB/s, iops=141077, runt= 20031msec

Write: Signal with moderation
+++++++++++++++++++++++++++++++
write: io=115326MB, bw=5766.2MB/s, iops=11809K, runt= 20001msec
write: io=148240MB, bw=7411.7MB/s, iops=7589.6K, runt= 20001msec
write: io=161919MB, bw=8095.6MB/s, iops=4144.1K, runt= 20001msec
write: io=171162MB, bw=8557.3MB/s, iops=2190.7K, runt= 20002msec
write: io=199720MB, bw=9984.6MB/s, iops=1278.2K, runt= 20003msec
write: io=206792MB, bw=10338MB/s, iops=661600, runt= 20004msec
write: io=209549MB, bw=10474MB/s, iops=335161, runt= 20007msec
write: io=208180MB, bw=10402MB/s, iops=166427, runt= 20014msec
write: io=46900MB, bw=2344.1MB/s, iops=4802.4K, runt= 20001msec
write: io=105243MB, bw=5261.1MB/s, iops=5388.2K, runt= 20001msec
write: io=160543MB, bw=8026.4MB/s, iops=4109.6K, runt= 20002msec
write: io=174457MB, bw=8721.2MB/s, iops=2232.7K, runt= 20004msec
write: io=192598MB, bw=9627.2MB/s, iops=1232.4K, runt= 20004msec
write: io=201587MB, bw=10076MB/s, iops=644851, runt= 20007msec
write: io=205567MB, bw=10272MB/s, iops=328693, runt= 20013msec
write: io=208508MB, bw=10412MB/s, iops=166590, runt= 20026msec

Write: Signal without moderation
+++++++++++++++++++++++++++++++++
write: io=43445MB, bw=2172.2MB/s, iops=4448.6K, runt= 20001msec
write: io=93361MB, bw=4667.9MB/s, iops=4779.9K, runt= 20001msec
write: io=160929MB, bw=8046.6MB/s, iops=4119.6K, runt= 20001msec
write: io=171522MB, bw=8575.3MB/s, iops=2195.3K, runt= 20002msec
write: io=198314MB, bw=9914.2MB/s, iops=1269.2K, runt= 20003msec
write: io=206332MB, bw=10315MB/s, iops=660129, runt= 20004msec
write: io=207733MB, bw=10383MB/s, iops=332240, runt= 20008msec
write: io=208141MB, bw=10399MB/s, iops=166379, runt= 20016msec
write: io=45763MB, bw=2288.2MB/s, iops=4685.9K, runt= 20001msec
write: io=100083MB, bw=5003.1MB/s, iops=5123.1K, runt= 20001msec
write: io=159084MB, bw=7953.5MB/s, iops=4072.2K, runt= 20002msec
write: io=173523MB, bw=8674.5MB/s, iops=2220.7K, runt= 20004msec
write: io=189560MB, bw=9476.2MB/s, iops=1212.1K, runt= 20004msec
write: io=199250MB, bw=9958.6MB/s, iops=637344, runt= 20008msec
write: io=204977MB, bw=10243MB/s, iops=327767, runt= 20012msec
write: io=208619MB, bw=10417MB/s, iops=166678, runt= 20026msec

Write: No SEND Signal with moderation (with signal on local invalidation)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
write: io=116784MB, bw=5838.1MB/s, iops=11958K, runt= 20001msec
write: io=143658MB, bw=7182.6MB/s, iops=7354.1K, runt= 20001msec
write: io=161343MB, bw=8066.4MB/s, iops=4129.1K, runt= 20002msec
write: io=170761MB, bw=8537.3MB/s, iops=2185.6K, runt= 20002msec
write: io=199668MB, bw=9982.5MB/s, iops=1277.8K, runt= 20002msec
write: io=207307MB, bw=10363MB/s, iops=663248, runt= 20004msec
write: io=209903MB, bw=10491MB/s, iops=335727, runt= 20007msec
write: io=208268MB, bw=10406MB/s, iops=166489, runt= 20015msec
write: io=87286MB, bw=4364.1MB/s, iops=8937.7K, runt= 20001msec
write: io=145686MB, bw=7283.6MB/s, iops=7458.4K, runt= 20002msec
write: io=160259MB, bw=8012.2MB/s, iops=4102.3K, runt= 20002msec
write: io=174527MB, bw=8724.7MB/s, iops=2233.5K, runt= 20004msec
write: io=193620MB, bw=9679.5MB/s, iops=1238.1K, runt= 20004msec
write: io=202051MB, bw=10099MB/s, iops=646336, runt= 20007msec
write: io=206876MB, bw=10337MB/s, iops=330770, runt= 20014msec
write: io=208138MB, bw=10394MB/s, iops=166302, runt= 20025msec

Write: No SEND Signal without moderation (with signal on local invalidation)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
write: io=73271MB, bw=3663.4MB/s, iops=7502.6K, runt= 20001msec
write: io=129638MB, bw=6481.6MB/s, iops=6637.2K, runt= 20001msec
write: io=161002MB, bw=8049.3MB/s, iops=4121.3K, runt= 20002msec
write: io=170759MB, bw=8537.1MB/s, iops=2185.5K, runt= 20002msec
write: io=198565MB, bw=9927.3MB/s, iops=1270.7K, runt= 20002msec
write: io=207304MB, bw=10363MB/s, iops=663239, runt= 20004msec
write: io=207251MB, bw=10358MB/s, iops=331468, runt= 20008msec
write: io=208262MB, bw=10405MB/s, iops=166484, runt= 20015msec
write: io=81719MB, bw=4085.8MB/s, iops=8367.6K, runt= 20001msec
write: io=145468MB, bw=7273.2MB/s, iops=7447.6K, runt= 20001msec
write: io=159924MB, bw=7995.5MB/s, iops=4093.7K, runt= 20002msec
write: io=174306MB, bw=8713.6MB/s, iops=2230.7K, runt= 20004msec
write: io=186670MB, bw=9333.3MB/s, iops=1194.7K, runt= 20001msec
write: io=201011MB, bw=10047MB/s, iops=642978, runt= 20008msec
write: io=206175MB, bw=10303MB/s, iops=329681, runt= 20012msec
write: io=208853MB, bw=10430MB/s, iops=166882, runt= 20024msec
* [PATCH 0/3] Fix request completion holes
From: Jason Gunthorpe @ 2017-11-01 16:50 UTC

On Wed, Nov 01, 2017 at 06:02:38PM +0200, idanb wrote:
> (3) No send signal with invalidation signal w/o moderation 5.2M / 8.3M

Actually, isn't this valid and reasonable? You don't need to see the
SEND completion, the only thing that matters is the invalidate
completion, and the invalidate will naturally order after the send
is finished, yes?

Jason
* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-11-01 17:31 UTC

>> (3) No send signal with invalidation signal w/o moderation 5.2M / 8.3M
>
> Actually, isn't this valid and reasonable? You don't need to see the
> SEND completion, the only thing that matters is the invalidate
> completion, and the invalidate will naturally order after the send
> is finished, yes?

The default mode utilizes remote invalidation, so no, it's not valid.

Without remote invalidation I'm not sure, I don't remember the ordering
rules off the top of my head. Idan, does a local invalidate completion
guarantee a prior send completion (of a non-related buffer)? I vaguely
remember that explicit fencing is required to guarantee that.

I guess we could avoid signaling send completions when we are in local
invalidate mode, but I'm liking this less and less...
* [PATCH 0/3] Fix request completion holes
From: Jason Gunthorpe @ 2017-11-01 17:58 UTC

On Wed, Nov 01, 2017 at 07:31:39PM +0200, Sagi Grimberg wrote:
> the default mode utilize remote invalidation, so no, its not valid.

Well, usually the ULP design should allow some things to be reaped
async, and latency sensitive things to be sync.

Eg if you have a SEND using a local buffer with a RKEY, then you'd
declare the RKEY data completed when SEND_WITH_INVALIDATE is returned.

Recycling the lkey command buffer is an async process and can wait
unsignaled until something signalled comes to push its completion
through.

This approach reduces the # of CQEs required at the expense of
delaying reaping the lkey buffer.

> Without remote invalidate I'm not sure, don't remember the ordering
> rules from the top of my head. Idan, does a local invalidate completion
> guarantee a prior send completion (of a non-related buffer)? I vaguely
> remember that explicit fencing is required to guarantee that.

This was the case I thought Idan was testing..

Local invalidate is defined to always be ordered by the spec, so it
is required to guarantee that the SEND is completed.

> I guess we could not signal send completions if we are in a local
> invalidate mode, but I'm liking this less and less...

If the SEND reaping can be deferred as above, then the local
invalidate completion becomes the signaled CQE that allows the SEND
to be reaped.

Jason
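
A rough sketch of the deferred-reaping scheme Jason describes, reusing
the send_completed/resp_completed flags from this series (the handler
below is illustrative, not code from the patches): the command SEND is
left unsignaled, and the signaled LOCAL_INV completion is taken to imply
it, relying on in-order completion semantics of the send queue.

	/*
	 * Illustrative only: let the signaled LOCAL_INV completion stand in
	 * for the (unsignaled) command SEND posted before it.
	 */
	static void nvme_rdma_inv_rkey_done_sketch(struct ib_cq *cq, struct ib_wc *wc)
	{
		struct nvme_rdma_request *req =
			container_of(wc->wr_cqe, struct nvme_rdma_request, reg_cqe);
		struct request *rq = blk_mq_rq_from_pdu(req);

		if (unlikely(wc->status != IB_WC_SUCCESS)) {
			nvme_rdma_wr_error(cq, wc, "LOCAL_INV");
			return;
		}

		req->mr->need_inval = false;
		/*
		 * The LOCAL_INV completed, so any SEND posted earlier on the same
		 * send queue has completed as well; no separate SEND CQE is needed.
		 */
		req->send_completed = true;
		if (req->resp_completed)
			nvme_end_request(rq, req->cqe.status, req->cqe.result);
	}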
* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-11-02 8:06 UTC

> Well, usually the ULP design should allow some things to be reaped
> async, and latency senstive things to be sync.
>
> Eg if you have a SEND using a local buffer with a RKEY, then you'd
> declare the RKEY data completed when SEND_WITH_INVALIDATE is returned.
>
> Recycling the lkey command buffer is an async process and can wait
> unsignaled until something signalled comes to push its completion
> through.

Not when using inline data with the send, which is the main issue
here. If we inline data into the command, we use the local dma lkey,
which does not even have a local invalidate following it.

> This approach reduces the # of CQEs required at the expense of
> delaying reaping the lkey buffer.

Yes, I understand the advantage of suppressing send completions..
But I'm not very fond of non-trivial behavior depending on all sorts
of overly complicated conditions.

>> Without remote invalidate I'm not sure, don't remember the ordering
>> rules from the top of my head. Idan, does a local invalidate completion
>> guarantee a prior send completion (of a non-related buffer)? I vaguely
>> remember that explicit fencing is required to guarantee that.
>
> This was the case I thought Idan was testing..

Still doesn't mean it is guaranteed to do the right thing :)

> Local invalidate is defined to always be ordered by the spec, so it
> is required to guarentee that the SEND is completed.

So a local invalidate completion is guaranteed to come after all the
completions prior to it in the send queue?
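
For context, a sketch of posting an inline-data command capsule mapped
with the local DMA lkey (illustrative only; the helper name and
parameters are not taken from the driver). Because no rkey is advertised,
no invalidation will ever follow, so the SEND completion is the only
event that can tell us the HCA is done with these buffers.

	static int nvme_rdma_post_inline_cmd_sketch(struct nvme_rdma_queue *queue,
			struct nvme_rdma_request *req, u64 sqe_dma, u64 data_dma,
			u32 data_len)
	{
		struct ib_sge sge[2];
		struct ib_send_wr wr = {}, *bad_wr;

		sge[0].addr = sqe_dma;		/* command capsule */
		sge[0].length = sizeof(struct nvme_command);
		sge[0].lkey = queue->device->pd->local_dma_lkey;

		sge[1].addr = data_dma;		/* small write payload, sent inline */
		sge[1].length = data_len;
		sge[1].lkey = queue->device->pd->local_dma_lkey;

		wr.wr_cqe = &req->sqe.cqe;
		wr.sg_list = sge;
		wr.num_sge = 2;
		wr.opcode = IB_WR_SEND;
		/* nothing will be invalidated later, so the SEND must be signaled */
		wr.send_flags = IB_SEND_SIGNALED;

		return ib_post_send(queue->qp, &wr, &bad_wr);
	}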
* [PATCH 0/3] Fix request completion holes
From: Jason Gunthorpe @ 2017-11-02 15:12 UTC

On Thu, Nov 02, 2017 at 10:06:30AM +0200, Sagi Grimberg wrote:
> Not when using inline data with the send, which is the main issue
> here. if we inline data to the command, we will use the local
> dma lkey, which does not even have a local invalidate following it.

Does nvme use inline data send and RKEY transfer in the same SEND?
Then it would need to signal the SEND if remote invalidate is used,
otherwise it only needs to signal the local invalidate for the RKEY..

> So local invalidate completion is guaranteed to come after all the
> completions prior to it in the send queue?

IBA spec says so..

Jason
* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-11-02 15:23 UTC

> Does nvme use inline data send and RKEY transfer in the same SEND?

No, we use the local dma lkey for inline data.

> Then it would need to signal the SEND if remote invalidate is used,
> otherwise it only needs to signal the local invalidate for the RKEY..

Also if it's simply using the local dma lkey without sending any rkey.
* [PATCH 0/3] Fix request completion holes
From: Jason Gunthorpe @ 2017-11-02 15:51 UTC

On Thu, Nov 02, 2017 at 05:23:36PM +0200, Sagi Grimberg wrote:
>> Then it would need to signal the SEND if remote invalidate is used,
>> otherwise it only needs to signal the local invalidate for the RKEY..
>
> Also if its simply using the local dma lkey without sending any rkey.

Right, OK, to be clear, my emails were only about the case where there
was a SEND and INVALIDATE required together, I wasn't discussing SEND
alone, that is already as good as it can be I think :)

Thanks,
Jason
* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-11-02 16:15 UTC

> Right, OK, to be clear, my emails were only about the case where there
> was a SEND and INVALIDATE required together, I wasn't discussing SEND
> alone, that is already as good as it can be I think :)

Not arguing here :)
* [PATCH 0/3] Fix request completion holes
From: Steve Wise @ 2017-11-02 16:18 UTC

> On Thu, Nov 02, 2017 at 10:06:30AM +0200, Sagi Grimberg wrote:
> > So local invalidate completion is guaranteed to come after all the
> > completions prior to it in the send queue?
>
> IBA spec says so..

iWARP spec too. This is in regard to completion ordering though. The
local invalidate send WR must have the IB_SEND_FENCE flag set if you
want it to only be executed after all prior send WRs are executed.
Either way, the completions are always inserted into the cq
in-sq-submission-order, and a signaled completion implies completions
of all prior unsignaled WRs.

Steve.
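
A small sketch of the fenced, signaled local invalidate Steve describes
(illustrative only, not part of this series; the fields mirror the
driver's existing nvme_rdma_inv_rkey() with IB_SEND_FENCE added):

	static int nvme_rdma_inv_rkey_fenced_sketch(struct nvme_rdma_queue *queue,
			struct nvme_rdma_request *req)
	{
		struct ib_send_wr *bad_wr;
		struct ib_send_wr wr = {
			.opcode = IB_WR_LOCAL_INV,
			.wr_cqe = &req->reg_cqe,
			.num_sge = 0,
			/* signaled: we get a CQE; fenced: only executes after all
			 * previously posted WRs on this send queue have completed */
			.send_flags = IB_SEND_SIGNALED | IB_SEND_FENCE,
			.ex.invalidate_rkey = req->mr->rkey,
		};

		return ib_post_send(queue->qp, &wr, &bad_wr);
	}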
* [PATCH 0/3] Fix request completion holes
From: Jason Gunthorpe @ 2017-11-02 16:36 UTC

On Thu, Nov 02, 2017 at 11:18:42AM -0500, Steve Wise wrote:
> iWARP spec too. This is in regard to completion ordering though. The local
> invalidate send WR must have the IB_SEND_FENCE flag set if you want it to only
> be executed after all prior send WRs are executed.

Oh right, yes, you need to have the local invalidate fence to guarantee
that, but usually a local operation will not refer to a RKEY, so
execution ordering won't matter.

But this is a good general point: doesn't a ULP need to set FENCE on
SEND, eg:

  RDMA READ (rkey)
  RDMA READ (rkey)
  SEND w/ FENCE (tell remote to invalidate rkey)

Otherwise IBA Table 79 says RDMA READ can pass SEND and we have a
situation where the rkey has become invalidated when the remote is
still trying to use it.

IBA actually explicitly calls this out (pg 538):

  RDMA Read operations may be executed at the target after subsequent
  Send and Invalidate operation already performed the invalidation at
  the target. That may cause the RDMA Read operation to fail. Setting
  the Fence Indicator on the subsequent operations guarantees that the
  RDMA Read will fully complete before the invalidation takes place.

None of our ULPs use IB_SEND_FENCE - so this is a bug!

> Either way, the completions are always inserted into the cq
> in-sq-submission-order and a signaled completion implies completions
> of all prior unsignaled WRs.

Yes, exactly what is relevant for this nvme case..

Jason
* [PATCH 0/3] Fix request completion holes
From: Steve Wise @ 2017-11-02 16:53 UTC

> Otherwise IBA Table 79 says RDMA READ can pass SEND and we have a
> situation where the rkey has become invalidated when the remote is
> still trying to use it.
>
> None of our ULPs use IB_SEND_FENCE - so this is a bug!

I think the nvmf target doesn't post the SEND_W_INVALIDATE (NVMF CQE)
until the RDMA READ(s) (and back end BIO submission) complete.

Steve.
* [PATCH 0/3] Fix request completion holes
From: Jason Gunthorpe @ 2017-11-02 16:54 UTC

On Thu, Nov 02, 2017 at 11:53:44AM -0500, Steve Wise wrote:
> I think the nvmf target doesn't post the SEND_W_INVALIDATE (NVMF CQE) until the
> RDMA READ(s) (and back end BIO submission) complete.

Ah! That makes sense for a storage use case.

Thanks,
Jason
* [PATCH 0/3] Fix request completion holes
From: Sagi Grimberg @ 2017-11-01 17:26 UTC

> (4) Send signal with invalidation signal w/o moderation 3.4M / 4.6M

What about remote invalidation? That is how it will usually be used...

> (5) Send signal with invalidation signal w/ moderation 6.2M / 11.8M

What about latency effects? Any noticeable change?

> I understand that (1), (2) and (3) are invalid, the performance
> reduction induce by mode (4) need to motivate us to enable cqe/event
> moderation as soon as possible.

Well good, it means you can help test my adaptive coalescing code :)

May I ask what moderation parameters you used? It will be useful
to pin down the profile levels the adaptive moderator would bounce
between.
* [PATCH 0/3] Fix request completion holes
From: Max Gurtovoy @ 2017-11-01 22:23 UTC

On 11/1/2017 7:26 PM, Sagi Grimberg wrote:
>> (4) Send signal with invalidation signal w/o moderation 3.4M / 4.6M
>
> What about remote invalidation? That is how it will usually be used...

We haven't tested remote invalidation yet.

>> (5) Send signal with invalidation signal w/ moderation 6.2M / 11.8M
>
> What about latency effects? Any noticeable change?
>
>> I understand that (1), (2) and (3) are invalid, the performance
>> reduction induce by mode (4) need to motivate us to enable cqe/event
>> moderation as soon as possible.
>
> Well good, it means you can help test my adaptive coalescing code :)

Of course.

> May I ask what moderation parameters did you use? it will be useful
> to pin down the profile levels the adaptive moderator would bounce
> between.

Sure, I mentioned the params earlier (cq_completions=32 cq_timeout=16).

Sagi,
As we can see from our results, the changes made in this patchset must
be pushed together with some cq moderation (adaptive, or a manual option
meanwhile). In the testing I created a module parameter, but we can
create a different option for the user to configure.
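
For reference, static moderation like Max describes can be applied
through the verbs CQ moderation hook, roughly as below (a sketch only;
not every provider implements modify_cq, and the parameter names simply
mirror the ones used in this thread):

	/*
	 * Sketch: raise a CQ event only after cq_completions CQEs or after
	 * cq_timeout (in device-specific units) has elapsed. Minimal error
	 * handling; provider support is not guaranteed.
	 */
	static int nvme_rdma_set_cq_moderation_sketch(struct ib_cq *cq,
			u16 cq_completions, u16 cq_timeout)
	{
		int ret;

		ret = ib_modify_cq(cq, cq_completions, cq_timeout);
		if (ret)
			pr_warn("CQ moderation not applied (err %d)\n", ret);
		return ret;
	}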
* [PATCH 0/3] Fix request completion holes
From: Steve Wise @ 2017-11-02 17:55 UTC

> Well good, it means you can help test my adaptive coalescing code :)
>
> May I ask what moderation parameters did you use? it will be useful
> to pin down the profile levels the adaptive moderator would bounce
> between.

Hey Sagi, I can help too on testing this with iw_cxgb4. Where is your
current WIP?

Steve.