From: sagi@grimberg.me (Sagi Grimberg)
Subject: [PATCH 1/3] nvme-rdma: don't suppress send completions
Date: Tue, 31 Oct 2017 10:55:20 +0200 [thread overview]
Message-ID: <1509440122-1190-2-git-send-email-sagi@grimberg.me> (raw)
In-Reply-To: <1509440122-1190-1-git-send-email-sagi@grimberg.me>
The entire completion suppression mechanism is currently broken
because the HCA might retry a send operation (due to a dropped ack)
after the NVMe transaction has already completed.
In order to handle this, we signal all send completions (except for the
async event, which does not race with anything).
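For context, a minimal sketch of what an unconditionally signaled send
looks like with the kernel verbs API (the helper and handler names below
are made up for illustration, not the driver's own functions): every send
WR now carries IB_SEND_SIGNALED, so each posted send generates a
completion that invokes its wr_cqe->done() handler.

    #include <linux/printk.h>
    #include <rdma/ib_verbs.h>

    /* Hypothetical handler; the driver's real one is nvme_rdma_send_done(). */
    static void example_send_done(struct ib_cq *cq, struct ib_wc *wc)
    {
            if (unlikely(wc->status != IB_WC_SUCCESS))
                    pr_err("SEND failed: %s (%d)\n",
                           ib_wc_status_msg(wc->status), wc->status);
    }

    static int example_post_signaled_send(struct ib_qp *qp, struct ib_cqe *cqe,
                                          struct ib_sge *sge, int num_sge)
    {
            struct ib_send_wr wr, *bad_wr;

            wr.next       = NULL;
            wr.wr_cqe     = cqe;    /* cqe->done == example_send_done */
            wr.sg_list    = sge;
            wr.num_sge    = num_sge;
            wr.opcode     = IB_WR_SEND;
            /* Always signal: no suppression, a completion fires for every send. */
            wr.send_flags = IB_SEND_SIGNALED;

            return ib_post_send(qp, &wr, &bad_wr);
    }
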
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 40 ++++------------------------------------
1 file changed, 4 insertions(+), 36 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 53ad241493a1..ccbae327fe72 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1001,21 +1001,9 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
nvme_rdma_wr_error(cq, wc, "SEND");
}
-/*
- * We want to signal completion at least every queue depth/2. This returns the
- * largest power of two that is not above half of (queue size + 1) to optimize
- * (avoid divisions).
- */
-static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
-{
- int limit = 1 << ilog2((queue->queue_size + 1) / 2);
-
- return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
-}
-
static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
- struct ib_send_wr *first, bool flush)
+ struct ib_send_wr *first)
{
struct ib_send_wr wr, *bad_wr;
int ret;
@@ -1031,24 +1019,7 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
wr.sg_list = sge;
wr.num_sge = num_sge;
wr.opcode = IB_WR_SEND;
- wr.send_flags = 0;
-
- /*
- * Unsignalled send completions are another giant desaster in the
- * IB Verbs spec: If we don't regularly post signalled sends
- * the send queue will fill up and only a QP reset will rescue us.
- * Would have been way to obvious to handle this in hardware or
- * at least the RDMA stack..
- *
- * Always signal the flushes. The magic request used for the flush
- * sequencer is not allocated in our driver's tagset and it's
- * triggered to be freed by blk_cleanup_queue(). So we need to
- * always mark it as signaled to ensure that the "wr_cqe", which is
- * embedded in request's payload, is not freed when __ib_process_cq()
- * calls wr_cqe->done().
- */
- if (nvme_rdma_queue_sig_limit(queue) || flush)
- wr.send_flags |= IB_SEND_SIGNALED;
+ wr.send_flags = IB_SEND_SIGNALED;
if (first)
first->next = &wr;
@@ -1122,7 +1093,7 @@ static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg, int aer_idx)
ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(*cmd),
DMA_TO_DEVICE);
- ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL, false);
+ ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL);
WARN_ON_ONCE(ret);
}
@@ -1407,7 +1378,6 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
struct nvme_rdma_qe *sqe = &req->sqe;
struct nvme_command *c = sqe->data;
- bool flush = false;
struct ib_device *dev;
blk_status_t ret;
int err;
@@ -1439,10 +1409,8 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
ib_dma_sync_single_for_device(dev, sqe->dma,
sizeof(struct nvme_command), DMA_TO_DEVICE);
- if (req_op(rq) == REQ_OP_FLUSH)
- flush = true;
err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
- req->mr->need_inval ? &req->reg_wr.wr : NULL, flush);
+ req->mr->need_inval ? &req->reg_wr.wr : NULL);
if (unlikely(err)) {
nvme_rdma_unmap_data(queue, rq);
goto err;
--
2.7.4
Thread overview: 23+ messages
2017-10-31 8:55 [PATCH 0/3] Fix request completion holes Sagi Grimberg
2017-10-31 8:55 ` Sagi Grimberg [this message]
2017-10-31 8:55 ` [PATCH 2/3] nvme-rdma: don't complete requests before a send work request has completed Sagi Grimberg
2017-10-31 8:55 ` [PATCH 3/3] nvme-rdma: wait for local invalidation before completing a request Sagi Grimberg
2017-10-31 9:38 ` [PATCH 0/3] Fix request completion holes Max Gurtovoy
2017-10-31 11:10 ` Sagi Grimberg
2017-11-01 16:02 ` idanb
2017-11-01 16:09 ` Max Gurtovoy
2017-11-01 16:50 ` Jason Gunthorpe
2017-11-01 17:31 ` Sagi Grimberg
2017-11-01 17:58 ` Jason Gunthorpe
2017-11-02 8:06 ` Sagi Grimberg
2017-11-02 15:12 ` Jason Gunthorpe
2017-11-02 15:23 ` Sagi Grimberg
2017-11-02 15:51 ` Jason Gunthorpe
2017-11-02 16:15 ` Sagi Grimberg
2017-11-02 16:18 ` Steve Wise
2017-11-02 16:36 ` Jason Gunthorpe
2017-11-02 16:53 ` Steve Wise
2017-11-02 16:54 ` Jason Gunthorpe
2017-11-01 17:26 ` Sagi Grimberg
2017-11-01 22:23 ` Max Gurtovoy
2017-11-02 17:55 ` Steve Wise