* [PATCH] nvmet-rdma: Invoke fatal error on error completion
@ 2016-06-23 17:01 Sagi Grimberg
  2016-06-24  7:11 ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Sagi Grimberg @ 2016-06-23 17:01 UTC (permalink / raw)


In case we get an error completion, the RDMA queue pair is in error
state, so tear down the entire controller. Note that on a recv or read
error completion we might not have a controller yet, so check for the
controller's existence.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/target/rdma.c | 45 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index b1c6e5bb0b70..f5986a7b776e 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -134,6 +134,7 @@ static void nvmet_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_qp_event(struct ib_event *event, void *priv);
+static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue);
 
 static struct nvmet_fabrics_ops nvmet_rdma_ops;
 
@@ -486,12 +487,32 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 	nvmet_rdma_put_rsp(rsp);
 }
 
+static void nvmet_rdma_error_comp(struct nvmet_rdma_queue *queue)
+{
+	if (queue->nvme_sq.ctrl)
+		nvmet_ctrl_fatal_error(queue->nvme_sq.ctrl);
+	else
+		/*
+		 * we didn't setup the controller yet in case
+		 * of admin connect error, just disconnect and
+		 * cleanup the queue
+		 */
+		nvmet_rdma_queue_disconnect(queue);
+}
+
 static void nvmet_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 {
 	struct nvmet_rdma_rsp *rsp =
 		container_of(wc->wr_cqe, struct nvmet_rdma_rsp, send_cqe);
 
 	nvmet_rdma_release_rsp(rsp);
+
+	if (unlikely(wc->status != IB_WC_SUCCESS &&
+		wc->status != IB_WC_WR_FLUSH_ERR)) {
+		pr_err("SEND for CQE 0x%p failed with status %s (%d).\n",
+			wc->wr_cqe, ib_wc_status_msg(wc->status), wc->status);
+		nvmet_rdma_error_comp(rsp->queue);
+	}
 }
 
 static void nvmet_rdma_queue_response(struct nvmet_req *req)
@@ -534,11 +555,13 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
 			rsp->req.sg_cnt, nvmet_data_dir(&rsp->req));
 	rsp->n_rdma = 0;
 
-	if (unlikely(wc->status != IB_WC_SUCCESS &&
-		wc->status != IB_WC_WR_FLUSH_ERR)) {
-		pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n",
-			wc->wr_cqe, ib_wc_status_msg(wc->status), wc->status);
-		nvmet_req_complete(&rsp->req, NVME_SC_DATA_XFER_ERROR);
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		nvmet_rdma_release_rsp(rsp);
+		if (wc->status != IB_WC_WR_FLUSH_ERR) {
+			pr_info("RDMA READ for CQE 0x%p failed with status %s (%d).\n",
+				wc->wr_cqe, ib_wc_status_msg(wc->status), wc->status);
+			nvmet_rdma_error_comp(queue);
+		}
 		return;
 	}
 
@@ -705,13 +728,19 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct nvmet_rdma_queue *queue = cq->cq_context;
 	struct nvmet_rdma_rsp *rsp;
 
-	if (unlikely(wc->status != IB_WC_SUCCESS))
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		if (wc->status != IB_WC_WR_FLUSH_ERR) {
+			pr_err("RECV for CQE 0x%p failed with status %s (%d)\n",
+				wc->wr_cqe, ib_wc_status_msg(wc->status),
+				wc->status);
+			nvmet_rdma_error_comp(queue);
+		}
 		return;
+	}
 
 	if (unlikely(wc->byte_len < sizeof(struct nvme_command))) {
 		pr_err("Ctrl Fatal Error: capsule size less than 64 bytes\n");
-		if (queue->nvme_sq.ctrl)
-			nvmet_ctrl_fatal_error(queue->nvme_sq.ctrl);
+		nvmet_rdma_error_comp(queue);
 		return;
 	}
 
-- 
1.9.1


* [PATCH] nvmet-rdma: Invoke fatal error on error completion
  2016-06-23 17:01 [PATCH] nvmet-rdma: Invoke fatal error on error completion Sagi Grimberg
@ 2016-06-24  7:11 ` Christoph Hellwig
  2016-06-26 15:57   ` Sagi Grimberg
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2016-06-24  7:11 UTC (permalink / raw)


On Thu, Jun 23, 2016 at 08:01:30PM +0300, Sagi Grimberg wrote:
> In case we get an error completion, the RDMA queue pair is in error
> state, so tear down the entire controller. Note that on a recv or read
> error completion we might not have a controller yet, so check for the
> controller's existence.
> 
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

This looks fine minus a few minor coding style nitpicks that I'd be
happy to fix up:

> +static void nvmet_rdma_error_comp(struct nvmet_rdma_queue *queue)
> +{
> +	if (queue->nvme_sq.ctrl)
> +		nvmet_ctrl_fatal_error(queue->nvme_sq.ctrl);
> +	else
> +		/*
> +		 * we didn't setup the controller yet in case
> +		 * of admin connect error, just disconnect and
> +		 * cleanup the queue
> +		 */
> +		nvmet_rdma_queue_disconnect(queue);
> +}

With such a long comment I'd prefer to have curly braces, just to make
the else branch visually more obvious.
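
I.e. something like this (just a sketch of the braced form, untested):

	if (queue->nvme_sq.ctrl) {
		nvmet_ctrl_fatal_error(queue->nvme_sq.ctrl);
	} else {
		/*
		 * we didn't setup the controller yet in case
		 * of admin connect error, just disconnect and
		 * cleanup the queue
		 */
		nvmet_rdma_queue_disconnect(queue);
	}
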
> +
> +	if (unlikely(wc->status != IB_WC_SUCCESS &&
> +		wc->status != IB_WC_WR_FLUSH_ERR)) {

Indenting the second line of a condition by a single tab is always wrong;
either indent it with two tabs, or so that it aligns with the first line.
The latter is probably nicer here:

	if (unlikely(wc->status != IB_WC_SUCCESS &&
		     wc->status != IB_WC_WR_FLUSH_ERR)) {

Given how many !success && !flush_err conditionals we have in various
drivers, I wonder if we should have a helper in the RDMA core, though.
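
Something along these lines, perhaps (hypothetical name, just to
illustrate the idea):

	static inline bool ib_wc_is_fatal_err(const struct ib_wc *wc)
	{
		/* flush errors are expected during queue teardown */
		return wc->status != IB_WC_SUCCESS &&
		       wc->status != IB_WC_WR_FLUSH_ERR;
	}

so that callers like the one above would collapse to:

	if (unlikely(ib_wc_is_fatal_err(wc)))
		...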


* [PATCH] nvmet-rdma: Invoke fatal error on error completion
  2016-06-24  7:11 ` Christoph Hellwig
@ 2016-06-26 15:57   ` Sagi Grimberg
  0 siblings, 0 replies; 3+ messages in thread
From: Sagi Grimberg @ 2016-06-26 15:57 UTC (permalink / raw)



>> +static void nvmet_rdma_error_comp(struct nvmet_rdma_queue *queue)
>> +{
>> +	if (queue->nvme_sq.ctrl)
>> +		nvmet_ctrl_fatal_error(queue->nvme_sq.ctrl);
>> +	else
>> +		/*
>> +		 * we didn't setup the controller yet in case
>> +		 * of admin connect error, just disconnect and
>> +		 * cleanup the queue
>> +		 */
>> +		nvmet_rdma_queue_disconnect(queue);
>> +}
>
> With such a long comment I'd prefer to have curly braces, just to make
> the else branch visually more obvious.

Sure.

>> +
>> +	if (unlikely(wc->status != IB_WC_SUCCESS &&
>> +		wc->status != IB_WC_WR_FLUSH_ERR)) {
>
> Indenting the second line of a condition by a single tab is always wrong;
> either indent it with two tabs, or so that it aligns with the first line.
> The latter is probably nicer here:
>
> 	if (unlikely(wc->status != IB_WC_SUCCESS &&
> 		     wc->status != IB_WC_WR_FLUSH_ERR)) {

OK, I'll do that.

> Given how many !success && !flush_err conditionals we have in various
> drivers, I wonder if we should have a helper in the RDMA core, though.

The two conditions don't always come together, but I can take a look at
it. Given that it might bring some slight logic shifts, let's do it
incrementally.

