linux-rdma.vger.kernel.org archive mirror
* [PATCH v1 0/6] Last WQE Reached event treatment
@ 2024-06-18  0:10 Max Gurtovoy
  2024-06-18  0:10 ` [PATCH 1/6] IB/core: add support for draining Shared receive queues Max Gurtovoy
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

Hi Jason/Leon/Sagi,

This series adds support for draining a QP that is associated with an
SRQ (Shared Receive Queue). WQE and data segment leakage can occur if
the Last WQE Reached event is not handled.

This series, which is based on an old series I sent during 2018, uses a
different approach and handles the event in the RDMA core, as was
suggested in the mailing list discussion.

I've updated the RDMA ULPs. Most of the changes were trivial, except for
IPoIB, which handled the Last WQE Reached event in the ULP itself.

I've tested this series with NVMf/RDMA on RoCE.
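
For reference, below is a minimal sketch of how a ULP tears down an
SRQ-associated queue once this series is applied (the ulp_queue struct
and ulp_destroy_queue() name are hypothetical, for illustration only);
ib_drain_qp() now also waits for the Last WQE Reached event on
SRQ-associated QPs, so no extra ULP logic is needed:

static void ulp_destroy_queue(struct ulp_queue *queue)
{
	/*
	 * ib_drain_qp() moves the QP to the error state, drains the SQ
	 * and, for SRQ-associated QPs, waits for the Last WQE Reached
	 * event before polling the shared receive CQ.
	 */
	ib_drain_qp(queue->qp);
	rdma_destroy_qp(queue->cm_id);
	ib_free_cq(queue->cq);
}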

Max Gurtovoy (6):
  IB/core: add support for draining Shared receive queues
  IB/isert: remove the handling of last WQE reached event
  RDMA/srpt: remove the handling of last WQE reached event
  nvmet-rdma: remove the handling of last WQE reached event
  svcrdma: remove the handling of last WQE reached event
  RDMA/IPoIB: remove the handling of last WQE reached event

 drivers/infiniband/core/verbs.c          | 83 +++++++++++++++++++++++-
 drivers/infiniband/ulp/ipoib/ipoib.h     | 33 +---------
 drivers/infiniband/ulp/ipoib/ipoib_cm.c  | 71 ++------------------
 drivers/infiniband/ulp/isert/ib_isert.c  |  3 -
 drivers/infiniband/ulp/srpt/ib_srpt.c    |  5 --
 drivers/nvme/target/rdma.c               |  4 --
 include/rdma/ib_verbs.h                  |  2 +
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  1 -
 8 files changed, 92 insertions(+), 110 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] IB/core: add support for draining Shared receive queues
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
@ 2024-06-18  0:10 ` Max Gurtovoy
  2024-06-18 16:07   ` Bart Van Assche
  2024-06-19  9:09   ` Sagi Grimberg
  2024-06-18  0:10 ` [PATCH 2/6] IB/isert: remove the handling of last WQE reached event Max Gurtovoy
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

To avoid leakage for QPs associated with an SRQ, according to the IB spec
(section 10.3.1):

"Note, for QPs that are associated with an SRQ, the Consumer should take
the QP through the Error State before invoking a Destroy QP or a Modify
QP to the Reset State. The Consumer may invoke the Destroy QP without
first performing a Modify QP to the Error State and waiting for the Affiliated
Asynchronous Last WQE Reached Event. However, if the Consumer
does not wait for the Affiliated Asynchronous Last WQE Reached Event,
then WQE and Data Segment leakage may occur. Therefore, it is good
programming practice to teardown a QP that is associated with an SRQ
by using the following process:
 - Put the QP in the Error State;
 - wait for the Affiliated Asynchronous Last WQE Reached Event;
 - either:
   - drain the CQ by invoking the Poll CQ verb and either wait for CQ
     to be empty or the number of Poll CQ operations has exceeded
     CQ capacity size; or
   - post another WR that completes on the same CQ and wait for this
     WR to return as a WC;
 - and then invoke a Destroy QP or Reset QP."

Catch the Last WQE Reached Event in the core layer without involving the
ULP drivers with extra logic during drain and destroy QP flows.

The "Last WQE Reached" event will only be send on the errant QP, for
signaling that the SRQ flushed all the WQEs that have been dequeued from
the SRQ for processing by the associated QP.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/infiniband/core/verbs.c | 83 ++++++++++++++++++++++++++++++++-
 include/rdma/ib_verbs.h         |  2 +
 2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 94a7f3b0c71c..9e4df7d40e0c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1101,6 +1101,16 @@ EXPORT_SYMBOL(ib_destroy_srq_user);
 
 /* Queue pairs */
 
+static void __ib_qp_event_handler(struct ib_event *event, void *context)
+{
+	struct ib_qp *qp = event->element.qp;
+
+	if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
+		complete(&qp->srq_completion);
+	else if (qp->registered_event_handler)
+		qp->registered_event_handler(event, qp->qp_context);
+}
+
 static void __ib_shared_qp_event_handler(struct ib_event *event, void *context)
 {
 	struct ib_qp *qp = context;
@@ -1221,7 +1231,10 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
 	qp->qp_type = attr->qp_type;
 	qp->rwq_ind_tbl = attr->rwq_ind_tbl;
 	qp->srq = attr->srq;
-	qp->event_handler = attr->event_handler;
+	if (qp->srq)
+		init_completion(&qp->srq_completion);
+	qp->event_handler = __ib_qp_event_handler;
+	qp->registered_event_handler = attr->event_handler;
 	qp->port = attr->port_num;
 	qp->qp_context = attr->qp_context;
 
@@ -2884,6 +2897,72 @@ static void __ib_drain_rq(struct ib_qp *qp)
 		wait_for_completion(&rdrain.done);
 }
 
+/*
+ * __ib_drain_srq() - Block until Last WQE Reached event arrives, or timeout
+ *                    expires.
+ * @qp:               queue pair associated with SRQ to drain
+ *
+ * Quoting 10.3.1 Queue Pair and EE Context States:
+ *
+ * Note, for QPs that are associated with an SRQ, the Consumer should take the
+ * QP through the Error State before invoking a Destroy QP or a Modify QP to the
+ * Reset State.  The Consumer may invoke the Destroy QP without first performing
+ * a Modify QP to the Error State and waiting for the Affiliated Asynchronous
+ * Last WQE Reached Event. However, if the Consumer does not wait for the
+ * Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment
+ * leakage may occur. Therefore, it is good programming practice to tear down a
+ * QP that is associated with an SRQ by using the following process:
+ *
+ * - Put the QP in the Error State
+ * - Wait for the Affiliated Asynchronous Last WQE Reached Event;
+ * - either:
+ *       drain the CQ by invoking the Poll CQ verb and either wait for CQ
+ *       to be empty or the number of Poll CQ operations has exceeded
+ *       CQ capacity size;
+ * - or
+ *       post another WR that completes on the same CQ and wait for this
+ *       WR to return as a WC;
+ * - and then invoke a Destroy QP or Reset QP.
+ *
+ * We use the first option.
+ */
+static void __ib_drain_srq(struct ib_qp *qp)
+{
+	struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
+	struct ib_cq *cq;
+	int n, polled = 0;
+	int ret;
+
+	if (!qp->srq) {
+		WARN_ONCE(1, "QP 0x%p is not associated with SRQ\n", qp);
+		return;
+	}
+
+	ret = ib_modify_qp(qp, &attr, IB_QP_STATE);
+	if (ret) {
+		WARN_ONCE(ret, "failed to drain shared recv queue: %d\n", ret);
+		return;
+	}
+
+	if (ib_srq_has_cq(qp->srq->srq_type)) {
+		cq = qp->srq->ext.cq;
+	} else if (qp->recv_cq) {
+		cq = qp->recv_cq;
+	} else {
+		WARN_ONCE(1, "QP 0x%p has no CQ associated with SRQ\n", qp);
+		return;
+	}
+
+	if (wait_for_completion_timeout(&qp->srq_completion, 10 * HZ) > 0) {
+		while (polled != cq->cqe) {
+			n = ib_process_cq_direct(cq, cq->cqe - polled);
+			if (!n)
+				return;
+			polled += n;
+		}
+	}
+}
+
 /**
  * ib_drain_sq() - Block until all SQ CQEs have been consumed by the
  *		   application.
@@ -2962,6 +3041,8 @@ void ib_drain_qp(struct ib_qp *qp)
 	ib_drain_sq(qp);
 	if (!qp->srq)
 		ib_drain_rq(qp);
+	else
+		__ib_drain_srq(qp);
 }
 EXPORT_SYMBOL(ib_drain_qp);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 477bf9dd5e71..5a193008f99c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1788,6 +1788,7 @@ struct ib_qp {
 	struct list_head	rdma_mrs;
 	struct list_head	sig_mrs;
 	struct ib_srq	       *srq;
+	struct completion	srq_completion;
 	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
 	struct list_head	xrcd_list;
 
@@ -1797,6 +1798,7 @@ struct ib_qp {
 	struct ib_qp           *real_qp;
 	struct ib_uqp_object   *uobject;
 	void                  (*event_handler)(struct ib_event *, void *);
+	void                  (*registered_event_handler)(struct ib_event *, void *);
 	void		       *qp_context;
 	/* sgid_attrs associated with the AV's */
 	const struct ib_gid_attr *av_sgid_attr;
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] IB/isert: remove the handling of last WQE reached event
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
  2024-06-18  0:10 ` [PATCH 1/6] IB/core: add support for draining Shared receive queues Max Gurtovoy
@ 2024-06-18  0:10 ` Max Gurtovoy
  2024-06-19  9:16   ` Sagi Grimberg
  2024-06-18  0:10 ` [PATCH 3/6] RDMA/srpt: " Max Gurtovoy
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

This event is handled by the RDMA core layer.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 00a7303c8cc6..42977a5326ee 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -91,9 +91,6 @@ isert_qp_event_callback(struct ib_event *e, void *context)
 	case IB_EVENT_COMM_EST:
 		rdma_notify(isert_conn->cm_id, IB_EVENT_COMM_EST);
 		break;
-	case IB_EVENT_QP_LAST_WQE_REACHED:
-		isert_warn("Reached TX IB_EVENT_QP_LAST_WQE_REACHED\n");
-		break;
 	default:
 		break;
 	}
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] RDMA/srpt: remove the handling of last WQE reached event
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
  2024-06-18  0:10 ` [PATCH 1/6] IB/core: add support for draining Shared receive queues Max Gurtovoy
  2024-06-18  0:10 ` [PATCH 2/6] IB/isert: remove the handling of last WQE reached event Max Gurtovoy
@ 2024-06-18  0:10 ` Max Gurtovoy
  2024-06-18 16:08   ` Bart Van Assche
  2024-06-18  0:10 ` [PATCH 4/6] nvmet-rdma: " Max Gurtovoy
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

This event is handled by the RDMA core layer.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9632afbd727b..8503f56b5202 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -231,11 +231,6 @@ static void srpt_qp_event(struct ib_event *event, void *ptr)
 		else
 			ib_cm_notify(ch->ib_cm.cm_id, event->event);
 		break;
-	case IB_EVENT_QP_LAST_WQE_REACHED:
-		pr_debug("%s-%d, state %s: received Last WQE event.\n",
-			 ch->sess_name, ch->qp->qp_num,
-			 get_ch_state_name(ch->state));
-		break;
 	default:
 		pr_err("received unrecognized IB QP event %d\n", event->event);
 		break;
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] nvmet-rdma: remove the handling of last WQE reached event
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
                   ` (2 preceding siblings ...)
  2024-06-18  0:10 ` [PATCH 3/6] RDMA/srpt: " Max Gurtovoy
@ 2024-06-18  0:10 ` Max Gurtovoy
  2024-06-18  0:10 ` [PATCH 5/6] svcrdma: " Max Gurtovoy
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

This event is handled by the RDMA core layer.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/nvme/target/rdma.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 689bb5d3cfdc..29860acdb335 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1520,10 +1520,6 @@ static void nvmet_rdma_qp_event(struct ib_event *event, void *priv)
 	case IB_EVENT_COMM_EST:
 		rdma_notify(queue->cm_id, event->event);
 		break;
-	case IB_EVENT_QP_LAST_WQE_REACHED:
-		pr_debug("received last WQE reached event for queue=0x%p\n",
-			 queue);
-		break;
 	default:
 		pr_err("received IB QP event: %s (%d)\n",
 		       ib_event_msg(event->event), event->event);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] svcrdma: remove the handling of last WQE reached event
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
                   ` (3 preceding siblings ...)
  2024-06-18  0:10 ` [PATCH 4/6] nvmet-rdma: " Max Gurtovoy
@ 2024-06-18  0:10 ` Max Gurtovoy
  2024-06-18 15:12   ` Chuck Lever
  2024-06-18  0:10 ` [PATCH 6/6] RDMA/IPoIB: " Max Gurtovoy
  2024-06-23 13:03 ` [PATCH v1 0/6] Last WQE Reached event treatment Zhu Yanjun
  6 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

This event is handled by the RDMA core layer.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 2b1c16b9547d..4bb94b02c34c 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -107,7 +107,6 @@ static void qp_event_handler(struct ib_event *event, void *context)
 	case IB_EVENT_PATH_MIG:
 	case IB_EVENT_COMM_EST:
 	case IB_EVENT_SQ_DRAINED:
-	case IB_EVENT_QP_LAST_WQE_REACHED:
 		break;
 
 	/* These are considered fatal events */
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] RDMA/IPoIB: remove the handling of last WQE reached event
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
                   ` (4 preceding siblings ...)
  2024-06-18  0:10 ` [PATCH 5/6] svcrdma: " Max Gurtovoy
@ 2024-06-18  0:10 ` Max Gurtovoy
  2024-06-19  9:18   ` Sagi Grimberg
  2024-06-23 13:03 ` [PATCH v1 0/6] Last WQE Reached event treatment Zhu Yanjun
  6 siblings, 1 reply; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-18  0:10 UTC (permalink / raw)
  To: leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet, Max Gurtovoy

This event is handled by the RDMA core layer.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h    | 33 +-----------
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 71 +++----------------------
 2 files changed, 8 insertions(+), 96 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 963e936da5e3..0f1e4b431af4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -198,37 +198,10 @@ struct ipoib_cm_data {
 	__be32 mtu;
 };
 
-/*
- * Quoting 10.3.1 Queue Pair and EE Context States:
- *
- * Note, for QPs that are associated with an SRQ, the Consumer should take the
- * QP through the Error State before invoking a Destroy QP or a Modify QP to the
- * Reset State.  The Consumer may invoke the Destroy QP without first performing
- * a Modify QP to the Error State and waiting for the Affiliated Asynchronous
- * Last WQE Reached Event. However, if the Consumer does not wait for the
- * Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment
- * leakage may occur. Therefore, it is good programming practice to tear down a
- * QP that is associated with an SRQ by using the following process:
- *
- * - Put the QP in the Error State
- * - Wait for the Affiliated Asynchronous Last WQE Reached Event;
- * - either:
- *       drain the CQ by invoking the Poll CQ verb and either wait for CQ
- *       to be empty or the number of Poll CQ operations has exceeded
- *       CQ capacity size;
- * - or
- *       post another WR that completes on the same CQ and wait for this
- *       WR to return as a WC;
- * - and then invoke a Destroy QP or Reset QP.
- *
- * We use the second option and wait for a completion on the
- * same CQ before destroying QPs attached to our SRQ.
- */
-
 enum ipoib_cm_state {
 	IPOIB_CM_RX_LIVE,
 	IPOIB_CM_RX_ERROR, /* Ignored by stale task */
-	IPOIB_CM_RX_FLUSH  /* Last WQE Reached event observed */
+	IPOIB_CM_RX_FLUSH
 };
 
 struct ipoib_cm_rx {
@@ -267,9 +240,7 @@ struct ipoib_cm_dev_priv {
 	struct ib_cm_id	       *id;
 	struct list_head	passive_ids;   /* state: LIVE */
 	struct list_head	rx_error_list; /* state: ERROR */
-	struct list_head	rx_flush_list; /* state: FLUSH, drain not started */
-	struct list_head	rx_drain_list; /* state: FLUSH, drain started */
-	struct list_head	rx_reap_list;  /* state: FLUSH, drain done */
+	struct list_head	rx_reap_list;  /* state: FLUSH */
 	struct work_struct      start_task;
 	struct work_struct      reap_task;
 	struct work_struct      skb_task;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index b610d36295bb..18ead80b5756 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -71,12 +71,6 @@ static struct ib_qp_attr ipoib_cm_err_attr = {
 	.qp_state = IB_QPS_ERR
 };
 
-#define IPOIB_CM_RX_DRAIN_WRID 0xffffffff
-
-static struct ib_send_wr ipoib_cm_rx_drain_wr = {
-	.opcode = IB_WR_SEND,
-};
-
 static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 			       const struct ib_cm_event *event);
 
@@ -208,50 +202,11 @@ static void ipoib_cm_free_rx_ring(struct net_device *dev,
 	vfree(rx_ring);
 }
 
-static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv *priv)
-{
-	struct ipoib_cm_rx *p;
-
-	/* We only reserved 1 extra slot in CQ for drain WRs, so
-	 * make sure we have at most 1 outstanding WR. */
-	if (list_empty(&priv->cm.rx_flush_list) ||
-	    !list_empty(&priv->cm.rx_drain_list))
-		return;
-
-	/*
-	 * QPs on flush list are error state.  This way, a "flush
-	 * error" WC will be immediately generated for each WR we post.
-	 */
-	p = list_entry(priv->cm.rx_flush_list.next, typeof(*p), list);
-	ipoib_cm_rx_drain_wr.wr_id = IPOIB_CM_RX_DRAIN_WRID;
-	if (ib_post_send(p->qp, &ipoib_cm_rx_drain_wr, NULL))
-		ipoib_warn(priv, "failed to post drain wr\n");
-
-	list_splice_init(&priv->cm.rx_flush_list, &priv->cm.rx_drain_list);
-}
-
-static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)
-{
-	struct ipoib_cm_rx *p = ctx;
-	struct ipoib_dev_priv *priv = ipoib_priv(p->dev);
-	unsigned long flags;
-
-	if (event->event != IB_EVENT_QP_LAST_WQE_REACHED)
-		return;
-
-	spin_lock_irqsave(&priv->lock, flags);
-	list_move(&p->list, &priv->cm.rx_flush_list);
-	p->state = IPOIB_CM_RX_FLUSH;
-	ipoib_cm_start_rx_drain(priv);
-	spin_unlock_irqrestore(&priv->lock, flags);
-}
-
 static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
 					   struct ipoib_cm_rx *p)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_init_attr attr = {
-		.event_handler = ipoib_cm_rx_event_handler,
 		.send_cq = priv->recv_cq, /* For drain WR */
 		.recv_cq = priv->recv_cq,
 		.srq = priv->cm.srq,
@@ -479,8 +434,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id,
 	spin_lock_irq(&priv->lock);
 	queue_delayed_work(priv->wq,
 			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
-	/* Add this entry to passive ids list head, but do not re-add it
-	 * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
+	/* Add this entry to passive ids list head. */
 	p->jiffies = jiffies;
 	if (p->state == IPOIB_CM_RX_LIVE)
 		list_move(&p->list, &priv->cm.passive_ids);
@@ -574,15 +528,8 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		       wr_id, wc->status);
 
 	if (unlikely(wr_id >= ipoib_recvq_size)) {
-		if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~(IPOIB_OP_CM | IPOIB_OP_RECV))) {
-			spin_lock_irqsave(&priv->lock, flags);
-			list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
-			ipoib_cm_start_rx_drain(priv);
-			queue_work(priv->wq, &priv->cm.rx_reap_task);
-			spin_unlock_irqrestore(&priv->lock, flags);
-		} else
-			ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
-				   wr_id, ipoib_recvq_size);
+		ipoib_warn(priv, "cm recv completion event with wrid %d (> %d)\n",
+			   wr_id, ipoib_recvq_size);
 		return;
 	}
 
@@ -603,6 +550,7 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 		else {
 			if (!--p->recv_count) {
 				spin_lock_irqsave(&priv->lock, flags);
+				p->state = IPOIB_CM_RX_FLUSH;
 				list_move(&p->list, &priv->cm.rx_reap_list);
 				spin_unlock_irqrestore(&priv->lock, flags);
 				queue_work(priv->wq, &priv->cm.rx_reap_task);
@@ -912,6 +860,7 @@ static void ipoib_cm_free_rx_reap_list(struct net_device *dev)
 	spin_unlock_irq(&priv->lock);
 
 	list_for_each_entry_safe(rx, n, &list, list) {
+		ib_drain_qp(rx->qp);
 		ib_destroy_cm_id(rx->id);
 		ib_destroy_qp(rx->qp);
 		if (!ipoib_cm_has_srq(dev)) {
@@ -952,21 +901,15 @@ void ipoib_cm_dev_stop(struct net_device *dev)
 	/* Wait for all RX to be drained */
 	begin = jiffies;
 
-	while (!list_empty(&priv->cm.rx_error_list) ||
-	       !list_empty(&priv->cm.rx_flush_list) ||
-	       !list_empty(&priv->cm.rx_drain_list)) {
+	while (!list_empty(&priv->cm.rx_error_list)) {
 		if (time_after(jiffies, begin + 5 * HZ)) {
 			ipoib_warn(priv, "RX drain timing out\n");
 
 			/*
 			 * assume the HW is wedged and just free up everything.
 			 */
-			list_splice_init(&priv->cm.rx_flush_list,
-					 &priv->cm.rx_reap_list);
 			list_splice_init(&priv->cm.rx_error_list,
 					 &priv->cm.rx_reap_list);
-			list_splice_init(&priv->cm.rx_drain_list,
-					 &priv->cm.rx_reap_list);
 			break;
 		}
 		spin_unlock_irq(&priv->lock);
@@ -1589,8 +1532,6 @@ int ipoib_cm_dev_init(struct net_device *dev)
 	INIT_LIST_HEAD(&priv->cm.reap_list);
 	INIT_LIST_HEAD(&priv->cm.start_list);
 	INIT_LIST_HEAD(&priv->cm.rx_error_list);
-	INIT_LIST_HEAD(&priv->cm.rx_flush_list);
-	INIT_LIST_HEAD(&priv->cm.rx_drain_list);
 	INIT_LIST_HEAD(&priv->cm.rx_reap_list);
 	INIT_WORK(&priv->cm.start_task, ipoib_cm_tx_start);
 	INIT_WORK(&priv->cm.reap_task, ipoib_cm_tx_reap);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] svcrdma: remove the handling of last WQE reached event
  2024-06-18  0:10 ` [PATCH 5/6] svcrdma: " Max Gurtovoy
@ 2024-06-18 15:12   ` Chuck Lever
  0 siblings, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2024-06-18 15:12 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: leonro@nvidia.com, jgg@nvidia.com, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org, oren@nvidia.com, israelr@nvidia.com,
	maorg@nvidia.com, yishaih@nvidia.com, hch@lst.de,
	bvanassche@acm.org, shiraz.saleem@intel.com, edumazet@google.com

On Mon, Jun 17, 2024 at 08:10:33PM -0400, Max Gurtovoy wrote:
> This event is handled by the RDMA core layer.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 2b1c16b9547d..4bb94b02c34c 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -107,7 +107,6 @@ static void qp_event_handler(struct ib_event *event, void *context)
>  	case IB_EVENT_PATH_MIG:
>  	case IB_EVENT_COMM_EST:
>  	case IB_EVENT_SQ_DRAINED:
> -	case IB_EVENT_QP_LAST_WQE_REACHED:
>  		break;
>  
>  	/* These are considered fatal events */
> -- 
> 2.18.1
> 

Acked-by: Chuck Lever <chuck.lever@oracle.com>

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] IB/core: add support for draining Shared receive queues
  2024-06-18  0:10 ` [PATCH 1/6] IB/core: add support for draining Shared receive queues Max Gurtovoy
@ 2024-06-18 16:07   ` Bart Van Assche
  2024-06-19  9:14     ` Sagi Grimberg
  2024-06-19  9:09   ` Sagi Grimberg
  1 sibling, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2024-06-18 16:07 UTC (permalink / raw)
  To: Max Gurtovoy, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, shiraz.saleem, edumazet

On 6/17/24 5:10 PM, Max Gurtovoy wrote:
> +	if (wait_for_completion_timeout(&qp->srq_completion, 10 * HZ) > 0) {
> +		while (polled != cq->cqe) {
> +			n = ib_process_cq_direct(cq, cq->cqe - polled);
> +			if (!n)
> +				return;
> +			polled += n;
> +		}
> +	}

Why a hardcoded timeout (10 * HZ) instead of waiting forever?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] RDMA/srpt: remove the handling of last WQE reached event
  2024-06-18  0:10 ` [PATCH 3/6] RDMA/srpt: " Max Gurtovoy
@ 2024-06-18 16:08   ` Bart Van Assche
  0 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2024-06-18 16:08 UTC (permalink / raw)
  To: Max Gurtovoy, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, shiraz.saleem, edumazet

On 6/17/24 5:10 PM, Max Gurtovoy wrote:
> This event is handled by the RDMA core layer.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>   drivers/infiniband/ulp/srpt/ib_srpt.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
> index 9632afbd727b..8503f56b5202 100644
> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c
> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
> @@ -231,11 +231,6 @@ static void srpt_qp_event(struct ib_event *event, void *ptr)
>   		else
>   			ib_cm_notify(ch->ib_cm.cm_id, event->event);
>   		break;
> -	case IB_EVENT_QP_LAST_WQE_REACHED:
> -		pr_debug("%s-%d, state %s: received Last WQE event.\n",
> -			 ch->sess_name, ch->qp->qp_num,
> -			 get_ch_state_name(ch->state));
> -		break;
>   	default:
>   		pr_err("received unrecognized IB QP event %d\n", event->event);
>   		break;

Acked-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] IB/core: add support for draining Shared receive queues
  2024-06-18  0:10 ` [PATCH 1/6] IB/core: add support for draining Shared receive queues Max Gurtovoy
  2024-06-18 16:07   ` Bart Van Assche
@ 2024-06-19  9:09   ` Sagi Grimberg
  2024-06-19 11:16     ` Max Gurtovoy
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2024-06-19  9:09 UTC (permalink / raw)
  To: Max Gurtovoy, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet



On 18/06/2024 3:10, Max Gurtovoy wrote:
> To avoid leakage for QPs associated with an SRQ, according to the IB spec
> (section 10.3.1):
>
> "Note, for QPs that are associated with an SRQ, the Consumer should take
> the QP through the Error State before invoking a Destroy QP or a Modify
> QP to the Reset State. The Consumer may invoke the Destroy QP without
> first performing a Modify QP to the Error State and waiting for the Affiliated
> Asynchronous Last WQE Reached Event. However, if the Consumer
> does not wait for the Affiliated Asynchronous Last WQE Reached Event,
> then WQE and Data Segment leakage may occur. Therefore, it is good
> programming practice to teardown a QP that is associated with an SRQ
> by using the following process:
>   - Put the QP in the Error State;
>   - wait for the Affiliated Asynchronous Last WQE Reached Event;
>   - either:
>     - drain the CQ by invoking the Poll CQ verb and either wait for CQ
>       to be empty or the number of Poll CQ operations has exceeded
>       CQ capacity size; or
>     - post another WR that completes on the same CQ and wait for this
>       WR to return as a WC;
>   - and then invoke a Destroy QP or Reset QP."
>
> Catch the Last WQE Reached Event in the core layer without involving the
> ULP drivers with extra logic during drain and destroy QP flows.
>
> The "Last WQE Reached" event will only be send on the errant QP, for
> signaling that the SRQ flushed all the WQEs that have been dequeued from
> the SRQ for processing by the associated QP.
>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>   drivers/infiniband/core/verbs.c | 83 ++++++++++++++++++++++++++++++++-
>   include/rdma/ib_verbs.h         |  2 +
>   2 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 94a7f3b0c71c..9e4df7d40e0c 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1101,6 +1101,16 @@ EXPORT_SYMBOL(ib_destroy_srq_user);
>   
>   /* Queue pairs */
>   
> +static void __ib_qp_event_handler(struct ib_event *event, void *context)
> +{
> +	struct ib_qp *qp = event->element.qp;
> +
> +	if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
> +		complete(&qp->srq_completion);
> +	else if (qp->registered_event_handler)
> +		qp->registered_event_handler(event, qp->qp_context);

There is no reason whatsoever to withhold the LAST_WQE_REACHED event
from the ULP.
The ULP may be interested in consuming this event.

This should become:

+static void __ib_qp_event_handler(struct ib_event *event, void *context)
+{
+	struct ib_qp *qp = event->element.qp;
+
+	if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
+		complete(&qp->srq_completion);
+	if (qp->registered_event_handler)
+		qp->registered_event_handler(event, qp->qp_context);




> +}
> +
>   static void __ib_shared_qp_event_handler(struct ib_event *event, void *context)
>   {
>   	struct ib_qp *qp = context;
> @@ -1221,7 +1231,10 @@ static struct ib_qp *create_qp(struct ib_device *dev, struct ib_pd *pd,
>   	qp->qp_type = attr->qp_type;
>   	qp->rwq_ind_tbl = attr->rwq_ind_tbl;
>   	qp->srq = attr->srq;
> -	qp->event_handler = attr->event_handler;
> +	if (qp->srq)
> +		init_completion(&qp->srq_completion);

I think that if you unconditionally complete, you should also 
unconditionally initialize.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] IB/core: add support for draining Shared receive queues
  2024-06-18 16:07   ` Bart Van Assche
@ 2024-06-19  9:14     ` Sagi Grimberg
  2024-06-19 11:12       ` Max Gurtovoy
  0 siblings, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2024-06-19  9:14 UTC (permalink / raw)
  To: Bart Van Assche, Max Gurtovoy, leonro, jgg, linux-nvme,
	linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, shiraz.saleem, edumazet



On 18/06/2024 19:07, Bart Van Assche wrote:
> On 6/17/24 5:10 PM, Max Gurtovoy wrote:
>> +    if (wait_for_completion_timeout(&qp->srq_completion, 10 * HZ) > 
>> 0) {

I think this warrants a comment as to why you stop after consuming
cq->cqe completions (i.e. shared completions).

>> +        while (polled != cq->cqe) {
>> +            n = ib_process_cq_direct(cq, cq->cqe - polled);
>> +            if (!n)
>> +                return;
>> +            polled += n;
>> +        }
>> +    }
>
> Why a hardcoded timeout (10 * HZ) instead of waiting forever?

Agreed. Is there a scenario where the IB event is missed or something?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] IB/isert: remove the handling of last WQE reached event
  2024-06-18  0:10 ` [PATCH 2/6] IB/isert: remove the handling of last WQE reached event Max Gurtovoy
@ 2024-06-19  9:16   ` Sagi Grimberg
  2024-06-19 15:25     ` Max Gurtovoy
  0 siblings, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2024-06-19  9:16 UTC (permalink / raw)
  To: Max Gurtovoy, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet



On 18/06/2024 3:10, Max Gurtovoy wrote:
> This event is handled by the RDMA core layer.
>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>   drivers/infiniband/ulp/isert/ib_isert.c | 3 ---
>   1 file changed, 3 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index 00a7303c8cc6..42977a5326ee 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -91,9 +91,6 @@ isert_qp_event_callback(struct ib_event *e, void *context)
>   	case IB_EVENT_COMM_EST:
>   		rdma_notify(isert_conn->cm_id, IB_EVENT_COMM_EST);
>   		break;
> -	case IB_EVENT_QP_LAST_WQE_REACHED:
> -		isert_warn("Reached TX IB_EVENT_QP_LAST_WQE_REACHED\n");
> -		break;

I don't think you need to touch the ULPs; if they want to log these
events, so be it.
Although, a warn is not appropriate at all here.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] RDMA/IPoIB: remove the handling of last WQE reached event
  2024-06-18  0:10 ` [PATCH 6/6] RDMA/IPoIB: " Max Gurtovoy
@ 2024-06-19  9:18   ` Sagi Grimberg
  2024-06-19  9:25     ` Leon Romanovsky
  0 siblings, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2024-06-19  9:18 UTC (permalink / raw)
  To: Max Gurtovoy, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet



On 18/06/2024 3:10, Max Gurtovoy wrote:
> This event is handled by the RDMA core layer.

I'm assuming that this patch is well tested?
Looks like a fairly involved change.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] RDMA/IPoIB: remove the handling of last WQE reached event
  2024-06-19  9:18   ` Sagi Grimberg
@ 2024-06-19  9:25     ` Leon Romanovsky
  0 siblings, 0 replies; 19+ messages in thread
From: Leon Romanovsky @ 2024-06-19  9:25 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, jgg, linux-nvme, linux-rdma, chuck.lever, oren,
	israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem, edumazet

On Wed, Jun 19, 2024 at 12:18:27PM +0300, Sagi Grimberg wrote:
> 
> 
> On 18/06/2024 3:10, Max Gurtovoy wrote:
> > This event is handled by the RDMA core layer.
> 
> I'm assuming that this patch is well tested?

Not yet.

Thanks

> Looks like a fairly involved change.
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] IB/core: add support for draining Shared receive queues
  2024-06-19  9:14     ` Sagi Grimberg
@ 2024-06-19 11:12       ` Max Gurtovoy
  0 siblings, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-19 11:12 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche, leonro, jgg, linux-nvme,
	linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, shiraz.saleem, edumazet


On 19/06/2024 12:14, Sagi Grimberg wrote:
>
>
> On 18/06/2024 19:07, Bart Van Assche wrote:
>> On 6/17/24 5:10 PM, Max Gurtovoy wrote:
>>> +    if (wait_for_completion_timeout(&qp->srq_completion, 10 * HZ) > 
>>> 0) {
>
> I think this warrants a comment to why you stop after consuming 
> cq->cqe completions
> (i.e. shared completions).
>
There is a full explanation in the function documentation.


>>> +        while (polled != cq->cqe) {
>>> +            n = ib_process_cq_direct(cq, cq->cqe - polled);
>>> +            if (!n)
>>> +                return;
>>> +            polled += n;
>>> +        }
>>> +    }
>>
>> Why a hardcoded timeout (10 * HZ) instead of waiting forever?
>
> Agreed. Is there a scenario where the IB event is missed or something?

I can change it to (60 * HZ) instead.

I prefer not waiting forever and getting a stuck kernel if the
underlying device is defective.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] IB/core: add support for draining Shared receive queues
  2024-06-19  9:09   ` Sagi Grimberg
@ 2024-06-19 11:16     ` Max Gurtovoy
  0 siblings, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-19 11:16 UTC (permalink / raw)
  To: Sagi Grimberg, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet


On 19/06/2024 12:09, Sagi Grimberg wrote:
>
>
> On 18/06/2024 3:10, Max Gurtovoy wrote:
>> To avoid leakage for QPs associated with an SRQ, according to the IB spec
>> (section 10.3.1):
>>
>> "Note, for QPs that are associated with an SRQ, the Consumer should take
>> the QP through the Error State before invoking a Destroy QP or a Modify
>> QP to the Reset State. The Consumer may invoke the Destroy QP without
>> first performing a Modify QP to the Error State and waiting for the 
>> Affiliated
>> Asynchronous Last WQE Reached Event. However, if the Consumer
>> does not wait for the Affiliated Asynchronous Last WQE Reached Event,
>> then WQE and Data Segment leakage may occur. Therefore, it is good
>> programming practice to teardown a QP that is associated with an SRQ
>> by using the following process:
>>   - Put the QP in the Error State;
>>   - wait for the Affiliated Asynchronous Last WQE Reached Event;
>>   - either:
>>     - drain the CQ by invoking the Poll CQ verb and either wait for CQ
>>       to be empty or the number of Poll CQ operations has exceeded
>>       CQ capacity size; or
>>     - post another WR that completes on the same CQ and wait for this
>>       WR to return as a WC;
>>   - and then invoke a Destroy QP or Reset QP."
>>
>> Catch the Last WQE Reached Event in the core layer without involving the
>> ULP drivers with extra logic during drain and destroy QP flows.
>>
>> The "Last WQE Reached" event will only be send on the errant QP, for
>> signaling that the SRQ flushed all the WQEs that have been dequeued from
>> the SRQ for processing by the associated QP.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   drivers/infiniband/core/verbs.c | 83 ++++++++++++++++++++++++++++++++-
>>   include/rdma/ib_verbs.h         |  2 +
>>   2 files changed, 84 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/verbs.c 
>> b/drivers/infiniband/core/verbs.c
>> index 94a7f3b0c71c..9e4df7d40e0c 100644
>> --- a/drivers/infiniband/core/verbs.c
>> +++ b/drivers/infiniband/core/verbs.c
>> @@ -1101,6 +1101,16 @@ EXPORT_SYMBOL(ib_destroy_srq_user);
>>     /* Queue pairs */
>>   +static void __ib_qp_event_handler(struct ib_event *event, void 
>> *context)
>> +{
>> +    struct ib_qp *qp = event->element.qp;
>> +
>> +    if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
>> +        complete(&qp->srq_completion);
>> +    else if (qp->registered_event_handler)
>> +        qp->registered_event_handler(event, qp->qp_context);
>
> There is no reason what-so-ever to withhold the LAST_WQE_REACHED from 
> the ulp.
> The ULP may be interested in consuming this event.
>
> This should become:
>
> +static void __ib_qp_event_handler(struct ib_event *event, void *context)
> +{
> +    struct ib_qp *qp = event->element.qp;
> +
> +    if (event->event == IB_EVENT_QP_LAST_WQE_REACHED)
> +        complete(&qp->srq_completion);
> +    if (qp->registered_event_handler)
> +        qp->registered_event_handler(event, qp->qp_context);
>
>
Good idea.

Thanks.

>
>
>> +}
>> +
>>   static void __ib_shared_qp_event_handler(struct ib_event *event, 
>> void *context)
>>   {
>>       struct ib_qp *qp = context;
>> @@ -1221,7 +1231,10 @@ static struct ib_qp *create_qp(struct 
>> ib_device *dev, struct ib_pd *pd,
>>       qp->qp_type = attr->qp_type;
>>       qp->rwq_ind_tbl = attr->rwq_ind_tbl;
>>       qp->srq = attr->srq;
>> -    qp->event_handler = attr->event_handler;
>> +    if (qp->srq)
>> +        init_completion(&qp->srq_completion);
>
> I think that if you unconditionally complete, you should also 
> unconditionally initialize.

Non "SRQ" QP will not get "Last WQE reached" event. Unless device is 
defected.

Anyway, I'm ok with your suggestion. There is no hard in initializing it 
for Non "SRQ" QPs as well.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] IB/isert: remove the handling of last WQE reached event
  2024-06-19  9:16   ` Sagi Grimberg
@ 2024-06-19 15:25     ` Max Gurtovoy
  0 siblings, 0 replies; 19+ messages in thread
From: Max Gurtovoy @ 2024-06-19 15:25 UTC (permalink / raw)
  To: Sagi Grimberg, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet


On 19/06/2024 12:16, Sagi Grimberg wrote:
>
>
> On 18/06/2024 3:10, Max Gurtovoy wrote:
>> This event is handled by the RDMA core layer.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   drivers/infiniband/ulp/isert/ib_isert.c | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
>> b/drivers/infiniband/ulp/isert/ib_isert.c
>> index 00a7303c8cc6..42977a5326ee 100644
>> --- a/drivers/infiniband/ulp/isert/ib_isert.c
>> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
>> @@ -91,9 +91,6 @@ isert_qp_event_callback(struct ib_event *e, void 
>> *context)
>>       case IB_EVENT_COMM_EST:
>>           rdma_notify(isert_conn->cm_id, IB_EVENT_COMM_EST);
>>           break;
>> -    case IB_EVENT_QP_LAST_WQE_REACHED:
>> -        isert_warn("Reached TX IB_EVENT_QP_LAST_WQE_REACHED\n");
>> -        break;
>
> I don't think you need to touch the ULPs; if they want to log these
> events, so be it.
> Although, a warn is not appropriate at all here.

Ok. I'll remove the ULP patches, except for the iSER target one, since
the iSER target does not even implement SRQ.

Its Last WQE Reached handling is dead code.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/6] Last WQE Reached event treatment
  2024-06-18  0:10 [PATCH v1 0/6] Last WQE Reached event treatment Max Gurtovoy
                   ` (5 preceding siblings ...)
  2024-06-18  0:10 ` [PATCH 6/6] RDMA/IPoIB: " Max Gurtovoy
@ 2024-06-23 13:03 ` Zhu Yanjun
  6 siblings, 0 replies; 19+ messages in thread
From: Zhu Yanjun @ 2024-06-23 13:03 UTC (permalink / raw)
  To: Max Gurtovoy, leonro, jgg, linux-nvme, linux-rdma, chuck.lever
  Cc: oren, israelr, maorg, yishaih, hch, bvanassche, shiraz.saleem,
	edumazet

On 2024/6/18 8:10, Max Gurtovoy wrote:
> Hi Jason/Leon/Sagi,
> 
> This series adds support for draining a QP that is associated with an
> SRQ (Shared Receive Queue). WQE and data segment leakage can occur if
> the Last WQE Reached event is not handled.
> 
> This series, which is based on an old series I sent during 2018, uses a

The old series is the one below; it would be better to post the link.

https://www.spinics.net/lists/linux-rdma/msg59633.html

Zhu Yanjun

> different approach and handles the event in the RDMA core, as was
> suggested in the mailing list discussion.
> 
> I've updated the RDMA ULPs. Most of the changes were trivial, except for
> IPoIB, which handled the Last WQE Reached event in the ULP itself.
> 
> I've tested this series with NVMf/RDMA on RoCE.
> 
> Max Gurtovoy (6):
>    IB/core: add support for draining Shared receive queues
>    IB/isert: remove the handling of last WQE reached event
>    RDMA/srpt: remove the handling of last WQE reached event
>    nvmet-rdma: remove the handling of last WQE reached event
>    svcrdma: remove the handling of last WQE reached event
>    RDMA/IPoIB: remove the handling of last WQE reached event
> 
>   drivers/infiniband/core/verbs.c          | 83 +++++++++++++++++++++++-
>   drivers/infiniband/ulp/ipoib/ipoib.h     | 33 +---------
>   drivers/infiniband/ulp/ipoib/ipoib_cm.c  | 71 ++------------------
>   drivers/infiniband/ulp/isert/ib_isert.c  |  3 -
>   drivers/infiniband/ulp/srpt/ib_srpt.c    |  5 --
>   drivers/nvme/target/rdma.c               |  4 --
>   include/rdma/ib_verbs.h                  |  2 +
>   net/sunrpc/xprtrdma/svc_rdma_transport.c |  1 -
>   8 files changed, 92 insertions(+), 110 deletions(-)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread
