* [PATCH rdma-next 0/3] Batch of IBCM improvements
@ 2024-11-13 11:12 Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 1/3] IB/cm: Explicitly mark if a response MAD is a retransmission Leon Romanovsky
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-11-13 11:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: linux-kernel, linux-rdma, Or Har-Toov, Sean Hefty,
Vlad Dumitrescu
This patchset from Sean fixes old standing issues with IB/cm.
Thanks
Sean Hefty (3):
IB/cm: Explicitly mark if a response MAD is a retransmission
IB/cm: Do not hold reference on cm_id unless needed
IB/cm: Rework sending DREQ when destroying a cm_id
drivers/infiniband/core/cm.c | 170 ++++++++++++++++++-----------------
1 file changed, 87 insertions(+), 83 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH rdma-next 1/3] IB/cm: Explicitly mark if a response MAD is a retransmission
2024-11-13 11:12 [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
@ 2024-11-13 11:12 ` Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 2/3] IB/cm: Do not hold reference on cm_id unless needed Leon Romanovsky
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-11-13 11:12 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma, Or Har-Toov, Vlad Dumitrescu
From: Sean Hefty <shefty@nvidia.com>
In several situations the CM may send a reply to a received MAD
without the reply being directly linked with a cm_id. For
example, it may send a REJ in response to a REQ which does not
match a listener. Or, it may send a DREP in response to a DREQ
if the cm_id has already been destroyed. This can happen if the
original DREP was lost and the DREQ was retried.
When such a response MAD completes, it updates a counter tracking
how many MADs were retried. However, not all response MADs issued
directly by the CM may be retries. The REJ mentioned in the example
above is such a case. To distinguish between responses which were
retries versus those that are not, the send_handler performs the
following check: is a retry if the response is not associated with
a cm_id and the response is not a REJ message.
Replace this indirect method of checking if a response is a retry
with an explicit check. Note that these retries are generated
directly by the CM, rather than retried by the MAD layer.
This change will be needed by later changes which would otherwise
break the indirect check.
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cm.c | 51 ++++++++++++++++++++++--------------
1 file changed, 31 insertions(+), 20 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 07fb8d3c037f..99246e49dd3a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -35,6 +35,8 @@ MODULE_DESCRIPTION("InfiniBand CM");
MODULE_LICENSE("Dual BSD/GPL");
#define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
+#define CM_DIRECT_RETRY_CTX ((void *) 1UL)
+
static const char * const ibcm_rej_reason_strs[] = {
[IB_CM_REJ_NO_QP] = "no QP",
[IB_CM_REJ_NO_EEC] = "no EEC",
@@ -358,13 +360,20 @@ static void cm_free_priv_msg(struct ib_mad_send_buf *msg)
ib_free_send_mad(msg);
}
-static struct ib_mad_send_buf *cm_alloc_response_msg_no_ah(struct cm_port *port,
- struct ib_mad_recv_wc *mad_recv_wc)
+static struct ib_mad_send_buf *
+cm_alloc_response_msg_no_ah(struct cm_port *port,
+ struct ib_mad_recv_wc *mad_recv_wc,
+ bool direct_retry)
{
- return ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
- 0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
- GFP_ATOMIC,
- IB_MGMT_BASE_VERSION);
+ struct ib_mad_send_buf *m;
+
+ m = ib_create_send_mad(port->mad_agent, 1, mad_recv_wc->wc->pkey_index,
+ 0, IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
+ GFP_ATOMIC, IB_MGMT_BASE_VERSION);
+ if (!IS_ERR(m))
+ m->context[0] = direct_retry ? CM_DIRECT_RETRY_CTX : NULL;
+
+ return m;
}
static int cm_create_response_msg_ah(struct cm_port *port,
@@ -384,12 +393,13 @@ static int cm_create_response_msg_ah(struct cm_port *port,
static int cm_alloc_response_msg(struct cm_port *port,
struct ib_mad_recv_wc *mad_recv_wc,
+ bool direct_retry,
struct ib_mad_send_buf **msg)
{
struct ib_mad_send_buf *m;
int ret;
- m = cm_alloc_response_msg_no_ah(port, mad_recv_wc);
+ m = cm_alloc_response_msg_no_ah(port, mad_recv_wc, direct_retry);
if (IS_ERR(m))
return PTR_ERR(m);
@@ -1598,7 +1608,7 @@ static int cm_issue_rej(struct cm_port *port,
struct cm_rej_msg *rej_msg, *rcv_msg;
int ret;
- ret = cm_alloc_response_msg(port, mad_recv_wc, &msg);
+ ret = cm_alloc_response_msg(port, mad_recv_wc, false, &msg);
if (ret)
return ret;
@@ -1951,7 +1961,7 @@ static void cm_dup_req_handler(struct cm_work *work,
}
spin_unlock_irq(&cm_id_priv->lock);
- ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg);
+ ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, true, &msg);
if (ret)
return;
@@ -2444,7 +2454,7 @@ static void cm_dup_rep_handler(struct cm_work *work)
atomic_long_inc(
&work->port->counters[CM_RECV_DUPLICATES][CM_REP_COUNTER]);
- ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg);
+ ret = cm_alloc_response_msg(work->port, work->mad_recv_wc, true, &msg);
if (ret)
goto deref;
@@ -2791,7 +2801,7 @@ static int cm_issue_drep(struct cm_port *port,
struct cm_drep_msg *drep_msg;
int ret;
- ret = cm_alloc_response_msg(port, mad_recv_wc, &msg);
+ ret = cm_alloc_response_msg(port, mad_recv_wc, true, &msg);
if (ret)
return ret;
@@ -2856,7 +2866,8 @@ static int cm_dreq_handler(struct cm_work *work)
case IB_CM_TIMEWAIT:
atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
[CM_DREQ_COUNTER]);
- msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
+ msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc,
+ true);
if (IS_ERR(msg))
goto unlock;
@@ -3361,7 +3372,8 @@ static int cm_lap_handler(struct cm_work *work)
case IB_CM_MRA_LAP_SENT:
atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
[CM_LAP_COUNTER]);
- msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc);
+ msg = cm_alloc_response_msg_no_ah(work->port, work->mad_recv_wc,
+ true);
if (IS_ERR(msg))
goto unlock;
@@ -3826,7 +3838,7 @@ static void cm_send_handler(struct ib_mad_agent *mad_agent,
struct ib_mad_send_wc *mad_send_wc)
{
struct ib_mad_send_buf *msg = mad_send_wc->send_buf;
- struct cm_id_private *cm_id_priv = msg->context[0];
+ struct cm_id_private *cm_id_priv;
enum ib_cm_state state =
(enum ib_cm_state)(unsigned long)msg->context[1];
struct cm_port *port;
@@ -3836,13 +3848,12 @@ static void cm_send_handler(struct ib_mad_agent *mad_agent,
attr_index = be16_to_cpu(((struct ib_mad_hdr *)
msg->mad)->attr_id) - CM_ATTR_ID_OFFSET;
- /*
- * If the send was in response to a received message (context[0] is not
- * set to a cm_id), and is not a REJ, then it is a send that was
- * manually retried.
- */
- if (!cm_id_priv && (attr_index != CM_REJ_COUNTER))
+ if (msg->context[0] == CM_DIRECT_RETRY_CTX) {
msg->retries = 1;
+ cm_id_priv = NULL;
+ } else {
+ cm_id_priv = msg->context[0];
+ }
atomic_long_add(1 + msg->retries, &port->counters[CM_XMIT][attr_index]);
if (msg->retries)
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH rdma-next 2/3] IB/cm: Do not hold reference on cm_id unless needed
2024-11-13 11:12 [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 1/3] IB/cm: Explicitly mark if a response MAD is a retransmission Leon Romanovsky
@ 2024-11-13 11:12 ` Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 3/3] IB/cm: Rework sending DREQ when destroying a cm_id Leon Romanovsky
2024-11-17 9:52 ` [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-11-13 11:12 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma, Or Har-Toov, Vlad Dumitrescu
From: Sean Hefty <shefty@nvidia.com>
Typically, when the CM sends a MAD it bumps a reference count
on the associated cm_id. There are some exceptions, such
as when the MAD is a direct response to a receive MAD. For
example, the CM may generate an MRA in response to a duplicate
REQ. But, in general, if a MAD may be sent as a result of
the user invoking an API call (e.g. ib_send_cm_rep(),
ib_send_cm_rtu(), etc.), a reference is taken on the cm_id.
This reference is necessary if the MAD requires a response.
The reference allows routing a response MAD back to the
cm_id, or, if no response is received, allows updating the
cm_id state to reflect the failure.
For MADs which do not generate a response from the
target, however, there's no need to hold a reference on the cm_id.
Such MADs will not be retried by the MAD layer and their
completions do not change the state of the cm_id.
There are 2 internal calls used to allocate MADs which take
a reference on the cm_id: cm_alloc_msg() and cm_alloc_priv_msg().
The latter calls the former. It turns out that all other places
where cm_alloc_msg() is called are for MADs that do not generate
a response from the target: sending an RTU, DREP, REJ, MRA, or
SIDR REP. In all of these cases, there's no need to hold a
reference on the cm_id.
The benefit of dropping unneeded references is that it allows
destruction of the cm_id to proceed immediately. Currently,
the cm_destroy_id() call blocks as long as there's a reference
held on the cm_id. Worse, is that cm_destroy_id() will send
MADs, which it then needs to complete. Sending the MADs is
beneficial, as they notify the peer that a connection is
being destroyed. However, since the MADs hold a reference
on the cm_id, they block destruction and cannot be retried.
Move cm_id referencing from cm_alloc_msg() to cm_alloc_priv_msg().
The latter should hold a reference on the cm_id in all cases but
one, which will be handled in a separate patch. cm_alloc_priv_msg()
is used when sending a REQ, REP, DREQ, and SIDR REQ, all of which
require a response.
Also, merge common code into cm_alloc_priv_msg() and combine the
freeing of all messages which do not need a response.
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cm.c | 66 +++++++++++++-----------------------
1 file changed, 24 insertions(+), 42 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 99246e49dd3a..2517bfebcfd5 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -309,12 +309,7 @@ static struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv)
goto out;
}
- /* Timeout set by caller if response is expected. */
m->ah = ah;
- m->retries = cm_id_priv->max_cm_retries;
-
- refcount_inc(&cm_id_priv->refcount);
- m->context[0] = cm_id_priv;
out:
spin_unlock(&cm_id_priv->av.port->cm_dev->mad_agent_lock);
@@ -323,16 +318,13 @@ static struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv)
static void cm_free_msg(struct ib_mad_send_buf *msg)
{
- struct cm_id_private *cm_id_priv = msg->context[0];
-
if (msg->ah)
rdma_destroy_ah(msg->ah, 0);
- cm_deref_id(cm_id_priv);
ib_free_send_mad(msg);
}
static struct ib_mad_send_buf *
-cm_alloc_priv_msg(struct cm_id_private *cm_id_priv)
+cm_alloc_priv_msg(struct cm_id_private *cm_id_priv, enum ib_cm_state state)
{
struct ib_mad_send_buf *msg;
@@ -341,7 +333,15 @@ cm_alloc_priv_msg(struct cm_id_private *cm_id_priv)
msg = cm_alloc_msg(cm_id_priv);
if (IS_ERR(msg))
return msg;
+
cm_id_priv->msg = msg;
+ refcount_inc(&cm_id_priv->refcount);
+ msg->context[0] = cm_id_priv;
+ msg->context[1] = (void *) (unsigned long) state;
+
+ msg->retries = cm_id_priv->max_cm_retries;
+ msg->timeout_ms = cm_id_priv->timeout_ms;
+
return msg;
}
@@ -413,13 +413,6 @@ static int cm_alloc_response_msg(struct cm_port *port,
return 0;
}
-static void cm_free_response_msg(struct ib_mad_send_buf *msg)
-{
- if (msg->ah)
- rdma_destroy_ah(msg->ah, 0);
- ib_free_send_mad(msg);
-}
-
static void *cm_copy_private_data(const void *private_data, u8 private_data_len)
{
void *data;
@@ -1567,7 +1560,7 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
if (param->alternate_path)
cm_move_av_from_path(&cm_id_priv->alt_av, &alt_av);
- msg = cm_alloc_priv_msg(cm_id_priv);
+ msg = cm_alloc_priv_msg(cm_id_priv, IB_CM_REQ_SENT);
if (IS_ERR(msg)) {
ret = PTR_ERR(msg);
goto out_unlock;
@@ -1576,8 +1569,6 @@ int ib_send_cm_req(struct ib_cm_id *cm_id,
req_msg = (struct cm_req_msg *)msg->mad;
cm_format_req(req_msg, cm_id_priv, param);
cm_id_priv->tid = req_msg->hdr.tid;
- msg->timeout_ms = cm_id_priv->timeout_ms;
- msg->context[1] = (void *)(unsigned long)IB_CM_REQ_SENT;
cm_id_priv->local_qpn = cpu_to_be32(IBA_GET(CM_REQ_LOCAL_QPN, req_msg));
cm_id_priv->rq_psn = cpu_to_be32(IBA_GET(CM_REQ_STARTING_PSN, req_msg));
@@ -1634,7 +1625,7 @@ static int cm_issue_rej(struct cm_port *port,
IBA_GET(CM_REJ_REMOTE_COMM_ID, rcv_msg));
ret = ib_post_send_mad(msg, NULL);
if (ret)
- cm_free_response_msg(msg);
+ cm_free_msg(msg);
return ret;
}
@@ -1990,7 +1981,7 @@ static void cm_dup_req_handler(struct cm_work *work,
return;
unlock: spin_unlock_irq(&cm_id_priv->lock);
-free: cm_free_response_msg(msg);
+free: cm_free_msg(msg);
}
static struct cm_id_private *cm_match_req(struct cm_work *work,
@@ -2304,7 +2295,7 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
goto out;
}
- msg = cm_alloc_priv_msg(cm_id_priv);
+ msg = cm_alloc_priv_msg(cm_id_priv, IB_CM_REP_SENT);
if (IS_ERR(msg)) {
ret = PTR_ERR(msg);
goto out;
@@ -2312,8 +2303,6 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
rep_msg = (struct cm_rep_msg *) msg->mad;
cm_format_rep(rep_msg, cm_id_priv, param);
- msg->timeout_ms = cm_id_priv->timeout_ms;
- msg->context[1] = (void *) (unsigned long) IB_CM_REP_SENT;
trace_icm_send_rep(cm_id);
ret = ib_post_send_mad(msg, NULL);
@@ -2479,7 +2468,7 @@ static void cm_dup_rep_handler(struct cm_work *work)
goto deref;
unlock: spin_unlock_irq(&cm_id_priv->lock);
-free: cm_free_response_msg(msg);
+free: cm_free_msg(msg);
deref: cm_deref_id(cm_id_priv);
}
@@ -2683,7 +2672,7 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
cm_id_priv->id.lap_state == IB_CM_MRA_LAP_RCVD)
ib_cancel_mad(cm_id_priv->msg);
- msg = cm_alloc_priv_msg(cm_id_priv);
+ msg = cm_alloc_priv_msg(cm_id_priv, IB_CM_DREQ_SENT);
if (IS_ERR(msg)) {
cm_enter_timewait(cm_id_priv);
return PTR_ERR(msg);
@@ -2691,8 +2680,6 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
cm_format_dreq((struct cm_dreq_msg *) msg->mad, cm_id_priv,
private_data, private_data_len);
- msg->timeout_ms = cm_id_priv->timeout_ms;
- msg->context[1] = (void *) (unsigned long) IB_CM_DREQ_SENT;
trace_icm_send_dreq(&cm_id_priv->id);
ret = ib_post_send_mad(msg, NULL);
@@ -2819,7 +2806,7 @@ static int cm_issue_drep(struct cm_port *port,
IBA_GET(CM_DREQ_REMOTE_COMM_ID, dreq_msg));
ret = ib_post_send_mad(msg, NULL);
if (ret)
- cm_free_response_msg(msg);
+ cm_free_msg(msg);
return ret;
}
@@ -2878,7 +2865,7 @@ static int cm_dreq_handler(struct cm_work *work)
if (cm_create_response_msg_ah(work->port, work->mad_recv_wc, msg) ||
ib_post_send_mad(msg, NULL))
- cm_free_response_msg(msg);
+ cm_free_msg(msg);
goto deref;
case IB_CM_DREQ_RCVD:
atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
@@ -3386,7 +3373,7 @@ static int cm_lap_handler(struct cm_work *work)
if (cm_create_response_msg_ah(work->port, work->mad_recv_wc, msg) ||
ib_post_send_mad(msg, NULL))
- cm_free_response_msg(msg);
+ cm_free_msg(msg);
goto deref;
case IB_CM_LAP_RCVD:
atomic_long_inc(&work->port->counters[CM_RECV_DUPLICATES]
@@ -3525,7 +3512,7 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id,
goto out_unlock;
}
- msg = cm_alloc_priv_msg(cm_id_priv);
+ msg = cm_alloc_priv_msg(cm_id_priv, IB_CM_SIDR_REQ_SENT);
if (IS_ERR(msg)) {
ret = PTR_ERR(msg);
goto out_unlock;
@@ -3533,8 +3520,6 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id,
cm_format_sidr_req((struct cm_sidr_req_msg *)msg->mad, cm_id_priv,
param);
- msg->timeout_ms = cm_id_priv->timeout_ms;
- msg->context[1] = (void *)(unsigned long)IB_CM_SIDR_REQ_SENT;
trace_icm_send_sidr_req(&cm_id_priv->id);
ret = ib_post_send_mad(msg, NULL);
@@ -3780,17 +3765,17 @@ static int cm_sidr_rep_handler(struct cm_work *work)
static void cm_process_send_error(struct cm_id_private *cm_id_priv,
struct ib_mad_send_buf *msg,
- enum ib_cm_state state,
enum ib_wc_status wc_status)
{
+ enum ib_cm_state state = (unsigned long) msg->context[1];
struct ib_cm_event cm_event = {};
int ret;
- /* Discard old sends or ones without a response. */
+ /* Discard old sends. */
spin_lock_irq(&cm_id_priv->lock);
if (msg != cm_id_priv->msg) {
spin_unlock_irq(&cm_id_priv->lock);
- cm_free_msg(msg);
+ cm_free_priv_msg(msg);
return;
}
cm_free_priv_msg(msg);
@@ -3839,8 +3824,6 @@ static void cm_send_handler(struct ib_mad_agent *mad_agent,
{
struct ib_mad_send_buf *msg = mad_send_wc->send_buf;
struct cm_id_private *cm_id_priv;
- enum ib_cm_state state =
- (enum ib_cm_state)(unsigned long)msg->context[1];
struct cm_port *port;
u16 attr_index;
@@ -3861,10 +3844,9 @@ static void cm_send_handler(struct ib_mad_agent *mad_agent,
&port->counters[CM_XMIT_RETRIES][attr_index]);
if (cm_id_priv)
- cm_process_send_error(cm_id_priv, msg, state,
- mad_send_wc->status);
+ cm_process_send_error(cm_id_priv, msg, mad_send_wc->status);
else
- cm_free_response_msg(msg);
+ cm_free_msg(msg);
}
static void cm_work_handler(struct work_struct *_work)
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH rdma-next 3/3] IB/cm: Rework sending DREQ when destroying a cm_id
2024-11-13 11:12 [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 1/3] IB/cm: Explicitly mark if a response MAD is a retransmission Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 2/3] IB/cm: Do not hold reference on cm_id unless needed Leon Romanovsky
@ 2024-11-13 11:12 ` Leon Romanovsky
2024-11-17 9:52 ` [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-11-13 11:12 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma, Or Har-Toov, Vlad Dumitrescu
From: Sean Hefty <shefty@nvidia.com>
A DREQ is sent in 2 situations:
1. When requested by the user.
This DREQ has to wait for a DREP, which will be routed to the user.
2. When the cm_id is destroyed.
This DREQ is generated by the CM to notify the peer that the
connection has been destroyed.
In the latter case, any DREP that is received will be discarded.
There's no need to hold a reference on the cm_id. Today, both
situations are covered by the same function: cm_send_dreq_locked().
When invoked in the cm_id destroy path, the cm_id reference would be
held until the DREQ completes, blocking the destruction. Because it
could take several seconds to minutes before the DREQ receives a DREP,
the destroy call posts a send for the DREQ then immediately cancels the
MAD. However, cancellation is not immediate in the MAD layer. There
could still be a delay before the MAD layer returns the DREQ to the CM.
Moreover, the only guarantee is that the DREQ will be sent at most once.
Introduce a separate flow for sending a DREQ when destroying the cm_id.
The new flow will not hold a reference on the cm_id, allowing it to be
cleaned up immediately. The cancellation trick is no longer needed.
The MAD layer will send the DREQ exactly once.
Signed-off-by: Sean Hefty <shefty@nvidia.com>
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cm.c | 53 ++++++++++++++++++++++--------------
1 file changed, 32 insertions(+), 21 deletions(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 2517bfebcfd5..142170473e75 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -95,8 +95,7 @@ static void cm_process_work(struct cm_id_private *cm_id_priv,
struct cm_work *work);
static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
struct ib_cm_sidr_rep_param *param);
-static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
- const void *private_data, u8 private_data_len);
+static void cm_issue_dreq(struct cm_id_private *cm_id_priv);
static int cm_send_drep_locked(struct cm_id_private *cm_id_priv,
void *private_data, u8 private_data_len);
static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
@@ -1112,7 +1111,8 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
cm_id->state = IB_CM_IDLE;
break;
}
- cm_send_dreq_locked(cm_id_priv, NULL, 0);
+ cm_issue_dreq(cm_id_priv);
+ cm_enter_timewait(cm_id_priv);
goto retest;
case IB_CM_DREQ_SENT:
ib_cancel_mad(cm_id_priv->msg);
@@ -2652,20 +2652,42 @@ static void cm_format_dreq(struct cm_dreq_msg *dreq_msg,
private_data_len);
}
-static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
- const void *private_data, u8 private_data_len)
+static void cm_issue_dreq(struct cm_id_private *cm_id_priv)
{
struct ib_mad_send_buf *msg;
int ret;
lockdep_assert_held(&cm_id_priv->lock);
+ msg = cm_alloc_msg(cm_id_priv);
+ if (IS_ERR(msg))
+ return;
+
+ cm_format_dreq((struct cm_dreq_msg *) msg->mad, cm_id_priv, NULL, 0);
+
+ trace_icm_send_dreq(&cm_id_priv->id);
+ ret = ib_post_send_mad(msg, NULL);
+ if (ret)
+ cm_free_msg(msg);
+}
+
+int ib_send_cm_dreq(struct ib_cm_id *cm_id, const void *private_data,
+ u8 private_data_len)
+{
+ struct cm_id_private *cm_id_priv =
+ container_of(cm_id, struct cm_id_private, id);
+ struct ib_mad_send_buf *msg;
+ unsigned long flags;
+ int ret;
+
if (private_data && private_data_len > IB_CM_DREQ_PRIVATE_DATA_SIZE)
return -EINVAL;
+ spin_lock_irqsave(&cm_id_priv->lock, flags);
if (cm_id_priv->id.state != IB_CM_ESTABLISHED) {
trace_icm_dreq_skipped(&cm_id_priv->id);
- return -EINVAL;
+ ret = -EINVAL;
+ goto unlock;
}
if (cm_id_priv->id.lap_state == IB_CM_LAP_SENT ||
@@ -2675,7 +2697,8 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
msg = cm_alloc_priv_msg(cm_id_priv, IB_CM_DREQ_SENT);
if (IS_ERR(msg)) {
cm_enter_timewait(cm_id_priv);
- return PTR_ERR(msg);
+ ret = PTR_ERR(msg);
+ goto unlock;
}
cm_format_dreq((struct cm_dreq_msg *) msg->mad, cm_id_priv,
@@ -2686,23 +2709,11 @@ static int cm_send_dreq_locked(struct cm_id_private *cm_id_priv,
if (ret) {
cm_enter_timewait(cm_id_priv);
cm_free_priv_msg(msg);
- return ret;
+ goto unlock;
}
cm_id_priv->id.state = IB_CM_DREQ_SENT;
- return 0;
-}
-
-int ib_send_cm_dreq(struct ib_cm_id *cm_id, const void *private_data,
- u8 private_data_len)
-{
- struct cm_id_private *cm_id_priv =
- container_of(cm_id, struct cm_id_private, id);
- unsigned long flags;
- int ret;
-
- spin_lock_irqsave(&cm_id_priv->lock, flags);
- ret = cm_send_dreq_locked(cm_id_priv, private_data, private_data_len);
+unlock:
spin_unlock_irqrestore(&cm_id_priv->lock, flags);
return ret;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH rdma-next 0/3] Batch of IBCM improvements
2024-11-13 11:12 [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
` (2 preceding siblings ...)
2024-11-13 11:12 ` [PATCH rdma-next 3/3] IB/cm: Rework sending DREQ when destroying a cm_id Leon Romanovsky
@ 2024-11-17 9:52 ` Leon Romanovsky
3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2024-11-17 9:52 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: linux-kernel, linux-rdma, Or Har-Toov, Sean Hefty,
Vlad Dumitrescu
On Wed, 13 Nov 2024 13:12:53 +0200, Leon Romanovsky wrote:
> This patchset from Sean fixes old standing issues with IB/cm.
>
> Thanks
>
> Sean Hefty (3):
> IB/cm: Explicitly mark if a response MAD is a retransmission
> IB/cm: Do not hold reference on cm_id unless needed
> IB/cm: Rework sending DREQ when destroying a cm_id
>
> [...]
Applied, thanks!
[1/3] IB/cm: Explicitly mark if a response MAD is a retransmission
https://git.kernel.org/rdma/rdma/c/0492458750c9fb
[2/3] IB/cm: Do not hold reference on cm_id unless needed
https://git.kernel.org/rdma/rdma/c/1e5159219076dd
[3/3] IB/cm: Rework sending DREQ when destroying a cm_id
https://git.kernel.org/rdma/rdma/c/fc0856c3a32576
Best regards,
--
Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-11-17 9:52 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-13 11:12 [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 1/3] IB/cm: Explicitly mark if a response MAD is a retransmission Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 2/3] IB/cm: Do not hold reference on cm_id unless needed Leon Romanovsky
2024-11-13 11:12 ` [PATCH rdma-next 3/3] IB/cm: Rework sending DREQ when destroying a cm_id Leon Romanovsky
2024-11-17 9:52 ` [PATCH rdma-next 0/3] Batch of IBCM improvements Leon Romanovsky
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.