* [PATCH rdma-next v1 0/2] IB/mad: Add Flow Control for Solicited MADs
@ 2025-01-07 14:02 Leon Romanovsky
2025-01-07 14:02 ` [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer Leon Romanovsky
2025-01-07 14:02 ` [PATCH rdma-next v1 2/2] IB/mad: Add flow control for solicited MADs Leon Romanovsky
From: Leon Romanovsky @ 2025-01-07 14:02 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, linux-kernel, linux-rdma, Maher Sanalla,
Or Har-Toov
From: Leon Romanovsky <leonro@nvidia.com>
Changelog:
v1:
* Add a canceled state to the state machine, which allows removing the
status field from the MAD struct.
* Add a change_mad_state() function which handles all state transitions.
* Add WARN_ONs to check MAD states
* Reorganize patches to have only two patches in this series instead of three.
v0: https://lore.kernel.org/all/cover.1733233636.git.leonro@nvidia.com
--------------------------------------------------------------------------
From Or,
This patch series introduces flow control for solicited MADs in
the MAD layer, addressing the need to avoid MAD loss caused by
insufficient resources on the receiver side.
Both the client and the server act as receivers - the latter receives
requests, while the former receives responses.
To facilitate this flow control, the series also refactors the MAD code,
improving readability and enabling a more straightforward implementation.
Patch #1: Add state machine to MAD layer
Patch #2: Add flow control for solicited MADs
The primary goal of this series is to add a flow control mechanism
to the MAD layer, reducing the number of timeouts.
The accompanying refactoring simplifies state management, making
the code more maintainable and supporting the new flow control
logic effectively.
Thanks
Or Har-Toov (2):
IB/mad: Add state machine to MAD layer
IB/mad: Add flow control for solicited MADs
drivers/infiniband/core/mad.c | 415 ++++++++++++++++++++++-------
drivers/infiniband/core/mad_priv.h | 64 ++++-
drivers/infiniband/core/mad_rmpp.c | 41 +--
3 files changed, 410 insertions(+), 110 deletions(-)
--
2.47.1
* [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer
2025-01-07 14:02 [PATCH rdma-next v1 0/2] IB/mad: Add Flow Control for Solicited MADs Leon Romanovsky
@ 2025-01-07 14:02 ` Leon Romanovsky
2025-01-14 19:42 ` Jason Gunthorpe
2025-01-07 14:02 ` [PATCH rdma-next v1 2/2] IB/mad: Add flow control for solicited MADs Leon Romanovsky
From: Leon Romanovsky @ 2025-01-07 14:02 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Or Har-Toov, linux-rdma, Maher Sanalla
From: Or Har-Toov <ohartoov@nvidia.com>
Replace the use of the refcount, timeout and status fields with a single
'state' field that tracks the status of MAD send work requests (WRs).
The state machine better represents the stages in the MAD lifecycle,
specifically indicating whether the MAD is waiting for a completion,
waiting for a response, was canceled or is done.
The existing refcount takes only two values:
1 : the MAD is waiting either for a completion or for a response.
2 : the MAD is waiting for both a response and a completion. This is also
the value when a response was received before the completion notification.
The status field indicates whether the MAD was canceled at some point
in the flow.
The timeout field doubles as an indication of whether a response was
received (it is cleared once the response arrives).
The current state transitions are not clearly visible, and developers
need to infer the state from the refcount, timeout and status values,
which is error-prone and difficult to follow.
Thus, replace them with a state machine with the following states
(a standalone sketch of the allowed transitions follows the list):
- 'IB_MAD_STATE_SEND_START': MAD was sent to the QP and is waiting for
completion notification in send list
- 'IB_MAD_STATE_WAIT_RESP': MAD send completed successfully, waiting for
a response in wait list
- 'IB_MAD_STATE_EARLY_RESP': Response came early, before send
completion notification, MAD is in the send list
- 'IB_MAD_STATE_CANCELED': MAD was canceled while in send or wait list
- 'IB_MAD_STATE_DONE': MAD processing completed, MAD is in no list
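For reference, the allowed transitions (as enforced by the
EXPECT_MAD_STATE*() checks below) can be summarized in a standalone
sketch. This is an illustration only, not the in-tree code; the
abbreviated names and the MAD_NEW placeholder for the initial
(unposted) case exist only in this sketch:

/* Standalone illustration of the transition rules; compiles as plain C. */
#include <stdbool.h>	/* the kernel gets bool from <linux/types.h> */

enum mad_state_sketch {
	MAD_NEW,		/* not yet posted, in no list */
	MAD_SEND_START,
	MAD_WAIT_RESP,
	MAD_EARLY_RESP,
	MAD_CANCELED,
	MAD_DONE,
};

static bool mad_transition_allowed(enum mad_state_sketch cur,
				   enum mad_state_sketch next)
{
	switch (next) {
	case MAD_SEND_START:	/* first post, or a retry from the wait list */
		return cur == MAD_NEW || cur == MAD_WAIT_RESP;
	case MAD_WAIT_RESP:	/* send completed; a canceled MAD stays canceled */
		return cur == MAD_SEND_START || cur == MAD_WAIT_RESP ||
		       cur == MAD_CANCELED;
	case MAD_EARLY_RESP:	/* response arrived before the send completion */
		return cur == MAD_SEND_START;
	case MAD_CANCELED:	/* possible from any state except DONE */
		return cur != MAD_DONE;
	case MAD_DONE:		/* terminal; the MAD is removed from all lists */
		return true;
	default:
		return false;
	}
}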
Adding the state machine also makes it possible to remove the double
call to ib_mad_complete_send_wr() in the case of an early response and
the use of a done list in the case of a regular response.
While at it, define a helper to clear error MADs, which handles freeing
MADs that timed out or were canceled.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/mad.c | 250 +++++++++++++++++++----------
drivers/infiniband/core/mad_priv.h | 49 +++++-
drivers/infiniband/core/mad_rmpp.c | 41 +++--
3 files changed, 233 insertions(+), 107 deletions(-)
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 1fd54d5c4dd8b..49da1d246f72a 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -391,7 +391,6 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
spin_lock_init(&mad_agent_priv->lock);
INIT_LIST_HEAD(&mad_agent_priv->send_list);
INIT_LIST_HEAD(&mad_agent_priv->wait_list);
- INIT_LIST_HEAD(&mad_agent_priv->done_list);
INIT_LIST_HEAD(&mad_agent_priv->rmpp_list);
INIT_DELAYED_WORK(&mad_agent_priv->timed_work, timeout_sends);
INIT_LIST_HEAD(&mad_agent_priv->local_list);
@@ -1055,6 +1054,98 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
return ret;
}
+static void handle_send_state(struct ib_mad_send_wr_private *mad_send_wr,
+ struct ib_mad_agent_private *mad_agent_priv)
+{
+ if (!mad_send_wr->state) {
+ list_add_tail(&mad_send_wr->agent_list,
+ &mad_agent_priv->send_list);
+ } else {
+ EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_WAIT_RESP);
+ list_move_tail(&mad_send_wr->agent_list,
+ &mad_agent_priv->send_list);
+ }
+}
+
+static void handle_wait_state(struct ib_mad_send_wr_private *mad_send_wr,
+ struct ib_mad_agent_private *mad_agent_priv)
+{
+ struct ib_mad_send_wr_private *temp_mad_send_wr;
+ struct list_head *list_item;
+ unsigned long delay;
+
+ EXPECT_MAD_STATE3(mad_send_wr, IB_MAD_STATE_SEND_START,
+ IB_MAD_STATE_WAIT_RESP, IB_MAD_STATE_CANCELED);
+ list_del_init(&mad_send_wr->agent_list);
+
+ delay = mad_send_wr->timeout;
+ mad_send_wr->timeout += jiffies;
+
+ if (delay) {
+ list_for_each_prev(list_item,
+ &mad_agent_priv->wait_list) {
+ temp_mad_send_wr = list_entry(
+ list_item,
+ struct ib_mad_send_wr_private,
+ agent_list);
+ if (time_after(mad_send_wr->timeout,
+ temp_mad_send_wr->timeout))
+ break;
+ }
+ } else {
+ list_item = &mad_agent_priv->wait_list;
+ }
+
+ list_add(&mad_send_wr->agent_list, list_item);
+}
+
+static void handle_early_resp_state(struct ib_mad_send_wr_private *mad_send_wr,
+ struct ib_mad_agent_private *mad_agent_priv)
+{
+ EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_SEND_START);
+}
+
+static void handle_canceled_state(struct ib_mad_send_wr_private *mad_send_wr,
+ struct ib_mad_agent_private *mad_agent_priv)
+{
+ NOT_EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_DONE);
+}
+
+static void handle_done_state(struct ib_mad_send_wr_private *mad_send_wr,
+ struct ib_mad_agent_private *mad_agent_priv)
+{
+ list_del_init(&mad_send_wr->agent_list);
+}
+
+void change_mad_state(struct ib_mad_send_wr_private *mad_send_wr,
+ enum ib_mad_state new_state)
+{
+ struct ib_mad_agent_private *mad_agent_priv =
+ mad_send_wr->mad_agent_priv;
+
+ switch (new_state) {
+ case IB_MAD_STATE_SEND_START:
+ handle_send_state(mad_send_wr, mad_agent_priv);
+ break;
+ case IB_MAD_STATE_WAIT_RESP:
+ handle_wait_state(mad_send_wr, mad_agent_priv);
+ if (mad_send_wr->state == IB_MAD_STATE_CANCELED)
+ return;
+ break;
+ case IB_MAD_STATE_EARLY_RESP:
+ handle_early_resp_state(mad_send_wr, mad_agent_priv);
+ break;
+ case IB_MAD_STATE_CANCELED:
+ handle_canceled_state(mad_send_wr, mad_agent_priv);
+ break;
+ case IB_MAD_STATE_DONE:
+ handle_done_state(mad_send_wr, mad_agent_priv);
+ break;
+ }
+
+ mad_send_wr->state = new_state;
+}
+
/*
* ib_post_send_mad - Posts MAD(s) to the send queue of the QP associated
* with the registered client
@@ -1118,15 +1209,12 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
mad_send_wr->max_retries = send_buf->retries;
mad_send_wr->retries_left = send_buf->retries;
send_buf->retries = 0;
- /* Reference for work request to QP + response */
- mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);
- mad_send_wr->status = IB_WC_SUCCESS;
+ mad_send_wr->state = 0;
/* Reference MAD agent until send completes */
refcount_inc(&mad_agent_priv->refcount);
spin_lock_irqsave(&mad_agent_priv->lock, flags);
- list_add_tail(&mad_send_wr->agent_list,
- &mad_agent_priv->send_list);
+ change_mad_state(mad_send_wr, IB_MAD_STATE_SEND_START);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
if (ib_mad_kernel_rmpp_agent(&mad_agent_priv->agent)) {
@@ -1138,7 +1226,7 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
if (ret < 0) {
/* Fail send request */
spin_lock_irqsave(&mad_agent_priv->lock, flags);
- list_del(&mad_send_wr->agent_list);
+ change_mad_state(mad_send_wr, IB_MAD_STATE_DONE);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
deref_mad_agent(mad_agent_priv);
goto error;
@@ -1746,7 +1834,7 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
*/
(is_direct(mad_hdr->mgmt_class) ||
rcv_has_same_gid(mad_agent_priv, wr, wc)))
- return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
+ return (wr->state != IB_MAD_STATE_CANCELED) ? wr : NULL;
}
/*
@@ -1765,7 +1853,7 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
(is_direct(mad_hdr->mgmt_class) ||
rcv_has_same_gid(mad_agent_priv, wr, wc)))
/* Verify request has not been canceled */
- return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
+ return (wr->state != IB_MAD_STATE_CANCELED) ? wr : NULL;
}
return NULL;
}
@@ -1773,9 +1861,10 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
{
mad_send_wr->timeout = 0;
- if (mad_send_wr->refcount == 1)
- list_move_tail(&mad_send_wr->agent_list,
- &mad_send_wr->mad_agent_priv->done_list);
+ if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
+ change_mad_state(mad_send_wr, IB_MAD_STATE_DONE);
+ else
+ change_mad_state(mad_send_wr, IB_MAD_STATE_EARLY_RESP);
}
static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
@@ -1784,6 +1873,7 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
struct ib_mad_send_wr_private *mad_send_wr;
struct ib_mad_send_wc mad_send_wc;
unsigned long flags;
+ bool is_mad_done;
int ret;
INIT_LIST_HEAD(&mad_recv_wc->rmpp_list);
@@ -1832,6 +1922,7 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
}
} else {
ib_mark_mad_done(mad_send_wr);
+ is_mad_done = (mad_send_wr->state == IB_MAD_STATE_DONE);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
/* Defined behavior is to complete response before request */
@@ -1841,10 +1932,13 @@ static void ib_mad_complete_recv(struct ib_mad_agent_private *mad_agent_priv,
mad_recv_wc);
deref_mad_agent(mad_agent_priv);
- mad_send_wc.status = IB_WC_SUCCESS;
- mad_send_wc.vendor_err = 0;
- mad_send_wc.send_buf = &mad_send_wr->send_buf;
- ib_mad_complete_send_wr(mad_send_wr, &mad_send_wc);
+ if (is_mad_done) {
+ mad_send_wc.status = IB_WC_SUCCESS;
+ mad_send_wc.vendor_err = 0;
+ mad_send_wc.send_buf = &mad_send_wr->send_buf;
+ ib_mad_complete_send_wr(mad_send_wr,
+ &mad_send_wc);
+ }
}
} else {
mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, NULL,
@@ -2172,30 +2266,11 @@ static void adjust_timeout(struct ib_mad_agent_private *mad_agent_priv)
static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr)
{
struct ib_mad_agent_private *mad_agent_priv;
- struct ib_mad_send_wr_private *temp_mad_send_wr;
- struct list_head *list_item;
unsigned long delay;
mad_agent_priv = mad_send_wr->mad_agent_priv;
- list_del(&mad_send_wr->agent_list);
-
delay = mad_send_wr->timeout;
- mad_send_wr->timeout += jiffies;
-
- if (delay) {
- list_for_each_prev(list_item, &mad_agent_priv->wait_list) {
- temp_mad_send_wr = list_entry(list_item,
- struct ib_mad_send_wr_private,
- agent_list);
- if (time_after(mad_send_wr->timeout,
- temp_mad_send_wr->timeout))
- break;
- }
- } else {
- list_item = &mad_agent_priv->wait_list;
- }
-
- list_add(&mad_send_wr->agent_list, list_item);
+ change_mad_state(mad_send_wr, IB_MAD_STATE_WAIT_RESP);
/* Reschedule a work item if we have a shorter timeout */
if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list)
@@ -2229,27 +2304,20 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
} else
ret = IB_RMPP_RESULT_UNHANDLED;
- if (mad_send_wc->status != IB_WC_SUCCESS &&
- mad_send_wr->status == IB_WC_SUCCESS) {
- mad_send_wr->status = mad_send_wc->status;
- mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
- }
-
- if (--mad_send_wr->refcount > 0) {
- if (mad_send_wr->refcount == 1 && mad_send_wr->timeout &&
- mad_send_wr->status == IB_WC_SUCCESS) {
- wait_for_response(mad_send_wr);
- }
+ if (mad_send_wr->state == IB_MAD_STATE_CANCELED)
+ mad_send_wc->status = IB_WC_WR_FLUSH_ERR;
+ else if (mad_send_wr->state == IB_MAD_STATE_SEND_START &&
+ mad_send_wr->timeout) {
+ wait_for_response(mad_send_wr);
goto done;
}
/* Remove send from MAD agent and notify client of completion */
- list_del(&mad_send_wr->agent_list);
+ if (mad_send_wr->state != IB_MAD_STATE_DONE)
+ change_mad_state(mad_send_wr, IB_MAD_STATE_DONE);
adjust_timeout(mad_agent_priv);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
- if (mad_send_wr->status != IB_WC_SUCCESS)
- mad_send_wc->status = mad_send_wr->status;
if (ret == IB_RMPP_RESULT_INTERNAL)
ib_rmpp_send_handler(mad_send_wc);
else
@@ -2407,12 +2475,8 @@ static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv)
spin_lock_irqsave(&mad_agent_priv->lock, flags);
list_for_each_entry_safe(mad_send_wr, temp_mad_send_wr,
- &mad_agent_priv->send_list, agent_list) {
- if (mad_send_wr->status == IB_WC_SUCCESS) {
- mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
- mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
- }
- }
+ &mad_agent_priv->send_list, agent_list)
+ change_mad_state(mad_send_wr, IB_MAD_STATE_CANCELED);
/* Empty wait list to prevent receives from finding a request */
list_splice_init(&mad_agent_priv->wait_list, &cancel_list);
@@ -2468,16 +2532,15 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
struct ib_mad_agent_private, agent);
spin_lock_irqsave(&mad_agent_priv->lock, flags);
mad_send_wr = find_send_wr(mad_agent_priv, send_buf);
- if (!mad_send_wr || mad_send_wr->status != IB_WC_SUCCESS) {
+ if (!mad_send_wr || mad_send_wr->state == IB_MAD_STATE_CANCELED) {
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
return -EINVAL;
}
- active = (!mad_send_wr->timeout || mad_send_wr->refcount > 1);
- if (!timeout_ms) {
- mad_send_wr->status = IB_WC_WR_FLUSH_ERR;
- mad_send_wr->refcount -= (mad_send_wr->timeout > 0);
- }
+ active = (mad_send_wr->state == IB_MAD_STATE_SEND_START ||
+ mad_send_wr->state == IB_MAD_STATE_EARLY_RESP);
+ if (!timeout_ms)
+ change_mad_state(mad_send_wr, IB_MAD_STATE_CANCELED);
mad_send_wr->send_buf.timeout_ms = timeout_ms;
if (active)
@@ -2606,26 +2669,43 @@ static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
} else
ret = ib_send_mad(mad_send_wr);
- if (!ret) {
- mad_send_wr->refcount++;
- list_add_tail(&mad_send_wr->agent_list,
- &mad_send_wr->mad_agent_priv->send_list);
- }
+ if (!ret)
+ change_mad_state(mad_send_wr, IB_MAD_STATE_SEND_START);
+
return ret;
}
-static void timeout_sends(struct work_struct *work)
+static void clear_mad_error_list(struct list_head *list,
+ enum ib_wc_status wc_status,
+ struct ib_mad_agent_private *mad_agent_priv)
{
struct ib_mad_send_wr_private *mad_send_wr, *n;
- struct ib_mad_agent_private *mad_agent_priv;
struct ib_mad_send_wc mad_send_wc;
- struct list_head local_list;
+
+ mad_send_wc.status = wc_status;
+ mad_send_wc.vendor_err = 0;
+
+ list_for_each_entry_safe(mad_send_wr, n, list, agent_list) {
+ mad_send_wc.send_buf = &mad_send_wr->send_buf;
+ mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
+ &mad_send_wc);
+ deref_mad_agent(mad_agent_priv);
+ }
+}
+
+static void timeout_sends(struct work_struct *work)
+{
+ struct ib_mad_send_wr_private *mad_send_wr;
+ struct ib_mad_agent_private *mad_agent_priv;
+ struct list_head timeout_list;
+ struct list_head cancel_list;
+ struct list_head *list_item;
unsigned long flags, delay;
mad_agent_priv = container_of(work, struct ib_mad_agent_private,
timed_work.work);
- mad_send_wc.vendor_err = 0;
- INIT_LIST_HEAD(&local_list);
+ INIT_LIST_HEAD(&timeout_list);
+ INIT_LIST_HEAD(&cancel_list);
spin_lock_irqsave(&mad_agent_priv->lock, flags);
while (!list_empty(&mad_agent_priv->wait_list)) {
@@ -2643,25 +2723,21 @@ static void timeout_sends(struct work_struct *work)
break;
}
- list_del_init(&mad_send_wr->agent_list);
- if (mad_send_wr->status == IB_WC_SUCCESS &&
- !retry_send(mad_send_wr))
+ if (mad_send_wr->state == IB_MAD_STATE_CANCELED)
+ list_item = &cancel_list;
+ else if (retry_send(mad_send_wr))
+ list_item = &timeout_list;
+ else
continue;
- list_add_tail(&mad_send_wr->agent_list, &local_list);
+ change_mad_state(mad_send_wr, IB_MAD_STATE_DONE);
+ list_add_tail(&mad_send_wr->agent_list, list_item);
}
- spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
- list_for_each_entry_safe(mad_send_wr, n, &local_list, agent_list) {
- if (mad_send_wr->status == IB_WC_SUCCESS)
- mad_send_wc.status = IB_WC_RESP_TIMEOUT_ERR;
- else
- mad_send_wc.status = mad_send_wr->status;
- mad_send_wc.send_buf = &mad_send_wr->send_buf;
- mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
- &mad_send_wc);
- deref_mad_agent(mad_agent_priv);
- }
+ spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+ clear_mad_error_list(&timeout_list, IB_WC_RESP_TIMEOUT_ERR,
+ mad_agent_priv);
+ clear_mad_error_list(&cancel_list, IB_WC_WR_FLUSH_ERR, mad_agent_priv);
}
/*
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index 1b7445a6f6714..bfcf0a62fdd65 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -96,7 +96,6 @@ struct ib_mad_agent_private {
spinlock_t lock;
struct list_head send_list;
struct list_head wait_list;
- struct list_head done_list;
struct delayed_work timed_work;
unsigned long timeout;
struct list_head local_list;
@@ -118,6 +117,47 @@ struct ib_mad_snoop_private {
struct completion comp;
};
+enum ib_mad_state {
+ /*
+ * MAD was sent to the QP and is waiting for completion
+ * notification in send list.
+ */
+ IB_MAD_STATE_SEND_START = 1,
+ /*
+ * MAD send completed successfully, waiting for a response
+ * in wait list.
+ */
+ IB_MAD_STATE_WAIT_RESP,
+ /*
+ * Response came early, before send completion notification,
+ * in send list.
+ */
+ IB_MAD_STATE_EARLY_RESP,
+ /* MAD was canceled while in wait or send list */
+ IB_MAD_STATE_CANCELED,
+ /* MAD processing completed, MAD in no list */
+ IB_MAD_STATE_DONE
+};
+
+#define EXPECT_MAD_STATE(mad_send_wr, expected_state) \
+ { \
+ if (IS_ENABLED(CONFIG_LOCKDEP)) \
+ WARN_ON(mad_send_wr->state != expected_state); \
+ }
+#define EXPECT_MAD_STATE3(mad_send_wr, expected_state1, expected_state2, \
+ expected_state3) \
+ { \
+ if (IS_ENABLED(CONFIG_LOCKDEP)) \
+ WARN_ON(mad_send_wr->state != expected_state1 && \
+ mad_send_wr->state != expected_state2 && \
+ mad_send_wr->state != expected_state3); \
+ }
+#define NOT_EXPECT_MAD_STATE(mad_send_wr, wrong_state) \
+ { \
+ if (IS_ENABLED(CONFIG_LOCKDEP)) \
+ WARN_ON(mad_send_wr->state == wrong_state); \
+ }
+
struct ib_mad_send_wr_private {
struct ib_mad_list_head mad_list;
struct list_head agent_list;
@@ -132,8 +172,6 @@ struct ib_mad_send_wr_private {
int max_retries;
int retries_left;
int retry;
- int refcount;
- enum ib_wc_status status;
/* RMPP control */
struct list_head rmpp_list;
@@ -143,6 +181,8 @@ struct ib_mad_send_wr_private {
int seg_num;
int newwin;
int pad;
+
+ enum ib_mad_state state;
};
struct ib_mad_local_private {
@@ -222,4 +262,7 @@ void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr);
void ib_reset_mad_timeout(struct ib_mad_send_wr_private *mad_send_wr,
unsigned long timeout_ms);
+void change_mad_state(struct ib_mad_send_wr_private *mad_send_wr,
+ enum ib_mad_state new_state);
+
#endif /* __IB_MAD_PRIV_H__ */
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index 8af0619a39cda..a33842eac2215 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -608,16 +608,20 @@ static void abort_send(struct ib_mad_agent_private *agent,
goto out; /* Unmatched send */
if ((mad_send_wr->last_ack == mad_send_wr->send_buf.seg_count) ||
- (!mad_send_wr->timeout) || (mad_send_wr->status != IB_WC_SUCCESS))
+ (!mad_send_wr->timeout) ||
+ (mad_send_wr->state == IB_MAD_STATE_CANCELED))
goto out; /* Send is already done */
ib_mark_mad_done(mad_send_wr);
+ if (mad_send_wr->state == IB_MAD_STATE_DONE) {
+ spin_unlock_irqrestore(&agent->lock, flags);
+ wc.status = IB_WC_REM_ABORT_ERR;
+ wc.vendor_err = rmpp_status;
+ wc.send_buf = &mad_send_wr->send_buf;
+ ib_mad_complete_send_wr(mad_send_wr, &wc);
+ return;
+ }
spin_unlock_irqrestore(&agent->lock, flags);
-
- wc.status = IB_WC_REM_ABORT_ERR;
- wc.vendor_err = rmpp_status;
- wc.send_buf = &mad_send_wr->send_buf;
- ib_mad_complete_send_wr(mad_send_wr, &wc);
return;
out:
spin_unlock_irqrestore(&agent->lock, flags);
@@ -684,7 +688,8 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
}
if ((mad_send_wr->last_ack == mad_send_wr->send_buf.seg_count) ||
- (!mad_send_wr->timeout) || (mad_send_wr->status != IB_WC_SUCCESS))
+ (!mad_send_wr->timeout) ||
+ (mad_send_wr->state == IB_MAD_STATE_CANCELED))
goto out; /* Send is already done */
if (seg_num > mad_send_wr->send_buf.seg_count ||
@@ -709,21 +714,24 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
struct ib_mad_send_wc wc;
ib_mark_mad_done(mad_send_wr);
+ if (mad_send_wr->state == IB_MAD_STATE_DONE) {
+ spin_unlock_irqrestore(&agent->lock, flags);
+ wc.status = IB_WC_SUCCESS;
+ wc.vendor_err = 0;
+ wc.send_buf = &mad_send_wr->send_buf;
+ ib_mad_complete_send_wr(mad_send_wr, &wc);
+ return;
+ }
spin_unlock_irqrestore(&agent->lock, flags);
-
- wc.status = IB_WC_SUCCESS;
- wc.vendor_err = 0;
- wc.send_buf = &mad_send_wr->send_buf;
- ib_mad_complete_send_wr(mad_send_wr, &wc);
return;
}
- if (mad_send_wr->refcount == 1)
+ if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
ib_reset_mad_timeout(mad_send_wr,
mad_send_wr->send_buf.timeout_ms);
spin_unlock_irqrestore(&agent->lock, flags);
ack_ds_ack(agent, mad_recv_wc);
return;
- } else if (mad_send_wr->refcount == 1 &&
+ } else if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP &&
mad_send_wr->seg_num < mad_send_wr->newwin &&
mad_send_wr->seg_num < mad_send_wr->send_buf.seg_count) {
/* Send failure will just result in a timeout/retry */
@@ -731,7 +739,7 @@ static void process_rmpp_ack(struct ib_mad_agent_private *agent,
if (ret)
goto out;
- mad_send_wr->refcount++;
+ change_mad_state(mad_send_wr, IB_MAD_STATE_SEND_START);
list_move_tail(&mad_send_wr->agent_list,
&mad_send_wr->mad_agent_priv->send_list);
}
@@ -890,7 +898,6 @@ int ib_send_rmpp_mad(struct ib_mad_send_wr_private *mad_send_wr)
mad_send_wr->newwin = init_newwin(mad_send_wr);
/* We need to wait for the final ACK even if there isn't a response */
- mad_send_wr->refcount += (mad_send_wr->timeout == 0);
ret = send_next_seg(mad_send_wr);
if (!ret)
return IB_RMPP_RESULT_CONSUMED;
@@ -912,7 +919,7 @@ int ib_process_rmpp_send_wc(struct ib_mad_send_wr_private *mad_send_wr,
return IB_RMPP_RESULT_INTERNAL; /* ACK, STOP, or ABORT */
if (mad_send_wc->status != IB_WC_SUCCESS ||
- mad_send_wr->status != IB_WC_SUCCESS)
+ mad_send_wr->state == IB_MAD_STATE_CANCELED)
return IB_RMPP_RESULT_PROCESSED; /* Canceled or send error */
if (!mad_send_wr->timeout)
--
2.47.1
* [PATCH rdma-next v1 2/2] IB/mad: Add flow control for solicited MADs
2025-01-07 14:02 [PATCH rdma-next v1 0/2] IB/mad: Add Flow Control for Solicited MADs Leon Romanovsky
2025-01-07 14:02 ` [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer Leon Romanovsky
@ 2025-01-07 14:02 ` Leon Romanovsky
From: Leon Romanovsky @ 2025-01-07 14:02 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Or Har-Toov, linux-rdma, Maher Sanalla
From: Or Har-Toov <ohartoov@nvidia.com>
Currently, MADs sent via an agent are forwarded directly to the
corresponding MAD QP layer.
MADs with a timeout value set and requiring a response (solicited MADs)
will be resent if the timeout expires without receiving a response.
In a congested subnet, flooding the MAD QP layer with more solicited send
requests from the agent will only worsen the situation by triggering
more timeouts and therefore more retries.
Thus, add flow control for non-user solicited MADs to block agents from
issuing new solicited MAD requests to the MAD QP until outstanding
requests are completed and the MAD QP is ready to process additional
requests.
Therefore, keep track of the total number of outstanding solicited MAD
work requests in the send or wait lists. The number of outstanding send
WRs will be limited to a fraction of the RQ size, and any new send WR
that exceeds that limit will be held in a backlog list.
Backlog MADs will be forwarded to the agent's send list only once the
total number of outstanding send WRs falls below the limit.
Unsolicited MADs, RMPP MADs and MADs which are not SA, SMP or CM are
not subject to this flow control mechanism and will not be affected by
this change.
For this purpose, a new state is introduced:
- 'IB_MAD_STATE_QUEUED': MAD is in backlog list
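For reference, the admission decision described above can be sketched in
standalone form. This is an illustration only; the in-tree check is
mad_is_for_backlog() in the diff below, and the parameter names here
simply mirror the counters added by this patch:

/* Standalone illustration of the backlog admission check; plain C. */
#include <stdbool.h>	/* the kernel gets bool from <linux/types.h> */

static bool mad_needs_backlog(unsigned int sol_fc_send_count,
			      unsigned int sol_fc_wait_count,
			      unsigned int sol_fc_max,
			      bool backlog_nonempty,
			      bool is_solicited_fc)
{
	if (!is_solicited_fc || !sol_fc_max)
		return false;	/* MAD is not subject to flow control */

	if (backlog_nonempty)
		return true;	/* preserve ordering behind already queued MADs */

	/* Queue once the outstanding (sent + waiting) count reaches the cap */
	return sol_fc_send_count + sol_fc_wait_count >= sol_fc_max;
}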
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/mad.c | 173 +++++++++++++++++++++++++++--
drivers/infiniband/core/mad_priv.h | 17 ++-
2 files changed, 182 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 49da1d246f72a..3dc31640f53af 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -210,6 +210,29 @@ int ib_response_mad(const struct ib_mad_hdr *hdr)
}
EXPORT_SYMBOL(ib_response_mad);
+#define SOL_FC_MAX_DEFAULT_FRAC 4
+#define SOL_FC_MAX_SA_FRAC 32
+
+static int get_sol_fc_max_outstanding(struct ib_mad_reg_req *mad_reg_req)
+{
+ if (!mad_reg_req)
+ /* Send only agent */
+ return mad_recvq_size / SOL_FC_MAX_DEFAULT_FRAC;
+
+ switch (mad_reg_req->mgmt_class) {
+ case IB_MGMT_CLASS_CM:
+ return mad_recvq_size / SOL_FC_MAX_DEFAULT_FRAC;
+ case IB_MGMT_CLASS_SUBN_ADM:
+ return mad_recvq_size / SOL_FC_MAX_SA_FRAC;
+ case IB_MGMT_CLASS_SUBN_LID_ROUTED:
+ case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
+ return min(mad_recvq_size, IB_MAD_QP_RECV_SIZE) /
+ SOL_FC_MAX_DEFAULT_FRAC;
+ default:
+ return 0;
+ }
+}
+
/*
* ib_register_mad_agent - Register to send/receive MADs
*
@@ -392,12 +415,16 @@ struct ib_mad_agent *ib_register_mad_agent(struct ib_device *device,
INIT_LIST_HEAD(&mad_agent_priv->send_list);
INIT_LIST_HEAD(&mad_agent_priv->wait_list);
INIT_LIST_HEAD(&mad_agent_priv->rmpp_list);
+ INIT_LIST_HEAD(&mad_agent_priv->backlog_list);
INIT_DELAYED_WORK(&mad_agent_priv->timed_work, timeout_sends);
INIT_LIST_HEAD(&mad_agent_priv->local_list);
INIT_WORK(&mad_agent_priv->local_work, local_completions);
refcount_set(&mad_agent_priv->refcount, 1);
init_completion(&mad_agent_priv->comp);
-
+ mad_agent_priv->sol_fc_send_count = 0;
+ mad_agent_priv->sol_fc_wait_count = 0;
+ mad_agent_priv->sol_fc_max =
+ get_sol_fc_max_outstanding(mad_reg_req);
ret2 = ib_mad_agent_security_setup(&mad_agent_priv->agent, qp_type);
if (ret2) {
ret = ERR_PTR(ret2);
@@ -1054,6 +1081,13 @@ int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr)
return ret;
}
+static void handle_queued_state(struct ib_mad_send_wr_private *mad_send_wr,
+ struct ib_mad_agent_private *mad_agent_priv)
+{
+ NOT_EXPECT_MAD_STATE(mad_send_wr, 0);
+ list_add_tail(&mad_send_wr->agent_list, &mad_agent_priv->backlog_list);
+}
+
static void handle_send_state(struct ib_mad_send_wr_private *mad_send_wr,
struct ib_mad_agent_private *mad_agent_priv)
{
@@ -1061,10 +1095,17 @@ static void handle_send_state(struct ib_mad_send_wr_private *mad_send_wr,
list_add_tail(&mad_send_wr->agent_list,
&mad_agent_priv->send_list);
} else {
- EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_WAIT_RESP);
+ EXPECT_MAD_STATE2(mad_send_wr, IB_MAD_STATE_QUEUED,
+ IB_MAD_STATE_WAIT_RESP);
list_move_tail(&mad_send_wr->agent_list,
&mad_agent_priv->send_list);
}
+
+ if (mad_send_wr->is_solicited_fc) {
+ if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
+ mad_agent_priv->sol_fc_wait_count--;
+ mad_agent_priv->sol_fc_send_count++;
+ }
}
static void handle_wait_state(struct ib_mad_send_wr_private *mad_send_wr,
@@ -1076,8 +1117,13 @@ static void handle_wait_state(struct ib_mad_send_wr_private *mad_send_wr,
EXPECT_MAD_STATE3(mad_send_wr, IB_MAD_STATE_SEND_START,
IB_MAD_STATE_WAIT_RESP, IB_MAD_STATE_CANCELED);
- list_del_init(&mad_send_wr->agent_list);
+ if (mad_send_wr->state == IB_MAD_STATE_SEND_START &&
+ mad_send_wr->is_solicited_fc) {
+ mad_agent_priv->sol_fc_send_count--;
+ mad_agent_priv->sol_fc_wait_count++;
+ }
+ list_del_init(&mad_send_wr->agent_list);
delay = mad_send_wr->timeout;
mad_send_wr->timeout += jiffies;
@@ -1103,17 +1149,32 @@ static void handle_early_resp_state(struct ib_mad_send_wr_private *mad_send_wr,
struct ib_mad_agent_private *mad_agent_priv)
{
EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_SEND_START);
+ mad_agent_priv->sol_fc_send_count -= mad_send_wr->is_solicited_fc;
}
static void handle_canceled_state(struct ib_mad_send_wr_private *mad_send_wr,
struct ib_mad_agent_private *mad_agent_priv)
{
NOT_EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_DONE);
+ if (mad_send_wr->is_solicited_fc) {
+ if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
+ mad_agent_priv->sol_fc_send_count--;
+ else if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
+ mad_agent_priv->sol_fc_wait_count--;
+ }
}
static void handle_done_state(struct ib_mad_send_wr_private *mad_send_wr,
struct ib_mad_agent_private *mad_agent_priv)
{
+ NOT_EXPECT_MAD_STATE(mad_send_wr, IB_MAD_STATE_QUEUED);
+ if (mad_send_wr->is_solicited_fc) {
+ if (mad_send_wr->state == IB_MAD_STATE_SEND_START)
+ mad_agent_priv->sol_fc_send_count--;
+ else if (mad_send_wr->state == IB_MAD_STATE_WAIT_RESP)
+ mad_agent_priv->sol_fc_wait_count--;
+ }
+
list_del_init(&mad_send_wr->agent_list);
}
@@ -1124,6 +1185,9 @@ void change_mad_state(struct ib_mad_send_wr_private *mad_send_wr,
mad_send_wr->mad_agent_priv;
switch (new_state) {
+ case IB_MAD_STATE_QUEUED:
+ handle_queued_state(mad_send_wr, mad_agent_priv);
+ break;
case IB_MAD_STATE_SEND_START:
handle_send_state(mad_send_wr, mad_agent_priv);
break;
@@ -1146,6 +1210,43 @@ void change_mad_state(struct ib_mad_send_wr_private *mad_send_wr,
mad_send_wr->state = new_state;
}
+static bool is_solicited_fc_mad(struct ib_mad_send_wr_private *mad_send_wr)
+{
+ struct ib_rmpp_mad *rmpp_mad;
+ u8 mgmt_class;
+
+ if (!mad_send_wr->timeout)
+ return 0;
+
+ rmpp_mad = mad_send_wr->send_buf.mad;
+ if (mad_send_wr->mad_agent_priv->agent.rmpp_version &&
+ (ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE))
+ return 0;
+
+ mgmt_class =
+ ((struct ib_mad_hdr *)mad_send_wr->send_buf.mad)->mgmt_class;
+ return mgmt_class == IB_MGMT_CLASS_CM ||
+ mgmt_class == IB_MGMT_CLASS_SUBN_ADM ||
+ mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED ||
+ mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE;
+}
+
+static bool mad_is_for_backlog(struct ib_mad_send_wr_private *mad_send_wr)
+{
+ struct ib_mad_agent_private *mad_agent_priv =
+ mad_send_wr->mad_agent_priv;
+
+ if (!mad_send_wr->is_solicited_fc || !mad_agent_priv->sol_fc_max)
+ return false;
+
+ if (!list_empty(&mad_agent_priv->backlog_list))
+ return true;
+
+ return mad_agent_priv->sol_fc_send_count +
+ mad_agent_priv->sol_fc_wait_count >=
+ mad_agent_priv->sol_fc_max;
+}
+
/*
* ib_post_send_mad - Posts MAD(s) to the send queue of the QP associated
* with the registered client
@@ -1214,6 +1315,13 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
/* Reference MAD agent until send completes */
refcount_inc(&mad_agent_priv->refcount);
spin_lock_irqsave(&mad_agent_priv->lock, flags);
+ mad_send_wr->is_solicited_fc = is_solicited_fc_mad(mad_send_wr);
+ if (mad_is_for_backlog(mad_send_wr)) {
+ change_mad_state(mad_send_wr, IB_MAD_STATE_QUEUED);
+ spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+ return 0;
+ }
+
change_mad_state(mad_send_wr, IB_MAD_STATE_SEND_START);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
@@ -1858,6 +1966,42 @@ ib_find_send_mad(const struct ib_mad_agent_private *mad_agent_priv,
return NULL;
}
+static void
+process_mad_from_backlog(struct ib_mad_agent_private *mad_agent_priv)
+{
+ struct ib_mad_send_wr_private *mad_send_wr;
+ struct ib_mad_send_wc mad_send_wc = {};
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&mad_agent_priv->lock, flags);
+ while (!list_empty(&mad_agent_priv->backlog_list) &&
+ (mad_agent_priv->sol_fc_send_count +
+ mad_agent_priv->sol_fc_wait_count <
+ mad_agent_priv->sol_fc_max)) {
+ mad_send_wr = list_entry(mad_agent_priv->backlog_list.next,
+ struct ib_mad_send_wr_private,
+ agent_list);
+ change_mad_state(mad_send_wr, IB_MAD_STATE_SEND_START);
+ spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+ ret = ib_send_mad(mad_send_wr);
+ if (!ret)
+ return;
+
+ spin_lock_irqsave(&mad_agent_priv->lock, flags);
+ deref_mad_agent(mad_agent_priv);
+ change_mad_state(mad_send_wr, IB_MAD_STATE_DONE);
+ spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+ mad_send_wc.send_buf = &mad_send_wr->send_buf;
+ mad_send_wc.status = IB_WC_LOC_QP_OP_ERR;
+ mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
+ &mad_send_wc);
+ spin_lock_irqsave(&mad_agent_priv->lock, flags);
+ }
+
+ spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+}
+
void ib_mark_mad_done(struct ib_mad_send_wr_private *mad_send_wr)
{
mad_send_wr->timeout = 0;
@@ -2318,11 +2462,14 @@ void ib_mad_complete_send_wr(struct ib_mad_send_wr_private *mad_send_wr,
adjust_timeout(mad_agent_priv);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
- if (ret == IB_RMPP_RESULT_INTERNAL)
+ if (ret == IB_RMPP_RESULT_INTERNAL) {
ib_rmpp_send_handler(mad_send_wc);
- else
+ } else {
+ if (mad_send_wr->is_solicited_fc)
+ process_mad_from_backlog(mad_agent_priv);
mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
mad_send_wc);
+ }
/* Release reference on agent taken when sending */
deref_mad_agent(mad_agent_priv);
@@ -2480,6 +2627,8 @@ static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv)
/* Empty wait list to prevent receives from finding a request */
list_splice_init(&mad_agent_priv->wait_list, &cancel_list);
+ mad_agent_priv->sol_fc_wait_count = 0;
+ list_splice_tail_init(&mad_agent_priv->backlog_list, &cancel_list);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
/* Report all cancelled requests */
@@ -2515,6 +2664,13 @@ find_send_wr(struct ib_mad_agent_private *mad_agent_priv,
&mad_send_wr->send_buf == send_buf)
return mad_send_wr;
}
+
+ list_for_each_entry(mad_send_wr, &mad_agent_priv->backlog_list,
+ agent_list) {
+ if (&mad_send_wr->send_buf == send_buf)
+ return mad_send_wr;
+ }
+
return NULL;
}
@@ -2537,8 +2693,9 @@ int ib_modify_mad(struct ib_mad_send_buf *send_buf, u32 timeout_ms)
return -EINVAL;
}
- active = (mad_send_wr->state == IB_MAD_STATE_SEND_START ||
- mad_send_wr->state == IB_MAD_STATE_EARLY_RESP);
+ active = ((mad_send_wr->state == IB_MAD_STATE_SEND_START) ||
+ (mad_send_wr->state == IB_MAD_STATE_EARLY_RESP) ||
+ (mad_send_wr->state == IB_MAD_STATE_QUEUED && timeout_ms));
if (!timeout_ms)
change_mad_state(mad_send_wr, IB_MAD_STATE_CANCELED);
@@ -2687,6 +2844,8 @@ static void clear_mad_error_list(struct list_head *list,
list_for_each_entry_safe(mad_send_wr, n, list, agent_list) {
mad_send_wc.send_buf = &mad_send_wr->send_buf;
+ if (mad_send_wr->is_solicited_fc)
+ process_mad_from_backlog(mad_agent_priv);
mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
&mad_send_wc);
deref_mad_agent(mad_agent_priv);
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index bfcf0a62fdd65..52655f24e16d4 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -95,12 +95,16 @@ struct ib_mad_agent_private {
spinlock_t lock;
struct list_head send_list;
+ unsigned int sol_fc_send_count;
struct list_head wait_list;
+ unsigned int sol_fc_wait_count;
struct delayed_work timed_work;
unsigned long timeout;
struct list_head local_list;
struct work_struct local_work;
struct list_head rmpp_list;
+ unsigned int sol_fc_max;
+ struct list_head backlog_list;
refcount_t refcount;
union {
@@ -118,11 +122,13 @@ struct ib_mad_snoop_private {
};
enum ib_mad_state {
+ /* MAD is in backlog list */
+ IB_MAD_STATE_QUEUED = 1,
/*
* MAD was sent to the QP and is waiting for completion
* notification in send list.
*/
- IB_MAD_STATE_SEND_START = 1,
+ IB_MAD_STATE_SEND_START,
/*
* MAD send completed successfully, waiting for a response
* in wait list.
@@ -144,6 +150,12 @@ enum ib_mad_state {
if (IS_ENABLED(CONFIG_LOCKDEP)) \
WARN_ON(mad_send_wr->state != expected_state); \
}
+#define EXPECT_MAD_STATE2(mad_send_wr, expected_state1, expected_state2) \
+ { \
+ if (IS_ENABLED(CONFIG_LOCKDEP)) \
+ WARN_ON(mad_send_wr->state != expected_state1 && \
+ mad_send_wr->state != expected_state2); \
+ }
#define EXPECT_MAD_STATE3(mad_send_wr, expected_state1, expected_state2, \
expected_state3) \
{ \
@@ -183,6 +195,9 @@ struct ib_mad_send_wr_private {
int pad;
enum ib_mad_state state;
+
+ /* Solicited MAD flow control */
+ bool is_solicited_fc;
};
struct ib_mad_local_private {
--
2.47.1
* Re: [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer
2025-01-07 14:02 ` [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer Leon Romanovsky
@ 2025-01-14 19:42 ` Jason Gunthorpe
2025-01-19 8:26 ` Leon Romanovsky
From: Jason Gunthorpe @ 2025-01-14 19:42 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Or Har-Toov, linux-rdma, Maher Sanalla
On Tue, Jan 07, 2025 at 04:02:12PM +0200, Leon Romanovsky wrote:
> +static void handle_send_state(struct ib_mad_send_wr_private *mad_send_wr,
> + struct ib_mad_agent_private *mad_agent_priv)
> +{
> + if (!mad_send_wr->state) {
What is this doing? state is an enum, what is !state supposed to be? 0
is not a valid value in the enum.
> @@ -1118,15 +1209,12 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
> mad_send_wr->max_retries = send_buf->retries;
> mad_send_wr->retries_left = send_buf->retries;
> send_buf->retries = 0;
> - /* Reference for work request to QP + response */
> - mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);
> - mad_send_wr->status = IB_WC_SUCCESS;
> + mad_send_wr->state = 0;
Same, enums should not be assigned to constants. If you want another
state you need another IB_MAD_STATE value and use it here and above.
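For instance, something along these lines (a sketch only; the
IB_MAD_STATE_INIT name is just illustrative):

enum ib_mad_state {
	/* Newly allocated send WR, not yet posted, in no list */
	IB_MAD_STATE_INIT,
	IB_MAD_STATE_SEND_START,
	IB_MAD_STATE_WAIT_RESP,
	IB_MAD_STATE_EARLY_RESP,
	IB_MAD_STATE_CANCELED,
	IB_MAD_STATE_DONE
};

and then in ib_post_send_mad():

	mad_send_wr->state = IB_MAD_STATE_INIT;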
> +#define EXPECT_MAD_STATE(mad_send_wr, expected_state) \
> + { \
> + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> + WARN_ON(mad_send_wr->state != expected_state); \
> + }
> +#define EXPECT_MAD_STATE3(mad_send_wr, expected_state1, expected_state2, \
> + expected_state3) \
> + { \
> + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> + WARN_ON(mad_send_wr->state != expected_state1 && \
> + mad_send_wr->state != expected_state2 && \
> + mad_send_wr->state != expected_state3); \
> + }
> +#define NOT_EXPECT_MAD_STATE(mad_send_wr, wrong_state) \
> + { \
> + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> + WARN_ON(mad_send_wr->state == wrong_state); \
> + }
These could all be static inlines, otherwise at least
mad_send_wr->state needs brackets (mad_send_wr)->state
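For example, a static inline form of the single-state check could look
like this (sketch only, assuming the definitions already in mad_priv.h):

static inline void expect_mad_state(struct ib_mad_send_wr_private *mad_send_wr,
				    enum ib_mad_state expected_state)
{
	if (IS_ENABLED(CONFIG_LOCKDEP))
		WARN_ON(mad_send_wr->state != expected_state);
}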
Jason
* Re: [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer
2025-01-14 19:42 ` Jason Gunthorpe
@ 2025-01-19 8:26 ` Leon Romanovsky
2025-01-20 13:47 ` Jason Gunthorpe
From: Leon Romanovsky @ 2025-01-19 8:26 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Or Har-Toov, linux-rdma, Maher Sanalla
On Tue, Jan 14, 2025 at 03:42:08PM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 07, 2025 at 04:02:12PM +0200, Leon Romanovsky wrote:
> > +static void handle_send_state(struct ib_mad_send_wr_private *mad_send_wr,
> > + struct ib_mad_agent_private *mad_agent_priv)
> > +{
> > + if (!mad_send_wr->state) {
>
> What is this doing? state is an enum, what is !state supposed to be? 0
> is not a valid value in the enum.
>
> > @@ -1118,15 +1209,12 @@ int ib_post_send_mad(struct ib_mad_send_buf *send_buf,
> > mad_send_wr->max_retries = send_buf->retries;
> > mad_send_wr->retries_left = send_buf->retries;
> > send_buf->retries = 0;
> > - /* Reference for work request to QP + response */
> > - mad_send_wr->refcount = 1 + (mad_send_wr->timeout > 0);
> > - mad_send_wr->status = IB_WC_SUCCESS;
> > + mad_send_wr->state = 0;
>
> Same, enums should not be assigned to constants. If you want another
> state you need another IB_MAD_STATE value and use it here and above.
Sure, will change.
>
> > +#define EXPECT_MAD_STATE(mad_send_wr, expected_state) \
> > + { \
> > + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> > + WARN_ON(mad_send_wr->state != expected_state); \
> > + }
> > +#define EXPECT_MAD_STATE3(mad_send_wr, expected_state1, expected_state2, \
> > + expected_state3) \
> > + { \
> > + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> > + WARN_ON(mad_send_wr->state != expected_state1 && \
> > + mad_send_wr->state != expected_state2 && \
> > + mad_send_wr->state != expected_state3); \
> > + }
> > +#define NOT_EXPECT_MAD_STATE(mad_send_wr, wrong_state) \
> > + { \
> > + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> > + WARN_ON(mad_send_wr->state == wrong_state); \
> > + }
>
> These could all be static inlines, otherwise at least
> mad_send_wr->state needs brackets (mad_send_wr)->state
I don't think that it is worth to have functions here.
Thanks
>
> Jason
* Re: [PATCH rdma-next v1 1/2] IB/mad: Add state machine to MAD layer
2025-01-19 8:26 ` Leon Romanovsky
@ 2025-01-20 13:47 ` Jason Gunthorpe
From: Jason Gunthorpe @ 2025-01-20 13:47 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: Or Har-Toov, linux-rdma, Maher Sanalla
On Sun, Jan 19, 2025 at 10:26:24AM +0200, Leon Romanovsky wrote:
> > > +#define NOT_EXPECT_MAD_STATE(mad_send_wr, wrong_state) \
> > > + { \
> > > + if (IS_ENABLED(CONFIG_LOCKDEP)) \
> > > + WARN_ON(mad_send_wr->state == wrong_state); \
> > > + }
> >
> > These could all be static inlines, otherwise at least
> > mad_send_wr->state needs brackets (mad_send_wr)->state
>
> I don't think that it is worth to have functions here.
You should always prefer functions to function-like macros if
possible.
The expected_state needs () as well
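i.e. something like (sketch only):

#define EXPECT_MAD_STATE(mad_send_wr, expected_state)			    \
	{								    \
		if (IS_ENABLED(CONFIG_LOCKDEP))				    \
			WARN_ON((mad_send_wr)->state != (expected_state));  \
	}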
Jason