From: Tariq Toukan <tariqt@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
"Andrew Lunn" <andrew+netdev@lunn.ch>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Moshe Shemesh <moshe@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Vlad Dogaru <vdogaru@nvidia.com>,
Yevgeny Kliteynik <kliteyn@nvidia.com>,
Gal Pressman <gal@nvidia.com>
Subject: [PATCH net-next 09/10] net/mlx5: HWS, rework rehash loop
Date: Sun, 11 May 2025 22:38:09 +0300 [thread overview]
Message-ID: <1746992290-568936-10-git-send-email-tariqt@nvidia.com> (raw)
In-Reply-To: <1746992290-568936-1-git-send-email-tariqt@nvidia.com>
From: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reworking the rehash loop - simplifying the code and making it less
error prone:
- Instead of doing round-robin on all the queues with batch of rules in
each cycle, just go over all the queues and move all the rules that
belong to this queue.
- If at some stage of moving the rule we get a failure (which should
not happen), this can't be rolled back. So instead of aborting
rehash and leaving the matcher in a broken state, allow the loop
to continue: attempt to move the rest of the rules and delete the
old matcher. A rule that failed to move to a new matcher will loose
its match STE once the rehash is completed and the old matcher is
deleted, so the rule won't match any traffic any more. This rule's
packets will fall back to the steering pipeline w/o HW offload.
Rehash procedure will return an error, which will cause the rule
insertion to fail for the rule that started this whole rehash.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/steering/hws/bwc.c | 127 +++++++-----------
1 file changed, 52 insertions(+), 75 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 456fac895f5e..9e057f808ea5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -610,95 +610,69 @@ hws_bwc_matcher_find_at(struct mlx5hws_bwc_matcher *bwc_matcher,
static int hws_bwc_matcher_move_all_simple(struct mlx5hws_bwc_matcher *bwc_matcher)
{
+ bool move_error = false, poll_error = false, drain_error = false;
struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
+ struct mlx5hws_matcher *matcher = bwc_matcher->matcher;
u16 bwc_queues = mlx5hws_bwc_queues(ctx);
- struct mlx5hws_bwc_rule **bwc_rules;
struct mlx5hws_rule_attr rule_attr;
- u32 *pending_rules;
- int i, j, ret = 0;
- bool all_done;
- u16 burst_th;
+ struct mlx5hws_bwc_rule *bwc_rule;
+ struct mlx5hws_send_engine *queue;
+ struct list_head *rules_list;
+ u32 pending_rules;
+ int i, ret = 0;
mlx5hws_bwc_rule_fill_attr(bwc_matcher, 0, 0, &rule_attr);
- pending_rules = kcalloc(bwc_queues, sizeof(*pending_rules), GFP_KERNEL);
- if (!pending_rules)
- return -ENOMEM;
-
- bwc_rules = kcalloc(bwc_queues, sizeof(*bwc_rules), GFP_KERNEL);
- if (!bwc_rules) {
- ret = -ENOMEM;
- goto free_pending_rules;
- }
-
for (i = 0; i < bwc_queues; i++) {
if (list_empty(&bwc_matcher->rules[i]))
- bwc_rules[i] = NULL;
- else
- bwc_rules[i] = list_first_entry(&bwc_matcher->rules[i],
- struct mlx5hws_bwc_rule,
- list_node);
- }
+ continue;
- do {
- all_done = true;
+ pending_rules = 0;
+ rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i);
+ rules_list = &bwc_matcher->rules[i];
- for (i = 0; i < bwc_queues; i++) {
- rule_attr.queue_id = mlx5hws_bwc_get_queue_id(ctx, i);
- burst_th = hws_bwc_get_burst_th(ctx, rule_attr.queue_id);
-
- for (j = 0; j < burst_th && bwc_rules[i]; j++) {
- rule_attr.burst = !!((j + 1) % burst_th);
- ret = mlx5hws_matcher_resize_rule_move(bwc_matcher->matcher,
- bwc_rules[i]->rule,
- &rule_attr);
- if (unlikely(ret)) {
- mlx5hws_err(ctx,
- "Moving BWC rule failed during rehash (%d)\n",
- ret);
- goto free_bwc_rules;
- }
+ list_for_each_entry(bwc_rule, rules_list, list_node) {
+ ret = mlx5hws_matcher_resize_rule_move(matcher,
+ bwc_rule->rule,
+ &rule_attr);
+ if (unlikely(ret && !move_error)) {
+ mlx5hws_err(ctx,
+ "Moving BWC rule: move failed (%d), attempting to move rest of the rules\n",
+ ret);
+ move_error = true;
+ }
- all_done = false;
- pending_rules[i]++;
- bwc_rules[i] = list_is_last(&bwc_rules[i]->list_node,
- &bwc_matcher->rules[i]) ?
- NULL : list_next_entry(bwc_rules[i], list_node);
-
- ret = mlx5hws_bwc_queue_poll(ctx,
- rule_attr.queue_id,
- &pending_rules[i],
- false);
- if (unlikely(ret)) {
- mlx5hws_err(ctx,
- "Moving BWC rule failed during rehash (%d)\n",
- ret);
- goto free_bwc_rules;
- }
+ pending_rules++;
+ ret = mlx5hws_bwc_queue_poll(ctx,
+ rule_attr.queue_id,
+ &pending_rules,
+ false);
+ if (unlikely(ret && !poll_error)) {
+ mlx5hws_err(ctx,
+ "Moving BWC rule: poll failed (%d), attempting to move rest of the rules\n",
+ ret);
+ poll_error = true;
}
}
- } while (!all_done);
-
- /* drain all the bwc queues */
- for (i = 0; i < bwc_queues; i++) {
- if (pending_rules[i]) {
- u16 queue_id = mlx5hws_bwc_get_queue_id(ctx, i);
- mlx5hws_send_engine_flush_queue(&ctx->send_queue[queue_id]);
- ret = mlx5hws_bwc_queue_poll(ctx, queue_id,
- &pending_rules[i], true);
- if (unlikely(ret)) {
+ if (pending_rules) {
+ queue = &ctx->send_queue[rule_attr.queue_id];
+ mlx5hws_send_engine_flush_queue(queue);
+ ret = mlx5hws_bwc_queue_poll(ctx,
+ rule_attr.queue_id,
+ &pending_rules,
+ true);
+ if (unlikely(ret && !drain_error)) {
mlx5hws_err(ctx,
- "Moving BWC rule failed during rehash (%d)\n", ret);
- goto free_bwc_rules;
+ "Moving BWC rule: drain failed (%d), attempting to move rest of the rules\n",
+ ret);
+ drain_error = true;
}
}
}
-free_bwc_rules:
- kfree(bwc_rules);
-free_pending_rules:
- kfree(pending_rules);
+ if (move_error || poll_error || drain_error)
+ ret = -EINVAL;
return ret;
}
@@ -742,15 +716,18 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher)
}
ret = hws_bwc_matcher_move_all(bwc_matcher);
- if (ret) {
- mlx5hws_err(ctx, "Rehash error: moving rules failed\n");
- return -ENOMEM;
- }
+ if (ret)
+ mlx5hws_err(ctx, "Rehash error: moving rules failed, attempting to remove the old matcher\n");
+
+ /* Error during rehash can't be rolled back.
+ * The best option here is to allow the rehash to complete and remove
+ * the old matcher - can't leave the matcher in the 'in_resize' state.
+ */
bwc_matcher->matcher = new_matcher;
mlx5hws_matcher_destroy(old_matcher);
- return 0;
+ return ret;
}
static int
--
2.31.1
next prev parent reply other threads:[~2025-05-11 19:39 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-11 19:38 [PATCH net-next 00/10] net/mlx5: HWS, Complex Matchers and rehash mechanism fixes Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 01/10] net/mlx5: HWS, expose function mlx5hws_table_ft_set_next_ft in header Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 02/10] net/mlx5: HWS, add definer function to get field name str Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 03/10] net/mlx5: HWS, expose polling function in header file Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 04/10] net/mlx5: HWS, introduce isolated matchers Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 05/10] net/mlx5: HWS, support complex matchers Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 06/10] net/mlx5: HWS, force rehash when rule insertion failed Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 07/10] net/mlx5: HWS, fix counting of rules in the matcher Tariq Toukan
2025-05-11 19:38 ` [PATCH net-next 08/10] net/mlx5: HWS, fix redundant extension of action templates Tariq Toukan
2025-05-11 19:38 ` Tariq Toukan [this message]
2025-05-11 19:38 ` [PATCH net-next 10/10] net/mlx5: HWS, dump bad completion details Tariq Toukan
2025-05-13 22:50 ` [PATCH net-next 00/10] net/mlx5: HWS, Complex Matchers and rehash mechanism fixes patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1746992290-568936-10-git-send-email-tariqt@nvidia.com \
--to=tariqt@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gal@nvidia.com \
--cc=kliteyn@nvidia.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mbloch@nvidia.com \
--cc=moshe@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=vdogaru@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox