From: saeed@kernel.org
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Eran Ben Elisha <eranbe@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Moshe Shemesh <moshe@nvidia.com>
Subject: [net 05/15] net/mlx5: Add retry mechanism to the command entry index allocation
Date: Wed, 30 Sep 2020 19:05:06 -0700
Message-ID: <20201001020516.41217-6-saeed@kernel.org>
In-Reply-To: <20201001020516.41217-1-saeed@kernel.org>
From: Eran Ben Elisha <eranbe@nvidia.com>
Allocation of a new command entry index may fail temporarily. The new
command already holds the semaphore, which means a free entry should
become available soon. Add a one-second retry mechanism before returning
an error.

Patch "net/mlx5: Avoid possible free of command entry while timeout comp
handler" increases the likelihood of hitting this temporary failure, as
it delays the entry index release for non-callback commands.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 65ae6ef2039e..4b54c9241fd7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -883,6 +883,25 @@ static bool opcode_allowed(struct mlx5_cmd *cmd, u16 opcode)
 	return cmd->allowed_opcode == opcode;
 }
 
+static int cmd_alloc_index_retry(struct mlx5_cmd *cmd)
+{
+	unsigned long alloc_end = jiffies + msecs_to_jiffies(1000);
+	int idx;
+
+retry:
+	idx = cmd_alloc_index(cmd);
+	if (idx < 0 && time_before(jiffies, alloc_end)) {
+		/* Index allocation can fail on heavy load of commands. This is a temporary
+		 * situation as the current command already holds the semaphore, meaning that
+		 * another command completion is being handled and it is expected to release
+		 * the entry index soon.
+		 */
+		cond_resched();
+		goto retry;
+	}
+	return idx;
+}
+
 static void cmd_work_handler(struct work_struct *work)
 {
 	struct mlx5_cmd_work_ent *ent = container_of(work, struct mlx5_cmd_work_ent, work);
@@ -900,7 +919,7 @@ static void cmd_work_handler(struct work_struct *work)
 	sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
 	down(sem);
 	if (!ent->page_queue) {
-		alloc_ret = cmd_alloc_index(cmd);
+		alloc_ret = cmd_alloc_index_retry(cmd);
 		if (alloc_ret < 0) {
 			mlx5_core_err_rl(dev, "failed to allocate command entry\n");
 			if (ent->callback) {
--
2.26.2
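
For readers who want to experiment with the bounded retry idea outside the kernel, the following is a minimal user-space sketch of the same pattern: try once, and on failure keep yielding and retrying until a deadline passes. It is an analogy, not the driver code. The names fake_pool, fake_alloc_index(), now_ms() and alloc_index_retry() are hypothetical, and clock_gettime(CLOCK_MONOTONIC) and sched_yield() stand in for jiffies/time_before() and cond_resched().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <time.h>

/* Hypothetical stand-in for the command entry bitmap: allocation fails
 * until failures_left drops to zero, then index 0 is handed out.
 */
struct fake_pool {
	int failures_left;
};

static int fake_alloc_index(struct fake_pool *pool)
{
	if (pool->failures_left > 0) {
		pool->failures_left--;
		return -1;
	}
	return 0;
}

/* Monotonic clock in milliseconds (user-space substitute for jiffies). */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Always attempt the allocation at least once. On failure, yield the CPU
 * and retry until timeout_ms has elapsed, then return the last result.
 */
static int alloc_index_retry(struct fake_pool *pool, long long timeout_ms)
{
	long long deadline;
	int idx;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;

	do {
		idx = fake_alloc_index(pool);
		if (idx >= 0)
			break;
		sched_yield();	/* give the releasing side a chance to run */
	} while (now_ms() < deadline);

	return idx;
}
```

Under these assumptions, calling alloc_index_retry(&pool, 1000) mirrors the one-second budget used in the patch. A pool that recovers after a few failed attempts yields a valid index, while a pool that never recovers returns a negative value once the deadline has passed.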