From: saeed@kernel.org
To: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Eran Ben Elisha <eranbe@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>
Subject: [net V3 04/14] net/mlx5: Add retry mechanism to the command entry index allocation
Date: Fri,  2 Oct 2020 11:06:44 -0700
Message-ID: <20201002180654.262800-5-saeed@kernel.org>
In-Reply-To: <20201002180654.262800-1-saeed@kernel.org>

From: Eran Ben Elisha <eranbe@nvidia.com>

Allocation of a new command entry index can fail temporarily. The new
command already holds the semaphore, which means a free entry should
become available soon. Add a one-second retry mechanism before
returning an error.

Patch "net/mlx5: Avoid possible free of command entry while timeout comp
handler" increases the likelihood of hitting this temporary failure,
as it delays the entry index release for non-callback commands.

Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 21 ++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)
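
For readers outside the driver, the bounded-retry idea can be sketched in
plain user-space C. This is only an illustration under stated assumptions,
not the driver code: struct slot_pool, slot_alloc(), slot_free(), now_ms()
and slot_alloc_retry() are hypothetical names, the pool is a simple 32-bit
free mask, and a 64-bit CLOCK_MONOTONIC millisecond clock stands in for
jiffies, so a plain comparison replaces time_before(). There is no portable
equivalent of cpu_relax(), so the sketch simply spins.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the entry bitmask: a set bit is a free slot. */
struct slot_pool {
	uint32_t free_mask;
};

/* Take the lowest free slot and mark it used; return -1 if none is free. */
static int slot_alloc(struct slot_pool *pool)
{
	for (int i = 0; i < 32; i++) {
		if (pool->free_mask & (UINT32_C(1) << i)) {
			pool->free_mask &= ~(UINT32_C(1) << i);
			return i;
		}
	}
	return -1;
}

/* Return a slot to the pool. */
static void slot_free(struct slot_pool *pool, int idx)
{
	assert(idx >= 0 && idx < 32);
	pool->free_mask |= UINT32_C(1) << idx;
}

/* Monotonic time in milliseconds (assumed replacement for jiffies). */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Retry the allocation until it succeeds or timeout_ms has elapsed.
 * Returns the slot index on success, or the last failure (-1) on timeout.
 */
static int slot_alloc_retry(struct slot_pool *pool, int64_t timeout_ms)
{
	int64_t deadline = now_ms() + timeout_ms;
	int idx;

	do {
		idx = slot_alloc(pool);
	} while (idx < 0 && now_ms() < deadline);

	return idx;
}
```

In a single-threaded caller an empty pool can never refill, so
slot_alloc_retry() only burns the timeout and returns -1. The retry pays
off when another context is expected to call slot_free() shortly, which is
the situation the commit message describes.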

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 37dae95e61d5..2b597ac365f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -883,6 +883,25 @@ static bool opcode_allowed(struct mlx5_cmd *cmd, u16 opcode)
 	return cmd->allowed_opcode == opcode;
 }
 
+static int cmd_alloc_index_retry(struct mlx5_cmd *cmd)
+{
+	unsigned long alloc_end = jiffies + msecs_to_jiffies(1000);
+	int idx;
+
+retry:
+	idx = cmd_alloc_index(cmd);
+	if (idx < 0 && time_before(jiffies, alloc_end)) {
+		/* Index allocation can fail on heavy load of commands. This is a temporary
+		 * situation as the current command already holds the semaphore, meaning that
+		 * another command completion is being handled and it is expected to release
+		 * the entry index soon.
+		 */
+		cpu_relax();
+		goto retry;
+	}
+	return idx;
+}
+
 static void cmd_work_handler(struct work_struct *work)
 {
 	struct mlx5_cmd_work_ent *ent = container_of(work, struct mlx5_cmd_work_ent, work);
@@ -900,7 +919,7 @@ static void cmd_work_handler(struct work_struct *work)
 	sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
 	down(sem);
 	if (!ent->page_queue) {
-		alloc_ret = cmd_alloc_index(cmd);
+		alloc_ret = cmd_alloc_index_retry(cmd);
 		if (alloc_ret < 0) {
 			mlx5_core_err_rl(dev, "failed to allocate command entry\n");
 			if (ent->callback) {
-- 
2.26.2

