public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
@ 2024-05-13  6:46 Michael Margolin
  2024-05-16 10:54 ` Gal Pressman
  2024-05-30 12:49 ` Leon Romanovsky
  0 siblings, 2 replies; 4+ messages in thread
From: Michael Margolin @ 2024-05-13  6:46 UTC (permalink / raw)
  To: jgg, leon, linux-rdma
  Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yehuda Yitschak

Do not try to handle admin command completion if it has an unexpected
command id and print a relevant error message.

Reviewed-by: Firas Jahjah <firasj@amazon.com>
Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
 drivers/infiniband/hw/efa/efa_com.c | 30 ++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/efa/efa_com.c b/drivers/infiniband/hw/efa/efa_com.c
index 16a24a05fc2a..bafd210dd43e 100644
--- a/drivers/infiniband/hw/efa/efa_com.c
+++ b/drivers/infiniband/hw/efa/efa_com.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
 /*
- * Copyright 2018-2021 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2024 Amazon.com, Inc. or its affiliates. All rights reserved.
  */
 
 #include "efa_com.h"
@@ -406,8 +406,8 @@ static struct efa_comp_ctx *efa_com_submit_admin_cmd(struct efa_com_admin_queue
 	return comp_ctx;
 }
 
-static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq,
-						   struct efa_admin_acq_entry *cqe)
+static int efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq,
+						  struct efa_admin_acq_entry *cqe)
 {
 	struct efa_comp_ctx *comp_ctx;
 	u16 cmd_id;
@@ -416,11 +416,11 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
 			 EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID);
 
 	comp_ctx = efa_com_get_comp_ctx(aq, cmd_id, false);
-	if (!comp_ctx) {
+	if (comp_ctx->status != EFA_CMD_SUBMITTED) {
 		ibdev_err(aq->efa_dev,
-			  "comp_ctx is NULL. Changing the admin queue running state\n");
-		clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
-		return;
+			  "Received completion with unexpected command id[%d], sq producer: %d, sq consumer: %d, cq consumer: %d\n",
+			  cmd_id, aq->sq.pc, aq->sq.cc, aq->cq.cc);
+		return -EINVAL;
 	}
 
 	comp_ctx->status = EFA_CMD_COMPLETED;
@@ -428,14 +428,17 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
 
 	if (!test_bit(EFA_AQ_STATE_POLLING_BIT, &aq->state))
 		complete(&comp_ctx->wait_event);
+
+	return 0;
 }
 
 static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
 {
 	struct efa_admin_acq_entry *cqe;
 	u16 queue_size_mask;
-	u16 comp_num = 0;
+	u16 comp_cmds = 0;
 	u8 phase;
+	int err;
 	u16 ci;
 
 	queue_size_mask = aq->depth - 1;
@@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
 		 * phase bit was validated
 		 */
 		dma_rmb();
-		efa_com_handle_single_admin_completion(aq, cqe);
+		err = efa_com_handle_single_admin_completion(aq, cqe);
+		if (!err)
+			comp_cmds++;
 
+		aq->cq.cc++;
 		ci++;
-		comp_num++;
 		if (ci == aq->depth) {
 			ci = 0;
 			phase = !phase;
@@ -465,10 +470,9 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
 		cqe = &aq->cq.entries[ci];
 	}
 
-	aq->cq.cc += comp_num;
 	aq->cq.phase = phase;
-	aq->sq.cc += comp_num;
-	atomic64_add(comp_num, &aq->stats.completed_cmd);
+	aq->sq.cc += comp_cmds;
+	atomic64_add(comp_cmds, &aq->stats.completed_cmd);
 }
 
 static int efa_com_comp_status_to_errno(u8 comp_status)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
  2024-05-13  6:46 [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions Michael Margolin
@ 2024-05-16 10:54 ` Gal Pressman
  2024-05-22 11:02   ` Margolin, Michael
  2024-05-30 12:49 ` Leon Romanovsky
  1 sibling, 1 reply; 4+ messages in thread
From: Gal Pressman @ 2024-05-16 10:54 UTC (permalink / raw)
  To: Michael Margolin, jgg, leon, linux-rdma
  Cc: sleybo, matua, Firas Jahjah, Yehuda Yitschak

On 13/05/2024 9:46, Michael Margolin wrote:
> Do not try to handle admin command completion if it has an unexpected
> command id and print a relevant error message.
> 
> Reviewed-by: Firas Jahjah <firasj@amazon.com>
> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
> ---
>  static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>  {
>  	struct efa_admin_acq_entry *cqe;
>  	u16 queue_size_mask;
> -	u16 comp_num = 0;
> +	u16 comp_cmds = 0;
>  	u8 phase;
> +	int err;
>  	u16 ci;
>  
>  	queue_size_mask = aq->depth - 1;
> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>  		 * phase bit was validated
>  		 */
>  		dma_rmb();
> -		efa_com_handle_single_admin_completion(aq, cqe);
> +		err = efa_com_handle_single_admin_completion(aq, cqe);
> +		if (!err)
> +			comp_cmds++;

I would count the unexpected completions as well.
Regardless, I would definitely add a counter to track these (hopefully)
rare cases.

Whatever you decide:
Reviewed-by: Gal Pressman <gal.pressman@linux.dev>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
  2024-05-16 10:54 ` Gal Pressman
@ 2024-05-22 11:02   ` Margolin, Michael
  0 siblings, 0 replies; 4+ messages in thread
From: Margolin, Michael @ 2024-05-22 11:02 UTC (permalink / raw)
  To: Gal Pressman, jgg, leon, linux-rdma
  Cc: sleybo, matua, Firas Jahjah, Yehuda Yitschak

Thanks Gal, we indeed shouldn't see such errors on regular basis and the 
print is sufficient to help tracking and debugging in case it appears.


Michael

On 5/16/2024 1:54 PM, Gal Pressman wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On 13/05/2024 9:46, Michael Margolin wrote:
>> Do not try to handle admin command completion if it has an unexpected
>> command id and print a relevant error message.
>>
>> Reviewed-by: Firas Jahjah <firasj@amazon.com>
>> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
>> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
>> ---
>>   static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>>   {
>>        struct efa_admin_acq_entry *cqe;
>>        u16 queue_size_mask;
>> -     u16 comp_num = 0;
>> +     u16 comp_cmds = 0;
>>        u8 phase;
>> +     int err;
>>        u16 ci;
>>
>>        queue_size_mask = aq->depth - 1;
>> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>>                 * phase bit was validated
>>                 */
>>                dma_rmb();
>> -             efa_com_handle_single_admin_completion(aq, cqe);
>> +             err = efa_com_handle_single_admin_completion(aq, cqe);
>> +             if (!err)
>> +                     comp_cmds++;
> I would count the unexpected completions as well.
> Regardless, I would definitely add a counter to track these (hopefully)
> rare cases.
>
> Whatever you decide:
> Reviewed-by: Gal Pressman <gal.pressman@linux.dev>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
  2024-05-13  6:46 [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions Michael Margolin
  2024-05-16 10:54 ` Gal Pressman
@ 2024-05-30 12:49 ` Leon Romanovsky
  1 sibling, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2024-05-30 12:49 UTC (permalink / raw)
  To: linux-rdma, Jason Gunthorpe, Michael Margolin
  Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yehuda Yitschak


On Mon, 13 May 2024 06:46:30 +0000, Michael Margolin wrote:
> Do not try to handle admin command completion if it has an unexpected
> command id and print a relevant error message.
> 
> 

Applied, thanks!

[1/1] RDMA/efa: Properly handle unexpected AQ completions
      https://git.kernel.org/rdma/rdma/c/2d0e7ba468eae3

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-05-30 14:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-13  6:46 [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions Michael Margolin
2024-05-16 10:54 ` Gal Pressman
2024-05-22 11:02   ` Margolin, Michael
2024-05-30 12:49 ` Leon Romanovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox