* [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
@ 2024-05-13 6:46 Michael Margolin
2024-05-16 10:54 ` Gal Pressman
2024-05-30 12:49 ` Leon Romanovsky
0 siblings, 2 replies; 4+ messages in thread
From: Michael Margolin @ 2024-05-13 6:46 UTC (permalink / raw)
To: jgg, leon, linux-rdma
Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yehuda Yitschak
Do not try to handle admin command completion if it has an unexpected
command id and print a relevant error message.
Reviewed-by: Firas Jahjah <firasj@amazon.com>
Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
Signed-off-by: Michael Margolin <mrgolin@amazon.com>
---
drivers/infiniband/hw/efa/efa_com.c | 30 ++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/hw/efa/efa_com.c b/drivers/infiniband/hw/efa/efa_com.c
index 16a24a05fc2a..bafd210dd43e 100644
--- a/drivers/infiniband/hw/efa/efa_com.c
+++ b/drivers/infiniband/hw/efa/efa_com.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause
/*
- * Copyright 2018-2021 Amazon.com, Inc. or its affiliates. All rights reserved.
+ * Copyright 2018-2024 Amazon.com, Inc. or its affiliates. All rights reserved.
*/
#include "efa_com.h"
@@ -406,8 +406,8 @@ static struct efa_comp_ctx *efa_com_submit_admin_cmd(struct efa_com_admin_queue
return comp_ctx;
}
-static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq,
- struct efa_admin_acq_entry *cqe)
+static int efa_com_handle_single_admin_completion(struct efa_com_admin_queue *aq,
+ struct efa_admin_acq_entry *cqe)
{
struct efa_comp_ctx *comp_ctx;
u16 cmd_id;
@@ -416,11 +416,11 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
EFA_ADMIN_ACQ_COMMON_DESC_COMMAND_ID);
comp_ctx = efa_com_get_comp_ctx(aq, cmd_id, false);
- if (!comp_ctx) {
+ if (comp_ctx->status != EFA_CMD_SUBMITTED) {
ibdev_err(aq->efa_dev,
- "comp_ctx is NULL. Changing the admin queue running state\n");
- clear_bit(EFA_AQ_STATE_RUNNING_BIT, &aq->state);
- return;
+ "Received completion with unexpected command id[%d], sq producer: %d, sq consumer: %d, cq consumer: %d\n",
+ cmd_id, aq->sq.pc, aq->sq.cc, aq->cq.cc);
+ return -EINVAL;
}
comp_ctx->status = EFA_CMD_COMPLETED;
@@ -428,14 +428,17 @@ static void efa_com_handle_single_admin_completion(struct efa_com_admin_queue *a
if (!test_bit(EFA_AQ_STATE_POLLING_BIT, &aq->state))
complete(&comp_ctx->wait_event);
+
+ return 0;
}
static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
{
struct efa_admin_acq_entry *cqe;
u16 queue_size_mask;
- u16 comp_num = 0;
+ u16 comp_cmds = 0;
u8 phase;
+ int err;
u16 ci;
queue_size_mask = aq->depth - 1;
@@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
* phase bit was validated
*/
dma_rmb();
- efa_com_handle_single_admin_completion(aq, cqe);
+ err = efa_com_handle_single_admin_completion(aq, cqe);
+ if (!err)
+ comp_cmds++;
+ aq->cq.cc++;
ci++;
- comp_num++;
if (ci == aq->depth) {
ci = 0;
phase = !phase;
@@ -465,10 +470,9 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
cqe = &aq->cq.entries[ci];
}
- aq->cq.cc += comp_num;
aq->cq.phase = phase;
- aq->sq.cc += comp_num;
- atomic64_add(comp_num, &aq->stats.completed_cmd);
+ aq->sq.cc += comp_cmds;
+ atomic64_add(comp_cmds, &aq->stats.completed_cmd);
}
static int efa_com_comp_status_to_errno(u8 comp_status)
--
2.40.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
2024-05-13 6:46 [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions Michael Margolin
@ 2024-05-16 10:54 ` Gal Pressman
2024-05-22 11:02 ` Margolin, Michael
2024-05-30 12:49 ` Leon Romanovsky
1 sibling, 1 reply; 4+ messages in thread
From: Gal Pressman @ 2024-05-16 10:54 UTC (permalink / raw)
To: Michael Margolin, jgg, leon, linux-rdma
Cc: sleybo, matua, Firas Jahjah, Yehuda Yitschak
On 13/05/2024 9:46, Michael Margolin wrote:
> Do not try to handle admin command completion if it has an unexpected
> command id and print a relevant error message.
>
> Reviewed-by: Firas Jahjah <firasj@amazon.com>
> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
> ---
> static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
> {
> struct efa_admin_acq_entry *cqe;
> u16 queue_size_mask;
> - u16 comp_num = 0;
> + u16 comp_cmds = 0;
> u8 phase;
> + int err;
> u16 ci;
>
> queue_size_mask = aq->depth - 1;
> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
> * phase bit was validated
> */
> dma_rmb();
> - efa_com_handle_single_admin_completion(aq, cqe);
> + err = efa_com_handle_single_admin_completion(aq, cqe);
> + if (!err)
> + comp_cmds++;
I would count the unexpected completions as well.
Regardless, I would definitely add a counter to track these (hopefully)
rare cases.
Whatever you decide:
Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
2024-05-16 10:54 ` Gal Pressman
@ 2024-05-22 11:02 ` Margolin, Michael
0 siblings, 0 replies; 4+ messages in thread
From: Margolin, Michael @ 2024-05-22 11:02 UTC (permalink / raw)
To: Gal Pressman, jgg, leon, linux-rdma
Cc: sleybo, matua, Firas Jahjah, Yehuda Yitschak
Thanks Gal, we indeed shouldn't see such errors on regular basis and the
print is sufficient to help tracking and debugging in case it appears.
Michael
On 5/16/2024 1:54 PM, Gal Pressman wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On 13/05/2024 9:46, Michael Margolin wrote:
>> Do not try to handle admin command completion if it has an unexpected
>> command id and print a relevant error message.
>>
>> Reviewed-by: Firas Jahjah <firasj@amazon.com>
>> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com>
>> Signed-off-by: Michael Margolin <mrgolin@amazon.com>
>> ---
>> static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>> {
>> struct efa_admin_acq_entry *cqe;
>> u16 queue_size_mask;
>> - u16 comp_num = 0;
>> + u16 comp_cmds = 0;
>> u8 phase;
>> + int err;
>> u16 ci;
>>
>> queue_size_mask = aq->depth - 1;
>> @@ -453,10 +456,12 @@ static void efa_com_handle_admin_completion(struct efa_com_admin_queue *aq)
>> * phase bit was validated
>> */
>> dma_rmb();
>> - efa_com_handle_single_admin_completion(aq, cqe);
>> + err = efa_com_handle_single_admin_completion(aq, cqe);
>> + if (!err)
>> + comp_cmds++;
> I would count the unexpected completions as well.
> Regardless, I would definitely add a counter to track these (hopefully)
> rare cases.
>
> Whatever you decide:
> Reviewed-by: Gal Pressman <gal.pressman@linux.dev>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions
2024-05-13 6:46 [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions Michael Margolin
2024-05-16 10:54 ` Gal Pressman
@ 2024-05-30 12:49 ` Leon Romanovsky
1 sibling, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2024-05-30 12:49 UTC (permalink / raw)
To: linux-rdma, Jason Gunthorpe, Michael Margolin
Cc: sleybo, matua, gal.pressman, Firas Jahjah, Yehuda Yitschak
On Mon, 13 May 2024 06:46:30 +0000, Michael Margolin wrote:
> Do not try to handle admin command completion if it has an unexpected
> command id and print a relevant error message.
>
>
Applied, thanks!
[1/1] RDMA/efa: Properly handle unexpected AQ completions
https://git.kernel.org/rdma/rdma/c/2d0e7ba468eae3
Best regards,
--
Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-05-30 14:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-13 6:46 [PATCH for-next] RDMA/efa: Properly handle unexpected AQ completions Michael Margolin
2024-05-16 10:54 ` Gal Pressman
2024-05-22 11:02 ` Margolin, Michael
2024-05-30 12:49 ` Leon Romanovsky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox