* [PATCH] net/mlx5: HWS, change error flow on matcher disconnect
@ 2025-09-03 8:40 Subramaniam, Sujana
2025-09-03 9:41 ` Greg KH
0 siblings, 1 reply; 5+ messages in thread
From: Subramaniam, Sujana @ 2025-09-03 8:40 UTC (permalink / raw)
To: stable@vger.kernel.org
Cc: Subramaniam, Sujana, Yevgeny Kliteynik, Itamar Gozlan, Mark Bloch,
Tariq Toukan, Jakub Kicinski, Sasha Levin, Akendo
From: SujanaSubr <sujana.subramaniam@sap.com>

[ Upstream commit 1ce840c7a659aa53a31ef49f0271b4fd0dc10296 ]

Currently, when firmware failure occurs during matcher disconnect flow,
the error flow of the function reconnects the matcher back and returns
an error, which continues running the calling function and eventually
frees the matcher that is being disconnected.
This leads to a case where we have a freed matcher on the matchers list,
which in turn leads to use-after-free and eventual crash.

This patch fixes that by not trying to reconnect the matcher back when
some FW command fails during disconnect.

Note that we're dealing here with FW error. We can't overcome this
problem. This might lead to bad steering state (e.g. wrong connection
between matchers), and will also lead to resource leakage, as it is
the case with any other error handling during resource destruction.

However, the goal here is to allow the driver to continue and not crash
the machine with use-after-free error.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Akendo <akendo@akendo.eu>
Signed-off-by: SujanaSubr <sujana.subramaniam@sap.com>
---
.../mlx5/core/steering/hws/mlx5hws_matcher.c | 24 +++++++------------
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c
index 61a1155d4b4f..ce541c60c5b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws_matcher.c
@@ -165,14 +165,14 @@ static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher)
 						    next->match_ste.rtc_0_id,
 						    next->match_ste.rtc_1_id);
 		if (ret) {
-			mlx5hws_err(tbl->ctx, "Failed to disconnect matcher\n");
-			goto matcher_reconnect;
+			mlx5hws_err(tbl->ctx, "Fatal error, failed to disconnect matcher\n");
+			return ret;
 		}
 	} else {
 		ret = mlx5hws_table_connect_to_miss_table(tbl, tbl->default_miss.miss_tbl);
 		if (ret) {
-			mlx5hws_err(tbl->ctx, "Failed to disconnect last matcher\n");
-			goto matcher_reconnect;
+			mlx5hws_err(tbl->ctx, "Fatal error, failed to disconnect last matcher\n");
+			return ret;
 		}
 	}

@@ -180,27 +180,19 @@ static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher)
 	if (prev_ft_id == tbl->ft_id) {
 		ret = mlx5hws_table_update_connected_miss_tables(tbl);
 		if (ret) {
-			mlx5hws_err(tbl->ctx, "Fatal error, failed to update connected miss table\n");
-			goto matcher_reconnect;
+			mlx5hws_err(tbl->ctx,
+				    "Fatal error, failed to update connected miss table\n");
+			return ret;
 		}
 	}

 	ret = mlx5hws_table_ft_set_default_next_ft(tbl, prev_ft_id);
 	if (ret) {
 		mlx5hws_err(tbl->ctx, "Fatal error, failed to restore matcher ft default miss\n");
-		goto matcher_reconnect;
+		return ret;
 	}

 	return 0;
-
-matcher_reconnect:
-	if (list_empty(&tbl->matchers_list) || !prev)
-		list_add(&matcher->list_node, &tbl->matchers_list);
-	else
-		/* insert after prev matcher */
-		list_add(&matcher->list_node, &prev->list_node);
-
-	return ret;
 }

 static void hws_matcher_set_rtc_attr_sz(struct mlx5hws_matcher *matcher,
--
2.39.5 (Apple Git-154)
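
The failure mode described in the commit message above can be illustrated
with a small userspace sketch. This is not the driver code: the list
helpers, struct table, struct matcher, fw_cmd() and both disconnect
variants are hypothetical stand-ins, and the caller is assumed to free the
matcher regardless of the disconnect result, as the commit message
describes. The sketch only checks whether the list would still reference
the matcher at the moment it is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's
 * list_head. Hypothetical helper, not the kernel implementation. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }

static void list_add(struct node *n, struct node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static bool list_contains(const struct node *h, const struct node *n)
{
	for (const struct node *p = h->next; p != h; p = p->next)
		if (p == n)
			return true;
	return false;
}

/* Hypothetical stand-ins for the steering table and matcher. */
struct table { struct node matchers; bool fw_fails; };
struct matcher { struct node link; struct table *tbl; };

/* Stand-in for a firmware command that may fail. */
static int fw_cmd(const struct table *t) { return t->fw_fails ? -1 : 0; }

/* Old flow: on FW failure, put the matcher back on the list. */
static int disconnect_old(struct matcher *m)
{
	list_del(&m->link);
	int ret = fw_cmd(m->tbl);
	if (ret)
		list_add(&m->link, &m->tbl->matchers);
	return ret;
}

/* New flow: on FW failure, just return the error. */
static int disconnect_new(struct matcher *m)
{
	list_del(&m->link);
	return fw_cmd(m->tbl);
}

/* Assumed caller behaviour: teardown continues and frees the matcher even
 * if disconnect failed. Returns true if the list still referenced the
 * matcher when it was freed, i.e. a dangling entry would remain. */
static bool destroy_leaves_dangling(int (*disconnect)(struct matcher *))
{
	struct table tbl = { .fw_fails = true };
	list_init(&tbl.matchers);

	struct matcher *m = malloc(sizeof(*m));
	assert(m != NULL);
	m->tbl = &tbl;
	list_add(&m->link, &tbl.matchers);

	(void)disconnect(m);	/* error ignored, teardown continues */
	bool dangling = list_contains(&tbl.matchers, &m->link);
	free(m);
	return dangling;
}
```

Under these assumptions, destroy_leaves_dangling(disconnect_old) reports a
dangling entry while destroy_leaves_dangling(disconnect_new) does not. The
new flow still leaves whatever state the failed firmware command produced,
which matches the trade-off the commit message accepts.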
* Re: [PATCH] net/mlx5: HWS, change error flow on matcher disconnect
From: Greg KH @ 2025-09-03 9:41 UTC (permalink / raw)
To: Subramaniam, Sujana
Cc: stable@vger.kernel.org, Yevgeny Kliteynik, Itamar Gozlan,
Mark Bloch, Tariq Toukan, Jakub Kicinski, Sasha Levin, Akendo
On Wed, Sep 03, 2025 at 08:40:13AM +0000, Subramaniam, Sujana wrote:
> From: SujanaSubr <sujana.subramaniam@sap.com>
>
> [ Upstream commit 1ce840c7a659aa53a31ef49f0271b4fd0dc10296 ]
>
> Currently, when firmware failure occurs during matcher disconnect flow,
> the error flow of the function reconnects the matcher back and returns
> an error, which continues running the calling function and eventually
> frees the matcher that is being disconnected.
> This leads to a case where we have a freed matcher on the matchers list,
> which in turn leads to use-after-free and eventual crash.
>
> This patch fixes that by not trying to reconnect the matcher back when
> some FW command fails during disconnect.
>
> Note that we're dealing here with FW error. We can't overcome this
> problem. This might lead to bad steering state (e.g. wrong connection
> between matchers), and will also lead to resource leakage, as it is
> the case with any other error handling during resource destruction.
>
> However, the goal here is to allow the driver to continue and not crash
> the machine with use-after-free error.
>
> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>

Sasha didn't sign off on this original commit, did they?

> Signed-off-by: Akendo <akendo@akendo.eu>

Real name?

> Signed-off-by: SujanaSubr <sujana.subramaniam@sap.com>

Correct name?

What is this being sent for?

totally confused,

greg k-h
* Re: [PATCH] net/mlx5: HWS, change error flow on matcher disconnect
From: akendo @ 2025-09-03 12:21 UTC (permalink / raw)
To: Greg KH, Subramaniam, Sujana
Cc: stable@vger.kernel.org, Yevgeny Kliteynik, Itamar Gozlan,
Mark Bloch, Tariq Toukan, Jakub Kicinski, Sasha Levin
Hello Greg,

Thank you for your response. We are still learning the process and
figuring out how to send patches with git send-email.

This patch targets the 6.12 kernel and backports the mlx5 change from
6.13. It is based on upstream commit
1ce840c7a659aa53a31ef49f0271b4fd0dc10296. The only change we made was
updating the file path within the patch so that it applies. We have
already rolled this out in our kernel and tested it.

I forgot to add my full name; we will fix that. Sujana's name is
correct. I apologize for any confusion we may have caused.

Best regards,
akendo

On 9/3/25 11:41 AM, Greg KH wrote:
> On Wed, Sep 03, 2025 at 08:40:13AM +0000, Subramaniam, Sujana wrote:
>> From: SujanaSubr <sujana.subramaniam@sap.com>
>>
>> [ Upstream commit 1ce840c7a659aa53a31ef49f0271b4fd0dc10296 ]
>>
>> Currently, when firmware failure occurs during matcher disconnect flow,
>> the error flow of the function reconnects the matcher back and returns
>> an error, which continues running the calling function and eventually
>> frees the matcher that is being disconnected.
>> This leads to a case where we have a freed matcher on the matchers list,
>> which in turn leads to use-after-free and eventual crash.
>>
>> This patch fixes that by not trying to reconnect the matcher back when
>> some FW command fails during disconnect.
>>
>> Note that we're dealing here with FW error. We can't overcome this
>> problem. This might lead to bad steering state (e.g. wrong connection
>> between matchers), and will also lead to resource leakage, as it is
>> the case with any other error handling during resource destruction.
>>
>> However, the goal here is to allow the driver to continue and not crash
>> the machine with use-after-free error.
>>
>> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
>> Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
>> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> Sasha didn't sign off on this original commit, did they?
>
>> Signed-off-by: Akendo <akendo@akendo.eu>
>
> Real name?
>
>> Signed-off-by: SujanaSubr <sujana.subramaniam@sap.com>
>
> Correct name?
>
> What is this being sent for?
>
> totally confused,
>
> greg k-h
* Re: [PATCH] net/mlx5: HWS, change error flow on matcher disconnect
From: Greg KH @ 2025-09-03 12:31 UTC (permalink / raw)
To: akendo
Cc: Subramaniam, Sujana, stable@vger.kernel.org, Yevgeny Kliteynik,
Itamar Gozlan, Mark Bloch, Tariq Toukan, Jakub Kicinski,
Sasha Levin
A: http://en.wikipedia.org/wiki/Top_post
Q: Where do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
http://daringfireball.net/2007/07/on_top
On Wed, Sep 03, 2025 at 02:21:29PM +0200, akendo wrote:
> Hello Greg,
>
> Thank you for your response. We are still learning the process and figuring
> out how to send patches with git send-email.
>
> This patch targets the 6.12 kernel and backports the mlx5 change from 6.13.
> It is based on upstream commit 1ce840c7a659aa53a31ef49f0271b4fd0dc10296. The
> only change we made was updating the file path within the patch so that it
> applies. We have already rolled this out in our kernel and tested it.
>
> I forgot to add my full name; we will fix that. Sujana's name is correct.
> I apologize for any confusion we may have caused.

Great, please fix up and resend.

thanks,

greg k-h
* Re: [PATCH] net/mlx5: HWS, change error flow on matcher disconnect
From: akendo @ 2025-09-03 12:46 UTC (permalink / raw)
To: Greg KH, Subramaniam, Sujana
Cc: stable@vger.kernel.org, Yevgeny Kliteynik, Itamar Gozlan,
Mark Bloch, Tariq Toukan, Jakub Kicinski, Sasha Levin
Hello Greg,

Thank you for your response. We are still learning the process and
figuring out how to send patches with git send-email.

This patch targets the 6.12 kernel and backports the mlx5 change from
6.13. It is based on upstream commit
1ce840c7a659aa53a31ef49f0271b4fd0dc10296. The only change we made was
updating the file path within the patch so that it applies. We have
already rolled this out in our kernel and tested it.

I forgot to add my full name; we will fix that. Sujana's name is
correct. I apologize for any confusion we may have caused.

Best regards,
akendo