netdev.vger.kernel.org archive mirror
* [PATCH net-next v1 1/1] mlxbf_gige: Fix kernel panic at shutdown
From: Asmaa Mnebhi @ 2023-06-02 18:24 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni
  Cc: Asmaa Mnebhi, netdev, cai.huoqing, brgl, chenhao288,
	huangguangbin2, David Thompson

There is a race condition during shutdown caused by pending NAPI transactions.
Because mlxbf_gige_poll is still running, it dereferences a NULL pointer and
causes a kernel panic.
To fix this, invoke mlxbf_gige_remove() during shutdown to disable and dequeue NAPI.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 694de9513b9f..7017f14595db 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -485,10 +485,7 @@ static int mlxbf_gige_remove(struct platform_device *pdev)
 
 static void mlxbf_gige_shutdown(struct platform_device *pdev)
 {
-	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
-
-	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
-	mlxbf_gige_clean_port(priv);
+	mlxbf_gige_remove(pdev);
 }
 
 static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
-- 
2.30.1

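The ordering problem described in the commit message can be modeled in plain userspace C. This is a minimal single-threaded sketch, assuming the panic comes from the poll routine dereferencing RX state that shutdown has already released; every identifier (`model_priv`, `model_init`, `model_poll`, `model_teardown`) is hypothetical and only stands in for the driver's NAPI poll routine and its resources. In the kernel, `napi_disable()` additionally waits for an in-flight poll to finish, which a single-threaded model cannot show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver private data; NOT the real
 * struct mlxbf_gige. */
struct model_priv {
	bool napi_enabled; /* models NAPI being enabled (schedulable) */
	int *rx_ring;      /* models the RX resources the poll routine reads */
};

/* Allocate the modeled RX ring and enable "NAPI". Returns 0 on success. */
static int model_init(struct model_priv *priv)
{
	priv->rx_ring = malloc(sizeof(*priv->rx_ring));
	if (!priv->rx_ring)
		return -1;
	priv->rx_ring[0] = 42; /* arbitrary sample "packet" value */
	priv->napi_enabled = true;
	return 0;
}

/* Models the poll callback: it only runs while NAPI is enabled and it
 * dereferences the RX ring, so the ring must outlive any live poll. */
static int model_poll(struct model_priv *priv)
{
	if (!priv->napi_enabled)
		return -1; /* no poll can run once NAPI is disabled */
	assert(priv->rx_ring != NULL); /* would fire if the ring were freed under a live poll */
	return priv->rx_ring[0];
}

/* Teardown in the safe order: quiesce the poller first (the role
 * napi_disable() plays in the kernel), then free what it uses. */
static void model_teardown(struct model_priv *priv)
{
	priv->napi_enabled = false;
	free(priv->rx_ring);
	priv->rx_ring = NULL;
}
```

After `model_init()` a poll returns the sample value; after `model_teardown()` a poll attempt is refused instead of touching the freed ring. Swapping the two statements at the top of `model_teardown()` would reintroduce the hazard the patch targets.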


* Re: [PATCH net-next v1 1/1] mlxbf_gige: Fix kernel panic at shutdown
From: Jakub Kicinski @ 2023-06-05 23:15 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: davem, edumazet, pabeni, netdev, cai.huoqing, brgl, chenhao288,
	huangguangbin2, David Thompson

On Fri, 2 Jun 2023 14:24:43 -0400 Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending napi transactions.
> Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a
> result causes a kernel panic.
> To fix this during shutdown, invoke mlxbf_gige_remove to disable and dequeue napi.
> 
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>

Judging by the Fixes tag, the problem can already happen on 6.4-rc5,
right? So the tree in the [PATCH] tag should have been net rather
than net-next?

https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#git-trees-and-patch-flow

No need to repost; confirmation is enough.


* Re: [PATCH net-next v1 1/1] mlxbf_gige: Fix kernel panic at shutdown
From: Paolo Abeni @ 2023-06-06 10:47 UTC (permalink / raw)
  To: Asmaa Mnebhi, davem, edumazet, kuba
  Cc: netdev, cai.huoqing, brgl, chenhao288, huangguangbin2,
	David Thompson

On Fri, 2023-06-02 at 14:24 -0400, Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending napi transactions.
> Since mlxbf_gige_poll is still running, it tries to access a NULL pointer and as a
> result causes a kernel panic.
> To fix this during shutdown, invoke mlxbf_gige_remove to disable and dequeue napi.
> 
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> index 694de9513b9f..7017f14595db 100644
> --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
> @@ -485,10 +485,7 @@ static int mlxbf_gige_remove(struct platform_device *pdev)
>  
>  static void mlxbf_gige_shutdown(struct platform_device *pdev)
>  {
> -	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
> -
> -	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
> -	mlxbf_gige_clean_port(priv);
> +	mlxbf_gige_remove(pdev);
>  }
>  
>  static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {

If the device goes through both shutdown() and remove(), the netdevice
will go through unregister_netdevice() twice, which is wrong. Am I
missing something relevant?

Thanks!

Paolo



* RE: [PATCH net-next v1 1/1] mlxbf_gige: Fix kernel panic at shutdown
From: Asmaa Mnebhi @ 2023-06-06 12:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	netdev@vger.kernel.org, cai.huoqing@linux.dev, brgl@bgdev.pl,
	chenhao288@hisilicon.com, huangguangbin2@huawei.com,
	David Thompson

Hi Jakub,

Yes indeed. Thank you!

Best,
Asmaa
> 
> Judging by the Fixes tag the problem can happen on 6.4-rc5 already, right? So
> the tree in the [PATCH ] tag should have been net rather than net-next?
> 
> https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#git-trees-and-patch-flow
> 
> No need to repost confirmation is enough.


* Re: [PATCH net-next v1 1/1] mlxbf_gige: Fix kernel panic at shutdown
From: Jakub Kicinski @ 2023-06-06 17:29 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Asmaa Mnebhi, davem, edumazet, netdev, cai.huoqing, brgl,
	chenhao288, huangguangbin2, David Thompson

On Tue, 06 Jun 2023 12:47:09 +0200 Paolo Abeni wrote:
> >  static void mlxbf_gige_shutdown(struct platform_device *pdev)
> >  {
> > -	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
> > -
> > -	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
> > -	mlxbf_gige_clean_port(priv);
> > +	mlxbf_gige_remove(pdev);
> >  }
> >  
> >  static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {  
> 
> if the device goes through both shutdown() and remove(), the netdevice
> will go through unregister_netdevice() 2 times, which is wrong. Am I
> missing something relevant?

Good point, mlxbf_gige_remove() needs to check that the priv pointer
is not NULL.
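The suggestion can be sketched as an idempotent remove(). This is a userspace model under stated assumptions: the `model_*` names are hypothetical stand-ins for `platform_get_drvdata()`, `platform_set_drvdata()` and `unregister_netdev()`, and the sketch assumes remove() clears the driver data once it has torn the device down, because a NULL check alone only helps if something sets the pointer to NULL after the first teardown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct platform_device and the driver's
 * private data; NOT the real kernel types. */
struct model_pdev {
	void *drvdata;
};

struct model_gige {
	int registered; /* 1 while the modeled netdev is registered */
};

static int unregister_calls; /* counts modeled unregister_netdev() calls */

static void model_set_drvdata(struct model_pdev *pdev, void *data)
{
	pdev->drvdata = data;
}

static void *model_get_drvdata(struct model_pdev *pdev)
{
	return pdev->drvdata;
}

/* Models unregister_netdev(): unregistering twice is the bug Paolo
 * describes, so assert that it never happens. */
static void model_unregister_netdev(struct model_gige *priv)
{
	assert(priv->registered);
	priv->registered = 0;
	unregister_calls++;
}

static int model_remove(struct model_pdev *pdev)
{
	struct model_gige *priv = model_get_drvdata(pdev);

	if (!priv) /* already torn down, e.g. by shutdown() */
		return 0;

	model_unregister_netdev(priv);
	model_set_drvdata(pdev, NULL); /* mark teardown as done */
	return 0;
}

/* shutdown() reuses remove(), as in the patch under review. */
static void model_shutdown(struct model_pdev *pdev)
{
	model_remove(pdev);
}
```

Calling `model_shutdown()` and then `model_remove()` on the same device unregisters the modeled netdev exactly once; the second call returns early because the driver data was cleared.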
-- 
pw-bot: cr


* RE: [PATCH net-next v1 1/1] mlxbf_gige: Fix kernel panic at shutdown
From: Asmaa Mnebhi @ 2023-06-07 13:54 UTC (permalink / raw)
  To: Jakub Kicinski, Paolo Abeni
  Cc: davem@davemloft.net, edumazet@google.com, netdev@vger.kernel.org,
	cai.huoqing@linux.dev, brgl@bgdev.pl, chenhao288@hisilicon.com,
	huangguangbin2@huawei.com, David Thompson

> > >  static void mlxbf_gige_shutdown(struct platform_device *pdev)
> > >  {
> > > -	struct mlxbf_gige *priv = platform_get_drvdata(pdev);
> > > -
> > > -	writeq(0, priv->base + MLXBF_GIGE_INT_EN);
> > > -	mlxbf_gige_clean_port(priv);
> > > +	mlxbf_gige_remove(pdev);
> > >  }
> > >
> > >  static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
> >
> > if the device goes through both shutdown() and remove(), the netdevice
> > will go through unregister_netdevice() 2 times, which is wrong. Am I
> > missing something relevant?
> 
> Good point, mlxbf_gige_remove() needs to check that the priv pointer is not
> NULL.

Thank you all for your feedback. I will fix it shortly and also retarget the patch from net-next to net.
