* [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
@ 2023-07-21 14:19 Asmaa Mnebhi
2023-07-21 15:02 ` Vladimir Oltean
2023-07-26 16:27 ` Jakub Kicinski
0 siblings, 2 replies; 8+ messages in thread
From: Asmaa Mnebhi @ 2023-07-21 14:19 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni
Cc: Asmaa Mnebhi, netdev, sridhar.samudrala, maciej.fijalkowski,
olteanv, davthompson
There is a race condition happening during shutdown due to pending
napi transactions. Since mlxbf_gige_poll is still running, it tries
to access a NULL pointer and as a result causes a kernel panic.
To fix this during shutdown, invoke mlxbf_gige_remove to disable and
dequeue napi.
Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
---
v3-v4:
- replace .remove with .remove_new to be able to return a void to
.shutdown as well.
v2-v3:
- remove mlxbf_gige_shutdown() since it is redundant with mlxbf_gige_remove().
v1->v2:
- fixed the tag from net-next to net
- check that the "priv" pointer is not NULL in mlxbf_gige_remove()
.../mellanox/mlxbf_gige/mlxbf_gige_main.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
index 694de9513b9f..5e677bd32956 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
@@ -471,24 +471,19 @@ static int mlxbf_gige_probe(struct platform_device *pdev)
return err;
}
-static int mlxbf_gige_remove(struct platform_device *pdev)
+static void mlxbf_gige_remove(struct platform_device *pdev)
{
struct mlxbf_gige *priv = platform_get_drvdata(pdev);
+ if (!priv)
+ return;
+
unregister_netdev(priv->netdev);
phy_disconnect(priv->netdev->phydev);
mlxbf_gige_mdio_remove(priv);
platform_set_drvdata(pdev, NULL);
- return 0;
-}
-
-static void mlxbf_gige_shutdown(struct platform_device *pdev)
-{
- struct mlxbf_gige *priv = platform_get_drvdata(pdev);
-
- writeq(0, priv->base + MLXBF_GIGE_INT_EN);
- mlxbf_gige_clean_port(priv);
+ return;
}
static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
@@ -499,8 +494,8 @@ MODULE_DEVICE_TABLE(acpi, mlxbf_gige_acpi_match);
static struct platform_driver mlxbf_gige_driver = {
.probe = mlxbf_gige_probe,
- .remove = mlxbf_gige_remove,
- .shutdown = mlxbf_gige_shutdown,
+ .remove_new = mlxbf_gige_remove,
+ .shutdown = mlxbf_gige_remove,
.driver = {
.name = KBUILD_MODNAME,
.acpi_match_table = ACPI_PTR(mlxbf_gige_acpi_match),
--
2.30.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-21 14:19 [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
@ 2023-07-21 15:02 ` Vladimir Oltean
2023-07-21 19:06 ` Asmaa Mnebhi
2023-07-26 16:27 ` Jakub Kicinski
1 sibling, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2023-07-21 15:02 UTC (permalink / raw)
To: Asmaa Mnebhi
Cc: davem, edumazet, kuba, pabeni, netdev, sridhar.samudrala,
maciej.fijalkowski, davthompson
Hi Asmaa,
On Fri, Jul 21, 2023 at 10:19:56AM -0400, Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending
> napi transactions. Since mlxbf_gige_poll is still running, it tries
> to access a NULL pointer and as a result causes a kernel panic.
> To fix this during shutdown, invoke mlxbf_gige_remove to disable and
> dequeue napi.
>
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> ---
What is the race condition; what does the stack trace of the NPD look like?
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-21 15:02 ` Vladimir Oltean
@ 2023-07-21 19:06 ` Asmaa Mnebhi
2023-07-21 22:32 ` Vladimir Oltean
0 siblings, 1 reply; 8+ messages in thread
From: Asmaa Mnebhi @ 2023-07-21 19:06 UTC (permalink / raw)
To: Vladimir Oltean
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
sridhar.samudrala@intel.com, maciej.fijalkowski@intel.com,
David Thompson
> What is the race condition; what does the stack trace of the NPD look like?
Hi Vladimir,
[ OK ] Reached target Shutdown.
[ OK ] Reached target Final Step.
[ OK ] Started Reboot.
[ OK ] Reached target Reboot.
...
[ 285.126250] mlxbf_gige MLNXBF17:00: shutdown
[ 285.130669] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
[ 285.139447] Mem abort info:
[ 285.142228] ESR = 0x0000000096000004
[ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
[ 285.151261] SET = 0, FnV = 0
[ 285.154303] EA = 0, S1PTW = 0
[ 285.157430] FSC = 0x04: level 0 translation fault
[ 285.162293] Data abort info:
[ 285.165159] ISV = 0, ISS = 0x00000004
[ 285.168980] CM = 0, WnR = 0
[ 285.171932] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
[ 285.178358] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
[ 285.185134] Internal error: Oops: 96000004 [#1] SMP
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-21 19:06 ` Asmaa Mnebhi
@ 2023-07-21 22:32 ` Vladimir Oltean
2023-07-21 23:27 ` Asmaa Mnebhi
0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Oltean @ 2023-07-21 22:32 UTC (permalink / raw)
To: Asmaa Mnebhi
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
sridhar.samudrala@intel.com, maciej.fijalkowski@intel.com,
David Thompson
Hi Asmaa,
On Fri, Jul 21, 2023 at 07:06:03PM +0000, Asmaa Mnebhi wrote:
> > What is the race condition; what does the stack trace of the NPD look like?
>
> Hi Vladimir,
>
> [ OK ] Reached target Shutdown.
> [ OK ] Reached target Final Step.
> [ OK ] Started Reboot.
> [ OK ] Reached target Reboot.
> ...
> [ 285.126250] mlxbf_gige MLNXBF17:00: shutdown
> [ 285.130669] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070
> [ 285.139447] Mem abort info:
> [ 285.142228] ESR = 0x0000000096000004
> [ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 285.151261] SET = 0, FnV = 0
> [ 285.154303] EA = 0, S1PTW = 0
> [ 285.157430] FSC = 0x04: level 0 translation fault
> [ 285.162293] Data abort info:
> [ 285.165159] ISV = 0, ISS = 0x00000004
> [ 285.168980] CM = 0, WnR = 0
> [ 285.171932] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011d373000
> [ 285.178358] [0000000000000070] pgd=0000000000000000, p4d=0000000000000000
> [ 285.185134] Internal error: Oops: 96000004 [#1] SMP
That is not a stack trace. The stack trace is what appears between the
lines "Call trace:" and "---[ end trace 0000000000000000 ]---", and it
is missing from what you've posted.
It would be nice if you could also post-process the stack trace you get
through the command below (run in a kernel tree):
cat stack_trace | scripts/decode_stacktrace.sh path/to/vmlinux .
so that we could see the actual C code line numbers, and not just the
offsets within your particular binary image.
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-21 22:32 ` Vladimir Oltean
@ 2023-07-21 23:27 ` Asmaa Mnebhi
2023-07-21 23:29 ` Vladimir Oltean
0 siblings, 1 reply; 8+ messages in thread
From: Asmaa Mnebhi @ 2023-07-21 23:27 UTC (permalink / raw)
To: Vladimir Oltean
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
sridhar.samudrala@intel.com, maciej.fijalkowski@intel.com,
David Thompson
> > [ 285.126250] mlxbf_gige MLNXBF17:00: shutdown [ 285.130669] Unable
> > to handle kernel NULL pointer dereference at virtual address
> > 0000000000000070 [ 285.139447] Mem abort info:
> > [ 285.142228] ESR = 0x0000000096000004
> > [ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 285.151261] SET = 0, FnV = 0
> > [ 285.154303] EA = 0, S1PTW = 0
> > [ 285.157430] FSC = 0x04: level 0 translation fault
> > [ 285.162293] Data abort info:
> > [ 285.165159] ISV = 0, ISS = 0x00000004
> > [ 285.168980] CM = 0, WnR = 0
> > [ 285.171932] user pgtable: 4k pages, 48-bit VAs,
> > pgdp=000000011d373000 [ 285.178358] [0000000000000070]
> > pgd=0000000000000000, p4d=0000000000000000 [ 285.185134] Internal
> > error: Oops: 96000004 [#1] SMP
>
> That is not a stack trace. The stack trace is what appears between the lines "Call
> trace:" and "---[ end trace 0000000000000000 ]---", and it is missing from what
> you've posted.
>
> It would be nice if you could also post-process the stack trace you get through
> the command below (run in a kernel tree):
>
> cat stack_trace | scripts/decode_stacktrace.sh path/to/vmlinux .
>
> so that we could see the actual C code line numbers, and not just the offsets
> within your particular binary image.
Hi Vladimir,
Do you want me to just send it as a reply or send a v5 and add it to the commit message?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-21 23:27 ` Asmaa Mnebhi
@ 2023-07-21 23:29 ` Vladimir Oltean
0 siblings, 0 replies; 8+ messages in thread
From: Vladimir Oltean @ 2023-07-21 23:29 UTC (permalink / raw)
To: Asmaa Mnebhi
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
sridhar.samudrala@intel.com, maciej.fijalkowski@intel.com,
David Thompson
On Fri, Jul 21, 2023 at 11:27:10PM +0000, Asmaa Mnebhi wrote:
> > > [ 285.126250] mlxbf_gige MLNXBF17:00: shutdown [ 285.130669] Unable
> > > to handle kernel NULL pointer dereference at virtual address
> > > 0000000000000070 [ 285.139447] Mem abort info:
> > > [ 285.142228] ESR = 0x0000000096000004
> > > [ 285.145964] EC = 0x25: DABT (current EL), IL = 32 bits
> > > [ 285.151261] SET = 0, FnV = 0
> > > [ 285.154303] EA = 0, S1PTW = 0
> > > [ 285.157430] FSC = 0x04: level 0 translation fault
> > > [ 285.162293] Data abort info:
> > > [ 285.165159] ISV = 0, ISS = 0x00000004
> > > [ 285.168980] CM = 0, WnR = 0
> > > [ 285.171932] user pgtable: 4k pages, 48-bit VAs,
> > > pgdp=000000011d373000 [ 285.178358] [0000000000000070]
> > > pgd=0000000000000000, p4d=0000000000000000 [ 285.185134] Internal
> > > error: Oops: 96000004 [#1] SMP
> >
> > That is not a stack trace. The stack trace is what appears between the lines "Call
> > trace:" and "---[ end trace 0000000000000000 ]---", and it is missing from what
> > you've posted.
> >
> > It would be nice if you could also post-process the stack trace you get through
> > the command below (run in a kernel tree):
> >
> > cat stack_trace | scripts/decode_stacktrace.sh path/to/vmlinux .
> >
> > so that we could see the actual C code line numbers, and not just the offsets
> > within your particular binary image.
>
> Hi Vladimir,
>
> Do you want me to just send it as a reply or send a v5 and add it to the commit message?
As a reply, for now, please. I haven't requested any changes to the patch.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-21 14:19 [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
2023-07-21 15:02 ` Vladimir Oltean
@ 2023-07-26 16:27 ` Jakub Kicinski
2023-08-08 17:45 ` Asmaa Mnebhi
1 sibling, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2023-07-26 16:27 UTC (permalink / raw)
To: Asmaa Mnebhi
Cc: davem, edumazet, pabeni, netdev, sridhar.samudrala,
maciej.fijalkowski, olteanv, davthompson
On Fri, 21 Jul 2023 10:19:56 -0400 Asmaa Mnebhi wrote:
> There is a race condition happening during shutdown due to pending
> napi transactions. Since mlxbf_gige_poll is still running, it tries
> to access a NULL pointer and as a result causes a kernel panic.
> To fix this during shutdown, invoke mlxbf_gige_remove to disable and
> dequeue napi.
>
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Asmaa, you need to reply to Vladimir, I'm dropping this patch :(
--
pw-bot: defer
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown
2023-07-26 16:27 ` Jakub Kicinski
@ 2023-08-08 17:45 ` Asmaa Mnebhi
0 siblings, 0 replies; 8+ messages in thread
From: Asmaa Mnebhi @ 2023-08-08 17:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
netdev@vger.kernel.org, sridhar.samudrala@intel.com,
maciej.fijalkowski@intel.com, olteanv@gmail.com, David Thompson
> > Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> > Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
>
> Asmaa, you need to reply to Vladimir, I'm dropping this patch :(
> --
Hi Jakub,
Apologies for the delay. Actually this worked out for the best, because this change caused another issue on our systems.
I will submit a different patch once I test it more thoroughly and will attach the panic as requested previously.
Thanks again for your help and feedback.
Asmaa
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-08-08 18:02 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-21 14:19 [PATCH net v4 1/1] mlxbf_gige: Fix kernel panic at shutdown Asmaa Mnebhi
2023-07-21 15:02 ` Vladimir Oltean
2023-07-21 19:06 ` Asmaa Mnebhi
2023-07-21 22:32 ` Vladimir Oltean
2023-07-21 23:27 ` Asmaa Mnebhi
2023-07-21 23:29 ` Vladimir Oltean
2023-07-26 16:27 ` Jakub Kicinski
2023-08-08 17:45 ` Asmaa Mnebhi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).