public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
@ 2026-02-18 16:05 Vladimir Oltean
  2026-02-18 16:08 ` Russell King (Oracle)
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Vladimir Oltean @ 2026-02-18 16:05 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

There is a crash when unbinding the sja1105 driver under special
circumstances:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
Call trace:
phylink_run_resolve_and_disable+0x10/0x90
sja1105_static_config_reload+0xc0/0x410
sja1105_vlan_filtering+0x100/0x140
dsa_port_vlan_filtering+0x13c/0x368
dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
dsa_port_bridge_leave+0x130/0x248
dsa_user_changeupper.part.0+0x74/0x158
dsa_user_netdevice_event+0x50c/0xa50
notifier_call_chain+0x78/0x148
raw_notifier_call_chain+0x20/0x38
call_netdevice_notifiers_info+0x58/0xa8
__netdev_upper_dev_unlink+0xac/0x220
netdev_upper_dev_unlink+0x38/0x70
del_nbp+0x1a4/0x320
br_del_if+0x3c/0xd8
br_device_event+0xf8/0x2d8
notifier_call_chain+0x78/0x148
raw_notifier_call_chain+0x20/0x38
call_netdevice_notifiers_info+0x58/0xa8
unregister_netdevice_many_notify+0x314/0x848
unregister_netdevice_queue+0xe8/0xf8
dsa_user_destroy+0x50/0xa8
dsa_port_teardown+0x80/0x98
dsa_switch_teardown_ports+0x4c/0xb8
dsa_switch_deinit+0x94/0xb8
dsa_switch_put_tree+0x2c/0xc0
dsa_unregister_switch+0x38/0x60
sja1105_remove+0x24/0x40
spi_remove+0x38/0x60
device_remove+0x54/0x90
device_release_driver_internal+0x1d4/0x230
device_driver_detach+0x20/0x38
unbind_store+0xbc/0xc8
---[ end trace 0000000000000000 ]---

which requires an explanation.

When a port offloads a bridge, the switch must be reset to change
the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
sja1105_static_config_reload()). When the port leaves a VLAN-aware
bridge, it must also be reset for the same reason: it is returning
to operation as a VLAN-unaware standalone port.

sja1105_static_config_reload() triggers the phylink link replay helpers.

Because sja1105 is a switch, it has multiple user ports. During unbind,
ports are torn down one by one in dsa_switch_teardown_ports() ->
dsa_port_teardown() -> dsa_user_destroy().

The crash happens when the first user port is not part of the VLAN-aware
bridge, but any other user port is.

Tearing down the first user port causes phylink_destroy() to be called
on dp->pl, and this pointer to be set to NULL. Then, when the second
user port is torn down, this was offloading a VLAN-aware bridge port, so
indirectly it will trigger sja1105_static_config_reload().

The latter function iterates using dsa_switch_for_each_available_port(),
and unconditionally dereferences dp->pl, including for the
aforementioned torn down previous port, and passes that to phylink.
This is where the NULL pointer is coming from.

There are multiple levels at which this could be avoided:
- add an "if (dp->pl)" in sja1105_static_config_reload()
- make the phylink replay helpers NULL-tolerant
- mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
  has run, such that subsequent dsa_switch_for_each_available_port()
  iterations skip them
- disconnect the entire switch at once from switchdev and
  NETDEV_CHANGEUPPER events while unbinding, not just port by port,
  likely using a "ds->unbinding = true" mechanism or similar

however options 3 and 4 are quite heavy and might have side effects.
Although 2 allows to keep the driver simpler, the phylink API it not
NULL-tolerant in general and is not responsible for the NULL pointer
(this is something done by dsa_port_phylink_destroy()). So I went
with 1.

Functionally speaking, skipping the replay helpers for ports without
a phylink instance is fine, because that only happens during driver
removal (an operation which cannot be cancelled). The ports are not
required to work (although they probably still will - untested
assumption - as long as we don't overwrite the last port speed with
SJA1105_SPEED_AUTO).

Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2: select option 1 instead of 2

v1 link:
https://lore.kernel.org/netdev/20260205192344.21797-1-vladimir.oltean@nxp.com/
---
 drivers/net/dsa/sja1105/sja1105_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 71a817c07a90..94d17b06da40 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -2278,6 +2278,12 @@ int sja1105_static_config_reload(struct sja1105_private *priv,
 	 * change it through the dynamic interface later.
 	 */
 	dsa_switch_for_each_available_port(dp, ds) {
+		/* May be called during unbind when we unoffload a VLAN-aware
+		 * bridge from port 1 while port 0 was already torn down
+		 */
+		if (!dp->pl)
+			continue;
+
 		phylink_replay_link_begin(dp->pl);
 		mac[dp->index].speed = priv->info->port_speed[SJA1105_SPEED_AUTO];
 	}
@@ -2334,7 +2340,8 @@ int sja1105_static_config_reload(struct sja1105_private *priv,
 	}
 
 	dsa_switch_for_each_available_port(dp, ds)
-		phylink_replay_link_end(dp->pl);
+		if (dp->pl)
+			phylink_replay_link_end(dp->pl);
 
 	rc = sja1105_reload_cbs(priv);
 	if (rc < 0)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
  2026-02-18 16:05 [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance Vladimir Oltean
@ 2026-02-18 16:08 ` Russell King (Oracle)
  2026-02-18 16:10   ` Vladimir Oltean
  2026-02-18 16:53 ` Andrew Lunn
  2026-02-19 23:00 ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 7+ messages in thread
From: Russell King (Oracle) @ 2026-02-18 16:08 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

On Wed, Feb 18, 2026 at 06:05:51PM +0200, Vladimir Oltean wrote:
> There are multiple levels at which this could be avoided:
> - add an "if (dp->pl)" in sja1105_static_config_reload()
> - make the phylink replay helpers NULL-tolerant
> - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
>   has run, such that subsequent dsa_switch_for_each_available_port()
>   iterations skip them
> - disconnect the entire switch at once from switchdev and
>   NETDEV_CHANGEUPPER events while unbinding, not just port by port,
>   likely using a "ds->unbinding = true" mechanism or similar
> 
> however options 3 and 4 are quite heavy and might have side effects.
> Although 2 allows to keep the driver simpler, the phylink API it not
> NULL-tolerant in general and is not responsible for the NULL pointer
> (this is something done by dsa_port_phylink_destroy()). So I went
> with 1.

^^^^^^^^

> v1->v2: select option 1 instead of 2

Given this, I think the paragraph above needs updating.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
  2026-02-18 16:08 ` Russell King (Oracle)
@ 2026-02-18 16:10   ` Vladimir Oltean
  2026-02-18 16:21     ` Russell King (Oracle)
  0 siblings, 1 reply; 7+ messages in thread
From: Vladimir Oltean @ 2026-02-18 16:10 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

On Wed, Feb 18, 2026 at 04:08:17PM +0000, Russell King (Oracle) wrote:
> On Wed, Feb 18, 2026 at 06:05:51PM +0200, Vladimir Oltean wrote:
> > There are multiple levels at which this could be avoided:
> > - add an "if (dp->pl)" in sja1105_static_config_reload()
> > - make the phylink replay helpers NULL-tolerant
> > - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
> >   has run, such that subsequent dsa_switch_for_each_available_port()
> >   iterations skip them
> > - disconnect the entire switch at once from switchdev and
> >   NETDEV_CHANGEUPPER events while unbinding, not just port by port,
> >   likely using a "ds->unbinding = true" mechanism or similar
> > 
> > however options 3 and 4 are quite heavy and might have side effects.
> > Although 2 allows to keep the driver simpler, the phylink API it not
> > NULL-tolerant in general and is not responsible for the NULL pointer
> > (this is something done by dsa_port_phylink_destroy()). So I went
> > with 1.
> 
> ^^^^^^^^
> 
> > v1->v2: select option 1 instead of 2
> 
> Given this, I think the paragraph above needs updating.

Sorry, I don't understand what needs updating.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
  2026-02-18 16:10   ` Vladimir Oltean
@ 2026-02-18 16:21     ` Russell King (Oracle)
  2026-02-18 16:39       ` Vladimir Oltean
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King (Oracle) @ 2026-02-18 16:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

On Wed, Feb 18, 2026 at 06:10:37PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 18, 2026 at 04:08:17PM +0000, Russell King (Oracle) wrote:
> > On Wed, Feb 18, 2026 at 06:05:51PM +0200, Vladimir Oltean wrote:
> > > There are multiple levels at which this could be avoided:
> > > - add an "if (dp->pl)" in sja1105_static_config_reload()
> > > - make the phylink replay helpers NULL-tolerant
> > > - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
> > >   has run, such that subsequent dsa_switch_for_each_available_port()
> > >   iterations skip them
> > > - disconnect the entire switch at once from switchdev and
> > >   NETDEV_CHANGEUPPER events while unbinding, not just port by port,
> > >   likely using a "ds->unbinding = true" mechanism or similar
> > > 
> > > however options 3 and 4 are quite heavy and might have side effects.
> > > Although 2 allows to keep the driver simpler, the phylink API it not
> > > NULL-tolerant in general and is not responsible for the NULL pointer
> > > (this is something done by dsa_port_phylink_destroy()). So I went
> > > with 1.
> > 
> > ^^^^^^^^
> > 
> > > v1->v2: select option 1 instead of 2
> > 
> > Given this, I think the paragraph above needs updating.
> 
> Sorry, I don't understand what needs updating.

Oh, you rearranged the options, which makes this changelog comment
wrong. You're still going with option 1, but you've swapped what was
option 1 and option 2.

So, the changelog comment should be:

v1->v2: implement option 2 in the original submission, swapping their
  order so it becomes option 1.

since "select option 1 instead of 2" is ambiguous (obviously) because
it doesn't state whether these refer to the initial patch or this
patch. I assumed they were referring to the options on this patch,
that the options hadn't changed, and thus "So, I went with option 1"
was an error.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
  2026-02-18 16:21     ` Russell King (Oracle)
@ 2026-02-18 16:39       ` Vladimir Oltean
  0 siblings, 0 replies; 7+ messages in thread
From: Vladimir Oltean @ 2026-02-18 16:39 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: netdev, Andrew Lunn, Heiner Kallweit, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

On Wed, Feb 18, 2026 at 04:21:53PM +0000, Russell King (Oracle) wrote:
> > > > v1->v2: select option 1 instead of 2
> > > 
> > > Given this, I think the paragraph above needs updating.
> > 
> > Sorry, I don't understand what needs updating.
> 
> Oh, you rearranged the options, which makes this changelog comment
> wrong. You're still going with option 1, but you've swapped what was
> option 1 and option 2.
> 
> So, the changelog comment should be:
> 
> v1->v2: implement option 2 in the original submission, swapping their
>   order so it becomes option 1.
> 
> since "select option 1 instead of 2" is ambiguous (obviously) because
> it doesn't state whether these refer to the initial patch or this
> patch. I assumed they were referring to the options on this patch,
> that the options hadn't changed, and thus "So, I went with option 1"
> was an error.

I didn't rearrange the options... They are still the same in v1 as well.
Why would have Andrew said "I prefer option 1", and you as well?
https://lore.kernel.org/netdev/877b4780-9f99-478c-82fa-a32817d72004@lunn.ch/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
  2026-02-18 16:05 [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance Vladimir Oltean
  2026-02-18 16:08 ` Russell King (Oracle)
@ 2026-02-18 16:53 ` Andrew Lunn
  2026-02-19 23:00 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 7+ messages in thread
From: Andrew Lunn @ 2026-02-18 16:53 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Heiner Kallweit, Russell King, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

> Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

For the code change:

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

I will leave open the discussion about the commit message.

    Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
  2026-02-18 16:05 [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance Vladimir Oltean
  2026-02-18 16:08 ` Russell King (Oracle)
  2026-02-18 16:53 ` Andrew Lunn
@ 2026-02-19 23:00 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-02-19 23:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni,
	linux-kernel

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 18 Feb 2026 18:05:51 +0200 you wrote:
> There is a crash when unbinding the sja1105 driver under special
> circumstances:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> Call trace:
> phylink_run_resolve_and_disable+0x10/0x90
> sja1105_static_config_reload+0xc0/0x410
> sja1105_vlan_filtering+0x100/0x140
> dsa_port_vlan_filtering+0x13c/0x368
> dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
> dsa_port_bridge_leave+0x130/0x248
> dsa_user_changeupper.part.0+0x74/0x158
> dsa_user_netdevice_event+0x50c/0xa50
> notifier_call_chain+0x78/0x148
> raw_notifier_call_chain+0x20/0x38
> call_netdevice_notifiers_info+0x58/0xa8
> __netdev_upper_dev_unlink+0xac/0x220
> netdev_upper_dev_unlink+0x38/0x70
> del_nbp+0x1a4/0x320
> br_del_if+0x3c/0xd8
> br_device_event+0xf8/0x2d8
> notifier_call_chain+0x78/0x148
> raw_notifier_call_chain+0x20/0x38
> call_netdevice_notifiers_info+0x58/0xa8
> unregister_netdevice_many_notify+0x314/0x848
> unregister_netdevice_queue+0xe8/0xf8
> dsa_user_destroy+0x50/0xa8
> dsa_port_teardown+0x80/0x98
> dsa_switch_teardown_ports+0x4c/0xb8
> dsa_switch_deinit+0x94/0xb8
> dsa_switch_put_tree+0x2c/0xc0
> dsa_unregister_switch+0x38/0x60
> sja1105_remove+0x24/0x40
> spi_remove+0x38/0x60
> device_remove+0x54/0x90
> device_release_driver_internal+0x1d4/0x230
> device_driver_detach+0x20/0x38
> unbind_store+0xbc/0xc8
> 
> [...]

Here is the summary with links:
  - [v2,net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance
    https://git.kernel.org/netdev/net/c/bfd264fbbbca

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-02-19 22:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-18 16:05 [PATCH v2 net] net: dsa: sja1105: protect link replay helpers against NULL phylink instance Vladimir Oltean
2026-02-18 16:08 ` Russell King (Oracle)
2026-02-18 16:10   ` Vladimir Oltean
2026-02-18 16:21     ` Russell King (Oracle)
2026-02-18 16:39       ` Vladimir Oltean
2026-02-18 16:53 ` Andrew Lunn
2026-02-19 23:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox