* [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
From: Vladimir Oltean @ 2026-02-05 19:23 UTC
To: netdev
Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
There is a crash when unbinding the sja1105 driver under special
circumstances:
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
  Call trace:
   phylink_run_resolve_and_disable+0x10/0x90
   sja1105_static_config_reload+0xc0/0x410
   sja1105_vlan_filtering+0x100/0x140
   dsa_port_vlan_filtering+0x13c/0x368
   dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
   dsa_port_bridge_leave+0x130/0x248
   dsa_user_changeupper.part.0+0x74/0x158
   dsa_user_netdevice_event+0x50c/0xa50
   notifier_call_chain+0x78/0x148
   raw_notifier_call_chain+0x20/0x38
   call_netdevice_notifiers_info+0x58/0xa8
   __netdev_upper_dev_unlink+0xac/0x220
   netdev_upper_dev_unlink+0x38/0x70
   del_nbp+0x1a4/0x320
   br_del_if+0x3c/0xd8
   br_device_event+0xf8/0x2d8
   notifier_call_chain+0x78/0x148
   raw_notifier_call_chain+0x20/0x38
   call_netdevice_notifiers_info+0x58/0xa8
   unregister_netdevice_many_notify+0x314/0x848
   unregister_netdevice_queue+0xe8/0xf8
   dsa_user_destroy+0x50/0xa8
   dsa_port_teardown+0x80/0x98
   dsa_switch_teardown_ports+0x4c/0xb8
   dsa_switch_deinit+0x94/0xb8
   dsa_switch_put_tree+0x2c/0xc0
   dsa_unregister_switch+0x38/0x60
   sja1105_remove+0x24/0x40
   spi_remove+0x38/0x60
   device_remove+0x54/0x90
   device_release_driver_internal+0x1d4/0x230
   device_driver_detach+0x20/0x38
   unbind_store+0xbc/0xc8
  ---[ end trace 0000000000000000 ]---
which requires an explanation.
When a port offloads a bridge, the switch must be reset to change
the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
sja1105_static_config_reload()). When the port leaves a VLAN-aware
bridge, it must also be reset for the same reason: it is returning
to operation as a VLAN-unaware standalone port.
sja1105_static_config_reload() triggers the phylink link replay helpers.
Because sja1105 is a switch, it has multiple user ports. During unbind,
ports are torn down one by one in dsa_switch_teardown_ports() ->
dsa_port_teardown() -> dsa_user_destroy().
The crash happens when the numerically first user port is not part of
the VLAN-aware bridge, but any other user port is.
Tearing down the first user port causes phylink_destroy() to be called
on dp->pl, and the pointer to be set to NULL. Then the second user port
is torn down. Because it was offloading a VLAN-aware bridge port, its
teardown indirectly triggers sja1105_static_config_reload().
The latter function iterates using dsa_switch_for_each_available_port(),
and unconditionally dereferences dp->pl, including for the
aforementioned torn down previous port, and passes that to phylink.
This is where the NULL pointer is coming from.
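
The sequence above can be modelled in plain user-space C. This is only a
sketch under stated assumptions: struct model_phylink, struct model_port,
model_teardown_port() and model_first_null_pl() are hypothetical names
invented for illustration and are not DSA or phylink APIs. It shows that
a port which was already torn down is still visited by a later iteration,
now with a NULL phylink pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct phylink and struct dsa_port. */
struct model_phylink {
	int disable_state;
};

struct model_port {
	int index;
	struct model_phylink *pl;	/* NULL once the port is torn down */
};

/*
 * Models the teardown of one port: the phylink instance goes away and
 * the pointer is cleared, but the port itself stays in the array.
 */
static void model_teardown_port(struct model_port *dp)
{
	dp->pl = NULL;
}

/*
 * Models the reload loop visiting every port. Returns the index of the
 * first port whose pl is NULL, i.e. the one an unconditional dereference
 * would fault on, or -1 if every port still has an instance.
 */
static int model_first_null_pl(const struct model_port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!ports[i].pl)
			return ports[i].index;
	return -1;
}
```

With two ports, calling model_teardown_port() on port 0 and then running
model_first_null_pl() reports port 0, which matches the ordering
described above: the offending pointer belongs to the port torn down
first, not to the port currently leaving the bridge.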
There are multiple levels at which this could be avoided:
1. add an "if (dp->pl)" in sja1105_static_config_reload()
2. make the phylink replay helpers NULL-tolerant
3. mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
   has run, such that subsequent dsa_switch_for_each_available_port()
   iterations skip them
4. disconnect the entire switch at once from switchdev and
   NETDEV_CHANGEUPPER events while unbinding, not just port by port,
   likely using a "ds->unbinding = true" mechanism or similar
However, options 3 and 4 are quite heavy and might have side effects.
Option 1 is very unassuming, and option 2 seems a more elegant variant
of option 1, given that sja1105 is the only user of these phylink
replay helpers. Option 2 keeps the driver simple and is the one I went
with.
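
The difference between options 1 and 2 can be sketched in user-space C.
All names here (struct model_phylink, model_replay_begin_strict(),
model_replay_begin_tolerant(), model_reload_option1(),
model_reload_option2()) are hypothetical and exist only for this
illustration; the real helpers and the real driver loop look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct phylink and struct dsa_port. */
struct model_phylink {
	bool replay_disabled;
};

struct model_port {
	struct model_phylink *pl;	/* NULL after teardown */
};

/* Strict helper: requires a valid instance, like the unpatched API. */
static void model_replay_begin_strict(struct model_phylink *pl)
{
	pl->replay_disabled = true;
}

/* Option 2: the helper itself tolerates NULL and becomes a no-op. */
static void model_replay_begin_tolerant(struct model_phylink *pl)
{
	if (!pl)
		return;

	model_replay_begin_strict(pl);
}

/*
 * Option 1: the caller skips ports without an instance, so the strict
 * helper never sees NULL. Returns how many ports had replay started.
 */
static size_t model_reload_option1(struct model_port *ports, size_t n)
{
	size_t started = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ports[i].pl)
			continue;

		model_replay_begin_strict(ports[i].pl);
		started++;
	}

	return started;
}

/* Option 2 caller: every port is passed to the helper unconditionally. */
static void model_reload_option2(struct model_port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_replay_begin_tolerant(ports[i].pl);
}
```

Both variants avoid the NULL dereference in this model. The design
question is only where the knowledge of "this port has no instance"
lives: in the caller (option 1) or in the helper (option 2).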
Functionally speaking, transforming the replay helpers into no-ops for
ports without a phylink instance is fine, because that only happens
during driver removal (an operation which cannot be cancelled). The
ports are not required to work.
Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
drivers/net/phy/phylink.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index e1f01d7fc4da..c09c357d0fbd 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -4379,6 +4379,9 @@ void phylink_replay_link_begin(struct phylink *pl)
 {
 	ASSERT_RTNL();
 
+	if (!pl)
+		return;
+
 	phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_REPLAY);
 }
 EXPORT_SYMBOL_GPL(phylink_replay_link_begin);
@@ -4402,6 +4405,9 @@ void phylink_replay_link_end(struct phylink *pl)
 {
 	ASSERT_RTNL();
 
+	if (!pl)
+		return;
+
 	if (WARN(!test_bit(PHYLINK_DISABLE_REPLAY,
 			   &pl->phylink_disable_state),
 		 "phylink_replay_link_end() called without a prior phylink_replay_link_begin()\n"))
--
2.34.1
* Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
From: Paolo Abeni @ 2026-02-17 8:22 UTC
To: Vladimir Oltean, netdev
Cc: Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
Eric Dumazet, Jakub Kicinski, linux-kernel
I think this patch could land on current net, but it would be nice to
have an ack from the phylib SMEs.
Thanks,
Paolo
* Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
From: Andrew Lunn @ 2026-02-17 13:51 UTC
To: Paolo Abeni
Cc: Vladimir Oltean, netdev, Heiner Kallweit, Russell King,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-kernel
On Tue, Feb 17, 2026 at 09:22:25AM +0100, Paolo Abeni wrote:
> I think this patch could land on current net, but it would be nice to
> have an ack from the phylib SMEs.
Sorry, weekend away.
I prefer option 1. I _think_ option 2 only works because the MAC
driver set dp->pl to NULL. phylink is not responsible for the NULL, so
it seems odd for phylink to assume there is a NULL. Only the MAC
driver knows if the MAC driver has set dp->pl to NULL.
Andrew
* Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
From: Russell King (Oracle) @ 2026-02-17 15:48 UTC
To: Andrew Lunn
Cc: Paolo Abeni, Vladimir Oltean, netdev, Heiner Kallweit,
David S. Miller, Eric Dumazet, Jakub Kicinski, linux-kernel
On Tue, Feb 17, 2026 at 02:51:25PM +0100, Andrew Lunn wrote:
> Sorry, weekend away.
>
> I prefer option 1. I _think_ option 2 only works because the MAC
> driver set dp->pl to NULL. phylink is not responsible for the NULL, so
> it seems odd for phylink to assume there is a NULL. Only the MAC
> driver knows if the MAC driver has set dp->pl to NULL.
I also prefer option 1. Currently, none of phylink's driver API permits
struct phylink to be NULL, and I'd prefer that to remain the case for
consistency.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!