public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Andrew Lunn <andrew@lunn.ch>
Cc: Paolo Abeni <pabeni@redhat.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, Heiner Kallweit <hkallweit1@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
Date: Tue, 17 Feb 2026 15:48:24 +0000	[thread overview]
Message-ID: <aZSNyIhG6oTNSTa9@shell.armlinux.org.uk> (raw)
In-Reply-To: <877b4780-9f99-478c-82fa-a32817d72004@lunn.ch>

On Tue, Feb 17, 2026 at 02:51:25PM +0100, Andrew Lunn wrote:
> On Tue, Feb 17, 2026 at 09:22:25AM +0100, Paolo Abeni wrote:
> > On 2/5/26 8:23 PM, Vladimir Oltean wrote:
> > > There is a crash when unbinding the sja1105 driver under special
> > > circumstances:
> > > 
> > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> > > Call trace:
> > > phylink_run_resolve_and_disable+0x10/0x90
> > > sja1105_static_config_reload+0xc0/0x410
> > > sja1105_vlan_filtering+0x100/0x140
> > > dsa_port_vlan_filtering+0x13c/0x368
> > > dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
> > > dsa_port_bridge_leave+0x130/0x248
> > > dsa_user_changeupper.part.0+0x74/0x158
> > > dsa_user_netdevice_event+0x50c/0xa50
> > > notifier_call_chain+0x78/0x148
> > > raw_notifier_call_chain+0x20/0x38
> > > call_netdevice_notifiers_info+0x58/0xa8
> > > __netdev_upper_dev_unlink+0xac/0x220
> > > netdev_upper_dev_unlink+0x38/0x70
> > > del_nbp+0x1a4/0x320
> > > br_del_if+0x3c/0xd8
> > > br_device_event+0xf8/0x2d8
> > > notifier_call_chain+0x78/0x148
> > > raw_notifier_call_chain+0x20/0x38
> > > call_netdevice_notifiers_info+0x58/0xa8
> > > unregister_netdevice_many_notify+0x314/0x848
> > > unregister_netdevice_queue+0xe8/0xf8
> > > dsa_user_destroy+0x50/0xa8
> > > dsa_port_teardown+0x80/0x98
> > > dsa_switch_teardown_ports+0x4c/0xb8
> > > dsa_switch_deinit+0x94/0xb8
> > > dsa_switch_put_tree+0x2c/0xc0
> > > dsa_unregister_switch+0x38/0x60
> > > sja1105_remove+0x24/0x40
> > > spi_remove+0x38/0x60
> > > device_remove+0x54/0x90
> > > device_release_driver_internal+0x1d4/0x230
> > > device_driver_detach+0x20/0x38
> > > unbind_store+0xbc/0xc8
> > > ---[ end trace 0000000000000000 ]---
> > > 
> > > which requires an explanation.
> > > 
> > > When a port offloads a bridge, the switch must be reset to change
> > > the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
> > > sja1105_static_config_reload()). When the port leaves a VLAN-aware
> > > bridge, it must also be reset for the same reason: it is returning
> > > to operation as a VLAN-unaware standalone port.
> > > 
> > > sja1105_static_config_reload() triggers the phylink link replay helpers.
> > > 
> > > Because sja1105 is a switch, it has multiple user ports. During unbind,
> > > ports are torn down one by one in dsa_switch_teardown_ports() ->
> > > dsa_port_teardown() -> dsa_user_destroy().
> > > 
> > > The crash happens when the numerically first user port is not part of
> > > the VLAN-aware bridge, but any other user port is.
> > > 
> > > Tearing down the first user port causes phylink_destroy() to be called
> > > on dp->pl, and this pointer to be set to NULL. Then, when the second
> > > user port is torn down, this was offloading a VLAN-aware bridge port, so
> > > indirectly it will trigger sja1105_static_config_reload().
> > > 
> > > The latter function iterates using dsa_switch_for_each_available_port(),
> > > and unconditionally dereferences dp->pl, including for the
> > > aforementioned torn down previous port, and passes that to phylink.
> > > This is where the NULL pointer is coming from.
> > > 
> > > There are multiple levels at which this could be avoided:
> > > - add an "if (dp->pl)" in sja1105_static_config_reload()
> > > - make the phylink replay helpers NULL-tolerant
> > > - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
> > >   has run, such that subsequent dsa_switch_for_each_available_port()
> > >   iterations skip them
> > > - disconnect the entire switch at once from switchdev and
> > >   NETDEV_CHANGEUPPER events while unbinding, not just port by port,
> > >   likely using a "ds->unbinding = true" mechanism or similar
> > > 
> > > however options 3 and 4 are quite heavy and might have side effects,
> > > option 1 is very unassuming and option 2 seems a more elegant variant
> > > of 1, given the fact that sja1105 is the only user of these phylink
> > > replay helpers. It allows to keep the driver simple and is the option
> > > I went with.
> > > 
> > > Functionally speaking, transforming the replay helpers into no-ops for
> > > ports without a phylink instance is fine, because that only happens
> > > during driver removal (an operation which cannot be cancelled). The
> > > ports are not required to work.
> > > 
> > > Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
> > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > I think this patch could land on current net, but it would be nice an
> > ack from phylib SMEs.
> 
> Sorry, weekend away.
> 
> I prefer option 1. I _think_ option 2 only works because the MAC
> driver set dp->pl to NULL. phylink is not responsible for the NULL, so
> it seems odd for phylink to assume there is a NULL. Only the MAC
> driver knows if the MAC driver has set dp->pl to NULL.

I also prefer option 1. Currently, none of phylink's driver API permits
struct phylink to be NULL, and I'd prefer that to remain the case for
consistency.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

      reply	other threads:[~2026-02-17 15:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05 19:23 [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance Vladimir Oltean
2026-02-17  8:22 ` Paolo Abeni
2026-02-17 13:51   ` Andrew Lunn
2026-02-17 15:48     ` Russell King (Oracle) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZSNyIhG6oTNSTa9@shell.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=vladimir.oltean@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox