All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Andrew Lunn <andrew@lunn.ch>
Cc: Paolo Abeni <pabeni@redhat.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, Heiner Kallweit <hkallweit1@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
Date: Tue, 17 Feb 2026 15:48:24 +0000	[thread overview]
Message-ID: <aZSNyIhG6oTNSTa9@shell.armlinux.org.uk> (raw)
In-Reply-To: <877b4780-9f99-478c-82fa-a32817d72004@lunn.ch>

On Tue, Feb 17, 2026 at 02:51:25PM +0100, Andrew Lunn wrote:
> On Tue, Feb 17, 2026 at 09:22:25AM +0100, Paolo Abeni wrote:
> > On 2/5/26 8:23 PM, Vladimir Oltean wrote:
> > > There is a crash when unbinding the sja1105 driver under special
> > > circumstances:
> > > 
> > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> > > Call trace:
> > > phylink_run_resolve_and_disable+0x10/0x90
> > > sja1105_static_config_reload+0xc0/0x410
> > > sja1105_vlan_filtering+0x100/0x140
> > > dsa_port_vlan_filtering+0x13c/0x368
> > > dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
> > > dsa_port_bridge_leave+0x130/0x248
> > > dsa_user_changeupper.part.0+0x74/0x158
> > > dsa_user_netdevice_event+0x50c/0xa50
> > > notifier_call_chain+0x78/0x148
> > > raw_notifier_call_chain+0x20/0x38
> > > call_netdevice_notifiers_info+0x58/0xa8
> > > __netdev_upper_dev_unlink+0xac/0x220
> > > netdev_upper_dev_unlink+0x38/0x70
> > > del_nbp+0x1a4/0x320
> > > br_del_if+0x3c/0xd8
> > > br_device_event+0xf8/0x2d8
> > > notifier_call_chain+0x78/0x148
> > > raw_notifier_call_chain+0x20/0x38
> > > call_netdevice_notifiers_info+0x58/0xa8
> > > unregister_netdevice_many_notify+0x314/0x848
> > > unregister_netdevice_queue+0xe8/0xf8
> > > dsa_user_destroy+0x50/0xa8
> > > dsa_port_teardown+0x80/0x98
> > > dsa_switch_teardown_ports+0x4c/0xb8
> > > dsa_switch_deinit+0x94/0xb8
> > > dsa_switch_put_tree+0x2c/0xc0
> > > dsa_unregister_switch+0x38/0x60
> > > sja1105_remove+0x24/0x40
> > > spi_remove+0x38/0x60
> > > device_remove+0x54/0x90
> > > device_release_driver_internal+0x1d4/0x230
> > > device_driver_detach+0x20/0x38
> > > unbind_store+0xbc/0xc8
> > > ---[ end trace 0000000000000000 ]---
> > > 
> > > which requires an explanation.
> > > 
> > > When a port offloads a bridge, the switch must be reset to change
> > > the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
> > > sja1105_static_config_reload()). When the port leaves a VLAN-aware
> > > bridge, it must also be reset for the same reason: it is returning
> > > to operation as a VLAN-unaware standalone port.
> > > 
> > > sja1105_static_config_reload() triggers the phylink link replay helpers.
> > > 
> > > Because sja1105 is a switch, it has multiple user ports. During unbind,
> > > ports are torn down one by one in dsa_switch_teardown_ports() ->
> > > dsa_port_teardown() -> dsa_user_destroy().
> > > 
> > > The crash happens when the numerically first user port is not part of
> > > the VLAN-aware bridge, but any other user port is.
> > > 
> > > Tearing down the first user port causes phylink_destroy() to be called
> > > on dp->pl, and this pointer to be set to NULL. Then, when the second
> > > user port is torn down, this was offloading a VLAN-aware bridge port, so
> > > indirectly it will trigger sja1105_static_config_reload().
> > > 
> > > The latter function iterates using dsa_switch_for_each_available_port(),
> > > and unconditionally dereferences dp->pl, including for the
> > > aforementioned torn down previous port, and passes that to phylink.
> > > This is where the NULL pointer is coming from.
> > > 
> > > There are multiple levels at which this could be avoided:
> > > - add an "if (dp->pl)" in sja1105_static_config_reload()
> > > - make the phylink replay helpers NULL-tolerant
> > > - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
> > >   has run, such that subsequent dsa_switch_for_each_available_port()
> > >   iterations skip them
> > > - disconnect the entire switch at once from switchdev and
> > >   NETDEV_CHANGEUPPER events while unbinding, not just port by port,
> > >   likely using a "ds->unbinding = true" mechanism or similar
> > > 
> > > however options 3 and 4 are quite heavy and might have side effects,
> > > option 1 is very unassuming and option 2 seems a more elegant variant
> > > of 1, given the fact that sja1105 is the only user of these phylink
> > > replay helpers. It allows to keep the driver simple and is the option
> > > I went with.
> > > 
> > > Functionally speaking, transforming the replay helpers into no-ops for
> > > ports without a phylink instance is fine, because that only happens
> > > during driver removal (an operation which cannot be cancelled). The
> > > ports are not required to work.
> > > 
> > > Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
> > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > I think this patch could land on current net, but it would be nice an
> > ack from phylib SMEs.
> 
> Sorry, weekend away.
> 
> I prefer option 1. I _think_ option 2 only works because the MAC
> driver set dp->pl to NULL. phylink is not responsible for the NULL, so
> it seems odd for phylink to assume there is a NULL. Only the MAC
> driver knows if the MAC driver has set dp->pl to NULL.

I also prefer option 1. Currently, none of phylink's driver API permits
struct phylink to be NULL, and I'd prefer that to remain the case for
consistency.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

      reply	other threads:[~2026-02-17 15:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-05 19:23 [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance Vladimir Oltean
2026-02-17  8:22 ` Paolo Abeni
2026-02-17 13:51   ` Andrew Lunn
2026-02-17 15:48     ` Russell King (Oracle) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZSNyIhG6oTNSTa9@shell.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=vladimir.oltean@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.