From: Vladimir Oltean <olteanv@gmail.com>
To: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
Andrew Lunn <andrew@lunn.ch>,
netdev@vger.kernel.org
Subject: Re: Fw: [Bug 220932] New: Possible bug (use after free) on DSA driver removal
Date: Sat, 3 Jan 2026 02:24:10 +0200
Message-ID: <20260103002410.brxrcajbnd2bpq5a@skbuf>
In-Reply-To: <20260102114605.3351c6eb@phoenix.local>

Hi Luiz,

On Fri, Jan 02, 2026 at 11:46:05AM -0800, Stephen Hemminger wrote:
>
>
> Begin forwarded message:
>
> Date: Thu, 01 Jan 2026 22:56:38 +0000
> From: bugzilla-daemon@kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 220932] New: Possible bug (use after free) on DSA driver removal
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=220932
>
> Bug ID: 220932
> Summary: Possible bug (use after free) on DSA driver removal
> Product: Networking
> Version: 2.5
> Hardware: Mips32
> OS: Linux
> Status: NEW
> Severity: normal
> Priority: P3
> Component: Other
> Assignee: stephen@networkplumber.org
> Reporter: luizluca@gmail.com
> Regression: No
>
> While testing a driver patch for OpenWrt (dev), I noticed that the system
> sometimes crashes a little after I remove the module. I dropped all my patches
> and bruteforce it:
>
>
> echo 'file drivers/net/dsa/realtek/rtl8365mb.c +p' >
> /sys/kernel/debug/dynamic_debug/control; echo 'file net/dsa/* +p' >
> /sys/kernel/debug/dynamic_debug/control; rmmod rtl8365mb; echo 0 >
> /proc/sys/kernel/panic; while true; do sleep 1; insmod /tmp/rtl8365mb.ko; sleep
> 10; rmmod rtl8365mb; done
>
>
> After a couple of cycles, I got this (repeatable) crash below.
> rtl8365mb_get_tag_protocol and rtl8365mb_port_stp_state_set messages are from a
> small debug patch I added trying to trace the crash origin but it should not
> matter.
>
>
> [ 469.884379] DSA: tree 0 torn down
> [ 471.094669] rtl8365mb-mdio mdio-bus:1d: found an RTL8367S switch
> [ 471.100980] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_get_tag_protocol priv:126ea59d
> [ 471.349018] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_stp_state_set priv:126ea59d
> [ 471.357364] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_stp_state_set priv:126ea59d
> [ 471.365716] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_stp_state_set priv:126ea59d
> [ 471.373964] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_stp_state_set priv:126ea59d
> [ 471.382228] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_stp_state_set priv:126ea59d
> [ 471.390503] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_stp_state_set priv:126ea59d
> [ 471.398580] rtl8365mb-mdio mdio-bus:1d: rtl8365mb_port_change_mtu priv:126ea59d
> [ 471.647590] mtk_soc_eth 10100000.ethernet eth0: port 5 link down
> [ 471.674092] CPU 0 Unable to handle kernel paging request at virtual address 702e7660, epc == 702e7660, ra == 80001e90
> [ 471.685048] Oops[#1]:
> [ 471.687381] CPU: 0 UID: 0 PID: 7473 Comm: modprobe Tainted: G O 6.12.60 #0
> [ 471.695837] Tainted: [O]=OOT_MODULE
> [ 471.699401] Hardware name: TP-Link Archer C5 v4
> [ 471.704029] $ 0 : 00000000 00000001 81c40560 80a63cdc
> [ 471.709403] $ 4 : 00000cc0 00000001 0004c50b 82ab2f00
> [ 471.714771] $ 8 : 0004c50c 00000cc0 00000000 77e89000
> [ 471.720139] $12 : 00000003 82b8dc0c 00000001 77e8afff
> [ 471.725508] $16 : 00001173 77e89000 7f958894 00400dc1
> [ 471.730877] $20 : 8383fbf8 77e903d0 00000000 7f958730
> [ 471.736246] $24 : 00000003 8084aba8
> [ 471.741613] $28 : 81c1c000 81c1df28 00000000 80001e90
> [ 471.746982] Hi : 00000000
> [ 471.749926] Lo : 00000000
> [ 471.752868] epc : 702e7660 0x702e7660
> [ 471.756798] ra : 80001e90 work_notifysig+0x10/0x18
> [ 471.761975] Status: 1100b403 KERNEL EXL IE
> [ 471.766269] Cause : 50800008 (ExcCode 02)
> [ 471.770366] BadVA : 702e7660
> [ 471.773309] PrId : 00019650 (MIPS 24KEc)
> [ 471.777406] Modules linked in: rtl8365mb(+) rt2800soc(O) rt2800mmio(O) rt2800lib(O) pppoe ppp_async nft_fib_inet nf_flow_table_inet rt2x00mmio(O) rt2x00lib(O) pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt76x2e(O) mt76x2_common(O) mt76x02_lib(O) mt76(O) mac80211(O) cfg80211(O) slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc_ccitt compat(O) i2c_dev ledtrig_usbport sha512_generic seqiv sha3_generic jitterentropy_rng drbg hmac geniv rng cmac leds_gpio tag_rtl8_4 realtek_dsa dsa_core gpio_button_hotplug(O) realtek hwmon i2c_core phylink crc32c_generic [last unloaded: rtl8365mb]
> [ 471.854523] Process modprobe (pid: 7473, threadinfo=674a8fb4, task=b017bdbf,tls=77e98dfc)
> [ 471.862981] Stack : 00000000 00000000 00000000 00000000 77e97290 00420f38 77e97290 00420f10
> [ 471.871571] 00000000 00000001 00000000 77e1f644 77e89000 00001173 00000000 00000000
> [ 471.880157] 0000000c 83855940 77e85000 77e77000 81b911e5 00000001 81bbac60 77e85fff
> [ 471.888745] 00001173 77e89000 7f958894 00400dc1 8383fbf8 77e903d0 00000000 7f958730
> [ 471.897333] 81bbac60 77e556d0 00000001 00000000 77e97290 7f958450 00000000 77e1f674
> [ 471.905921] ...
> [ 471.908431] Call Trace:
> [ 471.908437]
> [ 471.912653]
> [ 471.914177] Code: (Bad address in epc)
> [ 471.914177]
> [ 471.919517]
> [ 471.921240] ---[ end trace 0000000000000000 ]---
> [ 471.926052] Kernel panic - not syncing: Fatal exception
> [ 471.931404] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
>
> The RA value (80001e90 work_notifysig+0x10/0x18) indicates that the crash came
> from a notification. Maybe DSA didn't unregister/drain notifications after the
> tear down.

My reading of work_notifysig() is that this is delivering signals to
user space, completely unrelated to DSA. It is just what the return
address was at the time of the crash.

The epc == 702e7660 possibly means that the kernel tried to execute code
through a stale function pointer.

Nothing in rtl8365mb looks particularly out of place in terms of things
that could linger on after the driver is unregistered. I looked at:

- priv->user_mii_bus could host a PHY whose state machine continues to
  run. But it is allocated and registered using devres.
- mb->irq cannot fire after rtl8365mb_irq_teardown().
- p->mib_work cannot get rescheduled after rtl8365mb_stats_teardown(),
  because the ports are already torn down by the time the switch is torn
  down, and the phylink instance which schedules the mib_work is destroyed.

The only question mark right now is with the many out-of-tree modules.
If there's anything in the kernel holding a pointer to the DSA switch,
it needs to drop it when the switch driver is removed.

>
> I'm using kernel 6.12.60 (LTS) and I also didn't notice any relevant changes
> since that version. I'm just not sure if
> 2bcf4772e45adb00649a4e9cbff14b08a144f9e3 would be related.
>
Without a stack trace, it's hard to say what could be wrong. Could you
retest with CONFIG_KALLSYMS, CONFIG_FRAME_POINTER, CONFIG_STACKTRACE and
whatever else might be needed to produce a stack trace on MIPS?

In addition, could you try enabling some debug options for use-after-free
which are more lightweight than KASAN? I'm thinking of:

- CONFIG_SLUB_DEBUG + CONFIG_SLUB_DEBUG_ON,
- CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT (if memory allows)
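
Before retesting, it may be worth confirming that these options actually
ended up in the build. The sketch below is only an illustration: the
function name check_debug_opts and its output format are made up, and it
assumes you have the .config of the kernel under test at hand.

```shell
#!/bin/sh
# Report which of the debug options mentioned above are set to "y" in a
# kernel .config file. check_debug_opts is a hypothetical helper name.
check_debug_opts() {
    cfg="$1"
    for opt in \
        CONFIG_KALLSYMS CONFIG_FRAME_POINTER CONFIG_STACKTRACE \
        CONFIG_SLUB_DEBUG CONFIG_SLUB_DEBUG_ON \
        CONFIG_DEBUG_PAGEALLOC CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
    do
        # An enabled bool option appears as "CONFIG_FOO=y"; a disabled one
        # appears as "# CONFIG_FOO is not set" or is absent altogether.
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: not set"
        fi
    done
}

# Only run when a .config path is given as the first argument.
if [ -n "${1:-}" ]; then
    check_debug_opts "$1"
fi
```

Saved under a name of your choosing, it would be run as
"sh <script> /path/to/.config". An option reported as "not set" may
simply not be selectable on this architecture, so treat the output as a
hint rather than a verdict.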

If this doesn't point to anything, you could simplify the setup and
teardown (and then probe and remove) methods little by little until you
find the culprit. The idea being that a driver which doesn't do anything
on probe and remove shouldn't crash the kernel.