public inbox for linux-wireless@vger.kernel.org
 help / color / mirror / Atom feed
* mt7996e kernel panic
@ 2026-04-03 13:35 Matteo Croce
  2026-04-15 19:53 ` Matteo Croce
  0 siblings, 1 reply; 2+ messages in thread
From: Matteo Croce @ 2026-04-03 13:35 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Ryder Lee; +Cc: linux-wireless, linux-kernel

Hi,

I'm experiencing a crash with the mt7996e driver.

The system is OpenWrt running on a Banana Pi R4, so kernel is 6.12.74 + 
mt76 backport

This is the crash:


[518291.616307] mt7996e 0000:01:00.0: Message 001a0003 (seq 1) timeout
[518291.622615] ap-mld0: HW problem - can not stop rx aggregation for 
4a:11:96:52:7e:6d tid 0
[518291.686603] pcieport 0000:00:00.0: AER: Multiple Uncorrectable 
(Non-Fatal) error message received from 0000:00:00.0
[518291.686603] pcieport 0001:00:00.0: AER: Multiple Uncorrectable 
(Non-Fatal) error message received from 0001:00:00.0
[518291.686628] pcieport 0000:00:00.0: PCIe Bus Error: 
severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
[518291.697137] pcieport 0001:00:00.0: PCIe Bus Error: 
severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
[518291.707639] pcieport 0000:00:00.0:   device [14c3:7988] error 
status/mask=00004000/00400000
[518291.707645] pcieport 0000:00:00.0:    [14] CmpltTO                
(First)
[518291.719022] pcieport 0001:00:00.0:   device [14c3:7988] error 
status/mask=00004000/00400000
[518291.730390] pcieport 0000:00:00.0: AER: broadcast error_detected message
[518291.730395] mt7996e 0000:01:00.0: AER: can't recover (no 
error_detected callback)
[518291.738822] pcieport 0001:00:00.0:    [14] CmpltTO                
(First)
[518291.745754] pcieport 0000:00:00.0: AER: device recovery failed
[518291.754122] pcieport 0001:00:00.0: AER: broadcast error_detected message
[518291.786622] Unable to handle kernel paging request at virtual 
address 7ae91c2bcb8708b6
[518291.788007] mt7996e_hif 0001:01:00.0: AER: can't recover (no 
error_detected callback)
[518291.795992] Mem abort info:
[518291.803956] pcieport 0001:00:00.0: AER: device recovery failed
[518291.806766]   ESR = 0x0000000096000004
[518291.806769]   EC = 0x25: DABT (current EL), IL = 32 bits
[518291.821919]   SET = 0, FnV = 0
[518291.825052]   EA = 0, S1PTW = 0
[518291.828278]   FSC = 0x04: level 0 translation fault
[518291.833233] Data abort info:
[518291.836189]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[518291.841761]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[518291.846895]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[518291.852282] [7ae91c2bcb8708b6] address between user and kernel 
address ranges
[518291.859500] Internal error: Oops: 0000000096000004 [#1] SMP
[518291.865152] Modules linked in: pppoe ppp_async nft_fib_inet 
nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 
nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat 
nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 
nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table 
nf_conntrack mt7996e(O) mt76_connac_lib(O) mt76(O) mac80211(O) 
cfg80211(O) slhc sfp rtc_pcf8563 nfnetlink nf_reject_ipv6 nf_reject_ipv4 
nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mdio_i2c macvlan libcrc32c 
compat(O) at24 crypto_safexcel pwm_fan i2c_mux_pca954x i2c_mux 
sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes 
authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd 
xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia crc_itu_t
[518291.935834] CPU: 3 UID: 0 PID: 22229 Comm: kworker/u16:0 Tainted: 
G        W  O       6.12.74 #0
[518291.944702] Tainted: [W]=WARN, [O]=OOT_MODULE
[518291.949135] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT)
[518291.954696] Workqueue: mt76 mt7996_mac_reset_work [mt7996e]
[518291.960367] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS 
BTYPE=--)
[518291.967406] pc : __free_pages+0x14/0x88
[518291.971326] lr : mtk_wed_hwrro_free_buffer+0x5c/0xc0
[518291.976371] sp : ffffffc08cf83b80
[518291.979761] x29: ffffffc08cf83b80 x28: ffffff80c5ce4c30 x27: 
ffffff80c5ce4c18
[518291.986976] x26: 0000000000000001 x25: ffffff80c4e1c080 x24: 
ffffffc081d03000
[518291.994188] x23: ffffff80c266f000 x22: ffffff80c2670000 x21: 
0000000000000001
[518292.001401] x20: 7ae91c2bcb8708b6 x19: ffffff80c266f810 x18: 
ffffff80ff7ae100
[518292.008613] x17: ffffffc07eb8b000 x16: ffffffc080e00000 x15: 
0000000000000001
[518292.015825] x14: 0000000000000000 x13: ffffff80c0129cc0 x12: 
0000000000000006
[518292.023037] x11: 0000000000000000 x10: ffffff80c0129cc0 x9 : 
ffffff80ff7ae180
[518292.030249] x8 : 0000000000000000 x7 : ffffffc080d76400 x6 : 
7ae91c2bcb8708ea
[518292.037461] x5 : ffffffc080dafce8 x4 : 0000000000000020 x3 : 
000000000000003f
[518292.044673] x2 : 0000000000000040 x1 : 0000000000000000 x0 : 
7ae91c2bcb8708b6
[518292.051885] Call trace:
[518292.054408]  __free_pages+0x14/0x88
[518292.057974]  mtk_wed_hwrro_free_buffer+0x5c/0xc0
[518292.062668]  mtk_wed_reset_dma+0x78c/0xafc
[518292.066842]  mt7996_dma_reset+0x1b0/0x444 [mt7996e]
[518292.071807]  mt7996_mac_reset_work+0x324/0x1250 [mt7996e]
[518292.077287]  process_one_work+0x174/0x300
[518292.081376]  worker_thread+0x278/0x430
[518292.085202]  kthread+0xd8/0xdc
[518292.088335]  ret_from_fork+0x10/0x20
[518292.091992] Code: 9100d006 910003fd f90013f5 52800035 (f9400004)
[518292.098160] ---[ end trace 0000000000000000 ]---
[518292.110206] pstore: backend (ramoops) writing error (-28)
[518292.115685] Kernel panic - not syncing: Oops: Fatal exception
[518292.121506] SMP: stopping secondary CPUs
[518292.125508] Kernel Offset: disabled
[518292.129073] CPU features: 0x00,00000002,00000000,4200400b
[518292.134548] Memory Limit: none
[518292.144967] Rebooting in 3 seconds..


I resolved the line numbers and I've got:

    __free_pages (include/asm-generic/bitops/generic-non-atomic.h:128)
    mtk_wed_hwrro_free_buffer (drivers/net/ethernet/mediatek/mtk_wed.c:870)
    mtk_wed_reset_dma (drivers/net/ethernet/mediatek/mtk_wed.c:1887)
    mt7996_dma_reset (mt7996/dma.c:1009) [mt7996e]
    mt7996_mac_restart (mt7996/mac.c:2273) [mt7996e]
    process_one_work (kernel/workqueue.c:3235)
    process_scheduled_works (kernel/workqueue.c:3304)
    kthread (kernel/kthread.c:389)
    ret_from_fork (arch/arm64/kernel/entry.S:863)


It happens randomly between a few hours and a few days. I don't know the 
code so I let an AI analyze the crash, for what it's worth, this is its 
analysis:


The crash is a use-after-free / corrupted pointer during a MAC reset 
work handler. The sequence:

1. mt7996 MCU timeout (Message 001a0003 timeout) triggers a MAC reset 
via mt7996_mac_reset_work
2. The reset path calls mt7996_dma_reset(dev, true) → 
mtk_wed_device_dma_reset() → mtk_wed_reset_dma() → mtk_wed_rx_reset() → 
mtk_wed_hwrro_free_buffer()
3. At mtk_wed.c:870-879, the loop iterates over page_list[] entries and 
calls __free_page(page) on each
4. The page pointer is 0x7ae91c2bcb8708b6 (visible in x20/x0 registers) 
— this is a garbage/poisoned value, not a valid kernel pointer
5. __free_page() → __free_pages() tries to test page flags via 
generic_test_bit() and faults on the bad pointer

The root cause is that dev->hw_rro.pages[] contains corrupted entries. 
This is likely because the PCIe Completion Timeout (CmpltTO) errors on 
both PCIe ports
(0000:00:00.0 and 0001:00:00.0) corrupted DMA memory or the device state 
before the reset handler could clean up properly. The 0x7ae91c2bcb87... 
pattern in x20 and x6
(with 0x...08ea vs 0x...08b6 — differing by 0x34 = a struct offset) 
suggests the page_list memory itself was corrupted by stale DMA writes 
after the PCIe link went
down.


Regards,

-- 

Matteo Croce

per aspera ad upstream


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: mt7996e kernel panic
  2026-04-03 13:35 mt7996e kernel panic Matteo Croce
@ 2026-04-15 19:53 ` Matteo Croce
  0 siblings, 0 replies; 2+ messages in thread
From: Matteo Croce @ 2026-04-15 19:53 UTC (permalink / raw)
  To: Felix Fietkau, Lorenzo Bianconi, Ryder Lee; +Cc: linux-wireless, linux-kernel

Hi,

I confirm that the router keeps on crashing even with 6.12.80 and latest mt76.
As I workaround I'm loading mt7996e with wed_enable=0 and i'ts been
stable since a week.

Regards,
-- 
Matteo Croce

perl -e 'for($t=0;;$t++){print chr($t*($t>>8|$t>>13)&255)}' |aplay

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-04-15 19:53 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-03 13:35 mt7996e kernel panic Matteo Croce
2026-04-15 19:53 ` Matteo Croce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox