* stmmac and XDP/ZC issue
@ 2024-02-20 11:02 Kurt Kanzenbach
2024-02-20 13:18 ` Serge Semin
0 siblings, 1 reply; 10+ messages in thread
From: Kurt Kanzenbach @ 2024-02-20 11:02 UTC (permalink / raw)
To: netdev; +Cc: Sebastian Andrzej Siewior, Song Yoong Siang
Hello netdev community,

after updating to the v6.8 kernel I've encountered an issue in the stmmac
driver.

I have an application which makes use of XDP zero-copy sockets. It works
on v6.7. On v6.8 it results in the stack trace shown below. The program
counter points to:

- ./include/net/xdp_sock.h:192 and
- ./drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2681

It seems to be caused by the XDP metadata patches, this one in
particular: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC").

To reproduce:

- Hardware: imx93
- Run ptp4l/phc2sys
- Configure Qbv, Rx steering, NAPI threading
- Run my application using XDP/ZC on queue 1

Any idea what might be the issue here?

Thanks,
Kurt

Stack trace:
|[ 169.248150] imx-dwmac 428a0000.ethernet eth1: configured EST
|[ 191.820913] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched
|[ 226.039166] imx-dwmac 428a0000.ethernet eth1: entered promiscuous mode
|[ 226.203262] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
|[ 226.203753] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-1
|[ 226.303337] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_XSK_BUFF_POOL RxQ-1
|[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
|[ 255.822602] Mem abort info:
|[ 255.822604] ESR = 0x0000000096000044
|[ 255.822608] EC = 0x25: DABT (current EL), IL = 32 bits
|[ 255.822613] SET = 0, FnV = 0
|[ 255.822616] EA = 0, S1PTW = 0
|[ 255.822618] FSC = 0x04: level 0 translation fault
|[ 255.822622] Data abort info:
|[ 255.822624] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
|[ 255.822627] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
|[ 255.822630] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
|[ 255.822634] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085fe1000
|[ 255.822638] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
|[ 255.822650] Internal error: Oops: 0000000096000044 [#1] PREEMPT_RT SMP
|[ 255.822655] Modules linked in:
|[ 255.822660] CPU: 0 PID: 751 Comm: napi/eth1-261 Not tainted 6.8.0-rc4-rt4-00100-g9c63d995ca19 #8
|[ 255.822666] Hardware name: NXP i.MX93 11X11 EVK board (DT)
|[ 255.822669] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
|[ 255.822674] pc : stmmac_tx_clean.constprop.0+0x848/0xc38
|[ 255.822690] lr : stmmac_tx_clean.constprop.0+0x844/0xc38
|[ 255.822696] sp : ffff800085ec3bc0
|[ 255.822698] x29: ffff800085ec3bc0 x28: ffff000005b609e0 x27: 0000000000000001
|[ 255.822706] x26: 0000000000000000 x25: ffff000005b60ae0 x24: 0000000000000001
|[ 255.822712] x23: 0000000000000001 x22: ffff000005b649e0 x21: 0000000000000000
|[ 255.822719] x20: 0000000000000020 x19: ffff800085291030 x18: 0000000000000000
|[ 255.822725] x17: ffff7ffffc51c000 x16: ffff800080000000 x15: 0000000000000008
|[ 255.822732] x14: ffff80008369b880 x13: 0000000000000000 x12: 0000000000008507
|[ 255.822738] x11: 0000000000000040 x10: 0000000000000a70 x9 : ffff800080e32f84
|[ 255.822745] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000003ff0
|[ 255.822751] x5 : 0000000000003c40 x4 : ffff000005b60000 x3 : 0000000000000000
|[ 255.822757] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
|[ 255.822764] Call trace:
|[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
|[ 255.822772] stmmac_napi_poll_rxtx+0xc4/0xec0
|[ 255.822778] __napi_poll.constprop.0+0x40/0x220
|[ 255.822785] napi_threaded_poll+0xd8/0x228
|[ 255.822790] kthread+0x108/0x120
|[ 255.822798] ret_from_fork+0x10/0x20
|[ 255.822808] Code: 910303e0 f9003be1 97ffdec0 f9403be1 (f9000020)
|[ 255.822812] ---[ end trace 0000000000000000 ]---
|[ 255.822817] Kernel panic - not syncing: Oops: Fatal exception in interrupt
|[ 255.822819] SMP: stopping secondary CPUs
|[ 255.822827] Kernel Offset: disabled
|[ 255.822829] CPU features: 0x0,c0000000,4002814a,2100720b
|[ 255.822834] Memory Limit: none
|[ 256.062429] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
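
The "Mem abort info" lines in the trace above are decoded from the ESR value. As a minimal sketch (the struct and function names here are hypothetical, and the bit positions are the AArch64 ESR layout for a data abort as I understand it from the Arm architecture manual), the fields can be extracted like this:

```c
#include <stdint.h>

/* Hypothetical container for the ESR fields printed in the oops above. */
struct esr_fields {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* instruction length bit, bit 25 (1 = 32-bit instruction) */
    unsigned wnr;  /* write-not-read, bit 6 (1 = the faulting access was a write) */
    unsigned fsc;  /* fault status code, bits [5:0] */
};

/* Split an ESR value into the fields the kernel prints for a data abort. */
static struct esr_fields decode_esr(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.wnr = (unsigned)((esr >> 6) & 0x1);
    f.fsc = (unsigned)(esr & 0x3f);
    return f;
}
```

Applied to the reported value 0x96000044, this gives EC = 0x25, WnR = 1 and FSC = 0x04, matching the log: the fault is a write through a NULL pointer.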
* Re: stmmac and XDP/ZC issue
2024-02-20 11:02 stmmac and XDP/ZC issue Kurt Kanzenbach
@ 2024-02-20 13:18 ` Serge Semin
2024-02-20 14:43 ` Maciej Fijalkowski
0 siblings, 1 reply; 10+ messages in thread
From: Serge Semin @ 2024-02-20 13:18 UTC (permalink / raw)
To: Kurt Kanzenbach
Cc: netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Stanislav Fomichev, Alexei Starovoitov
Hi Kurt

On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote:
> Hello netdev community,
>
> after updating to v6.8 kernel I've encountered an issue in the stmmac
> driver.
> [...]
> It seems to be caused by the XDP meta data patches. This one in
> particular 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC").
> [...]

Just confirmed the same problem on my MIPS-based SoC:

Device #1:
$ ifconfig eth2 192.168.2.2 up
$ pktgen.sh -v -i eth2 -d 192.168.2.3 -m 4C:A5:15:59:A6:86 -n 0 -s 1496

Device #2:
$ mount -t bpf none /sys/fs/bpf/
$ sysctl -w net.core.bpf_jit_enable=1
$ ifconfig eth0 192.168.2.3 up
$ xdp-bench tx eth0
...
[ 559.663885] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 809a81e0, ra == 809a81dc
[ 559.675786] Oops[#1]:
[ 559.678324] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc3-bt1-00322-gb2c1210b8fe6-dirty #2176
[ 559.695824] $ 0 : 00000000 00000001 00000000 00000000
[ 559.701676] $ 4 : eb019c48 00000000 bf054000 81ddfe53
[ 559.707524] $ 8 : 00000000 84ea05c0 00000000 00000000
[ 559.713372] $12 : 00000000 0000002e 816e9d00 81080000
[ 559.719221] $16 : 00000002 a254c020 00000000 00000000
[ 559.725069] $20 : 84ea05c0 00000000 852b8000 00000040
[ 559.730917] $24 : 00000000 00000000
[ 559.736766] $28 : 815d8000 81ddfd88 84ea05c0 809a81dc
[ 559.742615] Hi : 00000007
[ 559.745826] Lo : 00000000
[ 559.749029] epc : 809a81e0 stmmac_tx_clean+0x9f8/0xd64
[ 559.754974] ra : 809a81dc stmmac_tx_clean+0x9f4/0xd64
[ 559.760909] Status: 10000003 KERNEL EXL IE
[ 559.765588] Cause : 0080040c (ExcCode 03)
[ 559.770063] BadVA : 00000000
[ 559.773266] PrId : 0001a830
[ 559.777740] Modules linked in:
[ 559.781150] Process swapper/0 (pid: 0, threadinfo=9e75df13, task=e559c9e5, tls=00000000)
[ 559.790194] Stack : 00000001 00000001 00003138 00000001 001a07f2 4696b1a6 00000000 00000000
[ 559.799552] 00000000 00000001 00000000 81080000 00000001 00000000 84ea0b40 84ea2880
[ 559.808909] 84ea0e20 00000000 00000000 00000001 81600000 81ddfe53 810d6bcc 817b0000
[ 559.818265] 815d8000 81ddfe10 0000012c 80e83fd4 84ea05c0 a254c020 00000000 80142518
[ 559.827622] 00800400 eb019c48 81600000 00000040 81ddfebc 84ea05c0 00000000 84ea1320
[ 559.836979] ...
[ 559.839710] Call Trace:
[ 559.842435] [<809a81e0>] stmmac_tx_clean+0x9f8/0xd64
[ 559.847985] [<809a8610>] stmmac_napi_poll_tx+0xc4/0x18c
[ 559.858885] [<80b2db94>] net_rx_action+0x128/0x288
[ 559.864232] [<80e84d48>] __do_softirq+0x134/0x4e0
[ 559.869489] [<80142484>] irq_exit+0xd4/0x138
[ 559.874261] [<807cc768>] __gic_irq_dispatch+0x154/0x1f0
[ 559.880101] [<80102d50>] except_vec_vi_end+0xc4/0xd0
[ 559.885641] [<80e78884>] default_idle_call+0x64/0x168
[ 559.891288] [<801975c4>] do_idle+0xf4/0x198
[ 559.895965] [<80197990>] cpu_startup_entry+0x30/0x40
[ 559.901513] [<80e78c1c>] kernel_init+0x0/0x120
[ 559.906477]
[ 559.908126] Code: 0c2682db afa50048 8fa50048 <aca20000> aca30004 1000fded 8fc208b0 8fc308ac 0000a825
[ 559.919047]
[ 559.920734] ---[ end trace 0000000000000000 ]---
[ 559.925908] Kernel panic - not syncing: Fatal exception in interrupt

No problem has been spotted for the XDP drop and pass benches.

As you pointed out reverting the commit 1347b419318d ("net: stmmac:
Add Tx HWTS support to XDP ZC") fixes the bug.

-Serge(y)

* Re: stmmac and XDP/ZC issue
2024-02-20 13:18 ` Serge Semin
@ 2024-02-20 14:43 ` Maciej Fijalkowski
2024-02-20 21:57 ` Stanislav Fomichev
0 siblings, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2024-02-20 14:43 UTC (permalink / raw)
To: Serge Semin
Cc: Kurt Kanzenbach, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Stanislav Fomichev, Alexei Starovoitov
On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote:
> Hi Kurt
>
> On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote:
> > [...]
> > |[ 255.822764] Call trace:
> > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38

Shouldn't xsk_tx_metadata_complete() be called only when corresponding
buf_type is STMMAC_TXBUF_T_XSK_TX?

> > |[ 255.822772] stmmac_napi_poll_rxtx+0xc4/0xec0
> > [...]
>
> [...]

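
The question above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the stmmac code: `enum txbuf_type`, `struct tx_entry` and `complete_entry()` are hypothetical stand-ins for the driver's buffer types and Tx completion path. The point it illustrates is that the timestamp write-back is skipped for any entry that did not come from an AF_XDP socket:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's Tx buffer types. */
enum txbuf_type {
    TXBUF_SKB,     /* regular socket buffer */
    TXBUF_XDP_TX,  /* frame sent back by an XDP_TX verdict */
    TXBUF_XSK_TX,  /* frame submitted through an AF_XDP socket */
};

/* Hypothetical per-entry state kept until Tx completion. */
struct tx_entry {
    enum txbuf_type type;
    uint64_t *tx_timestamp;  /* where to store the timestamp, may be NULL */
};

/* Returns true only if a timestamp was actually written back. */
static bool complete_entry(struct tx_entry *e, uint64_t hw_ts)
{
    /* Only AF_XDP entries carry Tx metadata completion state. */
    if (e->type != TXBUF_XSK_TX)
        return false;
    /* Even for AF_XDP, a timestamp may not have been requested. */
    if (e->tx_timestamp == NULL)
        return false;

    *e->tx_timestamp = hw_ts;
    return true;
}
```

In this model an XDP_TX entry (as produced by the `xdp-bench tx` reproducer) is left alone instead of being written through a NULL pointer.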
* Re: stmmac and XDP/ZC issue
2024-02-20 14:43 ` Maciej Fijalkowski
@ 2024-02-20 21:57 ` Stanislav Fomichev
2024-02-21 7:13 ` Kurt Kanzenbach
0 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2024-02-20 21:57 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Serge Semin, Kurt Kanzenbach, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov
On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote:
> > [...]
> > > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
>
> Shouldn't xsk_tx_metadata_complete() be called only when corresponding
> buf_type is STMMAC_TXBUF_T_XSK_TX?

+1. I'm assuming Serge isn't enabling it explicitly, so none of the
metadata stuff should trigger in this case.

> [...]

* Re: stmmac and XDP/ZC issue
2024-02-20 21:57 ` Stanislav Fomichev
@ 2024-02-21 7:13 ` Kurt Kanzenbach
2024-02-21 9:21 ` Kurt Kanzenbach
0 siblings, 1 reply; 10+ messages in thread
From: Kurt Kanzenbach @ 2024-02-21 7:13 UTC (permalink / raw)
To: Stanislav Fomichev, Maciej Fijalkowski
Cc: Serge Semin, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov
On Tue Feb 20 2024, Stanislav Fomichev wrote:
> On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
>> [...]
>> Shouldn't xsk_tx_metadata_complete() be called only when corresponding
>> buf_type is STMMAC_TXBUF_T_XSK_TX?
>
> +1. I'm assuming Serge isn't enabling it explicitly, so none of the
> metadata stuff should trigger in this case.

The only other user of xsk_tx_metadata_complete() in mlx5 guards it with
xp_tx_metadata_enabled(). Seems like that's missing in stmmac?

Thanks,
Kurt

* Re: stmmac and XDP/ZC issue 2024-02-21 7:13 ` Kurt Kanzenbach @ 2024-02-21 9:21 ` Kurt Kanzenbach 2024-02-21 15:59 ` Maciej Fijalkowski 0 siblings, 1 reply; 10+ messages in thread From: Kurt Kanzenbach @ 2024-02-21 9:21 UTC (permalink / raw) To: Stanislav Fomichev, Maciej Fijalkowski Cc: Serge Semin, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov [-- Attachment #1: Type: text/plain, Size: 6659 bytes --] On Wed Feb 21 2024, Kurt Kanzenbach wrote: > On Tue Feb 20 2024, Stanislav Fomichev wrote: >> On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski >> <maciej.fijalkowski@intel.com> wrote: >>> >>> On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote: >>> > Hi Kurt >>> > >>> > On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote: >>> > > Hello netdev community, >>> > > >>> > > after updating to v6.8 kernel I've encountered an issue in the stmmac >>> > > driver. >>> > > >>> > > I have an application which makes use of XDP zero-copy sockets. It works >>> > > on v6.7. On v6.8 it results in the stack trace shown below. The program >>> > > counter points to: >>> > > >>> > > - ./include/net/xdp_sock.h:192 and >>> > > - ./drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2681 >>> > > >>> > > It seems to be caused by the XDP meta data patches. This one in >>> > > particular 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC"). >>> > > >>> > > To reproduce: >>> > > >>> > > - Hardware: imx93 >>> > > - Run ptp4l/phc2sys >>> > > - Configure Qbv, Rx steering, NAPI threading >>> > > - Run my application using XDP/ZC on queue 1 >>> > > >>> > > Any idea what might be the issue here? 
>>> > > >>> > > Thanks, >>> > > Kurt >>> > > >>> > > Stack trace: >>> > > >>> > > |[ 169.248150] imx-dwmac 428a0000.ethernet eth1: configured EST >>> > > |[ 191.820913] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched >>> > > |[ 226.039166] imx-dwmac 428a0000.ethernet eth1: entered promiscuous mode >>> > > |[ 226.203262] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 >>> > > |[ 226.203753] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-1 >>> > > |[ 226.303337] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_XSK_BUFF_POOL RxQ-1 >>> > > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 >>> > > |[ 255.822602] Mem abort info: >>> > > |[ 255.822604] ESR = 0x0000000096000044 >>> > > |[ 255.822608] EC = 0x25: DABT (current EL), IL = 32 bits >>> > > |[ 255.822613] SET = 0, FnV = 0 >>> > > |[ 255.822616] EA = 0, S1PTW = 0 >>> > > |[ 255.822618] FSC = 0x04: level 0 translation fault >>> > > |[ 255.822622] Data abort info: >>> > > |[ 255.822624] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 >>> > > |[ 255.822627] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 >>> > > |[ 255.822630] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 >>> > > |[ 255.822634] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085fe1000 >>> > > |[ 255.822638] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 >>> > > |[ 255.822650] Internal error: Oops: 0000000096000044 [#1] PREEMPT_RT SMP >>> > > |[ 255.822655] Modules linked in: >>> > > |[ 255.822660] CPU: 0 PID: 751 Comm: napi/eth1-261 Not tainted 6.8.0-rc4-rt4-00100-g9c63d995ca19 #8 >>> > > |[ 255.822666] Hardware name: NXP i.MX93 11X11 EVK board (DT) >>> > > |[ 255.822669] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >>> > > |[ 255.822674] pc : stmmac_tx_clean.constprop.0+0x848/0xc38 >>> > > |[ 255.822690] lr : stmmac_tx_clean.constprop.0+0x844/0xc38 >>> > > |[ 255.822696] sp : ffff800085ec3bc0 >>> > > |[ 255.822698] x29: 
ffff800085ec3bc0 x28: ffff000005b609e0 x27: 0000000000000001 >>> > > |[ 255.822706] x26: 0000000000000000 x25: ffff000005b60ae0 x24: 0000000000000001 >>> > > |[ 255.822712] x23: 0000000000000001 x22: ffff000005b649e0 x21: 0000000000000000 >>> > > |[ 255.822719] x20: 0000000000000020 x19: ffff800085291030 x18: 0000000000000000 >>> > > |[ 255.822725] x17: ffff7ffffc51c000 x16: ffff800080000000 x15: 0000000000000008 >>> > > |[ 255.822732] x14: ffff80008369b880 x13: 0000000000000000 x12: 0000000000008507 >>> > > |[ 255.822738] x11: 0000000000000040 x10: 0000000000000a70 x9 : ffff800080e32f84 >>> > > |[ 255.822745] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000003ff0 >>> > > |[ 255.822751] x5 : 0000000000003c40 x4 : ffff000005b60000 x3 : 0000000000000000 >>> > > |[ 255.822757] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 >>> > > |[ 255.822764] Call trace: >>> > > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38 >>> >>> Shouldn't xsk_tx_metadata_complete() be called only when corresponding >>> buf_type is STMMAC_TXBUF_T_XSK_TX? >> >> +1. I'm assuming Serge isn't enabling it explicitly, so none of the >> metadata stuff should trigger in this case. > > The only other user of xsk_tx_metadata_complete() in mlx5 guards it with > xp_tx_metadata_enabled(). Seems like that's missing in stmmac? Well, the following patch seems to help: commit e85ab4b97b4d6e50036435ac9851b876c221f580 Author: Kurt Kanzenbach <kurt@linutronix.de> Date: Wed Feb 21 08:18:15 2024 +0100 net: stmmac: Complete meta data only when enabled Currently using XDP sockets on stmmac results in a kernel crash: |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 |[...] |[ 255.822764] Call trace: |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38 The program counter indicates xsk_tx_metadata_complete(). However, this function shouldn't be called unless metadata is actually enabled. Tested on imx93. 
Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 9df27f03a8cb..77c62b26342d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2678,9 +2678,10 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
 					.desc = p,
 				};
 
-				xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
-							 &stmmac_xsk_tx_metadata_ops,
-							 &tx_compl);
+				if (xp_tx_metadata_enabled(tx_q->xsk_pool))
+					xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
+								 &stmmac_xsk_tx_metadata_ops,
+								 &tx_compl);
 			}
 		}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: stmmac and XDP/ZC issue 2024-02-21 9:21 ` Kurt Kanzenbach @ 2024-02-21 15:59 ` Maciej Fijalkowski 2024-02-21 17:20 ` Serge Semin 0 siblings, 1 reply; 10+ messages in thread From: Maciej Fijalkowski @ 2024-02-21 15:59 UTC (permalink / raw) To: Kurt Kanzenbach Cc: Stanislav Fomichev, Serge Semin, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov On Wed, Feb 21, 2024 at 10:21:04AM +0100, Kurt Kanzenbach wrote: > On Wed Feb 21 2024, Kurt Kanzenbach wrote: > > On Tue Feb 20 2024, Stanislav Fomichev wrote: > >> On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski > >> <maciej.fijalkowski@intel.com> wrote: > >>> > >>> On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote: > >>> > Hi Kurt > >>> > > >>> > On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote: > >>> > > Hello netdev community, > >>> > > > >>> > > after updating to v6.8 kernel I've encountered an issue in the stmmac > >>> > > driver. > >>> > > > >>> > > I have an application which makes use of XDP zero-copy sockets. It works > >>> > > on v6.7. On v6.8 it results in the stack trace shown below. The program > >>> > > counter points to: > >>> > > > >>> > > - ./include/net/xdp_sock.h:192 and > >>> > > - ./drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2681 > >>> > > > >>> > > It seems to be caused by the XDP meta data patches. This one in > >>> > > particular 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC"). > >>> > > > >>> > > To reproduce: > >>> > > > >>> > > - Hardware: imx93 > >>> > > - Run ptp4l/phc2sys > >>> > > - Configure Qbv, Rx steering, NAPI threading > >>> > > - Run my application using XDP/ZC on queue 1 > >>> > > > >>> > > Any idea what might be the issue here? 
> >>> > > > >>> > > Thanks, > >>> > > Kurt > >>> > > > >>> > > Stack trace: > >>> > > > >>> > > |[ 169.248150] imx-dwmac 428a0000.ethernet eth1: configured EST > >>> > > |[ 191.820913] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched > >>> > > |[ 226.039166] imx-dwmac 428a0000.ethernet eth1: entered promiscuous mode > >>> > > |[ 226.203262] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 > >>> > > |[ 226.203753] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-1 > >>> > > |[ 226.303337] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_XSK_BUFF_POOL RxQ-1 > >>> > > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > >>> > > |[ 255.822602] Mem abort info: > >>> > > |[ 255.822604] ESR = 0x0000000096000044 > >>> > > |[ 255.822608] EC = 0x25: DABT (current EL), IL = 32 bits > >>> > > |[ 255.822613] SET = 0, FnV = 0 > >>> > > |[ 255.822616] EA = 0, S1PTW = 0 > >>> > > |[ 255.822618] FSC = 0x04: level 0 translation fault > >>> > > |[ 255.822622] Data abort info: > >>> > > |[ 255.822624] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 > >>> > > |[ 255.822627] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 > >>> > > |[ 255.822630] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > >>> > > |[ 255.822634] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085fe1000 > >>> > > |[ 255.822638] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 > >>> > > |[ 255.822650] Internal error: Oops: 0000000096000044 [#1] PREEMPT_RT SMP > >>> > > |[ 255.822655] Modules linked in: > >>> > > |[ 255.822660] CPU: 0 PID: 751 Comm: napi/eth1-261 Not tainted 6.8.0-rc4-rt4-00100-g9c63d995ca19 #8 > >>> > > |[ 255.822666] Hardware name: NXP i.MX93 11X11 EVK board (DT) > >>> > > |[ 255.822669] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > >>> > > |[ 255.822674] pc : stmmac_tx_clean.constprop.0+0x848/0xc38 > >>> > > |[ 255.822690] lr : stmmac_tx_clean.constprop.0+0x844/0xc38 > >>> > > |[ 
255.822696] sp : ffff800085ec3bc0 > >>> > > |[ 255.822698] x29: ffff800085ec3bc0 x28: ffff000005b609e0 x27: 0000000000000001 > >>> > > |[ 255.822706] x26: 0000000000000000 x25: ffff000005b60ae0 x24: 0000000000000001 > >>> > > |[ 255.822712] x23: 0000000000000001 x22: ffff000005b649e0 x21: 0000000000000000 > >>> > > |[ 255.822719] x20: 0000000000000020 x19: ffff800085291030 x18: 0000000000000000 > >>> > > |[ 255.822725] x17: ffff7ffffc51c000 x16: ffff800080000000 x15: 0000000000000008 > >>> > > |[ 255.822732] x14: ffff80008369b880 x13: 0000000000000000 x12: 0000000000008507 > >>> > > |[ 255.822738] x11: 0000000000000040 x10: 0000000000000a70 x9 : ffff800080e32f84 > >>> > > |[ 255.822745] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000003ff0 > >>> > > |[ 255.822751] x5 : 0000000000003c40 x4 : ffff000005b60000 x3 : 0000000000000000 > >>> > > |[ 255.822757] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 > >>> > > |[ 255.822764] Call trace: > >>> > > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38 > >>> > >>> Shouldn't xsk_tx_metadata_complete() be called only when corresponding > >>> buf_type is STMMAC_TXBUF_T_XSK_TX? > >> > >> +1. I'm assuming Serge isn't enabling it explicitly, so none of the > >> metadata stuff should trigger in this case. > > > > The only other user of xsk_tx_metadata_complete() in mlx5 guards it with > > xp_tx_metadata_enabled(). Seems like that's missing in stmmac? > > Well, the following patch seems to help: > > commit e85ab4b97b4d6e50036435ac9851b876c221f580 > Author: Kurt Kanzenbach <kurt@linutronix.de> > Date: Wed Feb 21 08:18:15 2024 +0100 > > net: stmmac: Complete meta data only when enabled > > Currently using XDP sockets on stmmac results in a kernel crash: > > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > |[...] 
> |[ 255.822764] Call trace:
> |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
>
> The program counter indicates xsk_tx_metadata_complete(). However, this
> function shouldn't be called unless metadata is actually enabled.
>
> Tested on imx93.
>
> Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 9df27f03a8cb..77c62b26342d 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2678,9 +2678,10 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
>  					.desc = p,
>  				};
>
> -				xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
> -							 &stmmac_xsk_tx_metadata_ops,
> -							 &tx_compl);
> +				if (xp_tx_metadata_enabled(tx_q->xsk_pool))

every other usage of tx metadata functions should be wrapped with
xp_tx_metadata_enabled() - can you address other places and send a proper
patch?

> +					xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
> +								 &stmmac_xsk_tx_metadata_ops,
> +								 &tx_compl);
>  			}
>  		}

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: stmmac and XDP/ZC issue 2024-02-21 15:59 ` Maciej Fijalkowski @ 2024-02-21 17:20 ` Serge Semin 2024-02-22 8:35 ` Kurt Kanzenbach 0 siblings, 1 reply; 10+ messages in thread From: Serge Semin @ 2024-02-21 17:20 UTC (permalink / raw) To: Maciej Fijalkowski, Kurt Kanzenbach Cc: Stanislav Fomichev, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov [-- Attachment #1: Type: text/plain, Size: 7987 bytes --] On Wed, Feb 21, 2024 at 04:59:10PM +0100, Maciej Fijalkowski wrote: > On Wed, Feb 21, 2024 at 10:21:04AM +0100, Kurt Kanzenbach wrote: > > On Wed Feb 21 2024, Kurt Kanzenbach wrote: > > > On Tue Feb 20 2024, Stanislav Fomichev wrote: > > >> On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski > > >> <maciej.fijalkowski@intel.com> wrote: > > >>> > > >>> On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote: > > >>> > Hi Kurt > > >>> > > > >>> > On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote: > > >>> > > Hello netdev community, > > >>> > > > > >>> > > after updating to v6.8 kernel I've encountered an issue in the stmmac > > >>> > > driver. > > >>> > > > > >>> > > I have an application which makes use of XDP zero-copy sockets. It works > > >>> > > on v6.7. On v6.8 it results in the stack trace shown below. The program > > >>> > > counter points to: > > >>> > > > > >>> > > - ./include/net/xdp_sock.h:192 and > > >>> > > - ./drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2681 > > >>> > > > > >>> > > It seems to be caused by the XDP meta data patches. This one in > > >>> > > particular 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC"). > > >>> > > > > >>> > > To reproduce: > > >>> > > > > >>> > > - Hardware: imx93 > > >>> > > - Run ptp4l/phc2sys > > >>> > > - Configure Qbv, Rx steering, NAPI threading > > >>> > > - Run my application using XDP/ZC on queue 1 > > >>> > > > > >>> > > Any idea what might be the issue here? 
> > >>> > > > > >>> > > Thanks, > > >>> > > Kurt > > >>> > > > > >>> > > Stack trace: > > >>> > > > > >>> > > |[ 169.248150] imx-dwmac 428a0000.ethernet eth1: configured EST > > >>> > > |[ 191.820913] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched > > >>> > > |[ 226.039166] imx-dwmac 428a0000.ethernet eth1: entered promiscuous mode > > >>> > > |[ 226.203262] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 > > >>> > > |[ 226.203753] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-1 > > >>> > > |[ 226.303337] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_XSK_BUFF_POOL RxQ-1 > > >>> > > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > > >>> > > |[ 255.822602] Mem abort info: > > >>> > > |[ 255.822604] ESR = 0x0000000096000044 > > >>> > > |[ 255.822608] EC = 0x25: DABT (current EL), IL = 32 bits > > >>> > > |[ 255.822613] SET = 0, FnV = 0 > > >>> > > |[ 255.822616] EA = 0, S1PTW = 0 > > >>> > > |[ 255.822618] FSC = 0x04: level 0 translation fault > > >>> > > |[ 255.822622] Data abort info: > > >>> > > |[ 255.822624] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 > > >>> > > |[ 255.822627] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 > > >>> > > |[ 255.822630] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > > >>> > > |[ 255.822634] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085fe1000 > > >>> > > |[ 255.822638] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 > > >>> > > |[ 255.822650] Internal error: Oops: 0000000096000044 [#1] PREEMPT_RT SMP > > >>> > > |[ 255.822655] Modules linked in: > > >>> > > |[ 255.822660] CPU: 0 PID: 751 Comm: napi/eth1-261 Not tainted 6.8.0-rc4-rt4-00100-g9c63d995ca19 #8 > > >>> > > |[ 255.822666] Hardware name: NXP i.MX93 11X11 EVK board (DT) > > >>> > > |[ 255.822669] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > >>> > > |[ 255.822674] pc : stmmac_tx_clean.constprop.0+0x848/0xc38 > > >>> > > |[ 
255.822690] lr : stmmac_tx_clean.constprop.0+0x844/0xc38 > > >>> > > |[ 255.822696] sp : ffff800085ec3bc0 > > >>> > > |[ 255.822698] x29: ffff800085ec3bc0 x28: ffff000005b609e0 x27: 0000000000000001 > > >>> > > |[ 255.822706] x26: 0000000000000000 x25: ffff000005b60ae0 x24: 0000000000000001 > > >>> > > |[ 255.822712] x23: 0000000000000001 x22: ffff000005b649e0 x21: 0000000000000000 > > >>> > > |[ 255.822719] x20: 0000000000000020 x19: ffff800085291030 x18: 0000000000000000 > > >>> > > |[ 255.822725] x17: ffff7ffffc51c000 x16: ffff800080000000 x15: 0000000000000008 > > >>> > > |[ 255.822732] x14: ffff80008369b880 x13: 0000000000000000 x12: 0000000000008507 > > >>> > > |[ 255.822738] x11: 0000000000000040 x10: 0000000000000a70 x9 : ffff800080e32f84 > > >>> > > |[ 255.822745] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000003ff0 > > >>> > > |[ 255.822751] x5 : 0000000000003c40 x4 : ffff000005b60000 x3 : 0000000000000000 > > >>> > > |[ 255.822757] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 > > >>> > > |[ 255.822764] Call trace: > > >>> > > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38 > > >>> > > >>> Shouldn't xsk_tx_metadata_complete() be called only when corresponding > > >>> buf_type is STMMAC_TXBUF_T_XSK_TX? > > >> > > >> +1. I'm assuming Serge isn't enabling it explicitly, so none of the > > >> metadata stuff should trigger in this case. > > > > > > The only other user of xsk_tx_metadata_complete() in mlx5 guards it with > > > xp_tx_metadata_enabled(). Seems like that's missing in stmmac? 
> > >
> > Well, the following patch seems to help:
> >
> > commit e85ab4b97b4d6e50036435ac9851b876c221f580
> > Author: Kurt Kanzenbach <kurt@linutronix.de>
> > Date: Wed Feb 21 08:18:15 2024 +0100
> >
> > net: stmmac: Complete meta data only when enabled
> >
> > Currently using XDP sockets on stmmac results in a kernel crash:
> >
> > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > |[...]
> > |[ 255.822764] Call trace:
> > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
> >
> > The program counter indicates xsk_tx_metadata_complete(). However, this
> > function shouldn't be called unless metadata is actually enabled.
> >
> > Tested on imx93.
> >
> > Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
> > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 9df27f03a8cb..77c62b26342d 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -2678,9 +2678,10 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
> >  					.desc = p,
> >  				};
> >
> > -				xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
> > -							 &stmmac_xsk_tx_metadata_ops,
> > -							 &tx_compl);
> > +				if (xp_tx_metadata_enabled(tx_q->xsk_pool))
>
> every other usage of tx metadata functions should be wrapped with
> xp_tx_metadata_enabled() - can you address other places and send a proper
> patch?

AFAICS this is the only place. But the change above still isn't enough
to fix the problem. In my case XDP zero-copy isn't activated. So
xsk_pool isn't allocated and the NULL/~NULL dereference is still
persistent due to xp_tx_metadata_enabled() dereferencing the
NULL-structure fields. The attached patch fixes the problem in my
case.

Kurt, are you sure that xp_tx_metadata_enabled() is required in your
case? Could you test the attached patch with the
xp_tx_metadata_enabled() invocation discarded?

Maciej, do we need xp_tx_metadata_enabled() guarding the
xsk_tx_metadata_complete() call even if the problem is fixed without
it?

-Serge(y)

>
> > +					xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
> > +								 &stmmac_xsk_tx_metadata_ops,
> > +								 &tx_compl);
> >  			}
> >  		}
> >

[-- Attachment #2: 0001-net-stmmac-Complete-meta-data-only-when-enabled.patch --]
[-- Type: text/x-patch, Size: 1539 bytes --]

From fffab4a5d012875ff6e842901e5bb7db00d9d0ed Mon Sep 17 00:00:00 2001
From: Kurt Kanzenbach <kurt@linutronix.de>
Date: Wed, 21 Feb 2024 17:24:25 +0300
Subject: [PATCH] net: stmmac: Complete meta data only when enabled

Currently using XDP sockets on stmmac results in a kernel crash:

|[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
|[...]
|[ 255.822764] Call trace:
|[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38

The program counter indicates xsk_tx_metadata_complete(). However, this
function shouldn't be called unless metadata is actually enabled.

Tested on imx93.

Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Serge Semin <fancer.lancer@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8000fa256dfc..f6c86478a820 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2634,7 +2634,8 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
 			}
 			if (skb) {
 				stmmac_get_tx_hwtstamp(priv, p, skb);
-			} else {
+			} else if (tx_q->xsk_pool &&
+				   xp_tx_metadata_enabled(tx_q->xsk_pool)) {
 				struct stmmac_xsk_tx_complete tx_compl = {
 					.priv = priv,
 					.desc = p,
--
2.43.0

^ permalink raw reply related [flat|nested] 10+ messages in thread
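
For readers following the two variants of the fix, here is a small userspace sketch of why the order of the checks in the attached patch matters. It is only a model, not the kernel code: struct pool_model, metadata_enabled_model() and should_complete_metadata() are hypothetical names, and the assumption that the real helper reads a length field from the pool is taken from Serge's remark that xp_tx_metadata_enabled() dereferences the pool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h> /* only needed by the checks in the usage note */

/* Hypothetical stand-in for the XSK buffer pool; it carries only the
 * single field this sketch needs. */
struct pool_model {
	unsigned int tx_metadata_len;
};

/* Models a helper that reads a field of the pool (assumption: this is
 * how the real xp_tx_metadata_enabled() behaves). It must never be
 * handed a NULL pointer. */
static bool metadata_enabled_model(const struct pool_model *pool)
{
	return pool->tx_metadata_len > 0;
}

/* Models the condition from the attached patch: the NULL test comes
 * first, so && short-circuits before the pool is dereferenced. */
static bool should_complete_metadata(const struct pool_model *pool)
{
	return pool && metadata_enabled_model(pool);
}
```

With no pool at all the sketch returns false without touching the pointer, e.g. assert(!should_complete_metadata(NULL)); a pool whose tx_metadata_len is 0 also yields false, and only a pool with a non-zero length yields true.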
* Re: stmmac and XDP/ZC issue 2024-02-21 17:20 ` Serge Semin @ 2024-02-22 8:35 ` Kurt Kanzenbach 2024-02-22 10:06 ` Serge Semin 0 siblings, 1 reply; 10+ messages in thread From: Kurt Kanzenbach @ 2024-02-22 8:35 UTC (permalink / raw) To: Serge Semin, Maciej Fijalkowski Cc: Stanislav Fomichev, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov [-- Attachment #1: Type: text/plain, Size: 8363 bytes --] On Wed Feb 21 2024, Serge Semin wrote: > On Wed, Feb 21, 2024 at 04:59:10PM +0100, Maciej Fijalkowski wrote: >> On Wed, Feb 21, 2024 at 10:21:04AM +0100, Kurt Kanzenbach wrote: >> > On Wed Feb 21 2024, Kurt Kanzenbach wrote: >> > > On Tue Feb 20 2024, Stanislav Fomichev wrote: >> > >> On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski >> > >> <maciej.fijalkowski@intel.com> wrote: >> > >>> >> > >>> On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote: >> > >>> > Hi Kurt >> > >>> > >> > >>> > On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote: >> > >>> > > Hello netdev community, >> > >>> > > >> > >>> > > after updating to v6.8 kernel I've encountered an issue in the stmmac >> > >>> > > driver. >> > >>> > > >> > >>> > > I have an application which makes use of XDP zero-copy sockets. It works >> > >>> > > on v6.7. On v6.8 it results in the stack trace shown below. The program >> > >>> > > counter points to: >> > >>> > > >> > >>> > > - ./include/net/xdp_sock.h:192 and >> > >>> > > - ./drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2681 >> > >>> > > >> > >>> > > It seems to be caused by the XDP meta data patches. This one in >> > >>> > > particular 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC"). >> > >>> > > >> > >>> > > To reproduce: >> > >>> > > >> > >>> > > - Hardware: imx93 >> > >>> > > - Run ptp4l/phc2sys >> > >>> > > - Configure Qbv, Rx steering, NAPI threading >> > >>> > > - Run my application using XDP/ZC on queue 1 >> > >>> > > >> > >>> > > Any idea what might be the issue here? 
>> > >>> > > >> > >>> > > Thanks, >> > >>> > > Kurt >> > >>> > > >> > >>> > > Stack trace: >> > >>> > > >> > >>> > > |[ 169.248150] imx-dwmac 428a0000.ethernet eth1: configured EST >> > >>> > > |[ 191.820913] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched >> > >>> > > |[ 226.039166] imx-dwmac 428a0000.ethernet eth1: entered promiscuous mode >> > >>> > > |[ 226.203262] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 >> > >>> > > |[ 226.203753] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-1 >> > >>> > > |[ 226.303337] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_XSK_BUFF_POOL RxQ-1 >> > >>> > > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 >> > >>> > > |[ 255.822602] Mem abort info: >> > >>> > > |[ 255.822604] ESR = 0x0000000096000044 >> > >>> > > |[ 255.822608] EC = 0x25: DABT (current EL), IL = 32 bits >> > >>> > > |[ 255.822613] SET = 0, FnV = 0 >> > >>> > > |[ 255.822616] EA = 0, S1PTW = 0 >> > >>> > > |[ 255.822618] FSC = 0x04: level 0 translation fault >> > >>> > > |[ 255.822622] Data abort info: >> > >>> > > |[ 255.822624] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 >> > >>> > > |[ 255.822627] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 >> > >>> > > |[ 255.822630] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 >> > >>> > > |[ 255.822634] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085fe1000 >> > >>> > > |[ 255.822638] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 >> > >>> > > |[ 255.822650] Internal error: Oops: 0000000096000044 [#1] PREEMPT_RT SMP >> > >>> > > |[ 255.822655] Modules linked in: >> > >>> > > |[ 255.822660] CPU: 0 PID: 751 Comm: napi/eth1-261 Not tainted 6.8.0-rc4-rt4-00100-g9c63d995ca19 #8 >> > >>> > > |[ 255.822666] Hardware name: NXP i.MX93 11X11 EVK board (DT) >> > >>> > > |[ 255.822669] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> > >>> > > |[ 255.822674] pc : 
stmmac_tx_clean.constprop.0+0x848/0xc38 >> > >>> > > |[ 255.822690] lr : stmmac_tx_clean.constprop.0+0x844/0xc38 >> > >>> > > |[ 255.822696] sp : ffff800085ec3bc0 >> > >>> > > |[ 255.822698] x29: ffff800085ec3bc0 x28: ffff000005b609e0 x27: 0000000000000001 >> > >>> > > |[ 255.822706] x26: 0000000000000000 x25: ffff000005b60ae0 x24: 0000000000000001 >> > >>> > > |[ 255.822712] x23: 0000000000000001 x22: ffff000005b649e0 x21: 0000000000000000 >> > >>> > > |[ 255.822719] x20: 0000000000000020 x19: ffff800085291030 x18: 0000000000000000 >> > >>> > > |[ 255.822725] x17: ffff7ffffc51c000 x16: ffff800080000000 x15: 0000000000000008 >> > >>> > > |[ 255.822732] x14: ffff80008369b880 x13: 0000000000000000 x12: 0000000000008507 >> > >>> > > |[ 255.822738] x11: 0000000000000040 x10: 0000000000000a70 x9 : ffff800080e32f84 >> > >>> > > |[ 255.822745] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000003ff0 >> > >>> > > |[ 255.822751] x5 : 0000000000003c40 x4 : ffff000005b60000 x3 : 0000000000000000 >> > >>> > > |[ 255.822757] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 >> > >>> > > |[ 255.822764] Call trace: >> > >>> > > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38 >> > >>> >> > >>> Shouldn't xsk_tx_metadata_complete() be called only when corresponding >> > >>> buf_type is STMMAC_TXBUF_T_XSK_TX? >> > >> >> > >> +1. I'm assuming Serge isn't enabling it explicitly, so none of the >> > >> metadata stuff should trigger in this case. >> > > >> > > The only other user of xsk_tx_metadata_complete() in mlx5 guards it with >> > > xp_tx_metadata_enabled(). Seems like that's missing in stmmac? 
>> > >
>> > Well, the following patch seems to help:
>> >
>> > commit e85ab4b97b4d6e50036435ac9851b876c221f580
>> > Author: Kurt Kanzenbach <kurt@linutronix.de>
>> > Date: Wed Feb 21 08:18:15 2024 +0100
>> >
>> > net: stmmac: Complete meta data only when enabled
>> >
>> > Currently using XDP sockets on stmmac results in a kernel crash:
>> >
>> > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> > |[...]
>> > |[ 255.822764] Call trace:
>> > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
>> >
>> > The program counter indicates xsk_tx_metadata_complete(). However, this
>> > function shouldn't be called unless metadata is actually enabled.
>> >
>> > Tested on imx93.
>> >
>> > Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
>> > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
>> >
>> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> > index 9df27f03a8cb..77c62b26342d 100644
>> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> > @@ -2678,9 +2678,10 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
>> >  					.desc = p,
>> >  				};
>> >
>> > -				xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
>> > -							 &stmmac_xsk_tx_metadata_ops,
>> > -							 &tx_compl);
>> > +				if (xp_tx_metadata_enabled(tx_q->xsk_pool))
>> >
>> every other usage of tx metadata functions should be wrapped with
>> xp_tx_metadata_enabled() - can you address other places and send a proper
>> patch?
>
> AFAICS this is the only place. But the change above still isn't enough
> to fix the problem. In my case XDP zero-copy isn't activated. So
> xsk_pool isn't allocated and the NULL/~NULL dereference is still
> persistent due to xp_tx_metadata_enabled() dereferencing the
> NULL-structure fields. The attached patch fixes the problem in my
> case.

Sure about that? In my case without ZC the else path is not executed,
because skb is set.

>
> Kurt, are you sure that xp_tx_metadata_enabled() is required in your
> case?

Yes, I'm sure it's required, because I do use ZC without using any
metadata.

> Could you test the attached patch with the xp_tx_metadata_enabled()
> invocation discarded?

Well, it works. But, the xp_tx_metadata_enabled() is not discarded in
the ZC case:

|RtcRxThread-790 [001] b...3 202.970243: stmmac_tx_clean.constprop.0: huhu from xp_tx_metadata_enabled

Let's go with your version of the patch. It works without XDP, with XDP
and XDP/ZC. I'll send it upstream. Thanks for the help :).

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply [flat|nested] 10+ messages in thread
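
As a summary of the branch that Serge's version of the patch leaves in stmmac_tx_clean(), the following userspace sketch classifies one completed Tx entry. It is a reduced model under stated assumptions: enum tx_clean_action, struct tx_entry_model and classify_tx_entry() are hypothetical names introduced only for illustration, and the metadata check is modelled as a plain length comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h> /* only needed by the checks in the usage note */

/* Hypothetical labels for what happens to one completed Tx entry. */
enum tx_clean_action {
	TX_CLEAN_SKB_HWTSTAMP,	/* models the stmmac_get_tx_hwtstamp() path */
	TX_CLEAN_XSK_METADATA,	/* models the xsk_tx_metadata_complete() path */
	TX_CLEAN_NOTHING,	/* neither path runs */
};

/* Hypothetical, reduced view of the state the branch looks at. */
struct tx_entry_model {
	bool has_skb;			/* models skb != NULL */
	bool has_pool;			/* models tx_q->xsk_pool != NULL */
	unsigned int tx_metadata_len;	/* only meaningful if has_pool */
};

/* Mirrors the order of the tests in the patched branch: skb first,
 * then pool presence, then whether metadata is enabled on that pool. */
static enum tx_clean_action classify_tx_entry(const struct tx_entry_model *e)
{
	if (e->has_skb)
		return TX_CLEAN_SKB_HWTSTAMP;
	if (e->has_pool && e->tx_metadata_len > 0)
		return TX_CLEAN_XSK_METADATA;
	return TX_CLEAN_NOTHING;
}
```

In this model an entry with an skb always takes the timestamp path, an entry without skb and without a pool does nothing, and a zero-copy entry whose pool has no metadata configured (the setup Kurt describes) also does nothing.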
* Re: stmmac and XDP/ZC issue
2024-02-22 8:35 ` Kurt Kanzenbach
@ 2024-02-22 10:06 ` Serge Semin
0 siblings, 0 replies; 10+ messages in thread
From: Serge Semin @ 2024-02-22 10:06 UTC (permalink / raw)
To: Kurt Kanzenbach
Cc: Maciej Fijalkowski, Stanislav Fomichev, netdev, Sebastian Andrzej Siewior, Song Yoong Siang, Alexei Starovoitov

On Thu, Feb 22, 2024 at 09:35:02AM +0100, Kurt Kanzenbach wrote:
> On Wed Feb 21 2024, Serge Semin wrote:
> > On Wed, Feb 21, 2024 at 04:59:10PM +0100, Maciej Fijalkowski wrote:
> >> On Wed, Feb 21, 2024 at 10:21:04AM +0100, Kurt Kanzenbach wrote:
> >> > On Wed Feb 21 2024, Kurt Kanzenbach wrote:
> >> > > On Tue Feb 20 2024, Stanislav Fomichev wrote:
> >> > >> On Tue, Feb 20, 2024 at 6:43 AM Maciej Fijalkowski
> >> > >> <maciej.fijalkowski@intel.com> wrote:
> >> > >>>
> >> > >>> On Tue, Feb 20, 2024 at 04:18:54PM +0300, Serge Semin wrote:
> >> > >>> > Hi Kurt
> >> > >>> >
> >> > >>> > On Tue, Feb 20, 2024 at 12:02:25PM +0100, Kurt Kanzenbach wrote:
> >> > >>> > > Hello netdev community,
> >> > >>> > >
> >> > >>> > > after updating to v6.8 kernel I've encountered an issue in the stmmac
> >> > >>> > > driver.
> >> > >>> > >
> >> > >>> > > I have an application which makes use of XDP zero-copy sockets. It works
> >> > >>> > > on v6.7. On v6.8 it results in the stack trace shown below. The program
> >> > >>> > > counter points to:
> >> > >>> > >
> >> > >>> > > - ./include/net/xdp_sock.h:192 and
> >> > >>> > > - ./drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2681
> >> > >>> > >
> >> > >>> > > It seems to be caused by the XDP meta data patches. This one in
> >> > >>> > > particular 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC").
> >> > >>> > >
> >> > >>> > > To reproduce:
> >> > >>> > >
> >> > >>> > > - Hardware: imx93
> >> > >>> > > - Run ptp4l/phc2sys
> >> > >>> > > - Configure Qbv, Rx steering, NAPI threading
> >> > >>> > > - Run my application using XDP/ZC on queue 1
> >> > >>> > >
> >> > >>> > > Any idea what might be the issue here?
> >> > >>> > >
> >> > >>> > > Thanks,
> >> > >>> > > Kurt
> >> > >>> > >
> >> > >>> > > Stack trace:
> >> > >>> > >
> >> > >>> > > |[ 169.248150] imx-dwmac 428a0000.ethernet eth1: configured EST
> >> > >>> > > |[ 191.820913] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched
> >> > >>> > > |[ 226.039166] imx-dwmac 428a0000.ethernet eth1: entered promiscuous mode
> >> > >>> > > |[ 226.203262] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0
> >> > >>> > > |[ 226.203753] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-1
> >> > >>> > > |[ 226.303337] imx-dwmac 428a0000.ethernet eth1: Register MEM_TYPE_XSK_BUFF_POOL RxQ-1
> >> > >>> > > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> >> > >>> > > |[ 255.822602] Mem abort info:
> >> > >>> > > |[ 255.822604] ESR = 0x0000000096000044
> >> > >>> > > |[ 255.822608] EC = 0x25: DABT (current EL), IL = 32 bits
> >> > >>> > > |[ 255.822613] SET = 0, FnV = 0
> >> > >>> > > |[ 255.822616] EA = 0, S1PTW = 0
> >> > >>> > > |[ 255.822618] FSC = 0x04: level 0 translation fault
> >> > >>> > > |[ 255.822622] Data abort info:
> >> > >>> > > |[ 255.822624] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
> >> > >>> > > |[ 255.822627] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> >> > >>> > > |[ 255.822630] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >> > >>> > > |[ 255.822634] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085fe1000
> >> > >>> > > |[ 255.822638] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> >> > >>> > > |[ 255.822650] Internal error: Oops: 0000000096000044 [#1] PREEMPT_RT SMP
> >> > >>> > > |[ 255.822655] Modules linked in:
> >> > >>> > > |[ 255.822660] CPU: 0 PID: 751 Comm: napi/eth1-261 Not tainted 6.8.0-rc4-rt4-00100-g9c63d995ca19 #8
> >> > >>> > > |[ 255.822666] Hardware name: NXP i.MX93 11X11 EVK board (DT)
> >> > >>> > > |[ 255.822669] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> > >>> > > |[ 255.822674] pc : stmmac_tx_clean.constprop.0+0x848/0xc38
> >> > >>> > > |[ 255.822690] lr : stmmac_tx_clean.constprop.0+0x844/0xc38
> >> > >>> > > |[ 255.822696] sp : ffff800085ec3bc0
> >> > >>> > > |[ 255.822698] x29: ffff800085ec3bc0 x28: ffff000005b609e0 x27: 0000000000000001
> >> > >>> > > |[ 255.822706] x26: 0000000000000000 x25: ffff000005b60ae0 x24: 0000000000000001
> >> > >>> > > |[ 255.822712] x23: 0000000000000001 x22: ffff000005b649e0 x21: 0000000000000000
> >> > >>> > > |[ 255.822719] x20: 0000000000000020 x19: ffff800085291030 x18: 0000000000000000
> >> > >>> > > |[ 255.822725] x17: ffff7ffffc51c000 x16: ffff800080000000 x15: 0000000000000008
> >> > >>> > > |[ 255.822732] x14: ffff80008369b880 x13: 0000000000000000 x12: 0000000000008507
> >> > >>> > > |[ 255.822738] x11: 0000000000000040 x10: 0000000000000a70 x9 : ffff800080e32f84
> >> > >>> > > |[ 255.822745] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000003ff0
> >> > >>> > > |[ 255.822751] x5 : 0000000000003c40 x4 : ffff000005b60000 x3 : 0000000000000000
> >> > >>> > > |[ 255.822757] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> >> > >>> > > |[ 255.822764] Call trace:
> >> > >>> > > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
> >> > >>>
> >> > >>> Shouldn't xsk_tx_metadata_complete() be called only when corresponding
> >> > >>> buf_type is STMMAC_TXBUF_T_XSK_TX?
> >> > >>
> >> > >> +1. I'm assuming Serge isn't enabling it explicitly, so none of the
> >> > >> metadata stuff should trigger in this case.
> >> > >
> >> > > The only other user of xsk_tx_metadata_complete() in mlx5 guards it with
> >> > > xp_tx_metadata_enabled(). Seems like that's missing in stmmac?
> >> >
> >> > Well, the following patch seems to help:
> >> >
> >> > commit e85ab4b97b4d6e50036435ac9851b876c221f580
> >> > Author: Kurt Kanzenbach <kurt@linutronix.de>
> >> > Date: Wed Feb 21 08:18:15 2024 +0100
> >> >
> >> > net: stmmac: Complete meta data only when enabled
> >> >
> >> > Currently using XDP sockets on stmmac results in a kernel crash:
> >> >
> >> > |[ 255.822584] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> >> > |[...]
> >> > |[ 255.822764] Call trace:
> >> > |[ 255.822766] stmmac_tx_clean.constprop.0+0x848/0xc38
> >> >
> >> > The program counter indicates xsk_tx_metadata_complete(). However, this
> >> > function shouldn't be called unless metadata is actually enabled.
> >> >
> >> > Tested on imx93.
> >> >
> >> > Fixes: 1347b419318d ("net: stmmac: Add Tx HWTS support to XDP ZC")
> >> > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> >> >
> >> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >> > index 9df27f03a8cb..77c62b26342d 100644
> >> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> >> > @@ -2678,9 +2678,10 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
> >> > .desc = p,
> >> > };
> >> >
> >> > - xsk_tx_metadata_complete(&tx_q->tx_skbuff_dma[entry].xsk_meta,
> >> > - &stmmac_xsk_tx_metadata_ops,
> >> > - &tx_compl);
> >> > + if (xp_tx_metadata_enabled(tx_q->xsk_pool))
> >> >
> >> every other usage of tx metadata functions should be wrapped with
> >> xp_tx_metadata_enabled() - can you address other places and send a proper
> >> patch?
> >
> > AFAICS this is the only place. But the change above still isn't enough
> > to fix the problem. In my case XDP zero-copy isn't activated. So
> > xsk_pool isn't allocated and the NULL/~NULL dereference is still
> > persistent due to xp_tx_metadata_enabled() dereferencing the
> > NULL-structure fields. The attached patched fixes the problem in my
> > case.
>
> Sure about that? In my case without ZC the else path is not executed,
> because skb is set.

Absolutely. I don't know why you haven't hit the NULL-dereference bug as
I have. I was able to track the bug down to the
stmmac_rx_queue::xsk_pool not being allocated. Maybe your case differs
in that the pool had somehow been pre-allocated before ZC was
disabled. Anyway, I agree it will be safer to keep the xsk_pool
pointer sanity check, as is done in the rest of the places in the
driver.

>
> >
> > Kurt, are you sure that xp_tx_metadata_enabled() is required in your
> > case?
>
> Yes, I'm sure it's required, because I do use ZC without using any
> metadata.

Ok.

>
> > Could you test the attached patch with the xp_tx_metadata_enabled()
> > invocation discarded?
>
> Well, it works. But, the xp_tx_metadata_enabled() is not discarded in
> the ZC case:
>
> |RtcRxThread-790 [001] b...3 202.970243: stmmac_tx_clean.constprop.0: huhu from xp_tx_metadata_enabled
>
> Let's go with your version of the patch. It works without XDP, with XDP
> and XDP/ZC. I'll send it upstream.
>
> Thanks for the help :).

Thanks for posting the patch. Here is the link for the lore archive:
https://lore.kernel.org/netdev/20240222-stmmac_xdp-v1-1-e8d2d2b79ff0@linutronix.de

-Serge(y)

>
> Thanks,
> Kurt

^ permalink raw reply [flat|nested] 10+ messages in thread
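For readers following the thread, here is a minimal userspace sketch of the two-part guard discussed above: first check that xsk_pool is non-NULL, then check whether Tx metadata is enabled on that pool. It is only an illustration under stated assumptions. Every mock_* name is a hypothetical stand-in invented for this sketch, not a kernel API, and the real change is the patch at the lore link above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the XSK buffer pool; only the flag matters here. */
struct mock_pool {
	bool tx_metadata_enabled;
};

/* Hypothetical stand-in for a Tx queue. xsk_pool stays NULL when no
 * zero-copy socket is bound to the queue. */
struct mock_tx_queue {
	struct mock_pool *xsk_pool;
	int completions;	/* how many times metadata completion ran */
};

/* Models a helper that reads a field of the pool, so the caller must
 * never pass NULL (assumption: this mirrors the behaviour described
 * in the thread for xp_tx_metadata_enabled()). */
static bool mock_metadata_enabled(const struct mock_pool *pool)
{
	assert(pool != NULL);
	return pool->tx_metadata_enabled;
}

/* Models the guarded call site in the Tx clean path: the NULL check
 * short-circuits before the pool is dereferenced. */
static void mock_tx_clean_one(struct mock_tx_queue *q)
{
	if (q->xsk_pool && mock_metadata_enabled(q->xsk_pool))
		q->completions++;
}
```

The three cases from the thread map onto this sketch as follows: no pool at all (plain XDP or no XDP), a pool with metadata disabled (ZC without metadata, Kurt's setup), and a pool with metadata enabled. Only the last one should reach the completion step.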
end of thread, other threads:[~2024-02-22 10:06 UTC | newest]

Thread overview: 10+ messages
2024-02-20 11:02 stmmac and XDP/ZC issue Kurt Kanzenbach
2024-02-20 13:18 ` Serge Semin
2024-02-20 14:43 ` Maciej Fijalkowski
2024-02-20 21:57 ` Stanislav Fomichev
2024-02-21 7:13 ` Kurt Kanzenbach
2024-02-21 9:21 ` Kurt Kanzenbach
2024-02-21 15:59 ` Maciej Fijalkowski
2024-02-21 17:20 ` Serge Semin
2024-02-22 8:35 ` Kurt Kanzenbach
2024-02-22 10:06 ` Serge Semin