* [REGRESSION] 5.15.181 -> 5.15.189: kernel oops in drr_qlen_notify
From: Thomas Jarosch @ 2025-08-14 21:13 UTC
To: stable; +Cc: gregkh, Lion Ackermann, regressions
Hi,
I'm seeing a reproducible kernel oops on my home router after updating from 5.15.181 to 5.15.189:
kernel BUG at lib/list_debug.c:50!
invalid opcode: 0000 [#1] SMP NOPTI
..
Call Trace:
<TASK>
drr_qlen_notify+0x11/0x50 [sch_drr]
qdisc_tree_reduce_backlog+0x93/0xf0
drr_graft_class+0x109/0x220 [sch_drr]
qdisc_graft+0xdd/0x510
? qdisc_create+0x335/0x510
tc_modify_qdisc+0x53f/0x9d0
rtnetlink_rcv_msg+0x134/0x370
? __getblk_gfp+0x22/0x230
? rtnl_calcit.isra.38+0x130/0x130
netlink_rcv_skb+0x4f/0x100
rtnetlink_rcv+0x10/0x20
netlink_unicast+0x1d2/0x2a0
netlink_sendmsg+0x22a/0x480
? netlink_broadcast+0x20/0x20
____sys_sendmsg+0x25f/0x280
? copy_msghdr_from_user+0x5b/0x90
___sys_sendmsg+0x77/0xc0
? __sys_recvmsg+0x5a/0xb0
? do_filp_open+0xc3/0x120
__sys_sendmsg+0x5d/0xb0
__x64_sys_sendmsg+0x1a/0x20
x64_sys_call+0x17f1/0x1c80
do_syscall_64+0x53/0x80
? exit_to_user_mode_prepare+0x2c/0x140
? irqentry_exit_to_user_mode+0xe/0x20
? irqentry_exit+0x1d/0x30
? exc_page_fault+0x1e7/0x610
? do_syscall_64+0x5f/0x80
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
..
RIP: 0010:__list_del_entry_valid.cold.1+0xf/0x69
syzbot reported a similar-looking issue here:
[v5.15] BUG: unable to handle kernel paging request in drr_qlen_notify
https://groups.google.com/g/syzkaller-lts-bugs/c/_QJHiMHwfRw/m/2j1nSU1hBgAJ
and here:
"[syzbot] [net?] general protection fault in drr_qlen_notify"
https://www.spinics.net/lists/netdev/msg1105420.html
syzbot bisected it to:
****************************************
commit e269f29e9395527bc00c213c6b15da04ebb35070
Refs: v5.15.186-114-ge269f29e9395
Author: Lion Ackermann <nnamrec@gmail.com>
AuthorDate: Mon Jun 30 15:27:30 2025 +0200
Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Thu Jul 10 15:57:46 2025 +0200
net/sched: Always pass notifications when child class becomes empty
[ Upstream commit 103406b38c600fec1fe375a77b27d87e314aea09 ]
****************************************
The last line of the commit message mentions:
"This is not a problem after the recent patch series
that made all the classful qdiscs qlen_notify() handlers idempotent."
It looks like the "idempotent" patches are missing from the 5.15 stable series.
Like this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sched/sch_drr.c?id=df008598b3a00be02a8051fde89ca0fbc416bd55
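To illustrate why a non-idempotent notify handler can trip a list sanity check when it runs twice for the same class, here is a minimal userspace sketch in C. It is not the kernel code and not the actual patch: `struct list_node`, `del_poison()`, `del_init()` and `second_notify_is_safe()` are hypothetical names, loosely modelled on the kernel's list_del() (which poisons the removed entry) and list_del_init() (which reinitialises it). The assumption is that "idempotent" here means a repeated removal of an already removed class is a harmless no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list (illustration only). */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical stand-in for the kernel's list poison values. */
static struct list_node poison;

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Sanity check in the spirit of the list debug code: the node must not be
 * poisoned and both neighbours must point back at it. */
static bool list_entry_valid(const struct list_node *n)
{
	return n->next != &poison && n->prev != &poison &&
	       n->next->prev == n && n->prev->next == n;
}

/* Non-idempotent removal: the node is poisoned afterwards, so a second
 * call fails the check (the kernel would hit BUG() instead of returning). */
static bool del_poison(struct list_node *n)
{
	if (!list_entry_valid(n))
		return false;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = &poison;
	return true;
}

/* Idempotent removal: the node is left pointing at itself, so removing it
 * again is a valid no-op. */
static bool del_init(struct list_node *n)
{
	if (!list_entry_valid(n))
		return false;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
	return true;
}

/* Simulates two notify-style removals of the same class from an active
 * list; returns true only if the second removal was harmless. */
static bool second_notify_is_safe(bool idempotent)
{
	struct list_node active, cl;

	list_init(&active);
	list_add_tail_node(&cl, &active);
	assert(list_entry_valid(&cl));

	if (idempotent)
		return del_init(&cl) && del_init(&cl);
	return del_poison(&cl) && del_poison(&cl);
}
```

In this model, `second_notify_is_safe(true)` succeeds while `second_notify_is_safe(false)` reports the failed check on the second removal, which would match the picture of a notification now being passed on unconditionally to a handler that was not yet made idempotent.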
I've tried Ubuntu's backport for 5.15:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=7804135b4bdd82525f3ca0c4ad139ada6b7662d4
It seems to be identical to:
https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.1754751592.git.siddh.raman.pant@oracle.com/
While the kernel no longer oopses with the patch applied, the network traffic behaves erratically:
TCP traffic works, but ICMP seems "stuck". tcpdump showed no ICMP traffic on the ppp device.
Tomorrow I will try to reproduce the issue on a test VM.
Anything else I should try?
Thanks in advance,
Thomas
* Re: [REGRESSION] 5.15.181 -> 5.15.189: kernel oops in drr_qlen_notify
From: Thomas Jarosch @ 2025-08-15 9:18 UTC
To: stable; +Cc: gregkh, Lion Ackermann, regressions, Siddh Raman Pant,
Sasha Levin
Hello again,
You wrote on Thu, Aug 14, 2025 at 11:13:32PM +0200:
> I'm seeing a reproducible kernel oops on my home router updating from 5.15.181 to 5.15.189:
> ..
>
> It seems to be identical to:
> https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.1754751592.git.siddh.raman.pant@oracle.com/
>
> While the kernel didn't oops anymore with the patch applied, the network traffic behaves erratic:
> TCP traffic works, ICMP seems "stuck". tcpdump showed no icmp traffic on the ppp device.
I didn't realize yesterday that the bandwidth management script uses both drr and hfsc.
There is an upcoming patch series proposed for the 5.15 stable queue:
1. "[PATCH 5.15, 5.10 2/6] sch_drr: make drr_qlen_notify() idempotent"
https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.1754751592.git.siddh.raman.pant@oracle.com/#t
2. "[PATCH 5.15, 5.10 3/6] sch_hfsc: make hfsc_qlen_notify() idempotent"
https://lore.kernel.org/stable/8f1d425178ad93064465e15c68b38890b10b5814.1754751592.git.siddh.raman.pant@oracle.com/
With those two patches applied, kernel 5.15.189 no longer oopses.
(If I drop the hfsc-related patch, ICMP is broken as reported.)
Thanks for all the hard work on the stable kernel series! It's highly appreciated.
Cheers,
Thomas