regressions.lists.linux.dev archive mirror
* [REGRESSION] 5.15.181 -> 5.15.189: kernel oops in drr_qlen_notify
@ 2025-08-14 21:13 Thomas Jarosch
  2025-08-15  9:18 ` Thomas Jarosch
  0 siblings, 1 reply; 2+ messages in thread
From: Thomas Jarosch @ 2025-08-14 21:13 UTC
  To: stable; +Cc: gregkh, Lion Ackermann, regressions

Hi,

I'm seeing a reproducible kernel oops on my home router after updating from 5.15.181 to 5.15.189:

kernel BUG at lib/list_debug.c:50!
invalid opcode: 0000 [#1] SMP NOPTI
..
Call Trace:
 <TASK>
 drr_qlen_notify+0x11/0x50 [sch_drr]
 qdisc_tree_reduce_backlog+0x93/0xf0
 drr_graft_class+0x109/0x220 [sch_drr]
 qdisc_graft+0xdd/0x510
 ? qdisc_create+0x335/0x510
 tc_modify_qdisc+0x53f/0x9d0
 rtnetlink_rcv_msg+0x134/0x370
 ? __getblk_gfp+0x22/0x230
 ? rtnl_calcit.isra.38+0x130/0x130
 netlink_rcv_skb+0x4f/0x100
 rtnetlink_rcv+0x10/0x20
 netlink_unicast+0x1d2/0x2a0
 netlink_sendmsg+0x22a/0x480
 ? netlink_broadcast+0x20/0x20
 ____sys_sendmsg+0x25f/0x280
 ? copy_msghdr_from_user+0x5b/0x90
 ___sys_sendmsg+0x77/0xc0
 ? __sys_recvmsg+0x5a/0xb0
 ? do_filp_open+0xc3/0x120
 __sys_sendmsg+0x5d/0xb0
 __x64_sys_sendmsg+0x1a/0x20
 x64_sys_call+0x17f1/0x1c80
 do_syscall_64+0x53/0x80
 ? exit_to_user_mode_prepare+0x2c/0x140
 ? irqentry_exit_to_user_mode+0xe/0x20
 ? irqentry_exit+0x1d/0x30
 ? exc_page_fault+0x1e7/0x610
 ? do_syscall_64+0x5f/0x80
 entry_SYSCALL_64_after_hwframe+0x6c/0xd6
..
RIP: 0010:__list_del_entry_valid.cold.1+0xf/0x69


syzbot reported a similar-looking issue here:

[v5.15] BUG: unable to handle kernel paging request in drr_qlen_notify
https://groups.google.com/g/syzkaller-lts-bugs/c/_QJHiMHwfRw/m/2j1nSU1hBgAJ

and here:

"[syzbot] [net?] general protection fault in drr_qlen_notify"
https://www.spinics.net/lists/netdev/msg1105420.html

syzbot bisected it to:

****************************************
commit e269f29e9395527bc00c213c6b15da04ebb35070
Refs: v5.15.186-114-ge269f29e9395
Author:     Lion Ackermann <nnamrec@gmail.com>
AuthorDate: Mon Jun 30 15:27:30 2025 +0200
Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Thu Jul 10 15:57:46 2025 +0200

    net/sched: Always pass notifications when child class becomes empty

    [ Upstream commit 103406b38c600fec1fe375a77b27d87e314aea09 ]
****************************************

The last line of the commit message mentions:

"This is not a problem after the recent patch series
that made all the classful qdiscs qlen_notify() handlers idempotent."


It looks like the "idempotent" patches are missing from the 5.15 stable series.

Like this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sched/sch_drr.c?id=df008598b3a00be02a8051fde89ca0fbc416bd55
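
To illustrate what "idempotent" means for such a handler, here is a minimal userspace sketch. It assumes the problem is that the handler unlinks a class from the active list even when it was already unlinked, which would be consistent with the list-debug BUG in the trace above. None of this is kernel code: struct demo_class, demo_qlen_notify() and the small list helpers are hypothetical stand-ins that only mimic the unlink-and-reinitialise idea. Please check the linked commit for what the real patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A node that points to itself is not linked into any list. */
static bool list_is_unlinked(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink the entry and re-initialise it, so a repeated call is a no-op. */
static void list_del_init_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical class with an "active list" membership node. */
struct demo_class {
	struct list_head alist;
};

/* Idempotent handler: safe to call whether or not cl is still active. */
static void demo_qlen_notify(struct demo_class *cl)
{
	list_del_init_entry(&cl->alist);
}

/*
 * Put two classes on an active list, notify the first one twice and
 * return how many classes remain active (-1 if the first one is still
 * linked, which would indicate a bug in the helpers).
 */
static int demo_double_notify(void)
{
	struct list_head active;
	struct demo_class a, b;
	int remaining = 0;

	init_list_head(&active);
	init_list_head(&a.alist);
	init_list_head(&b.alist);
	list_add_tail_entry(&a.alist, &active);
	list_add_tail_entry(&b.alist, &active);

	demo_qlen_notify(&a);	/* first call unlinks a */
	demo_qlen_notify(&a);	/* second call only touches a itself */

	for (struct list_head *p = active.next; p != &active; p = p->next)
		remaining++;

	return list_is_unlinked(&a.alist) ? remaining : -1;
}
```

With a plain unlink that leaves stale next/prev pointers behind, the second call would write through those stale pointers and could corrupt the list. That is the kind of condition the kernel's list debugging is designed to catch.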

I've tried Ubuntu's backport for 5.15:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=7804135b4bdd82525f3ca0c4ad139ada6b7662d4

It seems to be identical to:
https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.1754751592.git.siddh.raman.pant@oracle.com/

While the kernel no longer oopsed with the patch applied, the network traffic behaved erratically:
TCP traffic worked, but ICMP seemed "stuck". tcpdump showed no ICMP traffic on the ppp device.


Tomorrow I will try to reproduce the issue on a test VM.

Anything else I should try?

Thanks in advance,
Thomas


* Re: [REGRESSION] 5.15.181 -> 5.15.189: kernel oops in drr_qlen_notify
  2025-08-14 21:13 [REGRESSION] 5.15.181 -> 5.15.189: kernel oops in drr_qlen_notify Thomas Jarosch
@ 2025-08-15  9:18 ` Thomas Jarosch
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Jarosch @ 2025-08-15  9:18 UTC
  To: stable; +Cc: gregkh, Lion Ackermann, regressions, Siddh Raman Pant,
	Sasha Levin

Hello again,

You wrote on Thu, Aug 14, 2025 at 11:13:32PM +0200:
> I'm seeing a reproducible kernel oops on my home router after updating from 5.15.181 to 5.15.189:
> ..
>
> It seems to be identical to:
> https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.1754751592.git.siddh.raman.pant@oracle.com/
> 
> While the kernel no longer oopsed with the patch applied, the network traffic behaved erratically:
> TCP traffic worked, but ICMP seemed "stuck". tcpdump showed no ICMP traffic on the ppp device.

I didn't realize yesterday that the bandwidth management script uses both drr and hfsc.

There is an upcoming patch series proposed for the 5.15 stable queue:

1. "[PATCH 5.15, 5.10 2/6] sch_drr: make drr_qlen_notify() idempotent"
https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.1754751592.git.siddh.raman.pant@oracle.com/#t

2. "[PATCH 5.15, 5.10 3/6] sch_hfsc: make hfsc_qlen_notify() idempotent"
https://lore.kernel.org/stable/8f1d425178ad93064465e15c68b38890b10b5814.1754751592.git.siddh.raman.pant@oracle.com/


With those two patches applied, kernel 5.15.189 no longer oopses.
(If I drop the hfsc-related patch, ICMP is broken as reported.)

Thanks for all the hard work on the stable kernel series! It's highly appreciated.

Cheers,
Thomas

