* [BUG -next] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100
@ 2025-11-13 0:52 Paul E. McKenney
2025-11-13 7:37 ` Petr Mladek
0 siblings, 1 reply; 3+ messages in thread
From: Paul E. McKenney @ 2025-11-13 0:52 UTC (permalink / raw)
To: linux-kernel, linux-next
Cc: d-tatianin, pmladek, john.ogness, sfr, rostedt, senozhatsky
Hello!
Some rcutorture runs on next-20251110 hit the following error on x86:
WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100, CPU#0: rcu_torture_sta/63
This happens in about 20-25% of the rcutorture runs, and is the
WARN_ON_ONCE(1) in the "else" clause of get_data(). There was no
rcutorture scenario that failed to reproduce this bug, so I am guessing
that the various .config files will not provide useful information.
Please see the end of this email for a representative splat, which is
usually rcutorture printing out something or another. (Which, in its
defense, has worked just fine in the past.)
Bisection converged on this commit:
67e1b0052f6b ("printk_ringbuffer: don't needlessly wrap data blocks around")
Reverting this commit suppressed (or at least hugely reduced the
probability of) the WARN_ON_ONCE().
The SRCU-T, SRCU-U, and TREE09 scenarios hit this most frequently at
about double the base rate, but are CONFIG_SMP=n builds. The RUDE01
scenario was the most productive CONFIG_SMP=y scenario. Reproduce as
follows, where "N" is the number of CPUs on your system divided by three,
rounded down:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "N*RUDE01"
Or if you can do CONFIG_SMP=n, the following works, where "N" is the
number of CPUs on your system:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "N*SRCU-T"
Or please tell me what debug I should enable on my runs.
Thanx, Paul
------------------------------------------------------------------------
[ 1676.478083] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100, CPU#0: rcu_torture_sta/63
[ 1676.478092] Modules linked in:
[ 1676.478099] CPU: 0 UID: 0 PID: 63 Comm: rcu_torture_sta Not tainted 6.18.0-rc4-next-20251110-dirty #6903 PREEMPT(full)
[ 1676.478104] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 1676.478107] RIP: 0010:get_data+0xb3/0x100
[ 1676.478110] Code: 39 cf 75 50 48 8b 76 08 48 8d 4e 07 48 83 e1 f8 48 39 ce 75 4a 83 fa 07 76 50 83 ea 08 48 83 c0 08 41 89 11 c3 cc cc cc cc 90 <0f> 0b 90 31 c0 c3 cc cc cc cc 48 c7 c0 ff ff ff ff 44 29 c2 48 d3
[ 1676.478114] RSP: 0018:ffffb5e0c021bb98 EFLAGS: 00010006
[ 1676.478118] RAX: 0000000000000000 RBX: ffffffffbe27c560 RCX: 0000000000000012
[ 1676.478121] RDX: 0000000000000000 RSI: ffffffffbe342060 RDI: ffffffffbe27c590
[ 1676.478123] RBP: ffffb5e0c021bc78 R08: ffffffffffffff98 R09: ffffb5e0c021bbd0
[ 1676.478126] R10: 00003fffffffffff R11: ffffffffbe342058 R12: ffffb5e0c021bc60
[ 1676.478128] R13: 00000000ffffee71 R14: 000000000000003f R15: 0000000000000400
[ 1676.478133] FS: 0000000000000000(0000) GS:ffff9d0b605f7000(0000) knlGS:0000000000000000
[ 1676.478136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1676.478139] CR2: ffff9d0b0ac01000 CR3: 0000000001ed2000 CR4: 00000000000006f0
[ 1676.478141] Call Trace:
[ 1676.478155] <TASK>
[ 1676.478157] prb_reserve_in_last+0x190/0x4d0
[ 1676.478163] ? vsnprintf+0x11/0x4a0
[ 1676.478176] vprintk_store+0x41e/0x530
[ 1676.478191] vprintk_emit+0x8f/0x360
[ 1676.478199] _printk+0x56/0x70
[ 1676.478207] rcu_torture_stats_print+0x2fe/0x780
[ 1676.478218] ? lock_release+0xc6/0x290
[ 1676.478222] ? __pfx_rcu_torture_stats+0x10/0x10
[ 1676.478237] ? __pfx_rcu_torture_stats+0x10/0x10
[ 1676.478241] rcu_torture_stats+0x25/0x70
[ 1676.478245] kthread+0x102/0x200
[ 1676.478252] ? __pfx_kthread+0x10/0x10
[ 1676.478258] ret_from_fork+0x23f/0x280
[ 1676.478266] ? __pfx_kthread+0x10/0x10
[ 1676.478270] ret_from_fork_asm+0x1a/0x30
[ 1676.478285] </TASK>
[ 1676.478287] irq event stamp: 54788
[ 1676.478289] hardirqs last enabled at (54787): [<ffffffffbc933c3d>] __up_console_sem+0x4d/0x60
[ 1676.478293] hardirqs last disabled at (54788): [<ffffffffbc935ecc>] vprintk_store+0x3cc/0x530
[ 1676.478297] softirqs last enabled at (54672): [<ffffffffbc89fb8e>] handle_softirqs+0x2ee/0x3b0
[ 1676.478303] softirqs last disabled at (54659): [<ffffffffbc89fdb1>] __irq_exit_rcu+0xa1/0xc0
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [BUG -next] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100 2025-11-13 0:52 [BUG -next] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100 Paul E. McKenney @ 2025-11-13 7:37 ` Petr Mladek 2025-11-13 17:09 ` Paul E. McKenney 0 siblings, 1 reply; 3+ messages in thread From: Petr Mladek @ 2025-11-13 7:37 UTC (permalink / raw) To: Paul E. McKenney Cc: linux-kernel, linux-next, d-tatianin, john.ogness, sfr, rostedt, senozhatsky Hi Paul, first, thanks a lot for reporting the regression. On Wed 2025-11-12 16:52:16, Paul E. McKenney wrote: > Hello! > > Some rcutorture runs on next-20251110 hit the following error on x86: > > WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100, CPU#0: rcu_torture_sta/63 > > This happens in about 20-25% of the rcutorture runs, and is the > WARN_ON_ONCE(1) in the "else" clause of get_data(). There was no > rcutorture scenario that failed to reproduce this bug, so I am guessing > that the various .config files will not provide useful information. > Please see the end of this email for a representative splat, which is > usually rcutorture printing out something or another. (Which, in its > defense, has worked just fine in the past.) > > Bisection converged on this commit: > > 67e1b0052f6b ("printk_ringbuffer: don't needlessly wrap data blocks around") > > Reverting this commit suppressed (or at least hugely reduced the > probability of) the WARN_ON_ONCE(). > > The SRCU-T, SRCU-U, and TREE09 scenarios hit this most frequently at > about double the base rate, but are CONFIG_SMP=n builds. The RUDE01 > scenario was the most productive CONFIG_SMP=y scenario. Reproduce as > follows, where "N" is the number of CPUs on your system divided by three, > rounded down: > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "N*RUDE01" > > Or if you can do CONFIG_SMP=n, the following works, where "N" is the > number of CPUs on your system: > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "N*SRCU-T" > > Or please tell me what debug I should enable on my runs. The problem was reported by two test robots last week. It happens when a message fits exactly up to the last byte before the ring buffer gets wrapped for the first time. It is interesting that you have seen so frequently (in about 20-25% rcutorture runs). Anyway, I have pushed a fix on Monday. It is the commit cc3bad11de6e0d601 ("printk_ringbuffer: Fix check of valid data size when blk_lpos overflows"), see https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git/commit/?h=for-6.19&id=cc3bad11de6e0d6012460487903e7167d3e73957 Thanks a lot for so exhaustive report. And I am sorry that you probably spent a lot of time with it. Best Regards, Petr ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG -next] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100 2025-11-13 7:37 ` Petr Mladek @ 2025-11-13 17:09 ` Paul E. McKenney 0 siblings, 0 replies; 3+ messages in thread From: Paul E. McKenney @ 2025-11-13 17:09 UTC (permalink / raw) To: Petr Mladek Cc: linux-kernel, linux-next, d-tatianin, john.ogness, sfr, rostedt, senozhatsky On Thu, Nov 13, 2025 at 08:37:15AM +0100, Petr Mladek wrote: > Hi Paul, > > first, thanks a lot for reporting the regression. > > On Wed 2025-11-12 16:52:16, Paul E. McKenney wrote: > > Hello! > > > > Some rcutorture runs on next-20251110 hit the following error on x86: > > > > WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100, CPU#0: rcu_torture_sta/63 > > > > This happens in about 20-25% of the rcutorture runs, and is the > > WARN_ON_ONCE(1) in the "else" clause of get_data(). There was no > > rcutorture scenario that failed to reproduce this bug, so I am guessing > > that the various .config files will not provide useful information. > > Please see the end of this email for a representative splat, which is > > usually rcutorture printing out something or another. (Which, in its > > defense, has worked just fine in the past.) > > > > Bisection converged on this commit: > > > > 67e1b0052f6b ("printk_ringbuffer: don't needlessly wrap data blocks around") > > > > Reverting this commit suppressed (or at least hugely reduced the > > probability of) the WARN_ON_ONCE(). > > > > The SRCU-T, SRCU-U, and TREE09 scenarios hit this most frequently at > > about double the base rate, but are CONFIG_SMP=n builds. The RUDE01 > > scenario was the most productive CONFIG_SMP=y scenario. Reproduce as > > follows, where "N" is the number of CPUs on your system divided by three, > > rounded down: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "N*RUDE01" > > > > Or if you can do CONFIG_SMP=n, the following works, where "N" is the > > number of CPUs on your system: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5 --configs "N*SRCU-T" > > > > Or please tell me what debug I should enable on my runs. > > The problem was reported by two test robots last week. It happens when > a message fits exactly up to the last byte before the ring buffer gets > wrapped for the first time. It is interesting that you have seen > so frequently (in about 20-25% rcutorture runs). > > Anyway, I have pushed a fix on Monday. It is the commit > cc3bad11de6e0d601 ("printk_ringbuffer: Fix check of > valid data size when blk_lpos overflows"), see > https://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git/commit/?h=for-6.19&id=cc3bad11de6e0d6012460487903e7167d3e73957 Even better! Thank you for the fix. > Thanks a lot for so exhaustive report. And I am sorry that you > probably spent a lot of time with it. Well, actually, it was the first time that I turned "git bisect run" loose on a full (and fully scripted) remote RCU run. Each step involved checking out 20 systems from the test group, building 20 kernels, downloading the build products to each of the 20 systems, running each of 286 guest OSes (15 each for 19 of the kernels and one instance of the large one) spread over the 20 systems, waiting for them to finish, uploading the test results, returning the systems to the test group, analyzing them, and reporting either success (all runs succeeded) or failure (at least one failure across the 286 kernels. Then my grepping through the run results directory to get you the failure rate. Of course, that failure rate indicates that I could have done the bisection more quickly and with much less hardware, but that would have required me to stop the other things I was doing and actually think about this. Each step took somewhere between 90 minutes and two hours on a total of 1600 CPUs, and all ~11 bisection steps completed without my intervention. Thus far, neither the test grid, the systems, the scripting, nor git bisect have complained about my having wasted their time, but what with AI they probably soon will do so. I am somewhat surprised that it all went through without something breaking, but I guess that we all get lucky from time to time. ;-) So not a lot of work for me, which is a good thing, given that I had lots of other distractions, including another much more troublesome bisection on ARM that actually found a bug that had not yet been fixed. A trivial bug, admittedly, but such is life! And again, thank you for so quickly fixing this! Thanx, Paul ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-11-13 17:09 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-13 0:52 [BUG -next] WARNING: kernel/printk/printk_ringbuffer.c:1278 at get_data+0xb3/0x100 Paul E. McKenney 2025-11-13 7:37 ` Petr Mladek 2025-11-13 17:09 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox