All of lore.kernel.org
 help / color / mirror / Atom feed
* [paulmckrcu:dev.2025.05.05a] [rcutorture]  54d0803f63: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
@ 2025-05-14  2:23 kernel test robot
  2025-05-16  0:37 ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: kernel test robot @ 2025-05-14  2:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: oe-lkp, lkp, Joel Fernandes, Boqun Feng, linux-kernel,
	oliver.sang


hi, Paul,

we noticed various issues for both 54d0803f63/parent in this test.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
  vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-061-20250510/clang-20/300s/default/trivial

ec3f43f72b268d44 54d0803f632b402e519c7d97a6b
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :100         71%          71:98    rcutorture.trivial.fail   <--- (1)
           :100         71%          71:98    dmesg.EIP:rcu_torture_stats_print   <--- (2)
         94:100          0%          94:98    dmesg.EIP:synchronize_rcu_trivial
        100:100         -2%          98:98    dmesg.UBSAN:implicit-conversion_in_drivers/firewire/core-transaction.c
        100:100         -2%          98:98    dmesg.UBSAN:negation-overflow_in_lib/sort.c
        100:100         -2%          98:98    dmesg.UBSAN:negation-overflow_in_mm/memcontrol.c
           :100         71%          71:98    dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]   <--- (3)
         94:100          0%          94:98    dmesg.WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture]
        100:100         -2%          98:98    dmesg.boot_failures


but seems (1)(2)(3) are quite persistent on 54d0803f63 and clean on parent.


Hello,

kernel test robot noticed "WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]" on:

commit: 54d0803f632b402e519c7d97a6b52d5ffb78ae78 ("rcutorture: Start rcu_torture_writer() after rcu_torture_reader()")
https://github.com/paulmckrcu/linux dev.2025.05.05a

in testcase: rcutorture
version: 
with following parameters:

	runtime: 300s
	test: default
	torture_type: trivial



config: i386-randconfig-061-20250510
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202505131651.af6e81d7-lkp@intel.com


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250513/202505131651.af6e81d7-lkp@intel.com


[  232.971337][  T965] trivial-torture: rtc: ef358cc0 ver: 9677 tfle: 0 rta: 9677 rtaf: 0 rtf: 9668 rtmbe: 0 rtmbkf: 0/0 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 0 nt: 3782 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=300) barrier: 0/0:0 read-exits: 64 nocb-toggles: 0:0 gpwraps: 0
[  232.974503][  T965] trivial-torture: !!! 
[  232.974595][  T965] ------------[ cut here ]------------
[  232.976982][  T965] WARNING: CPU: 0 PID: 965 at kernel/rcu/rcutorture.c:2730 rcu_torture_stats_print+0x8e4/0x8f0 [rcutorture]
[  232.980999][  T965] Modules linked in: rcutorture torture
[  232.982356][  T965] CPU: 0 UID: 0 PID: 965 Comm: rcu_torture_sta Tainted: G        W       T   6.15.0-rc1-00059-g54d0803f632b #1 PREEMPT(full)  5f9cb9cacb9aba8a18caee0ed4d4cf4452094bc2
[  232.990730][  T965] Tainted: [W]=WARN, [T]=RANDSTRUCT
[  232.992782][  T965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  232.995869][  T965] EIP: rcu_torture_stats_print+0x8e4/0x8f0 [rcutorture]
[  232.997526][  T965] Code: ff ff 0f 0b 83 3d 88 8f 35 ef 00 0f 84 b0 fb ff ff 0f 0b 83 3d 8c 8f 35 ef 00 0f 84 ae fb ff ff 0f 0b 84 db 0f 84 ac fb ff ff <0f> 0b e9 a5 fb ff ff 00 00 00 00 00 55 89 e5 53 57 56 83 ec 14 e8
[  233.002674][  T965] EAX: 00000004 EBX: 00000001 ECX: 00000000 EDX: 00000000
[  233.004477][  T965] ESI: 00000000 EDI: 00000000 EBP: 47efbf58 ESP: 47efbedc
[  233.006060][  T965] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
[  233.007496][  T965] CR0: 80050033 CR2: 353667f8 CR3: 7c473000 CR4: 000406d0
[  233.009173][  T965] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  233.010612][  T965] DR6: fffe0ff0 DR7: 00000400
[  233.011678][  T965] Call Trace:
[  233.012511][  T965]  ? rcu_torture_stats+0x24/0x70 [rcutorture 7eef68a8ad9c4bc219a4bda0dc180602dd9a9416]
[  233.014416][  T965]  ? kthread+0x1af/0x200
[  233.016388][  T965]  ? rcu_nocb_toggle+0x1c0/0x1c0 [rcutorture 7eef68a8ad9c4bc219a4bda0dc180602dd9a9416]
[  233.018427][  T965]  ? schedule_tail+0x79/0xf0
[  233.019297][  T965]  ? kthread_blkcg+0x30/0x30
[  233.020404][  T965]  ? kthread_blkcg+0x30/0x30
[  233.021230][  T965]  ? ret_from_fork+0x2c/0x40
[  233.022315][  T965]  ? ret_from_fork_asm+0x12/0x20
[  233.023169][  T965]  ? entry_INT80_32+0x10d/0x10d
[  233.024178][  T965] irq event stamp: 495
[  233.024991][  T965] hardirqs last  enabled at (509): [<4f981d9b>] __console_unlock+0x5b/0x70
[  233.026606][  T965] hardirqs last disabled at (520): [<4f981d82>] __console_unlock+0x42/0x70
[  233.028125][  T965] softirqs last  enabled at (282): [<51ba356a>] __do_softirq+0xa/0xe
[  233.029924][  T965] softirqs last disabled at (267): [<51ba356a>] __do_softirq+0xa/0xe
[  233.031571][  T965] ---[ end trace 0000000000000000 ]---
[  233.033193][  T965] Reader Pipe:  23875253 3963 1 0 0 0 0 0 0 0 0
[  233.034030][  T965] trivial-torture: Reader Batch:  23879217 0 0 0 0 0 0 0 0 0 0
[  233.035198][  T965] trivial-torture: Free-Block Circulation:  9676 9676 9675 9674 9673 9672 9671 9670 9669 9668 0

...

[  472.316419][ T2339] trivial-torture:--- End of test: FAILURE: nreaders=1 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 test_boost_holdoff=0 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0 stall_cpu_repeat=0 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0 read_exit_delay=13 read_exit_burst=16 reader_flavor=1 nocbs_nthreads=0 nocbs_toggle=1000 test_nmis=0 preempt_duration=0 preempt_interval=1000 n_up_down=32


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [paulmckrcu:dev.2025.05.05a] [rcutorture]  54d0803f63: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
  2025-05-14  2:23 [paulmckrcu:dev.2025.05.05a] [rcutorture] 54d0803f63: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] kernel test robot
@ 2025-05-16  0:37 ` Paul E. McKenney
  2025-05-16 21:12   ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: Paul E. McKenney @ 2025-05-16  0:37 UTC (permalink / raw)
  To: kernel test robot; +Cc: oe-lkp, lkp, Joel Fernandes, Boqun Feng, linux-kernel

On Wed, May 14, 2025 at 10:23:26AM +0800, kernel test robot wrote:
> 
> hi, Paul,
> 
> we noticed various issues for both 54d0803f63/parent in this test.
> 
> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
>   vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-061-20250510/clang-20/300s/default/trivial
> 
> ec3f43f72b268d44 54d0803f632b402e519c7d97a6b
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :100         71%          71:98    rcutorture.trivial.fail   <--- (1)
>            :100         71%          71:98    dmesg.EIP:rcu_torture_stats_print   <--- (2)
>          94:100          0%          94:98    dmesg.EIP:synchronize_rcu_trivial
>         100:100         -2%          98:98    dmesg.UBSAN:implicit-conversion_in_drivers/firewire/core-transaction.c
>         100:100         -2%          98:98    dmesg.UBSAN:negation-overflow_in_lib/sort.c
>         100:100         -2%          98:98    dmesg.UBSAN:negation-overflow_in_mm/memcontrol.c
>            :100         71%          71:98    dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]   <--- (3)
>          94:100          0%          94:98    dmesg.WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture]
>         100:100         -2%          98:98    dmesg.boot_failures
> 
> 
> but seems (1)(2)(3) are quite persistent on 54d0803f63 and clean on parent.
> 
> 
> Hello,
> 
> kernel test robot noticed "WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]" on:
> 
> commit: 54d0803f632b402e519c7d97a6b52d5ffb78ae78 ("rcutorture: Start rcu_torture_writer() after rcu_torture_reader()")
> https://github.com/paulmckrcu/linux dev.2025.05.05a
> 
> in testcase: rcutorture
> version: 
> with following parameters:
> 
> 	runtime: 300s
> 	test: default
> 	torture_type: trivial

Huh.  This is the trivial RCU implementation that is not used in the
Linux kernel (aside from rcutorture testing it), but verifies that my
slideware/textbook implementation actually works.

So you didn't manage to break the Linux kernel, just some of my
presentations.  ;-)

Still, this clearly needs to be fixed.

> config: i386-randconfig-061-20250510
> compiler: clang-20
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202505131651.af6e81d7-lkp@intel.com
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250513/202505131651.af6e81d7-lkp@intel.com
> 
> 
> [  232.971337][  T965] trivial-torture: rtc: ef358cc0 ver: 9677 tfle: 0 rta: 9677 rtaf: 0 rtf: 9668 rtmbe: 0 rtmbkf: 0/0 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 0 nt: 3782 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=300) barrier: 0/0:0 read-exits: 64 nocb-toggles: 0:0 gpwraps: 0
> [  232.974503][  T965] trivial-torture: !!! 
> [  232.974595][  T965] ------------[ cut here ]------------
> [  232.976982][  T965] WARNING: CPU: 0 PID: 965 at kernel/rcu/rcutorture.c:2730 rcu_torture_stats_print+0x8e4/0x8f0 [rcutorture]

And this is a too-short grace period...

> [  232.980999][  T965] Modules linked in: rcutorture torture
> [  232.982356][  T965] CPU: 0 UID: 0 PID: 965 Comm: rcu_torture_sta Tainted: G        W       T   6.15.0-rc1-00059-g54d0803f632b #1 PREEMPT(full)  5f9cb9cacb9aba8a18caee0ed4d4cf4452094bc2
> [  232.990730][  T965] Tainted: [W]=WARN, [T]=RANDSTRUCT
> [  232.992782][  T965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [  232.995869][  T965] EIP: rcu_torture_stats_print+0x8e4/0x8f0 [rcutorture]
> [  232.997526][  T965] Code: ff ff 0f 0b 83 3d 88 8f 35 ef 00 0f 84 b0 fb ff ff 0f 0b 83 3d 8c 8f 35 ef 00 0f 84 ae fb ff ff 0f 0b 84 db 0f 84 ac fb ff ff <0f> 0b e9 a5 fb ff ff 00 00 00 00 00 55 89 e5 53 57 56 83 ec 14 e8
> [  233.002674][  T965] EAX: 00000004 EBX: 00000001 ECX: 00000000 EDX: 00000000
> [  233.004477][  T965] ESI: 00000000 EDI: 00000000 EBP: 47efbf58 ESP: 47efbedc
> [  233.006060][  T965] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> [  233.007496][  T965] CR0: 80050033 CR2: 353667f8 CR3: 7c473000 CR4: 000406d0
> [  233.009173][  T965] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [  233.010612][  T965] DR6: fffe0ff0 DR7: 00000400
> [  233.011678][  T965] Call Trace:
> [  233.012511][  T965]  ? rcu_torture_stats+0x24/0x70 [rcutorture 7eef68a8ad9c4bc219a4bda0dc180602dd9a9416]
> [  233.014416][  T965]  ? kthread+0x1af/0x200
> [  233.016388][  T965]  ? rcu_nocb_toggle+0x1c0/0x1c0 [rcutorture 7eef68a8ad9c4bc219a4bda0dc180602dd9a9416]
> [  233.018427][  T965]  ? schedule_tail+0x79/0xf0
> [  233.019297][  T965]  ? kthread_blkcg+0x30/0x30
> [  233.020404][  T965]  ? kthread_blkcg+0x30/0x30
> [  233.021230][  T965]  ? ret_from_fork+0x2c/0x40
> [  233.022315][  T965]  ? ret_from_fork_asm+0x12/0x20
> [  233.023169][  T965]  ? entry_INT80_32+0x10d/0x10d
> [  233.024178][  T965] irq event stamp: 495
> [  233.024991][  T965] hardirqs last  enabled at (509): [<4f981d9b>] __console_unlock+0x5b/0x70
> [  233.026606][  T965] hardirqs last disabled at (520): [<4f981d82>] __console_unlock+0x42/0x70
> [  233.028125][  T965] softirqs last  enabled at (282): [<51ba356a>] __do_softirq+0xa/0xe
> [  233.029924][  T965] softirqs last disabled at (267): [<51ba356a>] __do_softirq+0xa/0xe
> [  233.031571][  T965] ---[ end trace 0000000000000000 ]---
> [  233.033193][  T965] Reader Pipe:  23875253 3963 1 0 0 0 0 0 0 0 0

... as further indicated by the "1" after the "3963".  This means that
there was one too-short trivial-RCU grace period in not quite four
minutes of testing.

> [  233.034030][  T965] trivial-torture: Reader Batch:  23879217 0 0 0 0 0 0 0 0 0 0
> [  233.035198][  T965] trivial-torture: Free-Block Circulation:  9676 9676 9675 9674 9673 9672 9671 9670 9669 9668 0
> 
> ...
> 
> [  472.316419][ T2339] trivial-torture:--- End of test: FAILURE: nreaders=1 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 test_boost_holdoff=0 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0 stall_cpu_repeat=0 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0 read_exit_delay=13 read_exit_burst=16 reader_flavor=1 nocbs_nthreads=0 nocbs_toggle=1000 test_nmis=0 preempt_duration=0 preempt_interval=1000 n_up_down=32

And this is one possible reason why:  "shuffle_interval=3".

The way that trivial RCU works is to schedule itself on each online CPU
in turn in this function:

static void synchronize_rcu_trivial(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		torture_sched_setaffinity(current->pid, cpumask_of(cpu), true);
		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
	}
}

If torture_shuffle_tasks() moves this writer kthread just after the above
torture_sched_setaffinity() completes, then synchronize_rcu_trivial()
might miss its intended CPU, thus failing to wait for RCU readers on
that CPU.  This can in turn result in a too-short grace period.

Except that I would expect the WARN_ON_ONCE() to trigger in that scenario.

I will play with this a bit.  I don't want to force-disable shuffle
in the torture_type=trivial case because it is a useful way of testing
rcutorture itself.  But perhaps I can make the complaint more explicit.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [paulmckrcu:dev.2025.05.05a] [rcutorture]  54d0803f63: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
  2025-05-16  0:37 ` Paul E. McKenney
@ 2025-05-16 21:12   ` Paul E. McKenney
  0 siblings, 0 replies; 3+ messages in thread
From: Paul E. McKenney @ 2025-05-16 21:12 UTC (permalink / raw)
  To: kernel test robot; +Cc: oe-lkp, lkp, Joel Fernandes, Boqun Feng, linux-kernel

On Thu, May 15, 2025 at 05:37:03PM -0700, Paul E. McKenney wrote:
> On Wed, May 14, 2025 at 10:23:26AM +0800, kernel test robot wrote:
> > 
> > hi, Paul,
> > 
> > we noticed various issues for both 54d0803f63/parent in this test.
> > 
> > =========================================================================================
> > tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
> >   vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-061-20250510/clang-20/300s/default/trivial
> > 
> > ec3f43f72b268d44 54d0803f632b402e519c7d97a6b
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :100         71%          71:98    rcutorture.trivial.fail   <--- (1)
> >            :100         71%          71:98    dmesg.EIP:rcu_torture_stats_print   <--- (2)
> >          94:100          0%          94:98    dmesg.EIP:synchronize_rcu_trivial
> >         100:100         -2%          98:98    dmesg.UBSAN:implicit-conversion_in_drivers/firewire/core-transaction.c
> >         100:100         -2%          98:98    dmesg.UBSAN:negation-overflow_in_lib/sort.c
> >         100:100         -2%          98:98    dmesg.UBSAN:negation-overflow_in_mm/memcontrol.c
> >            :100         71%          71:98    dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]   <--- (3)
> >          94:100          0%          94:98    dmesg.WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture]
> >         100:100         -2%          98:98    dmesg.boot_failures
> > 
> > 
> > but seems (1)(2)(3) are quite persistent on 54d0803f63 and clean on parent.
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed "WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]" on:
> > 
> > commit: 54d0803f632b402e519c7d97a6b52d5ffb78ae78 ("rcutorture: Start rcu_torture_writer() after rcu_torture_reader()")
> > https://github.com/paulmckrcu/linux dev.2025.05.05a
> > 
> > in testcase: rcutorture
> > version: 
> > with following parameters:
> > 
> > 	runtime: 300s
> > 	test: default
> > 	torture_type: trivial
> 
> Huh.  This is the trivial RCU implementation that is not used in the
> Linux kernel (aside from rcutorture testing it), but verifies that my
> slideware/textbook implementation actually works.
> 
> So you didn't manage to break the Linux kernel, just some of my
> presentations.  ;-)
> 
> Still, this clearly needs to be fixed.
> 
> > config: i386-randconfig-061-20250510
> > compiler: clang-20
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > 
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202505131651.af6e81d7-lkp@intel.com
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250513/202505131651.af6e81d7-lkp@intel.com
> > 
> > 
> > [  232.971337][  T965] trivial-torture: rtc: ef358cc0 ver: 9677 tfle: 0 rta: 9677 rtaf: 0 rtf: 9668 rtmbe: 0 rtmbkf: 0/0 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 0 nt: 3782 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=300) barrier: 0/0:0 read-exits: 64 nocb-toggles: 0:0 gpwraps: 0
> > [  232.974503][  T965] trivial-torture: !!! 
> > [  232.974595][  T965] ------------[ cut here ]------------
> > [  232.976982][  T965] WARNING: CPU: 0 PID: 965 at kernel/rcu/rcutorture.c:2730 rcu_torture_stats_print+0x8e4/0x8f0 [rcutorture]
> 
> And this is a too-short grace period...
> 
> > [  232.980999][  T965] Modules linked in: rcutorture torture
> > [  232.982356][  T965] CPU: 0 UID: 0 PID: 965 Comm: rcu_torture_sta Tainted: G        W       T   6.15.0-rc1-00059-g54d0803f632b #1 PREEMPT(full)  5f9cb9cacb9aba8a18caee0ed4d4cf4452094bc2
> > [  232.990730][  T965] Tainted: [W]=WARN, [T]=RANDSTRUCT
> > [  232.992782][  T965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [  232.995869][  T965] EIP: rcu_torture_stats_print+0x8e4/0x8f0 [rcutorture]
> > [  232.997526][  T965] Code: ff ff 0f 0b 83 3d 88 8f 35 ef 00 0f 84 b0 fb ff ff 0f 0b 83 3d 8c 8f 35 ef 00 0f 84 ae fb ff ff 0f 0b 84 db 0f 84 ac fb ff ff <0f> 0b e9 a5 fb ff ff 00 00 00 00 00 55 89 e5 53 57 56 83 ec 14 e8
> > [  233.002674][  T965] EAX: 00000004 EBX: 00000001 ECX: 00000000 EDX: 00000000
> > [  233.004477][  T965] ESI: 00000000 EDI: 00000000 EBP: 47efbf58 ESP: 47efbedc
> > [  233.006060][  T965] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010202
> > [  233.007496][  T965] CR0: 80050033 CR2: 353667f8 CR3: 7c473000 CR4: 000406d0
> > [  233.009173][  T965] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > [  233.010612][  T965] DR6: fffe0ff0 DR7: 00000400
> > [  233.011678][  T965] Call Trace:
> > [  233.012511][  T965]  ? rcu_torture_stats+0x24/0x70 [rcutorture 7eef68a8ad9c4bc219a4bda0dc180602dd9a9416]
> > [  233.014416][  T965]  ? kthread+0x1af/0x200
> > [  233.016388][  T965]  ? rcu_nocb_toggle+0x1c0/0x1c0 [rcutorture 7eef68a8ad9c4bc219a4bda0dc180602dd9a9416]
> > [  233.018427][  T965]  ? schedule_tail+0x79/0xf0
> > [  233.019297][  T965]  ? kthread_blkcg+0x30/0x30
> > [  233.020404][  T965]  ? kthread_blkcg+0x30/0x30
> > [  233.021230][  T965]  ? ret_from_fork+0x2c/0x40
> > [  233.022315][  T965]  ? ret_from_fork_asm+0x12/0x20
> > [  233.023169][  T965]  ? entry_INT80_32+0x10d/0x10d
> > [  233.024178][  T965] irq event stamp: 495
> > [  233.024991][  T965] hardirqs last  enabled at (509): [<4f981d9b>] __console_unlock+0x5b/0x70
> > [  233.026606][  T965] hardirqs last disabled at (520): [<4f981d82>] __console_unlock+0x42/0x70
> > [  233.028125][  T965] softirqs last  enabled at (282): [<51ba356a>] __do_softirq+0xa/0xe
> > [  233.029924][  T965] softirqs last disabled at (267): [<51ba356a>] __do_softirq+0xa/0xe
> > [  233.031571][  T965] ---[ end trace 0000000000000000 ]---
> > [  233.033193][  T965] Reader Pipe:  23875253 3963 1 0 0 0 0 0 0 0 0
> 
> ... as further indicated by the "1" after the "3963".  This means that
> there was one too-short trivial-RCU grace period in not quite four
> minutes of testing.
> 
> > [  233.034030][  T965] trivial-torture: Reader Batch:  23879217 0 0 0 0 0 0 0 0 0 0
> > [  233.035198][  T965] trivial-torture: Free-Block Circulation:  9676 9676 9675 9674 9673 9672 9671 9670 9669 9668 0
> > 
> > ...
> > 
> > [  472.316419][ T2339] trivial-torture:--- End of test: FAILURE: nreaders=1 nfakewriters=4 stat_interval=60 verbose=1 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 test_boost_holdoff=0 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0 stall_cpu_repeat=0 n_barrier_cbs=0 onoff_interval=0 onoff_holdoff=0 read_exit_delay=13 read_exit_burst=16 reader_flavor=1 nocbs_nthreads=0 nocbs_toggle=1000 test_nmis=0 preempt_duration=0 preempt_interval=1000 n_up_down=32
> 
> And this is one possible reason why:  "shuffle_interval=3".
> 
> The way that trivial RCU works is to schedule itself on each online CPU
> in turn in this function:
> 
> static void synchronize_rcu_trivial(void)
> {
> 	int cpu;
> 
> 	for_each_online_cpu(cpu) {
> 		torture_sched_setaffinity(current->pid, cpumask_of(cpu), true);
> 		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> 	}
> }
> 
> If torture_shuffle_tasks() moves this writer kthread just after the above
> torture_sched_setaffinity() completes, then synchronize_rcu_trivial()
> might miss its intended CPU, thus failing to wait for RCU readers on
> that CPU.  This can in turn result in a too-short grace period.
> 
> Except that I would expect the WARN_ON_ONCE() to trigger in that scenario.

Which you do have, now that I actually looked at the dmesg.  ;-)

> I will play with this a bit.  I don't want to force-disable shuffle
> in the torture_type=trivial case because it is a useful way of testing
> rcutorture itself.  But perhaps I can make the complaint more explicit.

And 200 hours of testing passed.  I am adding a change to splat, but to
force these kernel boot parameters to zero so as to avoid the spurious
grace-period failure.  I expect this to reach mainline during the v6.17
merge window.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-05-16 21:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-14  2:23 [paulmckrcu:dev.2025.05.05a] [rcutorture] 54d0803f63: WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture] kernel test robot
2025-05-16  0:37 ` Paul E. McKenney
2025-05-16 21:12   ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.