rcu.vger.kernel.org archive mirror
* [linus:master] [rcu]  b41642c877: BUG:kernel_hang_in_boot_stage
@ 2025-08-07  5:39 kernel test robot
  2025-08-08 17:34 ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: kernel test robot @ 2025-08-07  5:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: oe-lkp, lkp, linux-kernel, Neeraj Upadhyay, Xiongfeng Wang, Qi Xi,
	Paul E. McKenney, Linux Kernel Functional Testing, rcu,
	oliver.sang



Hello,

kernel test robot noticed "BUG:kernel_hang_in_boot_stage" on:

commit: b41642c87716bbd09797b1e4ea7d904f06c39b7b ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      7e161a991ea71e6ec526abc8f40c6852ebe3d946]
[test failed on linux-next/master 5c5a10f0be967a8950a2309ea965bae54251b50e]

in testcase: boot

config: i386-randconfig-2006-20250804
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-------------------------------+------------+------------+
|                               | d827673d8a | b41642c877 |
+-------------------------------+------------+------------+
| boot_successes                | 15         | 0          |
| boot_failures                 | 0          | 15         |
| BUG:kernel_hang_in_boot_stage | 0          | 15         |
+-------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com


[    8.004044][    T1] Run /init as init process
[    8.004883][    T1]   with arguments:
[    8.005620][    T1]     /init
[    8.006332][    T1]   with environment:
[    8.007080][    T1]     HOME=/
[    8.007722][    T1]     TERM=linux
[    8.008407][    T1]     RESULT_ROOT=/result/boot/1/vm-snb/quantal-i386-core-20190426.cgz/i386-randconfig-2006-20250804/clang-20/b41642c87716bbd09797b1e4ea7d904f06c39b7b/0
[    8.010989][    T1]     BOOT_IMAGE=/pkg/linux/i386-randconfig-2006-20250804/clang-20/b41642c87716bbd09797b1e4ea7d904f06c39b7b/vmlinuz-6.16.0-rc3-00005-gb41642c87716
[    8.013525][    T1]     branch=gustavoars/testing/wfamnae-next20250804
[    8.014578][    T1]     job=/lkp/jobs/scheduled/vm-meta-10/boot-1-quantal-i386-core-20190426.cgz-i386-randconfig-2006-20250804-b41642c87716-20250805-101381-u20lqt-1.yaml
[    8.016412][    T1]     user=lkp
[    8.016884][    T1]     ARCH=i386
[    8.017372][    T1]     kconfig=i386-randconfig-2006-20250804
[    8.018109][    T1]     commit=b41642c87716bbd09797b1e4ea7d904f06c39b7b
[    8.018928][    T1]     intremap=posted_msi
[    8.019543][    T1]     max_uptime=600
[    8.020064][    T1]     LKP_SERVER=internal-lkp-server
[    8.020735][    T1]     selinux=0
[    8.021215][    T1]     apic=debug
[    8.021707][    T1]     prompt_ramdisk=0
[    8.022251][    T1]     vga=normal
[    8.022791][    T1]     ia32_emulation=on
[    8.023339][    T1]     riscv_isa_fallback=1
[    8.779434][    C0] random: crng init done
LKP: ttyS0: 107: skip deploy intel ucode as no ucode is specified
BUG: kernel hang in boot stage


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250807/202508071303.c1134cce-lkp@intel.com


We didn't observe any further information in dmesg, and we only observe this
"kernel hang in boot stage" with an i386 random config. The kernel is compiled
with clang-20.


For contrast, the parent commit's dmesg looks like this:

[    8.301768][    T1] Run /init as init process
[    8.302821][    T1]   with arguments:
[    8.303849][    T1]     /init
[    8.304408][    T1]   with environment:
[    8.305053][    T1]     HOME=/
[    8.305635][    T1]     TERM=linux
[    8.306225][    T1]     RESULT_ROOT=/result/boot/1/vm-snb/quantal-i386-core-20190426.cgz/i386-randconfig-2006-20250804/clang-20/d827673d8a4e69937dd3731da2686a2d8206aef5/0
[    8.308434][    T1]     BOOT_IMAGE=/pkg/linux/i386-randconfig-2006-20250804/clang-20/d827673d8a4e69937dd3731da2686a2d8206aef5/vmlinuz-6.16.0-rc3-00004-gd827673d8a4e
[    8.310619][    T1]     branch=gustavoars/testing/wfamnae-next20250804
[    8.311610][    T1]     job=/lkp/jobs/scheduled/vm-meta-7/boot-1-quantal-i386-core-20190426.cgz-i386-randconfig-2006-20250804-d827673d8a4e-20250805-99925-p5h1vs-1.yaml
[    8.313843][    T1]     user=lkp
[    8.314287][    T1]     ARCH=i386
[    8.314701][    T1]     kconfig=i386-randconfig-2006-20250804
[    8.315329][    T1]     commit=d827673d8a4e69937dd3731da2686a2d8206aef5
[    8.316029][    T1]     intremap=posted_msi
[    8.316574][    T1]     max_uptime=600
[    8.317018][    T1]     LKP_SERVER=internal-lkp-server
[    8.317599][    T1]     selinux=0
[    8.318009][    T1]     apic=debug
[    8.318424][    T1]     prompt_ramdisk=0
[    8.318888][    T1]     vga=normal
[    8.319307][    T1]     ia32_emulation=on
[    8.319841][    T1]     riscv_isa_fallback=1
[    9.079686][    C0] random: crng init done
LKP: ttyS0: 108: skip deploy intel ucode as no ucode is specified
[    9.371040][  T182] udevd[182]: starting version 175
LKP: ttyS0: 108: Kernel tests: Boot OK!
LKP: ttyS0: 108: HOSTNAME vm-snb, MAC 52:54:00:12:34:56, kernel 6.16.0-rc3-00004-gd827673d8a4e 1
LKP: ttyS0: 108:  /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-meta-7/boot-1-quantal-i386-core-20190426.cgz-i386-randconfig-2006-20250804-d827673d8a4e-20250805-99925-p5h1vs-1.yaml
[   10.016422][    T1] init: failsafe main process (327) killed by TERM signal
[   10.080302][    T1] init: udev-fallback-graphics main process (366) terminated with status 1
[   10.120853][    T1] init: networking main process (373) terminated with status 1
LKP: ttyS0: 108: LKP: rebooting forcely
[   21.082808][  T108] sysrq: Emergency Sync
[   21.084955][   T40] Emergency Sync complete
[   21.086476][  T108] sysrq: Resetting



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linus:master] [rcu]  b41642c877: BUG:kernel_hang_in_boot_stage
  2025-08-07  5:39 [linus:master] [rcu] b41642c877: BUG:kernel_hang_in_boot_stage kernel test robot
@ 2025-08-08 17:34 ` Frederic Weisbecker
  2025-08-09 19:02   ` Joel Fernandes
  2025-08-13  4:40   ` Neeraj Upadhyay
  0 siblings, 2 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2025-08-08 17:34 UTC (permalink / raw)
  To: kernel test robot
  Cc: Joel Fernandes, oe-lkp, lkp, linux-kernel, Neeraj Upadhyay,
	Xiongfeng Wang, Qi Xi, Paul E. McKenney,
	Linux Kernel Functional Testing, rcu

On Thu, Aug 07, 2025 at 01:39:32PM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed "BUG:kernel_hang_in_boot_stage" on:
> 
> commit: b41642c87716bbd09797b1e4ea7d904f06c39b7b ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> [test failed on linus/master      7e161a991ea71e6ec526abc8f40c6852ebe3d946]
> [test failed on linux-next/master 5c5a10f0be967a8950a2309ea965bae54251b50e]
> 
> in testcase: boot
> 
> config: i386-randconfig-2006-20250804
> compiler: clang-20
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> +-------------------------------+------------+------------+
> |                               | d827673d8a | b41642c877 |
> +-------------------------------+------------+------------+
> | boot_successes                | 15         | 0          |
> | boot_failures                 | 0          | 15         |
> | BUG:kernel_hang_in_boot_stage | 0          | 15         |
> +-------------------------------+------------+------------+
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com

#syz test

From a3cc7624264743996d2ad1295741933103a8d63b Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic@kernel.org>
Date: Fri, 8 Aug 2025 19:03:22 +0200
Subject: [PATCH] rcu: Fix racy re-initialization of irq_work causing hangs

RCU re-initializes the deferred QS irq work every time before attempting
to queue it. However, there are situations where queueing is attempted
even though the irq work is already queued. In that case,
re-initializing corrupts the irq work queue that is about to be
handled.

The chances for that to happen are higher when the architecture doesn't
support self-IPIs and irq work are then all lazy, such as with the
following sequence:

1) rcu_read_unlock() is called when IRQs are disabled and there is a
   grace period involving blocked tasks on the node. The irq work
   is then initialized and queued.

2) The related tasks are unblocked and the CPU quiescent state
   is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE,
   allowing the irq work to be requeued in the future (note the previous
   one hasn't fired yet).

3) A new grace period starts and the node has blocked tasks.

4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
   is re-initialized (but it's queued! and its node is cleared) and
   requeued. Which means it's requeued to itself.

5) The irq work finally fires with the tick. But since it was requeued
   to itself, it loops and hangs.

Fix this by initializing the irq work only once, before the CPU boots.

Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c        | 2 ++
 kernel/rcu/tree.h        | 1 +
 kernel/rcu/tree_plugin.h | 8 ++++++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8c22db759978..3a17466ae84a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4242,6 +4242,8 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
 	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+	rcu_preempt_deferred_qs_init(rdp);
 	rcu_spawn_rnp_kthreads(rnp);
 	rcu_spawn_cpu_nocb_kthread(cpu);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.n_online_cpus);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index de6ca13a7b5f..b8bbe7960cda 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -488,6 +488,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp);
 static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
 static void rcu_flavor_sched_clock_irq(int user);
 static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
+static void rcu_preempt_deferred_qs_init(struct rcu_data *rdp);
 static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
 static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
 static bool rcu_is_callbacks_kthread(struct rcu_data *rdp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b6f44871f774..c99701dfffa9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -699,8 +699,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			    cpu_online(rdp->cpu)) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
-				rdp->defer_qs_iw =
-					IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
 				rdp->defer_qs_iw_pending = DEFER_QS_PENDING;
 				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
 			}
@@ -840,6 +838,10 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
 	}
 }
 
+static void rcu_preempt_deferred_qs_init(struct rcu_data *rdp)
+{
+	rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
+}
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
 /*
@@ -1039,6 +1041,8 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
 	WARN_ON_ONCE(!list_empty(&rnp->blkd_tasks));
 }
 
+static void rcu_preempt_deferred_qs_init(struct rcu_data *rdp) { }
+
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
 /*
-- 
2.50.1




-- 
Frederic Weisbecker
SUSE Labs


* Re: [linus:master] [rcu] b41642c877: BUG:kernel_hang_in_boot_stage
  2025-08-08 17:34 ` Frederic Weisbecker
@ 2025-08-09 19:02   ` Joel Fernandes
  2025-08-13  4:40   ` Neeraj Upadhyay
  1 sibling, 0 replies; 6+ messages in thread
From: Joel Fernandes @ 2025-08-09 19:02 UTC (permalink / raw)
  To: Frederic Weisbecker, kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Neeraj Upadhyay, Xiongfeng Wang, Qi Xi,
	Paul E. McKenney, Linux Kernel Functional Testing, rcu



On 8/8/2025 1:34 PM, Frederic Weisbecker wrote:
> On Thu, Aug 07, 2025 at 01:39:32PM +0800, kernel test robot wrote:
>>
>>
>> Hello,
>>
>> kernel test robot noticed "BUG:kernel_hang_in_boot_stage" on:
>>
>> commit: b41642c87716bbd09797b1e4ea7d904f06c39b7b ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> [test failed on linus/master      7e161a991ea71e6ec526abc8f40c6852ebe3d946]
>> [test failed on linux-next/master 5c5a10f0be967a8950a2309ea965bae54251b50e]
>>
>> in testcase: boot
>>
>> config: i386-randconfig-2006-20250804
>> compiler: clang-20
>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>
>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>
>>
>> +-------------------------------+------------+------------+
>> |                               | d827673d8a | b41642c877 |
>> +-------------------------------+------------+------------+
>> | boot_successes                | 15         | 0          |
>> | boot_failures                 | 0          | 15         |
>> | BUG:kernel_hang_in_boot_stage | 0          | 15         |
>> +-------------------------------+------------+------------+
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
> 
> #syz test
> 
> From a3cc7624264743996d2ad1295741933103a8d63b Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <frederic@kernel.org>
> Date: Fri, 8 Aug 2025 19:03:22 +0200
> Subject: [PATCH] rcu: Fix racy re-initialization of irq_work causing hangs
> 
> RCU re-initializes the deferred QS irq work everytime before attempting
> to queue it. However there are situations where the irq work is
> attempted to be queued even though it is already queued. In that case
> re-initializing messes-up with the irq work queue that is about to be
> handled.
> 
> The chances for that to happen are higher when the architecture doesn't
> support self-IPIs and irq work are then all lazy, such as with the
> following sequence:
> 
> 1) rcu_read_unlock() is called when IRQs are disabled and there is a
>    grace period involving blocked tasks on the node. The irq work
>    is then initialized and queued.
> 
> 2) The related tasks are unblocked and the CPU quiescent state
>    is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE,
>    allowing the irq work to be requeued in the future (note the previous
>    one hasn't fired yet).
> 
> 3) A new grace period starts and the node has blocked tasks.
> 
> 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
>    is re-initialized (but it's queued! and its node is cleared) and
>    requeued. Which means it's requeued to itself.
> 
> 5) The irq work finally fires with the tick. But since it was requeued
>    to itself, it loops and hangs.
> 
> Fix this with initializing the irq work only once before the CPU boots.

Makes sense, good catch and thanks!

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

 - Joel



> 
> Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree.c        | 2 ++
>  kernel/rcu/tree.h        | 1 +
>  kernel/rcu/tree_plugin.h | 8 ++++++--
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8c22db759978..3a17466ae84a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4242,6 +4242,8 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
>  	trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +
> +	rcu_preempt_deferred_qs_init(rdp);
>  	rcu_spawn_rnp_kthreads(rnp);
>  	rcu_spawn_cpu_nocb_kthread(cpu);
>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.n_online_cpus);
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index de6ca13a7b5f..b8bbe7960cda 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -488,6 +488,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp);
>  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
>  static void rcu_flavor_sched_clock_irq(int user);
>  static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
> +static void rcu_preempt_deferred_qs_init(struct rcu_data *rdp);
>  static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
>  static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
>  static bool rcu_is_callbacks_kthread(struct rcu_data *rdp);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index b6f44871f774..c99701dfffa9 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -699,8 +699,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  			    cpu_online(rdp->cpu)) {
>  				// Get scheduler to re-evaluate and call hooks.
>  				// If !IRQ_WORK, FQS scan will eventually IPI.
> -				rdp->defer_qs_iw =
> -					IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
>  				rdp->defer_qs_iw_pending = DEFER_QS_PENDING;
>  				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
>  			}
> @@ -840,6 +838,10 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
>  	}
>  }
>  
> +static void rcu_preempt_deferred_qs_init(struct rcu_data *rdp)
> +{
> +	rdp->defer_qs_iw = IRQ_WORK_INIT_HARD(rcu_preempt_deferred_qs_handler);
> +}
>  #else /* #ifdef CONFIG_PREEMPT_RCU */
>  
>  /*
> @@ -1039,6 +1041,8 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
>  	WARN_ON_ONCE(!list_empty(&rnp->blkd_tasks));
>  }
>  
> +static void rcu_preempt_deferred_qs_init(struct rcu_data *rdp) { }
> +
>  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
>  
>  /*



* Re: [linus:master] [rcu]  b41642c877: BUG:kernel_hang_in_boot_stage
  2025-08-08 17:34 ` Frederic Weisbecker
  2025-08-09 19:02   ` Joel Fernandes
@ 2025-08-13  4:40   ` Neeraj Upadhyay
  2025-08-13 13:43     ` Oliver Sang
  1 sibling, 1 reply; 6+ messages in thread
From: Neeraj Upadhyay @ 2025-08-13  4:40 UTC (permalink / raw)
  To: Frederic Weisbecker, kernel test robot
  Cc: Joel Fernandes, oe-lkp, lkp, linux-kernel, Neeraj Upadhyay,
	Xiongfeng Wang, Qi Xi, Paul E. McKenney,
	Linux Kernel Functional Testing, rcu

Hi kernel test robot,

> #syz test
> 
> From a3cc7624264743996d2ad1295741933103a8d63b Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <frederic@kernel.org>
> Date: Fri, 8 Aug 2025 19:03:22 +0200
> Subject: [PATCH] rcu: Fix racy re-initialization of irq_work causing hangs
> 
> RCU re-initializes the deferred QS irq work everytime before attempting
> to queue it. However there are situations where the irq work is
> attempted to be queued even though it is already queued. In that case
> re-initializing messes-up with the irq work queue that is about to be
> handled.
> 
> The chances for that to happen are higher when the architecture doesn't
> support self-IPIs and irq work are then all lazy, such as with the
> following sequence:
> 
> 1) rcu_read_unlock() is called when IRQs are disabled and there is a
>    grace period involving blocked tasks on the node. The irq work
>    is then initialized and queued.
> 
> 2) The related tasks are unblocked and the CPU quiescent state
>    is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE,
>    allowing the irq work to be requeued in the future (note the previous
>    one hasn't fired yet).
> 
> 3) A new grace period starts and the node has blocked tasks.
> 
> 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
>    is re-initialized (but it's queued! and its node is cleared) and
>    requeued. Which means it's requeued to itself.
> 
> 5) The irq work finally fires with the tick. But since it was requeued
>    to itself, it loops and hangs.
> 
> Fix this with initializing the irq work only once before the CPU boots.
> 
> Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---

Can you please update testing results with the proposed fix?


- Neeraj



* Re: [linus:master] [rcu]  b41642c877: BUG:kernel_hang_in_boot_stage
  2025-08-13  4:40   ` Neeraj Upadhyay
@ 2025-08-13 13:43     ` Oliver Sang
  2025-08-13 13:55       ` Neeraj Upadhyay
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Sang @ 2025-08-13 13:43 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: Frederic Weisbecker, Joel Fernandes, oe-lkp, lkp, linux-kernel,
	Xiongfeng Wang, Qi Xi, Paul E. McKenney,
	Linux Kernel Functional Testing, rcu

hi, Neeraj,

On Wed, Aug 13, 2025 at 10:10:24AM +0530, Neeraj Upadhyay wrote:
> Hi kernel test robot,
> 
> > #syz test

Sorry, we are not syzbot :) so we didn't pick this up.

[...]

> > Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> 
> Can you please update testing results with the proposed fix?

We have now finished testing the proposed fix, and it works. Thanks!

Tested-by: kernel test robot <oliver.sang@intel.com>

> 
> 
> - Neeraj
> 


* Re: [linus:master] [rcu]  b41642c877: BUG:kernel_hang_in_boot_stage
  2025-08-13 13:43     ` Oliver Sang
@ 2025-08-13 13:55       ` Neeraj Upadhyay
  0 siblings, 0 replies; 6+ messages in thread
From: Neeraj Upadhyay @ 2025-08-13 13:55 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Frederic Weisbecker, Joel Fernandes, oe-lkp, lkp, linux-kernel,
	Xiongfeng Wang, Qi Xi, Paul E. McKenney,
	Linux Kernel Functional Testing, rcu

On Wed, Aug 13, 2025 at 09:43:53PM +0800, Oliver Sang wrote:
> hi, Neeraj,
> 
> On Wed, Aug 13, 2025 at 10:10:24AM +0530, Neeraj Upadhyay wrote:
> > Hi kernel test robot,
> > 
> > > #syz test
> 
> sorry, we are not syz bot :) didn't capture this.
> 

Ah ok!

> [...]
> 
> > > Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > 
> > Can you please update testing results with the proposed fix?
> 
> now we finished the test with the proposed fix, it works. thanks!
> 
> Tested-by: kernel test robot <oliver.sang@intel.com>
>

Thanks for testing!


- Neeraj

> > 
> > 
> > - Neeraj
> > 

