public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
@ 2024-07-30 14:25 David Wang
  2024-07-30 15:07 ` Thomas Gleixner
  2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: David Wang @ 2024-07-30 14:25 UTC (permalink / raw)
  To: liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, tglx, x86

Hi,

When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:

 kernel: [ 1734.412974] smpboot: CPU 2 is now offline
 kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
 kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
 kernel: [ 1734.414964] CPU: 0 UID: 0 PID: 4619 Comm: systemd-sleep Tainted: P           OE      6.11.0-rc1-linan-4 #292
 kernel: [ 1734.414968] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
 kernel: [ 1734.414969] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
 kernel: [ 1734.414970] Call Trace:
 kernel: [ 1734.414974]  <TASK>
 kernel: [ 1734.414978]  dump_stack_lvl+0x60/0x80
 kernel: [ 1734.414982]  check_preemption_disabled+0xce/0xe0
 kernel: [ 1734.414987]  hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
 kernel: [ 1734.414992]  ? __pfx_takedown_cpu+0x10/0x10
 kernel: [ 1734.414996]  takedown_cpu+0x97/0x130
 kernel: [ 1734.414999]  cpuhp_invoke_callback+0xf8/0x450
 kernel: [ 1734.415004]  __cpuhp_invoke_callback_range+0x78/0xe0
 kernel: [ 1734.415008]  _cpu_down+0xf4/0x360
 kernel: [ 1734.415012]  freeze_secondary_cpus+0xae/0x290
 kernel: [ 1734.415016]  suspend_devices_and_enter+0x1da/0x920
 kernel: [ 1734.415022]  pm_suspend+0x1fa/0x500
 kernel: [ 1734.415025]  state_store+0x68/0xd0
 kernel: [ 1734.415028]  kernfs_fop_write_iter+0x169/0x1f0
 kernel: [ 1734.415034]  vfs_write+0x269/0x440
 kernel: [ 1734.415041]  ksys_write+0x63/0xe0
 kernel: [ 1734.415044]  do_syscall_64+0x4b/0x110
 kernel: [ 1734.415048]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 kernel: [ 1734.415052] RIP: 0033:0x7fe885cee240
 kernel: [ 1734.415055] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
 kernel: [ 1734.415057] RSP: 002b:00007ffc53ccec58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
 kernel: [ 1734.415060] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe885cee240
 kernel: [ 1734.415062] RDX: 0000000000000004 RSI: 00007ffc53cced40 RDI: 0000000000000004
 kernel: [ 1734.415063] RBP: 00007ffc53cced40 R08: 0000000000000007 R09: 000055f34dde8210
 kernel: [ 1734.415064] R10: 6bccc22257390b18 R11: 0000000000000202 R12: 0000000000000004
 kernel: [ 1734.415066] R13: 000055f34dde42d0 R14: 0000000000000004 R15: 00007fe885dc49e0
 kernel: [ 1734.415071]  </TASK>


I confirmed that this was introduced by commit:
 f7d43dd206e7e18c182f200e67a8db8c209907fa tick/broadcast: Make takeover of broadcast hrtimer reliable
, and revert this commit can fix it.


Thanks
David


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
  2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
@ 2024-07-30 15:07 ` Thomas Gleixner
  2024-07-31 10:15   ` Yu Liao
  2024-07-31 13:17   ` David Wang
  2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
  1 sibling, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-30 15:07 UTC (permalink / raw)
  To: David Wang, liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, x86

On Tue, Jul 30 2024 at 22:25, David Wang wrote:
> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>
>  kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>  kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>  kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

The below should fix that.

Thanks,

        tglx
---
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
  2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:15   ` Yu Liao
  2024-07-31 13:17   ` David Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Yu Liao @ 2024-07-31 10:15 UTC (permalink / raw)
  To: Thomas Gleixner, David Wang; +Cc: linux-kernel, linux-tip-commits, stable, x86

On 2024/7/30 23:07, Thomas Gleixner wrote:
> On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>>  kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>>  kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>>  kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
> 
> The below should fix that.
> 
> Thanks,
> 
>         tglx
> ---
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
>  #ifdef CONFIG_HOTPLUG_CPU
>  void hotplug_cpu__broadcast_tick_pull(int deadcpu)
>  {
> -	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>  	struct clock_event_device *bc;
>  	unsigned long flags;
>  
> @@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
>  		 * device to avoid the starvation.
>  		 */
>  		if (tick_check_broadcast_expired()) {
> +			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> +
>  			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
>  			tick_program_event(td->evtdev->next_event, 1);
>  		}
> 

Sorry for causing this issue. I have tested the patch on an x86 machine, this
patch can fix the issue.

Tested-by: Yu Liao <liaoyu15@huawei.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section
  2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
  2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:23 ` Thomas Gleixner
  2024-07-31 10:42   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-31 10:23 UTC (permalink / raw)
  To: David Wang, liaoyu15
  Cc: linux-kernel, linux-tip-commits, stable, x86, Frederic Weisbecker,
	Anna-Maria Behnsen

The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.

This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:

  BUG: using smp_processor_id() in preemptible [00000000] code:
       caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

Move it to the actual usage site which is in a non-preemptible region.

Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 kernel/time/tick-broadcast.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip: timers/urgent] tick/broadcast: Move per CPU pointer access into the atomic section
  2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
@ 2024-07-31 10:42   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-07-31 10:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David Wang, Thomas Gleixner, Yu Liao, stable, x86, linux-kernel

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     6881e75237a84093d0986f56223db3724619f26e
Gitweb:        https://git.kernel.org/tip/6881e75237a84093d0986f56223db3724619f26e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 31 Jul 2024 12:23:51 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 31 Jul 2024 12:37:43 +02:00

tick/broadcast: Move per CPU pointer access into the atomic section

The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.

This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:

  BUG: using smp_processor_id() in preemptible [00000000] code:
       caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

Move it to the actual usage site which is in a non-preemptible region.

Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
---
 kernel/time/tick-broadcast.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index b484309..ed58eeb 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(void)
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
  2024-07-30 15:07 ` Thomas Gleixner
  2024-07-31 10:15   ` Yu Liao
@ 2024-07-31 13:17   ` David Wang
  1 sibling, 0 replies; 6+ messages in thread
From: David Wang @ 2024-07-31 13:17 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: liaoyu15, linux-kernel, linux-tip-commits, stable, x86

Hi, 

At 2024-07-30 23:07:41, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>>  kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>>  kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>>  kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
>
>The below should fix that.
>
>Thanks,

I thought the offending line was smp_processor_id() used for cpumask_clear_cpu, so confused by this patch.... never mind

Sorry for the delay, I applied the patch and it dose fix the issue.

FYI
David 

>
>        tglx
>---
>--- a/kernel/time/tick-broadcast.c
>+++ b/kernel/time/tick-broadcast.c
>@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
> #ifdef CONFIG_HOTPLUG_CPU
> void hotplug_cpu__broadcast_tick_pull(int deadcpu)
> {
>-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> 	struct clock_event_device *bc;
> 	unsigned long flags;
> 
>@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
> 		 * device to avoid the starvation.
> 		 */
> 		if (tick_check_broadcast_expired()) {
>+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>+
> 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
> 			tick_program_event(td->evtdev->next_event, 1);
> 		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-07-31 13:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15   ` Yu Liao
2024-07-31 13:17   ` David Wang
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
2024-07-31 10:42   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox