All of lore.kernel.org
 help / color / mirror / Atom feed
* [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
@ 2024-07-30 14:25 David Wang
  2024-07-30 15:07 ` Thomas Gleixner
  2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: David Wang @ 2024-07-30 14:25 UTC (permalink / raw)
  To: liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, tglx, x86

Hi,

When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:

 kernel: [ 1734.412974] smpboot: CPU 2 is now offline
 kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
 kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
 kernel: [ 1734.414964] CPU: 0 UID: 0 PID: 4619 Comm: systemd-sleep Tainted: P           OE      6.11.0-rc1-linan-4 #292
 kernel: [ 1734.414968] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
 kernel: [ 1734.414969] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
 kernel: [ 1734.414970] Call Trace:
 kernel: [ 1734.414974]  <TASK>
 kernel: [ 1734.414978]  dump_stack_lvl+0x60/0x80
 kernel: [ 1734.414982]  check_preemption_disabled+0xce/0xe0
 kernel: [ 1734.414987]  hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
 kernel: [ 1734.414992]  ? __pfx_takedown_cpu+0x10/0x10
 kernel: [ 1734.414996]  takedown_cpu+0x97/0x130
 kernel: [ 1734.414999]  cpuhp_invoke_callback+0xf8/0x450
 kernel: [ 1734.415004]  __cpuhp_invoke_callback_range+0x78/0xe0
 kernel: [ 1734.415008]  _cpu_down+0xf4/0x360
 kernel: [ 1734.415012]  freeze_secondary_cpus+0xae/0x290
 kernel: [ 1734.415016]  suspend_devices_and_enter+0x1da/0x920
 kernel: [ 1734.415022]  pm_suspend+0x1fa/0x500
 kernel: [ 1734.415025]  state_store+0x68/0xd0
 kernel: [ 1734.415028]  kernfs_fop_write_iter+0x169/0x1f0
 kernel: [ 1734.415034]  vfs_write+0x269/0x440
 kernel: [ 1734.415041]  ksys_write+0x63/0xe0
 kernel: [ 1734.415044]  do_syscall_64+0x4b/0x110
 kernel: [ 1734.415048]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 kernel: [ 1734.415052] RIP: 0033:0x7fe885cee240
 kernel: [ 1734.415055] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
 kernel: [ 1734.415057] RSP: 002b:00007ffc53ccec58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
 kernel: [ 1734.415060] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe885cee240
 kernel: [ 1734.415062] RDX: 0000000000000004 RSI: 00007ffc53cced40 RDI: 0000000000000004
 kernel: [ 1734.415063] RBP: 00007ffc53cced40 R08: 0000000000000007 R09: 000055f34dde8210
 kernel: [ 1734.415064] R10: 6bccc22257390b18 R11: 0000000000000202 R12: 0000000000000004
 kernel: [ 1734.415066] R13: 000055f34dde42d0 R14: 0000000000000004 R15: 00007fe885dc49e0
 kernel: [ 1734.415071]  </TASK>


I confirmed that this was introduced by commit:
 f7d43dd206e7e18c182f200e67a8db8c209907fa tick/broadcast: Make takeover of broadcast hrtimer reliable
, and revert this commit can fix it.


Thanks
David


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
  2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
@ 2024-07-30 15:07 ` Thomas Gleixner
  2024-07-31 10:15   ` Yu Liao
  2024-07-31 13:17   ` David Wang
  2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
  1 sibling, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-30 15:07 UTC (permalink / raw)
  To: David Wang, liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, x86

On Tue, Jul 30 2024 at 22:25, David Wang wrote:
> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>
>  kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>  kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>  kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

The below should fix that.

Thanks,

        tglx
---
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
  2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:15   ` Yu Liao
  2024-07-31 13:17   ` David Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Yu Liao @ 2024-07-31 10:15 UTC (permalink / raw)
  To: Thomas Gleixner, David Wang; +Cc: linux-kernel, linux-tip-commits, stable, x86

On 2024/7/30 23:07, Thomas Gleixner wrote:
> On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>>  kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>>  kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>>  kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
> 
> The below should fix that.
> 
> Thanks,
> 
>         tglx
> ---
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
>  #ifdef CONFIG_HOTPLUG_CPU
>  void hotplug_cpu__broadcast_tick_pull(int deadcpu)
>  {
> -	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>  	struct clock_event_device *bc;
>  	unsigned long flags;
>  
> @@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
>  		 * device to avoid the starvation.
>  		 */
>  		if (tick_check_broadcast_expired()) {
> +			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> +
>  			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
>  			tick_program_event(td->evtdev->next_event, 1);
>  		}
> 

Sorry for causing this issue. I have tested the patch on an x86 machine, this
patch can fix the issue.

Tested-by: Yu Liao <liaoyu15@huawei.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section
  2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
  2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:23 ` Thomas Gleixner
  2024-07-31 10:42   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-31 10:23 UTC (permalink / raw)
  To: David Wang, liaoyu15
  Cc: linux-kernel, linux-tip-commits, stable, x86, Frederic Weisbecker,
	Anna-Maria Behnsen

The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.

This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:

  BUG: using smp_processor_id() in preemptible [00000000] code:
       caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

Move it to the actual usage site which is in a non-preemptible region.

Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 kernel/time/tick-broadcast.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip: timers/urgent] tick/broadcast: Move per CPU pointer access into the atomic section
  2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
@ 2024-07-31 10:42   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-07-31 10:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David Wang, Thomas Gleixner, Yu Liao, stable, x86, linux-kernel

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     6881e75237a84093d0986f56223db3724619f26e
Gitweb:        https://git.kernel.org/tip/6881e75237a84093d0986f56223db3724619f26e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 31 Jul 2024 12:23:51 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 31 Jul 2024 12:37:43 +02:00

tick/broadcast: Move per CPU pointer access into the atomic section

The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.

This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:

  BUG: using smp_processor_id() in preemptible [00000000] code:
       caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

Move it to the actual usage site which is in a non-preemptible region.

Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
---
 kernel/time/tick-broadcast.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index b484309..ed58eeb 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(void)
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 		 * device to avoid the starvation.
 		 */
 		if (tick_check_broadcast_expired()) {
+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
 			tick_program_event(td->evtdev->next_event, 1);
 		}

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
  2024-07-30 15:07 ` Thomas Gleixner
  2024-07-31 10:15   ` Yu Liao
@ 2024-07-31 13:17   ` David Wang
  1 sibling, 0 replies; 6+ messages in thread
From: David Wang @ 2024-07-31 13:17 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: liaoyu15, linux-kernel, linux-tip-commits, stable, x86

Hi, 

At 2024-07-30 23:07:41, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>>  kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>>  kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>>  kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
>
>The below should fix that.
>
>Thanks,

I thought the offending line was smp_processor_id() used for cpumask_clear_cpu, so confused by this patch.... never mind

Sorry for the delay, I applied the patch and it dose fix the issue.

FYI
David 

>
>        tglx
>---
>--- a/kernel/time/tick-broadcast.c
>+++ b/kernel/time/tick-broadcast.c
>@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
> #ifdef CONFIG_HOTPLUG_CPU
> void hotplug_cpu__broadcast_tick_pull(int deadcpu)
> {
>-	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> 	struct clock_event_device *bc;
> 	unsigned long flags;
> 
>@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
> 		 * device to avoid the starvation.
> 		 */
> 		if (tick_check_broadcast_expired()) {
>+			struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>+
> 			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
> 			tick_program_event(td->evtdev->next_event, 1);
> 		}

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-07-31 13:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15   ` Yu Liao
2024-07-31 13:17   ` David Wang
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
2024-07-31 10:42   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.