* [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
@ 2024-07-30 14:25 David Wang
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
0 siblings, 2 replies; 6+ messages in thread
From: David Wang @ 2024-07-30 14:25 UTC (permalink / raw)
To: liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, tglx, x86
Hi,
When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
kernel: [ 1734.412974] smpboot: CPU 2 is now offline
kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414964] CPU: 0 UID: 0 PID: 4619 Comm: systemd-sleep Tainted: P OE 6.11.0-rc1-linan-4 #292
kernel: [ 1734.414968] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: [ 1734.414969] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
kernel: [ 1734.414970] Call Trace:
kernel: [ 1734.414974] <TASK>
kernel: [ 1734.414978] dump_stack_lvl+0x60/0x80
kernel: [ 1734.414982] check_preemption_disabled+0xce/0xe0
kernel: [ 1734.414987] hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414992] ? __pfx_takedown_cpu+0x10/0x10
kernel: [ 1734.414996] takedown_cpu+0x97/0x130
kernel: [ 1734.414999] cpuhp_invoke_callback+0xf8/0x450
kernel: [ 1734.415004] __cpuhp_invoke_callback_range+0x78/0xe0
kernel: [ 1734.415008] _cpu_down+0xf4/0x360
kernel: [ 1734.415012] freeze_secondary_cpus+0xae/0x290
kernel: [ 1734.415016] suspend_devices_and_enter+0x1da/0x920
kernel: [ 1734.415022] pm_suspend+0x1fa/0x500
kernel: [ 1734.415025] state_store+0x68/0xd0
kernel: [ 1734.415028] kernfs_fop_write_iter+0x169/0x1f0
kernel: [ 1734.415034] vfs_write+0x269/0x440
kernel: [ 1734.415041] ksys_write+0x63/0xe0
kernel: [ 1734.415044] do_syscall_64+0x4b/0x110
kernel: [ 1734.415048] entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: [ 1734.415052] RIP: 0033:0x7fe885cee240
kernel: [ 1734.415055] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
kernel: [ 1734.415057] RSP: 002b:00007ffc53ccec58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: [ 1734.415060] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe885cee240
kernel: [ 1734.415062] RDX: 0000000000000004 RSI: 00007ffc53cced40 RDI: 0000000000000004
kernel: [ 1734.415063] RBP: 00007ffc53cced40 R08: 0000000000000007 R09: 000055f34dde8210
kernel: [ 1734.415064] R10: 6bccc22257390b18 R11: 0000000000000202 R12: 0000000000000004
kernel: [ 1734.415066] R13: 000055f34dde42d0 R14: 0000000000000004 R15: 00007fe885dc49e0
kernel: [ 1734.415071] </TASK>
I confirmed that this was introduced by commit:
f7d43dd206e7e18c182f200e67a8db8c209907fa tick/broadcast: Make takeover of broadcast hrtimer reliable
, and revert this commit can fix it.
Thanks
David
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
@ 2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15 ` Yu Liao
2024-07-31 13:17 ` David Wang
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
1 sibling, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-30 15:07 UTC (permalink / raw)
To: David Wang, liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, x86
On Tue, Jul 30 2024 at 22:25, David Wang wrote:
> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>
> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
The below should fix that.
Thanks,
tglx
---
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
* device to avoid the starvation.
*/
if (tick_check_broadcast_expired()) {
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
tick_program_event(td->evtdev->next_event, 1);
}
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:15 ` Yu Liao
2024-07-31 13:17 ` David Wang
1 sibling, 0 replies; 6+ messages in thread
From: Yu Liao @ 2024-07-31 10:15 UTC (permalink / raw)
To: Thomas Gleixner, David Wang; +Cc: linux-kernel, linux-tip-commits, stable, x86
On 2024/7/30 23:07, Thomas Gleixner wrote:
> On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
>
> The below should fix that.
>
> Thanks,
>
> tglx
> ---
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
> #ifdef CONFIG_HOTPLUG_CPU
> void hotplug_cpu__broadcast_tick_pull(int deadcpu)
> {
> - struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> struct clock_event_device *bc;
> unsigned long flags;
>
> @@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
> * device to avoid the starvation.
> */
> if (tick_check_broadcast_expired()) {
> + struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> +
> cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
> tick_program_event(td->evtdev->next_event, 1);
> }
>
Sorry for causing this issue. I have tested the patch on an x86 machine, this
patch can fix the issue.
Tested-by: Yu Liao <liaoyu15@huawei.com>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15 ` Yu Liao
@ 2024-07-31 13:17 ` David Wang
1 sibling, 0 replies; 6+ messages in thread
From: David Wang @ 2024-07-31 13:17 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: liaoyu15, linux-kernel, linux-tip-commits, stable, x86
Hi,
At 2024-07-30 23:07:41, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
>
>The below should fix that.
>
>Thanks,
I thought the offending line was smp_processor_id() used for cpumask_clear_cpu, so confused by this patch.... never mind
Sorry for the delay, I applied the patch and it dose fix the issue.
FYI
David
>
> tglx
>---
>--- a/kernel/time/tick-broadcast.c
>+++ b/kernel/time/tick-broadcast.c
>@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
> #ifdef CONFIG_HOTPLUG_CPU
> void hotplug_cpu__broadcast_tick_pull(int deadcpu)
> {
>- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> struct clock_event_device *bc;
> unsigned long flags;
>
>@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
> * device to avoid the starvation.
> */
> if (tick_check_broadcast_expired()) {
>+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>+
> cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
> tick_program_event(td->evtdev->next_event, 1);
> }
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:23 ` Thomas Gleixner
2024-07-31 10:42 ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-31 10:23 UTC (permalink / raw)
To: David Wang, liaoyu15
Cc: linux-kernel, linux-tip-commits, stable, x86, Frederic Weisbecker,
Anna-Maria Behnsen
The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:
BUG: using smp_processor_id() in preemptible [00000000] code:
caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
kernel/time/tick-broadcast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
* device to avoid the starvation.
*/
if (tick_check_broadcast_expired()) {
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
tick_program_event(td->evtdev->next_event, 1);
}
^ permalink raw reply [flat|nested] 6+ messages in thread* [tip: timers/urgent] tick/broadcast: Move per CPU pointer access into the atomic section
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
@ 2024-07-31 10:42 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-07-31 10:42 UTC (permalink / raw)
To: linux-tip-commits
Cc: David Wang, Thomas Gleixner, Yu Liao, stable, x86, linux-kernel
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 6881e75237a84093d0986f56223db3724619f26e
Gitweb: https://git.kernel.org/tip/6881e75237a84093d0986f56223db3724619f26e
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 31 Jul 2024 12:23:51 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 31 Jul 2024 12:37:43 +02:00
tick/broadcast: Move per CPU pointer access into the atomic section
The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:
BUG: using smp_processor_id() in preemptible [00000000] code:
caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
---
kernel/time/tick-broadcast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index b484309..ed58eeb 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(void)
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
* device to avoid the starvation.
*/
if (tick_check_broadcast_expired()) {
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
tick_program_event(td->evtdev->next_event, 1);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-07-31 13:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15 ` Yu Liao
2024-07-31 13:17 ` David Wang
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
2024-07-31 10:42 ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox