* [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
@ 2024-07-30 14:25 David Wang
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
0 siblings, 2 replies; 6+ messages in thread
From: David Wang @ 2024-07-30 14:25 UTC (permalink / raw)
To: liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, tglx, x86
Hi,
When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
kernel: [ 1734.412974] smpboot: CPU 2 is now offline
kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414964] CPU: 0 UID: 0 PID: 4619 Comm: systemd-sleep Tainted: P OE 6.11.0-rc1-linan-4 #292
kernel: [ 1734.414968] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: [ 1734.414969] Hardware name: Micro-Star International Co., Ltd. MS-7B89/B450M MORTAR MAX (MS-7B89), BIOS 2.80 06/10/2020
kernel: [ 1734.414970] Call Trace:
kernel: [ 1734.414974] <TASK>
kernel: [ 1734.414978] dump_stack_lvl+0x60/0x80
kernel: [ 1734.414982] check_preemption_disabled+0xce/0xe0
kernel: [ 1734.414987] hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
kernel: [ 1734.414992] ? __pfx_takedown_cpu+0x10/0x10
kernel: [ 1734.414996] takedown_cpu+0x97/0x130
kernel: [ 1734.414999] cpuhp_invoke_callback+0xf8/0x450
kernel: [ 1734.415004] __cpuhp_invoke_callback_range+0x78/0xe0
kernel: [ 1734.415008] _cpu_down+0xf4/0x360
kernel: [ 1734.415012] freeze_secondary_cpus+0xae/0x290
kernel: [ 1734.415016] suspend_devices_and_enter+0x1da/0x920
kernel: [ 1734.415022] pm_suspend+0x1fa/0x500
kernel: [ 1734.415025] state_store+0x68/0xd0
kernel: [ 1734.415028] kernfs_fop_write_iter+0x169/0x1f0
kernel: [ 1734.415034] vfs_write+0x269/0x440
kernel: [ 1734.415041] ksys_write+0x63/0xe0
kernel: [ 1734.415044] do_syscall_64+0x4b/0x110
kernel: [ 1734.415048] entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: [ 1734.415052] RIP: 0033:0x7fe885cee240
kernel: [ 1734.415055] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
kernel: [ 1734.415057] RSP: 002b:00007ffc53ccec58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kernel: [ 1734.415060] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe885cee240
kernel: [ 1734.415062] RDX: 0000000000000004 RSI: 00007ffc53cced40 RDI: 0000000000000004
kernel: [ 1734.415063] RBP: 00007ffc53cced40 R08: 0000000000000007 R09: 000055f34dde8210
kernel: [ 1734.415064] R10: 6bccc22257390b18 R11: 0000000000000202 R12: 0000000000000004
kernel: [ 1734.415066] R13: 000055f34dde42d0 R14: 0000000000000004 R15: 00007fe885dc49e0
kernel: [ 1734.415071] </TASK>
I confirmed that this was introduced by commit:
f7d43dd206e7e18c182f200e67a8db8c209907fa tick/broadcast: Make takeover of broadcast hrtimer reliable
, and revert this commit can fix it.
Thanks
David
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
@ 2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15 ` Yu Liao
2024-07-31 13:17 ` David Wang
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
1 sibling, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-30 15:07 UTC (permalink / raw)
To: David Wang, liaoyu15; +Cc: linux-kernel, linux-tip-commits, stable, x86
On Tue, Jul 30 2024 at 22:25, David Wang wrote:
> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>
> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
The below should fix that.
Thanks,
tglx
---
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
* device to avoid the starvation.
*/
if (tick_check_broadcast_expired()) {
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
tick_program_event(td->evtdev->next_event, 1);
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:15 ` Yu Liao
2024-07-31 13:17 ` David Wang
1 sibling, 0 replies; 6+ messages in thread
From: Yu Liao @ 2024-07-31 10:15 UTC (permalink / raw)
To: Thomas Gleixner, David Wang; +Cc: linux-kernel, linux-tip-commits, stable, x86
On 2024/7/30 23:07, Thomas Gleixner wrote:
> On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
>
> The below should fix that.
>
> Thanks,
>
> tglx
> ---
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
> #ifdef CONFIG_HOTPLUG_CPU
> void hotplug_cpu__broadcast_tick_pull(int deadcpu)
> {
> - struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> struct clock_event_device *bc;
> unsigned long flags;
>
> @@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
> * device to avoid the starvation.
> */
> if (tick_check_broadcast_expired()) {
> + struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> +
> cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
> tick_program_event(td->evtdev->next_event, 1);
> }
>
Sorry for causing this issue. I have tested the patch on an x86 machine, this
patch can fix the issue.
Tested-by: Yu Liao <liaoyu15@huawei.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
2024-07-30 15:07 ` Thomas Gleixner
@ 2024-07-31 10:23 ` Thomas Gleixner
2024-07-31 10:42 ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
1 sibling, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2024-07-31 10:23 UTC (permalink / raw)
To: David Wang, liaoyu15
Cc: linux-kernel, linux-tip-commits, stable, x86, Frederic Weisbecker,
Anna-Maria Behnsen
The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:
BUG: using smp_processor_id() in preemptible [00000000] code:
caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
kernel/time/tick-broadcast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
* device to avoid the starvation.
*/
if (tick_check_broadcast_expired()) {
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
tick_program_event(td->evtdev->next_event, 1);
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip: timers/urgent] tick/broadcast: Move per CPU pointer access into the atomic section
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
@ 2024-07-31 10:42 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2024-07-31 10:42 UTC (permalink / raw)
To: linux-tip-commits
Cc: David Wang, Thomas Gleixner, Yu Liao, stable, x86, linux-kernel
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 6881e75237a84093d0986f56223db3724619f26e
Gitweb: https://git.kernel.org/tip/6881e75237a84093d0986f56223db3724619f26e
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 31 Jul 2024 12:23:51 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 31 Jul 2024 12:37:43 +02:00
tick/broadcast: Move per CPU pointer access into the atomic section
The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.
This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:
BUG: using smp_processor_id() in preemptible [00000000] code:
caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
Move it to the actual usage site which is in a non-preemptible region.
Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
---
kernel/time/tick-broadcast.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index b484309..ed58eeb 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(void)
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
* device to avoid the starvation.
*/
if (tick_check_broadcast_expired()) {
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
+
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
tick_program_event(td->evtdev->next_event, 1);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15 ` Yu Liao
@ 2024-07-31 13:17 ` David Wang
1 sibling, 0 replies; 6+ messages in thread
From: David Wang @ 2024-07-31 13:17 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: liaoyu15, linux-kernel, linux-tip-commits, stable, x86
Hi,
At 2024-07-30 23:07:41, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>On Tue, Jul 30 2024 at 22:25, David Wang wrote:
>> When I suspend my system, via `systemctl suspend`, kernel BUG shows up in log:
>>
>> kernel: [ 1734.412974] smpboot: CPU 2 is now offline
>> kernel: [ 1734.414952] BUG: using smp_processor_id() in preemptible [00000000] code: systemd-sleep/4619
>> kernel: [ 1734.414957] caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0
>
>The below should fix that.
>
>Thanks,
I thought the offending line was smp_processor_id() used for cpumask_clear_cpu, so confused by this patch.... never mind
Sorry for the delay, I applied the patch and it dose fix the issue.
FYI
David
>
> tglx
>---
>--- a/kernel/time/tick-broadcast.c
>+++ b/kernel/time/tick-broadcast.c
>@@ -1141,7 +1141,6 @@ void tick_broadcast_switch_to_oneshot(vo
> #ifdef CONFIG_HOTPLUG_CPU
> void hotplug_cpu__broadcast_tick_pull(int deadcpu)
> {
>- struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
> struct clock_event_device *bc;
> unsigned long flags;
>
>@@ -1167,6 +1166,8 @@ void hotplug_cpu__broadcast_tick_pull(in
> * device to avoid the starvation.
> */
> if (tick_check_broadcast_expired()) {
>+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
>+
> cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
> tick_program_event(td->evtdev->next_event, 1);
> }
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-07-31 13:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 14:25 [Regression] 6.11.0-rc1: BUG: using smp_processor_id() in preemptible when suspend the system David Wang
2024-07-30 15:07 ` Thomas Gleixner
2024-07-31 10:15 ` Yu Liao
2024-07-31 13:17 ` David Wang
2024-07-31 10:23 ` [PATCH] tick/broadcast: Move per CPU pointer access into the atomic section Thomas Gleixner
2024-07-31 10:42 ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.