* [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs
@ 2024-12-18 16:50 Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING Frederic Weisbecker
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2024-12-18 16:50 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, paulmck, Anna-Maria Behnsen
5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
was introduced to fix stalls with scheduler bandwidth timers getting
migrated while some kthreads handling CPU hotplug rely on bandwidth.
However this has introduced several other issues which used to be
confined to RCU. But not anymore as it is spreading to hotplug code
itself (https://lore.kernel.org/all/20241213203739.1519801-1-usamaarif642@gmail.com/)
Instead of introducing yet another new hackery, fix the problem in
hrtimers for everyone.
Frederic Weisbecker (3):
hrtimers: Force migrate away hrtimers queued after
CPUHP_AP_HRTIMERS_DYING
rcu: Remove swake_up_one_online() bandaid
Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
include/linux/hrtimer_defs.h | 1 +
kernel/rcu/tree.c | 34 +-------------------
kernel/rcu/tree_exp.h | 2 +-
kernel/rcu/tree_nocb.h | 10 ++----
kernel/time/hrtimer.c | 60 +++++++++++++++++++++++++++++++-----
5 files changed, 58 insertions(+), 49 deletions(-)
--
2.46.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
2024-12-18 16:50 [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Frederic Weisbecker
@ 2024-12-18 16:50 ` Frederic Weisbecker
2024-12-19 19:00 ` Usama Arif
2024-12-18 16:50 ` [PATCH 2/3] rcu: Remove swake_up_one_online() bandaid Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2024-12-18 16:50 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, paulmck, Anna-Maria Behnsen
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
handling tasks involved in the CPU hotplug forward progress.
However wake ups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers
being armed. Depending on several considerations (crystal ball
power management based election, earliest timer already enqueued, timer
migration enabled or not), the target may eventually be the current
CPU even if offline. If that happens, the timer is eventually ignored.
The most notable example is RCU which had to deal with each an every of
those wake-ups by deferring them to an online CPU, along with related
workarounds:
_ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
The problem isn't confined to RCU though as the stop machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
and performs a wake up that eventually arms the deadline server timer:
WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
Modules linked in:
CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
Code: 41 5c 41 5d 41 5e 41 5f 5d e9 63 94 ea 00 0f 0b 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d e9 39 fc 15 01 0f 0b e9 c1 fd ff ff <0f> 0b 48 8b
+45 00 e9 59 ff ff ff f3 0f 1e fa 65 8b 05 1d ec e8 7e
RSP: 0018:ffffc900019cbcc8 EFLAGS: 00010046
RAX: ffff88bf449a4c40 RBX: 0000000000000082 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff88bf43224c80 RDI: ffff88bf449a4c40
RBP: ffff88bf449a4c80 R08: ffff888280970090 R09: 0000000000000000
R10: ffff88bf432252e0 R11: ffffffff811abf70 R12: ffff88bf449a4c40
R13: ffff88bf43234b28 R14: ffff88bf43224c80 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88bf44980000(0000) knlGS:0000000000000000
CR2: 0000000000000000 CR3: 000000404b230001 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0xcf/0x1b0
? hrtimer_start_range_ns+0x289/0x2d0
? report_bug+0x120/0x1a0
? handle_bug+0x60/0x90
? exc_invalid_op+0x1a/0x50
? asm_exc_invalid_op+0x1a/0x20
? register_refined_jiffies+0xb0/0xb0
? hrtimer_start_range_ns+0x289/0x2d0
? hrtimer_start_range_ns+0x186/0x2d0
start_dl_timer+0xfc/0x150
enqueue_dl_entity+0x367/0x640
dl_server_start+0x53/0xa0
enqueue_task_fair+0x363/0x460
enqueue_task+0x3c/0x200
ttwu_do_activate+0x94/0x240
try_to_wake_up+0x315/0x600
complete+0x4b/0x80
Instead of providing yet another bandaid to work around the situation,
fix it from hrtimers infrastructure instead: always migrate away a
timer to an online target whenever it is enqueued from an offline CPU.
This will also allow us to revert all the above RCU disgraceful hacks.
Reported-by: Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
Reported-by: Usama Arif <usamaarif642@gmail.com>
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Closes: 20241213203739.1519801-1-usamaarif642@gmail.com
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/hrtimer_defs.h | 1 +
kernel/time/hrtimer.c | 60 +++++++++++++++++++++++++++++++-----
2 files changed, 54 insertions(+), 7 deletions(-)
diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
index c3b4b7ed7c16..84a5045f80f3 100644
--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
ktime_t softirq_expires_next;
struct hrtimer *softirq_next_timer;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ call_single_data_t csd;
} ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 80fe3749d2db..48c0078d2c4f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -58,6 +58,8 @@
#define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
+static void retrigger_next_event(void *arg);
+
/*
* The timer bases:
*
@@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
.clockid = CLOCK_TAI,
.get_time = &ktime_get_clocktai,
},
- }
+ },
+ .csd = CSD_INIT(retrigger_next_event, NULL)
};
static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
@@ -716,8 +719,6 @@ static inline int hrtimer_is_hres_enabled(void)
return hrtimer_hres_enabled;
}
-static void retrigger_next_event(void *arg);
-
/*
* Switch to high resolution mode
*/
@@ -761,6 +762,8 @@ static void retrigger_next_event(void *arg)
{
struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);
+ guard(raw_spinlock)(&base->lock);
+
/*
* When high resolution mode or nohz is active, then the offsets of
* CLOCK_REALTIME/TAI/BOOTTIME have to be updated. Otherwise the
@@ -773,18 +776,17 @@ static void retrigger_next_event(void *arg)
* In the NOHZ case the update of the offset and the reevaluation
* of the next expiring timer is enough. The return from the SMP
* function call will take care of the reprogramming in case the
- * CPU was in a NOHZ idle sleep.
+ * CPU was in a NOHZ idle sleep. base->lock must still be held to
+ * to release and synchronize against the csd unlock.
*/
if (!hrtimer_hres_active(base) && !tick_nohz_active)
return;
- raw_spin_lock(&base->lock);
hrtimer_update_base(base);
if (hrtimer_hres_active(base))
hrtimer_force_reprogram(base, 0);
else
hrtimer_update_next_event(base);
- raw_spin_unlock(&base->lock);
}
/*
@@ -1202,10 +1204,48 @@ hrtimer_update_softirq_timer(struct hrtimer_cpu_base *cpu_base, bool reprogram)
hrtimer_reprogram(cpu_base->softirq_next_timer, reprogram);
}
+/*
+ * If the current CPU is offline and timers have been already
+ * migrated away, make sure not to enqueue locally and perform
+ * a remote retrigger as a last resort.
+ */
+static void enqueue_hrtimer_offline(struct hrtimer *timer,
+ struct hrtimer_clock_base *base,
+ const enum hrtimer_mode mode)
+{
+ struct hrtimer_cpu_base *new_cpu_base, *old_cpu_base, *this_cpu_base;
+ struct hrtimer_clock_base *new_base;
+ int cpu;
+
+ old_cpu_base = base->cpu_base;
+ this_cpu_base = this_cpu_ptr(&hrtimer_bases);
+
+ if (old_cpu_base == this_cpu_base || !old_cpu_base->online) {
+ WARN_ON_ONCE(hrtimer_callback_running(timer));
+ cpu = cpumask_any_and(cpu_online_mask,
+ housekeeping_cpumask(HK_TYPE_TIMER));
+ new_cpu_base = &per_cpu(hrtimer_bases, cpu);
+ new_base = &new_cpu_base->clock_base[base->index];
+ WRITE_ONCE(timer->base, &migration_base);
+ raw_spin_unlock(&old_cpu_base->lock);
+ raw_spin_lock(&new_cpu_base->lock);
+ WRITE_ONCE(timer->base, new_base);
+ } else {
+ new_base = base;
+ new_cpu_base = new_base->cpu_base;
+ cpu = new_cpu_base->cpu;
+ }
+
+ if (enqueue_hrtimer(timer, new_base, mode))
+ smp_call_function_single_async(cpu, &new_cpu_base->csd);
+}
+
+
static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
u64 delta_ns, const enum hrtimer_mode mode,
struct hrtimer_clock_base *base)
{
+ struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
struct hrtimer_clock_base *new_base;
bool force_local, first;
@@ -1217,7 +1257,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* and enforce reprogramming after it is queued no matter whether
* it is the new first expiring timer again or not.
*/
- force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+ force_local = base->cpu_base == this_cpu_base;
force_local &= base->cpu_base->next_timer == timer;
/*
@@ -1240,6 +1280,12 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+ if (unlikely(!this_cpu_base->online)) {
+ enqueue_hrtimer_offline(timer, base, mode);
+ return 0;
+ }
+
+
/* Switch the timer base, if necessary: */
if (!force_local) {
new_base = switch_hrtimer_base(timer, base,
--
2.46.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] rcu: Remove swake_up_one_online() bandaid
2024-12-18 16:50 [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING Frederic Weisbecker
@ 2024-12-18 16:50 ` Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 3/3] Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" Frederic Weisbecker
2024-12-19 17:42 ` [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Paul E. McKenney
3 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2024-12-18 16:50 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, paulmck, Anna-Maria Behnsen
It's now ok to perform a wake-up from an offline CPU because the
resulting armed scheduler bandwidth hrtimers are now correctly targeted
by hrtimer infrastructure.
Remove the obsolete hackerry.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree.c | 34 +---------------------------------
kernel/rcu/tree_exp.h | 2 +-
kernel/rcu/tree_nocb.h | 2 +-
3 files changed, 3 insertions(+), 35 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c160e05dfb7c..576b5c365145 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1060,38 +1060,6 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
return needmore;
}
-static void swake_up_one_online_ipi(void *arg)
-{
- struct swait_queue_head *wqh = arg;
-
- swake_up_one(wqh);
-}
-
-static void swake_up_one_online(struct swait_queue_head *wqh)
-{
- int cpu = get_cpu();
-
- /*
- * If called from rcutree_report_cpu_starting(), wake up
- * is dangerous that late in the CPU-down hotplug process. The
- * scheduler might queue an ignored hrtimer. Defer the wake up
- * to an online CPU instead.
- */
- if (unlikely(cpu_is_offline(cpu))) {
- int target;
-
- target = cpumask_any_and(housekeeping_cpumask(HK_TYPE_RCU),
- cpu_online_mask);
-
- smp_call_function_single(target, swake_up_one_online_ipi,
- wqh, 0);
- put_cpu();
- } else {
- put_cpu();
- swake_up_one(wqh);
- }
-}
-
/*
* Awaken the grace-period kthread. Don't do a self-awaken (unless in an
* interrupt or softirq handler, in which case we just might immediately
@@ -1116,7 +1084,7 @@ static void rcu_gp_kthread_wake(void)
return;
WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
- swake_up_one_online(&rcu_state.gp_wq);
+ swake_up_one(&rcu_state.gp_wq);
}
/*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index fb664d3a01c9..17b1a5e3590c 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -200,7 +200,7 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
if (rnp->parent == NULL) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
if (wake)
- swake_up_one_online(&rcu_state.expedited_wq);
+ swake_up_one(&rcu_state.expedited_wq);
break;
}
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 2605dd234a13..a43141a1b3a5 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -216,7 +216,7 @@ static bool __wake_nocb_gp(struct rcu_data *rdp_gp,
raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
if (needwake) {
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DoWake"));
- swake_up_one_online(&rdp_gp->nocb_gp_wq);
+ swake_up_one(&rdp_gp->nocb_gp_wq);
}
return needwake;
--
2.46.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
2024-12-18 16:50 [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 2/3] rcu: Remove swake_up_one_online() bandaid Frederic Weisbecker
@ 2024-12-18 16:50 ` Frederic Weisbecker
2024-12-19 17:42 ` [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Paul E. McKenney
3 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2024-12-18 16:50 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, paulmck, Anna-Maria Behnsen
This reverts commit f7345ccc62a4b880cf76458db5f320725f28e400.
swake_up_one_online() has been removed because hrtimers can now assign
a proper online target to hrtimers queued from offline CPUs. Therefore
remove the related hackery.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree_nocb.h | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index a43141a1b3a5..a03fc19abde7 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -554,19 +554,13 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
rcu_nocb_unlock(rdp);
wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
TPS("WakeLazy"));
- } else if (!irqs_disabled_flags(flags) && cpu_online(rdp->cpu)) {
+ } else if (!irqs_disabled_flags(flags)) {
/* ... if queue was empty ... */
rcu_nocb_unlock(rdp);
wake_nocb_gp(rdp, false);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("WakeEmpty"));
} else {
- /*
- * Don't do the wake-up upfront on fragile paths.
- * Also offline CPUs can't call swake_up_one_online() from
- * (soft-)IRQs. Rely on the final deferred wake-up from
- * rcutree_report_cpu_dead()
- */
rcu_nocb_unlock(rdp);
wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
TPS("WakeEmptyIsDeferred"));
--
2.46.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs
2024-12-18 16:50 [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Frederic Weisbecker
` (2 preceding siblings ...)
2024-12-18 16:50 ` [PATCH 3/3] Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" Frederic Weisbecker
@ 2024-12-19 17:42 ` Paul E. McKenney
2024-12-20 23:19 ` Paul E. McKenney
3 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2024-12-19 17:42 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Thomas Gleixner, LKML, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, Anna-Maria Behnsen
On Wed, Dec 18, 2024 at 05:50:05PM +0100, Frederic Weisbecker wrote:
> 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> was introduced to fix stalls with scheduler bandwidth timers getting
> migrated while some kthreads handling CPU hotplug rely on bandwidth.
>
> However this has introduced several other issues which used to be
> confined to RCU. But not anymore as it is spreading to hotplug code
> itself (https://lore.kernel.org/all/20241213203739.1519801-1-usamaarif642@gmail.com/)
>
> Instead of introducing yet another new hackery, fix the problem in
> hrtimers for everyone.
The good news is that this passes 12 hours of 400*TREE03. (Yay!!!)
The so-so news is that this gives only about 70% confidence that these
patches help, but on the other hand, it also gives much higher confidence
that these patches are not hurting anything.
At least for TREE03.
The not-so-good news is that this series causes build failures for
rcutorture scenarios (such as SRCU-T) that build with CONFIG_SMP=n:
------------------------------------------------------------------------
kernel/time/hrtimer.c: In function ‘enqueue_hrtimer_offline’:
kernel/time/hrtimer.c:1229:42: error: ‘migration_base’ undeclared (first use in this function); did you mean ‘is_migration_base’?
------------------------------------------------------------------------
When built with KCSAN enabled (--kcsan to kvm.sh), there is this
additional build failure on that same line of code:
------------------------------------------------------------------------
kernel/time/hrtimer.c:1229:3: error: incompatible pointer types assigning to 'volatile typeof (timer->base)' (aka 'struct hrtimer_clock_base *volatile') from 'bool (*)(struct hrtimer_clock_base *)' (aka '_Bool (*)(struct hrtimer_clock_base *)') [-Werror,-Wincompatible-pointer-types]
1229 | WRITE_ONCE(timer->base, &migration_base);
------------------------------------------------------------------------
Me, I am a bit surprised that enqueue_hrtimer_offline() is even built
in a CONFIG_SMP=n kernel.
But there might be some reason why #ifdef-ing out that function's body
would be a bad idea, so over to you! ;-)
Thanx, Paul
> Frederic Weisbecker (3):
> hrtimers: Force migrate away hrtimers queued after
> CPUHP_AP_HRTIMERS_DYING
> rcu: Remove swake_up_one_online() bandaid
> Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
>
> include/linux/hrtimer_defs.h | 1 +
> kernel/rcu/tree.c | 34 +-------------------
> kernel/rcu/tree_exp.h | 2 +-
> kernel/rcu/tree_nocb.h | 10 ++----
> kernel/time/hrtimer.c | 60 +++++++++++++++++++++++++++++++-----
> 5 files changed, 58 insertions(+), 49 deletions(-)
>
> --
> 2.46.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
2024-12-18 16:50 ` [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING Frederic Weisbecker
@ 2024-12-19 19:00 ` Usama Arif
2024-12-20 23:43 ` Frederic Weisbecker
0 siblings, 1 reply; 10+ messages in thread
From: Usama Arif @ 2024-12-19 19:00 UTC (permalink / raw)
To: Frederic Weisbecker, Thomas Gleixner
Cc: LKML, vlad.wing, rcu, boqun.feng, joel, neeraj.upadhyay, urezki,
qiang.zhang1211, Cheng-Jui.Wang, leitao, kernel-team, paulmck,
Anna-Maria Behnsen
On 18/12/2024 19:50, Frederic Weisbecker wrote:
> hrtimers are migrated away from the dying CPU to any online target at
> the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
> handling tasks involved in the CPU hotplug forward progress.
>
> However wake ups can still be performed by the outgoing CPU after
> CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers
> being armed. Depending on several considerations (crystal ball
> power management based election, earliest timer already enqueued, timer
> migration enabled or not), the target may eventually be the current
> CPU even if offline. If that happens, the timer is eventually ignored.
>
> The most notable example is RCU which had to deal with each an every of
> those wake-ups by deferring them to an online CPU, along with related
> workarounds:
>
> _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
> _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
> _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
>
> The problem isn't confined to RCU though as the stop machine kthread
> (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
> and performs a wake up that eventually arms the deadline server timer:
>
> WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
> Modules linked in:
> CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
> Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
> RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
> Code: 41 5c 41 5d 41 5e 41 5f 5d e9 63 94 ea 00 0f 0b 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d e9 39 fc 15 01 0f 0b e9 c1 fd ff ff <0f> 0b 48 8b
> +45 00 e9 59 ff ff ff f3 0f 1e fa 65 8b 05 1d ec e8 7e
> RSP: 0018:ffffc900019cbcc8 EFLAGS: 00010046
> RAX: ffff88bf449a4c40 RBX: 0000000000000082 RCX: 0000000000000001
> RDX: 0000000000000001 RSI: ffff88bf43224c80 RDI: ffff88bf449a4c40
> RBP: ffff88bf449a4c80 R08: ffff888280970090 R09: 0000000000000000
> R10: ffff88bf432252e0 R11: ffffffff811abf70 R12: ffff88bf449a4c40
> R13: ffff88bf43234b28 R14: ffff88bf43224c80 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff88bf44980000(0000) knlGS:0000000000000000
> CR2: 0000000000000000 CR3: 000000404b230001 CR4: 0000000000770ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __warn+0xcf/0x1b0
> ? hrtimer_start_range_ns+0x289/0x2d0
> ? report_bug+0x120/0x1a0
> ? handle_bug+0x60/0x90
> ? exc_invalid_op+0x1a/0x50
> ? asm_exc_invalid_op+0x1a/0x20
> ? register_refined_jiffies+0xb0/0xb0
> ? hrtimer_start_range_ns+0x289/0x2d0
> ? hrtimer_start_range_ns+0x186/0x2d0
> start_dl_timer+0xfc/0x150
> enqueue_dl_entity+0x367/0x640
> dl_server_start+0x53/0xa0
> enqueue_task_fair+0x363/0x460
> enqueue_task+0x3c/0x200
> ttwu_do_activate+0x94/0x240
> try_to_wake_up+0x315/0x600
> complete+0x4b/0x80
>
> Instead of providing yet another bandaid to work around the situation,
> fix it from hrtimers infrastructure instead: always migrate away a
> timer to an online target whenever it is enqueued from an offline CPU.
>
> This will also allow us to revert all the above RCU disgraceful hacks.
>
> Reported-by: Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> Reported-by: Usama Arif <usamaarif642@gmail.com>
> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> Closes: 20241213203739.1519801-1-usamaarif642@gmail.com
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> include/linux/hrtimer_defs.h | 1 +
> kernel/time/hrtimer.c | 60 +++++++++++++++++++++++++++++++-----
> 2 files changed, 54 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
> index c3b4b7ed7c16..84a5045f80f3 100644
> --- a/include/linux/hrtimer_defs.h
> +++ b/include/linux/hrtimer_defs.h
> @@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
> ktime_t softirq_expires_next;
> struct hrtimer *softirq_next_timer;
> struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
> + call_single_data_t csd;
> } ____cacheline_aligned;
>
>
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 80fe3749d2db..48c0078d2c4f 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -58,6 +58,8 @@
> #define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
> #define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
>
> +static void retrigger_next_event(void *arg);
> +
> /*
> * The timer bases:
> *
> @@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
> .clockid = CLOCK_TAI,
> .get_time = &ktime_get_clocktai,
> },
> - }
> + },
> + .csd = CSD_INIT(retrigger_next_event, NULL)
> };
>
> static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
> @@ -716,8 +719,6 @@ static inline int hrtimer_is_hres_enabled(void)
> return hrtimer_hres_enabled;
> }
>
> -static void retrigger_next_event(void *arg);
> -
> /*
> * Switch to high resolution mode
> */
> @@ -761,6 +762,8 @@ static void retrigger_next_event(void *arg)
> {
> struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);
>
> + guard(raw_spinlock)(&base->lock);
> +
> /*
> * When high resolution mode or nohz is active, then the offsets of
> * CLOCK_REALTIME/TAI/BOOTTIME have to be updated. Otherwise the
> @@ -773,18 +776,17 @@ static void retrigger_next_event(void *arg)
> * In the NOHZ case the update of the offset and the reevaluation
> * of the next expiring timer is enough. The return from the SMP
> * function call will take care of the reprogramming in case the
> - * CPU was in a NOHZ idle sleep.
> + * CPU was in a NOHZ idle sleep. base->lock must still be held to
> + * to release and synchronize against the csd unlock.
> */
> if (!hrtimer_hres_active(base) && !tick_nohz_active)
> return;
>
> - raw_spin_lock(&base->lock);
> hrtimer_update_base(base);
> if (hrtimer_hres_active(base))
> hrtimer_force_reprogram(base, 0);
> else
> hrtimer_update_next_event(base);
> - raw_spin_unlock(&base->lock);
> }
>
> /*
> @@ -1202,10 +1204,48 @@ hrtimer_update_softirq_timer(struct hrtimer_cpu_base *cpu_base, bool reprogram)
> hrtimer_reprogram(cpu_base->softirq_next_timer, reprogram);
> }
>
> +/*
> + * If the current CPU is offline and timers have been already
> + * migrated away, make sure not to enqueue locally and perform
> + * a remote retrigger as a last resort.
> + */
> +static void enqueue_hrtimer_offline(struct hrtimer *timer,
> + struct hrtimer_clock_base *base,
> + const enum hrtimer_mode mode)
> +{
> + struct hrtimer_cpu_base *new_cpu_base, *old_cpu_base, *this_cpu_base;
> + struct hrtimer_clock_base *new_base;
> + int cpu;
> +
> + old_cpu_base = base->cpu_base;
> + this_cpu_base = this_cpu_ptr(&hrtimer_bases);
> +
> + if (old_cpu_base == this_cpu_base || !old_cpu_base->online) {
> + WARN_ON_ONCE(hrtimer_callback_running(timer));
> + cpu = cpumask_any_and(cpu_online_mask,
> + housekeeping_cpumask(HK_TYPE_TIMER));
> + new_cpu_base = &per_cpu(hrtimer_bases, cpu);
> + new_base = &new_cpu_base->clock_base[base->index];
> + WRITE_ONCE(timer->base, &migration_base);
> + raw_spin_unlock(&old_cpu_base->lock);
> + raw_spin_lock(&new_cpu_base->lock);
> + WRITE_ONCE(timer->base, new_base);
> + } else {
> + new_base = base;
> + new_cpu_base = new_base->cpu_base;
> + cpu = new_cpu_base->cpu;
> + }
> +
> + if (enqueue_hrtimer(timer, new_base, mode))
> + smp_call_function_single_async(cpu, &new_cpu_base->csd);
> +}
> +
> +
> static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
> u64 delta_ns, const enum hrtimer_mode mode,
> struct hrtimer_clock_base *base)
> {
> + struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
> struct hrtimer_clock_base *new_base;
> bool force_local, first;
>
> @@ -1217,7 +1257,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
> * and enforce reprogramming after it is queued no matter whether
> * it is the new first expiring timer again or not.
> */
> - force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
> + force_local = base->cpu_base == this_cpu_base;
> force_local &= base->cpu_base->next_timer == timer;
>
> /*
> @@ -1240,6 +1280,12 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
>
> hrtimer_set_expires_range_ns(timer, tim, delta_ns);
>
> + if (unlikely(!this_cpu_base->online)) {
> + enqueue_hrtimer_offline(timer, base, mode);
Thanks for the fix!
It looks good to me, maybe as a follow up, we could rename switch_hrtimer_base to
enqueue_hrtimer_local? (or maybe something more appropriate)
There are now 2 different paths that switch hrtimer base (enqueue_hrtimer_offline and
switch_hrtimer_base).
> + return 0;
> + }
> +
> +
nit: extra new line above.
> /* Switch the timer base, if necessary: */
> if (!force_local) {
> new_base = switch_hrtimer_base(timer, base,
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs
2024-12-19 17:42 ` [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Paul E. McKenney
@ 2024-12-20 23:19 ` Paul E. McKenney
2024-12-20 23:26 ` Frederic Weisbecker
0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2024-12-20 23:19 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Thomas Gleixner, LKML, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, Anna-Maria Behnsen
On Thu, Dec 19, 2024 at 09:42:48AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 18, 2024 at 05:50:05PM +0100, Frederic Weisbecker wrote:
> > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > was introduced to fix stalls with scheduler bandwidth timers getting
> > migrated while some kthreads handling CPU hotplug rely on bandwidth.
> >
> > However this has introduced several other issues which used to be
> > confined to RCU. But not anymore as it is spreading to hotplug code
> > itself (https://lore.kernel.org/all/20241213203739.1519801-1-usamaarif642@gmail.com/)
> >
> > Instead of introducing yet another new hackery, fix the problem in
> > hrtimers for everyone.
>
> The good news is that this passes 12 hours of 400*TREE03. (Yay!!!)
>
> The so-so news is that this gives only about 70% confidence that these
> patches help, but on the other hand, it also gives much higher confidence
> that these patches are not hurting anything.
>
> At least for TREE03.
Another day, another 12 hours of 400*TREE03 passed. This gets us up
beyond 90% confidence that these patches help, and even more confidence
that they are not too severely hurting anything.
> The not-so-good news is that this series causes build failures for
> rcutorture scenarios (such as SRCU-T) that build with CONFIG_SMP=n:
>
> ------------------------------------------------------------------------
> kernel/time/hrtimer.c: In function ‘enqueue_hrtimer_offline’:
> kernel/time/hrtimer.c:1229:42: error: ‘migration_base’ undeclared (first use in this function); did you mean ‘is_migration_base’?
> ------------------------------------------------------------------------
>
> When built with KCSAN enabled (--kcsan to kvm.sh), there is this
> additional build failure on that same line of code:
>
> ------------------------------------------------------------------------
> kernel/time/hrtimer.c:1229:3: error: incompatible pointer types assigning to 'volatile typeof (timer->base)' (aka 'struct hrtimer_clock_base *volatile') from 'bool (*)(struct hrtimer_clock_base *)' (aka '_Bool (*)(struct hrtimer_clock_base *)') [-Werror,-Wincompatible-pointer-types]
>
> 1229 | WRITE_ONCE(timer->base, &migration_base);
> ------------------------------------------------------------------------
>
> Me, I am a bit surprised that enqueue_hrtimer_offline() is even built
> in a CONFIG_SMP=n kernel.
>
> But there might be some reason why #ifdef-ing out that function's body
> would be a bad idea, so over to you! ;-)
Curiosity overcame me, and the #ifdef makes things at least appear to
work for CONFIG_SMP=n kernels. Experimental patch below, which I intend
to use for further testing, but I have no plans to push it upstream.
Thanx, Paul
------------------------------------------------------------------------
commit 61a0b92b3a3cfef69e3848806e51d1b99a9e9406
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Fri Dec 20 15:10:26 2024 -0800
EXP hrtimers: No-op enqueue_hrtimer_offline() if !HOTPLUG_CPU
In kernels built with CONFIG_HOTPLUG_CPU=n, this experimental commit
uses #ifdef to remove the body of the enqueue_hrtimer_offline() function.
This might or might not be the correct fix to the build complaint about
migration_base being undeclared.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Vlad Poenaru <vlad.wing@gmail.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 48c0078d2c4f2..4235b7825b152 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1213,6 +1213,7 @@ static void enqueue_hrtimer_offline(struct hrtimer *timer,
struct hrtimer_clock_base *base,
const enum hrtimer_mode mode)
{
+#ifdef CONFIG_HOTPLUG_CPU
struct hrtimer_cpu_base *new_cpu_base, *old_cpu_base, *this_cpu_base;
struct hrtimer_clock_base *new_base;
int cpu;
@@ -1238,6 +1239,7 @@ static void enqueue_hrtimer_offline(struct hrtimer *timer,
if (enqueue_hrtimer(timer, new_base, mode))
smp_call_function_single_async(cpu, &new_cpu_base->csd);
+#endif
}
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs
2024-12-20 23:19 ` Paul E. McKenney
@ 2024-12-20 23:26 ` Frederic Weisbecker
2024-12-21 0:05 ` Paul E. McKenney
0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2024-12-20 23:26 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Thomas Gleixner, LKML, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, Anna-Maria Behnsen
Le Fri, Dec 20, 2024 at 03:19:31PM -0800, Paul E. McKenney a écrit :
> On Thu, Dec 19, 2024 at 09:42:48AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 18, 2024 at 05:50:05PM +0100, Frederic Weisbecker wrote:
> > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > > was introduced to fix stalls with scheduler bandwidth timers getting
> > > migrated while some kthreads handling CPU hotplug rely on bandwidth.
> > >
> > > However this has introduced several other issues which used to be
> > > confined to RCU. But not anymore as it is spreading to hotplug code
> > > itself (https://lore.kernel.org/all/20241213203739.1519801-1-usamaarif642@gmail.com/)
> > >
> > > Instead of introducing yet another new hackery, fix the problem in
> > > hrtimers for everyone.
> >
> > The good news is that this passes 12 hours of 400*TREE03. (Yay!!!)
> >
> > The so-so news is that this gives only about 70% confidence that these
> > patches help, but on the other hand, it also gives much higher confidence
> > that these patches are not hurting anything.
> >
> > At least for TREE03.
>
> Another day, another 12 hours of 400*TREE03 passed. This gets us up
> beyond 90% confidence that these patches help, and even more confidence
> that they are not too severely hurting anything.
Cool :-)
>
> > The not-so-good news is that this series causes build failures for
> > rcutorture scenarios (such as SRCU-T) that build with CONFIG_SMP=n:
> >
> > ------------------------------------------------------------------------
> > kernel/time/hrtimer.c: In function ‘enqueue_hrtimer_offline’:
> > kernel/time/hrtimer.c:1229:42: error: ‘migration_base’ undeclared (first use in this function); did you mean ‘is_migration_base’?
> > ------------------------------------------------------------------------
> >
> > When built with KCSAN enabled (--kcsan to kvm.sh), there is this
> > additional build failure on that same line of code:
> >
> > ------------------------------------------------------------------------
> > kernel/time/hrtimer.c:1229:3: error: incompatible pointer types assigning to 'volatile typeof (timer->base)' (aka 'struct hrtimer_clock_base *volatile') from 'bool (*)(struct hrtimer_clock_base *)' (aka '_Bool (*)(struct hrtimer_clock_base *)') [-Werror,-Wincompatible-pointer-types]
> >
> > 1229 | WRITE_ONCE(timer->base, &migration_base);
> > ------------------------------------------------------------------------
> >
> > Me, I am a bit surprised that enqueue_hrtimer_offline() is even built
> > in a CONFIG_SMP=n kernel.
> >
> > But there might be some reason why #ifdef-ing out that function's body
> > would be a bad idea, so over to you! ;-)
>
> Curiosity overcame me, and the #ifdef makes things at least appear to
> work for CONFIG_SMP=n kernels. Experimental patch below, which I intend
> to use for further testing, but I have no plans to push it upstream.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 61a0b92b3a3cfef69e3848806e51d1b99a9e9406
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Fri Dec 20 15:10:26 2024 -0800
>
> EXP hrtimers: No-op enqueue_hrtimer_offline() if !HOTPLUG_CPU
>
> In kernels built with CONFIG_HOTPLUG_CPU=n, this experimental commit
> uses #ifdef to remove the body of the enqueue_hrtimer_offline() function.
> This might or might not be the correct fix to the build complaint about
> migration_base being undeclared.
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Vlad Poenaru <vlad.wing@gmail.com>
> Cc: Usama Arif <usamaarif642@gmail.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
>
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 48c0078d2c4f2..4235b7825b152 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1213,6 +1213,7 @@ static void enqueue_hrtimer_offline(struct hrtimer *timer,
> struct hrtimer_clock_base *base,
> const enum hrtimer_mode mode)
> {
> +#ifdef CONFIG_HOTPLUG_CPU
> struct hrtimer_cpu_base *new_cpu_base, *old_cpu_base, *this_cpu_base;
> struct hrtimer_clock_base *new_base;
> int cpu;
> @@ -1238,6 +1239,7 @@ static void enqueue_hrtimer_offline(struct hrtimer *timer,
>
> if (enqueue_hrtimer(timer, new_base, mode))
> smp_call_function_single_async(cpu, &new_cpu_base->csd);
> +#endif
> }
I was thinking about something like that. Can I fold that and add your
SOB?
Thanks.
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
2024-12-19 19:00 ` Usama Arif
@ 2024-12-20 23:43 ` Frederic Weisbecker
0 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2024-12-20 23:43 UTC (permalink / raw)
To: Usama Arif
Cc: Thomas Gleixner, LKML, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, paulmck, Anna-Maria Behnsen
Le Thu, Dec 19, 2024 at 10:00:12PM +0300, Usama Arif a écrit :
> > @@ -1240,6 +1280,12 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
> >
> > hrtimer_set_expires_range_ns(timer, tim, delta_ns);
> >
> > + if (unlikely(!this_cpu_base->online)) {
> > + enqueue_hrtimer_offline(timer, base, mode);
>
> Thanks for the fix!
>
> It looks good to me, maybe as a follow up, we could rename switch_hrtimer_base to
> enqueue_hrtimer_local? (or maybe something more appropriate)
> There are now 2 different paths that switch hrtimer base (enqueue_hrtimer_offline and
> switch_hrtimer_base).
I considered extending switch_hrtimer_base() instead to handle offline
CPU from there but that turned out ugly since what follows assumes to either
queue locally and possibly reprogram or queue remotely and not reprogram.
enqueue_hrtimer_global() does only enqueue remotely and possibly
trigger a reprogram.
And indeed we could move switch_hrtimer_base() + enqueue_hrtimer() +
hrtimer_force_reprogram() to a enqueue_hrtimer_online() for example.
Perhaps that would clarify things a bit, I don't know...
Thanks.
>
> > + return 0;
> > + }
> > +
> > +
> nit: extra new line above.
> > /* Switch the timer base, if necessary: */
> > if (!force_local) {
> > new_base = switch_hrtimer_base(timer, base,
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs
2024-12-20 23:26 ` Frederic Weisbecker
@ 2024-12-21 0:05 ` Paul E. McKenney
0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2024-12-21 0:05 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Thomas Gleixner, LKML, vlad.wing, rcu, boqun.feng, joel,
neeraj.upadhyay, urezki, qiang.zhang1211, Cheng-Jui.Wang, leitao,
kernel-team, Usama Arif, Anna-Maria Behnsen
On Sat, Dec 21, 2024 at 12:26:01AM +0100, Frederic Weisbecker wrote:
> Le Fri, Dec 20, 2024 at 03:19:31PM -0800, Paul E. McKenney a écrit :
> > On Thu, Dec 19, 2024 at 09:42:48AM -0800, Paul E. McKenney wrote:
> > > On Wed, Dec 18, 2024 at 05:50:05PM +0100, Frederic Weisbecker wrote:
> > > > 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > > > was introduced to fix stalls with scheduler bandwidth timers getting
> > > > migrated while some kthreads handling CPU hotplug rely on bandwidth.
> > > >
> > > > However this has introduced several other issues which used to be
> > > > confined to RCU. But not anymore as it is spreading to hotplug code
> > > > itself (https://lore.kernel.org/all/20241213203739.1519801-1-usamaarif642@gmail.com/)
> > > >
> > > > Instead of introducing yet another new hackery, fix the problem in
> > > > hrtimers for everyone.
> > >
> > > The good news is that this passes 12 hours of 400*TREE03. (Yay!!!)
> > >
> > > The so-so news is that this gives only about 70% confidence that these
> > > patches help, but on the other hand, it also gives much higher confidence
> > > that these patches are not hurting anything.
> > >
> > > At least for TREE03.
> >
> > Another day, another 12 hours of 400*TREE03 passed. This gets us up
> > beyond 90% confidence that these patches help, and even more confidence
> > that they are not too severely hurting anything.
>
> Cool :-)
>
> >
> > > The not-so-good news is that this series causes build failures for
> > > rcutorture scenarios (such as SRCU-T) that build with CONFIG_SMP=n:
> > >
> > > ------------------------------------------------------------------------
> > > kernel/time/hrtimer.c: In function ‘enqueue_hrtimer_offline’:
> > > kernel/time/hrtimer.c:1229:42: error: ‘migration_base’ undeclared (first use in this function); did you mean ‘is_migration_base’?
> > > ------------------------------------------------------------------------
> > >
> > > When built with KCSAN enabled (--kcsan to kvm.sh), there is this
> > > additional build failure on that same line of code:
> > >
> > > ------------------------------------------------------------------------
> > > kernel/time/hrtimer.c:1229:3: error: incompatible pointer types assigning to 'volatile typeof (timer->base)' (aka 'struct hrtimer_clock_base *volatile') from 'bool (*)(struct hrtimer_clock_base *)' (aka '_Bool (*)(struct hrtimer_clock_base *)') [-Werror,-Wincompatible-pointer-types]
> > >
> > > 1229 | WRITE_ONCE(timer->base, &migration_base);
> > > ------------------------------------------------------------------------
> > >
> > > Me, I am a bit surprised that enqueue_hrtimer_offline() is even built
> > > in a CONFIG_SMP=n kernel.
> > >
> > > But there might be some reason why #ifdef-ing out that function's body
> > > would be a bad idea, so over to you! ;-)
> >
> > Curiosity overcame me, and the #ifdef makes things at least appear to
> > work for CONFIG_SMP=n kernels. Experimental patch below, which I intend
> > to use for further testing, but I have no plans to push it upstream.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 61a0b92b3a3cfef69e3848806e51d1b99a9e9406
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date: Fri Dec 20 15:10:26 2024 -0800
> >
> > EXP hrtimers: No-op enqueue_hrtimer_offline() if !HOTPLUG_CPU
> >
> > In kernels built with CONFIG_HOTPLUG_CPU=n, this experimental commit
> > uses #ifdef to remove the body of the enqueue_hrtimer_offline() function.
> > This might or might not be the correct fix to the build complaint about
> > migration_base being undeclared.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Vlad Poenaru <vlad.wing@gmail.com>
> > Cc: Usama Arif <usamaarif642@gmail.com>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> >
> > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> > index 48c0078d2c4f2..4235b7825b152 100644
> > --- a/kernel/time/hrtimer.c
> > +++ b/kernel/time/hrtimer.c
> > @@ -1213,6 +1213,7 @@ static void enqueue_hrtimer_offline(struct hrtimer *timer,
> > struct hrtimer_clock_base *base,
> > const enum hrtimer_mode mode)
> > {
> > +#ifdef CONFIG_HOTPLUG_CPU
> > struct hrtimer_cpu_base *new_cpu_base, *old_cpu_base, *this_cpu_base;
> > struct hrtimer_clock_base *new_base;
> > int cpu;
> > @@ -1238,6 +1239,7 @@ static void enqueue_hrtimer_offline(struct hrtimer *timer,
> >
> > if (enqueue_hrtimer(timer, new_base, mode))
> > smp_call_function_single_async(cpu, &new_cpu_base->csd);
> > +#endif
> > }
>
> I was thinking about something like that. Can I fold that and add your
> SOB?
Works for me!
Thanx, Paul
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-12-21 0:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18 16:50 [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 1/3] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING Frederic Weisbecker
2024-12-19 19:00 ` Usama Arif
2024-12-20 23:43 ` Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 2/3] rcu: Remove swake_up_one_online() bandaid Frederic Weisbecker
2024-12-18 16:50 ` [PATCH 3/3] Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" Frederic Weisbecker
2024-12-19 17:42 ` [PATCH 0/3] hrtimer: Fix timers queued locally from offline CPUs Paul E. McKenney
2024-12-20 23:19 ` Paul E. McKenney
2024-12-20 23:26 ` Frederic Weisbecker
2024-12-21 0:05 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).