* [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation
@ 2025-05-30 14:20 Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
` (6 more replies)
0 siblings, 7 replies; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
The timer migration mechanism allows active CPUs to pull timers from
idle ones to improve the overall idle time. This is however undesired
when CPU intensive workloads run on isolated cores, as the algorithm
would move the timers from housekeeping to isolated cores, negatively
affecting the isolation.
This effect was noticed on a 128-core machine running oslat on the
isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
and the CPU with the lowest number in a timer migration hierarchy (here 1
and 65) appears as always active and continuously pulls global timers
from the housekeeping CPUs. This ends up moving driver work (e.g.
delayed work) to isolated CPUs and causes latency spikes:
before the change:
# oslat -c 1-31,33-63,65-95,97-127 -D 62s
...
Maximum: 1203 10 3 4 ... 5 (us)
after the change:
# oslat -c 1-31,33-63,65-95,97-127 -D 62s
...
Maximum: 10 4 3 4 3 ... 5 (us)
Exclude isolated cores from the timer migration algorithm by extending
the concept of unavailable cores, currently used for offline ones, to
isolated ones:
* A core is unavailable if isolated or offline;
* A core is available if neither isolated nor offline;
A core is considered unavailable as isolated if it belongs to:
* the isolcpus (domain) list
* an isolated cpuset
Except if it is:
* in the nohz_full list (already idle for the hierarchy)
* the nohz timekeeper core (must be available to handle global timers)
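In other words, a condensed sketch of the rule (the wrapper function
below is only illustrative, while the helpers are the ones used later in
this series):

    static bool tmigr_cpu_should_be_available(unsigned int cpu)
    {
            /* Domain isolated: listed in isolcpus= or in an isolated cpuset */
            bool domain_isolated = !housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
                                   cpuset_cpu_is_isolated(cpu);
            /* nohz_full CPUs stay in the hierarchy, always seen as idle */
            bool nohz_full = !housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE);
            /* The nohz timekeeper must stay available for global timers */
            bool timekeeper = !tick_nohz_cpu_hotpluggable(cpu);

            return !domain_isolated || nohz_full || timekeeper;
    }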
Due to how the timer migration algorithm works, any CPU that is part of
the hierarchy can have its global timers pulled by remote CPUs and has
to pull remote timers itself; skipping only the pulling of remote timers
would break the logic.
For this reason, we prevent isolated CPUs from pulling remote global
timers, but also the other way around: any global timer started on an
isolated CPU will run there. This does not break the concept of
isolation (global timers don't come from outside the CPU) and, if
considered inappropriate, can usually be mitigated with other isolation
techniques (e.g. IRQ pinning).
The first 3 patches are preparatory work to change the concept of
online/offline to available/unavailable, keep track of the available
CPUs in a separate cpumask and rename a function in the cpuset code.
Patches 4 and 5 adapt isolation and cpuset to prevent domain isolated
and nohz_full CPUs from together covering all CPUs, leaving no
housekeeping one. Such a configuration would be a problem with the
changes introduced in this series because no CPU would remain to handle
global timers.
Patch 6 extends the unavailable status to domain isolated CPUs, which
is the main contribution of the series.
Changes since v5:
* Remove fallback if no housekeeping is left by isolcpus and nohz_full
* Adjust condition not to activate CPUs in the migration hierarchy
* Always force the nohz tick CPU active in the hierarchy
Changes since v4 [1]:
* use on_each_cpu_mask() with changes on isolated CPUs to avoid races
* keep nohz_full CPUs included in the timer migration hierarchy
* prevent domain isolated and nohz_full from covering all CPUs
Changes since v3:
* add parameter to function documentation
* split into multiple straightforward patches
Changes since v2:
* improve comments about handling CPUs isolated at boot
* minor cleanup
Changes since v1 [2]:
* split into smaller patches
* use available mask instead of unavailable
* simplification and cleanup
[1] - https://lore.kernel.org/lkml/20250506091534.42117-7-gmonaco@redhat.com
[2] - https://lore.kernel.org/lkml/20250410065446.57304-2-gmonaco@redhat.com
Gabriele Monaco (6):
timers: Rename tmigr 'online' bit to 'available'
timers: Add the available mask in timer migration
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to
update_exclusion_cpumasks()
sched/isolation: Force housekeeping if isolcpus and nohz_full don't
leave any
cgroup/cpuset: Fail if isolated and nohz_full don't leave any
housekeeping
timers: Exclude isolated cpus from timer migation
include/linux/timer.h | 9 +++
include/trace/events/timer_migration.h | 4 +-
kernel/cgroup/cpuset.c | 71 +++++++++++++++++++--
kernel/sched/isolation.c | 12 ++++
kernel/time/timer_migration.c | 88 ++++++++++++++++++++++----
kernel/time/timer_migration.h | 2 +-
6 files changed, 165 insertions(+), 21 deletions(-)
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
--
2.49.0
* [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available'
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
@ 2025-05-30 14:20 ` Gabriele Monaco
2025-06-20 17:04 ` Thomas Gleixner
2025-05-30 14:20 ` [PATCH v6 2/6] timers: Add the available mask in timer migration Gabriele Monaco
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
The timer migration hierarchy excludes offline CPUs via the
tmigr_is_not_available() function, which essentially checks the
online bit for the CPU.
Rename the online bit to available, along with all references in
function names and tracepoints, to generalise the concept of available
CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/trace/events/timer_migration.h | 4 ++--
kernel/time/timer_migration.c | 22 +++++++++++-----------
kernel/time/timer_migration.h | 2 +-
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/include/trace/events/timer_migration.h b/include/trace/events/timer_migration.h
index 47db5eaf2f9ab..61171b13c687c 100644
--- a/include/trace/events/timer_migration.h
+++ b/include/trace/events/timer_migration.h
@@ -173,14 +173,14 @@ DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_active,
TP_ARGS(tmc)
);
-DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_online,
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_available,
TP_PROTO(struct tmigr_cpu *tmc),
TP_ARGS(tmc)
);
-DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_offline,
+DEFINE_EVENT(tmigr_cpugroup, tmigr_cpu_unavailable,
TP_PROTO(struct tmigr_cpu *tmc),
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 2f6330831f084..7efd897c79599 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -427,7 +427,7 @@ static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
static inline bool tmigr_is_not_available(struct tmigr_cpu *tmc)
{
- return !(tmc->tmgroup && tmc->online);
+ return !(tmc->tmgroup && tmc->available);
}
/*
@@ -926,7 +926,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
* updated the event takes care when hierarchy is completely
* idle. Otherwise the migrator does it as the event is enqueued.
*/
- if (!tmc->online || tmc->remote || tmc->cpuevt.ignore ||
+ if (!tmc->available || tmc->remote || tmc->cpuevt.ignore ||
now < tmc->cpuevt.nextevt.expires) {
raw_spin_unlock_irq(&tmc->lock);
return;
@@ -973,7 +973,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
* (See also section "Required event and timerqueue update after a
* remote expiry" in the documentation at the top)
*/
- if (!tmc->online || !tmc->idle) {
+ if (!tmc->available || !tmc->idle) {
timer_unlock_remote_bases(cpu);
goto unlock;
}
@@ -1435,19 +1435,19 @@ static long tmigr_trigger_active(void *unused)
{
struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
- WARN_ON_ONCE(!tmc->online || tmc->idle);
+ WARN_ON_ONCE(!tmc->available || tmc->idle);
return 0;
}
-static int tmigr_cpu_offline(unsigned int cpu)
+static int tmigr_cpu_unavailable(unsigned int cpu)
{
struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
int migrator;
u64 firstexp;
raw_spin_lock_irq(&tmc->lock);
- tmc->online = false;
+ tmc->available = false;
WRITE_ONCE(tmc->wakeup, KTIME_MAX);
/*
@@ -1455,7 +1455,7 @@ static int tmigr_cpu_offline(unsigned int cpu)
* offline; Therefore nextevt value is set to KTIME_MAX
*/
firstexp = __tmigr_cpu_deactivate(tmc, KTIME_MAX);
- trace_tmigr_cpu_offline(tmc);
+ trace_tmigr_cpu_unavailable(tmc);
raw_spin_unlock_irq(&tmc->lock);
if (firstexp != KTIME_MAX) {
@@ -1466,7 +1466,7 @@ static int tmigr_cpu_offline(unsigned int cpu)
return 0;
}
-static int tmigr_cpu_online(unsigned int cpu)
+static int tmigr_cpu_available(unsigned int cpu)
{
struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
@@ -1475,11 +1475,11 @@ static int tmigr_cpu_online(unsigned int cpu)
return -EINVAL;
raw_spin_lock_irq(&tmc->lock);
- trace_tmigr_cpu_online(tmc);
+ trace_tmigr_cpu_available(tmc);
tmc->idle = timer_base_is_idle();
if (!tmc->idle)
__tmigr_cpu_activate(tmc);
- tmc->online = true;
+ tmc->available = true;
raw_spin_unlock_irq(&tmc->lock);
return 0;
}
@@ -1850,7 +1850,7 @@ static int __init tmigr_init(void)
goto err;
ret = cpuhp_setup_state(CPUHP_AP_TMIGR_ONLINE, "tmigr:online",
- tmigr_cpu_online, tmigr_cpu_offline);
+ tmigr_cpu_available, tmigr_cpu_unavailable);
if (ret)
goto err;
diff --git a/kernel/time/timer_migration.h b/kernel/time/timer_migration.h
index ae19f70f8170f..70879cde6fdd0 100644
--- a/kernel/time/timer_migration.h
+++ b/kernel/time/timer_migration.h
@@ -97,7 +97,7 @@ struct tmigr_group {
*/
struct tmigr_cpu {
raw_spinlock_t lock;
- bool online;
+ bool available;
bool idle;
bool remote;
struct tmigr_group *tmgroup;
--
2.49.0
* [PATCH v6 2/6] timers: Add the available mask in timer migration
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
@ 2025-05-30 14:20 ` Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 3/6] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_exclusion_cpumasks() Gabriele Monaco
` (4 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
Keep track of the CPUs available for timer migration in a cpumask. This
prepares the ground to generalise the concept of unavailable CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/time/timer_migration.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 7efd897c79599..25439f961ccf8 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -422,6 +422,9 @@ static unsigned int tmigr_crossnode_level __read_mostly;
static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
+/* CPUs available for timer migration */
+static cpumask_var_t tmigr_available_cpumask;
+
#define TMIGR_NONE 0xFF
#define BIT_CNT 8
@@ -1449,6 +1452,7 @@ static int tmigr_cpu_unavailable(unsigned int cpu)
raw_spin_lock_irq(&tmc->lock);
tmc->available = false;
WRITE_ONCE(tmc->wakeup, KTIME_MAX);
+ cpumask_clear_cpu(cpu, tmigr_available_cpumask);
/*
* CPU has to handle the local events on his own, when on the way to
@@ -1459,7 +1463,7 @@ static int tmigr_cpu_unavailable(unsigned int cpu)
raw_spin_unlock_irq(&tmc->lock);
if (firstexp != KTIME_MAX) {
- migrator = cpumask_any_but(cpu_online_mask, cpu);
+ migrator = cpumask_any(tmigr_available_cpumask);
work_on_cpu(migrator, tmigr_trigger_active, NULL);
}
@@ -1480,6 +1484,7 @@ static int tmigr_cpu_available(unsigned int cpu)
if (!tmc->idle)
__tmigr_cpu_activate(tmc);
tmc->available = true;
+ cpumask_set_cpu(cpu, tmigr_available_cpumask);
raw_spin_unlock_irq(&tmc->lock);
return 0;
}
@@ -1801,6 +1806,11 @@ static int __init tmigr_init(void)
if (ncpus == 1)
return 0;
+ if (!zalloc_cpumask_var(&tmigr_available_cpumask, GFP_KERNEL)) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
/*
* Calculate the required hierarchy levels. Unfortunately there is no
* reliable information available, unless all possible CPUs have been
--
2.49.0
* [PATCH v6 3/6] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_exclusion_cpumasks()
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 2/6] timers: Add the available mask in timer migration Gabriele Monaco
@ 2025-05-30 14:20 ` Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 4/6] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any Gabriele Monaco
` (3 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
update_unbound_workqueue_cpumask() updates the unbound workqueue
settings when there's a change in isolated CPUs, but it could serve
other subsystems that also require updates when isolated CPUs change.
Generalise the name to update_exclusion_cpumasks() to prepare for
calling other functions, unrelated to workqueues, from that spot.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/cgroup/cpuset.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 24b70ea3e6ce9..d0936b5d9920a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1325,7 +1325,7 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
return isolcpus_updated;
}
-static void update_unbound_workqueue_cpumask(bool isolcpus_updated)
+static void update_exclusion_cpumasks(bool isolcpus_updated)
{
int ret;
@@ -1456,7 +1456,7 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
list_add(&cs->remote_sibling, &remote_children);
cpumask_copy(cs->effective_xcpus, tmp->new_cpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_exclusion_cpumasks(isolcpus_updated);
cpuset_force_rebuild();
cs->prs_err = 0;
@@ -1497,7 +1497,7 @@ static void remote_partition_disable(struct cpuset *cs, struct tmpmasks *tmp)
compute_effective_exclusive_cpumask(cs, NULL, NULL);
reset_partition_data(cs);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_exclusion_cpumasks(isolcpus_updated);
cpuset_force_rebuild();
/*
@@ -1565,7 +1565,7 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
if (xcpus)
cpumask_copy(cs->exclusive_cpus, xcpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_exclusion_cpumasks(isolcpus_updated);
if (adding || deleting)
cpuset_force_rebuild();
@@ -1908,7 +1908,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
WARN_ON_ONCE(parent->nr_subparts < 0);
}
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_exclusion_cpumasks(isolcpus_updated);
if ((old_prs != new_prs) && (cmd == partcmd_update))
update_partition_exclusive_flag(cs, new_prs);
@@ -2933,7 +2933,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
else if (isolcpus_updated)
isolated_cpus_update(old_prs, new_prs, cs->effective_xcpus);
spin_unlock_irq(&callback_lock);
- update_unbound_workqueue_cpumask(isolcpus_updated);
+ update_exclusion_cpumasks(isolcpus_updated);
/* Force update if switching back to member & update effective_xcpus */
update_cpumasks_hier(cs, &tmpmask, !new_prs);
--
2.49.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v6 4/6] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
` (2 preceding siblings ...)
2025-05-30 14:20 ` [PATCH v6 3/6] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_exclusion_cpumasks() Gabriele Monaco
@ 2025-05-30 14:20 ` Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 5/6] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Gabriele Monaco
` (2 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
Currently the user can set up isolcpus and nohz_full in such a way that
leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated
nor nohz full). This can be a problem for other subsystems (e.g. the
timer wheel migration).
Prevent this configuration by invalidating the last setting in case the
union of isolcpus and nohz_full covers all CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/sched/isolation.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 81bc8b329ef17..b3b348427d2b8 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -165,6 +165,18 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
}
}
+ /* Check in combination with the previously set cpumask */
+ type = find_first_bit(&housekeeping.flags, HK_TYPE_MAX);
+ first_cpu = cpumask_first_and_and(cpu_present_mask,
+ housekeeping_staging,
+ housekeeping.cpumasks[type]);
+ if (first_cpu >= nr_cpu_ids || first_cpu >= setup_max_cpus) {
+ pr_warn("Housekeeping: must include one present CPU neither "
+ "in nohz_full= nor in isolcpus=, ignoring setting %s\n",
+ str);
+ goto free_housekeeping_staging;
+ }
+
iter_flags = flags & ~housekeeping.flags;
for_each_set_bit(type, &iter_flags, HK_TYPE_MAX)
--
2.49.0
* [PATCH v6 5/6] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
` (3 preceding siblings ...)
2025-05-30 14:20 ` [PATCH v6 4/6] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any Gabriele Monaco
@ 2025-05-30 14:20 ` Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
2025-06-18 12:17 ` [PATCH v6 0/6] " Gabriele Monaco
6 siblings, 0 replies; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
Currently the user can set up isolated cpus via cpuset and nohz_full in
such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
domain isolated nor nohz full). This can be a problem for other
subsystems (e.g. the timer wheel migration).
Prevent this configuration by blocking any assignment that would cause
the union of domain isolated CPUs and nohz_full to cover all CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/cgroup/cpuset.c | 56 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index d0936b5d9920a..e3494ed677f5c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1261,6 +1261,19 @@ static void isolated_cpus_update(int old_prs, int new_prs, struct cpumask *xcpus
cpumask_andnot(isolated_cpus, isolated_cpus, xcpus);
}
+/*
+ * isolated_cpus_should_update - Returns if the isolated_cpus mask needs update
+ * @prs: new or old partition_root_state
+ * @parent: parent cpuset
+ * Return: true if isolated_cpus needs modification, false otherwise
+ */
+static bool isolated_cpus_should_update(int prs, struct cpuset *parent)
+{
+ if (!parent)
+ parent = &top_cpuset;
+ return prs != parent->partition_root_state;
+}
+
/*
* partition_xcpus_add - Add new exclusive CPUs to partition
* @new_prs: new partition_root_state
@@ -1325,6 +1338,35 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
return isolcpus_updated;
}
+/*
+ * isolcpus_nohz_conflict - check for isolated & nohz_full conflicts
+ * @new_cpus: cpu mask for cpus that are going to be isolated
+ * Return: true if there is conflict, false otherwise
+ *
+ * If nohz_full is enabled and we have isolated CPUs, their combination must
+ * still leave housekeeping CPUs.
+ */
+static bool isolcpus_nohz_conflict(struct cpumask *new_cpus)
+{
+ cpumask_var_t full_hk_cpus;
+ int res = false;
+
+ if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
+ return false;
+
+ if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
+ return true;
+
+ cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+ cpumask_and(full_hk_cpus, full_hk_cpus, cpu_online_mask);
+ if (!cpumask_weight_andnot(full_hk_cpus, new_cpus))
+ res = true;
+
+ free_cpumask_var(full_hk_cpus);
+ return res;
+}
+
static void update_exclusion_cpumasks(bool isolcpus_updated)
{
int ret;
@@ -1450,6 +1492,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
cpumask_intersects(tmp->new_cpus, subpartitions_cpus) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
return PERR_INVCPUS;
+ if (isolated_cpus_should_update(new_prs, NULL) &&
+ isolcpus_nohz_conflict(tmp->new_cpus))
+ return PERR_HKEEPING;
spin_lock_irq(&callback_lock);
isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
@@ -1548,6 +1593,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
cs->prs_err = PERR_NOCPUS;
+ else if (isolated_cpus_should_update(prs, NULL) &&
+ isolcpus_nohz_conflict(tmp->addmask))
+ cs->prs_err = PERR_HKEEPING;
if (cs->prs_err)
goto invalidate;
}
@@ -1879,6 +1927,12 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
return err;
}
+ if (deleting && isolated_cpus_should_update(new_prs, parent) &&
+ isolcpus_nohz_conflict(tmp->delmask)) {
+ cs->prs_err = PERR_HKEEPING;
+ return PERR_HKEEPING;
+ }
+
/*
* Change the parent's effective_cpus & effective_xcpus (top cpuset
* only).
@@ -2899,6 +2953,8 @@ static int update_prstate(struct cpuset *cs, int new_prs)
* Need to update isolated_cpus.
*/
isolcpus_updated = true;
+ if (isolcpus_nohz_conflict(cs->effective_xcpus))
+ err = PERR_HKEEPING;
} else {
/*
* Switching back to member is always allowed even if it
--
2.49.0
* [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
` (4 preceding siblings ...)
2025-05-30 14:20 ` [PATCH v6 5/6] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Gabriele Monaco
@ 2025-05-30 14:20 ` Gabriele Monaco
2025-06-20 17:00 ` Thomas Gleixner
2025-06-18 12:17 ` [PATCH v6 0/6] " Gabriele Monaco
6 siblings, 1 reply; 19+ messages in thread
From: Gabriele Monaco @ 2025-05-30 14:20 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
Cc: Gabriele Monaco
The timer migration mechanism allows active CPUs to pull timers from
idle ones to improve the overall idle time. This is however undesired
when CPU intensive workloads run on isolated cores, as the algorithm
would move the timers from housekeeping to isolated cores, negatively
affecting the isolation.
This effect was noticed on a 128-core machine running oslat on the
isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
and the CPU with the lowest number in a timer migration hierarchy (here 1
and 65) appears as always active and continuously pulls global timers
from the housekeeping CPUs. This ends up moving driver work (e.g.
delayed work) to isolated CPUs and causes latency spikes:
before the change:
# oslat -c 1-31,33-63,65-95,97-127 -D 62s
...
Maximum: 1203 10 3 4 ... 5 (us)
after the change:
# oslat -c 1-31,33-63,65-95,97-127 -D 62s
...
Maximum: 10 4 3 4 3 ... 5 (us)
Exclude isolated cores from the timer migration algorithm by extending
the concept of unavailable cores, currently used for offline ones, to
isolated ones:
* A core is unavailable if isolated or offline;
* A core is available if neither isolated nor offline;
A core is considered unavailable as isolated if it belongs to:
* the isolcpus (domain) list
* an isolated cpuset
Except if it is:
* in the nohz_full list (already idle for the hierarchy)
* the nohz timekeeper core (must be available to handle global timers)
Due to how the timer migration algorithm works, any CPU that is part of
the hierarchy can have its global timers pulled by remote CPUs and has
to pull remote timers itself; skipping only the pulling of remote timers
would break the logic.
For this reason, we prevent isolated CPUs from pulling remote global
timers, but also the other way around: any global timer started on an
isolated CPU will run there. This does not break the concept of
isolation (global timers don't come from outside the CPU) and, if
considered inappropriate, can usually be mitigated with other isolation
techniques (e.g. IRQ pinning).
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/linux/timer.h | 9 ++++++
kernel/cgroup/cpuset.c | 3 ++
kernel/time/timer_migration.c | 54 +++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+)
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 10596d7c3a346..a8b683d9ce25d 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -190,4 +190,13 @@ int timers_dead_cpu(unsigned int cpu);
#define timers_dead_cpu NULL
#endif
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask);
+#else
+static inline int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
+{
+ return 0;
+}
+#endif
+
#endif
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e3494ed677f5c..7b26226f2a276 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1378,6 +1378,9 @@ static void update_exclusion_cpumasks(bool isolcpus_updated)
ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
WARN_ON_ONCE(ret < 0);
+
+ ret = tmigr_isolated_exclude_cpumask(isolated_cpus);
+ WARN_ON_ONCE(ret < 0);
}
/**
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 25439f961ccf8..a14fbca7457fe 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -10,6 +10,7 @@
#include <linux/spinlock.h>
#include <linux/timerqueue.h>
#include <trace/events/ipi.h>
+#include <linux/sched/isolation.h>
#include "timer_migration.h"
#include "tick-internal.h"
@@ -1449,6 +1450,13 @@ static int tmigr_cpu_unavailable(unsigned int cpu)
int migrator;
u64 firstexp;
+ /*
+ * The tick CPU can be marked as isolated by the cpuset code, however
+ * we cannot mark it as unavailable to avoid having no global migrator
+ * for the nohz_full CPUs.
+ */
+ if (!tick_nohz_cpu_hotpluggable(cpu))
+ return 0;
raw_spin_lock_irq(&tmc->lock);
tmc->available = false;
WRITE_ONCE(tmc->wakeup, KTIME_MAX);
@@ -1478,6 +1486,20 @@ static int tmigr_cpu_available(unsigned int cpu)
if (WARN_ON_ONCE(!tmc->tmgroup))
return -EINVAL;
+ /*
+ * Domain isolated CPUs don't participate in timer migration, nohz_full
+ * CPUs are still part of the hierarchy but are always considered idle.
+ * Additionally, the tick CPU during nohz_full cannot be disabled.
+ * Checking here guarantees that CPUs isolated at boot (e.g. isolcpus)
+ * are not marked as available when they first become online.
+ * During runtime, any offline isolated CPU is also not incorrectly
+ * marked as available once it gets back online.
+ */
+ if ((!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
+ cpuset_cpu_is_isolated(cpu)) &&
+ housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) &&
+ tick_nohz_cpu_hotpluggable(cpu))
+ return 0;
raw_spin_lock_irq(&tmc->lock);
trace_tmigr_cpu_available(tmc);
tmc->idle = timer_base_is_idle();
@@ -1489,6 +1511,38 @@ static int tmigr_cpu_available(unsigned int cpu)
return 0;
}
+static void tmigr_remote_cpu_unavailable(void *ignored)
+{
+ tmigr_cpu_unavailable(smp_processor_id());
+}
+
+static void tmigr_remote_cpu_available(void *ignored)
+{
+ tmigr_cpu_available(smp_processor_id());
+}
+
+int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
+{
+ cpumask_var_t cpumask;
+ int ret = 0;
+
+ lockdep_assert_cpus_held();
+
+ if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
+ return -ENOMEM;
+
+ cpumask_and(cpumask, exclude_cpumask, tmigr_available_cpumask);
+ cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
+ on_each_cpu_mask(cpumask, tmigr_remote_cpu_unavailable, NULL, 0);
+
+ cpumask_andnot(cpumask, cpu_online_mask, exclude_cpumask);
+ cpumask_andnot(cpumask, cpumask, tmigr_available_cpumask);
+ on_each_cpu_mask(cpumask, tmigr_remote_cpu_available, NULL, 0);
+
+ free_cpumask_var(cpumask);
+ return ret;
+}
+
static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
int node)
{
--
2.49.0
* Re: [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
` (5 preceding siblings ...)
2025-05-30 14:20 ` [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
@ 2025-06-18 12:17 ` Gabriele Monaco
2025-06-18 12:22 ` Frederic Weisbecker
6 siblings, 1 reply; 19+ messages in thread
From: Gabriele Monaco @ 2025-06-18 12:17 UTC (permalink / raw)
To: linux-kernel, Anna-Maria Behnsen, Frederic Weisbecker,
Thomas Gleixner, Waiman Long
On Fri, 2025-05-30 at 16:20 +0200, Gabriele Monaco wrote:
> The timer migration mechanism allows active CPUs to pull timers from
> idle ones to improve the overall idle time. This is however undesired
> when CPU intensive workloads run on isolated cores, as the algorithm
> would move the timers from housekeeping to isolated cores, negatively
> affecting the isolation.
>
> [...]
>
> Exclude isolated cores from the timer migration algorithm, extend the
> concept of unavailable cores, currently used for offline ones, to
> isolated ones:
> * A core is unavailable if isolated or offline;
> * A core is available if neither isolated nor offline;
>
> A core is considered unavailable as isolated if it belongs to:
> * the isolcpus (domain) list
> * an isolated cpuset
> Except if it is:
> * in the nohz_full list (already idle for the hierarchy)
> * the nohz timekeeper core (must be available to handle global
> timers)
Frederic, Thomas, Waiman, would you have time to review this series?
Thanks,
Gabriele
>
> Due to how the timer migration algorithm works, any CPU part of the
> hierarchy can have their global timers pulled by remote CPUs and have
> to
> pull remote timers, only skipping pulling remote timers would break
> the
> logic.
> For this reason, we prevent isolated CPUs from pulling remote global
> timers, but also the other way around: any global timer started on an
> isolated CPU will run there. This does not break the concept of
> isolation (global timers don't come from outside the CPU) and, if
> considered inappropriate, can usually be mitigated with other
> isolation
> techniques (e.g. IRQ pinning).
>
> The first 3 patches are preparatory work to change the concept of
> online/offline to available/unavailable, keep track of those in a
> separate cpumask and change a function name in cpuset code.
>
> Patch 4 and 5 adapt isolation and cpuset to prevent domain isolated
> and
> nohz_full from covering all CPUs not leaving any housekeeping one.
> This
> can lead to problems with the changes introduced in this series
> because
> no CPU would remain to handle global timers.
>
> Patch 6 extends the unavailable status to domain isolated CPUs, which
> is the main contribution of the series.
>
> Changes since v5:
> * Remove fallback if no housekeeping is left by isolcpus and
> nohz_full
> * Adjust condition not to activate CPUs in the migration hierarchy
> * Always force the nohz tick CPU active in the hierarchy
>
> Changes since v4 [1]:
> * use on_each_cpu_mask() with changes on isolated CPUs to avoid races
> * keep nohz_full CPUs included in the timer migration hierarchy
> * prevent domain isolated and nohz_full to cover all CPUs
>
> Changes since v3:
> * add parameter to function documentation
> * split into multiple straightforward patches
>
> Changes since v2:
> * improve comments about handling CPUs isolated at boot
> * minor cleanup
>
> Changes since v1 [2]:
> * split into smaller patches
> * use available mask instead of unavailable
> * simplification and cleanup
>
> [1] -
> https://lore.kernel.org/lkml/20250506091534.42117-7-gmonaco@redhat.com
> [2] -
> https://lore.kernel.org/lkml/20250410065446.57304-2-gmonaco@redhat.com
>
> Gabriele Monaco (6):
> timers: Rename tmigr 'online' bit to 'available'
> timers: Add the available mask in timer migration
> cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to
> update_exclusion_cpumasks()
> sched/isolation: Force housekeeping if isolcpus and nohz_full don't
> leave any
> cgroup/cpuset: Fail if isolated and nohz_full don't leave any
> housekeeping
> timers: Exclude isolated cpus from timer migation
>
> include/linux/timer.h | 9 +++
> include/trace/events/timer_migration.h | 4 +-
> kernel/cgroup/cpuset.c | 71 +++++++++++++++++++--
> kernel/sched/isolation.c | 12 ++++
> kernel/time/timer_migration.c | 88 ++++++++++++++++++++++--
> --
> kernel/time/timer_migration.h | 2 +-
> 6 files changed, 165 insertions(+), 21 deletions(-)
>
>
> base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
* Re: [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation
2025-06-18 12:17 ` [PATCH v6 0/6] " Gabriele Monaco
@ 2025-06-18 12:22 ` Frederic Weisbecker
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2025-06-18 12:22 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Anna-Maria Behnsen, Thomas Gleixner, Waiman Long
Hi Gabriele,
On Wed, Jun 18, 2025 at 02:17:29PM +0200, Gabriele Monaco wrote:
> On Fri, 2025-05-30 at 16:20 +0200, Gabriele Monaco wrote:
> > The timer migration mechanism allows active CPUs to pull timers from
> > idle ones to improve the overall idle time. This is however undesired
> > when CPU intensive workloads run on isolated cores, as the algorithm
> > would move the timers from housekeeping to isolated cores, negatively
> > affecting the isolation.
> >
> > [...]
> >
> > Exclude isolated cores from the timer migration algorithm, extend the
> > concept of unavailable cores, currently used for offline ones, to
> > isolated ones:
> > * A core is unavailable if isolated or offline;
> > * A core is available if neither isolated nor offline;
> >
> > A core is considered unavailable as isolated if it belongs to:
> > * the isolcpus (domain) list
> > * an isolated cpuset
> > Except if it is:
> > * in the nohz_full list (already idle for the hierarchy)
> > * the nohz timekeeper core (must be available to handle global
> > timers)
>
> Frederic, Thomas, Waiman, would you have time to review this series?
> Thanks,
> Gabriele
Yes, sorry I got distracted with other things (although quite related).
I will give it a priority soonish!
Thanks.
--
Frederic Weisbecker
SUSE Labs
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-05-30 14:20 ` [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
@ 2025-06-20 17:00 ` Thomas Gleixner
2025-06-24 8:05 ` Gabriele Monaco
0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2025-06-20 17:00 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Anna-Maria Behnsen,
Frederic Weisbecker, Waiman Long
Cc: Gabriele Monaco
On Fri, May 30 2025 at 16:20, Gabriele Monaco wrote:
timer migation?
> The timer migration mechanism allows active CPUs to pull timers from
> idle ones to improve the overall idle time. This is however undesired
> when CPU intensive workloads run on isolated cores, as the algorithm
> would move the timers from housekeeping to isolated cores, negatively
> affecting the isolation.
>
> This effect was noticed on a 128 cores machine running oslat on the
> isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
> and the CPU with lowest count in a timer migration hierarchy (here 1
> and 65) appears as always active and continuously pulls global timers,
> from the housekeeping CPUs. This ends up moving driver work (e.g.
> delayed work) to isolated CPUs and causes latency spikes:
>
> before the change:
>
> # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> ...
> Maximum: 1203 10 3 4 ... 5 (us)
>
> after the change:
>
> # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> ...
> Maximum: 10 4 3 4 3 ... 5 (us)
It's nice to add numbers, but you should put them at the end _after_ you
described the change. At this point 'after the change' makes no sense as
the reader does not read backwards or jump around randomly in the text.
> Exclude isolated cores from the timer migration algorithm, extend the
> concept of unavailable cores, currently used for offline ones, to
> isolated ones:
> * A core is unavailable if isolated or offline;
> * A core is available if neither isolated nor offline;
>
> A core is considered unavailable as isolated if it belongs to:
> * the isolcpus (domain) list
> * an isolated cpuset
> Except if it is:
> * in the nohz_full list (already idle for the hierarchy)
> * the nohz timekeeper core (must be available to handle global timers)
>
> Due to how the timer migration algorithm works, any CPU part of the
> hierarchy can have their global timers pulled by remote CPUs and have to
> pull remote timers, only skipping pulling remote timers would break the
> logic.
> For this reason, we prevent isolated CPUs from pulling remote global
s/we//
> timers, but also the other way around: any global timer started on an
> isolated CPU will run there. This does not break the concept of
> isolation (global timers don't come from outside the CPU) and, if
> considered inappropriate, can usually be mitigated with other isolation
> techniques (e.g. IRQ pinning).
> +#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
> +extern int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask);
s/cpumask_var_t/struct cpumask */
cpumask_var_t is only for declaration purposes to avoid ifdeffery, but a
function always gets a pointer to the cpumask itself.
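i.e. a sketch of what the declaration would then look like, assuming
only the type change asked for above:

    extern int tmigr_isolated_exclude_cpumask(struct cpumask *exclude_cpumask);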
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> index 25439f961ccf8..a14fbca7457fe 100644
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -10,6 +10,7 @@
> #include <linux/spinlock.h>
> #include <linux/timerqueue.h>
> #include <trace/events/ipi.h>
> +#include <linux/sched/isolation.h>
>
> #include "timer_migration.h"
> #include "tick-internal.h"
> @@ -1449,6 +1450,13 @@ static int tmigr_cpu_unavailable(unsigned int cpu)
> int migrator;
> u64 firstexp;
>
> + /*
> + * The tick CPU can be marked as isolated by the cpuset code, however
> + * we cannot mark it as unavailable to avoid having no global migrator
> + * for the nohz_full CPUs.
> + */
> + if (!tick_nohz_cpu_hotpluggable(cpu))
> + return 0;
This is really horribly confusing. This function is also invoked from
the CPU offline hotplug callback and in context of CPU hotplug this
check makes absolutely no sense at all.
Please do this check in the smp function call which handles this
isolation muck.
> raw_spin_lock_irq(&tmc->lock);
> tmc->available = false;
This clearly lacks a check whether available is true or not. If it's
false, it must return right here.
To avoid goto or whatever ugliness, please convert that locking to a scoped
guard in a separate patch. Then you can simply do
if (!tmc->available)
return 0;
inside the lock scope and the compiler will mop it up for you.
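A minimal sketch of that conversion, assuming the scoped_guard()
infrastructure from <linux/cleanup.h>/<linux/spinlock.h> and reusing the
body of the function as it stands after patch 2 (this is illustrative,
not the actual patch):

    static int tmigr_clear_cpu_available(unsigned int cpu)
    {
            struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
            int migrator;
            u64 firstexp;

            scoped_guard(raw_spinlock_irq, &tmc->lock) {
                    /* Nothing to do if the CPU was never made available */
                    if (!tmc->available)
                            return 0;
                    tmc->available = false;
                    WRITE_ONCE(tmc->wakeup, KTIME_MAX);
                    cpumask_clear_cpu(cpu, tmigr_available_cpumask);
                    firstexp = __tmigr_cpu_deactivate(tmc, KTIME_MAX);
                    trace_tmigr_cpu_unavailable(tmc);
            }

            /* Hand over the migrator duty to a still available CPU */
            if (firstexp != KTIME_MAX) {
                    migrator = cpumask_any(tmigr_available_cpumask);
                    work_on_cpu(migrator, tmigr_trigger_active, NULL);
            }
            return 0;
    }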
> WRITE_ONCE(tmc->wakeup, KTIME_MAX);
> @@ -1478,6 +1486,20 @@ static int tmigr_cpu_available(unsigned int cpu)
> if (WARN_ON_ONCE(!tmc->tmgroup))
> return -EINVAL;
>
> + /*
> + * Domain isolated CPUs don't participate in timer migration, nohz_full
> + * CPUs are still part of the hierarchy but are always considered idle.
> + * Additionally, the tick CPU during nohz_full cannot be disabled.
> + * Checking here guarantees that CPUs isolated at boot (e.g. isolcpus)
> + * are not marked as available when they first become online.
> + * During runtime, any offline isolated CPU is also not incorrectly
> + * marked as available once it gets back online.
> + */
> + if ((!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
> + cpuset_cpu_is_isolated(cpu)) &&
> + housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) &&
> + tick_nohz_cpu_hotpluggable(cpu))
> + return 0;
Same nonsense as above.
> raw_spin_lock_irq(&tmc->lock);
Same missing check for available == true
> trace_tmigr_cpu_available(tmc);
> tmc->idle = timer_base_is_idle();
> @@ -1489,6 +1511,38 @@ static int tmigr_cpu_available(unsigned int cpu)
> return 0;
> }
>
> +static void tmigr_remote_cpu_unavailable(void *ignored)
This is an SMP function call so what's the _remote_ for? It runs on the
CPU on which the call is scheduled. Please make it entirely clear what
this is about. This is about isolation and not about availability. The fact
that it sets/clears the availability is just an implementation detail.
static void tmigr_cpu_isolate(void *ignored)
{
unsigned int cpu = smp_processor_id();
/* Big fat comment */
if (!tick_nohz_cpu_hotpluggable(cpu))
return;
tmigr_clear_cpu_available(cpu);
}
> +{
> + tmigr_cpu_unavailable(smp_processor_id());
> +}
> +
> +static void tmigr_remote_cpu_available(void *ignored)
> +{
> + tmigr_cpu_available(smp_processor_id());
> +}
> +
> +int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
> +{
> + cpumask_var_t cpumask;
> + int ret = 0;
> +
> + lockdep_assert_cpus_held();
> +
> + if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + cpumask_and(cpumask, exclude_cpumask, tmigr_available_cpumask);
> + cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
> + on_each_cpu_mask(cpumask, tmigr_remote_cpu_unavailable, NULL, 0);
Why are those function calls async?
Shouldn't it be guaranteed that the change has been committed once this
function returns? If not then this wants a comment why it does not matter.
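If the intent is indeed to wait for completion, the change would be as
small as flipping the last argument of on_each_cpu_mask(); a sketch,
assuming synchronous behaviour is what is wanted:

    /* wait == true: return only after the function ran on all CPUs in the mask */
    on_each_cpu_mask(cpumask, tmigr_remote_cpu_unavailable, NULL, true);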
Thanks,
tglx
* Re: [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available'
2025-05-30 14:20 ` [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
@ 2025-06-20 17:04 ` Thomas Gleixner
0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2025-06-20 17:04 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Anna-Maria Behnsen,
Frederic Weisbecker, Waiman Long
Cc: Gabriele Monaco
On Fri, May 30 2025 at 16:20, Gabriele Monaco wrote:
>
> -static int tmigr_cpu_offline(unsigned int cpu)
> +static int tmigr_cpu_unavailable(unsigned int cpu)
tmigr_clear_cpu_available() and tmigr_set_cpu_available() are way more
clear function names because they explicitly state the action.
cpu_offline/online are not entirely clear either but at least
offline/online can be used as verbs, while [un]available definitely
cannot.
Thanks,
tglx
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-20 17:00 ` Thomas Gleixner
@ 2025-06-24 8:05 ` Gabriele Monaco
2025-06-24 13:20 ` Thomas Gleixner
0 siblings, 1 reply; 19+ messages in thread
From: Gabriele Monaco @ 2025-06-24 8:05 UTC (permalink / raw)
To: Thomas Gleixner, linux-kernel, Anna-Maria Behnsen,
Frederic Weisbecker, Waiman Long
On Fri, 2025-06-20 at 19:00 +0200, Thomas Gleixner wrote:
> On Fri, May 30 2025 at 16:20, Gabriele Monaco wrote:
>
> timer migation?
>
> > The timer migration mechanism allows active CPUs to pull timers
> > from
> > idle ones to improve the overall idle time. This is however
> > undesired
> > when CPU intensive workloads run on isolated cores, as the
> > algorithm
> > would move the timers from housekeeping to isolated cores,
> > negatively
> > affecting the isolation.
> >
> > This effect was noticed on a 128 cores machine running oslat on the
> > isolated cores (1-31,33-63,65-95,97-127). The tool monopolises
> > CPUs,
> > and the CPU with lowest count in a timer migration hierarchy (here
> > 1
> > and 65) appears as always active and continuously pulls global
> > timers,
> > from the housekeeping CPUs. This ends up moving driver work (e.g.
> > delayed work) to isolated CPUs and causes latency spikes:
> >
> > before the change:
> >
> > # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > ...
> > Maximum: 1203 10 3 4 ... 5 (us)
> >
> > after the change:
> >
> > # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > ...
> > Maximum: 10 4 3 4 3 ... 5 (us)
>
> It's nice to add numbers, but you should out them at the end _after_
> you
> described the change. At this point 'after the change' makes no sense
> as
> the reader does not read backwards or jumps around randomly in the
> text.
>
> > Exclude isolated cores from the timer migration algorithm, extend
> > the
> > concept of unavailable cores, currently used for offline ones, to
> > isolated ones:
> > * A core is unavailable if isolated or offline;
> > * A core is available if neither isolated nor offline;
> >
> > A core is considered unavailable as isolated if it belongs to:
> > * the isolcpus (domain) list
> > * an isolated cpuset
> > Except if it is:
> > * in the nohz_full list (already idle for the hierarchy)
> > * the nohz timekeeper core (must be available to handle global
> > timers)
> >
> > Due to how the timer migration algorithm works, any CPU part of the
> > hierarchy can have their global timers pulled by remote CPUs and
> > have to
> > pull remote timers, only skipping pulling remote timers would break
> > the
> > logic.
> > For this reason, we prevent isolated CPUs from pulling remote
> > global
>
> s/we//
>
> > timers, but also the other way around: any global timer started on
> > an
> > isolated CPU will run there. This does not break the concept of
> > isolation (global timers don't come from outside the CPU) and, if
> > considered inappropriate, can usually be mitigated with other
> > isolation
> > techniques (e.g. IRQ pinning).
>
> > +#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
> > +extern int tmigr_isolated_exclude_cpumask(cpumask_var_t
> > exclude_cpumask);
>
> s/cpumask_var_t/struct cpumask */
>
> cpumask_var_t is only for declaration purposes to avoid ifdeffery,
> but a
> function always gets a pointer to the cpumask itself.
>
> > diff --git a/kernel/time/timer_migration.c
> > b/kernel/time/timer_migration.c
> > index 25439f961ccf8..a14fbca7457fe 100644
> > --- a/kernel/time/timer_migration.c
> > +++ b/kernel/time/timer_migration.c
> > @@ -10,6 +10,7 @@
> > #include <linux/spinlock.h>
> > #include <linux/timerqueue.h>
> > #include <trace/events/ipi.h>
> > +#include <linux/sched/isolation.h>
> >
> > #include "timer_migration.h"
> > #include "tick-internal.h"
> > @@ -1449,6 +1450,13 @@ static int tmigr_cpu_unavailable(unsigned
> > int cpu)
> > int migrator;
> > u64 firstexp;
> >
> > + /*
> > + * The tick CPU can be marked as isolated by the cpuset
> > code, however
> > + * we cannot mark it as unavailable to avoid having no
> > global migrator
> > + * for the nohz_full CPUs.
> > + */
> > + if (!tick_nohz_cpu_hotpluggable(cpu))
> > + return 0;
>
> This is really horribly confusing. This function is also invoked from
> the CPU offline hotplug callback and in context of CPU hotplug this
> check makes absolutely no sense at all.
>
> Please do this check in the smp function call which handles this
> isolation muck.
>
> > raw_spin_lock_irq(&tmc->lock);
> > tmc->available = false;
>
> This clearly lacks a check whether available is true or not. If it's
> false, it must return right here.
>
> To avoid goto or whatever ugly please convert that locking to a
> scoped
> guard in a seperate patch. Then you can simply do
>
> if (!tmc->available)
> return 0;
>
> inside the lock scope and the compiler will mop it up for you.
>
> > WRITE_ONCE(tmc->wakeup, KTIME_MAX);
> > @@ -1478,6 +1486,20 @@ static int tmigr_cpu_available(unsigned int
> > cpu)
> > if (WARN_ON_ONCE(!tmc->tmgroup))
> > return -EINVAL;
> >
> > + /*
> > + * Domain isolated CPUs don't participate in timer
> > migration, nohz_full
> > + * CPUs are still part of the hierarchy but are always
> > considered idle.
> > + * Additionally, the tick CPU during nohz_full cannot be
> > disabled.
> > + * Checking here guarantees that CPUs isolated at boot
> > (e.g. isolcpus)
> > + * are not marked as available when they first become
> > online.
> > + * During runtime, any offline isolated CPU is also not
> > incorrectly
> > + * marked as available once it gets back online.
> > + */
> > + if ((!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
> > + cpuset_cpu_is_isolated(cpu)) &&
> > + housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) &&
> > + tick_nohz_cpu_hotpluggable(cpu))
> > + return 0;
>
> Same nonsense as above.
>
Thanks for the review, I agree with moving the check above, but I
believe this should remain here.
tmigr_cpu_available() is called at boot time and also applies the
boot-time isolation parameters (tmigr_isolated_exclude_cpumask() is only
used by the cpuset code).
Now let's assume a machine booted with the arguments isolcpus=0-3
nohz_full=5-7.
Without checking for tick_nohz_cpu_hotpluggable() here, we would not
set the tick CPU (0) as available at boot, ending up in the unwanted
corner cases discussed in v5 of the series.
I could remove this call here (which is mostly redundant after boot)
and explicitly enable the tick CPU in another way, but this still seems
cleaner to me.
Does it make sense to you? Is the comment in the code unclear?
Thanks again,
Gabriele
> > raw_spin_lock_irq(&tmc->lock);
>
> Same missing check for available == true
>
> > trace_tmigr_cpu_available(tmc);
> > tmc->idle = timer_base_is_idle();
> > @@ -1489,6 +1511,38 @@ static int tmigr_cpu_available(unsigned int
> > cpu)
> > return 0;
> > }
> >
> > +static void tmigr_remote_cpu_unavailable(void *ignored)
>
> This is a SMP function call so what's the _remote_ for? It runs on
> the
> CPU on which the call is scheduled. Please make it entirely clear
> what
> this is about. This is about isolation and not about available. The
> fact
> that it sets/clears the availability is just an implementation
> detail.
>
> static void timgr_cpu_isolate(void *ignored)
> {
> unsigned int cpu = smp_processor_id();
>
> /* Big fat comment */
> if (!tick_nohz_cpu_hotpluggable(cpu))
> return;
>
> tmigr_set_cpu_available(cpu);
> }
>
> > +{
> > + tmigr_cpu_unavailable(smp_processor_id());
> > +}
> > +
> > +static void tmigr_remote_cpu_available(void *ignored)
> > +{
> > + tmigr_cpu_available(smp_processor_id());
> > +}
> > +
> > +int tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
> > +{
> > + cpumask_var_t cpumask;
> > + int ret = 0;
> > +
> > + lockdep_assert_cpus_held();
> > +
> > + if (!alloc_cpumask_var(&cpumask, GFP_KERNEL))
> > + return -ENOMEM;
> > +
> > + cpumask_and(cpumask, exclude_cpumask,
> > tmigr_available_cpumask);
> > + cpumask_and(cpumask, cpumask,
> > housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
> > + on_each_cpu_mask(cpumask, tmigr_remote_cpu_unavailable,
> > NULL, 0);
>
> Why are those function calls async?
>
> Shouldn't it be guaranteed that the change has been committed once
> this
> function returns? If not then this wants a comment why it does not
> matter.
>
> Thanks,
>
> tglx
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-24 8:05 ` Gabriele Monaco
@ 2025-06-24 13:20 ` Thomas Gleixner
2025-06-24 14:06 ` Gabriele Monaco
0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2025-06-24 13:20 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Anna-Maria Behnsen,
Frederic Weisbecker, Waiman Long
On Tue, Jun 24 2025 at 10:05, Gabriele Monaco wrote:
Please trim your replies. It's a pain to scroll through 160 quoted lines
to find the gist of the mail...
> On Fri, 2025-06-20 at 19:00 +0200, Thomas Gleixner wrote:
>> > + if ((!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
>> > + cpuset_cpu_is_isolated(cpu)) &&
>> > + housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) &&
>> > + tick_nohz_cpu_hotpluggable(cpu))
>> > + return 0;
>>
>> Same nonsense as above.
>>
> tmigr_cpu_available is called at boot time and is applying also the
> boot time isolation parameters (tmigr_isolated_exclude_cpumask is only
> used by the cpuset code).
>
> Now let's assume a machine booted with the arguments isolcpus=0-3
> nohz_full=5-7.
>
> Without checking for tick_nohz_cpu_hotpluggable() here, we would not
> set the tick cpu (0) as available at boot, ending up in the unwanted
> corner cases discussed in the v5 of the series.
>
> I could remove this call here (which is mostly redundant after boot)
> and enable explicitly the tick cpu in another way, but this still seems
> cleaner to me.
>
> Does it make sense to you? Is the comment in the code unclear?
It does not make sense and the comment does not change that.
The point is that tmigr_init() is an early initcall which is invoked
before SMP is initialized and APs are brought up.
At this point CPU0 can neither be isolated nor nohz full for obvious
reasons, no?
Thanks,
tglx
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-24 13:20 ` Thomas Gleixner
@ 2025-06-24 14:06 ` Gabriele Monaco
2025-06-24 14:52 ` Frederic Weisbecker
0 siblings, 1 reply; 19+ messages in thread
From: Gabriele Monaco @ 2025-06-24 14:06 UTC (permalink / raw)
To: Thomas Gleixner, linux-kernel, Anna-Maria Behnsen,
Frederic Weisbecker, Waiman Long
On Tue, 2025-06-24 at 15:20 +0200, Thomas Gleixner wrote:
> On Tue, Jun 24 2025 at 10:05, Gabriele Monaco wrote:
>
> Please trim your replies. It's a pain to scroll through 160 quotes
> lines
> to find the gist of the mail...
>
> > On Fri, 2025-06-20 at 19:00 +0200, Thomas Gleixner wrote:
> > > > + if ((!housekeeping_cpu(cpu, HK_TYPE_DOMAIN) ||
> > > > + cpuset_cpu_is_isolated(cpu)) &&
> > > > + housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE) &&
> > > > + tick_nohz_cpu_hotpluggable(cpu))
> > > > + return 0;
> > >
> > > Same nonsense as above.
> > >
> > tmigr_cpu_available is called at boot time and is applying also the
> > boot time isolation parameters (tmigr_isolated_exclude_cpumask is
> > only
> > used by the cpuset code).
> >
> > Now let's assume a machine booted with the arguments isolcpus=0-3
> > nohz_full=5-7.
> >
> > Without checking for tick_nohz_cpu_hotpluggable() here, we would
> > not
> > set the tick cpu (0) as available at boot, ending up in the
> > unwanted
> > corner cases discussed in the v5 of the series.
> >
> > I could remove this call here (which is mostly redundant after
> > boot)
> > and enable explicitly the tick cpu in another way, but this still
> > seems
> > cleaner to me.
> >
> > Does it make sense to you? Is the comment in the code unclear?
>
> It does not make sense and the comment does not change that.
>
> The point is that tmigr_init() is an early initcall which is invoked
> before SMP is initialized and APs are brought up.
>
> At this point CPU0 can neither be isolated nor nohz full for obvious
> reasons, no?
>
Right, but as far as I understood, the first call to
tmigr_set_cpu_available() happens after the isolcpus parameter has been
parsed, so we know, at least for CPU 0, whether it is going to be isolated.
On my machine it works reliably this way. I'm a bit lost in the init
code but seeing housekeeping_init() before rcu_init(), which in turn
should be required for some RCU-related early_initcalls, makes me
believe this order is guaranteed to be respected.
Or am I missing something?
Thanks,
Gabriele
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-24 14:06 ` Gabriele Monaco
@ 2025-06-24 14:52 ` Frederic Weisbecker
2025-06-25 10:45 ` Thomas Gleixner
0 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2025-06-24 14:52 UTC (permalink / raw)
To: Gabriele Monaco
Cc: Thomas Gleixner, linux-kernel, Anna-Maria Behnsen, Waiman Long
On Tue, Jun 24, 2025 at 04:06:41PM +0200, Gabriele Monaco wrote:
> Right, but as far as I understood, the first call to
> tmigr_set_cpu_available() happens after the isolcpus parameter has been
> parsed so we know at least cpu0 is going to be isolated.
>
> On my machine it works reliably this way. I'm a bit lost in the init
> code but seeing housekeeping_init() before rcu_init(), which in turn
> should be required for some RCU-related early_initcalls, makes me
> believe this order is guaranteed to be respected.
> Or am I missing something?
Right, I think you need to keep those checks because if CPU 0 is in isolcpus
and CPU 5 is nohz_full, CPU 0 will later become the timekeeper and must stay
in the tmigr hierarchy.
OTOH if CPU 0 is in isolcpus and there are no nohz_full CPUs, then CPU 0
doesn't want to go into the hierarchy.
cpuset isolated partitions are different because they issue SMP calls, whereas
isolcpus is defined at boot.
An alternative for isolcpus could be to make a late initcall and do the SMP
calls from there, just like is done for cpusets.
Thanks.
--
Frederic Weisbecker
SUSE Labs
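A minimal sketch of the late-initcall alternative suggested above; the
function names tmigr_isolcpus_init() and tmigr_make_cpu_unavailable() are
made up for illustration, and the callback body is left as a placeholder for
whatever per-CPU switch the series ends up using:

#include <linux/cpumask.h>
#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/sched/isolation.h>
#include <linux/smp.h>

/* Runs on each excluded CPU, as the cpuset path does at run time */
static void tmigr_make_cpu_unavailable(void *unused)
{
	/* would flip the local CPU to 'unavailable' in the tmigr hierarchy */
}

static int __init tmigr_isolcpus_init(void)
{
	cpumask_var_t excluded;

	if (!alloc_cpumask_var(&excluded, GFP_KERNEL))
		return -ENOMEM;

	/* CPUs isolated via isolcpus= ... */
	cpumask_andnot(excluded, cpu_online_mask,
		       housekeeping_cpumask(HK_TYPE_DOMAIN));
	/* ... minus nohz_full CPUs, which stay in the hierarchy */
	cpumask_and(excluded, excluded,
		    housekeeping_cpumask(HK_TYPE_KERNEL_NOISE));
	/* the nohz timekeeper CPU would also have to be kept available */

	on_each_cpu_mask(excluded, tmigr_make_cpu_unavailable, NULL, 1);

	free_cpumask_var(excluded);
	return 0;
}
late_initcall(tmigr_isolcpus_init);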
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-24 14:52 ` Frederic Weisbecker
@ 2025-06-25 10:45 ` Thomas Gleixner
2025-06-25 13:16 ` Frederic Weisbecker
0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2025-06-25 10:45 UTC (permalink / raw)
To: Frederic Weisbecker, Gabriele Monaco
Cc: linux-kernel, Anna-Maria Behnsen, Waiman Long
On Tue, Jun 24 2025 at 16:52, Frederic Weisbecker wrote:
> On Tue, Jun 24, 2025 at 04:06:41PM +0200, Gabriele Monaco wrote:
>> Right, but as far as I understood, the first call to
>> tmigr_set_cpu_available() happens after the isolcpus parameter has been
>> parsed so we know at least cpu0 is going to be isolated.
>>
>> On my machine it works reliably this way. I'm a bit lost in the init
>> code but seeing housekeeping_init() before rcu_init(), which in turn
>> should be required for some RCU-related early_initcalls, makes me
>> believe this order is guaranteed to be respected.
>> Or am I missing something?
>
> Right I think you need to keep those checks because if CPU 0 is isolcpus
> and CPU 5 is nohz_full, CPU 0 will become later the timekeeper and must stay
> in the tmigr hierarchy.
>
> OTOH if CPU 0 is isolcpus and there is no nohz_full CPUs, then CPU 0 doesn't
> want to go to the hierarchy.
>
> cpuset isolated partitions are different because they issue SMP calls whereas
> isolcpus is defined on boot.
>
> An alternative for isolcpus could be to make a late initcall and do the smp
> calls from there just like is done for cpusets.
There is zero reason for isolcpus and nohz full muck to be
active/evaluated during early boot. That's all irrelevant and just
complicates things further.
Can we please clean this up and make it a sensible design instead of
duct taping new functionality into it in completely incomprehensible
ways?
Especially under the aspect that all this should become run-time
modifiable. That requires a run-time switch mechanism anyway, so the
obvious design choice is to utilize that run-time switch late in the
boot sequence to set this stuff up before user space starts and leave
the boot process alone and simple.
The KISS principle applies here fully.
Thanks,
tglx
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-25 10:45 ` Thomas Gleixner
@ 2025-06-25 13:16 ` Frederic Weisbecker
2025-06-25 13:46 ` Gabriele Monaco
0 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2025-06-25 13:16 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Gabriele Monaco, linux-kernel, Anna-Maria Behnsen, Waiman Long
On Wed, Jun 25, 2025 at 12:45:32PM +0200, Thomas Gleixner wrote:
> On Tue, Jun 24 2025 at 16:52, Frederic Weisbecker wrote:
> > On Tue, Jun 24, 2025 at 04:06:41PM +0200, Gabriele Monaco wrote:
> >> Right, but as far as I understood, the first call to
> >> tmigr_set_cpu_available() happens after the isolcpus parameter has been
> >> parsed so we know at least cpu0 is going to be isolated.
> >>
> >> On my machine it works reliably this way. I'm a bit lost in the init
> >> code but seeing housekeeping_init() before rcu_init(), which in turn
> >> should be required for some RCU-related early_initcalls, makes me
> >> believe this order is guaranteed to be respected.
> >> Or am I missing something?
> >
> > Right I think you need to keep those checks because if CPU 0 is isolcpus
> > and CPU 5 is nohz_full, CPU 0 will become later the timekeeper and must stay
> > in the tmigr hierarchy.
> >
> > OTOH if CPU 0 is isolcpus and there is no nohz_full CPUs, then CPU 0 doesn't
> > want to go to the hierarchy.
> >
> > cpuset isolated partitions are different because they issue SMP calls whereas
> > isolcpus is defined on boot.
> >
> > An alternative for isolcpus could be to make a late initcall and do the smp
> > calls from there just like is done for cpusets.
>
> There is zero reason for isolcpus and nohz full muck to be
> active/evaluated during early boot. That's all irrelevant and just
> complicates things further.
>
> Can we please clean this up and make it a sensible design instead of
> duct taping new functionality into it in completely incomprehensible
> ways?
>
> Especially under the aspect that all this should become run-time
> modifyable. That requires a run-time switch mechanism anyway, so the
> obvious design choice is to utilize that run-time switch late in the
> boot sequence to set this stuff up before user space starts and leave
> the boot process alone and simple.
>
> The KISS principle applies here fully.
Ok so the late initcall should work.
Thanks.
--
Frederic Weisbecker
SUSE Labs
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-25 13:16 ` Frederic Weisbecker
@ 2025-06-25 13:46 ` Gabriele Monaco
2025-06-25 14:42 ` Frederic Weisbecker
0 siblings, 1 reply; 19+ messages in thread
From: Gabriele Monaco @ 2025-06-25 13:46 UTC (permalink / raw)
To: Frederic Weisbecker, Thomas Gleixner
Cc: linux-kernel, Anna-Maria Behnsen, Waiman Long
On Wed, 2025-06-25 at 15:16 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 25, 2025 at 12:45:32PM +0200, Thomas Gleixner wrote:
> > On Tue, Jun 24 2025 at 16:52, Frederic Weisbecker wrote:
> > > Right I think you need to keep those checks because if CPU 0 is
> > > isolcpus
> > > and CPU 5 is nohz_full, CPU 0 will become later the timekeeper
> > > and must stay
> > > in the tmigr hierarchy.
> > >
> > > OTOH if CPU 0 is isolcpus and there is no nohz_full CPUs, then
> > > CPU 0 doesn't
> > > want to go to the hierarchy.
> > >
> > > cpuset isolated partitions are different because they issue SMP
> > > calls whereas
> > > isolcpus is defined on boot.
> > >
> > > An alternative for isolcpus could be to make a late initcall and
> > > do the smp
> > > calls from there just like is done for cpusets.
> >
> > There is zero reason for isolcpus and nohz full muck to be
> > active/evaluated during early boot. That's all irrelevant and just
> > complicates things further.
> >
> > Can we please clean this up and make it a sensible design instead
> > of
> > duct taping new functionality into it in completely
> > incomprehensible
> > ways?
> >
> > Especially under the aspect that all this should become run-time
> > modifyable. That requires a run-time switch mechanism anyway, so
> > the
> > obvious design choice is to utilize that run-time switch late in
> > the
> > boot sequence to set this stuff up before user space starts and
> > leave
> > the boot process alone and simple.
> >
> > The KISS principle applies here fully.
>
> Ok so the late initcall should work.
>
Thanks both for the reviews.
I'm a bit puzzled by what is expected now, though.
The late initcall would work just fine to replace the call to
tick_nohz_cpu_hotpluggable(), which is indeed superfluous for hotplug calls.
However, the checks for housekeeping CPUs are required to prevent isolated
CPUs coming online from becoming available, and so they will run during
early boot too (without any practical reason, only because the hotplug
handlers run there).
I might avoid it by playing with cpuhp_setup_state_nocalls perhaps, but
that feels even more hacky.
Otherwise, I can refactor the code to maintain a separate field
(isolated), restore the 'online' field and keep the functions for
online/offline and isolation as separate as possible, while considering:
    available = !isolated && online
This would make reading the housekeeping masks superfluous in the hotplug
(and boot) code, but again, it doesn't look simpler to me.
Am I missing some obviously elegant solution here?
Thanks,
Gabriele
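A minimal sketch of the split layout described above; the type and field
names are made up for illustration, only the relation
available = !isolated && online is taken from the mail:

#include <linux/types.h>

struct tmigr_cpu_state {
	bool	online;		/* maintained by the hotplug callbacks        */
	bool	isolated;	/* maintained by the isolation / cpuset paths */
};

static inline bool tmigr_cpu_is_available(const struct tmigr_cpu_state *s)
{
	/* available = !isolated && online */
	return s->online && !s->isolated;
}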
* Re: [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation
2025-06-25 13:46 ` Gabriele Monaco
@ 2025-06-25 14:42 ` Frederic Weisbecker
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2025-06-25 14:42 UTC (permalink / raw)
To: Gabriele Monaco
Cc: Thomas Gleixner, linux-kernel, Anna-Maria Behnsen, Waiman Long
On Wed, Jun 25, 2025 at 03:46:02PM +0200, Gabriele Monaco wrote:
> Thanks both for the reviews.
> I'm a bit puzzled by what is expected now, though.
>
> The late initcall would work just fine to replace the call to
> tick_nohz_cpu_hotpluggable(), indeed superfluous for hotplug calls,
> however the checks for housekeeping CPUs is required to prevent
> isolated CPUs getting online from becoming available and so will run on
> early boot too (without any practical reason, only because the hotplug
> handlers run there).
>
> I might avoid it by playing with cpuhp_setup_state_nocalls perhaps, but
> that feels even more hacky.
>
> Otherwise, I can refactor the code to maintain a separate field
> (isolated), restore the 'online' field and keep the functions for
> online/offline and isolation as separate as possible, while considering
> available = !isolated && online
>
> This would make reading housekeeping masks superfluous on hotplug (and
> boot) code, but again, it doesn't look simpler to me.
>
> Am I missing some obviously elegant solution here?
I keep being confused as well, but yes, I think you're right: we need to
keep the checks on CPU up anyway.
>
> Thanks,
> Gabriele
>
--
Frederic Weisbecker
SUSE Labs
Thread overview: 19+ messages
2025-05-30 14:20 [PATCH v6 0/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 1/6] timers: Rename tmigr 'online' bit to 'available' Gabriele Monaco
2025-06-20 17:04 ` Thomas Gleixner
2025-05-30 14:20 ` [PATCH v6 2/6] timers: Add the available mask in timer migration Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 3/6] cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_exclusion_cpumasks() Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 4/6] sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 5/6] cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping Gabriele Monaco
2025-05-30 14:20 ` [PATCH v6 6/6] timers: Exclude isolated cpus from timer migation Gabriele Monaco
2025-06-20 17:00 ` Thomas Gleixner
2025-06-24 8:05 ` Gabriele Monaco
2025-06-24 13:20 ` Thomas Gleixner
2025-06-24 14:06 ` Gabriele Monaco
2025-06-24 14:52 ` Frederic Weisbecker
2025-06-25 10:45 ` Thomas Gleixner
2025-06-25 13:16 ` Frederic Weisbecker
2025-06-25 13:46 ` Gabriele Monaco
2025-06-25 14:42 ` Frederic Weisbecker
2025-06-18 12:17 ` [PATCH v6 0/6] " Gabriele Monaco
2025-06-18 12:22 ` Frederic Weisbecker