* [PATCH 1/7] sched/isolation: Introduce housekeeping_runtime isolation
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Tejun Heo, Michal Hocko, Waiman Long,
Andrew Morton
CONFIG_CPU_ISOLATION allows setting up various CPU masks to
exclude CPUs from some activities. Masks that are not modified
default to cpu_online_mask.
Masks are set at boot time. If no mask is modified, the static key
'housekeeping_overridden' is left false, which minimizes the
cost of calls to the housekeeping_*() functions.
Create a new housekeeping runtime type whose isolation masks
can be modified at runtime, along with a set of accessor functions
for this new type. This type is independent of the
'housekeeping_overridden' static key.
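For illustration, a minimal usage sketch of the new interface
(HKR_TYPE_FOO and the callers below are hypothetical examples, not part
of this patch; HKR_TYPE_FOO is assumed to have been added to enum
hkr_type):

  #include <linux/sched/isolation.h>

  static void foo_isolate_cpu(int cpu)
  {
          /* Runtime update: exclude @cpu from the "foo" housekeeping work. */
          housekeeping_runtime_clear_cpu(cpu, HKR_TYPE_FOO);
  }

  static void foo_unisolate_cpu(int cpu)
  {
          housekeeping_runtime_set_cpu(cpu, HKR_TYPE_FOO);
  }

  static bool foo_cpu_is_housekeeping(int cpu)
  {
          /* Stays true as long as the type's cpumask was never modified. */
          return housekeeping_runtime_test_cpu(cpu, HKR_TYPE_FOO);
  }

A later patch in this series applies this pattern to HKR_TYPE_SCHED from
the sched domain attach path.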
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
include/linux/sched/isolation.h | 28 +++++++++++++++++++++
kernel/sched/isolation.c | 43 +++++++++++++++++++++++++++++++++
2 files changed, 71 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 2b461129d1fa..5d2f40c6f04c 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -6,6 +6,10 @@
#include <linux/init.h>
#include <linux/tick.h>
+enum hkr_type {
+ HKR_TYPE_MAX
+};
+
enum hk_type {
HK_TYPE_TIMER,
HK_TYPE_RCU,
@@ -26,6 +30,12 @@ extern const struct cpumask *housekeeping_cpumask(enum hk_type type);
extern bool housekeeping_enabled(enum hk_type type);
extern void housekeeping_affine(struct task_struct *t, enum hk_type type);
extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
+
+extern const struct cpumask *housekeeping_runtime_cpumask(enum hkr_type type);
+extern bool housekeeping_runtime_test_cpu(int cpu, enum hkr_type type);
+extern void housekeeping_runtime_set_cpu(int cpu, enum hkr_type type);
+extern void housekeeping_runtime_clear_cpu(int cpu, enum hkr_type type);
+
extern void __init housekeeping_init(void);
#else
@@ -53,6 +63,24 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
return true;
}
+static inline const struct cpumask *housekeeping_runtime_cpumask(enum hkr_type type)
+{
+ return cpu_possible_mask;
+}
+
+static inline bool housekeeping_runtime_test_cpu(int cpu, enum hkr_type type)
+{
+ return true;
+}
+
+static inline void housekeeping_runtime_set_cpu(int cpu, enum hkr_type type)
+{
+}
+
+static inline void housekeeping_runtime_clear_cpu(int cpu, enum hkr_type type)
+{
+}
+
static inline void housekeeping_init(void) { }
#endif /* CONFIG_CPU_ISOLATION */
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc..5acbed870c28 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -23,6 +23,13 @@ enum hk_flags {
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
EXPORT_SYMBOL_GPL(housekeeping_overridden);
+struct housekeeping_runtime {
+ cpumask_var_t cpumasks[HKR_TYPE_MAX];
+ unsigned long flags;
+};
+
+static struct housekeeping_runtime housekeeping_runtime;
+
struct housekeeping {
cpumask_var_t cpumasks[HK_TYPE_MAX];
unsigned long flags;
@@ -79,10 +86,46 @@ bool housekeeping_test_cpu(int cpu, enum hk_type type)
}
EXPORT_SYMBOL_GPL(housekeeping_test_cpu);
+const struct cpumask *housekeeping_runtime_cpumask(enum hkr_type type)
+{
+ if (housekeeping_runtime.cpumasks[type])
+ return housekeeping_runtime.cpumasks[type];
+ return cpu_possible_mask;
+}
+
+bool housekeeping_runtime_test_cpu(int cpu, enum hkr_type type)
+{
+ if (housekeeping_runtime.cpumasks[type])
+ return cpumask_test_cpu(cpu, housekeeping_runtime.cpumasks[type]);
+ return true;
+}
+
+void housekeeping_runtime_set_cpu(int cpu, enum hkr_type type)
+{
+ if (housekeeping_runtime.cpumasks[type])
+ cpumask_set_cpu(cpu, housekeeping_runtime.cpumasks[type]);
+}
+
+void housekeeping_runtime_clear_cpu(int cpu, enum hkr_type type)
+{
+ if (housekeeping_runtime.cpumasks[type])
+ cpumask_clear_cpu(cpu, housekeeping_runtime.cpumasks[type]);
+}
+
+static void __init housekeeping_runtime_init(void)
+{
+ enum hkr_type type;
+
+ for (type = 0; type < HKR_TYPE_MAX; type++)
+ alloc_cpumask_var(&housekeeping_runtime.cpumasks[type], GFP_KERNEL);
+}
+
void __init housekeeping_init(void)
{
enum hk_type type;
+ housekeeping_runtime_init();
+
if (!housekeeping.flags)
return;
--
2.25.1
* [PATCH 2/7] sched/isolation: Move HK_TYPE_SCHED to housekeeping runtime
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Michal Hocko, Waiman Long, Tejun Heo,
Andrew Morton
The HK_TYPE_SCHED isolation mask is never modified. It is however
referenced in the scheduler code, for instance to ignore CPUs that do
not take part in load balancing.
Move HK_TYPE_SCHED to the newly created housekeeping runtime
type. Places where HK_TYPE_SCHED is used are not impacted as:
- the HKR_FLAG_SCHED flag isn't used by the isolcpus/nohz_full
kernel parameters
- masks not set through kernel parameters default to the
cpu_online_mask
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
include/linux/sched/isolation.h | 2 +-
kernel/sched/fair.c | 4 ++--
kernel/sched/isolation.c | 5 ++++-
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index 5d2f40c6f04c..80b4e26a1b73 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -7,6 +7,7 @@
#include <linux/tick.h>
enum hkr_type {
+ HKR_TYPE_SCHED,
HKR_TYPE_MAX
};
@@ -14,7 +15,6 @@ enum hk_type {
HK_TYPE_TIMER,
HK_TYPE_RCU,
HK_TYPE_MISC,
- HK_TYPE_SCHED,
HK_TYPE_TICK,
HK_TYPE_DOMAIN,
HK_TYPE_WQ,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dd37168da50..e3d707243ca7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12083,7 +12083,7 @@ void nohz_balance_enter_idle(int cpu)
return;
/* Spare idle load balancing on CPUs that don't want to be disturbed: */
- if (!housekeeping_cpu(cpu, HK_TYPE_SCHED))
+ if (!housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED))
return;
/*
@@ -12309,7 +12309,7 @@ static void nohz_newidle_balance(struct rq *this_rq)
* This CPU doesn't want to be disturbed by scheduler
* housekeeping
*/
- if (!housekeeping_cpu(this_cpu, HK_TYPE_SCHED))
+ if (!housekeeping_runtime_test_cpu(this_cpu, HKR_TYPE_SCHED))
return;
/* Will wake up very soon. No time for doing anything else*/
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5acbed870c28..735925578815 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,11 +8,14 @@
*
*/
+enum hk_runtime_flags {
+ HKR_FLAG_SCHED = BIT(HKR_TYPE_SCHED),
+};
+
enum hk_flags {
HK_FLAG_TIMER = BIT(HK_TYPE_TIMER),
HK_FLAG_RCU = BIT(HK_TYPE_RCU),
HK_FLAG_MISC = BIT(HK_TYPE_MISC),
- HK_FLAG_SCHED = BIT(HK_TYPE_SCHED),
HK_FLAG_TICK = BIT(HK_TYPE_TICK),
HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),
HK_FLAG_WQ = BIT(HK_TYPE_WQ),
--
2.25.1
* [PATCH 3/7] sched/isolation: Use HKR_TYPE_SCHED in find_new_ilb()
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Andrew Morton, Michal Hocko, Waiman Long
Replace the HK_TYPE_MISC isolation mask with HKR_TYPE_SCHED in
find_new_ilb(). This ultimately resolves [1], i.e. it avoids selecting
for the nohz idle balance a CPU which cannot pull tasks, through the
following call path:
sched_balance_trigger()
\-nohz_balancer_kick()
\-kick_ilb()
\-find_new_ilb()
\-smp_call_function_single_async()
\-nohz_csd_cpu()
\-raise_softirq_irqoff()
sched_balance_softirq()
\-_nohz_idle_balance()
\-sched_balance_domains()
Indeed, a following patch will add/remove CPUs to/from the
HKR_TYPE_SCHED cpumask depending on whether they have a NULL sched domain.
[1] https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/
Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
kernel/sched/fair.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e3d707243ca7..0665f5eb4703 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11842,16 +11842,13 @@ static inline int on_null_domain(struct rq *rq)
* - When one of the busy CPUs notices that there may be an idle rebalancing
* needed, they will kick the idle load balancer, which then does idle
* load balancing for all the idle CPUs.
- *
- * - HK_TYPE_MISC CPUs are used for this task, because HK_TYPE_SCHED is not set
- * anywhere yet.
*/
static inline int find_new_ilb(void)
{
const struct cpumask *hk_mask;
int ilb_cpu;
- hk_mask = housekeeping_cpumask(HK_TYPE_MISC);
+ hk_mask = housekeeping_runtime_cpumask(HKR_TYPE_SCHED);
for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
--
2.25.1
* [PATCH 4/7] sched/fair: Move/add on_null_domain()/housekeeping_cpu() checks
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Andrew Morton, Michal Hocko, Tejun Heo,
Waiman Long
Prepare a following patch by moving/adding on_null_domain()
and housekeeping_runtime_test_cpu() checks.
In nohz_balance_enter_idle():
- The housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED) call is
currently a no-op as HKR_TYPE_SCHED is never configured. The call can
thus be moved down.
- In the current code, an isolated CPU sets nohz.has_blocked,
but isn't set in nohz.idle_cpus_mask.
However, _nohz_idle_balance::for_each_cpu_wrap() iterates
over nohz.idle_cpus_mask CPUs.
Move the check up to avoid this.
In nohz_balance_exit_idle():
- The check against a NULL sd in:
nohz_balance_enter_idle()
\-if (on_null_domain())
\-[returning here]
\-rq->nohz_tick_stopped = 1
prevents setting the rq's nohz_tick_stopped, and the check in:
sched_balance_trigger()
\-if (on_null_domain())
\-[returning here]
\-nohz_balancer_kick()
\-nohz_balance_exit_idle()
prevents resetting the nohz.[nr_cpus|idle_cpus_mask] variables.
So the newly added on_null_domain() check doesn't change the current
behaviour.
It however prepares for:
- the use of the HKR_TYPE_SCHED isolation mask
- the removal of on_null_domain()
in a later patch.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
kernel/sched/fair.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0665f5eb4703..3e0f2a0f153f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12039,6 +12039,10 @@ void nohz_balance_exit_idle(struct rq *rq)
{
SCHED_WARN_ON(rq != this_rq());
+ /* If we're a completely isolated CPU, we don't play: */
+ if (on_null_domain(rq))
+ return;
+
if (likely(!rq->nohz_tick_stopped))
return;
@@ -12079,10 +12083,6 @@ void nohz_balance_enter_idle(int cpu)
if (!cpu_active(cpu))
return;
- /* Spare idle load balancing on CPUs that don't want to be disturbed: */
- if (!housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED))
- return;
-
/*
* Can be set safely without rq->lock held
* If a clear happens, it will have evaluated last additions because
@@ -12090,6 +12090,14 @@ void nohz_balance_enter_idle(int cpu)
*/
rq->has_blocked_load = 1;
+ /* Spare idle load balancing on CPUs that don't want to be disturbed: */
+ if (!housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED))
+ return;
+
+ /* If we're a completely isolated CPU, we don't play: */
+ if (on_null_domain(rq))
+ return;
+
/*
* The tick is still stopped but load could have been added in the
* meantime. We set the nohz.has_blocked flag to trig a check of the
@@ -12099,10 +12107,6 @@ void nohz_balance_enter_idle(int cpu)
if (rq->nohz_tick_stopped)
goto out;
- /* If we're a completely isolated CPU, we don't play: */
- if (on_null_domain(rq))
- return;
-
rq->nohz_tick_stopped = 1;
cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
--
2.25.1
* [PATCH 5/7] sched/topology: Remove CPUs with NULL sd from HKR_TYPE_SCHED mask
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Tejun Heo, Michal Hocko, Andrew Morton,
Waiman Long
Upon attaching a NULL sched domain to a CPU, remove the CPU from
the HKR_TYPE_SCHED isolation mask. CPUs removed from this mask
are prevented from being added to the fair.c variables:
- nohz.idle_cpus_mask
- nohz.nr_cpus
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
kernel/sched/topology.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 63aecd2a7a9f..b4fc212ccfb0 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -775,6 +775,11 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
sched_domain_debug(sd, cpu);
+ if (sd)
+ housekeeping_runtime_set_cpu(cpu, HKR_TYPE_SCHED);
+ else
+ housekeeping_runtime_clear_cpu(cpu, HKR_TYPE_SCHED);
+
rq_attach_root(rq, rd);
tmp = rq->sd;
rcu_assign_pointer(rq->sd, sd);
--
2.25.1
* [PATCH 6/7] sched/fair: Remove on_null_domain() and redundant checks
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Waiman Long, Andrew Morton, Michal Hocko
CPUs with a NULL sched domain are removed from the HKR_TYPE_SCHED
isolation mask. The two following checks are equivalent:
- !housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED)
- on_null_domain(rq)
Remove on_null_domain() and the redundant checks.
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
kernel/sched/fair.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3e0f2a0f153f..9657c8f2176b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11830,11 +11830,6 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
}
-static inline int on_null_domain(struct rq *rq)
-{
- return unlikely(!rcu_dereference_sched(rq->sd));
-}
-
#ifdef CONFIG_NO_HZ_COMMON
/*
* NOHZ idle load balancing (ILB) details:
@@ -12040,7 +12035,7 @@ void nohz_balance_exit_idle(struct rq *rq)
SCHED_WARN_ON(rq != this_rq());
/* If we're a completely isolated CPU, we don't play: */
- if (on_null_domain(rq))
+ if (!housekeeping_runtime_test_cpu(cpu_of(rq), HKR_TYPE_SCHED))
return;
if (likely(!rq->nohz_tick_stopped))
@@ -12090,12 +12085,8 @@ void nohz_balance_enter_idle(int cpu)
*/
rq->has_blocked_load = 1;
- /* Spare idle load balancing on CPUs that don't want to be disturbed: */
- if (!housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED))
- return;
-
/* If we're a completely isolated CPU, we don't play: */
- if (on_null_domain(rq))
+ if (!housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED))
return;
/*
@@ -12504,11 +12495,14 @@ static __latent_entropy void sched_balance_softirq(struct softirq_action *h)
*/
void sched_balance_trigger(struct rq *rq)
{
+ int cpu = cpu_of(rq);
+
/*
* Don't need to rebalance while attached to NULL domain or
* runqueue CPU is not active
*/
- if (unlikely(on_null_domain(rq) || !cpu_active(cpu_of(rq))))
+ if (unlikely(!housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED)) ||
+ !cpu_active(cpu))
return;
if (time_after_eq(jiffies, rq->next_balance))
--
2.25.1
* Re: [PATCH 6/7] sched/fair: Remove on_null_domain() and redundant checks
From: Peter Zijlstra @ 2024-04-04 7:27 UTC
To: Pierre Gondois
Cc: linux-kernel, Aaron Lu, Rui Zhang, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
Waiman Long, Andrew Morton, Michal Hocko
On Wed, Apr 03, 2024 at 05:05:38PM +0200, Pierre Gondois wrote:
> CPUs with a NULL sched domain are removed from the HKR_TYPE_SCHED
> isolation mask. The two following checks are equialent:
> - !housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED)
> - on_null_domain(rq)
>
> Remove on_null_domain() and the redundant checks.
>
> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
> ---
> kernel/sched/fair.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3e0f2a0f153f..9657c8f2176b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11830,11 +11830,6 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>
> }
>
> -static inline int on_null_domain(struct rq *rq)
> -{
> - return unlikely(!rcu_dereference_sched(rq->sd));
> -}
> -
> #ifdef CONFIG_NO_HZ_COMMON
> /*
> * NOHZ idle load balancing (ILB) details:
> @@ -12040,7 +12035,7 @@ void nohz_balance_exit_idle(struct rq *rq)
> SCHED_WARN_ON(rq != this_rq());
>
> /* If we're a completely isolated CPU, we don't play: */
> - if (on_null_domain(rq))
> + if (!housekeeping_runtime_test_cpu(cpu_of(rq), HKR_TYPE_SCHED))
> return;
>
> if (likely(!rq->nohz_tick_stopped))
This seems broken, the whole null domain can happen with cpusets, but
this housekeeping nonsense is predicated on CPU_ISOLATION and none of
that is mandatory for CPUSETS.
* Re: [PATCH 6/7] sched/fair: Remove on_null_domain() and redundant checks
From: Pierre Gondois @ 2024-04-05 9:48 UTC
To: Peter Zijlstra
Cc: linux-kernel, Aaron Lu, Rui Zhang, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
Waiman Long, Andrew Morton, Michal Hocko
Hello Peter,
On 4/4/24 09:27, Peter Zijlstra wrote:
> On Wed, Apr 03, 2024 at 05:05:38PM +0200, Pierre Gondois wrote:
>> CPUs with a NULL sched domain are removed from the HKR_TYPE_SCHED
>> isolation mask. The two following checks are equialent:
>> - !housekeeping_runtime_test_cpu(cpu, HKR_TYPE_SCHED)
>> - on_null_domain(rq)
>>
>> Remove on_null_domain() and the redundant checks.
>>
>> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
>> ---
>> kernel/sched/fair.c | 18 ++++++------------
>> 1 file changed, 6 insertions(+), 12 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3e0f2a0f153f..9657c8f2176b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -11830,11 +11830,6 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>>
>> }
>>
>> -static inline int on_null_domain(struct rq *rq)
>> -{
>> - return unlikely(!rcu_dereference_sched(rq->sd));
>> -}
>> -
>> #ifdef CONFIG_NO_HZ_COMMON
>> /*
>> * NOHZ idle load balancing (ILB) details:
>> @@ -12040,7 +12035,7 @@ void nohz_balance_exit_idle(struct rq *rq)
>> SCHED_WARN_ON(rq != this_rq());
>>
>> /* If we're a completely isolated CPU, we don't play: */
>> - if (on_null_domain(rq))
>> + if (!housekeeping_runtime_test_cpu(cpu_of(rq), HKR_TYPE_SCHED))
>> return;
>>
>> if (likely(!rq->nohz_tick_stopped))
>
> This seems broken, the whole null domain can happen with cpusets, but
> this housekeeping nonsense is predicated on CPU_ISOLATION and none of
> that is mandatory for CPUSETS.
ok right,
I will try to remove this implicit dependency,
Regards,
Pierre
* [PATCH 7/7] sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
From: Pierre Gondois @ 2024-04-03 15:05 UTC
To: linux-kernel
Cc: Aaron Lu, Rui Zhang, Pierre Gondois, Anna-Maria Behnsen,
Frederic Weisbecker, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
Valentin Schneider, Andrew Morton, Waiman Long, Tejun Heo
As reported in [1], an isolated CPU keeps the values of:
- rq->nohz_tick_stopped
- nohz.idle_cpus_mask
- nohz.nr_cpus
when a NULL sd is attached to the CPU. Clear the values.
[1] https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
include/linux/sched/nohz.h | 2 ++
kernel/sched/fair.c | 11 +++++++++++
kernel/sched/topology.c | 6 ++++--
3 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 6d67e9a5af6b..18e620715c9d 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -9,8 +9,10 @@
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
extern void nohz_balance_enter_idle(int cpu);
extern int get_nohz_timer_target(void);
+extern void nohz_clear_state(int cpu);
#else
static inline void nohz_balance_enter_idle(int cpu) { }
+static inline void nohz_clear_state(int cpu) { }
#endif
#ifdef CONFIG_NO_HZ_COMMON
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9657c8f2176b..6786d4d78e41 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12014,6 +12014,17 @@ static void nohz_balancer_kick(struct rq *rq)
kick_ilb(flags);
}
+void nohz_clear_state(int cpu)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+ if (rq->nohz_tick_stopped) {
+ rq->nohz_tick_stopped = 0;
+ cpumask_clear_cpu(cpu, nohz.idle_cpus_mask);
+ atomic_dec(&nohz.nr_cpus);
+ }
+}
+
static void set_cpu_sd_state_busy(int cpu)
{
struct sched_domain *sd;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b4fc212ccfb0..e8e40b7d964b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -775,10 +775,12 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
sched_domain_debug(sd, cpu);
- if (sd)
+ if (sd) {
housekeeping_runtime_set_cpu(cpu, HKR_TYPE_SCHED);
- else
+ } else {
housekeeping_runtime_clear_cpu(cpu, HKR_TYPE_SCHED);
+ nohz_clear_state(cpu);
+ }
rq_attach_root(rq, rd);
tmp = rq->sd;
--
2.25.1
* Re: [PATCH 0/7] sched/fair|isolation: Correctly clear nohz.[nr_cpus|idle_cpus_mask] for isolated CPUs
From: Waiman Long @ 2024-04-04 3:01 UTC
To: Pierre Gondois, linux-kernel
Cc: Aaron Lu, Rui Zhang, Anna-Maria Behnsen, Frederic Weisbecker,
Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
Tejun Heo, Michal Hocko
On 4/3/24 11:05, Pierre Gondois wrote:
> Zhang Rui reported that find_new_ilb() was iterating over CPUs in
> isolated cgroup partitions. This triggered spurious wakeups for
> theses CPUs. [1]
> The initial approach was to ignore CPUs on NULL sched domains, as
> isolated CPUs have a NULL sched domain. However a CPU:
> - with its tick disabled, so taken into account in
> nohz.[idle_cpus_mask|nr_cpus]
> - which is placed in an isolated cgroup partition
> will never update nohz.[idle_cpus_mask|nr_cpus] again.
>
> To avoid that, the following variables should be cleared
> when a CPU is placed in an isolated cgroup partition:
> - nohz.idle_cpus_mask
> - nohz.nr_cpus
> - rq->nohz_tick_stopped
> This would allow to avoid considering wrong nohz.* values during
> idle load balance.
>
> As suggested in [2] and to avoid calling nohz_balance_[enter|exit]_idle()
> from a remote CPU and create concurrency issues, leverage the existing
> housekeeping HK_TYPE_SCHED mask to reflect isolated CPUs (i.e. on NULL
> sched domains).
> Indeed the HK_TYPE_SCHED mask is currently never set by the
> isolcpus/nohz_full kernel parameters, so it defaults to cpu_online_mask.
> Plus it's current usage fits CPUs that are isolated and should
> not take part in load balancing.
>
> Making use of HK_TYPE_SCHED for this purpose implies creating a
> housekeeping mask which can be modified at runtime.
>
> [1] https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/
> [2] https://lore.kernel.org/all/CAKfTPtAMd_KNKhXXGk5MEibzzQUX3BFkWgxtEW2o8FFTX99DKw@mail.gmail.com/
>
> Pierre Gondois (7):
> sched/isolation: Introduce housekeeping_runtime isolation
> sched/isolation: Move HK_TYPE_SCHED to housekeeping runtime
> sched/isolation: Use HKR_TYPE_SCHED in find_new_ilb()
> sched/fair: Move/add on_null_domain()/housekeeping_cpu() checks
> sched/topology: Remove CPUs with NULL sd from HKR_TYPE_SCHED mask
> sched/fair: Remove on_null_domain() and redundant checks
> sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
>
> include/linux/sched/isolation.h | 30 ++++++++++++++++++++-
> include/linux/sched/nohz.h | 2 ++
> kernel/sched/fair.c | 44 +++++++++++++++++-------------
> kernel/sched/isolation.c | 48 ++++++++++++++++++++++++++++++++-
> kernel/sched/topology.c | 7 +++++
> 5 files changed, 110 insertions(+), 21 deletions(-)
>
I had also posted a patch series on excluding isolated CPUs in isolated
partitions from housekeeping cpumasks earlier this year. See
https://lore.kernel.org/lkml/20240229021414.508972-1-longman@redhat.com/
It took a different approach from this series. It looks like I should
include HK_TYPE_MISC as well.
Cheers,
Longman
* Re: [PATCH 0/7] sched/fair|isolation: Correctly clear nohz.[nr_cpus|idle_cpus_mask] for isolated CPUs
From: Pierre Gondois @ 2024-04-04 12:55 UTC
To: Waiman Long, linux-kernel
Cc: Aaron Lu, Rui Zhang, Anna-Maria Behnsen, Frederic Weisbecker,
Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
Tejun Heo, Michal Hocko
Hello Waiman,
Thanks for the link, I didn't see the patchset previously.
On 4/4/24 05:01, Waiman Long wrote:
> On 4/3/24 11:05, Pierre Gondois wrote:
>> Zhang Rui reported that find_new_ilb() was iterating over CPUs in
>> isolated cgroup partitions. This triggered spurious wakeups for
>> theses CPUs. [1]
>> The initial approach was to ignore CPUs on NULL sched domains, as
>> isolated CPUs have a NULL sched domain. However a CPU:
>> - with its tick disabled, so taken into account in
>> nohz.[idle_cpus_mask|nr_cpus]
>> - which is placed in an isolated cgroup partition
>> will never update nohz.[idle_cpus_mask|nr_cpus] again.
>>
>> To avoid that, the following variables should be cleared
>> when a CPU is placed in an isolated cgroup partition:
>> - nohz.idle_cpus_mask
>> - nohz.nr_cpus
>> - rq->nohz_tick_stopped
>> This would allow to avoid considering wrong nohz.* values during
>> idle load balance.
>>
>> As suggested in [2] and to avoid calling nohz_balance_[enter|exit]_idle()
>> from a remote CPU and create concurrency issues, leverage the existing
>> housekeeping HK_TYPE_SCHED mask to reflect isolated CPUs (i.e. on NULL
>> sched domains).
>> Indeed the HK_TYPE_SCHED mask is currently never set by the
>> isolcpus/nohz_full kernel parameters, so it defaults to cpu_online_mask.
>> Plus it's current usage fits CPUs that are isolated and should
>> not take part in load balancing.
>>
>> Making use of HK_TYPE_SCHED for this purpose implies creating a
>> housekeeping mask which can be modified at runtime.
>>
>> [1] https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/
>> [2] https://lore.kernel.org/all/CAKfTPtAMd_KNKhXXGk5MEibzzQUX3BFkWgxtEW2o8FFTX99DKw@mail.gmail.com/
>>
>> Pierre Gondois (7):
>> sched/isolation: Introduce housekeeping_runtime isolation
>> sched/isolation: Move HK_TYPE_SCHED to housekeeping runtime
>> sched/isolation: Use HKR_TYPE_SCHED in find_new_ilb()
>> sched/fair: Move/add on_null_domain()/housekeeping_cpu() checks
>> sched/topology: Remove CPUs with NULL sd from HKR_TYPE_SCHED mask
>> sched/fair: Remove on_null_domain() and redundant checks
>> sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
>>
>> include/linux/sched/isolation.h | 30 ++++++++++++++++++++-
>> include/linux/sched/nohz.h | 2 ++
>> kernel/sched/fair.c | 44 +++++++++++++++++-------------
>> kernel/sched/isolation.c | 48 ++++++++++++++++++++++++++++++++-
>> kernel/sched/topology.c | 7 +++++
>> 5 files changed, 110 insertions(+), 21 deletions(-)
>>
> I had also posted a patch series on excluding isolated CPUs in isolated
> partitions from housekeeping cpumasks earlier this year. See
>
> https://lore.kernel.org/lkml/20240229021414.508972-1-longman@redhat.com/
>
> It took a different approach from this series. It looks like I should
> include HK_TYPE_MISC as well.
The common point between the two patchsets is that find_new_ilb() won't
take isolated CPUs into account.
The present patchset also:
- clears the nohz.[idle_cpus_mask|nr_cpus] variables when a CPU becomes
isolated,
cf. [PATCH 7/7] sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
- tries to clean up/gather the on_null_domain()/HK_TYPE_SCHED/HK_TYPE_MISC
mask checks, as the HK_TYPE_SCHED/HK_TYPE_MISC masks are currently never
set.
but it also:
- updates the housekeeping mask from sched/topology.c. It might be better
to do it from cpuset.c as you did, since the update originally comes from
there and it is unlikely another place would require updating housekeeping
CPUs.
A new housekeeping_runtime type is also created, but I think the way you
handle updating the housekeeping mask at runtime is better.
- adds a dependency of sched/fair.c on CPU_ISOLATION (cf. the housekeeping_*
calls), as Peter noted (IIUC) [1].
Should I re-spin the patchset and try to correct those points? Or do you
think this should be done differently?
Regards,
Pierre
[1] https://lore.kernel.org/lkml/20240404072745.GA35684@noisy.programming.kicks-ass.net/
>
> Cheers,
> Longman
>
* Re: [PATCH 0/7] sched/fair|isolation: Correctly clear nohz.[nr_cpus|idle_cpus_mask] for isolated CPUs
From: Waiman Long @ 2024-04-05 0:23 UTC
To: Pierre Gondois, linux-kernel
Cc: Aaron Lu, Rui Zhang, Anna-Maria Behnsen, Frederic Weisbecker,
Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
Tejun Heo, Michal Hocko
On 4/4/24 08:55, Pierre Gondois wrote:
> Hello Waiman,
> Thanks for the link, I didn't see the patchset previously.
>
> On 4/4/24 05:01, Waiman Long wrote:
>> On 4/3/24 11:05, Pierre Gondois wrote:
>>> Zhang Rui reported that find_new_ilb() was iterating over CPUs in
>>> isolated cgroup partitions. This triggered spurious wakeups for
>>> theses CPUs. [1]
>>> The initial approach was to ignore CPUs on NULL sched domains, as
>>> isolated CPUs have a NULL sched domain. However a CPU:
>>> - with its tick disabled, so taken into account in
>>> nohz.[idle_cpus_mask|nr_cpus]
>>> - which is placed in an isolated cgroup partition
>>> will never update nohz.[idle_cpus_mask|nr_cpus] again.
>>>
>>> To avoid that, the following variables should be cleared
>>> when a CPU is placed in an isolated cgroup partition:
>>> - nohz.idle_cpus_mask
>>> - nohz.nr_cpus
>>> - rq->nohz_tick_stopped
>>> This would allow to avoid considering wrong nohz.* values during
>>> idle load balance.
>>>
>>> As suggested in [2] and to avoid calling
>>> nohz_balance_[enter|exit]_idle()
>>> from a remote CPU and create concurrency issues, leverage the existing
>>> housekeeping HK_TYPE_SCHED mask to reflect isolated CPUs (i.e. on NULL
>>> sched domains).
>>> Indeed the HK_TYPE_SCHED mask is currently never set by the
>>> isolcpus/nohz_full kernel parameters, so it defaults to
>>> cpu_online_mask.
>>> Plus it's current usage fits CPUs that are isolated and should
>>> not take part in load balancing.
>>>
>>> Making use of HK_TYPE_SCHED for this purpose implies creating a
>>> housekeeping mask which can be modified at runtime.
>>>
>>> [1]
>>> https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/
>>> [2]
>>> https://lore.kernel.org/all/CAKfTPtAMd_KNKhXXGk5MEibzzQUX3BFkWgxtEW2o8FFTX99DKw@mail.gmail.com/
>>>
>>> Pierre Gondois (7):
>>> sched/isolation: Introduce housekeeping_runtime isolation
>>> sched/isolation: Move HK_TYPE_SCHED to housekeeping runtime
>>> sched/isolation: Use HKR_TYPE_SCHED in find_new_ilb()
>>> sched/fair: Move/add on_null_domain()/housekeeping_cpu() checks
>>> sched/topology: Remove CPUs with NULL sd from HKR_TYPE_SCHED mask
>>> sched/fair: Remove on_null_domain() and redundant checks
>>> sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
>>>
>>> include/linux/sched/isolation.h | 30 ++++++++++++++++++++-
>>> include/linux/sched/nohz.h | 2 ++
>>> kernel/sched/fair.c | 44 +++++++++++++++++-------------
>>> kernel/sched/isolation.c | 48
>>> ++++++++++++++++++++++++++++++++-
>>> kernel/sched/topology.c | 7 +++++
>>> 5 files changed, 110 insertions(+), 21 deletions(-)
>>>
>> I had also posted a patch series on excluding isolated CPUs in isolated
>> partitions from housekeeping cpumasks earlier this year. See
>>
>> https://lore.kernel.org/lkml/20240229021414.508972-1-longman@redhat.com/
>>
>> It took a different approach from this series. It looks like I should
>> include HK_TYPE_MISC as well.
>
> The common point between the 2 patchset is that find_new_ilb() won't
> take into account isolated CPUs.
> The present patchset also:
> - clears nohz.[idle_cpus_mask|nr_cpus] variable when a CPU becomes
> isolated,
> cf. [PATCH 7/7] sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
> - tries to clean up/gather on_null_domain()/HK_TYPE_SCHED/HK_TYPE_MISC
> mask checks, as HK_TYPE_SCHED/HK_TYPE_MISC masks are currently never
> set.
> but it also:
> - updates the housekeeping mask from sched/topology.c. It might be better
> to do it from cpuset.c as you did as the update originally comes from
> here and it is unlikely another place would require updating
> housekeeping
> CPUs.
> A new housekeeping_runtime type is also created, but I think the way
> you
> handle updating housekeeping mask at runtime is better.
> - adds a dependency of sched/fair.c over CPU_ISOLATION (cf.
> housekeeping_*
> calls), as Peter noted (IIUC) [1].
That is true. Without CONFIG_CPU_ISOLATION, all the housekeeping* calls
are essentially no-ops.
OTOH, without CONFIG_CPU_ISOLATION, a number of isolation capabilities
won't be there. Most distros will have this config option set anyway.
BTW, a number of the HK_TYPE_* masks are also used at runtime, like
HK_TYPE_TIMER and HK_TYPE_RCU. So it is hard to artificially distinguish
between runtime and boot time.
I don't believe you need to add a direct dependency on
CONFIG_CPU_ISOLATION, but you do have to add any housekeeping check as
an additional check, not as a replacement for the existing check.
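As a rough illustration (untested sketch, not a formal patch), the
nohz_balance_exit_idle() hunk of patch 6 could keep both checks instead
of replacing one with the other:

	/* If we're a completely isolated CPU, we don't play: */
	if (on_null_domain(rq) ||
	    !housekeeping_runtime_test_cpu(cpu_of(rq), HKR_TYPE_SCHED))
		return;

That way the NULL-domain case keeps working even when
CONFIG_CPU_ISOLATION is not set, since the housekeeping helper then
degenerates to a no-op returning true.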
Cheers,
Longman
* Re: [PATCH 0/7] sched/fair|isolation: Correctly clear nohz.[nr_cpus|idle_cpus_mask] for isolated CPUs
From: Pierre Gondois @ 2024-04-09 13:33 UTC
To: Waiman Long, linux-kernel
Cc: Aaron Lu, Rui Zhang, Anna-Maria Behnsen, Frederic Weisbecker,
Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
Tejun Heo, Michal Hocko
On 4/5/24 02:23, Waiman Long wrote:
> On 4/4/24 08:55, Pierre Gondois wrote:
>> Hello Waiman,
>> Thanks for the link, I didn't see the patchset previously.
>>
>> On 4/4/24 05:01, Waiman Long wrote:
>>> On 4/3/24 11:05, Pierre Gondois wrote:
>>>> Zhang Rui reported that find_new_ilb() was iterating over CPUs in
>>>> isolated cgroup partitions. This triggered spurious wakeups for
>>>> theses CPUs. [1]
>>>> The initial approach was to ignore CPUs on NULL sched domains, as
>>>> isolated CPUs have a NULL sched domain. However a CPU:
>>>> - with its tick disabled, so taken into account in
>>>> nohz.[idle_cpus_mask|nr_cpus]
>>>> - which is placed in an isolated cgroup partition
>>>> will never update nohz.[idle_cpus_mask|nr_cpus] again.
>>>>
>>>> To avoid that, the following variables should be cleared
>>>> when a CPU is placed in an isolated cgroup partition:
>>>> - nohz.idle_cpus_mask
>>>> - nohz.nr_cpus
>>>> - rq->nohz_tick_stopped
>>>> This would allow to avoid considering wrong nohz.* values during
>>>> idle load balance.
>>>>
>>>> As suggested in [2] and to avoid calling
>>>> nohz_balance_[enter|exit]_idle()
>>>> from a remote CPU and create concurrency issues, leverage the existing
>>>> housekeeping HK_TYPE_SCHED mask to reflect isolated CPUs (i.e. on NULL
>>>> sched domains).
>>>> Indeed the HK_TYPE_SCHED mask is currently never set by the
>>>> isolcpus/nohz_full kernel parameters, so it defaults to
>>>> cpu_online_mask.
>>>> Plus it's current usage fits CPUs that are isolated and should
>>>> not take part in load balancing.
>>>>
>>>> Making use of HK_TYPE_SCHED for this purpose implies creating a
>>>> housekeeping mask which can be modified at runtime.
>>>>
>>>> [1]
>>>> https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/
>>>> [2]
>>>> https://lore.kernel.org/all/CAKfTPtAMd_KNKhXXGk5MEibzzQUX3BFkWgxtEW2o8FFTX99DKw@mail.gmail.com/
>>>>
>>>> Pierre Gondois (7):
>>>> sched/isolation: Introduce housekeeping_runtime isolation
>>>> sched/isolation: Move HK_TYPE_SCHED to housekeeping runtime
>>>> sched/isolation: Use HKR_TYPE_SCHED in find_new_ilb()
>>>> sched/fair: Move/add on_null_domain()/housekeeping_cpu() checks
>>>> sched/topology: Remove CPUs with NULL sd from HKR_TYPE_SCHED mask
>>>> sched/fair: Remove on_null_domain() and redundant checks
>>>> sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
>>>>
>>>> include/linux/sched/isolation.h | 30 ++++++++++++++++++++-
>>>> include/linux/sched/nohz.h | 2 ++
>>>> kernel/sched/fair.c | 44 +++++++++++++++++-------------
>>>> kernel/sched/isolation.c | 48
>>>> ++++++++++++++++++++++++++++++++-
>>>> kernel/sched/topology.c | 7 +++++
>>>> 5 files changed, 110 insertions(+), 21 deletions(-)
>>>>
>>> I had also posted a patch series on excluding isolated CPUs in isolated
>>> partitions from housekeeping cpumasks earlier this year. See
>>>
>>> https://lore.kernel.org/lkml/20240229021414.508972-1-longman@redhat.com/
>>>
>>> It took a different approach from this series. It looks like I should
>>> include HK_TYPE_MISC as well.
Yes right, but as noted, if CONFIG_CPUSET is set without CPU_ISOLATION,
adding HK_TYPE_MISC to HOUSEKEEPING_FLAGS in your patchset would have no
effect, right?
The only check that would always be true is on_null_domain() from [1].
>>
>> The common point between the 2 patchset is that find_new_ilb() won't
>> take into account isolated CPUs.
>> The present patchset also:
>> - clears nohz.[idle_cpus_mask|nr_cpus] variable when a CPU becomes
>> isolated,
>> cf. [PATCH 7/7] sched/fair: Clear idle_cpus_mask for CPUs with NULL sd
>> - tries to clean up/gather on_null_domain()/HK_TYPE_SCHED/HK_TYPE_MISC
>> mask checks, as HK_TYPE_SCHED/HK_TYPE_MISC masks are currently never
>> set.
>> but it also:
>> - updates the housekeeping mask from sched/topology.c. It might be better
>> to do it from cpuset.c as you did as the update originally comes from
>> here and it is unlikely another place would require updating
>> housekeeping
>> CPUs.
>> A new housekeeping_runtime type is also created, but I think the way
>> you
>> handle updating housekeeping mask at runtime is better.
>> - adds a dependency of sched/fair.c over CPU_ISOLATION (cf.
>> housekeeping_*
>> calls), as Peter noted (IIUC) [1].
>
> That is true. Without CONFIG_CPU_ISOLATION, all the housekeeping* calls
> are essentially no-ops.
>
> OTOH, without CONFIG_CPU_ISOLATION, a number of isolation capabilities
> won't be there. Most distros will have this config option set anyway.
>
> BTW, a number of the HK_TYPE_* are also used at runtime, like
> HK_TYPE_TIMER and HK_TYPE_RCU. So it is hard to artificially distinguish
> between runtime or boot time.
>
> I don't believe you need to add direct dependency on
> CONFIG_CPU_ISOLATION, but you do have to add any housekeeping check as
> an additional check, not as a replacement of the existing check.
(on another topic)
Isolated CPUs currently keep the state they were in when the isolation was
done, i.e. if the tick was stopped when the CPU was added to the
isolated cpumask, then the CPU stays in nohz.idle_cpus_mask forever.
Similarly, if the tick was not stopped, the CPU stays cleared from
nohz.idle_cpus_mask forever.
This patchset also intended to clear isolated CPUs from nohz.idle_cpus_mask
to leave them in a known state. This might not be a good approach.
nohz.idle_cpus_mask is also used in:
nohz_run_idle_balance()
\-_nohz_idle_balance()
\-for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1)
\-update_nohz_stats()
This is apparently done to update 'the blocked load of already idle CPUs'.
However, isolated CPUs might not have their blocked load updated as they are:
- currently randomly part of nohz.idle_cpus_mask.
- after this patch, never part of nohz.idle_cpus_mask.
I am also wondering whether it makes sense for isolated CPUs to update
the blocked load of other CPUs, i.e. whether the following would not be needed:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dd37168da50..9b92700564b1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12291,6 +12291,9 @@ void nohz_run_idle_balance(int cpu)
{
unsigned int flags;
+ if (on_null_domain(cpu_rq(cpu)))
+ return;
+
flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
/*
Regards,
Pierre