* [PATCH v5 0/4] Scan for an idle sibling in a single pass
@ 2021-01-27 13:51 Mel Gorman
2021-01-27 13:52 ` [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU Mel Gorman
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Mel Gorman @ 2021-01-27 13:51 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman
Changelog since v4
o Avoid use of intermediate variable during select_idle_cpu
Changelog since v3
o Drop scanning based on cores, SMT4 results showed problems
Changelog since v2
o Remove unnecessary parameters
o Update nr during scan only when scanning for cpus
Changelog since v1
o Move extern declaration to header for coding style
o Remove unnecessary parameter from __select_idle_cpu
This series of 4 patches reposts three patches from Peter entitled
"select_idle_sibling() wreckage". It only scans the runqueues in a single
pass when searching for an idle sibling.
Three patches from Peter were dropped. The first patch altered how scan
depth was calculated. Scan depth selection is effectively a random number
generator with two major limitations. The avg_idle time is based on the
time between a CPU going idle and being woken up, clamped approximately
by 2*sysctl_sched_migration_cost. This is difficult to compare in a sensible
fashion to avg_scan_cost. The second issue is that only the avg_scan_cost
of scan failures is recorded and it does not decay. This requires deeper
surgery that would justify a patch on its own although Peter notes that
https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is
potentially useful for an alternative avg_idle metric.
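For reference, the proportional scan depth that SIS_PROP ends up using is
roughly the following. This is a simplified, standalone sketch of the logic
in the select_idle_cpu() hunks later in this thread (the 512 divisor, the +1
on avg_cost and the floor of 4 are taken from those hunks); it illustrates
why comparing avg_idle with avg_scan_cost is awkward and is not the exact
kernel code:

static inline unsigned long sis_prop_scan_depth(unsigned long rq_avg_idle,
                                                unsigned long sd_avg_scan_cost,
                                                unsigned int span_weight)
{
        /* Large fuzz factor applied to the runqueue's avg_idle. */
        unsigned long avg_idle = rq_avg_idle / 512;
        /* +1 avoids a division by zero before any scan cost is recorded. */
        unsigned long avg_cost = sd_avg_scan_cost + 1;
        unsigned long span_avg = span_weight * avg_idle;

        if (span_avg > 4 * avg_cost)
                return span_avg / avg_cost;

        return 4;       /* always scan at least a handful of CPUs */
}
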
The second dropped patch scanned based on cores instead of CPUs as it
rationalised the difference between core scanning and CPU scanning.
Unfortunately, Vincent reported problems with SMT4 so it's dropped
for now until depth searching can be fixed.
The third dropped patch converted the idle core scan throttling mechanism
to SIS_PROP. While this would unify the throttling of core and CPU
scanning, it was not free of regressions and has_idle_cores is a fairly
effective throttling mechanism with the caveat that it can have a lot of
false positives for workloads like hackbench.
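In simplified form, the has_idle_cores throttle referred to here is a
per-LLC hint: it is set when a core is observed to go fully idle and cleared
when a full scan fails to find an idle core, so the per-core scan is only
attempted while the hint is set. The sketch below is illustrative only; in
the kernel the flag lives in the shared LLC domain state and is updated via
set_idle_cores()/test_idle_cores() as seen in the patches:

struct llc_state {
        int has_idle_cores;     /* hint only; can be a false positive */
};

/* A CPU went idle and all of its SMT siblings are idle too. */
static void core_went_idle(struct llc_state *llc)
{
        llc->has_idle_cores = 1;
}

/* A full scan found no idle core; stop paying for core scans. */
static void core_scan_failed(struct llc_state *llc)
{
        llc->has_idle_cores = 0;
}

/* Attempt the more expensive core scan only while the hint is set. */
static int should_scan_for_idle_core(const struct llc_state *llc)
{
        return llc->has_idle_cores;
}
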
Peter's series tried to solve three problems at once; this subset addresses
one problem.
Mel Gorman (4):
sched/fair: Remove SIS_AVG_CPU
sched/fair: Move avg_scan_cost calculations under SIS_PROP
sched/fair: Remove select_idle_smt()
sched/fair: Merge select_idle_core/cpu()
kernel/sched/fair.c | 151 +++++++++++++++++++---------------------
kernel/sched/features.h | 1 -
2 files changed, 70 insertions(+), 82 deletions(-)
--
2.26.2
^ permalink raw reply [flat|nested] 19+ messages in thread* [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU 2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman @ 2021-01-27 13:52 ` Mel Gorman 2021-01-27 13:52 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman ` (3 subsequent siblings) 4 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman SIS_AVG_CPU was introduced as a means of avoiding a search when the average search cost indicated that the search would likely fail. It was a blunt instrument and disabled by commit 4c77b18cf8b7 ("sched/fair: Make select_idle_cpu() more aggressive") and later replaced with a proportional search depth by commit 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()"). While there are corner cases where SIS_AVG_CPU is better, it has now been disabled for almost three years. As the intent of SIS_PROP is to reduce the time complexity of select_idle_cpu(), lets drop SIS_AVG_CPU and focus on SIS_PROP as a throttling mechanism. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 20 +++++++++----------- kernel/sched/features.h | 1 - 2 files changed, 9 insertions(+), 12 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 04a3ce20da67..9f5682aeda2e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6145,7 +6145,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t { struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); struct sched_domain *this_sd; - u64 avg_cost, avg_idle; u64 time; int this = smp_processor_id(); int cpu, nr = INT_MAX; @@ -6154,18 +6153,17 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t if (!this_sd) return -1; - /* - * Due to large variance we need a large fuzz factor; hackbench in - * particularly is sensitive here. - */ - avg_idle = this_rq()->avg_idle / 512; - avg_cost = this_sd->avg_scan_cost + 1; + if (sched_feat(SIS_PROP)) { + u64 avg_cost, avg_idle, span_avg; - if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost) - return -1; + /* + * Due to large variance we need a large fuzz factor; + * hackbench in particularly is sensitive here. + */ + avg_idle = this_rq()->avg_idle / 512; + avg_cost = this_sd->avg_scan_cost + 1; - if (sched_feat(SIS_PROP)) { - u64 span_avg = sd->span_weight * avg_idle; + span_avg = sd->span_weight * avg_idle; if (span_avg > 4*avg_cost) nr = div_u64(span_avg, avg_cost); else diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 68d369cba9e4..e875eabb6600 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -54,7 +54,6 @@ SCHED_FEAT(TTWU_QUEUE, true) /* * When doing wakeups, attempt to limit superfluous scans of the LLC domain. */ -SCHED_FEAT(SIS_AVG_CPU, false) SCHED_FEAT(SIS_PROP, true) /* -- 2.26.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman 2021-01-27 13:52 ` [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU Mel Gorman @ 2021-01-27 13:52 ` Mel Gorman 2021-01-27 13:52 ` [PATCH 3/4] sched/fair: Remove select_idle_smt() Mel Gorman ` (2 subsequent siblings) 4 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP check and while we are at it, exclude the cost of initialising the CPU mask from the average scan cost. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9f5682aeda2e..c8d8e185cf3b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t if (!this_sd) return -1; + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + if (sched_feat(SIS_PROP)) { u64 avg_cost, avg_idle, span_avg; @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t nr = div_u64(span_avg, avg_cost); else nr = 4; - } - - time = cpu_clock(this); - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + time = cpu_clock(this); + } for_each_cpu_wrap(cpu, cpus, target) { if (!--nr) @@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t break; } - time = cpu_clock(this) - time; - update_avg(&this_sd->avg_scan_cost, time); + if (sched_feat(SIS_PROP)) { + time = cpu_clock(this) - time; + update_avg(&this_sd->avg_scan_cost, time); + } return cpu; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/4] sched/fair: Remove select_idle_smt() 2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman 2021-01-27 13:52 ` [PATCH 1/4] sched/fair: Remove SIS_AVG_CPU Mel Gorman 2021-01-27 13:52 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman @ 2021-01-27 13:52 ` Mel Gorman 2021-01-27 13:52 ` [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() Mel Gorman 2021-02-01 1:13 ` [PATCH v5 0/4] Scan for an idle sibling in a single pass Li, Aubrey 4 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman From: Peter Zijlstra (Intel) <peterz@infradead.org> In order to make the next patch more readable, and to quantify the actual effectiveness of this pass, start by removing it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 30 ------------------------------ 1 file changed, 30 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c8d8e185cf3b..fe587350ea14 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6101,27 +6101,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int return -1; } -/* - * Scan the local SMT mask for idle CPUs. - */ -static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target) -{ - int cpu; - - if (!static_branch_likely(&sched_smt_present)) - return -1; - - for_each_cpu(cpu, cpu_smt_mask(target)) { - if (!cpumask_test_cpu(cpu, p->cpus_ptr) || - !cpumask_test_cpu(cpu, sched_domain_span(sd))) - continue; - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) - return cpu; - } - - return -1; -} - #else /* CONFIG_SCHED_SMT */ static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target) @@ -6129,11 +6108,6 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s return -1; } -static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target) -{ - return -1; -} - #endif /* CONFIG_SCHED_SMT */ /* @@ -6323,10 +6297,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) if ((unsigned)i < nr_cpumask_bits) return i; - i = select_idle_smt(p, sd, target); - if ((unsigned)i < nr_cpumask_bits) - return i; - return target; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() 2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman ` (2 preceding siblings ...) 2021-01-27 13:52 ` [PATCH 3/4] sched/fair: Remove select_idle_smt() Mel Gorman @ 2021-01-27 13:52 ` Mel Gorman 2021-02-17 13:17 ` [tip: sched/core] " tip-bot2 for Mel Gorman 2021-02-01 1:13 ` [PATCH v5 0/4] Scan for an idle sibling in a single pass Li, Aubrey 4 siblings, 1 reply; 19+ messages in thread From: Mel Gorman @ 2021-01-27 13:52 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman From: Peter Zijlstra (Intel) <peterz@infradead.org> Both select_idle_core() and select_idle_cpu() do a loop over the same cpumask. Observe that by clearing the already visited CPUs, we can fold the iteration and iterate a core at a time. All we need to do is remember any non-idle CPU we encountered while scanning for an idle core. This way we'll only iterate every CPU once. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> --- kernel/sched/fair.c | 99 +++++++++++++++++++++++++++------------------ 1 file changed, 59 insertions(+), 40 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index fe587350ea14..01591e24a95f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6006,6 +6006,14 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p return new_cpu; } +static inline int __select_idle_cpu(int cpu) +{ + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) + return cpu; + + return -1; +} + #ifdef CONFIG_SCHED_SMT DEFINE_STATIC_KEY_FALSE(sched_smt_present); EXPORT_SYMBOL_GPL(sched_smt_present); @@ -6064,48 +6072,51 @@ void __update_idle_core(struct rq *rq) * there are no idle cores left in the system; tracked through * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above. */ -static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target) +static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu) { - struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); - int core, cpu; + bool idle = true; + int cpu; if (!static_branch_likely(&sched_smt_present)) - return -1; - - if (!test_idle_cores(target, false)) - return -1; - - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + return __select_idle_cpu(core); - for_each_cpu_wrap(core, cpus, target) { - bool idle = true; - - for_each_cpu(cpu, cpu_smt_mask(core)) { - if (!available_idle_cpu(cpu)) { - idle = false; - break; + for_each_cpu(cpu, cpu_smt_mask(core)) { + if (!available_idle_cpu(cpu)) { + idle = false; + if (*idle_cpu == -1) { + if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) { + *idle_cpu = cpu; + break; + } + continue; } + break; } - - if (idle) - return core; - - cpumask_andnot(cpus, cpus, cpu_smt_mask(core)); + if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr)) + *idle_cpu = cpu; } - /* - * Failed to find an idle core; stop looking for one. 
- */ - set_idle_cores(target, 0); + if (idle) + return core; + cpumask_andnot(cpus, cpus, cpu_smt_mask(core)); return -1; } #else /* CONFIG_SCHED_SMT */ -static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target) +static inline void set_idle_cores(int cpu, int val) { - return -1; +} + +static inline bool test_idle_cores(int cpu, bool def) +{ + return def; +} + +static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu) +{ + return __select_idle_cpu(core); } #endif /* CONFIG_SCHED_SMT */ @@ -6118,10 +6129,11 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target) { struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); + int i, cpu, idle_cpu = -1, nr = INT_MAX; + bool smt = test_idle_cores(target, false); + int this = smp_processor_id(); struct sched_domain *this_sd; u64 time; - int this = smp_processor_id(); - int cpu, nr = INT_MAX; this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc)); if (!this_sd) @@ -6129,7 +6141,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); - if (sched_feat(SIS_PROP)) { + if (sched_feat(SIS_PROP) && !smt) { u64 avg_cost, avg_idle, span_avg; /* @@ -6149,18 +6161,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t } for_each_cpu_wrap(cpu, cpus, target) { - if (!--nr) - return -1; - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) - break; + if (smt) { + i = select_idle_core(p, cpu, cpus, &idle_cpu); + if ((unsigned int)i < nr_cpumask_bits) + return i; + + } else { + if (!--nr) + return -1; + idle_cpu = __select_idle_cpu(cpu); + if ((unsigned int)idle_cpu < nr_cpumask_bits) + break; + } } - if (sched_feat(SIS_PROP)) { + if (smt) + set_idle_cores(this, false); + + if (sched_feat(SIS_PROP) && !smt) { time = cpu_clock(this) - time; update_avg(&this_sd->avg_scan_cost, time); } - return cpu; + return idle_cpu; } /* @@ -6289,10 +6312,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) if (!sd) return target; - i = select_idle_core(p, sd, target); - if ((unsigned)i < nr_cpumask_bits) - return i; - i = select_idle_cpu(p, sd, target); if ((unsigned)i < nr_cpumask_bits) return i; -- 2.26.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [tip: sched/core] sched/fair: Merge select_idle_core/cpu() 2021-01-27 13:52 ` [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() Mel Gorman @ 2021-02-17 13:17 ` tip-bot2 for Mel Gorman 0 siblings, 0 replies; 19+ messages in thread From: tip-bot2 for Mel Gorman @ 2021-02-17 13:17 UTC (permalink / raw) To: linux-tip-commits Cc: Mel Gorman, Peter Zijlstra (Intel), Ingo Molnar, Vincent Guittot, x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: 9fe1f127b913318c631d0041ecf71486e38c2c2d Gitweb: https://git.kernel.org/tip/9fe1f127b913318c631d0041ecf71486e38c2c2d Author: Mel Gorman <mgorman@techsingularity.net> AuthorDate: Wed, 27 Jan 2021 13:52:03 Committer: Ingo Molnar <mingo@kernel.org> CommitterDate: Wed, 17 Feb 2021 14:07:25 +01:00 sched/fair: Merge select_idle_core/cpu() Both select_idle_core() and select_idle_cpu() do a loop over the same cpumask. Observe that by clearing the already visited CPUs, we can fold the iteration and iterate a core at a time. All we need to do is remember any non-idle CPU we encountered while scanning for an idle core. This way we'll only iterate every CPU once. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210127135203.19633-5-mgorman@techsingularity.net --- kernel/sched/fair.c | 99 +++++++++++++++++++++++++------------------ 1 file changed, 59 insertions(+), 40 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6a0fc8a..c73d588 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6019,6 +6019,14 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p return new_cpu; } +static inline int __select_idle_cpu(int cpu) +{ + if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) + return cpu; + + return -1; +} + #ifdef CONFIG_SCHED_SMT DEFINE_STATIC_KEY_FALSE(sched_smt_present); EXPORT_SYMBOL_GPL(sched_smt_present); @@ -6077,48 +6085,51 @@ unlock: * there are no idle cores left in the system; tracked through * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above. */ -static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target) +static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu) { - struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); - int core, cpu; + bool idle = true; + int cpu; if (!static_branch_likely(&sched_smt_present)) - return -1; - - if (!test_idle_cores(target, false)) - return -1; - - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + return __select_idle_cpu(core); - for_each_cpu_wrap(core, cpus, target) { - bool idle = true; - - for_each_cpu(cpu, cpu_smt_mask(core)) { - if (!available_idle_cpu(cpu)) { - idle = false; - break; + for_each_cpu(cpu, cpu_smt_mask(core)) { + if (!available_idle_cpu(cpu)) { + idle = false; + if (*idle_cpu == -1) { + if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) { + *idle_cpu = cpu; + break; + } + continue; } + break; } - - if (idle) - return core; - - cpumask_andnot(cpus, cpus, cpu_smt_mask(core)); + if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr)) + *idle_cpu = cpu; } - /* - * Failed to find an idle core; stop looking for one. 
- */ - set_idle_cores(target, 0); + if (idle) + return core; + cpumask_andnot(cpus, cpus, cpu_smt_mask(core)); return -1; } #else /* CONFIG_SCHED_SMT */ -static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target) +static inline void set_idle_cores(int cpu, int val) { - return -1; +} + +static inline bool test_idle_cores(int cpu, bool def) +{ + return def; +} + +static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu) +{ + return __select_idle_cpu(core); } #endif /* CONFIG_SCHED_SMT */ @@ -6131,10 +6142,11 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target) { struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); + int i, cpu, idle_cpu = -1, nr = INT_MAX; + bool smt = test_idle_cores(target, false); + int this = smp_processor_id(); struct sched_domain *this_sd; u64 time; - int this = smp_processor_id(); - int cpu, nr = INT_MAX; this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc)); if (!this_sd) @@ -6142,7 +6154,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); - if (sched_feat(SIS_PROP)) { + if (sched_feat(SIS_PROP) && !smt) { u64 avg_cost, avg_idle, span_avg; /* @@ -6162,18 +6174,29 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t } for_each_cpu_wrap(cpu, cpus, target) { - if (!--nr) - return -1; - if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) - break; + if (smt) { + i = select_idle_core(p, cpu, cpus, &idle_cpu); + if ((unsigned int)i < nr_cpumask_bits) + return i; + + } else { + if (!--nr) + return -1; + idle_cpu = __select_idle_cpu(cpu); + if ((unsigned int)idle_cpu < nr_cpumask_bits) + break; + } } - if (sched_feat(SIS_PROP)) { + if (smt) + set_idle_cores(this, false); + + if (sched_feat(SIS_PROP) && !smt) { time = cpu_clock(this) - time; update_avg(&this_sd->avg_scan_cost, time); } - return cpu; + return idle_cpu; } /* @@ -6302,10 +6325,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) if (!sd) return target; - i = select_idle_core(p, sd, target); - if ((unsigned)i < nr_cpumask_bits) - return i; - i = select_idle_cpu(p, sd, target); if ((unsigned)i < nr_cpumask_bits) return i; ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass 2021-01-27 13:51 [PATCH v5 0/4] Scan for an idle sibling in a single pass Mel Gorman ` (3 preceding siblings ...) 2021-01-27 13:52 ` [PATCH 4/4] sched/fair: Merge select_idle_core/cpu() Mel Gorman @ 2021-02-01 1:13 ` Li, Aubrey 2021-02-01 13:06 ` Mel Gorman 4 siblings, 1 reply; 19+ messages in thread From: Li, Aubrey @ 2021-02-01 1:13 UTC (permalink / raw) To: Mel Gorman, Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Qais Yousef, LKML On 2021/1/27 21:51, Mel Gorman wrote: > Changelog since v4 > o Avoid use of intermediate variable during select_idle_cpu > > Changelog since v3 > o Drop scanning based on cores, SMT4 results showed problems > > Changelog since v2 > o Remove unnecessary parameters > o Update nr during scan only when scanning for cpus > > Changlog since v1 > o Move extern declaration to header for coding style > o Remove unnecessary parameter from __select_idle_cpu > > This series of 4 patches reposts three patches from Peter entitled > "select_idle_sibling() wreckage". It only scans the runqueues in a single > pass when searching for an idle sibling. > > Three patches from Peter were dropped. The first patch altered how scan > depth was calculated. Scan depth deletion is a random number generator > with two major limitations. The avg_idle time is based on the time > between a CPU going idle and being woken up clamped approximately by > 2*sysctl_sched_migration_cost. This is difficult to compare in a sensible > fashion to avg_scan_cost. The second issue is that only the avg_scan_cost > of scan failures is recorded and it does not decay. This requires deeper > surgery that would justify a patch on its own although Peter notes that > https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is > potentially useful for an alternative avg_idle metric. > > The second patch dropped scanned based on cores instead of CPUs as it > rationalised the difference between core scanning and CPU scanning. > Unfortunately, Vincent reported problems with SMT4 so it's dropped > for now until depth searching can be fixed. > > The third patch dropped converted the idle core scan throttling mechanism > to SIS_PROP. While this would unify the throttling of core and CPU > scanning, it was not free of regressions and has_idle_cores is a fairly > effective throttling mechanism with the caveat that it can have a lot of > false positives for workloads like hackbench. > > Peter's series tried to solve three problems at once, this subset addresses > one problem. > > kernel/sched/fair.c | 151 +++++++++++++++++++--------------------- > kernel/sched/features.h | 1 - > 2 files changed, 70 insertions(+), 82 deletions(-) > 4 benchmarks measured on a x86 4s system with 24 cores per socket and 2 HTs per core, total 192 CPUs. The load level is [25%, 50%, 75%, 100%]. - hackbench almost has a universal win. - netperf high load has notable changes, as well as tbench 50% load. 
Details below:

hackbench: 10 iterations, 10000 loops, 40 fds per group
========================================================

- pipe process

  group   base    %std    v5      %std
  3       1       19.18   1.0266  9.06
  6       1       9.17    0.987   13.03
  9       1       7.11    1.0195  4.61
  12      1       1.07    0.9927  1.43

- pipe thread

  group   base    %std    v5      %std
  3       1       11.14   0.9742  7.27
  6       1       9.15    0.9572  7.48
  9       1       2.95    0.986   4.05
  12      1       1.75    0.9992  1.68

- socket process

  group   base    %std    v5      %std
  3       1       2.9     0.9586  2.39
  6       1       0.68    0.9641  1.3
  9       1       0.64    0.9388  0.76
  12      1       0.56    0.9375  0.55

- socket thread

  group   base    %std    v5      %std
  3       1       3.82    0.9686  2.97
  6       1       2.06    0.9667  1.91
  9       1       0.44    0.9354  1.25
  12      1       0.54    0.9362  0.6

netperf: 10 iterations x 100 seconds, transactions rate / sec
=============================================================

- tcp request/response performance

  thread  base    %std    v4      %std
  25%     1       5.34    1.0039  5.13
  50%     1       4.97    1.0115  6.3
  75%     1       5.09    0.9257  6.75
  100%    1       4.53    0.908   4.83

- udp request/response performance

  thread  base    %std    v4      %std
  25%     1       6.18    0.9896  6.09
  50%     1       5.88    1.0198  8.92
  75%     1       24.38   0.9236  29.14
  100%    1       26.16   0.9063  22.16

tbench: 10 iterations x 100 seconds, throughput / sec
=====================================================

  thread  base    %std    v4      %std
  25%     1       0.45    1.003   1.48
  50%     1       1.71    0.9286  0.82
  75%     1       0.84    0.9928  0.94
  100%    1       0.76    0.9762  0.59

schbench: 10 iterations x 100 seconds, 99th percentile latency
==============================================================

  mthread base    %std    v4      %std
  25%     1       2.89    0.9884  7.34
  50%     1       40.38   1.0055  38.37
  75%     1       4.76    1.0095  4.62
  100%    1       10.09   1.0083  8.03

Thanks,
-Aubrey

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass 2021-02-01 1:13 ` [PATCH v5 0/4] Scan for an idle sibling in a single pass Li, Aubrey @ 2021-02-01 13:06 ` Mel Gorman 0 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2021-02-01 13:06 UTC (permalink / raw) To: Li, Aubrey Cc: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Qais Yousef, LKML On Mon, Feb 01, 2021 at 09:13:16AM +0800, Li, Aubrey wrote: > > Peter's series tried to solve three problems at once, this subset addresses > > one problem. > > > > kernel/sched/fair.c | 151 +++++++++++++++++++--------------------- > > kernel/sched/features.h | 1 - > > 2 files changed, 70 insertions(+), 82 deletions(-) > > > > 4 benchmarks measured on a x86 4s system with 24 cores per socket and > 2 HTs per core, total 192 CPUs. > > The load level is [25%, 50%, 75%, 100%]. > > - hackbench almost has a universal win. > - netperf high load has notable changes, as well as tbench 50% load. > Ok, both netperf and tbench are somewhat expected as at those loads are rapidly idling. Previously I observed that rapidly idling loads can allow the has_idle_cores test pass for short durations and the double scanning means there is a greater chance of finding an idle CPU over the two passes. I think overall it's better to avoid double scanning even if there are counter examples as it's possible we'll get that back through better depth selection in the future. Thanks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v4 0/4] Scan for an idle sibling in a single pass @ 2021-01-25 8:59 Mel Gorman 2021-01-25 8:59 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman 0 siblings, 1 reply; 19+ messages in thread From: Mel Gorman @ 2021-01-25 8:59 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman Changelog since v3 o Drop scanning based on cores, SMT4 results showed problems Changelog since v2 o Remove unnecessary parameters o Update nr during scan only when scanning for cpus Changlog since v1 o Move extern declaration to header for coding style o Remove unnecessary parameter from __select_idle_cpu This series of 4 patches reposts three patches from Peter entitled "select_idle_sibling() wreckage". It only scans the runqueues in a single pass when searching for an idle sibling. Three patches from Peter were dropped. The first patch altered how scan depth was calculated. Scan depth deletion is a random number generator with two major limitations. The avg_idle time is based on the time between a CPU going idle and being woken up clamped approximately by 2*sysctl_sched_migration_cost. This is difficult to compare in a sensible fashion to avg_scan_cost. The second issue is that only the avg_scan_cost of scan failures is recorded and it does not decay. This requires deeper surgery that would justify a patch on its own although Peter notes that https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is potentially useful for an alternative avg_idle metric. The second patch dropped scanned based on cores instead of CPUs as it rationalised the difference between core scanning and CPU scanning. Unfortunately, Vincent reported problems with SMT4 so it's dropped for now until depth searching can be fixed. The third patch dropped converted the idle core scan throttling mechanism to SIS_PROP. While this would unify the throttling of core and CPU scanning, it was not free of regressions and has_idle_cores is a fairly effective throttling mechanism with the caveat that it can have a lot of false positives for workloads like hackbench. Peter's series tried to solve three problems at once, this subset addresses one problem. losses but won more than it lost across a range of workloads and machines. kernel/sched/fair.c | 153 +++++++++++++++++++--------------------- kernel/sched/features.h | 1 - 2 files changed, 72 insertions(+), 82 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2021-01-25 8:59 [PATCH v4 " Mel Gorman @ 2021-01-25 8:59 ` Mel Gorman 2021-01-27 10:39 ` Vincent Guittot 0 siblings, 1 reply; 19+ messages in thread From: Mel Gorman @ 2021-01-25 8:59 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Vincent Guittot, Li Aubrey, Qais Yousef, LKML, Mel Gorman As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP check and while we are at it, exclude the cost of initialising the CPU mask from the average scan cost. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> --- kernel/sched/fair.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9f5682aeda2e..c8d8e185cf3b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t if (!this_sd) return -1; + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + if (sched_feat(SIS_PROP)) { u64 avg_cost, avg_idle, span_avg; @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t nr = div_u64(span_avg, avg_cost); else nr = 4; - } - - time = cpu_clock(this); - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + time = cpu_clock(this); + } for_each_cpu_wrap(cpu, cpus, target) { if (!--nr) @@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t break; } - time = cpu_clock(this) - time; - update_avg(&this_sd->avg_scan_cost, time); + if (sched_feat(SIS_PROP)) { + time = cpu_clock(this) - time; + update_avg(&this_sd->avg_scan_cost, time); + } return cpu; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2021-01-25 8:59 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman @ 2021-01-27 10:39 ` Vincent Guittot 0 siblings, 0 replies; 19+ messages in thread From: Vincent Guittot @ 2021-01-27 10:39 UTC (permalink / raw) To: Mel Gorman; +Cc: Peter Zijlstra, Ingo Molnar, Li Aubrey, Qais Yousef, LKML On Mon, 25 Jan 2021 at 09:59, Mel Gorman <mgorman@techsingularity.net> wrote: > > As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP > even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP > check and while we are at it, exclude the cost of initialising the CPU > mask from the average scan cost. > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> > --- > kernel/sched/fair.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 9f5682aeda2e..c8d8e185cf3b 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > if (!this_sd) > return -1; > > + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > + > if (sched_feat(SIS_PROP)) { > u64 avg_cost, avg_idle, span_avg; > > @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > nr = div_u64(span_avg, avg_cost); > else > nr = 4; > - } > - > - time = cpu_clock(this); > > - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > + time = cpu_clock(this); > + } > > for_each_cpu_wrap(cpu, cpus, target) { > if (!--nr) > @@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > break; > } > > - time = cpu_clock(this) - time; > - update_avg(&this_sd->avg_scan_cost, time); > + if (sched_feat(SIS_PROP)) { > + time = cpu_clock(this) - time; > + update_avg(&this_sd->avg_scan_cost, time); > + } > > return cpu; > } > -- > 2.26.2 > ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 0/4] Reduce scanning of runqueues in select_idle_sibling
@ 2020-12-08 15:34 Mel Gorman
2020-12-08 15:34 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman
0 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2020-12-08 15:34 UTC (permalink / raw)
To: Peter Ziljstra, Ingo Molnar, LKML
Cc: Aubrey Li, Barry Song, Juri Lelli, Vincent Guittot,
Valentin Schneider, Linux-ARM, Mel Gorman
Changelog since v1
o Drop single-pass patch (vincent)
o Scope variables used for SIS_AVG_CPU (dietmar)
o Remove redundant assignment (dietmar)
This reduces the amount of runqueue scanning in select_idle_sibling in
the worst case.
Patch 1 removes SIS_AVG_CPU because it's unused.
Patch 2 moves all SIS_PROP-related calculations under SIS_PROP.
Patch 3 improves the hit rate of p->recent_used_cpu to reduce the amount
of scanning. It should be relatively uncontroversial.
Patch 4 returns an idle candidate if one is found while scanning for a
free core.
Mel Gorman (4):
sched/fair: Remove SIS_AVG_CPU
sched/fair: Move avg_scan_cost calculations under SIS_PROP
sched/fair: Do not replace recent_used_cpu with the new target
sched/fair: Return an idle cpu if one is found after a failed search
for an idle core
kernel/sched/fair.c | 51 ++++++++++++++++++++---------------------
kernel/sched/features.h | 1 -
2 files changed, 25 insertions(+), 27 deletions(-)
--
2.26.2
^ permalink raw reply [flat|nested] 19+ messages in thread* [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-08 15:34 [PATCH 0/4] Reduce scanning of runqueues in select_idle_sibling Mel Gorman @ 2020-12-08 15:34 ` Mel Gorman 2020-12-08 16:03 ` Vincent Guittot 2020-12-10 5:18 ` Li, Aubrey 0 siblings, 2 replies; 19+ messages in thread From: Mel Gorman @ 2020-12-08 15:34 UTC (permalink / raw) To: Peter Ziljstra, Ingo Molnar, LKML Cc: Aubrey Li, Barry Song, Juri Lelli, Vincent Guittot, Valentin Schneider, Linux-ARM, Mel Gorman As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP check and while we are at it, exclude the cost of initialising the CPU mask from the average scan cost. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> --- kernel/sched/fair.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ac7b34e7372b..5c41875aec23 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t if (!this_sd) return -1; + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + if (sched_feat(SIS_PROP)) { u64 avg_cost, avg_idle, span_avg; @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t nr = div_u64(span_avg, avg_cost); else nr = 4; - } - - time = cpu_clock(this); - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + time = cpu_clock(this); + } for_each_cpu_wrap(cpu, cpus, target) { if (!--nr) @@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t break; } - time = cpu_clock(this) - time; - update_avg(&this_sd->avg_scan_cost, time); + if (sched_feat(SIS_PROP)) { + time = cpu_clock(this) - time; + update_avg(&this_sd->avg_scan_cost, time); + } return cpu; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-08 15:34 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman @ 2020-12-08 16:03 ` Vincent Guittot 2020-12-08 16:30 ` Mel Gorman 2020-12-09 5:28 ` Li, Aubrey 2020-12-10 5:18 ` Li, Aubrey 1 sibling, 2 replies; 19+ messages in thread From: Vincent Guittot @ 2020-12-08 16:03 UTC (permalink / raw) To: Mel Gorman Cc: Peter Ziljstra, Ingo Molnar, LKML, Aubrey Li, Barry Song, Juri Lelli, Valentin Schneider, Linux-ARM On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote: > > As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP > even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP > check and while we are at it, exclude the cost of initialising the CPU > mask from the average scan cost. > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > --- > kernel/sched/fair.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index ac7b34e7372b..5c41875aec23 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > if (!this_sd) > return -1; Just noticed while reviewing the patch that the above related to this_sd can also go under sched_feat(SIS_PROP) > > + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > + > if (sched_feat(SIS_PROP)) { > u64 avg_cost, avg_idle, span_avg; > > @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > nr = div_u64(span_avg, avg_cost); > else > nr = 4; > - } > - > - time = cpu_clock(this); > > - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > + time = cpu_clock(this); > + } > > for_each_cpu_wrap(cpu, cpus, target) { > if (!--nr) > @@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > break; > } > > - time = cpu_clock(this) - time; > - update_avg(&this_sd->avg_scan_cost, time); > + if (sched_feat(SIS_PROP)) { > + time = cpu_clock(this) - time; > + update_avg(&this_sd->avg_scan_cost, time); > + } > > return cpu; > } > -- > 2.26.2 > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-08 16:03 ` Vincent Guittot @ 2020-12-08 16:30 ` Mel Gorman 2020-12-09 5:28 ` Li, Aubrey 1 sibling, 0 replies; 19+ messages in thread From: Mel Gorman @ 2020-12-08 16:30 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Ziljstra, Ingo Molnar, LKML, Aubrey Li, Barry Song, Juri Lelli, Valentin Schneider, Linux-ARM On Tue, Dec 08, 2020 at 05:03:21PM +0100, Vincent Guittot wrote: > On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote: > > > > As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP > > even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP > > check and while we are at it, exclude the cost of initialising the CPU > > mask from the average scan cost. > > > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > > --- > > kernel/sched/fair.c | 14 ++++++++------ > > 1 file changed, 8 insertions(+), 6 deletions(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index ac7b34e7372b..5c41875aec23 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > > if (!this_sd) > > return -1; > > Just noticed while reviewing the patch that the above related to > this_sd can also go under sched_feat(SIS_PROP) > Technically yes but I also decided against it. It's a functional difference depending on whether SIS_PROP is set in the highly unlikely case that this_sd == NULL. I was also thinking in terms of what happens if SIS_PROP was disabled and enabled while a search is in progress even if it's very unlikely. In that case, this_sd would be uninitialised. That might be impossible in practice depending on how static branching is implemented but I don't think we should rely on the internals of jump labels and play it safe. I can move it in if you feel strongly about it but I think the disable/enable race is enough of a concern to leave it alone. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-08 16:03 ` Vincent Guittot 2020-12-08 16:30 ` Mel Gorman @ 2020-12-09 5:28 ` Li, Aubrey 2020-12-09 9:05 ` Mel Gorman 1 sibling, 1 reply; 19+ messages in thread From: Li, Aubrey @ 2020-12-09 5:28 UTC (permalink / raw) To: Vincent Guittot, Mel Gorman Cc: Peter Ziljstra, Ingo Molnar, LKML, Barry Song, Juri Lelli, Valentin Schneider, Linux-ARM On 2020/12/9 0:03, Vincent Guittot wrote: > On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote: >> >> As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP >> even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP >> check and while we are at it, exclude the cost of initialising the CPU >> mask from the average scan cost. >> >> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> >> --- >> kernel/sched/fair.c | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index ac7b34e7372b..5c41875aec23 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t >> if (!this_sd) >> return -1; > > Just noticed while reviewing the patch that the above related to > this_sd can also go under sched_feat(SIS_PROP) > >> >> + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); >> + >> if (sched_feat(SIS_PROP)) { >> u64 avg_cost, avg_idle, span_avg; >> >> @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t >> nr = div_u64(span_avg, avg_cost); >> else >> nr = 4; >> - } >> - >> - time = cpu_clock(this); >> >> - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); >> + time = cpu_clock(this); >> + } >> >> for_each_cpu_wrap(cpu, cpus, target) { >> if (!--nr) nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well. Thanks, -Aubrey ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-09 5:28 ` Li, Aubrey @ 2020-12-09 9:05 ` Mel Gorman 2020-12-09 11:07 ` Li, Aubrey 0 siblings, 1 reply; 19+ messages in thread From: Mel Gorman @ 2020-12-09 9:05 UTC (permalink / raw) To: Li, Aubrey Cc: Vincent Guittot, Peter Ziljstra, Ingo Molnar, LKML, Barry Song, Juri Lelli, Valentin Schneider, Linux-ARM On Wed, Dec 09, 2020 at 01:28:11PM +0800, Li, Aubrey wrote: > >> nr = div_u64(span_avg, avg_cost); > >> else > >> nr = 4; > >> - } > >> - > >> - time = cpu_clock(this); > >> > >> - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > >> + time = cpu_clock(this); > >> + } > >> > >> for_each_cpu_wrap(cpu, cpus, target) { > >> if (!--nr) > > nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well. > It isn't necessary as nr in initialised to INT_MAX if !SIS_PROP. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-09 9:05 ` Mel Gorman @ 2020-12-09 11:07 ` Li, Aubrey 2020-12-09 11:33 ` Mel Gorman 0 siblings, 1 reply; 19+ messages in thread From: Li, Aubrey @ 2020-12-09 11:07 UTC (permalink / raw) To: Mel Gorman Cc: Vincent Guittot, Peter Ziljstra, Ingo Molnar, LKML, Barry Song, Juri Lelli, Valentin Schneider, Linux-ARM On 2020/12/9 17:05, Mel Gorman wrote: > On Wed, Dec 09, 2020 at 01:28:11PM +0800, Li, Aubrey wrote: >>>> nr = div_u64(span_avg, avg_cost); >>>> else >>>> nr = 4; >>>> - } >>>> - >>>> - time = cpu_clock(this); >>>> >>>> - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); >>>> + time = cpu_clock(this); >>>> + } >>>> >>>> for_each_cpu_wrap(cpu, cpus, target) { >>>> if (!--nr) >> >> nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well. >> > > It isn't necessary as nr in initialised to INT_MAX if !SIS_PROP. >If !SIS_PROP, nr need to -1 then tested in the loop, instead of testing directly. But with SIS_PROP, need adding a test in the loop. Since SIS_PROP is default true, I think it's okay to keep the current way. Thanks, -Aubrey ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-09 11:07 ` Li, Aubrey @ 2020-12-09 11:33 ` Mel Gorman 0 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2020-12-09 11:33 UTC (permalink / raw) To: Li, Aubrey Cc: Vincent Guittot, Peter Ziljstra, Ingo Molnar, LKML, Barry Song, Juri Lelli, Valentin Schneider, Linux-ARM On Wed, Dec 09, 2020 at 07:07:11PM +0800, Li, Aubrey wrote: > On 2020/12/9 17:05, Mel Gorman wrote: > > On Wed, Dec 09, 2020 at 01:28:11PM +0800, Li, Aubrey wrote: > >>>> nr = div_u64(span_avg, avg_cost); > >>>> else > >>>> nr = 4; > >>>> - } > >>>> - > >>>> - time = cpu_clock(this); > >>>> > >>>> - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > >>>> + time = cpu_clock(this); > >>>> + } > >>>> > >>>> for_each_cpu_wrap(cpu, cpus, target) { > >>>> if (!--nr) > >> > >> nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well. > >> > > > > It isn't necessary as nr in initialised to INT_MAX if !SIS_PROP. > >If !SIS_PROP, nr need to -1 then tested in the loop, instead of testing directly. > But with SIS_PROP, need adding a test in the loop. > Since SIS_PROP is default true, I think it's okay to keep the current way. > It's because it's default true and the cost is negligible that I'm leaving it alone. The branch cost and nr accounting cost is negligible and it avoids peppering select_idle_cpu() with too many SIS_PROP checks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-08 15:34 ` [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP Mel Gorman 2020-12-08 16:03 ` Vincent Guittot @ 2020-12-10 5:18 ` Li, Aubrey 2020-12-10 9:32 ` Mel Gorman 1 sibling, 1 reply; 19+ messages in thread From: Li, Aubrey @ 2020-12-10 5:18 UTC (permalink / raw) To: Mel Gorman, Peter Ziljstra, Ingo Molnar, LKML Cc: Barry Song, Juri Lelli, Vincent Guittot, Valentin Schneider, Linux-ARM On 2020/12/8 23:34, Mel Gorman wrote: > As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP > even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP > check and while we are at it, exclude the cost of initialising the CPU > mask from the average scan cost. > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > --- > kernel/sched/fair.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index ac7b34e7372b..5c41875aec23 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > if (!this_sd) > return -1; > > + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > + > if (sched_feat(SIS_PROP)) { > u64 avg_cost, avg_idle, span_avg; > > @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > nr = div_u64(span_avg, avg_cost); > else > nr = 4; > - } > - > - time = cpu_clock(this); > > - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > + time = cpu_clock(this); > + } > > for_each_cpu_wrap(cpu, cpus, target) { > if (!--nr) > return -1; I thought about this again and here seems not to be consistent: - even if nr reduces to 0, shouldn't avg_scan_cost be updated as well before return -1? - if avg_scan_cost is not updated because nr is throttled, the first time = cpu_clock(this); can be optimized. As nr is calculated and we already know which of the weight of cpumask and nr is greater. Thanks, -Aubrey ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP 2020-12-10 5:18 ` Li, Aubrey @ 2020-12-10 9:32 ` Mel Gorman 0 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2020-12-10 9:32 UTC (permalink / raw) To: Li, Aubrey Cc: Peter Ziljstra, Ingo Molnar, LKML, Barry Song, Juri Lelli, Vincent Guittot, Valentin Schneider, Linux-ARM On Thu, Dec 10, 2020 at 01:18:05PM +0800, Li, Aubrey wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index ac7b34e7372b..5c41875aec23 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > > if (!this_sd) > > return -1; > > > > + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > > + > > if (sched_feat(SIS_PROP)) { > > u64 avg_cost, avg_idle, span_avg; > > > > @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t > > nr = div_u64(span_avg, avg_cost); > > else > > nr = 4; > > - } > > - > > - time = cpu_clock(this); > > > > - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); > > + time = cpu_clock(this); > > + } > > > > for_each_cpu_wrap(cpu, cpus, target) { > > if (!--nr) > > return -1; > > I thought about this again and here seems not to be consistent: > - even if nr reduces to 0, shouldn't avg_scan_cost be updated as well before return -1? You're right, but it's outside the scope of this patch. I noted that this was a problem in lore.kernel.org/r/lore.kernel.org/r/20201203141124.7391-8-mgorman@techsingularity.net It's neither a consistent win or loss to always account for it and so was dropped for this series to keep the number of controversial patches to a minimum. > - if avg_scan_cost is not updated because nr is throttled, the first > time = cpu_clock(this); > can be optimized. As nr is calculated and we already know which of the weight of cpumask and nr is greater. > That is also outside the scope of this patch. To do that, cpumask_weight() would have to be calculated but it's likely to be a net loss. Even under light load, nr will be smaller than the domain weight incurring both the cost of cpumask_weight and the clock read in the common case. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 19+ messages in thread