* [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu @ 2021-11-24 9:15 Barry Song 2021-11-24 11:13 ` Peter Zijlstra 0 siblings, 1 reply; 7+ messages in thread From: Barry Song @ 2021-11-24 9:15 UTC (permalink / raw) To: mingo, peterz, juri.lelli, vincent.guittot, rostedt, linux-kernel Cc: dietmar.eggemann, bsegall, mgorman, bristot, Barry Song From: Barry Song <song.bao.hua@hisilicon.com> This patch keeps the same scanning amount, but drops a redundant loop of cpumask_next_wrap. The original code did for_each_cpu_wrap(cpu, cpus, target + 1), then checked --nr; this patch does --nr before doing the next loop, thus, it can remove a cpumask_next_wrap() which costs a little bit. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> --- -v2: make code clearer with respect to Peter's comment kernel/sched/fair.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6e476f6..8cd23f1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6278,6 +6278,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool time = cpu_clock(this); } + --nr; for_each_cpu_wrap(cpu, cpus, target + 1) { if (has_idle_core) { i = select_idle_core(p, cpu, cpus, &idle_cpu); @@ -6285,11 +6286,11 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool return i; } else { - if (!--nr) - return -1; idle_cpu = __select_idle_cpu(cpu, p); if ((unsigned int)idle_cpu < nr_cpumask_bits) break; + if (!--nr) + return -1; } } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu 2021-11-24 9:15 [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu Barry Song @ 2021-11-24 11:13 ` Peter Zijlstra 2021-11-24 11:49 ` Barry Song 0 siblings, 1 reply; 7+ messages in thread From: Peter Zijlstra @ 2021-11-24 11:13 UTC (permalink / raw) To: Barry Song Cc: mingo, juri.lelli, vincent.guittot, rostedt, linux-kernel, dietmar.eggemann, bsegall, mgorman, bristot, Barry Song On Wed, Nov 24, 2021 at 05:15:46PM +0800, Barry Song wrote: > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 6e476f6..8cd23f1 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6278,6 +6278,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > time = cpu_clock(this); > } > > + --nr; > for_each_cpu_wrap(cpu, cpus, target + 1) { > if (has_idle_core) { > i = select_idle_core(p, cpu, cpus, &idle_cpu); > @@ -6285,11 +6286,11 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > return i; > > } else { > - if (!--nr) > - return -1; > idle_cpu = __select_idle_cpu(cpu, p); > if ((unsigned int)idle_cpu < nr_cpumask_bits) > break; > + if (!--nr) > + return -1; > } > } This way nr can never be 1 for a single iteration -- it current isn't, but that's besides the point. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu 2021-11-24 11:13 ` Peter Zijlstra @ 2021-11-24 11:49 ` Barry Song 2021-11-24 11:57 ` Barry Song 0 siblings, 1 reply; 7+ messages in thread From: Barry Song @ 2021-11-24 11:49 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Steven Rostedt, LKML, Dietmar Eggemann, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Barry Song On Thu, Nov 25, 2021 at 12:13 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Nov 24, 2021 at 05:15:46PM +0800, Barry Song wrote: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 6e476f6..8cd23f1 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -6278,6 +6278,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > > time = cpu_clock(this); > > } > > > > + --nr; > > for_each_cpu_wrap(cpu, cpus, target + 1) { > > if (has_idle_core) { > > i = select_idle_core(p, cpu, cpus, &idle_cpu); > > @@ -6285,11 +6286,11 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > > return i; > > > > } else { > > - if (!--nr) > > - return -1; > > idle_cpu = __select_idle_cpu(cpu, p); > > if ((unsigned int)idle_cpu < nr_cpumask_bits) > > break; > > + if (!--nr) > > + return -1; > > } > > } > > This way nr can never be 1 for a single iteration -- it current isn't, > but that's besides the point. Yep. nr=1 seems to be a corner case. if nr=1, the original code will return -1 directly without scanning any cpu. but the new code will scan one time because I haven't checked if(!--nr) and return before for_each_cpu_wrap(). so this changes the behaviour for this corner case. but if i change "--nr" to "nr--", this new code will scan nr times rather than nr-1, this changes the behaviour for all cases besides nr!=1. The original code was looping nr times but scanning nr-1 times only. so you prefer here the codes should scan nr times and change the scanning amount from nr-1 to nr? Thanks Barry ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu 2021-11-24 11:49 ` Barry Song @ 2021-11-24 11:57 ` Barry Song 2021-11-24 12:02 ` Barry Song 0 siblings, 1 reply; 7+ messages in thread From: Barry Song @ 2021-11-24 11:57 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Steven Rostedt, LKML, Dietmar Eggemann, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Barry Song On Thu, Nov 25, 2021 at 12:49 AM Barry Song <21cnbao@gmail.com> wrote: > > On Thu, Nov 25, 2021 at 12:13 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Wed, Nov 24, 2021 at 05:15:46PM +0800, Barry Song wrote: > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index 6e476f6..8cd23f1 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -6278,6 +6278,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > > > time = cpu_clock(this); > > > } > > > > > > + --nr; > > > for_each_cpu_wrap(cpu, cpus, target + 1) { > > > if (has_idle_core) { > > > i = select_idle_core(p, cpu, cpus, &idle_cpu); > > > @@ -6285,11 +6286,11 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > > > return i; > > > > > > } else { > > > - if (!--nr) > > > - return -1; > > > idle_cpu = __select_idle_cpu(cpu, p); > > > if ((unsigned int)idle_cpu < nr_cpumask_bits) > > > break; > > > + if (!--nr) > > > + return -1; > > > } > > > } > > > > This way nr can never be 1 for a single iteration -- it current isn't, > > but that's besides the point. > > Yep. nr=1 seems to be a corner case. > if nr=1, the original code will return -1 directly without scanning > any cpu. but the new code will scan > one time because I haven't checked if(!--nr) and return before > for_each_cpu_wrap(). so this changes > the behaviour for this corner case. > > but if i change "--nr" to "nr--", this new code will scan nr times > rather than nr-1, this changes the behaviour > for all cases besides nr!=1. The original code was looping nr times > but scanning nr-1 times only. > > so you prefer here the codes should scan nr times and change the > scanning amount from nr-1 > to nr? Let me make it clearer. if nr=5, the original code will loop 5 times, but in the 5th loop, it returns directly, so __select_idle_cpu is only done 4 times. if nr=1, the original code will loop 1 time, but in the 1st loop, it returns directly, so __select_idle_cpu is done 0 times. if i change the code to if(!nr--), while nr=5, the new code will do __select_idle_cpu() 5 times rather than 4 times in the original code. but of course the new code changes the __select_idle_cpu from zero to one time for the corner case nr==1. > > Thanks > Barry ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu 2021-11-24 11:57 ` Barry Song @ 2021-11-24 12:02 ` Barry Song 2021-11-24 15:02 ` Peter Zijlstra 0 siblings, 1 reply; 7+ messages in thread From: Barry Song @ 2021-11-24 12:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Steven Rostedt, LKML, Dietmar Eggemann, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Barry Song On Thu, Nov 25, 2021 at 12:57 AM Barry Song <21cnbao@gmail.com> wrote: > > On Thu, Nov 25, 2021 at 12:49 AM Barry Song <21cnbao@gmail.com> wrote: > > > > On Thu, Nov 25, 2021 at 12:13 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > On Wed, Nov 24, 2021 at 05:15:46PM +0800, Barry Song wrote: > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > > index 6e476f6..8cd23f1 100644 > > > > --- a/kernel/sched/fair.c > > > > +++ b/kernel/sched/fair.c > > > > @@ -6278,6 +6278,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > > > > time = cpu_clock(this); > > > > } > > > > > > > > + --nr; > > > > for_each_cpu_wrap(cpu, cpus, target + 1) { > > > > if (has_idle_core) { > > > > i = select_idle_core(p, cpu, cpus, &idle_cpu); > > > > @@ -6285,11 +6286,11 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool > > > > return i; > > > > > > > > } else { > > > > - if (!--nr) > > > > - return -1; > > > > idle_cpu = __select_idle_cpu(cpu, p); > > > > if ((unsigned int)idle_cpu < nr_cpumask_bits) > > > > break; > > > > + if (!--nr) > > > > + return -1; > > > > } > > > > } > > > > > > This way nr can never be 1 for a single iteration -- it current isn't, > > > but that's besides the point. > > > > Yep. nr=1 seems to be a corner case. > > if nr=1, the original code will return -1 directly without scanning > > any cpu. but the new code will scan > > one time because I haven't checked if(!--nr) and return before > > for_each_cpu_wrap(). so this changes > > the behaviour for this corner case. > > > > but if i change "--nr" to "nr--", this new code will scan nr times > > rather than nr-1, this changes the behaviour > > for all cases besides nr!=1. The original code was looping nr times > > but scanning nr-1 times only. > > > > so you prefer here the codes should scan nr times and change the > > scanning amount from nr-1 > > to nr? > > Let me make it clearer. if nr=5, the original code will loop 5 times, > but in the 5th loop, it returns directly, so __select_idle_cpu is > only done 4 times. > > if nr=1, the original code will loop 1 time, but in the 1st loop, > it returns directly, so __select_idle_cpu is done 0 times. this is also why in the first version of patch, i did this: span_avg = sd->span_weight * avg_idle; if (span_avg > 4*avg_cost) - nr = div_u64(span_avg, avg_cost); + nr = div_u64(span_avg, avg_cost) - 1; else - nr = 4; + nr = 3; because we are actually scanning 3 times or div_u64(span_avg, avg_cost) - 1 times but not 4 times or div_u64(span_avg, avg_cost) times. this is not confusing at all. the only thing which is confusing is the original code. > > if i change the code to if(!nr--), while nr=5, the new code will > do __select_idle_cpu() 5 times rather than 4 times in the > original code. > > but of course the new code changes the __select_idle_cpu > from zero to one time for the corner case nr==1. > > > > Thanks > > Barry ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu 2021-11-24 12:02 ` Barry Song @ 2021-11-24 15:02 ` Peter Zijlstra 2021-11-24 20:49 ` Barry Song 0 siblings, 1 reply; 7+ messages in thread From: Peter Zijlstra @ 2021-11-24 15:02 UTC (permalink / raw) To: Barry Song Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Steven Rostedt, LKML, Dietmar Eggemann, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Barry Song On Thu, Nov 25, 2021 at 01:02:00AM +1300, Barry Song wrote: > On Thu, Nov 25, 2021 at 12:57 AM Barry Song <21cnbao@gmail.com> wrote: > > Let me make it clearer. if nr=5, the original code will loop 5 times, > > but in the 5th loop, it returns directly, so __select_idle_cpu is > > only done 4 times. > > > > if nr=1, the original code will loop 1 time, but in the 1st loop, > > it returns directly, so __select_idle_cpu is done 0 times. > > this is also why in the first version of patch, i did this: > span_avg = sd->span_weight * avg_idle; > if (span_avg > 4*avg_cost) > - nr = div_u64(span_avg, avg_cost); > + nr = div_u64(span_avg, avg_cost) - 1; > else > - nr = 4; > + nr = 3; > > because we are actually scanning 3 times or div_u64(span_avg, avg_cost) - 1 > times but not 4 times or div_u64(span_avg, avg_cost) times. It still is confusing, because > 4*span -> nr = avg/span, very much implies we want to bottom out at 4. > this is not confusing at all. the only thing which is confusing is the original > code. But yes, it seems a whole lot of confusion stacked together. Let make it sane and say that we do 'nr' iterations, because clearly that was the intent :-) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu 2021-11-24 15:02 ` Peter Zijlstra @ 2021-11-24 20:49 ` Barry Song 0 siblings, 0 replies; 7+ messages in thread From: Barry Song @ 2021-11-24 20:49 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Steven Rostedt, LKML, Dietmar Eggemann, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Barry Song On Thu, Nov 25, 2021 at 4:02 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, Nov 25, 2021 at 01:02:00AM +1300, Barry Song wrote: > > On Thu, Nov 25, 2021 at 12:57 AM Barry Song <21cnbao@gmail.com> wrote: > > > > Let me make it clearer. if nr=5, the original code will loop 5 times, > > > but in the 5th loop, it returns directly, so __select_idle_cpu is > > > only done 4 times. > > > > > > if nr=1, the original code will loop 1 time, but in the 1st loop, > > > it returns directly, so __select_idle_cpu is done 0 times. > > > > this is also why in the first version of patch, i did this: > > span_avg = sd->span_weight * avg_idle; > > if (span_avg > 4*avg_cost) > > - nr = div_u64(span_avg, avg_cost); > > + nr = div_u64(span_avg, avg_cost) - 1; > > else > > - nr = 4; > > + nr = 3; > > > > because we are actually scanning 3 times or div_u64(span_avg, avg_cost) - 1 > > times but not 4 times or div_u64(span_avg, avg_cost) times. > > It still is confusing, because > 4*span -> nr = avg/span, very much > implies we want to bottom out at 4. > > > this is not confusing at all. the only thing which is confusing is the original > > code. > > But yes, it seems a whole lot of confusion stacked together. Let make it > sane and say that we do 'nr' iterations, because clearly that was the > intent :-) yes. It seems this is much more sensible to do iterations in the number of nr rather than nr-1. we can achieve this goal by two ways: (1) nr--; for_each_cpu_wrap(cpu, cpus, target + 1) { _select_idle_cpu().... if (!nr--) return; } (2) for_each_cpu_wrap(cpu, cpus, target + 1) { _select_idle_cpu().... if (!--nr) return; } it seems the second way is still better as we don't need the "nr--" before for_each_cpu_wrap() ? Thanks Barry ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-11-24 20:49 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-11-24 9:15 [PATCH v2] sched/fair: Remove the cost of a redundant cpumask_next_wrap in select_idle_cpu Barry Song 2021-11-24 11:13 ` Peter Zijlstra 2021-11-24 11:49 ` Barry Song 2021-11-24 11:57 ` Barry Song 2021-11-24 12:02 ` Barry Song 2021-11-24 15:02 ` Peter Zijlstra 2021-11-24 20:49 ` Barry Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox