public inbox for linux-kernel@vger.kernel.org
From: Parth Shah <parth@linux.ibm.com>
To: subhra mazumdar <subhra.mazumdar@oracle.com>,
	linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mingo@redhat.com, tglx@linutronix.de,
	steven.sistare@oracle.com, dhaval.giani@oracle.com,
	daniel.lezcano@linaro.org, vincent.guittot@linaro.org,
	viresh.kumar@linaro.org, tim.c.chen@linux.intel.com,
	mgorman@techsingularity.net
Subject: Re: [PATCH v3 3/7] sched: rotate the cpu search window for better spread
Date: Sat, 29 Jun 2019 00:06:58 +0530
Message-ID: <f8b185ca-ae3e-8c54-5381-e9104be4954c@linux.ibm.com>
In-Reply-To: <20190627012919.4341-4-subhra.mazumdar@oracle.com>

Hi Subhra,

I ran your patch series on IBM POWER systems and this is what I have observed.

On 6/27/19 6:59 AM, subhra mazumdar wrote:
> Rotate the cpu search window for better spread of threads. This will ensure
> an idle cpu will quickly be found if one exists.
> 
> Signed-off-by: subhra mazumdar <subhra.mazumdar@oracle.com>
> ---
>  kernel/sched/fair.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b58f08f..c1ca88e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6188,7 +6188,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>  	u64 avg_cost, avg_idle;
>  	u64 time, cost;
>  	s64 delta;
> -	int cpu, limit, floor, nr = INT_MAX;
> +	int cpu, limit, floor, target_tmp, nr = INT_MAX;
> 
>  	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
>  	if (!this_sd)
> @@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>  		}
>  	}
> 
> +	if (per_cpu(next_cpu, target) != -1)
> +		target_tmp = per_cpu(next_cpu, target);
> +	else
> +		target_tmp = target;
> +
>  	time = local_clock();
> 
> -	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
> +	for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
> +		per_cpu(next_cpu, target) = cpu;

This hurts cache hotness.
AFAIU, in most cases `target` is the prev_cpu of the task being woken up, and
selecting an idle CPU nearest to prev_cpu is favorable.
But since next_cpu is tracked per target CPU and not per task, the search fails
to find the nearest idle CPU when the task is woken up after another task has
already advanced next_cpu for the same target.

Consider the scenario below:
===========================
- System: 44 cores, 88 CPUs
- 44 CPU-intensive tasks, each pinned to one CPU of a different core, so every
  core has a busy CPU. This makes 'select_idle_core' return -1.
- Consider the timeline shown below:
  - Task T1 runs for time 0-5 on CPU0.
  - Then task T2 runs for time 6-10 on CPU0.
  - T1 wakes at time 7 with target=0, and the search sets
    per_cpu(next_cpu, 0) = 4 (let's say CPUs 0-3 are busy at the time).
  - So T1 runs for time 7-12 on CPU4.
  - Meanwhile, T2 wakes at time 11 with target=0, but per_cpu(next_cpu, 0) is 4.
    So the search starts from CPU4 and ends up at CPU8 or so, even though CPU0
    is free at that time.
  - This drifts further away from prev_cpu on each such iteration, until the
    search wraps around after 44 CPUs.


              ^T1           T1$   ^T2          T2$
CPU 0          |             |     |            |      
       -----------------------------------------------------------------------------
               0             5     6            10                       time----->

                                       ^T1                T1$
CPU 4                                   |                  |
       -----------------------------------------------------------------------------
                                        7                  12            time----->

                                                      ^T2          T2$
CPU 8                                                  |            |
       -----------------------------------------------------------------------------
                                                       11                time------>

Symbols: ^Tn: Task Tn wake-up,  Tn$: task Tn sleeps

The above example indicates that both tasks T1 and T2 lose cache hotness in
further iterations.
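
To make the drift concrete, here is a small userspace model of the two scan
strategies. It is only a sketch under stated assumptions, not kernel code: the
next_cpu array stands in for the per-CPU variable from patch 2/7, the whole
system is treated as one scan domain, the nr limit and the cpus_allowed check
are left out, and the set of busy CPUs at each wakeup (0-3 at time 7, 4-7 at
time 11) is assumed for illustration. The helper names scan_rotating,
scan_from_target and replay_scenario are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 88

/* Userspace stand-in for the per-CPU next_cpu variable (assumption). */
static int next_cpu[NR_CPUS];

/*
 * Model of the patched scan: start at next_cpu[target] if it is set,
 * otherwise at target, and record every visited CPU in next_cpu[target].
 * Returns the first idle CPU found, or -1 if there is none.
 */
static int scan_rotating(const bool *idle, int target)
{
	assert(target >= 0 && target < NR_CPUS);

	int start = next_cpu[target] != -1 ? next_cpu[target] : target;

	for (int i = 0; i < NR_CPUS; i++) {
		int cpu = (start + i) % NR_CPUS;

		next_cpu[target] = cpu;
		if (idle[cpu])
			return cpu;
	}
	return -1;
}

/* Model of the scan without rotation: always start at target. */
static int scan_from_target(const bool *idle, int target)
{
	assert(target >= 0 && target < NR_CPUS);

	for (int i = 0; i < NR_CPUS; i++) {
		int cpu = (target + i) % NR_CPUS;

		if (idle[cpu])
			return cpu;
	}
	return -1;
}

/* Replay the two wakeups from the timeline, both with target = 0. */
static void replay_scenario(int *t1_cpu, int *t2_rotating, int *t2_plain)
{
	bool idle[NR_CPUS];

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		next_cpu[cpu] = -1;

	/* Time 7: T1 wakes while CPUs 0-3 are busy (assumed). */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		idle[cpu] = cpu > 3;
	*t1_cpu = scan_rotating(idle, 0);

	/* Time 11: T2 wakes; CPU0 is idle again, CPUs 4-7 are busy (assumed). */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		idle[cpu] = cpu < 4 || cpu > 7;
	*t2_plain = scan_from_target(idle, 0);
	*t2_rotating = scan_rotating(idle, 0);
}
```

Under these assumptions, replay_scenario() places T1 on CPU4, and for T2 the
rotating scan picks CPU8 while a scan that starts at target picks CPU0, which
is the prev_cpu of T2 and idle at that point.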


>  		if (!--nr)
>  			return -1;
>  		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
> 


Best
Parth


