All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
Cc: Parth Shah <parth@linux.ibm.com>,
	Michael Neuling <mikey@neuling.org>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Rik van Riel <riel@surriel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linuxppc-dev@lists.ozlabs.org,
	LKML <linux-kernel@vger.kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Valentin Schneider <valentin.schneider@arm.com>
Subject: Re: [RFC/PATCH] powerpc/smp: Add SD_SHARE_PKG_RESOURCES flag to MC sched-domain
Date: Fri, 2 Apr 2021 13:06:59 +0530	[thread overview]
Message-ID: <20210402073659.GA651@in.ibm.com> (raw)
In-Reply-To: <1617341874-1205-1-git-send-email-ego@linux.vnet.ibm.com>

(Missed cc'ing Cc Peter in the original posting)

On Fri, Apr 02, 2021 at 11:07:54AM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> On POWER10 systems, the L2 cache is at the SMT4 small core level. The
> following commits ensure that L2 cache gets correctly discovered and
> the Last-Level-Cache domain (LLC) is set to the SMT sched-domain.
> 
>     790a166 powerpc/smp: Parse ibm,thread-groups with multiple properties
>     1fdc1d6 powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map
>     fbd2b67 powerpc/smp: Rename init_thread_group_l1_cache_map() to make
>                          it generic
>     538abe powerpc/smp: Add support detecting thread-groups sharing L2 cache
>     0be4763 powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
> 
> However, with the LLC now on the SMT sched-domain, we are seeing some
> regressions in the performance of applications that requires
> single-threaded performance. The reason for this is as follows:
> 
> Prior to the change (we call this P9-sched below), the sched-domain
> hierarchy was:
> 
> 	  SMT (SMT4) --> CACHE (SMT8)[LLC] --> MC (Hemisphere) --> DIE
> 
> where the CACHE sched-domain is defined to be the Last Level Cache (LLC).
> 
> On the upstream kernel, with the aforementioned commmits (P10-sched),
> the sched-domain hierarchy is:
> 
>     	  SMT (SMT4)[LLC] --> MC (Hemisphere) --> DIE
> 
> with the SMT sched-domain as the LLC.
> 
> When the scheduler tries to wakeup a task, it chooses between the
> waker-CPU and the wakee's previous-CPU. Suppose this choice is called
> the "target", then in the target's LLC domain, the scheduler
> 
> a) tries to find an idle core in the LLC. This helps exploit the
>    SMT folding that the wakee task can benefit from. If an idle
>    core is found, the wakee is woken up on it.
> 
> b) Failing to find an idle core, the scheduler tries to find an idle
>    CPU in the LLC. This helps minimise the wakeup latency for the
>    wakee since it gets to run on the CPU immediately.
> 
> c) Failing this, it will wake it up on target CPU.
> 
> Thus, with P9-sched topology, since the CACHE domain comprises of two
> SMT4 cores, there is a decent chance that we get an idle core, failing
> which there is a relatively higher probability of finding an idle CPU
> among the 8 threads in the domain.
> 
> However, in P10-sched topology, since the SMT domain is the LLC and it
> contains only a single SMT4 core, the probability that we find that
> core to be idle is less. Furthermore, since there are only 4 CPUs to
> search for an idle CPU, there is lower probability that we can get an
> idle CPU to wake up the task on.
> 
> Thus applications which require single threaded performance will end
> up getting woken up on potentially busy core, even though there are
> idle cores in the system.
> 
> To remedy this, this patch proposes that the LLC be moved to the MC
> level which is a group of cores in one half of the chip.
> 
>       SMT (SMT4) --> MC (Hemisphere)[LLC] --> DIE
> 
> While there is no cache being shared at this level, this is still the
> level where some amount of cache-snooping takes place and it is
> relatively faster to access the data from the caches of the cores
> within this domain. With this change, we no longer see regressions on
> P10 for applications which require single threaded performance.
> 
> The patch also improves the tail latencies on schbench and the
> usecs/op on "perf bench sched pipe"
> 
> On a 10 core P10 system with 80 CPUs,
> 
> schbench
> ============
> (https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git/)
> 
> Values : Lower the better.
> 99th percentile is the tail latency.
> 
> 
> 99th percentile
> ~~~~~~~~~~~~~~~~~~
> No. messenger
> threads       5.12-rc4    5.12-rc4
>               P10-sched   MC-LLC
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1             70 us         85 us
> 2             81 us        101 us
> 3             92 us        107 us
> 4             96 us        110 us
> 5            103 us        123 us
> 6           3412 us ---->  122 us
> 7           1490 us        136 us
> 8           6200 us       3572 us
> 
> 
> Hackbench
> ============
> (perf bench sched pipe)
> values: lower the better
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> No. of
> parallel
> instances   5.12-rc4       5.12-rc4
>             P10-sched      MC-LLC 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1           24.04 us/op    18.72 us/op 
> 2           24.04 us/op    18.65 us/op 
> 4           24.01 us/op    18.76 us/op 
> 8           24.10 us/op    19.11 us/op 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 5a4d59a..c75dbd4 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -976,6 +976,13 @@ static bool has_coregroup_support(void)
>  	return coregroup_enabled;
>  }
> 
> +static int powerpc_mc_flags(void)
> +{
> +	if(has_coregroup_support())
> +		return SD_SHARE_PKG_RESOURCES;
> +	return 0;
> +}
> +
>  static const struct cpumask *cpu_mc_mask(int cpu)
>  {
>  	return cpu_coregroup_mask(cpu);
> @@ -986,7 +993,7 @@ static const struct cpumask *cpu_mc_mask(int cpu)
>  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
>  	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> -	{ cpu_mc_mask, SD_INIT_NAME(MC) },
> +	{ cpu_mc_mask, powerpc_mc_flags, SD_INIT_NAME(MC) },
>  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
>  	{ NULL, },
>  };
> -- 
> 1.9.4
> 

WARNING: multiple messages have this Message-ID (diff)
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	Michael Neuling <mikey@neuling.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Anton Blanchard <anton@ozlabs.org>,
	Parth Shah <parth@linux.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linuxppc-dev@lists.ozlabs.org,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC/PATCH] powerpc/smp: Add SD_SHARE_PKG_RESOURCES flag to MC sched-domain
Date: Fri, 2 Apr 2021 13:06:59 +0530	[thread overview]
Message-ID: <20210402073659.GA651@in.ibm.com> (raw)
In-Reply-To: <1617341874-1205-1-git-send-email-ego@linux.vnet.ibm.com>

(Missed cc'ing Cc Peter in the original posting)

On Fri, Apr 02, 2021 at 11:07:54AM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> On POWER10 systems, the L2 cache is at the SMT4 small core level. The
> following commits ensure that L2 cache gets correctly discovered and
> the Last-Level-Cache domain (LLC) is set to the SMT sched-domain.
> 
>     790a166 powerpc/smp: Parse ibm,thread-groups with multiple properties
>     1fdc1d6 powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map
>     fbd2b67 powerpc/smp: Rename init_thread_group_l1_cache_map() to make
>                          it generic
>     538abe powerpc/smp: Add support detecting thread-groups sharing L2 cache
>     0be4763 powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
> 
> However, with the LLC now on the SMT sched-domain, we are seeing some
> regressions in the performance of applications that requires
> single-threaded performance. The reason for this is as follows:
> 
> Prior to the change (we call this P9-sched below), the sched-domain
> hierarchy was:
> 
> 	  SMT (SMT4) --> CACHE (SMT8)[LLC] --> MC (Hemisphere) --> DIE
> 
> where the CACHE sched-domain is defined to be the Last Level Cache (LLC).
> 
> On the upstream kernel, with the aforementioned commmits (P10-sched),
> the sched-domain hierarchy is:
> 
>     	  SMT (SMT4)[LLC] --> MC (Hemisphere) --> DIE
> 
> with the SMT sched-domain as the LLC.
> 
> When the scheduler tries to wakeup a task, it chooses between the
> waker-CPU and the wakee's previous-CPU. Suppose this choice is called
> the "target", then in the target's LLC domain, the scheduler
> 
> a) tries to find an idle core in the LLC. This helps exploit the
>    SMT folding that the wakee task can benefit from. If an idle
>    core is found, the wakee is woken up on it.
> 
> b) Failing to find an idle core, the scheduler tries to find an idle
>    CPU in the LLC. This helps minimise the wakeup latency for the
>    wakee since it gets to run on the CPU immediately.
> 
> c) Failing this, it will wake it up on target CPU.
> 
> Thus, with P9-sched topology, since the CACHE domain comprises of two
> SMT4 cores, there is a decent chance that we get an idle core, failing
> which there is a relatively higher probability of finding an idle CPU
> among the 8 threads in the domain.
> 
> However, in P10-sched topology, since the SMT domain is the LLC and it
> contains only a single SMT4 core, the probability that we find that
> core to be idle is less. Furthermore, since there are only 4 CPUs to
> search for an idle CPU, there is lower probability that we can get an
> idle CPU to wake up the task on.
> 
> Thus applications which require single threaded performance will end
> up getting woken up on potentially busy core, even though there are
> idle cores in the system.
> 
> To remedy this, this patch proposes that the LLC be moved to the MC
> level which is a group of cores in one half of the chip.
> 
>       SMT (SMT4) --> MC (Hemisphere)[LLC] --> DIE
> 
> While there is no cache being shared at this level, this is still the
> level where some amount of cache-snooping takes place and it is
> relatively faster to access the data from the caches of the cores
> within this domain. With this change, we no longer see regressions on
> P10 for applications which require single threaded performance.
> 
> The patch also improves the tail latencies on schbench and the
> usecs/op on "perf bench sched pipe"
> 
> On a 10 core P10 system with 80 CPUs,
> 
> schbench
> ============
> (https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git/)
> 
> Values : Lower the better.
> 99th percentile is the tail latency.
> 
> 
> 99th percentile
> ~~~~~~~~~~~~~~~~~~
> No. messenger
> threads       5.12-rc4    5.12-rc4
>               P10-sched   MC-LLC
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1             70 us         85 us
> 2             81 us        101 us
> 3             92 us        107 us
> 4             96 us        110 us
> 5            103 us        123 us
> 6           3412 us ---->  122 us
> 7           1490 us        136 us
> 8           6200 us       3572 us
> 
> 
> Hackbench
> ============
> (perf bench sched pipe)
> values: lower the better
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> No. of
> parallel
> instances   5.12-rc4       5.12-rc4
>             P10-sched      MC-LLC 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 1           24.04 us/op    18.72 us/op 
> 2           24.04 us/op    18.65 us/op 
> 4           24.01 us/op    18.76 us/op 
> 8           24.10 us/op    19.11 us/op 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 5a4d59a..c75dbd4 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -976,6 +976,13 @@ static bool has_coregroup_support(void)
>  	return coregroup_enabled;
>  }
> 
> +static int powerpc_mc_flags(void)
> +{
> +	if(has_coregroup_support())
> +		return SD_SHARE_PKG_RESOURCES;
> +	return 0;
> +}
> +
>  static const struct cpumask *cpu_mc_mask(int cpu)
>  {
>  	return cpu_coregroup_mask(cpu);
> @@ -986,7 +993,7 @@ static const struct cpumask *cpu_mc_mask(int cpu)
>  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
>  	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> -	{ cpu_mc_mask, SD_INIT_NAME(MC) },
> +	{ cpu_mc_mask, powerpc_mc_flags, SD_INIT_NAME(MC) },
>  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
>  	{ NULL, },
>  };
> -- 
> 1.9.4
> 

  reply	other threads:[~2021-04-02  7:37 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-02  5:37 [RFC/PATCH] powerpc/smp: Add SD_SHARE_PKG_RESOURCES flag to MC sched-domain Gautham R. Shenoy
2021-04-02  5:37 ` Gautham R. Shenoy
2021-04-02  7:36 ` Gautham R Shenoy [this message]
2021-04-02  7:36   ` Gautham R Shenoy
2021-04-12  6:24 ` Srikar Dronamraju
2021-04-12  6:24   ` Srikar Dronamraju
2021-04-12  9:37   ` Mel Gorman
2021-04-12  9:37     ` Mel Gorman
2021-04-12 10:06     ` Valentin Schneider
2021-04-12 10:06       ` Valentin Schneider
2021-04-12 10:48       ` Mel Gorman
2021-04-12 10:48         ` Mel Gorman
2021-04-19  6:14         ` Gautham R Shenoy
2021-04-19  6:14           ` Gautham R Shenoy
2021-04-12 12:21     ` Vincent Guittot
2021-04-12 12:21       ` Vincent Guittot
2021-04-12 15:24       ` Mel Gorman
2021-04-12 15:24         ` Mel Gorman
2021-04-12 16:33         ` Michal Suchánek
2021-04-12 16:33           ` Michal Suchánek
2021-04-14  7:02           ` Gautham R Shenoy
2021-04-14  7:02             ` Gautham R Shenoy
2021-04-13  7:10         ` Vincent Guittot
2021-04-13  7:10           ` Vincent Guittot
2021-04-14  7:00         ` Gautham R Shenoy
2021-04-14  7:00           ` Gautham R Shenoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210402073659.GA651@in.ibm.com \
    --to=ego@linux.vnet.ibm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@techsingularity.net \
    --cc=mikey@neuling.org \
    --cc=npiggin@gmail.com \
    --cc=parth@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.