From: Peter Zijlstra <peterz@infradead.org>
To: lkp@lists.01.org
Subject: Re: [sched] 143e1e28cb4: +17.9% aim7.jobs-per-min, -9.7% hackbench.throughput
Date: Mon, 11 Aug 2014 15:33:52 +0200
Message-ID: <20140811133352.GC9918@twins.programming.kicks-ass.net>
In-Reply-To: <20140810105413.GA29451@localhost>

[-- Attachment #1: Type: text/plain, Size: 4363 bytes --]

On Sun, Aug 10, 2014 at 06:54:13PM +0800, Fengguang Wu wrote:
> This view may be easier to read, by grouping the metrics by test case.
> 
> test case: brickland1/aim7/6000-page_test

OK, I have a similar system to the brickland thing (slightly different
configuration, but should be close enough).

Now, do you have a description of each test case somewhere? In
particular, it would be good to have a small annotation showing which
direction is better.

> 
>     128529 ± 1%     +17.9%     151594 ± 0%  TOTAL aim7.jobs-per-min

Jobs per minute: higher is better, so no worries there.

>     582269 ±14%     -55.6%     258617 ±16%  TOTAL softirqs.SCHED
>     993654 ± 2%     -19.9%     795962 ± 3%  TOTAL softirqs.RCU
>   15865125 ± 1%     -15.0%   13485882 ± 1%  TOTAL softirqs.TIMER

>   59366697 ± 3%     -46.1%   32017187 ± 7%  TOTAL cpuidle.C1-IVT.time
>      54543 ±11%     -37.2%      34252 ±16%  TOTAL cpuidle.C1-IVT.usage
>      19542 ± 9%     -38.3%      12057 ± 4%  TOTAL cpuidle.C1E-IVT.usage
>   49527464 ± 6%     -32.4%   33488833 ± 4%  TOTAL cpuidle.C1E-IVT.time
>      76064 ± 3%     -32.2%      51572 ± 6%  TOTAL cpuidle.C6-IVT.usage

Less idle time: might be good if the work is CPU-bound, might be bad if
not; hard to say.

>       2.82 ± 3%     +21.9%       3.43 ± 4%  TOTAL turbostat.%pc2
>       4.40 ± 2%     +22.0%       5.37 ± 4%  TOTAL turbostat.%c6
>      15.75 ± 1%      -3.4%      15.21 ± 0%  TOTAL turbostat.RAM_W

>    3150464 ± 2%     -24.2%    2387551 ± 3%  TOTAL time.voluntary_context_switches

Typically, fewer context switches (ctxsw) is better.

>        281 ± 1%     -15.1%        238 ± 0%  TOTAL time.elapsed_time
>      29294 ± 1%     -14.3%      25093 ± 0%  TOTAL time.system_time

Less time spent (on presumably the same work) is better.

>    4529818 ± 1%      -8.8%    4129398 ± 1%  TOTAL time.involuntary_context_switches

Fewer preemptions, also generally better.

>      10655 ± 0%      +1.4%      10802 ± 0%  TOTAL time.percent_of_cpu_this_job_got

Seems an improvement; not sure.

There are many more stats, but from the above it looks like an overall
'win'; or am I reading this wrong?


Now I think I see why: we've significantly reduced the load-balancing
frequency on this machine due to:


-#define SD_SIBLING_INIT (struct sched_domain) {                                \
-       .min_interval           = 1,                                    \
-       .max_interval           = 2,                                    \


-#define SD_MC_INIT (struct sched_domain) {                             \
-       .min_interval           = 1,                                    \
-       .max_interval           = 4,                                    \


-#define SD_CPU_INIT (struct sched_domain) {                            \
-       .min_interval           = 1,                                    \
-       .max_interval           = 4,                                    \


        *sd = (struct sched_domain){
                .min_interval           = sd_weight,
                .max_interval           = 2*sd_weight,

This significantly increased both the min and max intervals for all
domains involved.

That said, I think we might want to do something like the patch below; I
can imagine that reducing load balancing too much will negatively impact
other workloads.

Maybe slightly modified to make sure the first domain has a min_interval
of 1.

---
 kernel/sched/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1211575a2208..67ed5d854da1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6049,8 +6049,8 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 		sd_flags &= ~TOPOLOGY_SD_FLAGS;
 
 	*sd = (struct sched_domain){
-		.min_interval		= sd_weight,
-		.max_interval		= 2*sd_weight,
+		.min_interval		= max(1, sd_weight/2),
+		.max_interval		= sd_weight,
 		.busy_factor		= 32,
 		.imbalance_pct		= 125,
 
@@ -6076,7 +6076,7 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 					,
 
 		.last_balance		= jiffies,
-		.balance_interval	= sd_weight,
+		.balance_interval	= max(1, sd_weight/2),
 		.smt_gain		= 0,
 		.max_newidle_lb_cost	= 0,
 		.next_decay_max_lb_cost	= jiffies,

[-- Attachment #2: attachment.sig --]
[-- Type: application/pgp-signature, Size: 836 bytes --]