Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving


From: Alex Shi <alex.shi@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"fweisbec@gmail.com" <fweisbec@gmail.com>,
	"linux@arm.linux.org.uk" <linux@arm.linux.org.uk>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"fenghua.yu@intel.com" <fenghua.yu@intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"pjt@google.com" <pjt@google.com>,
	"fengguang.wu@intel.com" <fengguang.wu@intel.com>,
	"james.hogan@imgtec.com" <james.hogan@imgtec.com>,
	"jason.low2@hp.com" <jason.low2@hp.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"hanjun.guo@linaro.org" <hanjun.guo@linaro.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
Date: Wed, 08 Jan 2014 22:15:54 +0800
Message-ID: <52CD5D9A.30604@linaro.org>
In-Reply-To: <20140107125930.GW31570@twins.programming.kicks-ass.net>

On 01/07/2014 08:59 PM, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 12:55:18PM +0000, Morten Rasmussen wrote:
>> My understanding is that should_we_balance() decides which cpu is
>> eligible for doing the load balancing for a given domain (and the
>> domains above). That is, only one cpu in a group is allowed to load
>> balance between the local group and other groups. That cpu would
>> therefore be responsible for pulling enough load that the groups are
>> balanced even if it means temporarily overloading itself. The other cpus
>> in the group will take care of load balancing the extra load within the
>> local group later.

Thanks to both of you for your comments and explanations! :)

I know this patch's change is debatable and my attempt is not well tuned yet, but I believe I am heading in the right direction. :) Let me explain the rationale for this patch again.

First, cpu_load already includes historical load information, so repeatedly decaying and reusing that history makes little sense. Also, the old source_load()/target_load() choose between the historical load and the current load purely by taking the min or max, which lacks a good justification too.
Second, source_load()/target_load() already apply a bias, yet imbalance_pct is still used as a final check when finding the idlest/busiest group. This is redundant: if the source/target bias is already taken into account, we should not apply imbalance_pct on top of it.
And last, imbalance_pct is overused now that CPU core counts are increasing quickly. For example, in find_busiest_group(), assume a domain with 2 groups, each group having 8 CPU cores.
    The target group gets a bias of 8 * (imbalance_pct - 100)
				= 8 * (125 - 100) = 200 (percent).
    Since each CPU is biased by 0.25 times its load, the 8 CPUs together bias the groups by 2 times the average CPU load, which is too much. If there are only 2 cores per group (the common case when this code was introduced), the bias is just 2 * 25 / 100 = 0.5 times the average CPU load.

So this patchset removes the cpu_load array to avoid repeated history decay, reorganizes the imbalance_pct usage to avoid a redundant balancing bias, and reduces the bias between CPU groups (perhaps not yet well tuned). :)

> 
> Correct.
> 
>> I may have missed something, but I don't understand the reason for the
>> performance improvements that you are reporting. I see better numbers
>> for a few benchmarks, but I still don't understand why the code makes
>> sense after the cleanup. If we don't understand why it works, we cannot
>> be sure that it doesn't harm other benchmarks. There is always a chance
>> that we miss something but, IMHO, not having any idea to begin with
>> increases the chances for problems later significantly. So why not get
>> to the bottom of the problem of cleaning up cpu_load?
>>
>> Have you done more extensive benchmarking? Have you seen any regressions
>> in other benchmarks?
> 
> I only remember hackbench numbers, and that generally fares well with a
> more aggressive balancer: since it has no actual work to speak of, the
> migration penalty is very low, and because there's a metric ton of tasks,
> the aggressive leveling makes for more coherent 'throughput'.

I just tested hackbench on ARM. With more test runs and a rebase onto 3.13-rc6, the variation increased and the benefit became unclear. Still, I found no regression in either the perf-stat cpu-migrations count or the real execution time.

The 0day performance testing should have covered kbuild, hackbench, aim7, dbench, tbench, sysbench, netperf, etc. No regression was found.

The 0day performance testing also caught a reduction in CPU migrations on a KVM guest:
https://lkml.org/lkml/2013/12/21/135 

Another benchmark benefited from the old patchset, as the 0day performance testing results show:

https://lkml.org/lkml/2013/12/4/102

Hi Alex,

We observed a 150% performance gain in the vm-scalability/300s-mmap-pread-seq
testcase with this patch applied. Here is the list of changes we have so far:

testbox : brickland
testcase: vm-scalability/300s-mmap-pread-seq


    f1b6442c7dd12802e622      d70495ef86f397816d73  
       (parent commit)            (this commit)
------------------------  ------------------------  
             26393249.80      +150.9%  66223933.60  vm-scalability.throughput

                  225.12       -49.9%       112.75  time.elapsed_time
                36333.40       -90.7%      3392.20  vmstat.system.cs
                    2.40      +375.0%        11.40  vmstat.cpu.id
              3770081.60       -97.7%     87673.40  time.major_page_faults
              3975276.20       -97.0%    117409.60  time.voluntary_context_switches
                    3.05      +301.7%        12.24  iostat.cpu.idle
                21118.41       -70.3%      6277.19  time.system_time
                   18.40      +130.4%        42.40  vmstat.cpu.us
                   77.00       -41.3%        45.20  vmstat.cpu.sy
                47459.60       -31.3%     32592.20  vmstat.system.in
                82435.40       -12.1%     72443.60  time.involuntary_context_switches
                 5128.13       +14.0%      5848.30  time.user_time
                11656.20        -7.8%     10745.60  time.percent_of_cpu_this_job_got
           1069997484.80        +0.3% 1073679919.00 time.minor_page_faults

Btw, the latest patchset includes more cleanups:
	git@github.com:alexshi/power-scheduling.git noload
I guess Fengguang's 0day performance testing is running on it now.

-- 
Thanks
    Alex


Thread overview: 34+ messages
2013-12-03  9:05 [PATCH 0/4] sched: remove cpu_load decay Alex Shi
2013-12-03  9:05 ` [PATCH 1/4] sched: shortcut to remove load_idx Alex Shi
2013-12-03  9:05 ` [PATCH 2/4] sched: remove rq->cpu_load[load_idx] array Alex Shi
2013-12-03  9:05 ` [PATCH 3/4] sched: clean up cpu_load update Alex Shi
2013-12-03  9:05 ` [PATCH 4/4] sched: bias to target cpu load to reduce task moving Alex Shi
2013-12-04  9:06   ` Yuanhan Liu
2013-12-04 11:25     ` Alex Shi
2013-12-17 14:10   ` Morten Rasmussen
2013-12-17 15:38     ` Peter Zijlstra
2013-12-19 13:34       ` Alex Shi
2013-12-20 11:19         ` Morten Rasmussen
2013-12-20 14:45           ` Alex Shi
2013-12-25 14:58           ` Alex Shi
2014-01-02 16:04             ` Morten Rasmussen
2014-01-06 13:35               ` Alex Shi
2014-01-07 12:55                 ` Morten Rasmussen
2014-01-07 12:59                   ` Peter Zijlstra
2014-01-07 13:15                     ` Peter Zijlstra
2014-01-07 13:32                       ` Vincent Guittot
2014-01-07 13:40                         ` Peter Zijlstra
2014-01-07 15:16                       ` Morten Rasmussen
2014-01-07 20:37                         ` Peter Zijlstra
2014-01-08 14:15                     ` Alex Shi [this message]
2013-12-03 10:26 ` [PATCH 0/4] sched: remove cpu_load decay Peter Zijlstra
2013-12-10  1:04   ` Alex Shi
2013-12-10  1:06     ` Paul Turner
2013-12-13 19:50     ` bsegall
2013-12-14 12:53       ` Alex Shi
2013-12-13 20:03 ` Peter Zijlstra
2013-12-14 13:27   ` Alex Shi
2013-12-17 14:04     ` Morten Rasmussen
2013-12-17 15:37       ` Peter Zijlstra
2013-12-17 18:12         ` Morten Rasmussen
2013-12-20 14:43           ` Alex Shi
