group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)

public inbox for linux-kernel@vger.kernel.org

* group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
@ 2016-09-26 10:42 Christian Borntraeger
  2016-09-26 10:56 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Borntraeger @ 2016-09-26 10:42 UTC (permalink / raw)
  To: Yuyang Du; +Cc: Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List

Folks,

I have seen big scalability degradations since 4.3 (bisected to 9d89c257d
"sched/fair: Rewrite runnable load and utilization average tracking").
This has not been fixed by subsequent patches, e.g. the ones that try to
fix this for interactive workloads.

The problem is only visible for sleep/wakeup-heavy workloads that are
part of a scheduler group (e.g. a sysbench OLTP run inside a KVM guest,
as libvirt puts KVM guests into cgroup instances).

For example, a simple sysbench OLTP run with MySQL inside a KVM guest with
16 CPUs backed by 8 host CPUs (16 host threads) scales worse (scaling up
inside a guest by running multiple instances). These are the numbers of
events per second.
Unmounting /sys/fs/cgroup/cpu,cpuacct (thus forcing libvirt to not
use group scheduling for KVM guests) makes the behaviour much better:

instances	group		nogroup
1		3406		3002
2		5078		4940
3		6017		6760
4		6471		8216 (+27%)
5		6716		9196
6		6976		9783
7		7127		10170
8		7399		10385 (+40%)
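
For reference, the percentages in the table can be rechecked with a
minimal Python sketch. The figures are copied from the table above; the
names RESULTS and nogroup_gain_percent are made up for this illustration
and are not part of the original measurements.

```python
# Events per second from the table above: (instances, group, nogroup).
RESULTS = [
    (1, 3406, 3002),
    (2, 5078, 4940),
    (3, 6017, 6760),
    (4, 6471, 8216),
    (5, 6716, 9196),
    (6, 6976, 9783),
    (7, 7127, 10170),
    (8, 7399, 10385),
]


def nogroup_gain_percent(group, nogroup):
    """Throughput change of nogroup relative to group, in percent."""
    return (nogroup - group) / group * 100.0


# Print one row per instance count with the relative difference appended.
for instances, group, nogroup in RESULTS:
    gain = nogroup_gain_percent(group, nogroup)
    print(f"{instances}\t{group}\t{nogroup}\t{gain:+.1f}%")
```

With one or two instances group scheduling is slightly ahead; from three
instances on the nogroup column pulls away.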

Before 9d89c257d ("sched/fair: Rewrite runnable load and utilization
average tracking") there was basically no difference between group
and non-group scheduling. These numbers are with 4.7; older kernels after
9d89c257d show a similar difference.

The bad thing is that there is a lot of idle CPU capacity in the host
when this happens, so the scheduler seems not to realize that this
workload could use more CPUs in the host.

I tried some experiments, but I have not found a hack that "fixes" the
degradation, which would give me an indication of which part of the code
is broken. So are there any ideas? Is the estimated group load
calculation just not fast enough for sleep/wakeup workloads?

Christian

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 10:42 group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking) Christian Borntraeger
@ 2016-09-26 10:56 ` Peter Zijlstra
  2016-09-26 11:42   ` Christian Borntraeger
  2016-09-26 12:25   ` Vincent Guittot
  0 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2016-09-26 10:56 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On Mon, Sep 26, 2016 at 12:42:22PM +0200, Christian Borntraeger wrote:
> Folks,
> 
> I have seen big scalability degredations sind 4.3 (bisected 9d89c257d
> sched/fair: Rewrite runnable load and utilization average tracking)
> This has not been fixed by subsequent patches,e.g. the ones that try to
> fix this for interactive workload.
> 
> The problem is only visible for sleep/wakeup heavy workload which must
> be part of the scheduler group (e.g. a sysbench OLTP inside a KVM guest
> as libvirt will put KVM guests into cgroup instances).
> 
> For example a simple sysbench oltp with mysql inside a KVM guests with
> 16 CPUs backed by 8 host cpus (16 host threads) scales less (scale up
> inside a guest, having multiple instances). This is the numbers of
> events per second.
> Unmounting /sys/fs/cgroup/cpu,cpuacct (thus forcing libvirt to not
> use group scheduling for KVM guests) makes the behaviour much better:
> 
> 
> instances	group		nogroup
> 1		3406		3002
> 2		5078		4940
> 3		6017		6760
> 4		6471		8216 (+27%)
> 5		6716		9196
> 6		6976		9783
> 7		7127		10170
> 8		7399		10385 (+40%)
> 
> before 9d89c257d ("sched/fair: Rewrite runnable load and utilization
> average tracking") there was basically no difference between group
> or non-group scheduling. These numbers are with 4.7, older kernels after
> 9d89c257d show a similar difference.
> 
> The bad thing is that there is a lot of idle cpu power in the host
> when this happens so the scheduler seems to not realize that this
> workload could use more cpus in the host.
> 
> I tried some experiments , but I have not found a hack that "fixes" the
> degredation, which would give me an indication which part  of the code
> is broken. So are there any ideas? Is the estimated group load
> calculation just not fast enough for sleep/wakeup workload?

One of the differences in the old and new thing is being addressed by
these patches:

  https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@linaro.org

Could you see if those patches make a difference? If not, we'll have to
go poke elsewhere of course ;-)

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 10:56 ` Peter Zijlstra
@ 2016-09-26 11:42   ` Christian Borntraeger
  2016-09-26 11:53     ` Peter Zijlstra
  2016-09-26 12:25   ` Vincent Guittot
  1 sibling, 1 reply; 9+ messages in thread
From: Christian Borntraeger @ 2016-09-26 11:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On 09/26/2016 12:56 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 12:42:22PM +0200, Christian Borntraeger wrote:
>> Folks,
>>
>> I have seen big scalability degredations sind 4.3 (bisected 9d89c257d
>> sched/fair: Rewrite runnable load and utilization average tracking)
>> This has not been fixed by subsequent patches,e.g. the ones that try to
>> fix this for interactive workload.
>>
>> The problem is only visible for sleep/wakeup heavy workload which must
>> be part of the scheduler group (e.g. a sysbench OLTP inside a KVM guest
>> as libvirt will put KVM guests into cgroup instances).
>>
>> For example a simple sysbench oltp with mysql inside a KVM guests with
>> 16 CPUs backed by 8 host cpus (16 host threads) scales less (scale up
>> inside a guest, having multiple instances). This is the numbers of
>> events per second.
>> Unmounting /sys/fs/cgroup/cpu,cpuacct (thus forcing libvirt to not
>> use group scheduling for KVM guests) makes the behaviour much better:
>>
>>
>> instances	group		nogroup
>> 1		3406		3002
>> 2		5078		4940
>> 3		6017		6760
>> 4		6471		8216 (+27%)
>> 5		6716		9196
>> 6		6976		9783
>> 7		7127		10170
>> 8		7399		10385 (+40%)
>>
>> before 9d89c257d ("sched/fair: Rewrite runnable load and utilization
>> average tracking") there was basically no difference between group
>> or non-group scheduling. These numbers are with 4.7, older kernels after
>> 9d89c257d show a similar difference.
>>
>> The bad thing is that there is a lot of idle cpu power in the host
>> when this happens so the scheduler seems to not realize that this
>> workload could use more cpus in the host.
>>
>> I tried some experiments , but I have not found a hack that "fixes" the
>> degredation, which would give me an indication which part  of the code
>> is broken. So are there any ideas? Is the estimated group load
>> calculation just not fast enough for sleep/wakeup workload?
> 
> One of the differences in the old and new thing is being addressed by
> these patches:
> 
>   https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@linaro.org
> 
> Could you see if those patches make a difference? If not, we'll have to
> go poke elsewhere ofcourse ;-)

Those patches do not apply cleanly on v4.7, linux/master or next/master.
Is there a good branch to test these patches?

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 11:42   ` Christian Borntraeger
@ 2016-09-26 11:53     ` Peter Zijlstra
  2016-09-26 12:01       ` Christian Borntraeger
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2016-09-26 11:53 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On Mon, Sep 26, 2016 at 01:42:05PM +0200, Christian Borntraeger wrote:
> On 09/26/2016 12:56 PM, Peter Zijlstra wrote:

> > One of the differences in the old and new thing is being addressed by
> > these patches:
> > 
> >   https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@linaro.org
> > 
> > Could you see if those patches make a difference? If not, we'll have to
> > go poke elsewhere ofcourse ;-)
> 
> Those patches do not apply cleanly on v4.7, linux/master or next/master.
> Is there a good branch to test these patches?

They seemed to apply for me on tip/sched/core, I pushed out a branch for
you that has them on.

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/propagate

I didn't boot the result though; but they applied without issue.

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 11:53     ` Peter Zijlstra
@ 2016-09-26 12:01       ` Christian Borntraeger
  2016-09-26 12:10         ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Borntraeger @ 2016-09-26 12:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On 09/26/2016 01:53 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 01:42:05PM +0200, Christian Borntraeger wrote:
>> On 09/26/2016 12:56 PM, Peter Zijlstra wrote:
> 
>>> One of the differences in the old and new thing is being addressed by
>>> these patches:
>>>
>>>   https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@linaro.org
>>>
>>> Could you see if those patches make a difference? If not, we'll have to
>>> go poke elsewhere ofcourse ;-)
>>
>> Those patches do not apply cleanly on v4.7, linux/master or next/master.
>> Is there a good branch to test these patches?
> 
> They seemed to apply for me on tip/sched/core, I pushed out a branch for
> you that has them on.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/propagate
> 
> I didn't boot the result though; but they applied without issue.

They applied OK on next from 9/13. Things get even worse.
With this host configuration:

CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
0   0    0    0      0    0:0:0:0         yes    yes        0
1   0    0    0      0    1:1:1:1         yes    yes        1
2   0    0    0      1    2:2:2:2         yes    yes        2
3   0    0    0      1    3:3:3:3         yes    yes        3
4   0    0    1      2    4:4:4:4         yes    yes        4
5   0    0    1      2    5:5:5:5         yes    yes        5
6   0    0    1      3    6:6:6:6         yes    yes        6
7   0    0    1      3    7:7:7:7         yes    yes        7
8   0    0    1      4    8:8:8:8         yes    yes        8
9   0    0    1      4    9:9:9:9         yes    yes        9
10  0    0    1      5    10:10:10:10     yes    yes        10
11  0    0    1      5    11:11:11:11     yes    yes        11
12  0    0    1      6    12:12:12:12     yes    yes        12
13  0    0    1      6    13:13:13:13     yes    yes        13
14  0    0    1      7    14:14:14:14     yes    yes        14
15  0    0    1      7    15:15:15:15     yes    yes        15

the guest was running either on 0-3 or on 4-15, but never
used the full system. With group scheduling disabled everything was good
again. So it looks like this bug also has some dependency on the
host topology.
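
The split into 0-3 and 4-15 matches the SOCKET column of the listing. A
minimal Python sketch that groups the CPUs by socket is shown here for
illustration. It assumes the listing is whitespace-separated with a
header row (it looks like `lscpu -e` output, which is an assumption), and
the names TOPOLOGY and cpus_by_socket are hypothetical. Only the first
five columns are reproduced to keep it short.

```python
from collections import defaultdict

# First five columns of the host listing above (cache/online columns dropped).
TOPOLOGY = """\
CPU NODE BOOK SOCKET CORE
0   0    0    0      0
1   0    0    0      0
2   0    0    0      1
3   0    0    0      1
4   0    0    1      2
5   0    0    1      2
6   0    0    1      3
7   0    0    1      3
8   0    0    1      4
9   0    0    1      4
10  0    0    1      5
11  0    0    1      5
12  0    0    1      6
13  0    0    1      6
14  0    0    1      7
15  0    0    1      7
"""


def cpus_by_socket(listing):
    """Map each SOCKET id to the list of CPU ids that belong to it."""
    lines = listing.strip().splitlines()
    header = lines[0].split()
    cpu_col = header.index("CPU")
    socket_col = header.index("SOCKET")
    groups = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        groups[int(fields[socket_col])].append(int(fields[cpu_col]))
    return dict(groups)


for socket, cpus in sorted(cpus_by_socket(TOPOLOGY).items()):
    print(f"socket {socket}: {len(cpus)} CPUs ({cpus[0]}-{cpus[-1]})")
```

This reports 4 CPUs on socket 0 and 12 CPUs on socket 1, i.e. the two
sets the guest stayed within.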

Christian

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 12:01       ` Christian Borntraeger
@ 2016-09-26 12:10         ` Peter Zijlstra
  2016-09-26 12:49           ` Christian Borntraeger
  2016-09-26 14:12           ` Christian Borntraeger
  0 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2016-09-26 12:10 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On Mon, Sep 26, 2016 at 02:01:43PM +0200, Christian Borntraeger wrote:
> They applied ok on next from 9/13. Things go even worse.
> With this host configuration:
> 
> CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
> 0   0    0    0      0    0:0:0:0         yes    yes        0
> 1   0    0    0      0    1:1:1:1         yes    yes        1
> 2   0    0    0      1    2:2:2:2         yes    yes        2
> 3   0    0    0      1    3:3:3:3         yes    yes        3
> 4   0    0    1      2    4:4:4:4         yes    yes        4
> 5   0    0    1      2    5:5:5:5         yes    yes        5
> 6   0    0    1      3    6:6:6:6         yes    yes        6
> 7   0    0    1      3    7:7:7:7         yes    yes        7
> 8   0    0    1      4    8:8:8:8         yes    yes        8
> 9   0    0    1      4    9:9:9:9         yes    yes        9
> 10  0    0    1      5    10:10:10:10     yes    yes        10
> 11  0    0    1      5    11:11:11:11     yes    yes        11
> 12  0    0    1      6    12:12:12:12     yes    yes        12
> 13  0    0    1      6    13:13:13:13     yes    yes        13
> 14  0    0    1      7    14:14:14:14     yes    yes        14
> 15  0    0    1      7    15:15:15:15     yes    yes        15
> 
> the guest was running either on 0-3 or on 4-15, but never
> used the full system. With group scheduling disabled everything was good
> again. So looks like that this bug has also some dependency on on the
> host topology.

OK, so CPU affinities that unevenly straddle topology boundaries like
that are hard (and are generally not recommended), but it's not
immediately obvious why it would be so much worse with cgroups enabled.

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 12:10         ` Peter Zijlstra
@ 2016-09-26 12:49           ` Christian Borntraeger
  2016-09-26 14:12           ` Christian Borntraeger
  1 sibling, 0 replies; 9+ messages in thread
From: Christian Borntraeger @ 2016-09-26 12:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On 09/26/2016 02:10 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 02:01:43PM +0200, Christian Borntraeger wrote:
>> They applied ok on next from 9/13. Things go even worse.
>> With this host configuration:
>>
>> CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
>> 0   0    0    0      0    0:0:0:0         yes    yes        0
>> 1   0    0    0      0    1:1:1:1         yes    yes        1
>> 2   0    0    0      1    2:2:2:2         yes    yes        2
>> 3   0    0    0      1    3:3:3:3         yes    yes        3
>> 4   0    0    1      2    4:4:4:4         yes    yes        4
>> 5   0    0    1      2    5:5:5:5         yes    yes        5
>> 6   0    0    1      3    6:6:6:6         yes    yes        6
>> 7   0    0    1      3    7:7:7:7         yes    yes        7
>> 8   0    0    1      4    8:8:8:8         yes    yes        8
>> 9   0    0    1      4    9:9:9:9         yes    yes        9
>> 10  0    0    1      5    10:10:10:10     yes    yes        10
>> 11  0    0    1      5    11:11:11:11     yes    yes        11
>> 12  0    0    1      6    12:12:12:12     yes    yes        12
>> 13  0    0    1      6    13:13:13:13     yes    yes        13
>> 14  0    0    1      7    14:14:14:14     yes    yes        14
>> 15  0    0    1      7    15:15:15:15     yes    yes        15
>>
>> the guest was running either on 0-3 or on 4-15, but never
>> used the full system. With group scheduling disabled everything was good
>> again. So looks like that this bug has also some dependency on on the
>> host topology.
> 
> OK, so CPU affinities that unevenly straddle topology boundaries like
> that are hard (and is generally not recommended), but its not
> immediately obvious why it would be so much worse with cgroups enabled.

Well, that's what I get from LPAR...
With CPUs 0-3 disabled things are better, but there is still a 10%
difference between group/nogroup.
Will test Vincent's v4 soon.

In any case, would a 5 second sequence of /proc/sched_debug for the
good/bad case with all 16 host CPUs (or the reduced 12 CPU set) be useful?
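
Such a sequence could be captured with something like the following
Python sketch. It is only an illustration: the helper name sample_file,
the output directory names and the one-second interval are assumptions,
and reading /proc/sched_debug needs a kernel built with
CONFIG_SCHED_DEBUG.

```python
import time
from pathlib import Path


def sample_file(path, out_dir, count=5, interval=1.0):
    """Save `count` snapshots of `path` into `out_dir`, `interval` seconds apart."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        snapshot = out / f"sched_debug.{i}"
        # Read the whole file in one go so each snapshot is one point in time.
        snapshot.write_text(Path(path).read_text())
        written.append(snapshot)
        if i + 1 < count:
            time.sleep(interval)
    return written


# Example call on the host while the benchmark runs (hypothetical directory):
#   sample_file("/proc/sched_debug", "sched_debug_group")
```

Running it once with group scheduling and once without gives two
directories of five snapshots each that can be diffed or attached.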

Christian

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 12:10         ` Peter Zijlstra
  2016-09-26 12:49           ` Christian Borntraeger
@ 2016-09-26 14:12           ` Christian Borntraeger
  1 sibling, 0 replies; 9+ messages in thread
From: Christian Borntraeger @ 2016-09-26 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yuyang Du, Ingo Molnar, Linux Kernel Mailing List,
	vincent.guittot, Morten.Rasmussen, dietmar.eggemann, pjt, bsegall

On 09/26/2016 02:10 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 02:01:43PM +0200, Christian Borntraeger wrote:
>> They applied ok on next from 9/13. Things go even worse.
>> With this host configuration:
>>
>> CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
>> 0   0    0    0      0    0:0:0:0         yes    yes        0
>> 1   0    0    0      0    1:1:1:1         yes    yes        1
>> 2   0    0    0      1    2:2:2:2         yes    yes        2
>> 3   0    0    0      1    3:3:3:3         yes    yes        3
>> 4   0    0    1      2    4:4:4:4         yes    yes        4
>> 5   0    0    1      2    5:5:5:5         yes    yes        5
>> 6   0    0    1      3    6:6:6:6         yes    yes        6
>> 7   0    0    1      3    7:7:7:7         yes    yes        7
>> 8   0    0    1      4    8:8:8:8         yes    yes        8
>> 9   0    0    1      4    9:9:9:9         yes    yes        9
>> 10  0    0    1      5    10:10:10:10     yes    yes        10
>> 11  0    0    1      5    11:11:11:11     yes    yes        11
>> 12  0    0    1      6    12:12:12:12     yes    yes        12
>> 13  0    0    1      6    13:13:13:13     yes    yes        13
>> 14  0    0    1      7    14:14:14:14     yes    yes        14
>> 15  0    0    1      7    15:15:15:15     yes    yes        15
>>
>> the guest was running either on 0-3 or on 4-15, but never
>> used the full system. With group scheduling disabled everything was good
>> again. So looks like that this bug has also some dependency on on the
>> host topology.
> 
> OK, so CPU affinities that unevenly straddle topology boundaries like
> that are hard (and is generally not recommended), but its not

OK, so I used CPU hotplug to create a symmetrical CPU topology:
CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
0   0    0    0      0    0:0:0:0         yes    yes        0
1   0    0    0      0    1:1:1:1         yes    yes        1
2   0    0    0      1    2:2:2:2         yes    yes        2
3   0    0    0      1    3:3:3:3         yes    yes        3
4   0    0    0      2    4:4:4:4         yes    yes        4
5   0    0    0      2    5:5:5:5         yes    yes        5
6   0    0    0      3    6:6:6:6         yes    yes        6
7   0    0    0      3    7:7:7:7         yes    yes        7
8   0    0    0      -    :::             no     yes        8
9   0    0    0      -    :::             no     yes        9
10  0    0    0      -    :::             no     yes        10
11  0    0    0      -    :::             no     yes        11
12  0    0    1      4    8:8:8:8         yes    yes        12
13  0    0    1      4    9:9:9:9         yes    yes        13
14  0    0    1      5    10:10:10:10     yes    yes        14
15  0    0    1      5    11:11:11:11     yes    yes        15
16  0    0    1      6    12:12:12:12     yes    yes        16
17  0    0    1      6    13:13:13:13     yes    yes        17
18  0    0    1      7    14:14:14:14     yes    yes        18
19  0    0    1      7    15:15:15:15     yes    yes        19

Same effect: only half of the CPUs are used, even though the number of
guest CPUs == number of host CPUs. It turns out that this is totally
unrelated to this patch set, so it must be something else.

* Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)
  2016-09-26 10:56 ` Peter Zijlstra
  2016-09-26 11:42   ` Christian Borntraeger
@ 2016-09-26 12:25   ` Vincent Guittot
  1 sibling, 0 replies; 9+ messages in thread
From: Vincent Guittot @ 2016-09-26 12:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian Borntraeger, Yuyang Du, Ingo Molnar,
	Linux Kernel Mailing List, Morten Rasmussen, Dietmar Eggemann,
	Paul Turner, Benjamin Segall

On 26 September 2016 at 12:56, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Sep 26, 2016 at 12:42:22PM +0200, Christian Borntraeger wrote:
>> Folks,
>>
>> I have seen big scalability degredations sind 4.3 (bisected 9d89c257d
>> sched/fair: Rewrite runnable load and utilization average tracking)
>> This has not been fixed by subsequent patches,e.g. the ones that try to
>> fix this for interactive workload.
>>
>> The problem is only visible for sleep/wakeup heavy workload which must
>> be part of the scheduler group (e.g. a sysbench OLTP inside a KVM guest
>> as libvirt will put KVM guests into cgroup instances).
>>
>> For example a simple sysbench oltp with mysql inside a KVM guests with
>> 16 CPUs backed by 8 host cpus (16 host threads) scales less (scale up
>> inside a guest, having multiple instances). This is the numbers of
>> events per second.
>> Unmounting /sys/fs/cgroup/cpu,cpuacct (thus forcing libvirt to not
>> use group scheduling for KVM guests) makes the behaviour much better:
>>
>>
>> instances     group           nogroup
>> 1             3406            3002
>> 2             5078            4940
>> 3             6017            6760
>> 4             6471            8216 (+27%)
>> 5             6716            9196
>> 6             6976            9783
>> 7             7127            10170
>> 8             7399            10385 (+40%)
>>
>> before 9d89c257d ("sched/fair: Rewrite runnable load and utilization
>> average tracking") there was basically no difference between group
>> or non-group scheduling. These numbers are with 4.7, older kernels after
>> 9d89c257d show a similar difference.
>>
>> The bad thing is that there is a lot of idle cpu power in the host
>> when this happens so the scheduler seems to not realize that this
>> workload could use more cpus in the host.
>>
>> I tried some experiments , but I have not found a hack that "fixes" the
>> degredation, which would give me an indication which part  of the code
>> is broken. So are there any ideas? Is the estimated group load
>> calculation just not fast enough for sleep/wakeup workload?
>
> One of the differences in the old and new thing is being addressed by
> these patches:
>
>   https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@linaro.org

I have just sent a new version which fixes one issue with
runnable_load_avg that was raised by Dietmar.

The patchset is also available here:
https://git.linaro.org/people/vincent.guittot/kernel.git sched/pelt
>
> Could you see if those patches make a difference? If not, we'll have to
> go poke elsewhere ofcourse ;-)

Thread overview: 9+ messages
2016-09-26 10:42 group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking) Christian Borntraeger
2016-09-26 10:56 ` Peter Zijlstra
2016-09-26 11:42   ` Christian Borntraeger
2016-09-26 11:53     ` Peter Zijlstra
2016-09-26 12:01       ` Christian Borntraeger
2016-09-26 12:10         ` Peter Zijlstra
2016-09-26 12:49           ` Christian Borntraeger
2016-09-26 14:12           ` Christian Borntraeger
2016-09-26 12:25   ` Vincent Guittot
