[BUG] cpu controller can't provide fair CPU time for each group


From: Miao Xie <miaox@cn.fujitsu.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: [BUG] cpu controller can't provide fair CPU time for each group
Date: Tue, 03 Nov 2009 11:26:48 +0900
Message-ID: <4AEF94E8.3030403@cn.fujitsu.com>

Hi, Peter.

I found two problems with the cpu controller:
1) The cpu controller does not provide fair CPU time to groups when the
    tasks attached to those groups are bound to the same logical CPU.
2) The cpu controller does not provide fair CPU time to groups when the
    shares of each group are <= 2 * nr_cpus.

The details are as follows:
1) The first problem is that the cpu controller does not provide fair CPU
    time to groups when the tasks attached to those groups are bound to the
    same logical CPU.

    The reason is that the per-CPU shares are computed incorrectly.

    On my test box with 16 logical CPUs, I did the following:
    a. Create 2 cpu controller groups.
    b. Attach one task to one group and two tasks to the other.
    c. Bind all three tasks to the same logical CPU.
             +--------+     +--------+
             | group1 |     | group2 |
             +--------+     +--------+
                 |              |
    CPU0      Task A      Task B & Task C

    The steps to reproduce are:
    # mkdir /dev/cpuctl
    # mount -t cgroup -o cpu,noprefix cpuctl /dev/cpuctl
    # mkdir /dev/cpuctl/1
    # mkdir /dev/cpuctl/2
    # cat /dev/zero > /dev/null &
    # pid1=$!
    # echo $pid1 > /dev/cpuctl/1/tasks
    # taskset -p -c 0 $pid1
    # cat /dev/zero > /dev/null &
    # pid2=$!
    # echo $pid2 > /dev/cpuctl/2/tasks
    # taskset -p -c 0 $pid2
    # cat /dev/zero > /dev/null &
    # pid3=$!
    # echo $pid3 > /dev/cpuctl/2/tasks
    # taskset -p -c 0 $pid3

    Some time later, I found that the task in group1 got 35% of the CPU time,
    not the expected 50%.
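    The report does not say how the CPU time was measured. One possible way,
    given here only as a sketch, is to sample utime + stime from
    /proc/<pid>/stat twice and compare the deltas. The helper names below
    (cpu_ticks, read_ticks, cpu_share) are hypothetical and are not part of
    the original test.

```python
def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # comm (field 2) may contain spaces or parentheses, so split after the
    # last ')' instead of splitting the whole line on whitespace.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(fields[11]) + int(fields[12])


def read_ticks(pid):
    """Read the current CPU ticks of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def cpu_share(before, after):
    """Fraction of the consumed ticks per pid between two samples.

    `before` and `after` map pid -> ticks, e.g. built with read_ticks().
    """
    deltas = {pid: after[pid] - before[pid] for pid in before}
    total = sum(deltas.values())
    if total == 0:
        return {pid: 0.0 for pid in deltas}
    return {pid: delta / total for pid, delta in deltas.items()}
```

    A group's share is the sum of the shares of its tasks; in the test above,
    group1's share is the share of $pid1 alone.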

    This problem is caused by the incorrect computation of the per-CPU shares.
    According to the design of the cpu controller, the shares of each cpu
    controller group are divided among the CPUs in proportion to the workload
    of each logical CPU:
       cpu[i] shares = group shares * CPU[i] workload / sum(CPU workload)

    But if a CPU has no task, the cpu controller pretends that it has one task
    of average load. Usually this average load is 1024, the load of a task
    whose nice value is zero. So in the test, the shares of group1 on CPU0 are:
       1024 * (1 * 1024) / (1 * 1024 + 15 * 1024) = 64
    and the shares of group2 on CPU0 are:
       1024 * (2 * 1024) / (2 * 1024 + 15 * 1024) = 120
    The scheduler on CPU0 distributes CPU time to the groups according to the
    shares above, so group1 gets 64 / (64 + 120), about 35%, instead of 50%.
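    The arithmetic above can be checked with a small model. This is only a
    sketch of the formula as described in this mail, not the kernel code: the
    function name per_cpu_shares is hypothetical, and integer division is
    assumed.

```python
NICE_0_LOAD = 1024  # load of a task whose nice value is zero


def per_cpu_shares(group_shares, cpu_loads, cpu):
    """cpu[i] shares = group shares * CPU[i] workload / sum(CPU workload).

    A CPU on which the group has no task is counted as if it carried one
    task of average load (NICE_0_LOAD), as described above.
    """
    effective = [load if load > 0 else NICE_0_LOAD for load in cpu_loads]
    return group_shares * effective[cpu] // sum(effective)


nr_cpus = 16
# group1 has one task on CPU0, group2 has two tasks on CPU0.
group1_loads = [1 * NICE_0_LOAD] + [0] * (nr_cpus - 1)
group2_loads = [2 * NICE_0_LOAD] + [0] * (nr_cpus - 1)

s1 = per_cpu_shares(1024, group1_loads, 0)
s2 = per_cpu_shares(1024, group2_loads, 0)
fraction = s1 / (s1 + s2)  # CPU0 time that group1 receives

print(s1, s2, round(100 * fraction, 1))  # prints: 64 120 34.8
```

    If the idle CPUs were not counted, both groups would keep all 1024 of
    their shares on CPU0 and each would get 50%.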

2) The second problem is that the cpu controller does not provide fair CPU
    time to groups when the shares of each group are <= 2 * nr_cpus.

    The reason is that the per-CPU shares are set to MIN_SHARES (= 2) if the
    shares of each group are <= 2 * nr_cpus.

    On the test box with 16 logical CPUs, I did the following test:
    a. Create two cpu controller groups.
    b. Attach 32 tasks to each group.
    c. Set the shares of the first group to 16 and the other to 32.
             +--------+     +--------+
             | group1 |     | group2 |
             +--------+     +--------+
                 |shares=16     |shares=32
                 |              |
              16 Tasks       32 Tasks

    Some time later, the first group got 50% of the CPU time, not the expected
    33%.

    This is because the shares of the cpuctl groups are small and there are
    many logical CPUs, so the computed per-CPU shares fall to MIN_SHARES or
    below and are all set to MIN_SHARES.

    Shares of 16 and 32 may not be common. On my box, the problem can be
    avoided by using a common value such as 1024. But the number of CPUs in a
    machine will keep growing. If the number of CPUs is greater than 512, this
    bug will occur even if the shares of a group are set to 1024, which is a
    common value, and ordinary users will find the result strange.
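    The effect of the MIN_SHARES clamp can be illustrated with a simplified
    model. It assumes the load of each group is spread evenly over all CPUs
    and that the division is integer division; the function names are
    hypothetical and this is not the kernel implementation.

```python
MIN_SHARES = 2


def clamped_per_cpu_shares(group_shares, nr_cpus):
    """Per-CPU shares of an evenly loaded group, raised to MIN_SHARES."""
    return max(group_shares // nr_cpus, MIN_SHARES)


def first_affected_cpu_count(group_shares):
    """Smallest nr_cpus at which group_shares // nr_cpus drops below MIN_SHARES."""
    return group_shares // MIN_SHARES + 1


nr_cpus = 16
g1 = clamped_per_cpu_shares(16, nr_cpus)  # 16 // 16 = 1, raised to 2
g2 = clamped_per_cpu_shares(32, nr_cpus)  # 32 // 16 = 2

observed = g1 / (g1 + g2)   # 0.5: both groups end up with equal weight
expected = 16 / (16 + 32)   # about 0.33: what the configured shares ask for

print(g1, g2, observed, round(expected, 2))
print(first_affected_cpu_count(1024))  # 513, i.e. more than 512 CPUs
```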

