From: Ingo Molnar <mingo@elte.hu>
To: Tong Li <tong.n.li@intel.com>
Cc: linux-kernel@vger.kernel.org, Chris Snook <csnook@redhat.com>
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS
Date: Wed, 25 Jul 2007 14:03:58 +0200
Message-ID: <20070725120358.GA30755@elte.hu>
In-Reply-To: <20070725110159.GA15076@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> > This patch extends CFS to achieve better fairness for SMPs. For 
> > example, with 10 tasks (same priority) on 8 CPUs, it enables each task 
> > to receive equal CPU time (80%). [...]
> 
> hm, CFS should already offer reasonable long-term SMP fairness. It 
> certainly works on a dual-core box, i just started 3 tasks of the same 
> priority on 2 CPUs, and on vanilla 2.6.23-rc1 the distribution is 
> this:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  7084 mingo     20   0  1576  248  196 R   67  0.0   0:50.13 loop
>  7083 mingo     20   0  1576  244  196 R   66  0.0   0:48.86 loop
>  7085 mingo     20   0  1576  244  196 R   66  0.0   0:49.45 loop
> 
> so each task gets a perfect 66% of CPU time.
> 
> prior to CFS, we indeed did a 50%/50%/100% split - so for example on 
> v2.6.22:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2256 mingo     25   0  1580  248  196 R  100  0.0   1:03.19 loop
>  2255 mingo     25   0  1580  248  196 R   50  0.0   0:31.79 loop
>  2257 mingo     25   0  1580  248  196 R   50  0.0   0:31.69 loop
> 
> but CFS has changed that behavior.
> 
> I'll check your 10-tasks-on-8-cpus example on an 8-way box too, maybe 
> we regressed somewhere ...

ok, i just tried it on an 8-cpu box and indeed, unlike the dual-core 
case, the scheduler does not distribute tasks well enough:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2572 mingo     20   0  1576  244  196 R  100  0.0   1:03.61 loop
 2578 mingo     20   0  1576  248  196 R  100  0.0   1:03.59 loop
 2576 mingo     20   0  1576  248  196 R  100  0.0   1:03.52 loop
 2571 mingo     20   0  1576  244  196 R  100  0.0   1:03.46 loop
 2569 mingo     20   0  1576  244  196 R   99  0.0   1:03.36 loop
 2570 mingo     20   0  1576  244  196 R   95  0.0   1:00.55 loop
 2577 mingo     20   0  1576  248  196 R   50  0.0   0:31.88 loop
 2574 mingo     20   0  1576  248  196 R   50  0.0   0:31.87 loop
 2573 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop
 2575 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop

but this is relatively easy to fix - with the patch below applied, it 
looks a lot better:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2681 mingo     20   0  1576  244  196 R   85  0.0   3:51.68 loop
 2688 mingo     20   0  1576  244  196 R   81  0.0   3:46.35 loop
 2682 mingo     20   0  1576  244  196 R   80  0.0   3:43.68 loop
 2685 mingo     20   0  1576  248  196 R   80  0.0   3:45.97 loop
 2683 mingo     20   0  1576  248  196 R   80  0.0   3:40.25 loop
 2679 mingo     20   0  1576  244  196 R   80  0.0   3:33.53 loop
 2680 mingo     20   0  1576  244  196 R   79  0.0   3:43.53 loop
 2686 mingo     20   0  1576  244  196 R   79  0.0   3:39.31 loop
 2687 mingo     20   0  1576  244  196 R   78  0.0   3:33.31 loop
 2684 mingo     20   0  1576  244  196 R   77  0.0   3:27.52 loop

they now nicely converge to the expected 80% long-term CPU usage.

so, could you please try the patch below, does it work for you too?

	Ingo

--------------------------->
Subject: sched: increase SCHED_LOAD_SCALE_FUZZ
From: Ingo Molnar <mingo@elte.hu>

increase SCHED_LOAD_SCALE_FUZZ, which adds a small amount of
over-balancing, to help distribute CPU-bound tasks more fairly on SMP
systems.

the problem of unfair balancing was noticed and reported by Tong N Li.

10 CPU-bound tasks running on 8 CPUs, v2.6.23-rc1:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2572 mingo     20   0  1576  244  196 R  100  0.0   1:03.61 loop
 2578 mingo     20   0  1576  248  196 R  100  0.0   1:03.59 loop
 2576 mingo     20   0  1576  248  196 R  100  0.0   1:03.52 loop
 2571 mingo     20   0  1576  244  196 R  100  0.0   1:03.46 loop
 2569 mingo     20   0  1576  244  196 R   99  0.0   1:03.36 loop
 2570 mingo     20   0  1576  244  196 R   95  0.0   1:00.55 loop
 2577 mingo     20   0  1576  248  196 R   50  0.0   0:31.88 loop
 2574 mingo     20   0  1576  248  196 R   50  0.0   0:31.87 loop
 2573 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop
 2575 mingo     20   0  1576  248  196 R   50  0.0   0:31.86 loop

v2.6.23-rc1 + patch:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2681 mingo     20   0  1576  244  196 R   85  0.0   3:51.68 loop
 2688 mingo     20   0  1576  244  196 R   81  0.0   3:46.35 loop
 2682 mingo     20   0  1576  244  196 R   80  0.0   3:43.68 loop
 2685 mingo     20   0  1576  248  196 R   80  0.0   3:45.97 loop
 2683 mingo     20   0  1576  248  196 R   80  0.0   3:40.25 loop
 2679 mingo     20   0  1576  244  196 R   80  0.0   3:33.53 loop
 2680 mingo     20   0  1576  244  196 R   79  0.0   3:43.53 loop
 2686 mingo     20   0  1576  244  196 R   79  0.0   3:39.31 loop
 2687 mingo     20   0  1576  244  196 R   78  0.0   3:33.31 loop
 2684 mingo     20   0  1576  244  196 R   77  0.0   3:27.52 loop

so they now nicely converge to the expected 80% long-term CPU usage.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -681,7 +681,7 @@ enum cpu_idle_type {
 #define SCHED_LOAD_SHIFT	10
 #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
 
-#define SCHED_LOAD_SCALE_FUZZ	(SCHED_LOAD_SCALE >> 5)
+#define SCHED_LOAD_SCALE_FUZZ	(SCHED_LOAD_SCALE >> 1)
 
 #ifdef CONFIG_SMP
 #define SD_LOAD_BALANCE		1	/* Do load balancing on this domain. */
