From: "Alex,Shi" <alex.shi@intel.com>
To: linux-kernel@vger.kernel.org, suresh.b.siddha@intel.com,
a.p.zijlstra@chello.nl
Cc: yanmin.zhang@intel.com, tim.c.chen@intel.com
Subject: Re: [patch] Over schedule issue fixing
Date: Fri, 18 Jun 2010 12:25:01 +0800
Message-ID: <1276835101.2118.185.camel@debian>
In-Reply-To: <1276754893.9452.5442.camel@debian>
Adding Suresh and Peter to the thread.
Could you comment on this issue?
Thanks!
Alex
On Thu, 2010-06-17 at 14:08 +0800, Alex,Shi wrote:
> commit e709715915d69b6a929d77e7652c9c3fea61c317 introduced a scheduling
> imbalance. If CGROUP is not used, update_h_load() returns early and never
> updates h_load. When the system runs far more tasks than it has logical
> CPUs, the incorrect cfs_rq[cpu]->h_load value causes load_balance() to
> pull too many tasks from the busiest CPU to the local CPU. As a result,
> the busiest CPU keeps changing in a round-robin fashion, which hurts
> performance.
> The issue was originally found with a scientific calculation workload
> developed by Yanmin. With this commit applied, the workload's performance
> drops by about 40%. It can be reproduced with the short program below.
>
> # gcc -o sl sched-loop.c -lpthread
> # ./sl -n 100 -t 100 &
> # cat /proc/sched_debug &> sd1
> # grep -A 1 cpu# sd1
> sd1:cpu#0, 2533.008 MHz
> sd1- .nr_running : 2
> --
> sd1:cpu#1, 2533.008 MHz
> sd1- .nr_running : 1
> --
> sd1:cpu#2, 2533.008 MHz
> sd1- .nr_running : 11
> --
> sd1:cpu#3, 2533.008 MHz
> sd1- .nr_running : 12
> --
> sd1:cpu#4, 2533.008 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#5, 2533.008 MHz
> sd1- .nr_running : 11
> --
> sd1:cpu#6, 2533.008 MHz
> sd1- .nr_running : 10
> --
> sd1:cpu#7, 2533.008 MHz
> sd1- .nr_running : 12
> --
> sd1:cpu#8, 2533.008 MHz
> sd1- .nr_running : 11
> --
> sd1:cpu#9, 2533.008 MHz
> sd1- .nr_running : 12
> --
> sd1:cpu#10, 2533.008 MHz
> sd1- .nr_running : 1
> --
> sd1:cpu#11, 2533.008 MHz
> sd1- .nr_running : 1
> --
> sd1:cpu#12, 2533.008 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#13, 2533.008 MHz
> sd1- .nr_running : 2
> --
> sd1:cpu#14, 2533.008 MHz
> sd1- .nr_running : 2
> --
> sd1:cpu#15, 2533.008 MHz
> sd1- .nr_running : 1
>
> After applying the fix, the load across the cfs_rq of each CPU is balanced:
>
> sd1:cpu#0, 2533.479 MHz
> sd1- .nr_running : 7
> --
> sd1:cpu#1, 2533.479 MHz
> sd1- .nr_running : 7
> --
> sd1:cpu#2, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#3, 2533.479 MHz
> sd1- .nr_running : 7
> --
> sd1:cpu#4, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#5, 2533.479 MHz
> sd1- .nr_running : 7
> --
> sd1:cpu#6, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#7, 2533.479 MHz
> sd1- .nr_running : 7
> --
> sd1:cpu#8, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#9, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#10, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#11, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#12, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#13, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#14, 2533.479 MHz
> sd1- .nr_running : 6
> --
> sd1:cpu#15, 2533.479 MHz
> sd1- .nr_running : 6
>
> ---
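> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <pthread.h>
>
> #define MAX_THREADS 1024
>
> /* Set to 1 by main() to tell the worker threads to stop. */
> volatile int *exiting;
>
> /* Busy-loop until main() sets *exiting. */
> void *idle_loop(void *arg)
> {
> 	/* unsigned, so wrap-around during a long run is well defined */
> 	volatile unsigned int calc01 = 100;
>
> 	(void)arg;
> 	while (*exiting != 1)
> 		calc01++;
> 	return NULL;
> }
>
> int main(int argc, char *argv[])
> {
> 	/* Run time in seconds; 10 is an arbitrary default, override with -t. */
> 	int i, t = 10, c, er = 0, num = 8;
> 	static char optstr[] = "n:t:";
> 	pthread_t ptid[MAX_THREADS];
>
> 	while ((c = getopt(argc, argv, optstr)) != EOF)
> 		switch (c) {
> 		case 'n':
> 			num = atoi(optarg);
> 			break;
> 		case 't':
> 			t = atoi(optarg);
> 			break;
> 		case '?':
> 			er = 1;
> 			break;
> 		}
>
> 	/* Also reject thread counts that would overflow ptid[]. */
> 	if (er || num < 1 || num > MAX_THREADS) {
> 		printf("usage: %s %s\n", argv[0], optstr);
> 		exit(1);
> 	}
> 	exiting = malloc(sizeof(int));
> 	if (!exiting)
> 		exit(1);
>
> 	*exiting = 0;
> 	for (i = 0; i < num; i++)
> 		pthread_create(&ptid[i], NULL, idle_loop, NULL);
>
> 	sleep(t);
> 	*exiting = 1;
>
> 	for (i = 0; i < num; i++)
> 		pthread_join(ptid[i], NULL);
> 	exit(0);
> }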
>
> Reviewed-by: Yanmin zhang <yanmin.zhang@intel.com>
> Signed-off-by: Alex Shi <alex.shi@intel.com>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f8b8996..a18bf93 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1660,9 +1660,6 @@ static void update_shares(struct sched_domain *sd)
>
>  static void update_h_load(long cpu)
>  {
> -	if (root_task_group_empty())
> -		return;
> -
>  	walk_tg_tree(tg_load_down, tg_nop, (void *)cpu);
>  }
>
>
Thread overview: 4+ messages
2010-06-17 6:08 [patch] Over schedule issue fixing Alex,Shi
2010-06-18 4:25 ` Alex,Shi [this message]
2010-06-18 7:16 ` Peter Zijlstra
2010-06-18 10:18 ` [tip:sched/urgent] sched: Fix over-scheduling bug tip-bot for Alex,Shi