* Re: [sched, patch] better wake-balancing, #2
@ 2005-07-30 23:26 Chuck Ebbert
2005-07-31 4:35 ` Con Kolivas
2005-07-31 6:29 ` Ingo Molnar
0 siblings, 2 replies; 5+ messages in thread
From: Chuck Ebbert @ 2005-07-30 23:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Chen, Kenneth W, Andrew Morton, Nick Piggin, linux-kernel,
linux-ia64
On Fri, 29 Jul 2005 at 17:02:07 +0200, Ingo Molnar wrote:
> do wakeup-balancing only if the wakeup-CPU is idle.
>
> this prevents excessive wakeup-balancing while the system is highly
> loaded, but helps spread out the workload on partly idle systems.
I tested this with Volanomark on dual-processor PII Xeon -- the
results were very bad:
Before: 5863 messages per second
124169 schedule 64.1369
64663 _spin_unlock_irqrestore 4041.4375
7949 tcp_clean_rtx_queue 6.5370
6787 net_rx_action 24.9522
After: 5569 messages per second
139417 schedule 72.0129
82169 _spin_unlock_irqrestore 5135.5625
9949 tcp_clean_rtx_queue 8.1817
7917 net_rx_action 29.1066
__
Chuck
^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [sched, patch] better wake-balancing, #2
  2005-07-30 23:26 [sched, patch] better wake-balancing, #2 Chuck Ebbert
@ 2005-07-31  4:35 ` Con Kolivas
  2005-07-31  6:29 ` Ingo Molnar
  1 sibling, 0 replies; 5+ messages in thread
From: Con Kolivas @ 2005-07-31  4:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chuck Ebbert, Ingo Molnar, Chen, Kenneth W, Andrew Morton,
	Nick Piggin, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On Sun, 31 Jul 2005 09:26, Chuck Ebbert wrote:
> On Fri, 29 Jul 2005 at 17:02:07 +0200, Ingo Molnar wrote:
> > do wakeup-balancing only if the wakeup-CPU is idle.
> >
> > this prevents excessive wakeup-balancing while the system is highly
> > loaded, but helps spread out the workload on partly idle systems.
>
> I tested this with Volanomark on dual-processor PII Xeon -- the
> results were very bad:
>
> Before: 5863 messages per second
> After:  5569 messages per second

Can you check schedstats or otherwise to find if volanomark uses
sched_yield()?  When last this benchmark came up, it appeared that no
jvm used futexes and left locking to yielding.  We really should find
out if that is the case before trying to optimise for this benchmark.

Cheers,
Con

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [sched, patch] better wake-balancing, #2
  2005-07-30 23:26 [sched, patch] better wake-balancing, #2 Chuck Ebbert
  2005-07-31  4:35 ` Con Kolivas
@ 2005-07-31  6:29 ` Ingo Molnar
  1 sibling, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2005-07-31  6:29 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Chen, Kenneth W, Andrew Morton, Nick Piggin, linux-kernel,
	linux-ia64

* Chuck Ebbert <76306.1226@compuserve.com> wrote:

> On Fri, 29 Jul 2005 at 17:02:07 +0200, Ingo Molnar wrote:
>
> > do wakeup-balancing only if the wakeup-CPU is idle.
> >
> > this prevents excessive wakeup-balancing while the system is highly
> > loaded, but helps spread out the workload on partly idle systems.
>
> I tested this with Volanomark on dual-processor PII Xeon -- the
> results were very bad:

which patch have you tested?  The mail you replied to above is for patch
#2, while on SMT/HT boxes it's patch #3 that is the correct approach.

furthermore, which base kernel have you applied the patch to?  Best
would be to test the following kernels:

	2.6.13-rc4 + sched-rollup
	2.6.13-rc4 + sched-rollup + better-wake-balance-#3

the sched-rollup and the latest better-wake-balance patches can be
found at:

	http://redhat.com/~mingo/scheduler-patches/

(sched-rollup is the current scheduler patch-queue in -mm.  And if you
have time, it would also be nice to have a 2.6.13-rc4 baseline for
VolanoMark, and perhaps a 2.6.12 measurement too, so that we can see
how things changed.)

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [sched, patch] better wake-balancing, #2
@ 2005-07-31 13:35 Chuck Ebbert
  0 siblings, 0 replies; 5+ messages in thread
From: Chuck Ebbert @ 2005-07-31 13:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-ia64, linux-kernel, Nick Piggin, Andrew Morton, Chen, Kenneth W

On Sun, 31 Jul 2005 at 08:29:27 +0200, Ingo Molnar wrote:

> > I tested this with Volanomark on dual-processor PII Xeon -- the
> > results were very bad:
>
> which patch have you tested?  The mail you replied to above is for patch
> #2, while on SMT/HT boxes it's patch #3 that is the correct approach.

Since my system is not HT, I used patch #2.

> furthermore, which base kernel have you applied the patch to?

2.6.13-rc3

Results for -rc4 follow, with the latest patchsets:

Volanomark results for 2.6.13-rc4

System: Dell Workstation 610 (i440GX)
        2 x Pentium II Xeon 2MB cache 350MHz

[Sun Jul 31 06:58:31 EDT 2005] Test started.
Kernel: 2.6.13-rc4a #1 SMP
Patches: sched-rollup + better-wake-balance

test-1.log: Average throughput = 4905 messages per second
test-2.log: Average throughput = 5583 messages per second
test-3.log: Average throughput = 5624 messages per second
test-4.log: Average throughput = 5526 messages per second
test-1.log: Average throughput = 5584 messages per second
test-2.log: Average throughput = 5430 messages per second
test-3.log: Average throughput = 5263 messages per second
test-4.log: Average throughput = 5425 messages per second

[Sun Jul 31 07:17:02 EDT 2005] Test ended.
timestamp 76174
cpu0 0 0 6 6 78 17319 6612 5663 5236 6510 621 10707
domain0 3 144251 144099 138 3156 15 3 0 144099 84 82 2 18 0 0 0 82 6622 6546 66 631 10 1 0 6546 2 2 0 0 0 0 0 0 0 486 449 0
cpu1 0 0 0 0 50 7888 2914 1984 1498 2008 1576 4974
domain0 3 146154 146000 121 2226 33 17 0 146000 62 58 2 24 2 0 0 58 2926 2824 89 953 15 5 0 2824 0 0 0 0 0 0 0 0 0 427 407 0

version 12
timestamp 357112
cpu0 6903 32226 44787 3345829 56425 6652018 20466 14097 11629 162123 21377675 6631552
domain0 3 269777 267819 1353 194144 6129 602 0 267819 4041 2818 70 1305809 164558 75 0 2818 37869 15488 4978 14472687 1280712 1583 0 15488 2 2 0 0 0 0 0 0 0 2198 1290 0
cpu1 7764 33269 44559 3354402 57092 6697864 19910 12189 9991 155123 21297442 6677954
domain0 3 274148 272109 1433 180072 4066 441 0 272109 3981 2775 60 1259541 167938 99 0 2775 37372 14157 5752 14238933 1278568 1334 0 14157 0 0 0 0 0 0 0 0 0 2468 1438 0

[Sun Jul 31 07:33:09 EDT 2005] Test started.
Kernel: 2.6.13-rc4a #2 SMP
Patches: sched-rollup

test-1.log: Average throughput = 5112 messages per second
test-2.log: Average throughput = 5662 messages per second
test-3.log: Average throughput = 5809 messages per second
test-4.log: Average throughput = 5977 messages per second
test-1.log: Average throughput = 5976 messages per second
test-2.log: Average throughput = 6008 messages per second
test-3.log: Average throughput = 5855 messages per second
test-4.log: Average throughput = 6017 messages per second

[Sun Jul 31 07:51:00 EDT 2005] Test ended.
version 12
timestamp 4294911410
cpu0 0 0 0 0 56 6018 1969 3037 1846 4008 5751 4049
domain0 3 14739 14634 99 2000 8 4 0 14634 31 30 1 10 0 0 0 30 1971 1921 48 443 2 0 0 1921 2 1 1 0 0 0 0 0 0 1247 502 0
cpu1 0 0 0 0 40 5357 1792 2832 1583 1176 1568 3565
domain0 3 14867 14788 76 1520 3 1 0 14788 35 34 1 5 0 0 0 34 1797 1749 42 411 7 3 0 1749 0 0 0 0 0 0 0 0 0 1191 469 0

version 12
timestamp 212533
cpu0 10030 29290 30736 3372251 41704 6164156 19216 2636963 2026635 148591 23876540 6144940
domain0 3 138859 136778 1343 139015 3507 558 0 136778 3404 2644 49 704633 103491 32 0 2644 28623 15546 3670 4623415 467816 1395 0 15546 2 1 1 0 0 0 0 0 0 595792 264363 0
cpu1 4739 24137 31111 3387783 36850 6143087 12792 2610468 2014674 145416 24188585 6130295
domain0 3 139219 137155 1287 133527 3930 457 0 137155 3366 2714 46 569294 85214 46 0 2714 22259 8839 3952 4783355 487041 1084 0 8839 0 0 0 0 0 0 0 0 0 610328 262829 0

[Sun Jul 31 08:39:05 EDT 2005] Test started.
Kernel: 2.6.13-rc4a #3 SMP
Patches: none

test-1.log: Average throughput = 5243 messages per second
test-2.log: Average throughput = 5816 messages per second
test-3.log: Average throughput = 5886 messages per second
test-4.log: Average throughput = 6039 messages per second
test-1.log: Average throughput = 5911 messages per second
test-2.log: Average throughput = 5934 messages per second
test-3.log: Average throughput = 5928 messages per second
test-4.log: Average throughput = 6053 messages per second

[Sun Jul 31 08:56:52 EDT 2005] Test ended.
version 12
timestamp 4294911037
cpu0 0 0 0 0 44 5715 1877 2877 1656 1196 1427 3838
domain0 3 14886 14817 57 70 13 0 0 14817 29 29 0 0 0 0 0 29 1878 1866 11 12 1 0 0 1866 0 0 0 0 0 0 0 0 0 1168 551 0
cpu1 0 0 0 0 55 4498 1522 2269 1099 550 131 2976
domain0 3 15108 15066 37 42 5 0 0 15066 16 16 0 0 0 0 0 16 1523 1513 8 10 2 0 0 1513 2 2 0 0 0 0 0 0 0 1221 532 0

version 12
timestamp 211196
cpu0 1784 20283 27831 3378283 31894 6122681 18841 2586384 2080058 145291 24152181 6103840
domain0 3 138711 136342 402 22486 21019 25 0 136342 3689 3115 17 16077 15996 0 0 3115 21229 18330 511 21079 19823 19 0 18330 0 0 0 0 0 0 0 0 0 502182 219444 0
cpu1 10295 29015 28139 3391580 40743 6142466 18965 2589921 2087737 143278 24294054 6123501
domain0 3 140333 137972 378 22362 20974 11 0 137972 3623 3075 20 15786 15695 1 0 3075 21435 18517 447 19642 18459 48 0 18517 2 2 0 0 0 0 0 0 0 506326 221459 0

__
Chuck

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
@ 2005-07-29 2:01 Nick Piggin
2005-07-29 6:27 ` Chen, Kenneth W
0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2005-07-29 2:01 UTC (permalink / raw)
To: Chen, Kenneth W; +Cc: Ingo Molnar, linux-kernel, linux-ia64
Chen, Kenneth W wrote:
>Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
>
>>I'd like to try making them less aggressive first if possible.
>>
>
>Well, that's exactly what I'm trying to do: make them not aggressive
>at all by not performing any load balance :-) The workload gets maximum
>benefit with zero aggressiveness.
>
>
Unfortunately we can't forget about other workloads, and we're
trying to stay away from runtime tunables in the scheduler.
If we can get performance to within a couple of tenths of a percent
of the zero balancing case, then that would be preferable I think.
^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
  2005-07-29  2:01 Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Nick Piggin
@ 2005-07-29  6:27 ` Chen, Kenneth W
  2005-07-29 11:48   ` [patch] remove wake-balancing Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Chen, Kenneth W @ 2005-07-29  6:27 UTC (permalink / raw)
  To: 'Nick Piggin'; +Cc: Ingo Molnar, linux-kernel, linux-ia64

Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
> Chen, Kenneth W wrote:
> >Well, that's exactly what I'm trying to do: make them not aggressive
> >at all by not performing any load balance :-)  The workload gets maximum
> >benefit with zero aggressiveness.
>
> Unfortunately we can't forget about other workloads, and we're
> trying to stay away from runtime tunables in the scheduler.

This clearly outlines an issue with the implementation: optimizing for
one type of workload has a detrimental effect on another workload, and
vice versa.

> If we can get performance to within a couple of tenths of a percent
> of the zero balancing case, then that would be preferable I think.

I won't try to compromise between the two.  If you do so, we would end
up with two half-baked raw turkeys.  Making the load balancing in the
wake-up path less aggressive would probably reduce performance for the
type of workload you quoted earlier, while for the db workload we don't
want any of it at all -- not even the code to determine whether it
should be balanced or not.

Do you have an example of the workload you mentioned earlier that
depends on SD_WAKE_BALANCE?  I would like to experiment with it so we
can move this forward instead of paper talk.

- Ken

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [patch] remove wake-balancing
  2005-07-29  6:27 ` Chen, Kenneth W
@ 2005-07-29 11:48 ` Ingo Molnar
  2005-07-29 14:13   ` [sched, patch] better wake-balancing Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2005-07-29 11:48 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton

* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:

> > If we can get performance to within a couple of tenths of a percent
> > of the zero balancing case, then that would be preferable I think.
>
> I won't try to compromise between the two.  If you do so, we would end
> up with two half-baked raw turkeys.  Making less aggressive load
> balance in the wake up path would probably reduce performance for the
> type of workload you quoted earlier and for db workload, we don't want
> any of them at all, not even the code to determine whether it should
> be balanced or not.

i think we could try to get rid of wakeup-time balancing altogether.
these days pretty much the only times we can sensibly do 'fast' (as in
immediate) migration are fork/clone and exec.  Furthermore, the gained
simplicity of wakeup is quite compelling too.  (Originally, when i
introduced the first variant of wakeup-time balancing eons ago, we
didn't have anything like fork-time and exec-time balancing.)

i think we could try the patch below in -mm: it removes (non-)affine
wakeup and passive wakeup-balancing, but keeps SD_WAKE_IDLE, which is
needed for efficient SMT scheduling.  I test-booted the patch on x86,
and it should work on all architectures.  (I have tested various
local-IPC and non-IPC workloads and only found performance improvements
- but i'm sure regressions exist too, and need to be examined.)

	Ingo

------
remove wakeup-time balancing.  It turns out exec-time and fork-time
balancing combined with periodic rebalancing ticks does a good enough
job.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/asm-i386/topology.h           |    3 -
 include/asm-ia64/topology.h           |    6 --
 include/asm-mips/mach-ip27/topology.h |    3 -
 include/asm-ppc64/topology.h          |    3 -
 include/asm-x86_64/topology.h         |    3 -
 include/linux/sched.h                 |    4 -
 include/linux/topology.h              |    4 -
 kernel/sched.c                        |   89 +++------------------------------
 8 files changed, 16 insertions(+), 99 deletions(-)

Index: linux-prefetch-task/include/asm-i386/topology.h
===================================================================
--- linux-prefetch-task.orig/include/asm-i386/topology.h
+++ linux-prefetch-task/include/asm-i386/topology.h
@@ -81,8 +81,7 @@ static inline int node_to_first_cpu(int
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_EXEC	\
-				| SD_BALANCE_FORK	\
-				| SD_WAKE_BALANCE,	\
+				| SD_BALANCE_FORK,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
Index: linux-prefetch-task/include/asm-ia64/topology.h
===================================================================
--- linux-prefetch-task.orig/include/asm-ia64/topology.h
+++ linux-prefetch-task/include/asm-ia64/topology.h
@@ -65,8 +65,7 @@ void build_cpu_to_node_map(void);
 	.forkexec_idx		= 1,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
-				| SD_BALANCE_EXEC	\
-				| SD_WAKE_AFFINE,	\
+				| SD_BALANCE_EXEC,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
@@ -91,8 +90,7 @@ void build_cpu_to_node_map(void);
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_EXEC	\
-				| SD_BALANCE_FORK	\
-				| SD_WAKE_BALANCE,	\
+				| SD_BALANCE_FORK,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 64,			\
 	.nr_balance_failed	= 0,			\
Index: linux-prefetch-task/include/asm-mips/mach-ip27/topology.h
===================================================================
--- linux-prefetch-task.orig/include/asm-mips/mach-ip27/topology.h
+++ linux-prefetch-task/include/asm-mips/mach-ip27/topology.h
@@ -28,8 +28,7 @@ extern unsigned char __node_distances[MA
 	.cache_nice_tries	= 1,			\
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
-				| SD_BALANCE_EXEC	\
-				| SD_WAKE_BALANCE,	\
+				| SD_BALANCE_EXEC,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
Index: linux-prefetch-task/include/asm-ppc64/topology.h
===================================================================
--- linux-prefetch-task.orig/include/asm-ppc64/topology.h
+++ linux-prefetch-task/include/asm-ppc64/topology.h
@@ -52,8 +52,7 @@ static inline int node_to_first_cpu(int
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_EXEC	\
 				| SD_BALANCE_NEWIDLE	\
-				| SD_WAKE_IDLE		\
-				| SD_WAKE_BALANCE,	\
+				| SD_WAKE_IDLE,		\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
Index: linux-prefetch-task/include/asm-x86_64/topology.h
===================================================================
--- linux-prefetch-task.orig/include/asm-x86_64/topology.h
+++ linux-prefetch-task/include/asm-x86_64/topology.h
@@ -48,8 +48,7 @@ extern int __node_distance(int, int);
 	.per_cpu_gain		= 100,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_FORK	\
-				| SD_BALANCE_EXEC	\
-				| SD_WAKE_BALANCE,	\
+				| SD_BALANCE_EXEC,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
Index: linux-prefetch-task/include/linux/sched.h
===================================================================
--- linux-prefetch-task.orig/include/linux/sched.h
+++ linux-prefetch-task/include/linux/sched.h
@@ -471,9 +471,7 @@ enum idle_type
 #define SD_BALANCE_EXEC		4	/* Balance on exec */
 #define SD_BALANCE_FORK		8	/* Balance on fork, clone */
 #define SD_WAKE_IDLE		16	/* Wake to idle CPU on task wakeup */
-#define SD_WAKE_AFFINE		32	/* Wake task to waking CPU */
-#define SD_WAKE_BALANCE		64	/* Perform balancing at task wakeup */
-#define SD_SHARE_CPUPOWER	128	/* Domain members share cpu power */
+#define SD_SHARE_CPUPOWER	32	/* Domain members share cpu power */
 
 struct sched_group {
 	struct sched_group *next;	/* Must be a circular list */
Index: linux-prefetch-task/include/linux/topology.h
===================================================================
--- linux-prefetch-task.orig/include/linux/topology.h
+++ linux-prefetch-task/include/linux/topology.h
@@ -97,7 +97,6 @@
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_EXEC	\
-				| SD_WAKE_AFFINE	\
 				| SD_WAKE_IDLE		\
 				| SD_SHARE_CPUPOWER,	\
 	.last_balance		= jiffies,		\
@@ -127,8 +126,7 @@
 	.forkexec_idx		= 1,			\
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
-				| SD_BALANCE_EXEC	\
-				| SD_WAKE_AFFINE,	\
+				| SD_BALANCE_EXEC,	\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
Index: linux-prefetch-task/kernel/sched.c
===================================================================
--- linux-prefetch-task.orig/kernel/sched.c
+++ linux-prefetch-task/kernel/sched.c
@@ -254,7 +254,6 @@ struct runqueue {
 
 	/* try_to_wake_up() stats */
 	unsigned long ttwu_cnt;
-	unsigned long ttwu_local;
 #endif
 };
 
@@ -373,7 +372,7 @@ static inline void task_rq_unlock(runque
  * bump this up when changing the output format or the meaning of an existing
  * format, so that tools can adapt (or abort)
  */
-#define SCHEDSTAT_VERSION 12
+#define SCHEDSTAT_VERSION 13
 
 static int show_schedstat(struct seq_file *seq, void *v)
 {
@@ -390,11 +389,11 @@ static int show_schedstat(struct seq_fil
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
+		    "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
 		    cpu, rq->yld_both_empty,
 		    rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt,
 		    rq->sched_switch, rq->sched_cnt, rq->sched_goidle,
-		    rq->ttwu_cnt, rq->ttwu_local,
+		    rq->ttwu_cnt,
 		    rq->rq_sched_info.cpu_time,
 		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt);
 
@@ -424,8 +423,7 @@ static int show_schedstat(struct seq_fil
 			seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu\n",
 			    sd->alb_cnt, sd->alb_failed, sd->alb_pushed,
 			    sd->sbe_cnt, sd->sbe_balanced, sd->sbe_pushed,
-			    sd->sbf_cnt, sd->sbf_balanced, sd->sbf_pushed,
-			    sd->ttwu_wake_remote, sd->ttwu_move_affine, sd->ttwu_move_balance);
+			    sd->sbf_cnt, sd->sbf_balanced, sd->sbf_pushed);
 		}
 		preempt_enable();
 #endif
@@ -1134,8 +1132,6 @@ static int try_to_wake_up(task_t * p, un
 	long old_state;
 	runqueue_t *rq;
 #ifdef CONFIG_SMP
-	unsigned long load, this_load;
-	struct sched_domain *sd, *this_sd = NULL;
 	int new_cpu;
 #endif
 
@@ -1154,77 +1150,13 @@ static int try_to_wake_up(task_t * p, un
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
-	new_cpu = cpu;
-	schedstat_inc(rq, ttwu_cnt);
-	if (cpu == this_cpu) {
-		schedstat_inc(rq, ttwu_local);
-		goto out_set_cpu;
-	}
-
-	for_each_domain(this_cpu, sd) {
-		if (cpu_isset(cpu, sd->span)) {
-			schedstat_inc(sd, ttwu_wake_remote);
-			this_sd = sd;
-			break;
-		}
-	}
-
-	if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
-		goto out_set_cpu;
 
 	/*
-	 * Check for affine wakeup and passive balancing possibilities.
+	 * Wake to the CPU the task was last running on (or any
+	 * nearby SMT-equivalent idle CPU):
 	 */
-	if (this_sd) {
-		int idx = this_sd->wake_idx;
-		unsigned int imbalance;
-
-		imbalance = 100 + (this_sd->imbalance_pct - 100) / 2;
-
-		load = source_load(cpu, idx);
-		this_load = target_load(this_cpu, idx);
-
-		new_cpu = this_cpu; /* Wake to this CPU if we can */
-
-		if (this_sd->flags & SD_WAKE_AFFINE) {
-			unsigned long tl = this_load;
-			/*
-			 * If sync wakeup then subtract the (maximum possible)
-			 * effect of the currently running task from the load
-			 * of the current CPU:
-			 */
-			if (sync)
-				tl -= SCHED_LOAD_SCALE;
-
-			if ((tl <= load &&
-				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
-				100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
-				/*
-				 * This domain has SD_WAKE_AFFINE and
-				 * p is cache cold in this domain, and
-				 * there is no bad imbalance.
-				 */
-				schedstat_inc(this_sd, ttwu_move_affine);
-				goto out_set_cpu;
-			}
-		}
-
-		/*
-		 * Start passive balancing when half the imbalance_pct
-		 * limit is reached.
-		 */
-		if (this_sd->flags & SD_WAKE_BALANCE) {
-			if (imbalance*this_load <= 100*load) {
-				schedstat_inc(this_sd, ttwu_move_balance);
-				goto out_set_cpu;
-			}
-		}
-	}
-
-	new_cpu = cpu; /* Could not wake to this_cpu. Wake to cpu instead */
-out_set_cpu:
-	new_cpu = wake_idle(new_cpu, p);
+	new_cpu = wake_idle(cpu, p);
 	if (new_cpu != cpu) {
 		set_task_cpu(p, new_cpu);
 		task_rq_unlock(rq, &flags);
@@ -4758,9 +4690,7 @@ static int sd_degenerate(struct sched_do
 	}
 
 	/* Following flags don't use groups */
-	if (sd->flags & (SD_WAKE_IDLE |
-			 SD_WAKE_AFFINE |
-			 SD_WAKE_BALANCE))
+	if (sd->flags & SD_WAKE_IDLE)
 		return 0;
 
 	return 1;
@@ -4778,9 +4708,6 @@ static int sd_parent_degenerate(struct s
 		return 0;
 
 	/* Does parent contain flags not in child? */
-	/* WAKE_BALANCE is a subset of WAKE_AFFINE */
-	if (cflags & SD_WAKE_AFFINE)
-		pflags &= ~SD_WAKE_BALANCE;
 	/* Flags needing groups don't count if only 1 group in parent */
 	if (parent->groups == parent->groups->next) {
 		pflags &= ~(SD_LOAD_BALANCE |

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [sched, patch] better wake-balancing
  2005-07-29 11:48 ` [patch] remove wake-balancing Ingo Molnar
@ 2005-07-29 14:13 ` Ingo Molnar
  2005-07-29 15:02   ` [sched, patch] better wake-balancing, #2 Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2005-07-29 14:13 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton

another approach would be the patch below, to do wakeup-balancing only
if the wakeup CPU or the task CPU is idle.  I've measured half-loaded
tbench and, unlike total wakeup-balancing removal, it does not degrade
with this patch applied, while fully loaded tbench and other workloads
clearly improve.

Ken, could you give this one a try?  (It's against the current scheduler
queue in -mm, but also applies fine to current Linus trees.)

	Ingo

---
do wakeup-balancing only if the wakeup-CPU or the task-CPU is idle.

this prevents excessive wakeup-balancing while the system is highly
loaded, but helps spread out the workload on partly idle systems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/sched.c |    6 ++++++
 1 files changed, 6 insertions(+)

Index: linux-sched-curr/kernel/sched.c
===================================================================
--- linux-sched-curr.orig/kernel/sched.c
+++ linux-sched-curr/kernel/sched.c
@@ -1252,7 +1252,13 @@ static int try_to_wake_up(task_t *p, uns
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
+	/*
+	 * If neither this CPU, nor the previous CPU the task was
+	 * running on is idle then skip wakeup-balancing:
+	 */
 	new_cpu = cpu;
+	if (!idle_cpu(this_cpu) && !idle_cpu(cpu))
+		goto out_set_cpu;
+
 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [sched, patch] better wake-balancing, #2
  2005-07-29 14:13 ` [sched, patch] better wake-balancing Ingo Molnar
@ 2005-07-29 15:02 ` Ingo Molnar
  0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2005-07-29 15:02 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton

* Ingo Molnar <mingo@elte.hu> wrote:

> another approach would be the patch below, to do wakeup-balancing only
> if the wakeup CPU or the task CPU is idle.

there's an even simpler way: only do wakeup-balancing if this_cpu is
idle.  (tbench results are still OK, and other workloads improved.)

	Ingo

--------
do wakeup-balancing only if the wakeup-CPU is idle.

this prevents excessive wakeup-balancing while the system is highly
loaded, but helps spread out the workload on partly idle systems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/sched.c |    6 ++++++
 1 files changed, 6 insertions(+)

Index: linux-sched-curr/kernel/sched.c
===================================================================
--- linux-sched-curr.orig/kernel/sched.c
+++ linux-sched-curr/kernel/sched.c
@@ -1253,7 +1253,13 @@ static int try_to_wake_up(task_t *p, uns
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;
 
+	/*
+	 * Only do wakeup-balancing (== potentially migrate the task)
+	 * if this CPU is idle:
+	 */
 	new_cpu = cpu;
+	if (!idle_cpu(this_cpu))
+		goto out_set_cpu;
+
 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2005-07-31 13:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2005-07-30 23:26 [sched, patch] better wake-balancing, #2 Chuck Ebbert
2005-07-31  4:35 ` Con Kolivas
2005-07-31  6:29 ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2005-07-31 13:35 Chuck Ebbert
2005-07-29  2:01 Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Nick Piggin
2005-07-29  6:27 ` Chen, Kenneth W
2005-07-29 11:48   ` [patch] remove wake-balancing Ingo Molnar
2005-07-29 14:13     ` [sched, patch] better wake-balancing Ingo Molnar
2005-07-29 15:02       ` [sched, patch] better wake-balancing, #2 Ingo Molnar