* Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags @ 2005-07-28 23:08 Chen, Kenneth W 2005-07-28 23:34 ` Nick Piggin 2005-07-29 11:26 ` Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Ingo Molnar 0 siblings, 2 replies; 26+ messages in thread From: Chen, Kenneth W @ 2005-07-28 23:08 UTC (permalink / raw) To: Ingo Molnar, 'Nick Piggin'; +Cc: linux-kernel, linux-ia64 What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE? SD_WAKE_AFFINE is not useful in conjunction with interrupt binding. In fact, it does more harm than good, causing detrimental process migration, destroying process cache affinity, etc. Also, SD_WAKE_BALANCE is giving us performance grief with our industry-standard OLTP workload. To demonstrate the problem, we turned off these two flags in the cpu sd domain and measured a stunning 2.15% performance gain! And deleting all the code in try_to_wake_up() pertaining to load balancing gives us another 0.2% gain. The wake-up path should be made simple: just put the waking task on the runqueue of the cpu it last ran on. Simple and elegant. I'm proposing we either delete these two flags or make them run-time configurable. - Ken --- linux-2.6.12/include/linux/topology.h.orig 2005-07-28 15:54:05.007399685 -0700 +++ linux-2.6.12/include/linux/topology.h 2005-07-28 15:54:39.292555515 -0700 @@ -118,9 +118,7 @@ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ - | SD_WAKE_AFFINE \ - | SD_WAKE_IDLE \ - | SD_WAKE_BALANCE, \ + | SD_WAKE_IDLE, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-28 23:08 Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Chen, Kenneth W @ 2005-07-28 23:34 ` Nick Piggin 2005-07-28 23:48 ` Chen, Kenneth W 2005-07-29 11:26 ` Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Ingo Molnar 1 sibling, 1 reply; 26+ messages in thread From: Nick Piggin @ 2005-07-28 23:34 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Chen, Kenneth W wrote: > What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE? > SD_WAKE_AFFINE is not useful in conjunction with interrupt binding. > In fact, it does more harm than good, causing detrimental > process migration, destroying process cache affinity, etc. Also, > SD_WAKE_BALANCE is giving us performance grief with our industry- > standard OLTP workload. > The periodic load balancer basically makes completely undirected, random choices when picking which tasks to move where. Wake balancing provides an opportunity to provide some input bias into the load balancer. For example, suppose you started 100 pairs of tasks which communicate through a pipe. On a 2 CPU system without wake balancing, probably half of the pairs will be on different CPUs. With wake balancing, it should be much better. I've also been told that it improves IO efficiency significantly - obviously that depends on the system and workload. > To demonstrate the problem, we turned off these two flags in the cpu > sd domain and measured a stunning 2.15% performance gain! And deleting > all the code in try_to_wake_up() pertaining to load balancing gives us > another 0.2% gain. > > The wake-up path should be made simple: just put the waking task on > the runqueue of the cpu it last ran on. Simple and elegant. > > I'm proposing we either delete these two flags or make them run-time > configurable. > There have been lots of changes since 2.6.12, including less aggressive wake balancing. 
I hear you might be having problems with recent 2.6.13 kernels? If so, it would be really good to have a look at that before 2.6.13 goes out the door. I appreciate all the effort you're putting into this! Nick -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-28 23:34 ` Nick Piggin @ 2005-07-28 23:48 ` Chen, Kenneth W 2005-07-29 1:25 ` Nick Piggin 0 siblings, 1 reply; 26+ messages in thread From: Chen, Kenneth W @ 2005-07-28 23:48 UTC (permalink / raw) To: 'Nick Piggin'; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM > Wake balancing provides an opportunity to provide some input bias > into the load balancer. > > For example, suppose you started 100 pairs of tasks which communicate > through a pipe. On a 2 CPU system without wake balancing, probably > half of the pairs will be on different CPUs. With wake balancing, > it should be much better. Shouldn't the pipe code use synchronous wakeup? > I hear you might be having problems with recent 2.6.13 kernels? If so, > it would be really good to have a look at that before 2.6.13 goes out the > door. Yes I do :-(. Apparently bumping up cache_hot_time won't give us the performance boost we used to see. - Ken ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-28 23:48 ` Chen, Kenneth W @ 2005-07-29 1:25 ` Nick Piggin 2005-07-29 1:39 ` Chen, Kenneth W 0 siblings, 1 reply; 26+ messages in thread From: Nick Piggin @ 2005-07-29 1:25 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Chen, Kenneth W wrote: >Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM > >>Wake balancing provides an opportunity to provide some input bias >>into the load balancer. >> >>For example, suppose you started 100 pairs of tasks which communicate >>through a pipe. On a 2 CPU system without wake balancing, probably >>half of the pairs will be on different CPUs. With wake balancing, >>it should be much better. >> > >Shouldn't the pipe code use synchronous wakeup? > > Well, pipes are just an example. It could be any type of communication. What's more, even the synchronous wakeup uses the wake balancing path (although that could be modified to only do wake balancing for synch wakeups; I'd have to be convinced we should special-case pipes and not eg. semaphores or AF_UNIX sockets). > >>I hear you might be having problems with recent 2.6.13 kernels? If so, >>it would be really good to have a look at that before 2.6.13 goes out the >>door. >> > >Yes I do :-(. Apparently bumping up cache_hot_time won't give us the >performance boost we used to see. > > OK, there are probably a number of things we can explore depending on what the symptoms are (eg. excessive idle time, bad cache performance). Unfortunately it is kind of difficult to tune 2.6.13 on the basis of 2.6.12 results - although that's not to say it won't indicate a good avenue to investigate. ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 1:25 ` Nick Piggin @ 2005-07-29 1:39 ` Chen, Kenneth W 2005-07-29 1:46 ` Nick Piggin 0 siblings, 1 reply; 26+ messages in thread From: Chen, Kenneth W @ 2005-07-29 1:39 UTC (permalink / raw) To: 'Nick Piggin'; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM > Well, pipes are just an example. It could be any type of communication. > What's more, even the synchronous wakeup uses the wake balancing path > (although that could be modified to only do wake balancing for synch > wakeups; I'd have to be convinced we should special-case pipes and not > eg. semaphores or AF_UNIX sockets). Why is the normal load balance path not enough (or not able to do the right thing)? The rebalance_tick and idle_balance ought to be enough to take care of the imbalance. What makes load balancing in the wake-up path so special? Oh, I'd like to hear your opinion on what to do with these two flags: make them run-time configurable? (I'm of the opinion that we should delete them altogether.) - Ken ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 1:39 ` Chen, Kenneth W @ 2005-07-29 1:46 ` Nick Piggin 2005-07-29 1:53 ` Chen, Kenneth W 0 siblings, 1 reply; 26+ messages in thread From: Nick Piggin @ 2005-07-29 1:46 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Chen, Kenneth W wrote: >Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM > >>Well, pipes are just an example. It could be any type of communication. >>What's more, even the synchronous wakeup uses the wake balancing path >>(although that could be modified to only do wake balancing for synch >>wakeups; I'd have to be convinced we should special-case pipes and not >>eg. semaphores or AF_UNIX sockets). >> > > >Why is the normal load balance path not enough (or not able to do the >right thing)? The rebalance_tick and idle_balance ought to be enough to take >care of the imbalance. What makes load balancing in the wake-up path so special? > > Well, the normal load balancing path treats all tasks the same, while the wake path knows if a CPU is waking a remote task and can attempt to maximise the number of local wakeups. >Oh, I'd like to hear your opinion on what to do with these two flags: make >them run-time configurable? (I'm of the opinion that we should delete them altogether.) > > I'd like to try making them less aggressive first if possible. ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 1:46 ` Nick Piggin @ 2005-07-29 1:53 ` Chen, Kenneth W 2005-07-29 2:01 ` Nick Piggin 0 siblings, 1 reply; 26+ messages in thread From: Chen, Kenneth W @ 2005-07-29 1:53 UTC (permalink / raw) To: 'Nick Piggin'; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM > I'd like to try making them less aggressive first if possible. Well, that's exactly what I'm trying to do: make them not aggressive at all by not performing any load balance :-) The workload gets maximum benefit with zero aggressiveness. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 1:53 ` Chen, Kenneth W @ 2005-07-29 2:01 ` Nick Piggin 2005-07-29 6:27 ` Chen, Kenneth W 0 siblings, 1 reply; 26+ messages in thread From: Nick Piggin @ 2005-07-29 2:01 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Chen, Kenneth W wrote: >Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM > >>I'd like to try making them less aggressive first if possible. >> > >Well, that's exactly what I'm trying to do: make them not aggressive >at all by not performing any load balance :-) The workload gets maximum >benefit with zero aggressiveness. > > Unfortunately we can't forget about other workloads, and we're trying to stay away from runtime tunables in the scheduler. If we can get performance to within a couple of tenths of a percent of the zero balancing case, then that would be preferable I think. ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 2:01 ` Nick Piggin @ 2005-07-29 6:27 ` Chen, Kenneth W 2005-07-29 8:48 ` Nick Piggin 2005-07-29 11:48 ` [patch] remove wake-balancing Ingo Molnar 0 siblings, 2 replies; 26+ messages in thread From: Chen, Kenneth W @ 2005-07-29 6:27 UTC (permalink / raw) To: 'Nick Piggin'; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM > Chen, Kenneth W wrote: > >Well, that's exactly what I'm trying to do: make them not aggressive > >at all by not performing any load balance :-) The workload gets maximum > >benefit with zero aggressiveness. > > Unfortunately we can't forget about other workloads, and we're > trying to stay away from runtime tunables in the scheduler. This clearly outlines an issue with the implementation. Optimizing for one type of workload has a detrimental effect on another workload and vice versa. > If we can get performance to within a couple of tenths of a percent > of the zero balancing case, then that would be preferable I think. I won't try to compromise between the two. If you do so, we would end up with two half-baked raw turkeys. Making load balancing in the wake-up path less aggressive would probably reduce performance for the type of workload you quoted earlier, and for the db workload we don't want any of it at all, not even the code to determine whether it should be balanced or not. Do you have an example of the workload you mentioned earlier that depends on SD_WAKE_BALANCE? I would like to experiment with it so we can move this forward instead of paper talk. - Ken ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 6:27 ` Chen, Kenneth W @ 2005-07-29 8:48 ` Nick Piggin 2005-07-29 8:53 ` Ingo Molnar ` (2 more replies) 2005-07-29 11:48 ` [patch] remove wake-balancing Ingo Molnar 1 sibling, 3 replies; 26+ messages in thread From: Nick Piggin @ 2005-07-29 8:48 UTC (permalink / raw) To: Chen, Kenneth W; +Cc: Ingo Molnar, linux-kernel, linux-ia64 Chen, Kenneth W wrote: > Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM > This clearly outlines an issue with the implementation. Optimizing for one > type of workload has a detrimental effect on another workload and vice versa. > Yep. That comes up fairly regularly when tuning the scheduler :( > > I won't try to compromise between the two. If you do so, we would end up > with two half-baked raw turkeys. Making load balancing in the wake-up > path less aggressive would probably reduce performance for the type of workload you > quoted earlier, and for the db workload we don't want any of it at all, not > even the code to determine whether it should be balanced or not. > Well, that remains to be seen. If it can be made _smarter_, then you may not have to take such a big compromise. But either way, there will have to be some compromise made. At the very least you have to find some acceptable default. > Do you have an example of the workload you mentioned earlier that depends on > SD_WAKE_BALANCE? I would like to experiment with it so we can move this > forward instead of paper talk. > Well, you can easily see suboptimal scheduling decisions on many programs with lots of interprocess communication. For example, tbench on a dual Xeon: processes 1 2 3 4 2.6.13-rc4: 187, 183, 179 260, 259, 256 340, 320, 349 504, 496, 500 no wake-bal: 180, 180, 177 254, 254, 253 268, 270, 348 345, 290, 500 Numbers are MB/s, higher is better. Networking or other IO workloads where processes are tightly coupled to a specific adapter / interrupt source can also see pretty good gains. 
-- SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 8:48 ` Nick Piggin @ 2005-07-29 8:53 ` Ingo Molnar 2005-07-29 8:59 ` Nick Piggin 2005-07-29 9:07 ` Ingo Molnar 2005-07-29 16:40 ` Ingo Molnar 2 siblings, 1 reply; 26+ messages in thread From: Ingo Molnar @ 2005-07-29 8:53 UTC (permalink / raw) To: Nick Piggin; +Cc: Chen, Kenneth W, linux-kernel, linux-ia64 * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > Well, you can easily see suboptimal scheduling decisions on many > programs with lots of interprocess communication. For example, tbench > on a dual Xeon: > > processes 1 2 3 4 > > 2.6.13-rc4: 187, 183, 179 260, 259, 256 340, 320, 349 504, 496, 500 > no wake-bal: 180, 180, 177 254, 254, 253 268, 270, 348 345, 290, 500 > > Numbers are MB/s, higher is better. what type of network was used - localhost or a real one? Ingo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 8:53 ` Ingo Molnar @ 2005-07-29 8:59 ` Nick Piggin 2005-07-29 9:01 ` Ingo Molnar 0 siblings, 1 reply; 26+ messages in thread From: Nick Piggin @ 2005-07-29 8:59 UTC (permalink / raw) To: Ingo Molnar; +Cc: Chen, Kenneth W, linux-kernel, linux-ia64 Ingo Molnar wrote: > * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > >>processes 1 2 3 4 >> >>2.6.13-rc4: 187, 183, 179 260, 259, 256 340, 320, 349 504, 496, 500 >>no wake-bal: 180, 180, 177 254, 254, 253 268, 270, 348 345, 290, 500 >> >>Numbers are MB/s, higher is better. > > > what type of network was used - localhost or a real one? > Localhost. Yeah, it isn't a real-world test, but it does show the erratic behaviour without wake affine. I don't have a setup with multiple fast network adapters, otherwise I would have run a similar test using a real network. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 8:59 ` Nick Piggin @ 2005-07-29 9:01 ` Ingo Molnar 0 siblings, 0 replies; 26+ messages in thread From: Ingo Molnar @ 2005-07-29 9:01 UTC (permalink / raw) To: Nick Piggin; +Cc: Chen, Kenneth W, linux-kernel, linux-ia64 * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > >>processes 1 2 3 4 > >> > >>2.6.13-rc4: 187, 183, 179 260, 259, 256 340, 320, 349 504, 496, 500 > >>no wake-bal: 180, 180, 177 254, 254, 253 268, 270, 348 345, 290, 500 > >> > >>Numbers are MB/s, higher is better. > > > > > >what type of network was used - localhost or a real one? > > > > Localhost. Yeah it isn't a real world test, but it does show the > erratic behaviour without wake affine. yeah - fine enough. (It's not representative for IO workloads, but it's representative for local IPC workloads, just wanted to know precisely which workload it is.) Ingo ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 8:48 ` Nick Piggin 2005-07-29 8:53 ` Ingo Molnar @ 2005-07-29 9:07 ` Ingo Molnar 2005-07-29 16:40 ` Ingo Molnar 2 siblings, 0 replies; 26+ messages in thread From: Ingo Molnar @ 2005-07-29 9:07 UTC (permalink / raw) To: Nick Piggin; +Cc: Chen, Kenneth W, linux-kernel, linux-ia64 * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > Well, you can easily see suboptimal scheduling decisions on many > programs with lots of interprocess communication. For example, tbench > on a dual Xeon: > > processes 1 2 3 4 > > 2.6.13-rc4: 187, 183, 179 260, 259, 256 340, 320, 349 504, 496, 500 > no wake-bal: 180, 180, 177 254, 254, 253 268, 270, 348 345, 290, 500 > > Numbers are MB/s, higher is better. i cannot see any difference with/without wake-balancing in this workload, on a dual Xeon. Could you try the quick hack below and do: echo 1 > /proc/sys/kernel/panic # turn on wake-balancing echo 0 > /proc/sys/kernel/panic # turn off wake-balancing does the runtime switching show any effects on the throughput numbers tbench is showing? I'm using dbench-3.03. (i only checked the status numbers, didnt do full runs) (did you have SCHED_SMT enabled?) Ingo kernel/sched.c | 2 ++ 1 files changed, 2 insertions(+) Index: linux-prefetch-task/kernel/sched.c =================================================================== --- linux-prefetch-task.orig/kernel/sched.c +++ linux-prefetch-task/kernel/sched.c @@ -1155,6 +1155,8 @@ static int try_to_wake_up(task_t * p, un goto out_activate; new_cpu = cpu; + if (!panic_timeout) + goto out_set_cpu; schedstat_inc(rq, ttwu_cnt); if (cpu == this_cpu) { ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags 2005-07-29 8:48 ` Nick Piggin 2005-07-29 8:53 ` Ingo Molnar 2005-07-29 9:07 ` Ingo Molnar @ 2005-07-29 16:40 ` Ingo Molnar 2 siblings, 0 replies; 26+ messages in thread From: Ingo Molnar @ 2005-07-29 16:40 UTC (permalink / raw) To: Nick Piggin; +Cc: Chen, Kenneth W, linux-kernel, linux-ia64 * Nick Piggin <nickpiggin@yahoo.com.au> wrote: > Chen, Kenneth W wrote: > >Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM > > >This clearly outlines an issue with the implementation. Optimize for one > >type of workload has detrimental effect on another workload and vice versa. > > > > Yep. That comes up fairly regularly when tuning the scheduler :( in this particular case we can clearly separate the two workloads though: CPU-overload (Ken's benchmark) vs. half-load (3-task tbench). So by checking for migration target/source idleness we can have a hard separator for wakeup balancing. (whether it works out for both types of workloads remains to be seen) Ingo ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch] remove wake-balancing 2005-07-29 6:27 ` Chen, Kenneth W 2005-07-29 8:48 ` Nick Piggin @ 2005-07-29 11:48 ` Ingo Molnar 2005-07-29 14:13 ` [sched, patch] better wake-balancing Ingo Molnar 1 sibling, 1 reply; 26+ messages in thread From: Ingo Molnar @ 2005-07-29 11:48 UTC (permalink / raw) To: Chen, Kenneth W Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton * Chen, Kenneth W <kenneth.w.chen@intel.com> wrote: > > If we can get performance to within a couple of tenths of a percent > > of the zero balancing case, then that would be preferable I think. > > I won't try to compromise between the two. If you do so, we would end > up with two half-baked raw turkeys. Making load balancing in the > wake-up path less aggressive would probably reduce performance for the > type of workload you quoted earlier, and for the db workload we don't want > any of it at all, not even the code to determine whether it should > be balanced or not. i think we could try to get rid of wakeup-time balancing altogether. these days pretty much the only times we can sensibly do 'fast' (as in immediate) migration are fork/clone and exec. Furthermore, the gained simplicity of wakeup is quite compelling too. (Originally, when i introduced the first variant of wakeup-time balancing eons ago, we didn't have anything like fork-time and exec-time balancing.) i think we could try the patch below in -mm; it removes (non-)affine wakeup and passive wakeup-balancing, but keeps SD_WAKE_IDLE, which is needed for efficient SMT scheduling. I test-booted the patch on x86, and it should work on all architectures. (I have tested various local-IPC and non-IPC workloads and only found performance improvements - but i'm sure regressions exist too, and need to be examined.) Ingo ------ remove wakeup-time balancing. It turns out exec-time and fork-time balancing combined with periodic rebalancing ticks does a good enough job. 
Signed-off-by: Ingo Molnar <mingo@elte.hu> include/asm-i386/topology.h | 3 - include/asm-ia64/topology.h | 6 -- include/asm-mips/mach-ip27/topology.h | 3 - include/asm-ppc64/topology.h | 3 - include/asm-x86_64/topology.h | 3 - include/linux/sched.h | 4 - include/linux/topology.h | 4 - kernel/sched.c | 89 +++------------------------------- 8 files changed, 16 insertions(+), 99 deletions(-) Index: linux-prefetch-task/include/asm-i386/topology.h =================================================================== --- linux-prefetch-task.orig/include/asm-i386/topology.h +++ linux-prefetch-task/include/asm-i386/topology.h @@ -81,8 +81,7 @@ static inline int node_to_first_cpu(int .per_cpu_gain = 100, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_EXEC \ - | SD_BALANCE_FORK \ - | SD_WAKE_BALANCE, \ + | SD_BALANCE_FORK, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ Index: linux-prefetch-task/include/asm-ia64/topology.h =================================================================== --- linux-prefetch-task.orig/include/asm-ia64/topology.h +++ linux-prefetch-task/include/asm-ia64/topology.h @@ -65,8 +65,7 @@ void build_cpu_to_node_map(void); .forkexec_idx = 1, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ - | SD_BALANCE_EXEC \ - | SD_WAKE_AFFINE, \ + | SD_BALANCE_EXEC, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ @@ -91,8 +90,7 @@ void build_cpu_to_node_map(void); .per_cpu_gain = 100, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_EXEC \ - | SD_BALANCE_FORK \ - | SD_WAKE_BALANCE, \ + | SD_BALANCE_FORK, \ .last_balance = jiffies, \ .balance_interval = 64, \ .nr_balance_failed = 0, \ Index: linux-prefetch-task/include/asm-mips/mach-ip27/topology.h =================================================================== --- linux-prefetch-task.orig/include/asm-mips/mach-ip27/topology.h +++ linux-prefetch-task/include/asm-mips/mach-ip27/topology.h @@ -28,8 +28,7 @@ extern unsigned char __node_distances[MA 
.cache_nice_tries = 1, \ .per_cpu_gain = 100, \ .flags = SD_LOAD_BALANCE \ - | SD_BALANCE_EXEC \ - | SD_WAKE_BALANCE, \ + | SD_BALANCE_EXEC, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ Index: linux-prefetch-task/include/asm-ppc64/topology.h =================================================================== --- linux-prefetch-task.orig/include/asm-ppc64/topology.h +++ linux-prefetch-task/include/asm-ppc64/topology.h @@ -52,8 +52,7 @@ static inline int node_to_first_cpu(int .flags = SD_LOAD_BALANCE \ | SD_BALANCE_EXEC \ | SD_BALANCE_NEWIDLE \ - | SD_WAKE_IDLE \ - | SD_WAKE_BALANCE, \ + | SD_WAKE_IDLE, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ Index: linux-prefetch-task/include/asm-x86_64/topology.h =================================================================== --- linux-prefetch-task.orig/include/asm-x86_64/topology.h +++ linux-prefetch-task/include/asm-x86_64/topology.h @@ -48,8 +48,7 @@ extern int __node_distance(int, int); .per_cpu_gain = 100, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_FORK \ - | SD_BALANCE_EXEC \ - | SD_WAKE_BALANCE, \ + | SD_BALANCE_EXEC, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ Index: linux-prefetch-task/include/linux/sched.h =================================================================== --- linux-prefetch-task.orig/include/linux/sched.h +++ linux-prefetch-task/include/linux/sched.h @@ -471,9 +471,7 @@ enum idle_type #define SD_BALANCE_EXEC 4 /* Balance on exec */ #define SD_BALANCE_FORK 8 /* Balance on fork, clone */ #define SD_WAKE_IDLE 16 /* Wake to idle CPU on task wakeup */ -#define SD_WAKE_AFFINE 32 /* Wake task to waking CPU */ -#define SD_WAKE_BALANCE 64 /* Perform balancing at task wakeup */ -#define SD_SHARE_CPUPOWER 128 /* Domain members share cpu power */ +#define SD_SHARE_CPUPOWER 32 /* Domain members share cpu power */ struct sched_group { struct sched_group *next; /* Must be a circular list */ Index: 
linux-prefetch-task/include/linux/topology.h =================================================================== --- linux-prefetch-task.orig/include/linux/topology.h +++ linux-prefetch-task/include/linux/topology.h @@ -97,7 +97,6 @@ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ - | SD_WAKE_AFFINE \ | SD_WAKE_IDLE \ | SD_SHARE_CPUPOWER, \ .last_balance = jiffies, \ @@ -127,8 +126,7 @@ .forkexec_idx = 1, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ - | SD_BALANCE_EXEC \ - | SD_WAKE_AFFINE, \ + | SD_BALANCE_EXEC, \ .last_balance = jiffies, \ .balance_interval = 1, \ .nr_balance_failed = 0, \ Index: linux-prefetch-task/kernel/sched.c =================================================================== --- linux-prefetch-task.orig/kernel/sched.c +++ linux-prefetch-task/kernel/sched.c @@ -254,7 +254,6 @@ struct runqueue { /* try_to_wake_up() stats */ unsigned long ttwu_cnt; - unsigned long ttwu_local; #endif }; @@ -373,7 +372,7 @@ static inline void task_rq_unlock(runque * bump this up when changing the output format or the meaning of an existing * format, so that tools can adapt (or abort) */ -#define SCHEDSTAT_VERSION 12 +#define SCHEDSTAT_VERSION 13 static int show_schedstat(struct seq_file *seq, void *v) { @@ -390,11 +389,11 @@ static int show_schedstat(struct seq_fil /* runqueue-specific stats */ seq_printf(seq, - "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", + "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", cpu, rq->yld_both_empty, rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt, rq->sched_switch, rq->sched_cnt, rq->sched_goidle, - rq->ttwu_cnt, rq->ttwu_local, + rq->ttwu_cnt, rq->rq_sched_info.cpu_time, rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt); @@ -424,8 +423,7 @@ static int show_schedstat(struct seq_fil seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu\n", sd->alb_cnt, sd->alb_failed, sd->alb_pushed, sd->sbe_cnt, sd->sbe_balanced, sd->sbe_pushed, - sd->sbf_cnt, sd->sbf_balanced, 
sd->sbf_pushed, - sd->ttwu_wake_remote, sd->ttwu_move_affine, sd->ttwu_move_balance); + sd->sbf_cnt, sd->sbf_balanced, sd->sbf_pushed); } preempt_enable(); #endif @@ -1134,8 +1132,6 @@ static int try_to_wake_up(task_t * p, un long old_state; runqueue_t *rq; #ifdef CONFIG_SMP - unsigned long load, this_load; - struct sched_domain *sd, *this_sd = NULL; int new_cpu; #endif @@ -1154,77 +1150,13 @@ static int try_to_wake_up(task_t * p, un if (unlikely(task_running(rq, p))) goto out_activate; - new_cpu = cpu; - schedstat_inc(rq, ttwu_cnt); - if (cpu == this_cpu) { - schedstat_inc(rq, ttwu_local); - goto out_set_cpu; - } - - for_each_domain(this_cpu, sd) { - if (cpu_isset(cpu, sd->span)) { - schedstat_inc(sd, ttwu_wake_remote); - this_sd = sd; - break; - } - } - - if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed))) - goto out_set_cpu; /* - * Check for affine wakeup and passive balancing possibilities. + * Wake to the CPU the task was last running on (or any + * nearby SMT-equivalent idle CPU): */ - if (this_sd) { - int idx = this_sd->wake_idx; - unsigned int imbalance; - - imbalance = 100 + (this_sd->imbalance_pct - 100) / 2; - - load = source_load(cpu, idx); - this_load = target_load(this_cpu, idx); - - new_cpu = this_cpu; /* Wake to this CPU if we can */ - - if (this_sd->flags & SD_WAKE_AFFINE) { - unsigned long tl = this_load; - /* - * If sync wakeup then subtract the (maximum possible) - * effect of the currently running task from the load - * of the current CPU: - */ - if (sync) - tl -= SCHED_LOAD_SCALE; - - if ((tl <= load && - tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) || - 100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) { - /* - * This domain has SD_WAKE_AFFINE and - * p is cache cold in this domain, and - * there is no bad imbalance. - */ - schedstat_inc(this_sd, ttwu_move_affine); - goto out_set_cpu; - } - } - - /* - * Start passive balancing when half the imbalance_pct - * limit is reached. 
- */ - if (this_sd->flags & SD_WAKE_BALANCE) { - if (imbalance*this_load <= 100*load) { - schedstat_inc(this_sd, ttwu_move_balance); - goto out_set_cpu; - } - } - } - - new_cpu = cpu; /* Could not wake to this_cpu. Wake to cpu instead */ -out_set_cpu: - new_cpu = wake_idle(new_cpu, p); + new_cpu = wake_idle(cpu, p); if (new_cpu != cpu) { set_task_cpu(p, new_cpu); task_rq_unlock(rq, &flags); @@ -4758,9 +4690,7 @@ static int sd_degenerate(struct sched_do } /* Following flags don't use groups */ - if (sd->flags & (SD_WAKE_IDLE | - SD_WAKE_AFFINE | - SD_WAKE_BALANCE)) + if (sd->flags & SD_WAKE_IDLE) return 0; return 1; @@ -4778,9 +4708,6 @@ static int sd_parent_degenerate(struct s return 0; /* Does parent contain flags not in child? */ - /* WAKE_BALANCE is a subset of WAKE_AFFINE */ - if (cflags & SD_WAKE_AFFINE) - pflags &= ~SD_WAKE_BALANCE; /* Flags needing groups don't count if only 1 group in parent */ if (parent->groups == parent->groups->next) { pflags &= ~(SD_LOAD_BALANCE | ^ permalink raw reply [flat|nested] 26+ messages in thread
* [sched, patch] better wake-balancing 2005-07-29 11:48 ` [patch] remove wake-balancing Ingo Molnar @ 2005-07-29 14:13 ` Ingo Molnar 2005-07-29 15:02 ` [sched, patch] better wake-balancing, #2 Ingo Molnar 0 siblings, 1 reply; 26+ messages in thread From: Ingo Molnar @ 2005-07-29 14:13 UTC (permalink / raw) To: Chen, Kenneth W Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton another approach would be the patch below, to do wakeup-balancing only if the wakeup CPU or the task CPU is idle. I've measured half-loaded tbench and, unlike with total wakeup-balancing removal, it does not degrade with this patch applied, while fully loaded tbench and other workloads clearly improve. Ken, could you give this one a try? (It's against the current scheduler queue in -mm, but also applies fine to current Linus trees.) Ingo --- do wakeup-balancing only if the wakeup-CPU or the task-CPU is idle. this prevents excessive wakeup-balancing while the system is highly loaded, but helps spread out the workload on partly idle systems. Signed-off-by: Ingo Molnar <mingo@elte.hu> kernel/sched.c | 6 ++++++ 1 files changed, 6 insertions(+) Index: linux-sched-curr/kernel/sched.c =================================================================== --- linux-sched-curr.orig/kernel/sched.c +++ linux-sched-curr/kernel/sched.c @@ -1252,7 +1252,13 @@ static int try_to_wake_up(task_t *p, uns if (unlikely(task_running(rq, p))) goto out_activate; + /* + * If neither this CPU, nor the previous CPU the task was + * running on is idle then skip wakeup-balancing: + */ new_cpu = cpu; + if (!idle_cpu(this_cpu) && !idle_cpu(cpu)) + goto out_set_cpu; schedstat_inc(rq, ttwu_cnt); if (cpu == this_cpu) { ^ permalink raw reply [flat|nested] 26+ messages in thread
* [sched, patch] better wake-balancing, #2
  2005-07-29 14:13               ` [sched, patch] better wake-balancing Ingo Molnar
@ 2005-07-29 15:02                 ` Ingo Molnar
  2005-07-29 16:21                   ` [sched, patch] better wake-balancing, #3 Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2005-07-29 15:02 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton

* Ingo Molnar <mingo@elte.hu> wrote:

> another approach would be the patch below, to do wakeup-balancing only
> if the wakeup CPU or the task CPU is idle.

there's an even simpler way: only do wakeup-balancing if this_cpu is
idle. (tbench results are still OK, and other workloads improved.)

	Ingo

--------
do wakeup-balancing only if the wakeup-CPU is idle. this prevents
excessive wakeup-balancing while the system is highly loaded, but
helps spread out the workload on partly idle systems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/sched.c |    6 ++++++
 1 files changed, 6 insertions(+)

Index: linux-sched-curr/kernel/sched.c
===================================================================
--- linux-sched-curr.orig/kernel/sched.c
+++ linux-sched-curr/kernel/sched.c
@@ -1253,7 +1253,13 @@ static int try_to_wake_up(task_t *p, uns
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;

+	/*
+	 * Only do wakeup-balancing (== potentially migrate the task)
+	 * if this CPU is idle:
+	 */
 	new_cpu = cpu;
+	if (!idle_cpu(this_cpu))
+		goto out_set_cpu;

 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {
* [sched, patch] better wake-balancing, #3
  2005-07-29 15:02                 ` [sched, patch] better wake-balancing, #2 Ingo Molnar
@ 2005-07-29 16:21                   ` Ingo Molnar
  2005-07-30  0:08                     ` Nick Piggin
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2005-07-29 16:21 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Nick Piggin', linux-kernel, linux-ia64, Andrew Morton

* Ingo Molnar <mingo@elte.hu> wrote:

> there's an even simpler way: only do wakeup-balancing if this_cpu is
> idle. (tbench results are still OK, and other workloads improved.)

here's an updated patch. It handles one more detail: on SCHED_SMT we
should check the idleness of siblings too. Benchmark numbers still look
good.

	Ingo

----
do wakeup-balancing only if the wakeup-CPU (or any of its siblings) is
idle. this prevents excessive wakeup-balancing while the system is
highly loaded, but helps spread out the workload on partly idle
systems.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/sched.c |    6 ++++++
 1 files changed, 6 insertions(+)

Index: linux-sched-curr/kernel/sched.c
===================================================================
--- linux-sched-curr.orig/kernel/sched.c
+++ linux-sched-curr/kernel/sched.c
@@ -1253,7 +1253,13 @@ static int try_to_wake_up(task_t *p, uns
 	if (unlikely(task_running(rq, p)))
 		goto out_activate;

+	/*
+	 * Only do wakeup-balancing (== potentially migrate the task)
+	 * if this CPU (or any SMT sibling) is idle:
+	 */
 	new_cpu = cpu;
+	if (!idle_cpu(this_cpu) && this_cpu == wake_idle(this_cpu, p))
+		goto out_set_cpu;

 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {
* Re: [sched, patch] better wake-balancing, #3
  2005-07-29 16:21                   ` [sched, patch] better wake-balancing, #3 Ingo Molnar
@ 2005-07-30  0:08                     ` Nick Piggin
  2005-07-30  7:19                       ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Nick Piggin @ 2005-07-30 0:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Chen, Kenneth W, linux-kernel, linux-ia64, Andrew Morton

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> there's an even simpler way: only do wakeup-balancing if this_cpu is
>> idle. (tbench results are still OK, and other workloads improved.)
>
> here's an updated patch. It handles one more detail: on SCHED_SMT we
> should check the idleness of siblings too. Benchmark numbers still look
> good.

Maybe. Ken hasn't measured the effect of wake balancing in 2.6.13,
which is quite a lot different to that found in 2.6.12.

I don't really like having a hard cutoff like that - wake balancing
can be important for IO workloads, though I haven't measured for a
long time.

In IPC workloads, the cache affinity of local wakeups becomes less
apparent when the runqueue gets lots of tasks on it, however the
benefits of IO affinity will generally remain. Especially on NUMA
systems.

fork/clone/exec/etc balancing really doesn't do anything to capture
this kind of relationship between tasks and between tasks and IRQ
sources. Without wake balancing we basically have a completely random
scattering of tasks.

-- 
SUSE Labs, Novell Inc.
* Re: [sched, patch] better wake-balancing, #3
  2005-07-30  0:08                     ` Nick Piggin
@ 2005-07-30  7:19                       ` Ingo Molnar
  2005-07-31  1:15                         ` Nick Piggin
  2005-08-08 23:18                         ` Chen, Kenneth W
  0 siblings, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2005-07-30 7:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Chen, Kenneth W, linux-kernel, linux-ia64, Andrew Morton,
	John Hawkes, Martin J. Bligh, Paul Jackson

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > here's an updated patch. It handles one more detail: on SCHED_SMT we
> > should check the idleness of siblings too. Benchmark numbers still
> > look good.
>
> Maybe. Ken hasn't measured the effect of wake balancing in 2.6.13,
> which is quite a lot different to that found in 2.6.12.
>
> I don't really like having a hard cutoff like that - wake balancing can
> be important for IO workloads, though I haven't measured for a long
> time. [...]

well, i have measured it, and it was a win for just about everything
that is not idle, and even for an IPC (SysV semaphores) half-idle
workload i've measured a 3% gain. No performance loss in tbench either,
which is clearly the most sensitive to affine/passive balancing. But
i'd like to see what Ken's (and others') numbers are.

the hard cutoff also has the benefit that it allows us to potentially
make wakeup migration _more_ aggressive in the future. So instead of
having to think about weakening it due to the tradeoffs present in
e.g. Ken's workload, we can actually make it stronger.

> [...] In IPC workloads, the cache affinity of local wakeups becomes
> less apparent when the runqueue gets lots of tasks on it, however
> benefits of IO affinity will generally remain. Especially on NUMA
> systems.

especially on NUMA, if the migration-target CPU (this_cpu) is not at
least partially idle, i'd be quite uneasy to passive-balance from
another node. I suspect this needs numbers from Martin and John?

> fork/clone/exec/etc balancing really doesn't do anything to capture
> this kind of relationship between tasks and between tasks and IRQ
> sources. Without wake balancing we basically have a completely random
> scattering of tasks.

Ken's workload is a heavy IO one with lots of IRQ sources. And
precisely for such types of workload, usually the best tactic is to
leave the task alone and queue it wherever it last ran.

whenever there's a strong (and exclusive) relationship between tasks
and individual interrupt sources, explicit binding to CPUs/groups of
CPUs is the best method. In any case, more measurements are needed.

	Ingo
* Re: [sched, patch] better wake-balancing, #3
  2005-07-30  7:19                       ` Ingo Molnar
@ 2005-07-31  1:15                         ` Nick Piggin
  2005-08-01 17:13                           ` Siddha, Suresh B
  0 siblings, 1 reply; 26+ messages in thread
From: Nick Piggin @ 2005-07-31 1:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Chen, Kenneth W, linux-kernel, linux-ia64, Andrew Morton,
	John Hawkes, Martin J. Bligh, Paul Jackson

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>> I don't really like having a hard cutoff like that - wake balancing can
>> be important for IO workloads, though I haven't measured for a long
>> time. [...]
>
> well, i have measured it, and it was a win for just about everything

I meant: measured for IO workloads. I had one group tell me their IO
efficiency went up by several *times* on a 16-way NUMA system after
generalising the wake balancing to interrupts as well.

> that is not idle, and even for an IPC (SysV semaphores) half-idle
> workload i've measured a 3% gain. No performance loss in tbench either,
> which is clearly the most sensitive to affine/passive balancing. But
> i'd like to see what Ken's (and others') numbers are.
>
> the hard cutoff also has the benefit that it allows us to potentially
> make wakeup migration _more_ aggressive in the future. So instead of
> having to think about weakening it due to the tradeoffs present in
> e.g. Ken's workload, we can actually make it stronger.

That would make the behaviour change even more violent, which is what
I dislike. I would much prefer to have code that handles both
workloads without introducing sudden cutoff points in behaviour.

> especially on NUMA, if the migration-target CPU (this_cpu) is not at
> least partially idle, i'd be quite uneasy to passive-balance from
> another node. I suspect this needs numbers from Martin and John?

Passive balancing cuts in only when an imbalance is becoming apparent.
If the queue gets more imbalanced, periodic balancing will cut in, and
that is much worse than wake balancing.

>> fork/clone/exec/etc balancing really doesn't do anything to capture
>> this kind of relationship between tasks and between tasks and IRQ
>> sources. Without wake balancing we basically have a completely random
>> scattering of tasks.
>
> Ken's workload is a heavy IO one with lots of IRQ sources. And
> precisely for such types of workload, usually the best tactic is to
> leave the task alone and queue it wherever it last ran.

Yep, I agree the wake balancing code in 2.6.12 wasn't ideal. That's
why I changed it in 2.6.13 - precisely because it moved things around
too much. It probably still isn't ideal though.

> whenever there's a strong (and exclusive) relationship between tasks
> and individual interrupt sources, explicit binding to CPUs/groups of
> CPUs is the best method. In any case, more measurements are needed.

Well, I wouldn't say it is always the best method. Especially not when
there is a big variation in the CPU consumption of the groups of
tasks. But anyway, even in the cases where it definitely is the best
method, we really should try to handle them properly without binding
too.

I do agree that more measurements are needed :)

-- 
SUSE Labs, Novell Inc.
* Re: [sched, patch] better wake-balancing, #3
  2005-07-31  1:15                         ` Nick Piggin
@ 2005-08-01 17:13                           ` Siddha, Suresh B
  0 siblings, 0 replies; 26+ messages in thread
From: Siddha, Suresh B @ 2005-08-01 17:13 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Chen, Kenneth W, linux-kernel, linux-ia64,
	Andrew Morton, John Hawkes, Martin J. Bligh, Paul Jackson

On Sun, Jul 31, 2005 at 11:15:16AM +1000, Nick Piggin wrote:
> Ingo Molnar wrote:
> > especially on NUMA, if the migration-target CPU (this_cpu) is not at
> > least partially idle, i'd be quite uneasy to passive balance from
> > another node. I suspect this needs numbers from Martin and John?
>
> Passive balancing cuts in only when an imbalance is becoming apparent.
> If the queue gets more imbalanced, periodic balancing will cut in,
> and that is much worse than wake balancing.

Another point to note about the current wake balance: the imbalance
calculation does not take the complete load of the sched group into
account. I think there might be scenarios where the current wake
balance will actually introduce imbalances that are corrected later by
periodic balancing.

thanks,
suresh
* RE: [sched, patch] better wake-balancing, #3
  2005-07-30  7:19                       ` Ingo Molnar
  2005-07-31  1:15                         ` Nick Piggin
@ 2005-08-08 23:18                         ` Chen, Kenneth W
  1 sibling, 0 replies; 26+ messages in thread
From: Chen, Kenneth W @ 2005-08-08 23:18 UTC (permalink / raw)
  To: 'Ingo Molnar', Nick Piggin
  Cc: linux-kernel, linux-ia64, Andrew Morton, John Hawkes,
	Martin J. Bligh, Paul Jackson

Ingo Molnar wrote on Saturday, July 30, 2005 12:19 AM
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > > here's an updated patch. It handles one more detail: on SCHED_SMT we
> > > should check the idleness of siblings too. Benchmark numbers still
> > > look good.
> >
> > Maybe. Ken hasn't measured the effect of wake balancing in 2.6.13,
> > which is quite a lot different to that found in 2.6.12.
> >
> > I don't really like having a hard cutoff like that - wake balancing can
> > be important for IO workloads, though I haven't measured for a long
> > time. [...]
>
> well, i have measured it, and it was a win for just about everything
> that is not idle, and even for an IPC (SysV semaphores) half-idle
> workload i've measured a 3% gain. No performance loss in tbench either,
> which is clearly the most sensitive to affine/passive balancing. But i'd
> like to see what Ken's (and others') numbers are.
>
> the hard cutoff also has the benefit that it allows us to potentially
> make wakeup migration _more_ aggressive in the future. So instead of
> having to think about weakening it due to the tradeoffs present in e.g.
> Ken's workload, we can actually make it stronger.

Sorry it took us a while to get the experiment done on our large db
setup. This patch is as effective as turning off both SD_WAKE_BALANCE
and SD_WAKE_AFFINE (+2.2% on the db OLTP workload). We like it a lot.

- Ken
* Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
  2005-07-28 23:08 Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Chen, Kenneth W
  2005-07-28 23:34 ` Nick Piggin
@ 2005-07-29 11:26 ` Ingo Molnar
  2005-07-29 17:30   ` Chen, Kenneth W
  1 sibling, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2005-07-29 11:26 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: 'Nick Piggin', linux-kernel, linux-ia64

* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:

> To demonstrate the problem, we turned off these two flags in the cpu
> sd domain and measured a stunning 2.15% performance gain! And
> deleting all the code in the try_to_wake_up() pertain to load
> balancing gives us another 0.2% gain.

another thing: do you have a HT-capable ia64 CPU, and do you have
CONFIG_SCHED_SMT turned on? If yes then could you try to turn off
SD_WAKE_IDLE too? i found it to bring further performance improvements
in certain workloads.

	Ingo
* RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
  2005-07-29 11:26 ` Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Ingo Molnar
@ 2005-07-29 17:30   ` Chen, Kenneth W
  0 siblings, 0 replies; 26+ messages in thread
From: Chen, Kenneth W @ 2005-07-29 17:30 UTC (permalink / raw)
  To: 'Ingo Molnar'; +Cc: 'Nick Piggin', linux-kernel, linux-ia64

Ingo Molnar wrote on Friday, July 29, 2005 4:26 AM
> * Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:
> > To demonstrate the problem, we turned off these two flags in the cpu
> > sd domain and measured a stunning 2.15% performance gain! And
> > deleting all the code in the try_to_wake_up() pertain to load
> > balancing gives us another 0.2% gain.
>
> another thing: do you have a HT-capable ia64 CPU, and do you have
> CONFIG_SCHED_SMT turned on? If yes then could you try to turn off
> SD_WAKE_IDLE too, i found it to bring further performance improvements
> in certain workloads.

The scheduler experiments done so far are on a non-SMT CPU (Madison
processor). We have another db setup with a multi-thread capable ia64
CPU (Montecito - to be precise, it is SoEMT capable). We are just
about to do scheduler experiments on that setup.
end of thread, other threads:[~2005-08-08 23:19 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-07-28 23:08 Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Chen, Kenneth W
2005-07-28 23:34 ` Nick Piggin
2005-07-28 23:48   ` Chen, Kenneth W
2005-07-29  1:25     ` Nick Piggin
2005-07-29  1:39       ` Chen, Kenneth W
2005-07-29  1:46         ` Nick Piggin
2005-07-29  1:53           ` Chen, Kenneth W
2005-07-29  2:01             ` Nick Piggin
2005-07-29  6:27               ` Chen, Kenneth W
2005-07-29  8:48                 ` Nick Piggin
2005-07-29  8:53             ` Ingo Molnar
2005-07-29  8:59               ` Nick Piggin
2005-07-29  9:01                 ` Ingo Molnar
2005-07-29  9:07                   ` Ingo Molnar
2005-07-29 16:40                     ` Ingo Molnar
2005-07-29 11:48             ` [patch] remove wake-balancing Ingo Molnar
2005-07-29 14:13               ` [sched, patch] better wake-balancing Ingo Molnar
2005-07-29 15:02                 ` [sched, patch] better wake-balancing, #2 Ingo Molnar
2005-07-29 16:21                   ` [sched, patch] better wake-balancing, #3 Ingo Molnar
2005-07-30  0:08                     ` Nick Piggin
2005-07-30  7:19                       ` Ingo Molnar
2005-07-31  1:15                         ` Nick Piggin
2005-08-01 17:13                           ` Siddha, Suresh B
2005-08-08 23:18                         ` Chen, Kenneth W
2005-07-29 11:26 ` Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags Ingo Molnar
2005-07-29 17:30   ` Chen, Kenneth W