public inbox for linux-kernel@vger.kernel.org
* [Patch] Idle balancer: cache align nohz structure to improve idle load balancing scalability
@ 2011-10-19 21:45 Tim Chen
  2011-10-20  4:18 ` Eric Dumazet
  2011-10-20  4:24 ` Andi Kleen
  0 siblings, 2 replies; 15+ messages in thread
From: Tim Chen @ 2011-10-19 21:45 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Andi Kleen, Suresh Siddha, Venki Pallipadi

Idle load balancing uses a global structure, nohz, to keep track of the
cpu doing the idle load balancing, the first and second busy cpus, and
the cpus that are idle.  This leads to a scalability issue.

For workloads whose processes wake up and go to sleep often, the
load_balancer, first_pick_cpu, second_pick_cpu and idle_cpus_mask fields
in the nohz structure are updated very frequently.  This causes a lot of
cache line bouncing and slows down the idle and wakeup paths on large
systems with many cores/sockets.  In a test workload I ran, up to 41% of
cpu cycles were spent in the function select_nohz_load_balancer.
Putting each of these fields in its own cache line mitigates the
problem.

The test workload has multiple pairs of processes.  Within a pair,
each process receives a message from and then sends a message back to
the other process via a pipe connecting them, so at any one time half
the processes are active.

With 32 pairs of processes, the rate of context switching between the
processes increased by 37%; with 64 pairs it increased by 24%.  The test
was run on an 8-socket, 64-core NHM-EX system with hyper-threading
turned on.

Tim

Workload cpu cycle profile on vanilla kernel:
41.19%          swapper  [kernel.kallsyms]          [k] select_nohz_load_balancer   
   - select_nohz_load_balancer                                                       
      + 54.91% tick_nohz_restart_sched_tick                                         
      + 45.04% tick_nohz_stop_sched_tick     
18.96%          swapper  [kernel.kallsyms]          [k] mwait_idle_with_hints        
 3.50%          swapper  [kernel.kallsyms]          [k] tick_nohz_restart_sched_tick 
 3.36%          swapper  [kernel.kallsyms]          [k] tick_check_idle              
 2.96%          swapper  [kernel.kallsyms]          [k] rcu_enter_nohz               
 2.40%          swapper  [kernel.kallsyms]          [k] _raw_spin_lock               
 2.11%          swapper  [kernel.kallsyms]          [k] tick_nohz_stop_sched_tick    


Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index bc8ee99..26ea877 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3639,10 +3639,10 @@ static inline void init_sched_softirq_csd(struct call_single_data *csd)
  *   load balancing for all the idle CPUs.
  */
 static struct {
-	atomic_t load_balancer;
-	atomic_t first_pick_cpu;
-	atomic_t second_pick_cpu;
-	cpumask_var_t idle_cpus_mask;
+	atomic_t load_balancer ____cacheline_aligned;
+	atomic_t first_pick_cpu ____cacheline_aligned;
+	atomic_t second_pick_cpu ____cacheline_aligned;
+	cpumask_var_t idle_cpus_mask ____cacheline_aligned;
 	cpumask_var_t grp_idle_mask;
 	unsigned long next_balance;     /* in jiffy units */
 } nohz ____cacheline_aligned;


end of thread, other threads:[~2011-11-14 19:34 UTC | newest]

Thread overview: 15+ messages
2011-10-19 21:45 [Patch] Idle balancer: cache align nohz structure to improve idle load balancing scalability Tim Chen
2011-10-20  4:18 ` Eric Dumazet
2011-10-20  5:57   ` Suresh Siddha
2011-10-20  6:43     ` Eric Dumazet
2011-10-20 17:19   ` Tim Chen
2011-10-20  4:24 ` Andi Kleen
2011-10-20 12:26   ` Venki Pallipadi
2011-10-20 17:31     ` Suresh Siddha
2011-10-20 17:38     ` Peter Zijlstra
     [not found]     ` <4FF5AC937153B0459463C1A88EB478F20135D6ECB5@orsmsx505.amr.corp.intel.com>
2011-11-01 23:52       ` Suresh Siddha
2011-11-02 13:04         ` Srivatsa Vaddagiri
2011-11-02 13:54         ` Srivatsa Vaddagiri
2011-11-02 15:13           ` Suresh Siddha
2011-11-14  9:32         ` Peter Zijlstra
2011-11-14 19:37           ` Suresh Siddha
