public inbox for linux-kernel@vger.kernel.org
* [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
@ 2007-07-26 18:32 Siddha, Suresh B
  2007-07-26 22:18 ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Siddha, Suresh B @ 2007-07-26 18:32 UTC (permalink / raw)
  To: mingo, npiggin; +Cc: linux-kernel, akpm

Introduce SD_BALANCE_FORK for HT/MC/SMP domains.

For HT/MC, since caches are shared, SD_BALANCE_FORK is the right thing to do.
Given that the NUMA domain already has this flag, and that the scheduler
currently has no mechanism for keeping the running threads of a process as
close together as possible (i.e., fork may place them close, but periodic
balancing later will likely move them far apart), introduce SD_BALANCE_FORK
for the SMP domain too.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/include/linux/topology.h b/include/linux/topology.h
index d0890a7..dc15a9f 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -104,6 +104,7 @@
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_EXEC	\
+				| SD_BALANCE_FORK	\
 				| SD_WAKE_AFFINE	\
 				| SD_WAKE_IDLE		\
 				| SD_SHARE_CPUPOWER,	\
@@ -135,6 +136,7 @@
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_EXEC	\
+				| SD_BALANCE_FORK	\
 				| SD_WAKE_AFFINE	\
 				| SD_WAKE_IDLE		\
 				| SD_SHARE_PKG_RESOURCES\
@@ -166,6 +168,7 @@
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_EXEC	\
+				| SD_BALANCE_FORK	\
 				| SD_WAKE_AFFINE	\
 				| SD_WAKE_IDLE		\
 				| BALANCE_FOR_PKG_POWER,\

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
  2007-07-26 18:32 [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains Siddha, Suresh B
@ 2007-07-26 22:18 ` Ingo Molnar
  2007-07-26 22:34   ` Siddha, Suresh B
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2007-07-26 22:18 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: npiggin, linux-kernel, akpm


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> Introduce SD_BALANCE_FORK for HT/MC/SMP domains.
> 
> For HT/MC, since caches are shared, SD_BALANCE_FORK is the right thing 
> to do. Given that the NUMA domain already has this flag, and that the 
> scheduler currently has no mechanism for keeping the running threads of 
> a process as close together as possible (i.e., fork may place them 
> close, but periodic balancing later will likely move them far apart), 
> introduce SD_BALANCE_FORK for the SMP domain too.
> 
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

i'm not opposed to this fundamentally, but it would be nice to better 
map the effects of this change: do you have any particular workload 
under which you've tested this and seen it make a difference? I'd 
expect this to perhaps improve fork-intensive half-idle workloads - 
things like a make -j3 on a 4-core CPU.

	Ingo


* Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
  2007-07-26 22:18 ` Ingo Molnar
@ 2007-07-26 22:34   ` Siddha, Suresh B
  2007-07-27  1:22     ` Nick Piggin
  2007-07-29 21:16     ` Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Siddha, Suresh B @ 2007-07-26 22:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Siddha, Suresh B, npiggin, linux-kernel, akpm

On Fri, Jul 27, 2007 at 12:18:30AM +0200, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> 
> > Introduce SD_BALANCE_FORK for HT/MC/SMP domains.
> > 
> > For HT/MC, since caches are shared, SD_BALANCE_FORK is the right thing 
> > to do. Given that the NUMA domain already has this flag, and that the 
> > scheduler currently has no mechanism for keeping the running threads of 
> > a process as close together as possible (i.e., fork may place them 
> > close, but periodic balancing later will likely move them far apart), 
> > introduce SD_BALANCE_FORK for the SMP domain too.
> > 
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> 
> i'm not opposed to this fundamentally, but it would be nice to better 
> map the effects of this change: do you have any particular workload 
> under which you've tested this and seen it make a difference? I'd 
> expect this to perhaps improve fork-intensive half-idle workloads - 
> things like a make -j3 on a 4-core CPU.

Those might be doing more exec's, and would probably be covered by exec
balancing.

There was a small pthread test case that measured the time to create all
the threads and how long each thread took to start running. The threads
appeared to run sequentially, one after another, on a DP system with four
cores, which led to this SD_BALANCE_FORK observation.

thanks,
suresh


* Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
  2007-07-26 22:34   ` Siddha, Suresh B
@ 2007-07-27  1:22     ` Nick Piggin
  2007-07-27 19:09       ` Siddha, Suresh B
  2007-07-29 21:16     ` Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2007-07-27  1:22 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Ingo Molnar, linux-kernel, akpm

On Thu, Jul 26, 2007 at 03:34:56PM -0700, Suresh B wrote:
> On Fri, Jul 27, 2007 at 12:18:30AM +0200, Ingo Molnar wrote:
> > 
> > * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > 
> > > Introduce SD_BALANCE_FORK for HT/MC/SMP domains.
> > > 
> > > For HT/MC, since caches are shared, SD_BALANCE_FORK is the right thing 
> > > to do. Given that the NUMA domain already has this flag, and that the 
> > > scheduler currently has no mechanism for keeping the running threads of 
> > > a process as close together as possible (i.e., fork may place them 
> > > close, but periodic balancing later will likely move them far apart), 
> > > introduce SD_BALANCE_FORK for the SMP domain too.
> > > 
> > > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > 
> > i'm not opposed to this fundamentally, but it would be nice to better 
> > map the effects of this change: do you have any particular workload 
> > under which you've tested this and seen it make a difference? I'd 
> > expect this to perhaps improve fork-intensive half-idle workloads - 
> > things like a make -j3 on a 4-core CPU.
> 
> Those might be doing more exec's, and would probably be covered by exec
> balancing.
> 
> There was a small pthread test case that measured the time to create all
> the threads and how long each thread took to start running. The threads
> appeared to run sequentially, one after another, on a DP system with four
> cores, which led to this SD_BALANCE_FORK observation.

If it helps throughput in a non-trivial microbenchmark, it would be
worthwhile. I'm not against it either really, but keep in mind that it
can make fork more expensive and less scalable; the reason we do it
for the NUMA domain is because today we're basically screwed WRT the
existing working set if we have to migrate processes over nodes. It
is really important to try to minimise that any way we possibly can.

When (if) we get NUMA page replication and automatic migration going,
I will be looking at whether we can make NUMA migration more
aggressive (and potentially remove SD_BALANCE_FORK). Note that neither
replication nor migration helps with kernel allocations, nor are they
cheap, so NUMA placement will always be worth spending more cycles
on to get right.



* Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
  2007-07-27  1:22     ` Nick Piggin
@ 2007-07-27 19:09       ` Siddha, Suresh B
  0 siblings, 0 replies; 7+ messages in thread
From: Siddha, Suresh B @ 2007-07-27 19:09 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Siddha, Suresh B, Ingo Molnar, linux-kernel, akpm

On Fri, Jul 27, 2007 at 03:22:14AM +0200, Nick Piggin wrote:
> On Thu, Jul 26, 2007 at 03:34:56PM -0700, Suresh B wrote:
> > On Fri, Jul 27, 2007 at 12:18:30AM +0200, Ingo Molnar wrote:
> > > 
> > > * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> > > 
> > > > Introduce SD_BALANCE_FORK for HT/MC/SMP domains.
> > > > 
> > > > For HT/MC, since caches are shared, SD_BALANCE_FORK is the right thing 
> > > > to do. Given that the NUMA domain already has this flag, and that the 
> > > > scheduler currently has no mechanism for keeping the running threads of 
> > > > a process as close together as possible (i.e., fork may place them 
> > > > close, but periodic balancing later will likely move them far apart), 
> > > > introduce SD_BALANCE_FORK for the SMP domain too.
> > > > 
> > > > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > > 
> > > i'm not opposed to this fundamentally, but it would be nice to better 
> > > map the effects of this change: do you have any particular workload 
> > > under which you've tested this and seen it make a difference? I'd 
> > > expect this to perhaps improve fork-intensive half-idle workloads - 
> > > things like a make -j3 on a 4-core CPU.
> > 
> > Those might be doing more exec's, and would probably be covered by exec
> > balancing.
> > 
> > There was a small pthread test case that measured the time to create all
> > the threads and how long each thread took to start running. The threads
> > appeared to run sequentially, one after another, on a DP system with four
> > cores, which led to this SD_BALANCE_FORK observation.
> 
> If it helps throughput in a non-trivial microbenchmark, it would be
> worthwhile.

We are planning to collect some data with this.

> I'm not against it either really, but keep in mind that it
> can make fork more expensive and less scalable; the reason we do it
> for the NUMA domain is because today we're basically screwed WRT the
> existing working set if we have to migrate processes over nodes. It
> is really important to try to minimise that any way we possibly can.
> 
> When (if) we get NUMA page replication and automatic migration going,
> I will be looking at whether we can make NUMA migration more
> aggressive (and potentially remove SD_BALANCE_FORK). Note that neither
> replication nor migration helps with kernel allocations, nor are they
> cheap, so NUMA placement will always be worth spending more cycles
> on to get right.

I agree. Perhaps we can set it only for ht/mc domains.

thanks,
suresh


* Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
  2007-07-26 22:34   ` Siddha, Suresh B
  2007-07-27  1:22     ` Nick Piggin
@ 2007-07-29 21:16     ` Ingo Molnar
  2007-07-30 17:53       ` Siddha, Suresh B
  1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2007-07-29 21:16 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: npiggin, linux-kernel, akpm


* Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:

> Those might be doing more exec's, and would probably be covered by exec 
> balancing.
> 
> There was a small pthread test case that measured the time to create 
> all the threads and how long each thread took to start running. The 
> threads appeared to run sequentially, one after another, on a DP system 
> with four cores, which led to this SD_BALANCE_FORK observation.

it would be nice, i suspect, to dig out that testcase and quantify the 
benefits of your patch. Another workload which might perform better 
would be linpack: it benefits from fast and immediate 'spreading' of 
freshly forked threads.

	Ingo


* Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains
  2007-07-29 21:16     ` Ingo Molnar
@ 2007-07-30 17:53       ` Siddha, Suresh B
  0 siblings, 0 replies; 7+ messages in thread
From: Siddha, Suresh B @ 2007-07-30 17:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Siddha, Suresh B, npiggin, linux-kernel, akpm

On Sun, Jul 29, 2007 at 11:16:44PM +0200, Ingo Molnar wrote:
> 
> * Siddha, Suresh B <suresh.b.siddha@intel.com> wrote:
> 
> > They might be doing more exec's and probably covered by exec balance.
> > 
> > There was a small pthread test case which was calculating the time to 
> > create all the threads and how much time each thread took to start 
> > running. It appeared as if the threads ran sequentially one after 
> > another on a DP system with four cores leading to this SD_BALANCE_FORK 
> > observation.
> 
> it would be nice, i suspect, to dig out that testcase and quantify the 
> benefits of your patch.

That test case doesn't do much other than measure the time taken for each
thread to start running. With this balance-on-fork patch, that small pthread
test case shows that all the threads now start almost at the same time on
all the cores.

> Another workload which might perform better 
> would be linpack: it benefits from fast and immediate 'spreading' of 
> freshly forked threads.

My understanding is that linpack doesn't fork often (so such a difference
might not be visible, but I will take a look). We were planning to test
httperf or some other workload which probably forks more often.

thanks,
suresh


end of thread, other threads: [~2007-07-30 17:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-26 18:32 [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains Siddha, Suresh B
2007-07-26 22:18 ` Ingo Molnar
2007-07-26 22:34   ` Siddha, Suresh B
2007-07-27  1:22     ` Nick Piggin
2007-07-27 19:09       ` Siddha, Suresh B
2007-07-29 21:16     ` Ingo Molnar
2007-07-30 17:53       ` Siddha, Suresh B
