public inbox for linux-kernel@vger.kernel.org
* [PATCH] (0/3) NUMA aware scheduler
@ 2003-01-16 18:27 Martin J. Bligh
  2003-01-16 19:32 ` Martin J. Bligh
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Martin J. Bligh @ 2003-01-16 18:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Following is a sequence of patches to add NUMA awareness to the scheduler.
These have been submitted to you several times before, but in my opinion
were structured in such a way as to make them too invasive for non-NUMA
machines. I proposed a new scheme of working in "concentric circles", which
this set follows (Erich did most of the hard work of restructuring), and is
now completely non-invasive to non-NUMA systems. It has no effect whatsoever
on standard machines. This can be seen by code inspection, and has been 
checked by benchmarking.

These patches are the culmination of work by Erich Focht, Michael Hohnbaum
and myself. We've also incorporated feedback from Christoph and Robert Love.
I believe these are now ready for mainline acceptance. I've tested them on
NUMA-Q, standard SMP and UP. Erich has run them on the NEC ia64 NUMA machine.

Benchmarks on a 16-way NUMA-Q machine with 16GB of RAM:

Kernbench: (average of 5 kernel compiles)
                                   Elapsed        User      System         CPU
                        2.5.58     20.012s     191.81s      48.37s     1200.6%
              2.5.58-numasched      19.57s    187.264s     42.186s     1171.8%

NUMA schedbench 64: (64 processes running memory allocation fairly heavily)
                                               Elapsed   TotalUser    TotalSys
                        2.5.58                  608.81     9418.37       26.74
              2.5.58-numasched                  230.49     3613.47       15.57


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] (0/3) NUMA aware scheduler
  2003-01-16 18:27 [PATCH] (0/3) NUMA aware scheduler Martin J. Bligh
@ 2003-01-16 19:32 ` Martin J. Bligh
  2003-01-16 19:48 ` Andrew Theurer
  2003-01-16 20:28 ` Linus Torvalds
  2 siblings, 0 replies; 6+ messages in thread
From: Martin J. Bligh @ 2003-01-16 19:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> These patches are the culmination of work by Erich Focht, Michael Hohnbaum
> and myself. We've also incorporated feedback from Christoph and Robert Love.

There are also some bits of code embedded in here from Christoph.

M.


* Re: [PATCH] (0/3) NUMA aware scheduler
  2003-01-16 18:27 [PATCH] (0/3) NUMA aware scheduler Martin J. Bligh
  2003-01-16 19:32 ` Martin J. Bligh
@ 2003-01-16 19:48 ` Andrew Theurer
  2003-01-19 22:04   ` Pavel Machek
  2003-01-16 20:28 ` Linus Torvalds
  2 siblings, 1 reply; 6+ messages in thread
From: Andrew Theurer @ 2003-01-16 19:48 UTC (permalink / raw)
  To: Martin J. Bligh, Linus Torvalds, Ingo Molnar; +Cc: linux-kernel

> Following is a sequence of patches to add NUMA awareness to the scheduler.
> These have been submitted to you several times before, but in my opinion
> were structured in such a way as to make them too invasive for non-NUMA
> machines. I proposed a new scheme of working in "concentric circles", which this set
> follows (Erich did most of the hard work of restructuring), and is now
> completely non-invasive to non-NUMA systems. It has no effect whatsoever
> on standard machines. This can be seen by code inspection, and has been
> checked by benchmarking.

FYI, I have used a topology that maps HT-aware processors (in this case P4)
onto a NUMA topology while using this scheduler.  This was done to help
address the same problems that Ingo's shared-runqueue implementation fixed.
The topology is quite simple: sibling logical procs are members of a node,
and the number of nodes equals the number of physical procs.

This primarily avoids sharing CPU cores (and the resulting resource
contention) under low loads.  In my case, with 4 tasks on an 8-logical-proc
system, we want to load balance the tasks across nodes/cores for better
performance.  For my test, I did a make -j4 build of a 2.4.18 kernel.
Results are:

stock sched, no numa:    56.523 elapsed  202.899 user  18.266 sys  390.6%
numa sched, ht topo:     53.088 elapsed  189.424 user  18.360 sys  391.0%

~6.5% better.  These results are the average of 10 kernel compiles.
* I did make one minor change to sched_best_cpu(): the first test case was
eliminated, and that change is currently under discussion.

I did this mainly to demonstrate that a NUMA scheduler's policies may be
able to help HT systems, and to capture wider interest in the NUMA
scheduler.  By no means is P4 HT required to use this; it is simply a NUMA
topology implementation.  I would like some feedback on any interest in this.

One of the reasons we probably have not had much interest in NUMA patches is
that NUMA systems are not that prevalent.  However, NUMA-like qualities are
showing up in commonly available systems, and I believe we can take
advantage of the policies that these patches, such as the NUMA scheduler,
provide.  Does anyone have any other ideas where NUMA-like qualities lie?
x86-64?

-Andrew Theurer

P.S. I am working on a topology patch to send out.  It's quite hackish right
now.




* Re: [PATCH] (0/3) NUMA aware scheduler
  2003-01-16 18:27 [PATCH] (0/3) NUMA aware scheduler Martin J. Bligh
  2003-01-16 19:32 ` Martin J. Bligh
  2003-01-16 19:48 ` Andrew Theurer
@ 2003-01-16 20:28 ` Linus Torvalds
  2003-01-16 22:36   ` Martin J. Bligh
  2 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2003-01-16 20:28 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel


Applied. 

I also have to say that I hope this means that the HT-specific scheduler 
stuff will go away. HT _should_ be just another NUMA issue, and right now 
the two seem to be just slightly different ways of covering the same 
needs.

However, I'm going away for two weeks starting tomorrow, so even if there 
is some experimental HT/NUMA patch, I don't want it at this point. The 
NUMA scheduler merge is more of a "get the infrastructure in place" thing 
for me right now.

			Linus



* Re: [PATCH] (0/3) NUMA aware scheduler
  2003-01-16 20:28 ` Linus Torvalds
@ 2003-01-16 22:36   ` Martin J. Bligh
  0 siblings, 0 replies; 6+ messages in thread
From: Martin J. Bligh @ 2003-01-16 22:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Theurer

> Applied. 

Thank you!
 
> I also have to say that I hope this means that the HT-specific scheduler 
> stuff will go away. HT _should_ be just another NUMA issue, and right now 
> the two seem to be just slightly different ways of covering the same 
> needs.

Yup, Andrew Theurer from our performance team has been working on this.
Initial results look encouraging.

> However, I'm going away for two weeks starting tomorrow, so even if there 
> is some experimental HT/NUMA patch, I don't want it at this point. The 
> NUMA scheduler merge is more of a "get the infrastructure in place" thing 
> for me right now.

Absolutely. Hopefully by the time you return we'll have a structure for
hyperthreading in place that's reasonably tuned ;-)

There's some more tuning and tweaking we could do to the NUMA machines as
well (I'm looking at how to implement Ingo's feedback), but I'm convinced
the infrastructure is correct.

Thanks,

M.



* Re: [PATCH] (0/3) NUMA aware scheduler
  2003-01-16 19:48 ` Andrew Theurer
@ 2003-01-19 22:04   ` Pavel Machek
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Machek @ 2003-01-19 22:04 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Martin J. Bligh, Linus Torvalds, Ingo Molnar, linux-kernel

Hi!

> One of the reasons we probably have not had much interest in NUMA patches is
> that NUMA systems are not that prevalent.  However, NUMA-like qualities are
> showing up in commonly available systems, and I believe we can take
> advantage of the policies that these patches, such as the NUMA scheduler,
> provide.  Does anyone have any other ideas where NUMA-like qualities lie?
> x86-64?

Yep, x86-64 SMP systems are in fact NUMA systems that don't penalize
remote memory *that* badly.
								Pavel
-- 
Worst form of spam? Adding advertisement signatures a la sourceforge.net.
What goes next? Inserting advertisements *into* email?

