public inbox for linux-kernel@vger.kernel.org
From: Erich Focht <efocht@ess.nec.de>
To: Michael Hohnbaum <hohnbaum@us.ibm.com>,
	"Martin J. Bligh" <mbligh@aracnet.com>
Cc: Robert Love <rml@tech9.net>, Ingo Molnar <mingo@elte.hu>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	lse-tech <lse-tech@lists.sourceforge.net>
Subject: Re: [Lse-tech] Minature NUMA scheduler
Date: Fri, 10 Jan 2003 17:34:56 +0100
Message-ID: <200301101734.56182.efocht@ess.nec.de>
In-Reply-To: <1042176966.30434.148.camel@kenai>

Hi Martin & Michael,

indeed, restricting a process to the node on which it was started
helps, as the memory will always be local. The NUMA scheduler allows a
process to move away from its node if the load conditions require
it, but in the current form the process will not come back to its
home node. That's what the "node affine scheduler" tried to realise.

The miniature NUMA scheduler relies on the quality of the initial load
balancer, and that one seems to be good enough. As you mentioned,
multithreaded jobs are disadvantaged as their threads have to stay on
the originating node.

With some sort of automatic node affinity of processes and equal
node loads in mind (as design targets), we could:
 - take the minimal NUMA scheduler
 - if the normal (node-restricted) find_busiest_queue() fails and
   certain conditions are fulfilled (tried to balance inside own node
   for a while and didn't succeed, own CPU idle, etc... ???), rebalance
   over node boundaries (e.g. using my load balancer)
This actually resembles the original design of the node affine
scheduler. Having the cross-node balancing separate is ok and might
make the ideas clearer.
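
The two-step idea above could look roughly like the following toy model
in C. This is only a sketch under stated assumptions: struct toy_rq,
find_busiest_toy(), balance_toy() and FAIL_LIMIT are made-up names for
illustration, not the kernel's find_busiest_queue() or any posted patch,
and the threshold of three failed in-node attempts is arbitrary.

```c
#include <assert.h>

#define NR_CPUS        8
#define CPUS_PER_NODE  2   /* 4 nodes x 2 CPUs, like the test machine */
#define FAIL_LIMIT     3   /* hypothetical: in-node failures before going remote */

/* Toy per-CPU run-queue state (not the kernel's runqueue). */
struct toy_rq {
    int nr_running;     /* runnable tasks on this CPU */
    int failed_local;   /* consecutive failed in-node balance attempts */
};

static int cpu_to_node_toy(int cpu)
{
    return cpu / CPUS_PER_NODE;
}

/*
 * Return the most loaded CPU that has at least two tasks more than
 * this_cpu, or -1 if there is none. With same_node_only set, only CPUs
 * on this_cpu's node are considered.
 */
static int find_busiest_toy(const struct toy_rq *rq, int this_cpu,
                            int same_node_only)
{
    int best = -1;
    int best_load = rq[this_cpu].nr_running + 1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == this_cpu)
            continue;
        if (same_node_only &&
            cpu_to_node_toy(cpu) != cpu_to_node_toy(this_cpu))
            continue;
        if (rq[cpu].nr_running > best_load) {
            best_load = rq[cpu].nr_running;
            best = cpu;
        }
    }
    return best;
}

/*
 * One balance pass for this_cpu: try inside the node first; only if that
 * fails, this CPU is idle and it has failed FAIL_LIMIT times in a row,
 * look across node boundaries. Returns the CPU a task was pulled from,
 * or -1 if nothing was moved.
 */
static int balance_toy(struct toy_rq *rq, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < NR_CPUS);

    int src = find_busiest_toy(rq, this_cpu, 1);

    if (src < 0) {
        rq[this_cpu].failed_local++;
        if (rq[this_cpu].nr_running == 0 &&
            rq[this_cpu].failed_local >= FAIL_LIMIT)
            src = find_busiest_toy(rq, this_cpu, 0);
    }
    if (src < 0)
        return -1;

    rq[src].nr_running--;
    rq[this_cpu].nr_running++;
    rq[this_cpu].failed_local = 0;
    return src;
}
```

With all four tasks on CPU 4, an idle CPU 0 pulls nothing on its first
two passes and takes one task from CPU 4 on the third, while CPU 5 (same
node as CPU 4) pulls immediately. Keeping the cross-node step behind its
own condition is what keeps it separate from the in-node balancer.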

I made some measurements, too, and found basically what I
expected. The numbers are from a machine with 4 nodes of 2 CPUs
each. It's on ia64, so 2.5.52 based.

As the minsched cannot make mistakes (by moving tasks away from their
home node), it leads to the best results with numa_test. OTOH hackbench
suffers a lot from the limitation to one node. The hackbench tasks are
not latency/bandwidth limited; the faster they spread over the whole
machine, the quicker the job finishes. That's why NUMA-sched is
slightly worse than a stock kernel. But minsched loses >50%. Funnily,
on my machine kernbench is slightly faster with the normal NUMA
scheduler.

Regards,
Erich

Results on an 8 CPU machine with 4 nodes (2 CPUs per node).

kernbench:
                elapsed       user          system
      stock52   134.52(0.84)  951.64(0.97)  20.72(0.22)
      sched52   133.19(1.49)  944.24(0.50)  21.36(0.24)
   minsched52   135.47(0.47)  937.61(0.20)  21.30(0.14)

schedbench/hackbench: time(s)
               10         25         50         100
      stock52  0.81(0.04) 2.06(0.07) 4.09(0.13) 7.89(0.25)
      sched52  0.81(0.04) 2.03(0.07) 4.14(0.20) 8.61(0.35)
   minsched52  1.28(0.05) 3.19(0.06) 6.59(0.13) 13.56(0.27)
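
The ">50%" loss for minsched can be checked against the hackbench means
in the table above. The small C helper below is only an illustration:
slowdown_pct() and min_slowdown_pct() are names invented here, and the
arrays simply copy the mean times from the table, ignoring the
deviations in parentheses.

```c
#include <assert.h>
#include <stddef.h>

/* Relative slowdown of time t against baseline base, in percent. */
static double slowdown_pct(double base, double t)
{
    assert(base > 0.0);
    return (t - base) / base * 100.0;
}

/* hackbench mean times in seconds for the columns 10, 25, 50, 100. */
static const double stock52[]    = { 0.81, 2.06, 4.09, 7.89 };
static const double sched52[]    = { 0.81, 2.03, 4.14, 8.61 };
static const double minsched52[] = { 1.28, 3.19, 6.59, 13.56 };

/* Smallest slowdown of t[] against base[] over n columns. */
static double min_slowdown_pct(const double *base, const double *t, size_t n)
{
    assert(n > 0);

    double min = slowdown_pct(base[0], t[0]);

    for (size_t i = 1; i < n; i++) {
        double s = slowdown_pct(base[i], t[i]);
        if (s < min)
            min = s;
    }
    return min;
}
```

Calling min_slowdown_pct(stock52, minsched52, 4) gives the smallest
minsched penalty across the four columns, which stays above 50%, whereas
slowdown_pct(stock52[3], sched52[3]) shows the NUMA scheduler only about
9% behind stock in the 100 column.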

numabench/numa_test 4
               AvgUserTime ElapsedTime TotUserTime TotSysTime
      stock52  ---         27.23(0.52) 89.30(4.18) 0.09(0.01)
      sched52  22.32(1.00) 27.39(0.42) 89.29(4.02) 0.10(0.01)
   minsched52  20.01(0.01) 23.40(0.13) 80.05(0.02) 0.08(0.01)

numabench/numa_test 8
               AvgUserTime ElapsedTime TotUserTime  TotSysTime
      stock52  ---         27.50(2.58) 174.74(6.66) 0.18(0.01)
      sched52  21.73(1.00) 33.70(1.82) 173.87(7.96) 0.18(0.01)
   minsched52  20.31(0.00) 23.50(0.12) 162.47(0.04) 0.16(0.01)

numabench/numa_test 16
               AvgUserTime ElapsedTime TotUserTime   TotSysTime
      stock52  ---         52.68(1.51) 390.03(15.10) 0.34(0.01)
      sched52  21.51(0.80) 47.18(3.24) 344.29(12.78) 0.36(0.01)
   minsched52  20.50(0.03) 43.82(0.08) 328.05(0.45)  0.34(0.01)

numabench/numa_test 32
               AvgUserTime ElapsedTime  TotUserTime   TotSysTime
      stock52  ---         102.60(3.89) 794.57(31.72) 0.65(0.01)
      sched52  21.93(0.57) 92.46(1.10)  701.75(18.38) 0.67(0.02)
   minsched52  20.64(0.10) 89.95(3.16)  660.72(3.13)  0.68(0.07)



On Friday 10 January 2003 06:36, Michael Hohnbaum wrote:
> On Thu, 2003-01-09 at 15:54, Martin J. Bligh wrote:
> > I tried a small experiment today - did a simple restriction of
> > the O(1) scheduler to only balance inside a node. Coupled with
> > the small initial load balancing patch floating around, this
> > covers 95% of cases, is a trivial change (3 lines), performs
> > just as well as Erich's patch on a kernel compile, and actually
> > better on schedbench.
> >
> > This is NOT meant to be a replacement for the code Erich wrote,
> > it's meant to be a simple way to get integration and acceptance.
> > Code that just forks and never execs will stay on one node - but
> > we can take the code Erich wrote, and put it in a separate rebalancer
> > that fires much less often to do a cross-node rebalance.
>
> I tried this on my 4 node NUMAQ (16 procs, 16GB memory) and got
> similar results.  Also, added in the cputime_stats patch and am
> attaching the matrix results from the 32 process run.  Basically,
> all runs show that the initial load balancer is able to place the
> tasks evenly across the nodes, and the better overall times show
> that not migrating to keep the nodes balanced over time results
> in better performance - at least on these boxes.
>
> Obviously, there can be situations where load balancing across
> nodes is necessary, but these results point to less load balancing
> being better, at least on these machines.  It will be interesting
> to repeat this on boxes with other interconnects.
>
> $ reportbench hacksched54 sched54 stock54
> Kernbench:
>                         Elapsed       User     System        CPU
>          hacksched54    29.406s    282.18s    81.716s    1236.8%
>              sched54    29.112s   283.888s     82.84s    1259.4%
>              stock54    31.348s   303.134s    87.824s    1247.2%
>
> Schedbench 4:
>                         AvgUser    Elapsed  TotalUser   TotalSys
>          hacksched54      21.94      31.93      87.81       0.53
>              sched54      22.03      34.90      88.15       0.75
>              stock54      49.35      57.53     197.45       0.86
>
> Schedbench 8:
>                         AvgUser    Elapsed  TotalUser   TotalSys
>          hacksched54      28.23      31.62     225.87       1.11
>              sched54      27.95      37.12     223.66       1.50
>              stock54      43.14      62.97     345.18       2.12
>
> Schedbench 16:
>                         AvgUser    Elapsed  TotalUser   TotalSys
>          hacksched54      49.29      71.31     788.83       2.88
>              sched54      55.37      69.58     886.10       3.79
>              stock54      66.00      81.25    1056.25       7.12
>
> Schedbench 32:
>                         AvgUser    Elapsed  TotalUser   TotalSys
>          hacksched54      56.41     117.98    1805.35       5.90
>              sched54      57.93     132.11    1854.01      10.74
>              stock54      77.81     173.26    2490.31      12.37
>
> Schedbench 64:
>                         AvgUser    Elapsed  TotalUser   TotalSys
>          hacksched54      56.62     231.93    3624.42      13.32
>              sched54      72.91     308.87    4667.03      21.06
>              stock54      86.68     368.55    5548.57      25.73
>
> hacksched54 = 2.5.54 + Martin's tiny NUMA patch +
>               03-cputimes_stat-2.5.53.patch +
>               02-numa-sched-ilb-2.5.53-21.patch
> sched54 = 2.5.54 + 01-numa-sched-core-2.5.53-24.patch +
>           02-ilb-2.5.53-21.patch02 +
>           03-cputimes_stat-2.5.53.patch
> stock54 = 2.5.54 + 03-cputimes_stat-2.5.53.patch

