All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Twichell <tbrian@us.ibm.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Lang <david.lang@digitalinsight.com>,
	linux-kernel@vger.kernel.org, mbligh@mbligh.org,
	slpratt@us.ibm.com, anton@samba.org
Subject: Re: Database regression due to scheduler changes ?
Date: Tue, 08 Nov 2005 23:03:48 -0600	[thread overview]
Message-ID: <43718334.2090905@us.ibm.com> (raw)
In-Reply-To: <436FF6A6.1040708@yahoo.com.au>

Nick Piggin wrote:

>
> I think you are right that the NUMA domain is probably being too
> constrictive of task balancing, and that is where the regression
> is coming from.
>
> For some workloads it is definitely important to have the NUMA
> domain, because it helps spread load over memory controllers as
> well as CPUs - so I guess eliminating that domain is not a good
> long term solution.
>
> I would look at changing parameters of SD_NODE_INIT in include/
> asm-powerpc/topology.h so they are closer to SD_CPU_INIT parameters
> (ie. more aggressive).

I ran with the following:

--- topology.h.orig     2005-11-08 13:11:57.000000000 -0600
+++ topology.h  2005-11-08 13:17:15.000000000 -0600
@@ -43,11 +43,11 @@ static inline int node_to_first_cpu(int
        .span                   = CPU_MASK_NONE,        \
        .parent                 = NULL,                 \
        .groups                 = NULL,                 \
-       .min_interval           = 8,                    \
-       .max_interval           = 32,                   \
-       .busy_factor            = 32,                   \
+       .min_interval           = 1,                    \
+       .max_interval           = 4,                    \
+       .busy_factor            = 64,                   \
        .imbalance_pct          = 125,                  \
-       .cache_hot_time         = (10*1000000),         \
+       .cache_hot_time         = (5*1000000/2),        \
        .cache_nice_tries       = 1,                    \
        .per_cpu_gain           = 100,                  \
        .flags                  = SD_LOAD_BALANCE       \

There was no improvement in performance.  The schedstats from this run 
follow:

       2516          sys_sched_yield()
          0(  0.00%) found (only) active queue empty on current cpu
          0(  0.00%) found (only) expired queue empty on current cpu
         46(  1.83%) found both queues empty on current cpu
       2470( 98.17%) found neither queue empty on current cpu


    22969106          schedule()
     694922          goes idle
          3(  0.00%) switched active and expired queues
          0(  0.00%) used existing active queue

          0          active_load_balance()
          0          sched_balance_exec()

      0.19/1.28      avg runtime/latency over all cpus (ms)

[scheduler domain #0]
    1153606          load_balance()
      82580(  7.16%) called while idle
                         488(  0.59%) tried but failed to move any tasks
                       63876( 77.35%) found no busier group
                       18216( 22.06%) succeeded in moving at least one task
                                      (average imbalance:   1.526)
     317610( 27.53%) called while busy
                          15(  0.00%) tried but failed to move any tasks
                      220139( 69.31%) found no busier group
                       97456( 30.68%) succeeded in moving at least one task
                                      (average imbalance:   1.752)
     753416( 65.31%) called when newly idle
                         487(  0.06%) tried but failed to move any tasks
                      624132( 82.84%) found no busier group
                      128797( 17.10%) succeeded in moving at least one task
                                      (average imbalance:   1.531)

          0          sched_balance_exec() tried to push a task

[scheduler domain #1]
     715638          load_balance()
      68533(  9.58%) called while idle
                        3140(  4.58%) tried but failed to move any tasks
                       60357( 88.07%) found no busier group
                        5036(  7.35%) succeeded in moving at least one task
                                      (average imbalance:   1.251)
      22486(  3.14%) called while busy
                          64(  0.28%) tried but failed to move any tasks
                       21352( 94.96%) found no busier group
                        1070(  4.76%) succeeded in moving at least one task
                                      (average imbalance:   1.922)
     624619( 87.28%) called when newly idle
                        5218(  0.84%) tried but failed to move any tasks
                      591970( 94.77%) found no busier group
                       27431(  4.39%) succeeded in moving at least one task
                                      (average imbalance:   1.382)

          0          sched_balance_exec() tried to push a task

[scheduler domain #2]
     685164          load_balance()
      63247(  9.23%) called while idle
                        7280( 11.51%) tried but failed to move any tasks
                       52200( 82.53%) found no busier group
                        3767(  5.96%) succeeded in moving at least one task
                                      (average imbalance:   1.361)
      24729(  3.61%) called while busy
                         418(  1.69%) tried but failed to move any tasks
                       21025( 85.02%) found no busier group
                        3286( 13.29%) succeeded in moving at least one task
                                      (average imbalance:   3.579)
     597188( 87.16%) called when newly idle
                       67577( 11.32%) tried but failed to move any tasks
                      371377( 62.19%) found no busier group
                      158234( 26.50%) succeeded in moving at least one task
                                      (average imbalance:   2.146)

          0          sched_balance_exec() tried to push a task

>
> I would also take a look at removing SD_WAKE_IDLE from the flags.
> This flag should make balancing more aggressive, but it can have
> problems when applied to a NUMA domain due to too much task
> movement.

Independent from the run above, I ran with the following:

--- topology.h.orig     2005-11-08 19:32:19.000000000 -0600
+++ topology.h  2005-11-08 19:34:25.000000000 -0600
@@ -53,7 +53,6 @@ static inline int node_to_first_cpu(int
        .flags                  = SD_LOAD_BALANCE       \
                                | SD_BALANCE_EXEC       \
                                | SD_BALANCE_NEWIDLE    \
-                               | SD_WAKE_IDLE          \
                                | SD_WAKE_BALANCE,      \
        .last_balance           = jiffies,              \
        .balance_interval       = 1,                    \

There was no improvement in performance. 

I didn't expect any change in performance this time, because I
don't think the SD_WAKE_IDLE flag is effective in the NUMA
domain, due to the following code in wake_idle:

        for_each_domain(cpu, sd) {
                if (sd->flags & SD_WAKE_IDLE) {
                        cpus_and(tmp, sd->span, p->cpus_allowed);
                        for_each_cpu_mask(i, tmp) {
                                if (idle_cpu(i))
                                        return i;
                        }
                }
                else
                        break;
        }
 
If I read that loop correctly it stops at the first domain
which doesn't have SD_WAKE_IDLE set, which is the CPU domain
(see SD_CPU_INIT), and thus it never gets to the NUMA domain.

Thanks for the suggestions Nick.  Andrew raises some
good questions that I will address tomorrow.

Cheers,
Brian


  parent reply	other threads:[~2005-11-09  5:03 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-07 22:17 Database regression due to scheduler changes ? Brian Twichell
2005-11-07 22:35 ` David Lang
2005-11-07 23:06   ` Brian Twichell
2005-11-08  0:51     ` Nick Piggin
2005-11-08  1:15       ` Anton Blanchard
2005-11-08  1:34         ` Martin J. Bligh
2005-11-08  1:46           ` Nick Piggin
2005-11-08  1:48             ` Nick Piggin
2005-11-08  1:58             ` Martin J. Bligh
2005-11-08  2:04             ` David Lang
2005-11-08  2:12               ` Martin J. Bligh
2005-11-08  2:15               ` Nick Piggin
2005-11-09  5:03       ` Brian Twichell [this message]
     [not found]         ` <43718DFE.3040600@yahoo.com.au>
2005-11-14 23:03           ` Brian Twichell
2005-11-08  2:31   ` Byron Stanoszek
2005-11-07 22:47 ` linux-os (Dick Johnson)
2005-11-08  3:54   ` Nick Piggin
     [not found] <43715361.3070802@us.ibm.com>
2005-11-09  2:14 ` Andrew Theurer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43718334.2090905@us.ibm.com \
    --to=tbrian@us.ibm.com \
    --cc=anton@samba.org \
    --cc=david.lang@digitalinsight.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@mbligh.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=slpratt@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.