From: Brian Twichell <tbrian@us.ibm.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Lang <david.lang@digitalinsight.com>,
linux-kernel@vger.kernel.org, mbligh@mbligh.org,
slpratt@us.ibm.com, anton@samba.org
Subject: Re: Database regression due to scheduler changes ?
Date: Tue, 08 Nov 2005 23:03:48 -0600
Message-ID: <43718334.2090905@us.ibm.com>
In-Reply-To: <436FF6A6.1040708@yahoo.com.au>
Nick Piggin wrote:
>
> I think you are right that the NUMA domain is probably being too
> constrictive of task balancing, and that is where the regression
> is coming from.
>
> For some workloads it is definitely important to have the NUMA
> domain, because it helps spread load over memory controllers as
> well as CPUs - so I guess eliminating that domain is not a good
> long term solution.
>
> I would look at changing parameters of SD_NODE_INIT in include/
> asm-powerpc/topology.h so they are closer to SD_CPU_INIT parameters
> (ie. more aggressive).
I ran with the following:
--- topology.h.orig 2005-11-08 13:11:57.000000000 -0600
+++ topology.h 2005-11-08 13:17:15.000000000 -0600
@@ -43,11 +43,11 @@ static inline int node_to_first_cpu(int
     .span = CPU_MASK_NONE, \
     .parent = NULL, \
     .groups = NULL, \
-    .min_interval = 8, \
-    .max_interval = 32, \
-    .busy_factor = 32, \
+    .min_interval = 1, \
+    .max_interval = 4, \
+    .busy_factor = 64, \
     .imbalance_pct = 125, \
-    .cache_hot_time = (10*1000000), \
+    .cache_hot_time = (5*1000000/2), \
     .cache_nice_tries = 1, \
     .per_cpu_gain = 100, \
     .flags = SD_LOAD_BALANCE \
There was no improvement in performance. The schedstats from this run
follow:
2516 sys_sched_yield()
0( 0.00%) found (only) active queue empty on current cpu
0( 0.00%) found (only) expired queue empty on current cpu
46( 1.83%) found both queues empty on current cpu
2470( 98.17%) found neither queue empty on current cpu
22969106 schedule()
694922 goes idle
3( 0.00%) switched active and expired queues
0( 0.00%) used existing active queue
0 active_load_balance()
0 sched_balance_exec()
0.19/1.28 avg runtime/latency over all cpus (ms)
[scheduler domain #0]
1153606 load_balance()
82580( 7.16%) called while idle
488( 0.59%) tried but failed to move any tasks
63876( 77.35%) found no busier group
18216( 22.06%) succeeded in moving at least one task
(average imbalance: 1.526)
317610( 27.53%) called while busy
15( 0.00%) tried but failed to move any tasks
220139( 69.31%) found no busier group
97456( 30.68%) succeeded in moving at least one task
(average imbalance: 1.752)
753416( 65.31%) called when newly idle
487( 0.06%) tried but failed to move any tasks
624132( 82.84%) found no busier group
128797( 17.10%) succeeded in moving at least one task
(average imbalance: 1.531)
0 sched_balance_exec() tried to push a task
[scheduler domain #1]
715638 load_balance()
68533( 9.58%) called while idle
3140( 4.58%) tried but failed to move any tasks
60357( 88.07%) found no busier group
5036( 7.35%) succeeded in moving at least one task
(average imbalance: 1.251)
22486( 3.14%) called while busy
64( 0.28%) tried but failed to move any tasks
21352( 94.96%) found no busier group
1070( 4.76%) succeeded in moving at least one task
(average imbalance: 1.922)
624619( 87.28%) called when newly idle
5218( 0.84%) tried but failed to move any tasks
591970( 94.77%) found no busier group
27431( 4.39%) succeeded in moving at least one task
(average imbalance: 1.382)
0 sched_balance_exec() tried to push a task
[scheduler domain #2]
685164 load_balance()
63247( 9.23%) called while idle
7280( 11.51%) tried but failed to move any tasks
52200( 82.53%) found no busier group
3767( 5.96%) succeeded in moving at least one task
(average imbalance: 1.361)
24729( 3.61%) called while busy
418( 1.69%) tried but failed to move any tasks
21025( 85.02%) found no busier group
3286( 13.29%) succeeded in moving at least one task
(average imbalance: 3.579)
597188( 87.16%) called when newly idle
67577( 11.32%) tried but failed to move any tasks
371377( 62.19%) found no busier group
158234( 26.50%) succeeded in moving at least one task
(average imbalance: 2.146)
0 sched_balance_exec() tried to push a task
>
> I would also take a look at removing SD_WAKE_IDLE from the flags.
> This flag should make balancing more aggressive, but it can have
> problems when applied to a NUMA domain due to too much task
> movement.
Independently of the run above, I ran with the following:
--- topology.h.orig 2005-11-08 19:32:19.000000000 -0600
+++ topology.h 2005-11-08 19:34:25.000000000 -0600
@@ -53,7 +53,6 @@ static inline int node_to_first_cpu(int
     .flags = SD_LOAD_BALANCE \
         | SD_BALANCE_EXEC \
         | SD_BALANCE_NEWIDLE \
-        | SD_WAKE_IDLE \
         | SD_WAKE_BALANCE, \
     .last_balance = jiffies, \
     .balance_interval = 1, \
There was no improvement in performance.
I didn't expect any change in performance this time, because I
don't think the SD_WAKE_IDLE flag is effective in the NUMA
domain, due to the following code in wake_idle:
    for_each_domain(cpu, sd) {
        if (sd->flags & SD_WAKE_IDLE) {
            cpus_and(tmp, sd->span, p->cpus_allowed);
            for_each_cpu_mask(i, tmp) {
                if (idle_cpu(i))
                    return i;
            }
        }
        else
            break;
    }
If I read that loop correctly, it stops at the first domain
that doesn't have SD_WAKE_IDLE set, which is the CPU domain
(see SD_CPU_INIT), and thus it never gets to the NUMA domain.
Thanks for the suggestions Nick. Andrew raises some
good questions that I will address tomorrow.
Cheers,
Brian