From: Michael Neuling <mikey@neuling.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
ego@in.ibm.com
Subject: Re: [PATCHv4 2/2] powerpc: implement arch_scale_smt_power for Power7
Date: Thu, 18 Feb 2010 09:20:58 +1100 [thread overview]
Message-ID: <25851.1266445258@neuling.org> (raw)
In-Reply-To: <1266142340.5273.418.camel@laptop>
> Suppose for a moment we have 2 threads (hot-unplugged thread 1 and 3, we
> can construct an equivalent but more complex example for 4 threads), and
> we have 4 tasks, 3 SCHED_OTHER of equal nice level and 1 SCHED_FIFO, the
> SCHED_FIFO task will consume exactly 50% walltime of whatever cpu it
> ends up on.
>
> In that situation, provided that each cpu's cpu_power is of equal
> measure, scale_rt_power() ensures that we run 2 SCHED_OTHER tasks on the
> cpu that doesn't run the RT task, and 1 SCHED_OTHER task next to the RT
> task, so that each task consumes 50%, which is all fair and proper.
>
> However, if you do the above, thread 0 will have +75% = 1.75 and thread
> 2 will have -75% = 0.25, then if the RT task will land on thread 0,
> we'll be having: 0.875 vs 0.25, or on thread 3, 1.75 vs 0.125. In either
> case thread 0 will receive too many (if not all) SCHED_OTHER tasks.
>
> That is, unless these threads 2 and 3 really are _that_ weak, at which
> point one wonders why IBM bothered with the silicon ;-)
Peter,
2 & 3 aren't weaker than 0 & 1 but....
The core has dynamic SMT mode switching which is controlled by the
hypervisor (IBM's PHYP). There are 3 SMT modes:
SMT1 uses thread 0
SMT2 uses threads 0 & 1
SMT4 uses threads 0, 1, 2 & 3
When in any particular SMT mode, all threads have the same performance
as each other (ie. at any moment in time, all threads perform the same).
The SMT mode switching works such that when linux has threads 2 & 3 idle
and 0 & 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the
idle loop and the hypervisor will automatically switch to SMT2 for that
core (independent of other cores). The opposite is not true, so if
threads 0 & 1 are idle and 2 & 3 are active, we will stay in SMT4 mode.
Similarly if thread 0 is active and threads 1, 2 & 3 are idle, we'll go
into SMT1 mode.
If we can get the core into a lower SMT mode (SMT1 is best), the threads
will perform better (since they share less core resources). Hence when
we have idle threads, we want them to be the higher ones.
So to answer your question, threads 2 and 3 aren't weaker than the other
threads when in SMT4 mode. It's that if we idle threads 2 & 3, threads
0 & 1 will speed up since we'll move to SMT2 mode.
I'm pretty vague on linux scheduler details, so I'm a bit at sea as to
how to solve this. Can you suggest any mechanisms we currently have in
the kernel to reflect these properties, or do you think we need to
develop something new? If so, any pointers as to where we should look?
Thanks,
Mikey
next prev parent reply other threads:[~2010-02-17 22:20 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-20 20:00 [PATCH 0/2] sched: arch_scale_smt_powers Joel Schopp
2010-01-20 20:02 ` [PATCH 1/2] sched: Fix the place where group powers are updated Joel Schopp
2010-01-26 23:28 ` [PATCHv2 1/2] sched: enable ARCH_POWER Joel Schopp
2010-01-28 23:20 ` [PATCHv3 " Joel Schopp
2010-02-05 20:57 ` [PATCHv4 " Joel Schopp
2010-01-20 20:04 ` [PATCH 2/2] powerpc: implement arch_scale_smt_power for Power7 Joel Schopp
2010-01-20 20:48 ` Peter Zijlstra
2010-01-20 21:58 ` Michael Neuling
2010-01-20 22:44 ` Joel Schopp
2010-01-21 8:27 ` Peter Zijlstra
2010-01-20 21:04 ` Michael Neuling
2010-01-20 22:09 ` Joel Schopp
2010-01-24 3:00 ` Benjamin Herrenschmidt
2010-01-25 17:50 ` Joel Schopp
2010-01-26 4:23 ` Benjamin Herrenschmidt
2010-01-20 21:33 ` Benjamin Herrenschmidt
2010-01-20 22:36 ` Joel Schopp
2010-01-26 23:28 ` [PATCHv2 " Joel Schopp
2010-01-27 0:52 ` Benjamin Herrenschmidt
2010-01-28 22:39 ` Joel Schopp
2010-01-29 1:23 ` Benjamin Herrenschmidt
2010-01-28 23:20 ` [PATCHv3 " Joel Schopp
2010-01-28 23:24 ` Joel Schopp
2010-01-29 1:23 ` Benjamin Herrenschmidt
2010-01-29 10:13 ` Peter Zijlstra
2010-01-29 18:34 ` Joel Schopp
2010-01-29 18:41 ` Joel Schopp
2010-02-05 20:57 ` [PATCHv4 " Joel Schopp
2010-02-14 10:12 ` Peter Zijlstra
2010-02-17 22:20 ` Michael Neuling [this message]
2010-02-18 13:17 ` Peter Zijlstra
2010-02-18 13:19 ` Peter Zijlstra
2010-02-18 16:28 ` Joel Schopp
2010-02-18 17:08 ` Peter Zijlstra
2010-02-19 6:05 ` Michael Neuling
2010-02-19 10:01 ` Peter Zijlstra
2010-02-19 11:01 ` Michael Neuling
2010-02-23 6:08 ` Michael Neuling
2010-02-23 16:24 ` Peter Zijlstra
2010-02-23 16:30 ` Peter Zijlstra
2010-02-24 6:07 ` Michael Neuling
2010-02-24 11:13 ` Michael Neuling
2010-02-24 11:58 ` Michael Neuling
2010-02-27 10:21 ` Michael Neuling
2010-03-02 14:44 ` Peter Zijlstra
2010-03-04 22:28 ` Michael Neuling
2010-01-29 12:25 ` [PATCHv3 " Gabriel Paubert
2010-01-29 16:26 ` Joel Schopp
2010-01-26 23:27 ` [PATCHv2 0/2] sched: arch_scale_smt_powers v2 Joel Schopp
2010-01-28 23:20 ` [PATCHv3 0/2] sched: arch_scale_smt_powers Joel Schopp
2010-02-05 20:57 ` [PATCHv4 " Joel Schopp
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=25851.1266445258@neuling.org \
--to=mikey@neuling.org \
--cc=ego@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).