From: Michael Wang <wangyun@linux.vnet.ibm.com>
To: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Paul Turner <pjt@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
alex.shi@intel.com, Ram Pai <linuxram@us.ibm.com>,
"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
Namhyung Kim <namhyung@kernel.org>
Subject: Re: [RFC PATCH v3 0/3] sched: simplify the select_task_rq_fair()
Date: Fri, 22 Feb 2013 14:42:38 +0800
Message-ID: <5127135E.7030502@linux.vnet.ibm.com>
In-Reply-To: <1361509372.5817.60.camel@marge.simpson.net>
On 02/22/2013 01:02 PM, Mike Galbraith wrote:
> On Fri, 2013-02-22 at 10:36 +0800, Michael Wang wrote:
>> On 02/21/2013 05:43 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-21 at 17:08 +0800, Michael Wang wrote:
>>>
>>>> But does this patch set really cause a regression on your Q6600? It may
>>>> have sacrificed something, but I still think it will benefit far more,
>>>> especially on huge systems.
>>>
>>> We spread on FORK/EXEC, and will no longer pull communicating tasks
>>> back to a shared cache with the new logic preferring to leave wakee
>>> remote, so while no, I haven't tested (will try to find round tuit) it
>>> seems it _must_ hurt. Dragging data from one llc to the other on Q6600
>>> hurts a LOT. Every time a client and server are cross llc, it's a huge
>>> hit. The previous logic pulled communicating tasks together right when
>>> it matters the most, intermittent load... or interactive use.
>>
>> I agree that this is a problem that needs to be solved, but don't agree that
>> wake_affine() is the solution.
>
> It's not perfect, but it's better than no countering force at all. It's
> a relic of the dark ages, when affine meant L2, ie this cpu. Nowadays,
> affine has a whole new meaning, L3, so it could be done differently, but
> _some_ kind of opposing force is required.
>
>> According to my understanding, in the old world, wake_affine() will only
>> be used if curr_cpu and prev_cpu share cache, which means they are in
>> one package, so whether we search in the llc sd of curr_cpu or prev_cpu,
>> we won't have the chance to spread the task out of that package.
>
> ? affine_sd is the first domain spanning both cpus, that may be NODE.
> True we won't ever spread in the wakeup path unless SD_WAKE_BALANCE is
> set that is. Would be nice to be able to do that without shredding
> performance.
That's right, spreading needs two conditions to hold in each select instance:
1. prev_cpu and curr_cpu are not affine
2. SD_WAKE_BALANCE is set
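
To make the two conditions above concrete, here is a minimal sketch of the check. It is only an illustration, not code from the patch set: the function name, the boolean argument and the SKETCH_SD_WAKE_BALANCE value are all made up for this example, and the kernel defines its own SD_* constants.

```c
#include <stdbool.h>

/* Hypothetical flag bit for this sketch only; not the kernel's value. */
#define SKETCH_SD_WAKE_BALANCE 0x1u

/*
 * Return true only when both conditions hold:
 *   1. prev_cpu and curr_cpu are not affine
 *   2. the domain has the wake-balance flag set
 * cpus_affine stands in for whatever affinity test the caller has done.
 */
static bool may_spread_on_wakeup(bool cpus_affine, unsigned int sd_flags)
{
    return !cpus_affine && (sd_flags & SKETCH_SD_WAKE_BALANCE) != 0;
}
```

If either condition fails, the sketch returns false and the wakeup stays off the balance path.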
>
> Off the top of my pointy head, I can think of a way to _maybe_ improve
> the "affine" wakeup criteria: Add a small (package size? and very fast)
> FIFO queue to task struct, record waker/wakee relationship. If
> relationship exists in that queue (rbtree), try to wake local, if not,
> wake remote. The thought is to identify situations ala 1:N pgbench
> where you really need to keep the load spread. That need arises when
> the sum wakees + waker won't fit in one cache. True buddies would
> always hit (hm, hit rate), always try to become affine where they
> thrive. 1:N stuff starts missing when client count exceeds package
> size, starts expanding it's horizons. 'Course you would still need to
> NAK if imbalanced too badly, and let NUMA stuff NAK touching lard-balls
> and whatnot. With a little more smarts, we could have happy 1:N, and
> buddies don't have to chat through 2m thick walls to make 1:N scale as
> well as it can before it dies of stupidity.
So this is trying to take care of the case where curr_cpu (local) and
prev_cpu (remote) are on different nodes, where in the old world
wake_affine() won't be invoked, correct?

Hmm... I think this may be a good additional check before entering the
balance path, but I can't estimate the cost of recording the relationship
at the moment...

Anyway, once the affine logic is applied to the new world, it will gain
the ability to spread tasks across nodes just like the old world. Your
idea may be an optimization, but that logic is outside the changes in
this patch set, which means that if it helps, it will benefit the old
world as well as the new.

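
To help reason about the recording cost, the waker/wakee FIFO described in the quoted text could look roughly like the sketch below. This is only a user-space illustration under my own assumptions: the struct, the helper names and the size of 4 are hypothetical (the suggestion was "package size"), plain ints stand in for task identities, and nothing here is taken from the kernel or from this patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; the proposal suggests something like package size. */
#define WAKEE_FIFO_SIZE 4

/* Small per-waker record of recently woken tasks (hypothetical layout). */
struct wakee_fifo {
    int pids[WAKEE_FIFO_SIZE];
    size_t count;   /* number of valid entries */
    size_t next;    /* slot that the next new entry overwrites */
};

static void wakee_fifo_init(struct wakee_fifo *f)
{
    assert(f != NULL);
    f->count = 0;
    f->next = 0;
}

/* Linear scan; cheap only because the queue is tiny. */
static bool wakee_fifo_contains(const struct wakee_fifo *f, int pid)
{
    for (size_t i = 0; i < f->count; i++)
        if (f->pids[i] == pid)
            return true;
    return false;
}

/*
 * Record a wakeup of pid. Returns true on a hit (relationship already
 * known, so try to wake local) and false on a miss (wake remote). On a
 * miss the oldest entry is overwritten once the queue is full.
 */
static bool wakee_fifo_record(struct wakee_fifo *f, int pid)
{
    assert(f != NULL);
    if (wakee_fifo_contains(f, pid))
        return true;
    f->pids[f->next] = pid;
    f->next = (f->next + 1) % WAKEE_FIFO_SIZE;
    if (f->count < WAKEE_FIFO_SIZE)
        f->count++;
    return false;
}
```

With this shape, a pair of buddies keeps hitting, while a 1:N waker with more wakees than slots keeps evicting and missing, which is the behaviour the idea relies on. The per-wakeup cost in the sketch is one short scan plus at most one store, but the real cost inside the wakeup path would still have to be measured.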
Regards,
Michael Wang
>
> -Mike