Re: [PATCH] sched: wakeup buddy

public inbox for linux-kernel@vger.kernel.org

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Mike Galbraith <efault@gmx.de>,
	Namhyung Kim <namhyung@kernel.org>, Alex Shi <alex.shi@intel.com>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>
Subject: Re: [PATCH] sched: wakeup buddy
Date: Thu, 07 Mar 2013 17:52:14 +0100
Message-ID: <1362675134.10972.21.camel@laptop>
In-Reply-To: <51386207.5040808@linux.vnet.ibm.com>

On Thu, 2013-03-07 at 17:46 +0800, Michael Wang wrote:

> On 03/07/2013 04:36 PM, Peter Zijlstra wrote:
> > On Wed, 2013-03-06 at 15:06 +0800, Michael Wang wrote:
> > 
> >> wake_affine() stuff is trying to bind related tasks closely, but it doesn't
> >> work well according to the test on 'perf bench sched pipe' (thanks to Peter).
> > 
> > so sched-pipe is a poor benchmark for this.. 
> > 
> > Ideally we'd write a new benchmark that has some actual data footprint
> > and we'd measure the cost of tasks being apart on the various cache
> > metrics and see what affine wakeup does for it.
> 
> I think sched-pipe is still somewhat capable, 

Yeah, it's not entirely crap for this, but it's not ideal either. The
very big difference you see between it running on a single cpu and on,
say, two threads of a single core is mostly due to preemption
'artifacts' though, not because of cache.

So we have 2 tasks -- let's call them A and B -- involved in a
single-word ping-pong. Both are doing write(); read(); loops. Now what
happens on a single cpu is that A's write()->wakeup() of B makes B
preempt A before A hits read() and blocks. This in turn ensures that
B's write()->wakeup() of A finds an already running A and doesn't
actually need to do the full (and expensive) wakeup thing (and vice
versa).

So by constantly preempting one another they avoid the expensive bit of
going to sleep and waking up again.
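
For reference, the kind of loop being described can be sketched in
userspace roughly as follows. This is only a minimal illustration, not
the perf bench code; the function name ping_pong() and the two-pipe
layout are assumptions made for the example. The parent plays task A
and the forked child plays task B.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `iterations` single-word round trips
 * between the parent (task A) and a forked child (task B) over two
 * pipes. Returns the number of completed round trips, or -1 if the
 * pipes or the child could not be set up.
 */
int ping_pong(int iterations)
{
	int a_to_b[2], b_to_a[2];
	int word = 0, done = 0;
	pid_t pid;

	assert(iterations >= 0);

	if (pipe(a_to_b) < 0)
		return -1;
	if (pipe(b_to_a) < 0) {
		close(a_to_b[0]);
		close(a_to_b[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Task B: read(); write(); until A closes its write end. */
		close(a_to_b[1]);
		close(b_to_a[0]);
		while (read(a_to_b[0], &word, sizeof(word)) == sizeof(word)) {
			if (write(b_to_a[1], &word, sizeof(word)) != sizeof(word))
				break;
		}
		_exit(0);
	}

	/* Task A: write(); read(); -- each write() wakes B. */
	close(a_to_b[0]);
	close(b_to_a[1]);
	for (done = 0; done < iterations; done++) {
		word = done;
		if (write(a_to_b[1], &word, sizeof(word)) != sizeof(word))
			break;
		if (read(b_to_a[0], &word, sizeof(word)) != sizeof(word))
			break;
	}

	close(a_to_b[1]);	/* B sees EOF and exits its loop */
	close(b_to_a[0]);
	waitpid(pid, NULL, 0);
	return done;
}
```

Calling ping_pong() from a small main() and running the binary once
pinned to one cpu (e.g. under taskset -c 0) and once unpinned should
give a rough feel for the single-cpu effect, though the numbers will
depend on the machine and kernel.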

wake_affine() OTOH still has a (supposed) benefit if it gets the tasks
running 'closer' (in a cache hierarchy sense) since then the data
sharing is less expensive.

> the problem is that
> select_idle_sibling() doesn't handle the wakeup-related case; it
> doesn't contain the logic to locate an idle cpu close by.

I'm not entirely sure if I understand what you mean, do you mean to say
its idea of 'closely' is not quite correct? If so, I tend to agree, see
further down.

> So even if we detect the relationship successfully, select_idle_sibling()
> can only help to make sure the target cpu won't be outside of the
> current package; it's a package-level bind, not mc or smp level.

That is the entire point of select_idle_sibling(), selecting a cpu
'near' the target cpu that is currently idle.

Not too long ago we had a bit of a discussion on the unholy mess that
is select_idle_sibling() and if it actually does the right thing.
Arguably it doesn't for machines that have an effective L2 cache. The 
issue is that the arch<->sched interface only knows about
last-level-cache (L3 on anything modern) and SMT.

Expanding the topology description in a way that makes sense (and
doesn't make it a bigger mess) is somewhere on the todo-list.
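
To make the L2 point concrete, here is a toy model. It is not the
kernel's select_idle_sibling(); struct toy_cpu, its fields and
toy_select_idle_sibling() are all made up for illustration. The
selection only looks at llc_id, mirroring a topology description that
knows about the last-level cache but not about L2, so it can pick an
idle cpu that shares L3 with the target while an idle cpu sharing L2
is available.

```c
#include <stdbool.h>

/* Toy description of one cpu; all fields are hypothetical. */
struct toy_cpu {
	int  llc_id;	/* equal llc_id: shares the last-level cache */
	int  l2_id;	/* equal l2_id: shares L2 (ignored below) */
	bool idle;
};

/*
 * Toy selection: return the target if it is idle, otherwise the
 * first idle cpu that shares the last-level cache with it, otherwise
 * fall back to the target. l2_id is deliberately never consulted.
 */
int toy_select_idle_sibling(const struct toy_cpu *cpus, int nr_cpus,
			    int target)
{
	int cpu;

	if (cpus[target].idle)
		return target;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == target || !cpus[cpu].idle)
			continue;
		if (cpus[cpu].llc_id == cpus[target].llc_id)
			return cpu;
	}

	return target;
}
```

With a busy target cpu 0, an idle cpu 1 that shares only the LLC and
an idle cpu 2 that also shares L2, the toy returns cpu 1: 'near' in
the LLC sense, but not the nearest cpu available.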

> > Before doing something like what you're proposing, I'd have a hard look
> > at WF_SYNC, it is possible we should disable/fix select_idle_sibling
> > for sync wakeups.
> 
> The patch is supposed to stop using wake_affine() blindly, not improve
> the wake_affine() stuff itself, the whole stuff still works, but since
> we rely on select_idle_sibling() to make the choice, the benefit is not
> so significant, especially on my one node box...

OK, I'll have to go read the actual patch for that, I'll get back to
you on that :-)

> > The idea behind sync wakeups is that we try and detect the case where
> > we wake up one task only to go to sleep ourselves and try and avoid
> > the regular ping-pong this would otherwise create on account of the
> > waking task still being alive and so the current cpu isn't actually
> > idle yet but we know it's going to be idle soon.
> 
> Are you suggesting that we should separate the process of wakeup related
> case, not just pass current cpu to select_idle_sibling()?

Depends a bit on what you're trying to fix, so far I'm just trying to
write down what I remember about stuff and reacting to half-read
changelogs ;-)
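
As a rough illustration of the sync-wakeup idea quoted above (a sketch
only, not what the scheduler does today; the helper name and its
parameters are assumptions): on a sync wakeup, the waker's cpu could
be treated as about to become idle when the waker is the only runnable
task on it.

```c
#include <stdbool.h>

/*
 * Hypothetical helper. nr_running is the number of runnable tasks on
 * the waker's cpu, including the waker itself. sync says the waker
 * has promised to go to sleep right after the wakeup.
 */
bool toy_cpu_idle_for_wakeup(int nr_running, bool sync)
{
	if (nr_running == 0)
		return true;		/* really idle */

	/* Only the waker is running and it is about to block. */
	return sync && nr_running == 1;
}
```

A placement decision built on something like this could leave the
wakee on the waker's cpu instead of searching for another idle cpu,
which is the ping-pong the sync hint is meant to avoid.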

Thread overview: 28+ messages
2013-03-06  7:06 [PATCH] sched: wakeup buddy Michael Wang
2013-03-07  8:36 ` Peter Zijlstra
2013-03-07  9:43   ` Mike Galbraith
2013-03-08  2:37     ` Michael Wang
2013-03-08  6:44       ` Mike Galbraith
2013-03-08  7:30         ` Michael Wang
2013-03-08  8:26           ` Mike Galbraith
2013-03-11  2:42             ` Michael Wang
2013-03-07  9:46   ` Michael Wang
2013-03-07 16:52     ` Peter Zijlstra [this message]
2013-03-08  2:31       ` Michael Wang
2013-03-11  8:21   ` Ingo Molnar
2013-03-11  9:14     ` Michael Wang
2013-03-11  9:40       ` Ingo Molnar
2013-03-12  6:00         ` Michael Wang
2013-03-12  8:48           ` Ingo Molnar
2013-03-12  9:41             ` Michael Wang
2013-03-07 17:21 ` Peter Zijlstra
2013-03-08  2:33   ` Michael Wang
2013-03-07 17:27 ` Peter Zijlstra
2013-03-08  2:50   ` Michael Wang
2013-03-11 10:36     ` Peter Zijlstra
2013-03-12  3:23       ` Michael Wang
2013-03-12 10:08         ` Peter Zijlstra
2013-03-13  3:07           ` Michael Wang
2013-03-14 10:58             ` Peter Zijlstra
2013-03-15  6:24               ` Michael Wang
2013-03-18  3:26                 ` Michael Wang
