From: Ingo Molnar <mingo@kernel.org>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Wang <wangyun@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>,
	Namhyung Kim <namhyung@kernel.org>, Alex Shi <alex.shi@intel.com>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>
Subject: Re: [PATCH] sched: wakeup buddy
Date: Mon, 11 Mar 2013 09:21:05 +0100
Message-ID: <20130311082105.GB12742@gmail.com>
In-Reply-To: <1362645372.2606.11.camel@laptop>


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> On Wed, 2013-03-06 at 15:06 +0800, Michael Wang wrote:
> 
> > wake_affine() stuff is trying to bind related tasks closely, but it 
> > doesn't work well according to the test on 'perf bench sched pipe' 
> > (thanks to Peter).
> 
> so sched-pipe is a poor benchmark for this..
> 
> Ideally we'd write a new benchmark that has some actual data footprint 
> and we'd measure the cost of tasks being apart on the various cache 
> metrics and see what affine wakeup does for it.

Ideally we'd offer applications a new, lightweight vsyscall:

   void sys_sched_work_tick(void)

Or, to speed up adoption, a new, vsyscall-accelerated prctl():

   prctl(PR_WORK_TICK);

which applications could call in each basic work unit they are performing.
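
A minimal sketch of what the application side might look like. PR_WORK_TICK is not part of any mainline kernel; the numeric value, the work_tick() wrapper and the work_ticks counter below are all assumptions made for illustration, and a current kernel is expected to reject the call:

```c
#include <stdint.h>
#include <sys/prctl.h>

/*
 * ASSUMPTION: PR_WORK_TICK does not exist in mainline Linux. The value
 * below is a made-up placeholder, so current kernels reject the call.
 */
#ifndef PR_WORK_TICK
#define PR_WORK_TICK 0x574b5443
#endif

/* Userspace count of completed work units (hypothetical, illustration only). */
static uint64_t work_ticks;

/*
 * Report one completed basic work unit.
 * Returns 0 if the kernel accepted the tick, -1 if it did not
 * (the expected result on kernels without such an option).
 */
static int work_tick(void)
{
	work_ticks++;
	return prctl(PR_WORK_TICK, 0UL, 0UL, 0UL, 0UL) == 0 ? 0 : -1;
}
```

An application would call work_tick() once per completed unit, for example right after a transaction commits or a pipe message is written.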

Sysbench would call it for every transaction completed, sched-pipe would 
call it for every pipe message sent, hackbench would call it for messages, 
etc. etc.

This is a minimal application level change, but gives *huge* information 
to the scheduler: we could balance tasks to maximize their observed work 
rate.
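
To make "maximize the observed work rate" concrete, here is a rough userspace sketch, not scheduler code. It assumes the kernel has somehow collected a tick count and a measurement window for each candidate placement; struct placement_sample and best_placement() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* One observation window for a task on a candidate CPU (hypothetical). */
struct placement_sample {
	int cpu;		/* candidate CPU id */
	uint64_t ticks;		/* work ticks seen in the window */
	uint64_t elapsed_ns;	/* window length in nanoseconds */
};

/*
 * Return the index of the sample with the highest work rate
 * (ticks per nanosecond), or -1 if no sample has a usable window.
 */
static int best_placement(const struct placement_sample *s, size_t n)
{
	int best = -1;
	double best_rate = 0.0;

	for (size_t i = 0; i < n; i++) {
		if (s[i].elapsed_ns == 0)
			continue;	/* no usable measurement */
		double rate = (double)s[i].ticks / (double)s[i].elapsed_ns;
		if (best < 0 || rate > best_rate) {
			best = (int)i;
			best_rate = rate;
		}
	}
	return best;
}
```

A real balancer would also have to weigh migration cost and fairness; this only shows the comparison that the work rate makes possible.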

The scheduler could also do other things, like observe the wakeup/sleep 
patterns within a 'work atom', observe execution overlap between work 
atoms and place tasks accordingly, etc. etc.

Today we approximate work atoms by saying that scheduling atoms == work 
atoms. But that approximation breaks down in a number of important cases.

If we had such a design we'd be able to fix pretty much everything, 
without the catch-22 problems we are facing normally.

An added bonus would be increased instrumentation: we could trace, time, 
profile work atom rates and could collect work atom profiles. We could see 
work atom execution histograms, etc. etc. - stuff that is simply not 
possible today without extensive application-dependent instrumentation.
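
As an illustration of the histogram idea, the sketch below buckets the time between consecutive work ticks by powers of two. It is an assumption about how such data might be collected, not the instrumentation mentioned in this mail; struct atom_hist, atom_bucket() and atom_hist_tick() are hypothetical names:

```c
#include <stdint.h>

#define ATOM_BUCKETS 32

/* Histogram of work atom durations, bucket b covers [2^b, 2^(b+1)) ns. */
struct atom_hist {
	uint64_t last_ns;		/* timestamp of the previous tick */
	int primed;			/* set once a first tick was seen */
	uint64_t count[ATOM_BUCKETS];
};

/* floor(log2(ns)), clamped to the last bucket; 0 and 1 map to bucket 0. */
static unsigned int atom_bucket(uint64_t ns)
{
	unsigned int b = 0;

	while (ns > 1 && b < ATOM_BUCKETS - 1) {
		ns >>= 1;
		b++;
	}
	return b;
}

/* Record a work tick at time now_ns; the first tick only sets the baseline. */
static void atom_hist_tick(struct atom_hist *h, uint64_t now_ns)
{
	if (h->primed)
		h->count[atom_bucket(now_ns - h->last_ns)]++;
	h->last_ns = now_ns;
	h->primed = 1;
}
```

Feeding it a timestamp at every work tick yields a per-task duration profile that can be dumped or compared between runs.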

We could also use utrace scripts to define work atoms without modifying 
the application: for many applications we know which particular function 
call means that a basic work unit was completed.

I have actually written the prctl() approach before, for instrumentation 
purposes, and it does wonders to system analysis.

Any objections?

Thanks,

	Ingo
