From: Peter Zijlstra <peterz@infradead.org>
To: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Mike Galbraith <efault@gmx.de>,
	Alex Shi <alex.shi@intel.com>, Namhyung Kim <namhyung@kernel.org>,
	Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>
Subject: Re: [PATCH] sched: smart wake-affine
Date: Tue, 2 Jul 2013 10:52:02 +0200
Message-ID: <20130702085202.GA23916@twins.programming.kicks-ass.net>
In-Reply-To: <51D25A80.8090406@linux.vnet.ibm.com>

On Tue, Jul 02, 2013 at 12:43:44PM +0800, Michael Wang wrote:

> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> ---
>  include/linux/sched.h |    3 +++
>  kernel/sched/fair.c   |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 178a8d9..1c996c7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1041,6 +1041,9 @@ struct task_struct {
>  #ifdef CONFIG_SMP
>  	struct llist_node wake_entry;
>  	int on_cpu;
> +	struct task_struct *last_wakee;
> +	unsigned long nr_wakee_switch;
> +	unsigned long last_switch_decay;
>  #endif
>  	int on_rq;
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c61a614..591c113 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3109,6 +3109,45 @@ static inline unsigned long effective_load(struct task_group *tg, int cpu,
>  
>  #endif
>  
> +static void record_wakee(struct task_struct *p)
> +{
> +	/*
> +	 * Rough decay, don't worry about the boundary, really active
> +	 * task won't care the loose.
> +	 */

OK so we 'decay' once a second.

> +	if (jiffies > current->last_switch_decay + HZ) {
> +		current->nr_wakee_switch = 0;
> +		current->last_switch_decay = jiffies;
> +	}

This isn't so much a decay as it is wiping state. Did you try an actual
decay -- something like: current->nr_wakee_switch >>= 1; ?

I suppose you wanted to avoid something like:

  now = jiffies;
  while (now > current->last_switch_decay + HZ) {
  	current->nr_wakee_switch >>= 1;
  	current->last_switch_decay += HZ;
  }

?
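
For illustration, here is a minimal userspace sketch of the halving decay
discussed above, in both the loop form and a loop-free form. The struct, the
HZ value and the helper names (wakee_stats, ticks_after, decay_loop,
decay_shift) are assumptions made up for this sketch; only the two field
names come from the patch. It is not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define HZ 1000UL	/* assumed tick rate, for this sketch only */

/* Hypothetical userspace stand-in for the two per-task fields in the patch. */
struct wakee_stats {
	unsigned long nr_wakee_switch;
	unsigned long last_switch_decay;
};

/* Wraparound-safe "a is later than b", the same idea as time_after(). */
static int ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Loop form: halve once per full HZ period that has elapsed. */
static void decay_loop(struct wakee_stats *s, unsigned long now)
{
	assert(s != NULL);
	while (ticks_after(now, s->last_switch_decay + HZ)) {
		s->nr_wakee_switch >>= 1;
		s->last_switch_decay += HZ;
	}
}

/* Loop-free form: same result, one shift covering all elapsed periods. */
static void decay_shift(struct wakee_stats *s, unsigned long now)
{
	unsigned long elapsed, periods;

	assert(s != NULL);
	if (!ticks_after(now, s->last_switch_decay + HZ))
		return;

	elapsed = now - s->last_switch_decay;
	periods = (elapsed - 1) / HZ;	/* iterations the loop form would run */

	if (periods >= sizeof(unsigned long) * CHAR_BIT)
		s->nr_wakee_switch = 0;	/* shifting by >= width is undefined in C */
	else
		s->nr_wakee_switch >>= periods;
	s->last_switch_decay += periods * HZ;
}
```

The loop form can spin for a long time when a task has been idle for many
seconds; the shift form bounds that cost, at the price of a division.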

And we increment every time we wake someone other than the previous wakee,
gaining a measure of how often we switch between wakees.

> +	if (current->last_wakee != p) {
> +		current->last_wakee = p;
> +		current->nr_wakee_switch++;
> +	}
> +}
> +
> +static int nasty_pull(struct task_struct *p)

I've seen there's some discussion as to this function name.. good :-) It
really wants to change. How about something like:

int wake_affine()
{
  ...

  /*
   * If we wake multiple tasks be careful to not bounce
   * ourselves around too much.
   */
  if (wake_wide(p))
  	return 0;


> +{
> +	int factor = cpumask_weight(cpu_online_mask);

We have num_online_cpus() for this.. however both are rather expensive.
Having to walk and count a 4096-bit bitmap for every wakeup is going to get
tiresome real quick.

I suppose the question is; to what level do we really want to scale?

One fair answer would be node size I suppose; do you really want to go
bigger than that?

Also; you compare a size against a switching frequency, that's not
really an apples-to-apples comparison.
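
To make the cost argument concrete, here is a rough userspace sketch of what
counting a 4096-bit mask involves, next to a count that is maintained
whenever a bit changes. All names here (cpu_mask_sketch, mask_set,
mask_clear, mask_weight_walk, mask_weight_cached) are hypothetical, the
cached counter is just one generic way to avoid the walk and is not something
proposed in this thread, and __builtin_popcountl assumes GCC or Clang.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS_SKETCH	4096	/* hypothetical mask size, as in the text */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	((NR_CPUS_SKETCH + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct cpu_mask_sketch {
	unsigned long bits[MASK_WORDS];
	unsigned int cached_weight;	/* kept in sync by set/clear below */
};

static void mask_set(struct cpu_mask_sketch *m, unsigned int cpu)
{
	unsigned long bit;

	assert(cpu < NR_CPUS_SKETCH);
	bit = 1UL << (cpu % BITS_PER_WORD);
	if (!(m->bits[cpu / BITS_PER_WORD] & bit)) {
		m->bits[cpu / BITS_PER_WORD] |= bit;
		m->cached_weight++;
	}
}

static void mask_clear(struct cpu_mask_sketch *m, unsigned int cpu)
{
	unsigned long bit;

	assert(cpu < NR_CPUS_SKETCH);
	bit = 1UL << (cpu % BITS_PER_WORD);
	if (m->bits[cpu / BITS_PER_WORD] & bit) {
		m->bits[cpu / BITS_PER_WORD] &= ~bit;
		m->cached_weight--;
	}
}

/* Walks every word of the mask: this is the per-call cost of counting. */
static unsigned int mask_weight_walk(const struct cpu_mask_sketch *m)
{
	unsigned int weight = 0;
	size_t i;

	for (i = 0; i < MASK_WORDS; i++)
		weight += (unsigned int)__builtin_popcountl(m->bits[i]);
	return weight;
}

/* Constant-time read of the count maintained on the (rare) update path. */
static unsigned int mask_weight_cached(const struct cpu_mask_sketch *m)
{
	return m->cached_weight;
}
```

With 64-bit words the walk touches 64 words per call, which is a lot to pay
on a path as hot as a wakeup, whereas the set of online CPUs changes rarely.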

> +
> +	/*
> +	 * Yeah, it's the switching-frequency, could means many wakee or
> +	 * rapidly switch, use factor here will just help to automatically
> +	 * adjust the loose-degree, so more cpu will lead to more pull.
> +	 */
> +	if (p->nr_wakee_switch > factor) {
> +		/*
> +		 * wakee is somewhat hot, it needs certain amount of cpu
> +		 * resource, so if waker is far more hot, prefer to leave
> +		 * it alone.
> +		 */
> +		if (current->nr_wakee_switch > (factor * p->nr_wakee_switch))
> +			return 1;

Ah ok, this makes more sense; the first is simply a filter to avoid
doing the second dereference I suppose.
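
A small userspace sketch of that two-stage check, assuming the waker and
wakee counters are passed in explicitly. The struct task_sketch and the
function name wake_wide_sketch are made up for illustration (the latter
borrows the wake_wide() name suggested earlier); the comparisons mirror the
quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one task_struct field the check reads. */
struct task_sketch {
	unsigned long nr_wakee_switch;
};

/*
 * Returns 1 when the affine pull should be skipped, 0 otherwise.
 * 'factor' plays the role of the CPU count used in the patch.
 */
static int wake_wide_sketch(const struct task_sketch *waker,
			    const struct task_sketch *wakee,
			    unsigned long factor)
{
	assert(waker != NULL && wakee != NULL);

	/* Cheap filter: only the wakee is looked at here. */
	if (wakee->nr_wakee_switch > factor) {
		/* Second stage: is the waker switching far more often? */
		if (waker->nr_wakee_switch > factor * wakee->nr_wakee_switch)
			return 1;
	}

	return 0;
}
```

With factor = 4, a wakee at 5 switches and a waker at 21 trips the check
(21 > 4 * 5), while a waker at 20 does not, and a wakee at 4 never gets past
the filter no matter how busy the waker is.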

> +	}
> +
> +	return 0;
> +}
> +
>  static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  {
>  	s64 this_load, load;
> @@ -3118,6 +3157,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  	unsigned long weight;
>  	int balanced;
>  
> +	if (nasty_pull(p))
> +		return 0;
> +
>  	idx	  = sd->wake_idx;
>  	this_cpu  = smp_processor_id();
>  	prev_cpu  = task_cpu(p);
> @@ -3410,6 +3452,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  		/* while loop will break here if sd == NULL */
>  	}
>  unlock:
> +	if (sd_flag & SD_BALANCE_WAKE)
> +		record_wakee(p);

If we put this in task_waking_fair() we can avoid an entire conditional!

