Re: [RFC PATCH] sched: smart wake-affine

From: Michael Wang <wangyun@linux.vnet.ibm.com>
To: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>, Alex Shi <alex.shi@intel.com>,
	Namhyung Kim <namhyung@kernel.org>, Paul Turner <pjt@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Nikunj A. Dadhania" <nikunj@linux.vnet.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>
Subject: Re: [RFC PATCH] sched: smart wake-affine
Date: Mon, 03 Jun 2013 10:28:58 +0800
Message-ID: <51ABFF6A.60206@linux.vnet.ibm.com>
In-Reply-To: <51A43B16.9080801@linux.vnet.ibm.com>

On 05/28/2013 01:05 PM, Michael Wang wrote:
> The wake-affine logic always tries to pull the wakee close to the waker. In
> theory, this helps when the waker's CPU has cached data that is hot for the
> wakee, or in the extreme ping-pong case.
> 
> Testing shows it can benefit hackbench by up to 15%.
> 
> However, the logic is applied somewhat blindly and is time-consuming, so
> some workloads suffer.
> 
> Testing shows it can hurt pgbench by up to 50%.
> 
> Thus, wake-affine should be smarter and realise when to stop
> its thankless effort.

Are there any comments?

Peter, do you have any comments on this idea? Is this the kind of fix we
are looking for? I think you mentioned that we want some kind of filter
rather than a knob, correct?

Folks, please let me know your concerns so I can help with the research
work :)

Regards,
Michael Wang


> 
> This patch introduces a per-task 'nr_wakee_switch' counter, which is
> incremented each time the task switches its wakee.
> 
> So a high 'nr_wakee_switch' means the task has more than one wakee, and
> the fewer the wakees, the higher the wakeup frequency.
> 
> Now, when deciding whether to pull, pay attention to a wakee with a high
> 'nr_wakee_switch'. Pulling such a task may benefit the wakee, but it
> implies the waker will face cruel competition later. How cruel, or how
> brief, depends on the story behind 'nr_wakee_switch', but either way the
> waker suffers.
> 
> Furthermore, if the waker also has a high 'nr_wakee_switch', multiple
> tasks rely on it, so the waker's higher latency will hurt all of them.
> Pulling the wakee in such cases seems to be a bad deal.
> 
> Thus, as 'waker->nr_wakee_switch / wakee->nr_wakee_switch' grows, the
> deal gets worse and worse.
> 
> This patch therefore makes wake-affine stop its work when:
> 
> 	wakee->nr_wakee_switch > factor &&
> 	waker->nr_wakee_switch > (factor * wakee->nr_wakee_switch)
> 
> The factor here is the number of online CPUs, so more CPUs lead to more
> pulls, since the condition becomes harder to meet.
> 
> After applying the patch, pgbench shows up to 42% improvement.
> 
> Test:
> 	Tested on a 12-CPU x86 server with tip 3.10.0-rc1.
> 
> 				base	smart
> 
> 	| db_size | clients |  tps  | |  tps  |
> 	+---------+---------+-------+ +-------+
> 	| 21 MB   |       1 | 10749 | | 10337 |
> 	| 21 MB   |       2 | 21382 | | 21391 |
> 	| 21 MB   |       4 | 41570 | | 41808 |
> 	| 21 MB   |       8 | 52828 | | 58792 |
> 	| 21 MB   |      12 | 48447 | | 54553 |
> 	| 21 MB   |      16 | 46246 | | 56726 |	+22.66%
> 	| 21 MB   |      24 | 43850 | | 56853 |	+29.65%
> 	| 21 MB   |      32 | 43455 | | 55846 |	+28.51%
> 	| 7483 MB |       1 |  9290 | |  8848 |
> 	| 7483 MB |       2 | 19347 | | 19351 |
> 	| 7483 MB |       4 | 37135 | | 37511 |
> 	| 7483 MB |       8 | 47310 | | 50210 |
> 	| 7483 MB |      12 | 42721 | | 49396 |
> 	| 7483 MB |      16 | 41016 | | 51826 |	+26.36%
> 	| 7483 MB |      24 | 37540 | | 52579 |	+40.06%
> 	| 7483 MB |      32 | 36756 | | 51332 |	+39.66%
> 	| 15 GB   |       1 |  8758 | |  8670 |
> 	| 15 GB   |       2 | 19204 | | 19249 |
> 	| 15 GB   |       4 | 36997 | | 37199 |
> 	| 15 GB   |       8 | 46578 | | 50681 |
> 	| 15 GB   |      12 | 42141 | | 48671 |
> 	| 15 GB   |      16 | 40518 | | 51280 |	+26.56%
> 	| 15 GB   |      24 | 36788 | | 52329 |	+42.24%
> 	| 15 GB   |      32 | 36056 | | 50350 |	+39.64%
> 
> 
> 
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> ---
>  include/linux/sched.h |    3 +++
>  kernel/sched/fair.c   |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 178a8d9..1c996c7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1041,6 +1041,9 @@ struct task_struct {
>  #ifdef CONFIG_SMP
>  	struct llist_node wake_entry;
>  	int on_cpu;
> +	struct task_struct *last_wakee;
> +	unsigned long nr_wakee_switch;
> +	unsigned long last_switch_decay;
>  #endif
>  	int on_rq;
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f62b16d..eaaceb7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3127,6 +3127,45 @@ static inline unsigned long effective_load(struct task_group *tg, int cpu,
>  
>  #endif
>  
> +static void record_wakee(struct task_struct *p)
> +{
> +	/*
> +	 * Rough decay; don't worry about the boundary, a really active
> +	 * task won't notice the loss.
> +	 */
> +	if (time_after(jiffies, current->last_switch_decay + HZ)) {
> +		current->nr_wakee_switch = 0;
> +		current->last_switch_decay = jiffies;
> +	}
> +
> +	if (current->last_wakee != p) {
> +		current->last_wakee = p;
> +		current->nr_wakee_switch++;
> +	}
> +}
> +
> +static int nasty_pull(struct task_struct *p)
> +{
> +	int factor = cpumask_weight(cpu_online_mask);
> +
> +	/*
> +	 * This is the switching frequency; it could mean many wakees or
> +	 * rapid switching. Using factor here automatically adjusts the
> +	 * looseness, so more CPUs lead to more pulls.
> +	 */
> +	if (p->nr_wakee_switch > factor) {
> +		/*
> +		 * The wakee is somewhat hot and needs a certain amount of
> +		 * CPU resource, so if the waker is far hotter, prefer to
> +		 * leave it alone.
> +		 */
> +		if (current->nr_wakee_switch > (factor * p->nr_wakee_switch))
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
>  static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  {
>  	s64 this_load, load;
> @@ -3136,6 +3175,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  	unsigned long weight;
>  	int balanced;
>  
> +	if (nasty_pull(p))
> +		return 0;
> +
>  	idx	  = sd->wake_idx;
>  	this_cpu  = smp_processor_id();
>  	prev_cpu  = task_cpu(p);
> @@ -3428,6 +3470,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  		/* while loop will break here if sd == NULL */
>  	}
>  unlock:
> +	if (sd_flag & SD_BALANCE_WAKE)
> +		record_wakee(p);
> +
>  	rcu_read_unlock();
>  
>  	return new_cpu;
>
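
For readers who want to try the heuristic outside the kernel, the counter and the stop condition quoted above can be sketched in userspace C. This is only an illustration under stated assumptions: struct task_model, model_record_wakee() and model_nasty_pull() are hypothetical names, time is passed in as a plain tick count instead of jiffies, and the factor is an argument rather than the online CPU count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three fields the patch adds to task_struct. */
struct task_model {
	const struct task_model *last_wakee;	/* most recent task woken */
	unsigned long nr_wakee_switch;		/* wakee changes in this window */
	unsigned long last_switch_decay;	/* tick of the last counter reset */
};

/*
 * Modelled on record_wakee(): reset the counter once more than `hz` ticks
 * have passed since the last reset, then count a change of wakee.
 */
static void model_record_wakee(struct task_model *waker,
			       const struct task_model *wakee,
			       unsigned long now, unsigned long hz)
{
	if (now > waker->last_switch_decay + hz) {
		waker->nr_wakee_switch = 0;
		waker->last_switch_decay = now;
	}

	if (waker->last_wakee != wakee) {
		waker->last_wakee = wakee;
		waker->nr_wakee_switch++;
	}
}

/*
 * Modelled on nasty_pull(): skip the affine pull when the wakee is already
 * switching often and the waker switches more than `factor` times as often.
 */
static bool model_nasty_pull(const struct task_model *waker,
			     const struct task_model *wakee,
			     unsigned long factor)
{
	return wakee->nr_wakee_switch > factor &&
	       waker->nr_wakee_switch > factor * wakee->nr_wakee_switch;
}
```

With a factor of 12, as on the 12-CPU test box, a wakee count of 13 needs a waker count above 156 before the pull is skipped, which shows how quickly the threshold grows with the CPU count.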
