public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <efault@gmx.de>
To: Greg Smith <gsmith@gregsmith.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Subject: Re: [patch] Re: PostgreSQL pgbench performance regression in 2.6.23+
Date: Tue, 27 May 2008 10:20:48 +0200	[thread overview]
Message-ID: <1211876448.5760.24.camel@marge.simson.net> (raw)
In-Reply-To: <1211867950.5505.47.camel@marge.simson.net>


On Tue, 2008-05-27 at 07:59 +0200, Mike Galbraith wrote:
> On Mon, 2008-05-26 at 20:28 -0400, Greg Smith wrote:
> 
> > 2) Mike suggested a patch to 2.6.25 in this thread that backports
> the 
> > feature for disabling SCHED_FEAT_SYNC_WAKEUPS.  Would it be reasonable to 
> > push that into 2.6.25.5?  It's clearly quite useful for this load and 
> > therefore possibly others.
> 
> If the patch above flies, imho, that should be folded into the backport
> to stable.  The heart of the patch is a bugfix, so definitely stable
> material.  Whether to enable the debugging/tweaking knobs as well is a
> different question.  (I would do it, but I ain't the maintainer;)

Hm, pbench's extreme dislike of preemption, and the starvation testcase
I sent earlier having an absolute requirement of preemption kinda argues
that some knobs and dials should be per task or task group (or, or... or
scheduler should be all knowing all seeing;) 
 
2.6.25.4-feat=45  2.6.25.4-feat=111    2.6.25.4-feat=47
1  11385.471887        10292.721924         9551.157672
2  16709.515434        15540.399522        16283.968970
3  25456.658841        20187.320016        24562.735943
4  24453.435157        24975.037450        23391.583053
5  25504.302958        23102.131056        23671.860667
6  27076.359200        24688.791507        25947.592071
8  31758.200682        29462.639752        29700.144372
10 32190.081142        30428.413809        27439.441838
15 31175.074906        11097.668025        20344.284129
20 30513.974332        10742.166624        19256.695409
30 28307.399275        10233.708047        17535.423344
40 26720.463867        10037.856773        16104.895695
50 24899.945793         9907.624283        15768.746911

Anyway, if patchlet flies, and Ingo concurs, I'll submit the below.

Prevent short-running wakers of short-running threads from overloading a
single
cpu via wakeup affinity, and provide affinity related debug/tuning
options.

Signed-off-by: Mike Galbraith <efault@gmx.de>

 kernel/sched.c      |    9 ++++++++-
 kernel/sched_fair.c |   25 ++++++++++++++-----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 1e4596c..d6d70a8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -596,6 +596,8 @@ enum {
 	SCHED_FEAT_START_DEBIT		= 4,
 	SCHED_FEAT_HRTICK		= 8,
 	SCHED_FEAT_DOUBLE_TICK		= 16,
+	SCHED_FEAT_AFFINE_WAKEUPS	= 32,
+	SCHED_FEAT_SYNC_WAKEUPS		= 64,
 };
 
 const_debug unsigned int sysctl_sched_features =
@@ -603,7 +605,9 @@ const_debug unsigned int sysctl_sched_features =
 		SCHED_FEAT_WAKEUP_PREEMPT	* 1 |
 		SCHED_FEAT_START_DEBIT		* 1 |
 		SCHED_FEAT_HRTICK		* 1 |
-		SCHED_FEAT_DOUBLE_TICK		* 0;
+		SCHED_FEAT_DOUBLE_TICK		* 0 |
+		SCHED_FEAT_AFFINE_WAKEUPS	* 1 |
+		SCHED_FEAT_SYNC_WAKEUPS		* 1;
 
 #define sched_feat(x) (sysctl_sched_features & SCHED_FEAT_##x)
 
@@ -1902,6 +1906,9 @@ static int try_to_wake_up(struct task_struct *p,
unsigned int state, int sync)
 	long old_state;
 	struct rq *rq;
 
+	if (!sched_feat(SYNC_WAKEUPS))
+		sync = 0;
+
 	smp_wmb();
 	rq = task_rq_lock(p, &flags);
 	old_state = p->state;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0080968..bf2defe 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -992,16 +992,27 @@ wake_affine(struct rq *rq, struct sched_domain
*this_sd, struct rq *this_rq,
 	struct task_struct *curr = this_rq->curr;
 	unsigned long tl = this_load;
 	unsigned long tl_per_task;
+	int bad_imbalance;
 
-	if (!(this_sd->flags & SD_WAKE_AFFINE))
+	if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS))
 		return 0;
 
 	/*
+	 * If sync wakeup then subtract the (maximum possible)
+	 * effect of the currently running task from the load
+	 * of the current CPU:
+	 */
+	if (sync && tl)
+		tl -= curr->se.load.weight;
+
+	bad_imbalance = 100*(tl + p->se.load.weight) > imbalance*load;
+
+	/*
 	 * If the currently running task will sleep within
 	 * a reasonable amount of time then attract this newly
 	 * woken task:
 	 */
-	if (sync && curr->sched_class == &fair_sched_class) {
+	if (sync && !bad_imbalance && curr->sched_class == &fair_sched_class)
{
 		if (curr->se.avg_overlap < sysctl_sched_migration_cost &&
 				p->se.avg_overlap < sysctl_sched_migration_cost)
 			return 1;
@@ -1010,16 +1021,8 @@ wake_affine(struct rq *rq, struct sched_domain
*this_sd, struct rq *this_rq,
 	schedstat_inc(p, se.nr_wakeups_affine_attempts);
 	tl_per_task = cpu_avg_load_per_task(this_cpu);
 
-	/*
-	 * If sync wakeup then subtract the (maximum possible)
-	 * effect of the currently running task from the load
-	 * of the current CPU:
-	 */
-	if (sync)
-		tl -= current->se.load.weight;
-
 	if ((tl <= load && tl + target_load(prev_cpu, idx) <= tl_per_task) ||
-			100*(tl + p->se.load.weight) <= imbalance*load) {
+			!bad_imbalance) {
 		/*
 		 * This domain has SD_WAKE_AFFINE and
 		 * p is cache cold in this domain, and





  reply	other threads:[~2008-05-27  8:21 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-21 17:34 PostgreSQL pgbench performance regression in 2.6.23+ Greg Smith
2008-05-22  7:10 ` Mike Galbraith
2008-05-22  8:28   ` Dhaval Giani
2008-05-22  9:05     ` Mike Galbraith
2008-05-22 10:34       ` Mike Galbraith
2008-05-22 11:25         ` Mike Galbraith
2008-05-22 11:44           ` Peter Zijlstra
2008-05-22 12:09             ` Mike Galbraith
2008-05-22 12:24               ` Peter Zijlstra
2008-05-22 13:16                 ` Mike Galbraith
2008-05-23  7:13                 ` Greg Smith
2008-05-23 10:00                   ` Mike Galbraith
2008-05-23 10:10                     ` Ingo Molnar
2008-05-23 10:15                       ` Mike Galbraith
2008-05-23 23:18                         ` Greg Smith
2008-05-23 23:46                           ` Mike Galbraith
2008-05-24  8:08                             ` Mike Galbraith
2008-05-27  0:28                             ` Greg Smith
2008-05-27  5:59                               ` [patch] " Mike Galbraith
2008-05-27  8:20                                 ` Mike Galbraith [this message]
2008-05-27  8:35                                   ` Mike Galbraith
2008-06-06  5:03                                 ` Greg Smith
2008-06-06  6:13                                   ` Mike Galbraith
2008-06-07 11:38                                     ` Mike Galbraith
2008-06-07 12:50                                       ` Mike Galbraith
2008-06-07 13:07                                         ` Peter Zijlstra
2008-06-07 14:16                                           ` Mike Galbraith
2008-06-07 16:16                                             ` Peter Zijlstra
2008-06-07 17:56                                               ` Mike Galbraith
2008-06-07 13:08                                       ` Peter Zijlstra
2008-06-07 14:54                                         ` [patch part 2] " Mike Galbraith
2008-06-07 16:12                                           ` Peter Zijlstra
2008-06-07 17:53                                             ` Mike Galbraith
2008-06-07 18:19                                               ` Mike Galbraith
2008-05-23 13:05                       ` Mike Galbraith
2008-05-23 13:35                         ` Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1211876448.5760.24.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=dhaval@linux.vnet.ibm.com \
    --cc=gsmith@gregsmith.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=vatsa@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox