All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: lkp@lists.01.org
Subject: Re: [sched/numa] 096aa33863a: -21.4% hackbench.throughput, -20.2% netperf.Throughput_Mbps
Date: Mon, 11 Aug 2014 16:17:36 +0200	[thread overview]
Message-ID: <20140811141736.GD9918@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <53E3C270.1010302@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2575 bytes --]

On Thu, Aug 07, 2014 at 02:16:16PM -0400, Rik van Riel wrote:
> On 08/07/2014 06:53 AM, Fengguang Wu wrote:
> > Hi Rik,
> > 
> > We noticed the below performance regression in commit
> > 096aa33863a5e48de52d2ff30e0801b7487944f4 ("sched/numa: Decay
> > ->wakee_flips instead of zeroing")
> > 
> > b1ad065e65f5610  096aa33863a5e48de52d2ff30  testbox/testcase/testparams
> > ---------------  -------------------------  ---------------------------
> >     122361 ± 0%     -21.4%      96140 ± 0%  lkp-snb01/hackbench/50%-process-pipe
> >     122361 ± 0%     -21.4%      96140 ± 0%  TOTAL hackbench.throughput
> 
> I guess the performance of that benchmark depends on it
> "slipping under the wire" after each time the kernel
> zeroes out ->wakee_flips.
> 
> Depending on repeatedly pulling the wakee back to the same
> node as the waker suggests something else in the kernel may
> be pulling the wakee to another place in the system repeatedly,
> as well, just at a lower frequency (load balancer?).
> 
> I have also noticed that select_idle_sibling often fails to
> find an idle sibling within the LLC domain, even when it
> exists. Fixing that bug sometimes results in lower performance.
> 
> It appears that some of the performance results of the scheduler
> appear on the code acting in an opposite way to its documented
> intention.
> 
> It may be best to revert 096aa33863a for now...

Does something like so cure anything? Is that a 'normal' hackbench run?
lemme see if I can reproduce on anything.

---
 kernel/sched/fair.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d68..13f00daf8028 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4098,14 +4098,19 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 
 static void record_wakee(struct task_struct *p)
 {
+	unsigned long now = jiffies;
 	/*
 	 * Rough decay (wiping) for cost saving, don't worry
 	 * about the boundary, really active task won't care
 	 * about the loss.
 	 */
-	if (time_after(jiffies, current->wakee_flip_decay_ts + HZ)) {
+	if (time_after(now, current->wakee_flip_decay_ts + BITS_PER_LONG * HZ)) {
+		current->wakee_flips = 0;
+		current->wakee_flip_decay_ts = now;
+	} else while (time_after(now, current->wakee_flip_decay_ts + HZ) && 
+		      current->wakee_flips) {
 		current->wakee_flips >>= 1;
-		current->wakee_flip_decay_ts = jiffies;
+		current->wakee_flip_decay_ts += HZ;
 	}
 
 	if (current->last_wakee != p) {

[-- Attachment #2: attachment.sig --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Rik van Riel <riel@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
	Dave Hansen <dave.hansen@intel.com>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@01.org
Subject: Re: [sched/numa] 096aa33863a: -21.4% hackbench.throughput, -20.2% netperf.Throughput_Mbps
Date: Mon, 11 Aug 2014 16:17:36 +0200	[thread overview]
Message-ID: <20140811141736.GD9918@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <53E3C270.1010302@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2571 bytes --]

On Thu, Aug 07, 2014 at 02:16:16PM -0400, Rik van Riel wrote:
> On 08/07/2014 06:53 AM, Fengguang Wu wrote:
> > Hi Rik,
> > 
> > We noticed the below performance regression in commit
> > 096aa33863a5e48de52d2ff30e0801b7487944f4 ("sched/numa: Decay
> > ->wakee_flips instead of zeroing")
> > 
> > b1ad065e65f5610  096aa33863a5e48de52d2ff30  testbox/testcase/testparams
> > ---------------  -------------------------  ---------------------------
> >     122361 ± 0%     -21.4%      96140 ± 0%  lkp-snb01/hackbench/50%-process-pipe
> >     122361 ± 0%     -21.4%      96140 ± 0%  TOTAL hackbench.throughput
> 
> I guess the performance of that benchmark depends on it
> "slipping under the wire" after each time the kernel
> zeroes out ->wakee_flips.
> 
> Depending on repeatedly pulling the wakee back to the same
> node as the waker suggests something else in the kernel may
> be pulling the wakee to another place in the system repeatedly,
> as well, just at a lower frequency (load balancer?).
> 
> I have also noticed that select_idle_sibling often fails to
> find an idle sibling within the LLC domain, even when it
> exists. Fixing that bug sometimes results in lower performance.
> 
> It appears that some of the performance results of the scheduler
> appear on the code acting in an opposite way to its documented
> intention.
> 
> It may be best to revert 096aa33863a for now...

Does something like so cure anything? Is that a 'normal' hackbench run?
lemme see if I can reproduce on anything.

---
 kernel/sched/fair.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d68..13f00daf8028 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4098,14 +4098,19 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 
 static void record_wakee(struct task_struct *p)
 {
+	unsigned long now = jiffies;
 	/*
 	 * Rough decay (wiping) for cost saving, don't worry
 	 * about the boundary, really active task won't care
 	 * about the loss.
 	 */
-	if (time_after(jiffies, current->wakee_flip_decay_ts + HZ)) {
+	if (time_after(now, current->wakee_flip_decay_ts + BITS_PER_LONG * HZ)) {
+		current->wakee_flips = 0;
+		current->wakee_flip_decay_ts = now;
+	} else while (time_after(now, current->wakee_flip_decay_ts + HZ) && 
+		      current->wakee_flips) {
 		current->wakee_flips >>= 1;
-		current->wakee_flip_decay_ts = jiffies;
+		current->wakee_flip_decay_ts += HZ;
 	}
 
 	if (current->last_wakee != p) {

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

  parent reply	other threads:[~2014-08-11 14:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-07 10:53 [sched/numa] 096aa33863a: -21.4% hackbench.throughput, -20.2% netperf.Throughput_Mbps Fengguang Wu
2014-08-07 10:53 ` Fengguang Wu
2014-08-07 18:16 ` Rik van Riel
2014-08-07 18:16   ` Rik van Riel
2014-08-08  8:38   ` Ingo Molnar
2014-08-08  8:38     ` Ingo Molnar
2014-08-11 14:17   ` Peter Zijlstra [this message]
2014-08-11 14:17     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140811141736.GD9918@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.