public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: better wake-balancing: respin
Date: Sat, 29 Oct 2005 01:04:10 +1000	[thread overview]
Message-ID: <43623DEA.3010506@yahoo.com.au> (raw)
In-Reply-To: <20051028143750.GA1806@elte.hu>

[-- Attachment #1: Type: text/plain, Size: 2416 bytes --]

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>Ingo, I wasn't aware that tasks are bouncing around wildly; does your 
>>patch improve things? Then by definition it must penalise workloads 
>>where the pairings are more predictable?
> 
> 
> for TPC, most of the non-to-idle migrations are 'wrong'. So basically

Sure, OK.

> any change that gets rid of extra migrations is a win. This does not 
> mean that it is all bouncing madly.
> 

Yes... semantics aside it is obviously something we want to eliminate
if possible.

> 
>>I would prefer to try fixing wake balancing before giving up and 
>>turning it off for busy CPUs.
> 
> 
> agreed, and that was my suggestion: improve the heuristics to not hurt 
> workloads where there is no natural pairing.
> 
> one possible way would be to do a task_hot() check in the passive 
> balancing code, and only migrate the task when it's been inactive for a 
> long time: that should be the case for most TPC wakeups. (This assumes 
> an accurate cache-hot estimator, for which another patch exists.)
> 

That sounds like a reasonable avenue to investigate. There was
something similar there before, but I found that there was a sharp
point where runqueue lengths got just the right length for performance
characteristics of some things to change.

This could have been made worse by inadequate cache-hot estimate though.

One thing I will look into is to check whether we can use previous
wakeup patterns to check whether the pattern looks random, or affine to
some group of CPUs. This could potentially allow us to have stronger
affinity than we currently do for the "good" workloads, while eliminating
task movement completely for bad cases.

Something along the lines of the attached patch may be 'good enough',
though I'd like to try a few other things as well.

> 
>>Without any form of wake balancing, then a multiprocessor system will 
>>tend to have a completely random distribution of tasks over CPUs over 
>>time. I prefer to add a driver so it is not completely random for 
>>amenable workloads.
> 
> 
> but my patch does not do 'no form of wake balancing'. It will do 
> non-load-related wake balancing if the target CPU is idle. Arguably, 
> that can easily be 'never' under common workloads.
> 

No that's true - I didn't mean to imply that your patch would
completely eliminate this input.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.


[-- Attachment #2: sched-less-affine.patch --]
[-- Type: text/plain, Size: 1723 bytes --]

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h	2005-10-29 00:52:31.000000000 +1000
+++ linux-2.6/include/linux/sched.h	2005-10-29 00:58:51.000000000 +1000
@@ -648,9 +648,12 @@ struct task_struct {
 
 	int lock_depth;		/* BKL lock depth */
 
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
+#if defined(CONFIG_SMP)
+	int last_waker_cpu;	/* CPU that last woke this task up */
+#if defined(__ARCH_WANT_UNLOCKED_CTXSW)
 	int oncpu;
 #endif
+#endif
 	int prio, static_prio;
 	struct list_head run_list;
 	prio_array_t *array;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c	2005-10-29 00:52:34.000000000 +1000
+++ linux-2.6/kernel/sched.c	2005-10-29 01:00:53.000000000 +1000
@@ -1183,6 +1183,10 @@ static int try_to_wake_up(task_t *p, uns
 		}
 	}
 
+	if (p->last_waker_cpu != this_cpu)
+		goto out_set_cpu;
+
+
 	if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
 		goto out_set_cpu;
 
@@ -1253,6 +1257,8 @@ out_set_cpu:
 		cpu = task_cpu(p);
 	}
 
+	p->last_waker_cpu = this_cpu;
+	
 out_activate:
 #endif /* CONFIG_SMP */
 	if (old_state == TASK_UNINTERRUPTIBLE) {
@@ -1334,9 +1340,12 @@ void fastcall sched_fork(task_t *p, int 
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->sched_info, 0, sizeof(p->sched_info));
 #endif
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
+#if defined(CONFIG_SMP)
+	p->last_waker_cpu = cpu;
+#if defined(__ARCH_WANT_UNLOCKED_CTXSW)
 	p->oncpu = 0;
 #endif
+#endif
 #ifdef CONFIG_PREEMPT
 	/* Want to start with kernel preemption disabled. */
 	p->thread_info->preempt_count = 1;

      reply	other threads:[~2005-10-28 15:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-10-27  1:24 better wake-balancing: respin Chen, Kenneth W
2005-10-28  7:40 ` Ingo Molnar
2005-10-28 22:34   ` Chen, Kenneth W
2005-10-28  9:17 ` Nick Piggin
2005-10-28 10:08   ` Ingo Molnar
2005-10-28 11:03     ` Nick Piggin
2005-10-28 14:37       ` Ingo Molnar
2005-10-28 15:04         ` Nick Piggin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43623DEA.3010506@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox