[RFC PATCH] sched: wakeup buddy


* [RFC PATCH] sched: wakeup buddy
@ 2013-02-28  6:38 Michael Wang
  2013-02-28  7:18 ` Mike Galbraith
  2013-02-28  9:25 ` Namhyung Kim
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Wang @ 2013-02-28  6:38 UTC (permalink / raw)
  To: LKML, Ingo Molnar, Peter Zijlstra
  Cc: Mike Galbraith, Paul Turner, Alex Shi, Andrew Morton, Ram Pai,
	Nikunj A. Dadhania, Namhyung Kim

The wake_affine() logic tries to place related tasks close together, but
tests with 'perf bench sched pipe' show that it does not work well (thanks
to Peter).

Besides, pgbench shows that blindly using wake_affine() costs a lot of
performance.

Thus we need a new solution: it should detect tasks that are related to
each other and place them close together, while taking care of balance,
latency and performance at the same time.

The wakeup buddy feature seems like a good solution (thanks to Mike for the
hint).

The feature introduces waker and wakee pointers with their ref counts, along
with the new knob sysctl_sched_wakeup_buddy_ref.

Now in select_task_rq_fair(), when current (task B) tries to wake up p
(task A), if all of the following hold:

	1. A->waker == B && A->wakee == B
	2. A->waker_ref >= sysctl_sched_wakeup_buddy_ref
	3. A->wakee_ref >= sysctl_sched_wakeup_buddy_ref

then A is the wakeup buddy of B, which means A and B are likely to use
each other's memory.

Thus, if B is also the wakeup buddy of A, which means no other task has
broken their relationship, then A is likely to benefit from the data cached
by B, and running them close together is likely to be a gain.
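
For illustration only, here is a minimal userspace sketch of the bookkeeping
described above. It mirrors wakeup_ref(), wakeup_buddy() and wakeup_related()
from the diff, but struct task, buddy_ref, the explicit "cur" argument (which
stands in for the kernel's current) and rounds_until_related() are assumptions
made for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the four fields the patch adds to task_struct. */
struct task {
	struct task *waker;	/* last task that woke this task */
	struct task *wakee;	/* last task this task woke */
	unsigned int waker_ref;	/* repeated wakeups by ->waker */
	unsigned int wakee_ref;	/* repeated wakeups of ->wakee */
};

/* Models sysctl_sched_wakeup_buddy_ref (default 8 in the patch). */
static unsigned int buddy_ref = 8;

/* Record that cur woke p; a change of partner resets the count to 0. */
static void wakeup_ref(struct task *cur, struct task *p)
{
	assert(cur && p && cur != p);

	if (p->waker != cur) {
		p->waker_ref = 0;
		p->waker = cur;
	} else
		p->waker_ref++;

	if (cur->wakee != p) {
		cur->wakee_ref = 0;
		cur->wakee = p;
	} else
		cur->wakee_ref++;
}

/* Return 1 if p1 is the wakeup buddy of p2, 0 otherwise. */
static int wakeup_buddy(const struct task *p1, const struct task *p2)
{
	if (p1->waker != p2 || p1->wakee != p2)
		return 0;

	return p1->waker_ref >= buddy_ref && p1->wakee_ref >= buddy_ref;
}

/* Return 1 if cur and p are wakeup buddies of each other. */
static int wakeup_related(const struct task *cur, const struct task *p)
{
	return wakeup_buddy(p, cur) && wakeup_buddy(cur, p);
}

/*
 * Hypothetical helper: let a and b wake each other in turn and return the
 * round in which the relation check first succeeds, or -1 if it never does.
 * As in the patch, the check runs before the ref counts are updated.
 */
static int rounds_until_related(struct task *a, struct task *b)
{
	int round;

	for (round = 1; round <= 1000; round++) {
		if (wakeup_related(b, a))
			return round;
		wakeup_ref(b, a);	/* b wakes a */
		wakeup_ref(a, b);	/* a wakes b */
	}
	return -1;
}
```

In this model, with the default threshold of 8, two zero-initialized tasks
that only ever wake each other pass the check on the tenth round trip. A
single wakeup of either task by a third task resets the counts, so the pair
has to earn the relation again.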

This patch adds the wakeup buddy feature and reorganizes the wake_affine()
logic around it; with these changes, pgbench and 'perf bench sched pipe'
perform better.

Highlight:
	The default value of sysctl_sched_wakeup_buddy_ref is 8 for now;
	please let me know if another value performs better on your system.
	I'd like to make it bigger so that the decision is made more
	carefully and the feature only kicks in when it is really needed.

	Comments are very welcome.
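
For anyone who wants to experiment with the knob: given the kern_table entry
in the diff, it should show up as /proc/sys/kernel/sched_wakeup_buddy_ref on
a kernel built with this patch and CONFIG_SMP. A small sketch for reading it
from C follows; read_uint_knob() is a hypothetical helper name, and the file
will not exist on kernels without the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single unsigned integer from a sysctl-style
 * file. Returns the value, or -1 if the file is missing or unparsable.
 */
static long long read_uint_knob(const char *path)
{
	FILE *f;
	unsigned int value;
	long long ret = -1;

	assert(path);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%u", &value) == 1)
		ret = (long long)value;

	fclose(f);
	return ret;
}
```

Calling read_uint_knob("/proc/sys/kernel/sched_wakeup_buddy_ref") on a
patched kernel should return the current threshold; a return of -1 means
the knob is not available.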

Test:
	Tested on a 12-cpu x86 server with tip 3.8.0-rc7.

	'perf bench sched pipe' shows nearly double the performance.

	pgbench result:
					prev	post

                | db_size | clients |  tps  |   |  tps  |
                +---------+---------+-------+   +-------+
                | 22 MB   |       1 | 10794 |   | 10820 |
                | 22 MB   |       2 | 21567 |   | 21915 |
                | 22 MB   |       4 | 41621 |   | 42766 |
                | 22 MB   |       8 | 53883 |   | 60511 |       +12.30%
                | 22 MB   |      12 | 50818 |   | 57129 |       +12.42%
                | 22 MB   |      16 | 50463 |   | 59345 |       +17.60%
                | 22 MB   |      24 | 46698 |   | 63787 |       +36.59%
                | 22 MB   |      32 | 43404 |   | 62643 |       +44.33%

                | 7484 MB |       1 |  7974 |   |  8014 |
                | 7484 MB |       2 | 19341 |   | 19534 |
                | 7484 MB |       4 | 36808 |   | 38092 |
                | 7484 MB |       8 | 47821 |   | 51968 |       +8.67%
                | 7484 MB |      12 | 45913 |   | 52284 |       +13.88%
                | 7484 MB |      16 | 46478 |   | 54418 |       +17.08%
                | 7484 MB |      24 | 42793 |   | 56375 |       +31.74%
                | 7484 MB |      32 | 36329 |   | 55783 |       +53.55%
                
                | 15 GB   |       1 |  7636 |   |  7880 |       
                | 15 GB   |       2 | 19195 |   | 19477 |
                | 15 GB   |       4 | 35975 |   | 37962 |
                | 15 GB   |       8 | 47919 |   | 51558 |       +7.59%
                | 15 GB   |      12 | 45397 |   | 51163 |       +12.70%
                | 15 GB   |      16 | 45926 |   | 53912 |       +17.39%
                | 15 GB   |      24 | 42184 |   | 55343 |       +31.19%
                | 15 GB   |      32 | 35983 |   | 55358 |       +53.84%
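
The percentages in the last column are the relative tps gain of post over
prev. As a quick way to check a row, a sketch (tps_gain_percent() is a
hypothetical helper, not part of the patch):

```c
#include <assert.h>

/* Relative gain of post over prev, in percent. */
static double tps_gain_percent(double prev, double post)
{
	assert(prev > 0.0);
	return (post - prev) / prev * 100.0;
}
```

For example, the 22 MB / 8 clients row gives (60511 - 53883) / 53883, which
is about 12.30%.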

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
---
 include/linux/sched.h |    8 ++++
 kernel/sched/fair.c   |   97 ++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sysctl.c       |   10 +++++
 3 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d211247..c5a02b3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1235,6 +1235,10 @@ enum perf_event_task_context {
 	perf_nr_task_contexts,
 };
 
+#ifdef CONFIG_SMP
+extern unsigned int sysctl_sched_wakeup_buddy_ref;
+#endif
+
 struct task_struct {
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	void *stack;
@@ -1245,6 +1249,10 @@ struct task_struct {
 #ifdef CONFIG_SMP
 	struct llist_node wake_entry;
 	int on_cpu;
+	struct task_struct *waker;
+	struct task_struct *wakee;
+	unsigned int waker_ref;
+	unsigned int wakee_ref;
 #endif
 	int on_rq;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..d5acfd8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3173,6 +3173,75 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 }
 
 /*
+ * Reduce sysctl_sched_wakeup_buddy_ref will reduce the preparation time
+ * to active the wakeup buddy feature, and make it agile, however, this
+ * will increase the risk of misidentify.
+ *
+ * Check wakeup_buddy() for the usage.
+ */
+unsigned int sysctl_sched_wakeup_buddy_ref = 8UL;
+
+/*
+ * wakeup_buddy() help to check whether p1 is the wakeup buddy of p2.
+ *
+ * Return 1 for yes, 0 for no.
+*/
+static inline int wakeup_buddy(struct task_struct *p1, struct task_struct *p2)
+{
+	if (p1->waker != p2 || p1->wakee != p2)
+		return 0;
+
+	if (p1->waker_ref < sysctl_sched_wakeup_buddy_ref)
+		return 0;
+
+	if (p1->wakee_ref < sysctl_sched_wakeup_buddy_ref)
+		return 0;
+
+	return 1;
+}
+
+/*
+ * wakeup_related() help to check whether bind p close to current will
+ * benefit the system.
+ *
+ * If p and current are wakeup buddy of each other, usually that means
+ * they utilize the memory of each other, and current cached some data
+ * interested by p.
+ *
+ * Return 1 for yes, 0 for no.
+ */
+static inline int wakeup_related(struct task_struct *p)
+{
+	if (wakeup_buddy(p, current)) {
+		/*
+		 * Now check whether current still focus on his buddy.
+		 */
+		if (wakeup_buddy(current, p))
+			return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * wakeup_ref() help to record the ref when current wakeup p
+ */
+static inline void wakeup_ref(struct task_struct *p)
+{
+	if (p->waker != current) {
+		p->waker_ref = 0;
+		p->waker = current;
+	} else
+		p->waker_ref++;
+
+	if (current->wakee != p) {
+		current->wakee_ref = 0;
+		current->wakee = p;
+	} else
+		current->wakee_ref++;
+}
+
+/*
  * find_idlest_group finds and returns the least busy CPU group within the
  * domain.
  */
@@ -3351,8 +3420,30 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 	}
 
 	if (affine_sd) {
-		if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
-			prev_cpu = cpu;
+		if (wakeup_related(p) && wake_affine(affine_sd, p, sync)) {
+			/*
+			 * current and p are wakeup related, and balance is
+			 * guaranteed, try to make them closely.
+			 */
+			if (cpu_rq(cpu)->nr_running - sync) {
+				/*
+				 * Current is not going to sleep or there
+				 * are other task on current cpu, search
+				 * an idle cpu close to the current cpu to
+				 * take care latency.
+				 */
+				new_cpu = select_idle_sibling(p, cpu);
+			} else {
+				/*
+				 * current is the only task on rq and it is
+				 * going to sleep, current cpu will be a nice
+				 * candidate for p to run on.
+				 */
+				new_cpu = cpu;
+			}
+
+			goto unlock;
+		}
 
 		new_cpu = select_idle_sibling(p, prev_cpu);
 		goto unlock;
@@ -3399,6 +3490,8 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 unlock:
 	rcu_read_unlock();
 
+	wakeup_ref(p);
+
 	return new_cpu;
 }
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..6845d24 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -424,6 +424,16 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &one,
 	},
 #endif
+#ifdef CONFIG_SMP
+	{
+		.procname	= "sched_wakeup_buddy_ref",
+		.data		= &sysctl_sched_wakeup_buddy_ref,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+	},
+#endif
 #ifdef CONFIG_PROVE_LOCKING
 	{
 		.procname	= "prove_locking",
-- 
1.7.4.1



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  6:38 [RFC PATCH] sched: wakeup buddy Michael Wang
@ 2013-02-28  7:18 ` Mike Galbraith
  2013-02-28  7:40   ` Michael Wang
  2013-02-28  9:25 ` Namhyung Kim
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2013-02-28  7:18 UTC (permalink / raw)
  To: Michael Wang
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On Thu, 2013-02-28 at 14:38 +0800, Michael Wang wrote:

> +				/*
> +				 * current is the only task on rq and it is
> +				 * going to sleep, current cpu will be a nice
> +				 * candidate for p to run on.
> +				 */

The sync hint only means it might be going to sleep soon, and even then,
there can still be enough execution overlap to be a win to schedule
cross core.  Sched pipe numbers will always be much prettier if you do
wakeup cpu affine, as it's ~100% scheduler and ~100% sync.  You may lose
a lot on other stuff if you interpret the hint as gospel truth.
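
To make the point concrete, the branch in question can be modelled outside
the kernel. This is only a sketch: enum placement and buddy_placement() are
made-up names, nr_running stands for cpu_rq(cpu)->nr_running on the waker's
cpu, and sync is the 0/1 value derived from WF_SYNC.

```c
#include <assert.h>

enum placement {
	PLACE_ON_WAKER_CPU,	/* new_cpu = cpu */
	PLACE_ON_IDLE_SIBLING	/* new_cpu = select_idle_sibling(p, cpu) */
};

/*
 * Model of the "cpu_rq(cpu)->nr_running - sync" test in the patch: the
 * wakee stays on the waker's cpu only if the waker is alone on its
 * runqueue and the wakeup carries the sync hint.
 */
static enum placement buddy_placement(unsigned int nr_running, int sync)
{
	assert(sync == 0 || sync == 1);
	assert(nr_running >= 1);	/* the waker itself is running */

	if (nr_running - sync)
		return PLACE_ON_IDLE_SIBLING;

	return PLACE_ON_WAKER_CPU;
}
```

Note that buddy_placement(1, 1) returns PLACE_ON_WAKER_CPU even when the
waker keeps running after the wakeup, because nothing in the test can tell
whether the hint turns out to be true.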

IMHO, sched pipe is a "how fat have I become" benchmark, not "how well
do I perform".  The scheduler performs well when it makes more work
happen.  Playing ping-pong with yourself is _exercise_, not a job :)

-Mike


* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  7:18 ` Mike Galbraith
@ 2013-02-28  7:40   ` Michael Wang
  2013-02-28  7:42     ` Michael Wang
  2013-02-28  8:04     ` Mike Galbraith
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Wang @ 2013-02-28  7:40 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

Hi, Mike

Thanks for your reply.

On 02/28/2013 03:18 PM, Mike Galbraith wrote:
> On Thu, 2013-02-28 at 14:38 +0800, Michael Wang wrote:
> 
>> +				/*
>> +				 * current is the only task on rq and it is
>> +				 * going to sleep, current cpu will be a nice
>> +				 * candidate for p to run on.
>> +				 */
> 
> The sync hint only means it might be going to sleep soon, and even then,
> there can still be enough execution overlap to be a win to schedule
> cross core.  Sched pipe numbers will always be much prettier if you do
> wakeup cpu affine, as it's ~100% scheduler and ~100% sync.

Hmm.. so it's the comparison between 'cache benefit - execution overlap'
and 'latency - execution overlap'?

I cannot estimate how much latency would be added by waiting for current
to go to sleep (it should be faster than accessing cold data, shouldn't it?),
but I really like the cache benefit, unless sync doesn't mean that current
goes to sleep every time. But that's the promise of WF_SYNC, isn't it?

> You may lose
> a lot on other stuff if you interpret the hint as gospel truth.

Could you please give more details on this point?

> 
> IMHO, sched pipe is a "how fat have I become" benchmark, not "how well
> do I perform".  The scheduler performs well when it makes more work
> happen.  Playing ping-pong with yourself is _exercise_, not a job :)

That's right, maybe I'm using the wrong description. It's the ops/sec
that has doubled; that means 'fat', correct?

Regards,
Michael Wang

> 
> -Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  7:40   ` Michael Wang
@ 2013-02-28  7:42     ` Michael Wang
  2013-02-28  8:06       ` Mike Galbraith
  2013-02-28  8:04     ` Mike Galbraith
  1 sibling, 1 reply; 16+ messages in thread
From: Michael Wang @ 2013-02-28  7:42 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On 02/28/2013 03:40 PM, Michael Wang wrote:
> Hi, Mike
> 
> Thanks for your reply.
> 
> On 02/28/2013 03:18 PM, Mike Galbraith wrote:
>> On Thu, 2013-02-28 at 14:38 +0800, Michael Wang wrote:
>>
>>> +				/*
>>> +				 * current is the only task on rq and it is
>>> +				 * going to sleep, current cpu will be a nice
>>> +				 * candidate for p to run on.
>>> +				 */
>>
>> The sync hint only means it might be going to sleep soon, and even then,
>> there can still be enough execution overlap to be a win to schedule
>> cross core.  Sched pipe numbers will always be much prettier if you do
>> wakeup cpu affine, as it's ~100% scheduler and ~100% sync.
> 
> Hmm.. so it's the comparison between 'cache benefit - execution overlap'
> and 'latency - execution overlap'?
> 
> I could not estimate how many latency will be added to wait for current
> going to sleep (it should be faster than access cold data, isn't it?),
> but I really like the cache benefit, unless sync doesn't means current
> is going to sleep every time, but that's the promise of WF_SYNC, isn't it?
> 
>> You may lose
>> a lot on other stuff if you interpret the hint as gospel truth.
> 
> Could you please give more details on this point?
> 
>>
>> IMHO, sched pipe is a "how fat have I become" benchmark, not "how well
>> do I perform".  The scheduler performs well when it makes more work
>> happen.  Playing ping-pong with yourself is _exercise_, not a job :)
> 
> That's right, may be I'm using the wrong description, it's the ops/sec
> which has been doubled, that means 'fat', correct?

I mean, could we say that more ops/sec means more work has been done?

Regards,
Michael Wang

> 
> Regards,
> Michael Wang
> 
>>
>> -Mike
>>
>>
> 



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  7:40   ` Michael Wang
  2013-02-28  7:42     ` Michael Wang
@ 2013-02-28  8:04     ` Mike Galbraith
  2013-02-28  8:14       ` Michael Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2013-02-28  8:04 UTC (permalink / raw)
  To: Michael Wang
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On Thu, 2013-02-28 at 15:40 +0800, Michael Wang wrote: 
> Hi, Mike
> 
> Thanks for your reply.
> 
> On 02/28/2013 03:18 PM, Mike Galbraith wrote:
> > On Thu, 2013-02-28 at 14:38 +0800, Michael Wang wrote:
> > 
> >> +				/*
> >> +				 * current is the only task on rq and it is
> >> +				 * going to sleep, current cpu will be a nice
> >> +				 * candidate for p to run on.
> >> +				 */
> > 
> > The sync hint only means it might be going to sleep soon, and even then,
> > there can still be enough execution overlap to be a win to schedule
> > cross core.  Sched pipe numbers will always be much prettier if you do
> > wakeup cpu affine, as it's ~100% scheduler and ~100% sync.
> 
> Hmm.. so it's the comparison between 'cache benefit - execution overlap'
> and 'latency - execution overlap'?

Yeah.  You'll always lose power cross core, and throughput breakeven and
win depends on convertible overlap, and how much L2 miss etc costs.  For
sched pipe there is no win, but for other sync hint users there is.

> I could not estimate how many latency will be added to wait for current
> going to sleep (it should be faster than access cold data, isn't it?),
> but I really like the cache benefit, unless sync doesn't means current
> is going to sleep every time, but that's the promise of WF_SYNC, isn't it?

It would be nice if it _were_ a promise, but it is not, it's a hint.

> > You may lose
> > a lot on other stuff if you interpret the hint as gospel truth.
> 
> Could you please give more details on this point?

tbench, mysql+oltp, on and on use the sync hint, many things jabber on
localhost, use the sync hint, and have been shown in cold hard numbers
to benefit, some things massively from cross core scheduling.  You lose
for sure at extreme context rates, but it has to be pretty darn high to
be a guaranteed loser.

That's why select_idle_sibling() is so very damn annoying.

> > IMHO, sched pipe is a "how fat have I become" benchmark, not "how well
> > do I perform".  The scheduler performs well when it makes more work
> > happen.  Playing ping-pong with yourself is _exercise_, not a job :)
> 
> That's right, may be I'm using the wrong description, it's the ops/sec
> which has been doubled, that means 'fat', correct?

In this case, it means you're not running a kernel with nohz on a chain,
running two schedulers is more expensive than running one, and missing
L2 each and every time hurts very badly when the load is ultra skinny.

-Mike



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  7:42     ` Michael Wang
@ 2013-02-28  8:06       ` Mike Galbraith
  0 siblings, 0 replies; 16+ messages in thread
From: Mike Galbraith @ 2013-02-28  8:06 UTC (permalink / raw)
  To: Michael Wang
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On Thu, 2013-02-28 at 15:42 +0800, Michael Wang wrote:

> I mean could we say that more ops/sec means more works has been done?

Sure.  But it's fairly meaningless, it's all scheduler.  Real tasks do
more than schedule.

-Mike



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  8:04     ` Mike Galbraith
@ 2013-02-28  8:14       ` Michael Wang
  2013-02-28  8:24         ` Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Wang @ 2013-02-28  8:14 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On 02/28/2013 04:04 PM, Mike Galbraith wrote:
> On Thu, 2013-02-28 at 15:40 +0800, Michael Wang wrote: 
>> Hi, Mike
>>
>> Thanks for your reply.
>>
>> On 02/28/2013 03:18 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-28 at 14:38 +0800, Michael Wang wrote:
>>>
>>>> +				/*
>>>> +				 * current is the only task on rq and it is
>>>> +				 * going to sleep, current cpu will be a nice
>>>> +				 * candidate for p to run on.
>>>> +				 */
>>>
>>> The sync hint only means it might be going to sleep soon, and even then,
>>> there can still be enough execution overlap to be a win to schedule
>>> cross core.  Sched pipe numbers will always be much prettier if you do
>>> wakeup cpu affine, as it's ~100% scheduler and ~100% sync.
>>
>> Hmm.. so it's the comparison between 'cache benefit - execution overlap'
>> and 'latency - execution overlap'?
> 
> Yeah.  You'll always lose power cross core, and throughput breakeven and
> win depends on convertible overlap, and how much L2 miss etc costs.  For
> sched pipe there is no win, but for other sync hint users there is.
> 
>> I could not estimate how many latency will be added to wait for current
>> going to sleep (it should be faster than access cold data, isn't it?),
>> but I really like the cache benefit, unless sync doesn't means current
>> is going to sleep every time, but that's the promise of WF_SYNC, isn't it?
> 
> It would be nice if it _were_ a promise, but it is not, it's a hint.

Bad to know :(

Should we fix it, or is this by design? The comment next to WF_SYNC
misled me...

Regards,
Michael Wang

> 
>>> You may lose
>>> a lot on other stuff if you interpret the hint as gospel truth.
>>
>> Could you please give more details on this point?
> 
> tbench, mysql+oltp, on and on use the sync hint, many things jabber on
> localhost, use the sync hint, and have been shown in cold hard numbers
> to benefit, some things massively from cross core scheduling.  You lose
> for sure at extreme context rates, but it has to be pretty darn high to
> be a guaranteed loser.
> 
> That's why select_idle_sibling() is so very damn annoying.
>>> IMHO, sched pipe is a "how fat have I become" benchmark, not "how well
>>> do I perform".  The scheduler performs well when it makes more work
>>> happen.  Playing ping-pong with yourself is _exercise_, not a job :)
>>
>> That's right, may be I'm using the wrong description, it's the ops/sec
>> which has been doubled, that means 'fat', correct?
> 
> In this case, it means you're not running a kernel with nohz on a chain,
> running two schedulers is more expensive than running one, and missing
> L2 each and every time hurts very badly when the load is ultra skinny.
> 
> -Mike
> 



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  8:14       ` Michael Wang
@ 2013-02-28  8:24         ` Mike Galbraith
  2013-02-28  8:49           ` Michael Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2013-02-28  8:24 UTC (permalink / raw)
  To: Michael Wang
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On Thu, 2013-02-28 at 16:14 +0800, Michael Wang wrote: 
> On 02/28/2013 04:04 PM, Mike Galbraith wrote:

> > It would be nice if it _were_ a promise, but it is not, it's a hint.
> 
> Bad to know :(
> 
> Should we fix it or this is by designed? The comments after WF_SYNC
> cheated me...

You can't fix it, because it's not busted.  You can say "Ok guys, I'm
off for a nap RSN" all you want, but that won't guarantee that nobody
pokes you, and hands you something more useful to do than snoozing.

-Mike  



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  8:24         ` Mike Galbraith
@ 2013-02-28  8:49           ` Michael Wang
  2013-02-28  9:18             ` Mike Galbraith
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Wang @ 2013-02-28  8:49 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On 02/28/2013 04:24 PM, Mike Galbraith wrote:
> On Thu, 2013-02-28 at 16:14 +0800, Michael Wang wrote: 
>> On 02/28/2013 04:04 PM, Mike Galbraith wrote:
> 
>>> It would be nice if it _were_ a promise, but it is not, it's a hint.
>>
>> Bad to know :(
>>
>> Should we fix it or this is by designed? The comments after WF_SYNC
>> cheated me...
> 
> You can't fix it, because it's not busted.  You can say "Ok guys, I'm
> off for a nap RSN" all you want, but that won't guarantee that nobody
> pokes you, and hands you something more useful to do than snoozing.

So sync still means current is going to sleep, and your concern is that
this promise is easily broken by another waker, correct?

Hmm.. maybe you are right; if 'perf bench sched pipe' is not the one we
should care about, I have no reason to add this logic for now.

I will remove this extra branch unless I find another benchmark that
benefits a lot from it.

Apart from this, what do you think of the idea?

Regards,
Michael Wang

> 
> -Mike  
> 
> 



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  8:49           ` Michael Wang
@ 2013-02-28  9:18             ` Mike Galbraith
  2013-03-01  2:18               ` Michael Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2013-02-28  9:18 UTC (permalink / raw)
  To: Michael Wang
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On Thu, 2013-02-28 at 16:49 +0800, Michael Wang wrote: 
> On 02/28/2013 04:24 PM, Mike Galbraith wrote:
> > On Thu, 2013-02-28 at 16:14 +0800, Michael Wang wrote: 
> >> On 02/28/2013 04:04 PM, Mike Galbraith wrote:
> > 
> >>> It would be nice if it _were_ a promise, but it is not, it's a hint.
> >>
> >> Bad to know :(
> >>
> >> Should we fix it or this is by designed? The comments after WF_SYNC
> >> cheated me...
> > 
> > You can't fix it, because it's not busted.  You can say "Ok guys, I'm
> > off for a nap RSN" all you want, but that won't guarantee that nobody
> > pokes you, and hands you something more useful to do than snoozing.
> 
> So sync still means current is going to sleep, what you concerned is
> this promise will be easily broken by other waker, correct?

That makes it a lie, and it can already have been one with no help.
Just because you wake one sync does not mean you're not going to find
another to wake.  Smart tasks are taught to look before they leap.

> Hmm.. may be you are right, if 'perf bench sched pipe' is not the one we
> should care, I have no reason to add this logical currently.

Well, there is reason to identify task relationships methinks, you just
can't rely on the fact that you're alone on the rq at the moment, and
doing a sync wakeup to bind tasks.  They _will_ lie to you :)

> I will remove this plus branch, unless I found other benchmark could
> benefit a lot from it.
> 
> Besides this, how do you think about this idea?

I like the idea of filtering true buddy pairs, and automagically
detecting the point when 1:N wants spreading rather a lot (fwtw).  I'll
look closer at your method, but when it comes to implementation
opinions, the only one I trust comes out of a box in front of me.

I'm somewhat.. "taste challenged", Peter and Ingo have some though :)

-Mike


* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  6:38 [RFC PATCH] sched: wakeup buddy Michael Wang
  2013-02-28  7:18 ` Mike Galbraith
@ 2013-02-28  9:25 ` Namhyung Kim
  2013-02-28 10:06   ` Mike Galbraith
  2013-03-01  2:18   ` Michael Wang
  1 sibling, 2 replies; 16+ messages in thread
From: Namhyung Kim @ 2013-02-28  9:25 UTC (permalink / raw)
  To: Michael Wang
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Mike Galbraith, Paul Turner,
	Alex Shi, Andrew Morton, Ram Pai, Nikunj A. Dadhania

Hi Michael,

On Thu, 28 Feb 2013 14:38:03 +0800, Michael Wang wrote:
> wake_affine() stuff is trying to bind related tasks closely, but it doesn't
> work well according to the test on 'perf bench sched pipe' (thanks to Peter).
>
> Besides, pgbench show that blindly using wake_affine() will eat a lot of
> performance.
>
> Thus, we need a new solution, it should detect the tasks related to each
> other, bind them closely, take care the balance, latency and performance
> at the same time.
>
> Feature wakeup buddy seems like a good solution (thanks to Mike for the hint).
>
> The feature introduced waker, wakee pointer and their ref count, along with
> the new knob sysctl_sched_wakeup_buddy_ref.
>
> Now in select_task_rq_fair(), when current (task B) try to wakeup p (task A),
> if match:
>
> 	1. A->waker == B && A->wakee == B
> 	2. A->waker_ref > sysctl_sched_wakeup_buddy_ref
> 	3. A->wakee_ref > sysctl_sched_wakeup_buddy_ref
>
> then A is the wakeup buddy of B, which means A and B is likely to utilize
> the memory of each other.
>
> Thus, if B is also the wakeup buddy of A, which means no other task has
> destroyed their relationship, then A is likely to benefit from the cached
> data of B, make them running closely is likely to gain benefit.

Not sure if it should require a bidirectional relationship.  Looks like
it is just for benchmarks.  Isn't there a one-way relationship that could
benefit from this?  I don't know ;-)

A few nitpicks below..

>
> This patch add the feature wakeup buddy, reorganized the logical of
> wake_affine() stuff with the new feature, by doing these, pgbench and
> 'perf bench sched pipe' perform better.
>
> Highlight:
> 	Default value of sysctl_sched_wakeup_buddy_ref is 8 temporarily,
> 	please let me know if some number perform better on your system,
> 	I'd like to make it bigger to make the decision more carefully,
> 	so we could provide the solution when it is really needed.
>
> 	Comments are very welcomed.
>
> Test:
> 	Test with a 12 cpu X86 server and tip 3.8.0-rc7.
>
> 	'perf bench sched pipe' show nearly double improvement.
>
> 	pgbench result:
> 					prev	post
>
>                 | db_size | clients |  tps  |   |  tps  |
>                 +---------+---------+-------+   +-------+
>                 | 22 MB   |       1 | 10794 |   | 10820 |
>                 | 22 MB   |       2 | 21567 |   | 21915 |
>                 | 22 MB   |       4 | 41621 |   | 42766 |
>                 | 22 MB   |       8 | 53883 |   | 60511 |       +12.30%
>                 | 22 MB   |      12 | 50818 |   | 57129 |       +12.42%
>                 | 22 MB   |      16 | 50463 |   | 59345 |       +17.60%
>                 | 22 MB   |      24 | 46698 |   | 63787 |       +36.59%
>                 | 22 MB   |      32 | 43404 |   | 62643 |       +44.33%
>
>                 | 7484 MB |       1 |  7974 |   |  8014 |
>                 | 7484 MB |       2 | 19341 |   | 19534 |
>                 | 7484 MB |       4 | 36808 |   | 38092 |
>                 | 7484 MB |       8 | 47821 |   | 51968 |       +8.67%
>                 | 7484 MB |      12 | 45913 |   | 52284 |       +13.88%
>                 | 7484 MB |      16 | 46478 |   | 54418 |       +17.08%
>                 | 7484 MB |      24 | 42793 |   | 56375 |       +31.74%
>                 | 7484 MB |      32 | 36329 |   | 55783 |       +53.55%
>                 
>                 | 15 GB   |       1 |  7636 |   |  7880 |       
>                 | 15 GB   |       2 | 19195 |   | 19477 |
>                 | 15 GB   |       4 | 35975 |   | 37962 |
>                 | 15 GB   |       8 | 47919 |   | 51558 |       +7.59%
>                 | 15 GB   |      12 | 45397 |   | 51163 |       +12.70%
>                 | 15 GB   |      16 | 45926 |   | 53912 |       +17.39%
>                 | 15 GB   |      24 | 42184 |   | 55343 |       +31.19%
>                 | 15 GB   |      32 | 35983 |   | 55358 |       +53.84%
>
> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> ---
[SNIP]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 81fa536..d5acfd8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3173,6 +3173,75 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>  }
>  
>  /*
> + * Reduce sysctl_sched_wakeup_buddy_ref will reduce the preparation time
> + * to active the wakeup buddy feature, and make it agile, however, this
> + * will increase the risk of misidentify.
> + *
> + * Check wakeup_buddy() for the usage.
> + */
> +unsigned int sysctl_sched_wakeup_buddy_ref = 8UL;

It seems that just 8U (or even 8) is enough.

> +
> +/*
> + * wakeup_buddy() help to check whether p1 is the wakeup buddy of p2.
> + *
> + * Return 1 for yes, 0 for no.
> +*/
> +static inline int wakeup_buddy(struct task_struct *p1, struct task_struct *p2)
> +{
> +	if (p1->waker != p2 || p1->wakee != p2)
> +		return 0;
> +
> +	if (p1->waker_ref < sysctl_sched_wakeup_buddy_ref)
> +		return 0;
> +
> +	if (p1->wakee_ref < sysctl_sched_wakeup_buddy_ref)
> +		return 0;
> +
> +	return 1;
> +}
[SNIP]
> @@ -3399,6 +3490,8 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  unlock:
>  	rcu_read_unlock();
>  
> +	wakeup_ref(p);
> +

Why did you call it here?  Shouldn't it be somewhere in the ttwu path?


>  	return new_cpu;
>  }
>  
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c88878d..6845d24 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -424,6 +424,16 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= &one,
>  	},
>  #endif
> +#ifdef CONFIG_SMP
> +	{
> +		.procname	= "sched_wakeup_buddy_ref",
> +		.data		= &sysctl_sched_wakeup_buddy_ref,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &one,
> +	},
> +#endif
>  #ifdef CONFIG_PROVE_LOCKING
>  	{
>  		.procname	= "prove_locking",


* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  9:25 ` Namhyung Kim
@ 2013-02-28 10:06   ` Mike Galbraith
  2013-02-28 15:31     ` Namhyung Kim
  2013-03-01  2:18   ` Michael Wang
  1 sibling, 1 reply; 16+ messages in thread
From: Mike Galbraith @ 2013-02-28 10:06 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Michael Wang, LKML, Ingo Molnar, Peter Zijlstra, Paul Turner,
	Alex Shi, Andrew Morton, Ram Pai, Nikunj A. Dadhania

On Thu, 2013-02-28 at 18:25 +0900, Namhyung Kim wrote:

> Not sure if it should require bidirectional relationship.  Looks like
> just for benchmarks.  Isn't there a one-way relationship that could get
> a benefit from this?  I don't know ;-)

??  Meaningful relationships are bare minimum bidirectional, how can you
describe one connection and have it remain meaningful?  I love "her" is
unlikely to lead to anything meaningful if "she" doesn't know you exist.

-Mike



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28 10:06   ` Mike Galbraith
@ 2013-02-28 15:31     ` Namhyung Kim
  2013-03-01  2:30       ` Michael Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2013-02-28 15:31 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Michael Wang, LKML, Ingo Molnar, Peter Zijlstra, Paul Turner,
	Alex Shi, Andrew Morton, Ram Pai, Nikunj A. Dadhania

2013-02-28 (Thu), 11:06 +0100, Mike Galbraith:
> On Thu, 2013-02-28 at 18:25 +0900, Namhyung Kim wrote:
> 
> > Not sure if it should require bidirectional relationship.  Looks like
> > just for benchmarks.  Isn't there a one-way relationship that could get
> > a benefit from this?  I don't know ;-)
> 
> ??  Meaningful relationships are bare minimum bidirectional, how can you
> describe one connection and have it remain meaningful?  I love "her" is
> unlikely to lead to anything meaningful if "she" doesn't know you exist.

Maybe I misunderstood something.  I was thinking about typical
cooperation models like manager-worker, producer-consumer or pipeline
and thought that they are usually one-way relationships in terms of the
wakeup.

Thanks,
Namhyung




* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  9:18             ` Mike Galbraith
@ 2013-03-01  2:18               ` Michael Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Wang @ 2013-03-01  2:18 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Paul Turner, Alex Shi,
	Andrew Morton, Ram Pai, Nikunj A. Dadhania, Namhyung Kim

On 02/28/2013 05:18 PM, Mike Galbraith wrote:
> On Thu, 2013-02-28 at 16:49 +0800, Michael Wang wrote: 
>> On 02/28/2013 04:24 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-28 at 16:14 +0800, Michael Wang wrote: 
>>>> On 02/28/2013 04:04 PM, Mike Galbraith wrote:
>>>
>>>>> It would be nice if it _were_ a promise, but it is not, it's a hint.
>>>>
>>>> Bad to know :(
>>>>
>>>> Should we fix it or this is by designed? The comments after WF_SYNC
>>>> cheated me...
>>>
>>> You can't fix it, because it's not busted.  You can say "Ok guys, I'm
>>> off for a nap RSN" all you want, but that won't guarantee that nobody
>>> pokes you, and hands you something more useful to do than snoozing.
>>
>> So sync still means current is going to sleep, what you concerned is
>> this promise will be easily broken by other waker, correct?
> 
> That makes it a lie, and it can already have been one with no help.
> Just because you wake one sync does not mean you're not going to find
> another to wake.  Smart tasks are taught to look before they leap.
> 
>> Hmm.. may be you are right, if 'perf bench sched pipe' is not the one we
>> should care, I have no reason to add this logical currently.
> 
> Well, there is reason to identify task relationships methinks, you just
> can't rely on the fact that you're alone on the rq at the moment, and
> doing a sync wakeup to bind tasks.  They _will_ lie to you :)

I see.

> 
>> I will remove this plus branch, unless I found other benchmark could
>> benefit a lot from it.
>>
>> Besides this, how do you think about this idea?
> 
> I like the idea of filtering true buddy pairs, and automagically
> detecting the point when 1:N wants spreading rather a lot (fwtw).  I'll
> look closer at your method, but when it comes to implementation
> opinions, the only one I trust comes out of a box in front of me.

And please let me know how it works on your box ;-)

Regards,
Michael Wang

> 
> I'm somewhat.. "taste challenged", Peter and Ingo have some though :)
> 
> -Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28  9:25 ` Namhyung Kim
  2013-02-28 10:06   ` Mike Galbraith
@ 2013-03-01  2:18   ` Michael Wang
  1 sibling, 0 replies; 16+ messages in thread
From: Michael Wang @ 2013-03-01  2:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Mike Galbraith, Paul Turner,
	Alex Shi, Andrew Morton, Ram Pai, Nikunj A. Dadhania

Hi, Namhyung

Thanks for your reply.

On 02/28/2013 05:25 PM, Namhyung Kim wrote:
[snip]
>> Thus, if B is also the wakeup buddy of A, which means no other task has
>> destroyed their relationship, then A is likely to benefit from the cached
>> data of B, make them running closely is likely to gain benefit.
> 
> Not sure if it should require bidirectional relationship.  Looks like
> just for benchmarks.  Isn't there a one-way relationship that could get
> a benefit from this?  I don't know ;-)

That's a fair point :)

Actually, I tried the one-way case at the very beginning, and the
performance was not good.

I think the cause is that if A loses interest in B and goes off with C,
then keeping A and B close won't gain much benefit, since the cached
data of A is now likely to benefit C rather than B.
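
A minimal user-space sketch of that reasoning, assuming a cut-down
struct in place of task_struct (struct task, buddy_ref and
mutual_buddies() are hypothetical names used only for illustration;
wakeup_buddy() mirrors the check in the patch):

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields the patch adds to task_struct. */
struct task {
	struct task *waker;	/* task recorded as this task's waker */
	struct task *wakee;	/* task recorded as this task's wakee */
	unsigned int waker_ref;	/* ref count kept for waker */
	unsigned int wakee_ref;	/* ref count kept for wakee */
};

/* Stand-in for sysctl_sched_wakeup_buddy_ref (default 8 in the patch). */
unsigned int buddy_ref = 8;

/* Same tests as wakeup_buddy() in the patch: is p1 the buddy of p2? */
int wakeup_buddy(const struct task *p1, const struct task *p2)
{
	if (p1->waker != p2 || p1->wakee != p2)
		return 0;
	if (p1->waker_ref < buddy_ref || p1->wakee_ref < buddy_ref)
		return 0;
	return 1;
}

/* The bidirectional requirement: the check must hold in both directions. */
int mutual_buddies(const struct task *a, const struct task *b)
{
	return wakeup_buddy(a, b) && wakeup_buddy(b, a);
}
```

If A starts waking C instead, A's wakee no longer points at B, so
wakeup_buddy(A, B) fails and the pair is not treated as buddies even
though B's own records still point at A.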

> 
> Few nitpicks below..
> 
>>
>> This patch add the feature wakeup buddy, reorganized the logical of
>> wake_affine() stuff with the new feature, by doing these, pgbench and
>> 'perf bench sched pipe' perform better.
>>
>> Highlight:
>> 	Default value of sysctl_sched_wakeup_buddy_ref is 8 temporarily,
>> 	please let me know if some number perform better on your system,
>> 	I'd like to make it bigger to make the decision more carefully,
>> 	so we could provide the solution when it is really needed.
>>
>> 	Comments are very welcomed.
>>
>> Test:
>> 	Test with a 12 cpu X86 server and tip 3.8.0-rc7.
>>
>> 	'perf bench sched pipe' show nearly double improvement.
>>
>> 	pgbench result:
>> 					prev	post
>>
>>                 | db_size | clients |  tps  |   |  tps  |
>>                 +---------+---------+-------+   +-------+
>>                 | 22 MB   |       1 | 10794 |   | 10820 |
>>                 | 22 MB   |       2 | 21567 |   | 21915 |
>>                 | 22 MB   |       4 | 41621 |   | 42766 |
>>                 | 22 MB   |       8 | 53883 |   | 60511 |       +12.30%
>>                 | 22 MB   |      12 | 50818 |   | 57129 |       +12.42%
>>                 | 22 MB   |      16 | 50463 |   | 59345 |       +17.60%
>>                 | 22 MB   |      24 | 46698 |   | 63787 |       +36.59%
>>                 | 22 MB   |      32 | 43404 |   | 62643 |       +44.33%
>>
>>                 | 7484 MB |       1 |  7974 |   |  8014 |
>>                 | 7484 MB |       2 | 19341 |   | 19534 |
>>                 | 7484 MB |       4 | 36808 |   | 38092 |
>>                 | 7484 MB |       8 | 47821 |   | 51968 |       +8.67%
>>                 | 7484 MB |      12 | 45913 |   | 52284 |       +13.88%
>>                 | 7484 MB |      16 | 46478 |   | 54418 |       +17.08%
>>                 | 7484 MB |      24 | 42793 |   | 56375 |       +31.74%
>>                 | 7484 MB |      32 | 36329 |   | 55783 |       +53.55%
>>                 
>>                 | 15 GB   |       1 |  7636 |   |  7880 |       
>>                 | 15 GB   |       2 | 19195 |   | 19477 |
>>                 | 15 GB   |       4 | 35975 |   | 37962 |
>>                 | 15 GB   |       8 | 47919 |   | 51558 |       +7.59%
>>                 | 15 GB   |      12 | 45397 |   | 51163 |       +12.70%
>>                 | 15 GB   |      16 | 45926 |   | 53912 |       +17.39%
>>                 | 15 GB   |      24 | 42184 |   | 55343 |       +31.19%
>>                 | 15 GB   |      32 | 35983 |   | 55358 |       +53.84%
>>
>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>> ---
> [SNIP]
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 81fa536..d5acfd8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3173,6 +3173,75 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
>>  }
>>  
>>  /*
>> + * Reduce sysctl_sched_wakeup_buddy_ref will reduce the preparation time
>> + * to active the wakeup buddy feature, and make it agile, however, this
>> + * will increase the risk of misidentify.
>> + *
>> + * Check wakeup_buddy() for the usage.
>> + */
>> +unsigned int sysctl_sched_wakeup_buddy_ref = 8UL;
> 
> It seems that just 8U (or even 8) is enough.

I will correct it.

> 
>> +
>> +/*
>> + * wakeup_buddy() help to check whether p1 is the wakeup buddy of p2.
>> + *
>> + * Return 1 for yes, 0 for no.
>> +*/
>> +static inline int wakeup_buddy(struct task_struct *p1, struct task_struct *p2)
>> +{
>> +	if (p1->waker != p2 || p1->wakee != p2)
>> +		return 0;
>> +
>> +	if (p1->waker_ref < sysctl_sched_wakeup_buddy_ref)
>> +		return 0;
>> +
>> +	if (p1->wakee_ref < sysctl_sched_wakeup_buddy_ref)
>> +		return 0;
>> +
>> +	return 1;
>> +}
> [SNIP]
>> @@ -3399,6 +3490,8 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>  unlock:
>>  	rcu_read_unlock();
>>  
>> +	wakeup_ref(p);
>> +
> 
> Why did you call it here?  Shouldn't it be somewhere in the ttwu path?

I like to keep the changes close together, just another 'bad' habit ;-)

But you have reminded me that I should add a check on the WAKEUP flag;
I will correct it.
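
One possible shape for that check, as a user-space sketch only.  It
assumes the flag meant here is the wakeup bit of sd_flag (SD_BALANCE_WAKE
in the kernel); the BALANCE_* values, wakeup_ref_stub() and
select_rq_tail() below are hypothetical, and the real wakeup_ref() body
is not reproduced:

```c
/* Hypothetical flag values; the kernel defines its own SD_BALANCE_* bits. */
#define BALANCE_EXEC	0x1
#define BALANCE_FORK	0x2
#define BALANCE_WAKE	0x4

/* Counts how often the bookkeeping ran, so the guard can be observed. */
unsigned int ref_updates;

/* Placeholder for wakeup_ref(); only records that it was called. */
void wakeup_ref_stub(void)
{
	ref_updates++;
}

/* Tail of a select_task_rq-style function with the guard added. */
int select_rq_tail(int sd_flag, int new_cpu)
{
	/* Update the waker/wakee records only on a real wakeup. */
	if (sd_flag & BALANCE_WAKE)
		wakeup_ref_stub();

	return new_cpu;
}
```

With such a guard, calls made for fork or exec balancing would leave
the ref counts untouched.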

Regards,
Michael Wang

> 
> 
>>  	return new_cpu;
>>  }
>>  
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index c88878d..6845d24 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -424,6 +424,16 @@ static struct ctl_table kern_table[] = {
>>  		.extra1		= &one,
>>  	},
>>  #endif
>> +#ifdef CONFIG_SMP
>> +	{
>> +		.procname	= "sched_wakeup_buddy_ref",
>> +		.data		= &sysctl_sched_wakeup_buddy_ref,
>> +		.maxlen		= sizeof(unsigned int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec_minmax,
>> +		.extra1		= &one,
>> +	},
>> +#endif
>>  #ifdef CONFIG_PROVE_LOCKING
>>  	{
>>  		.procname	= "prove_locking",



* Re: [RFC PATCH] sched: wakeup buddy
  2013-02-28 15:31     ` Namhyung Kim
@ 2013-03-01  2:30       ` Michael Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Wang @ 2013-03-01  2:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Mike Galbraith, LKML, Ingo Molnar, Peter Zijlstra, Paul Turner,
	Alex Shi, Andrew Morton, Ram Pai, Nikunj A. Dadhania

On 02/28/2013 11:31 PM, Namhyung Kim wrote:
> 2013-02-28 (Thu), 11:06 +0100, Mike Galbraith:
>> On Thu, 2013-02-28 at 18:25 +0900, Namhyung Kim wrote:
>>
>>> Not sure if it should require bidirectional relationship.  Looks like
>>> just for benchmarks.  Isn't there a one-way relationship that could get
>>> a benefit from this?  I don't know ;-)
>>
>> ??  Meaningful relationships are bare minimum bidirectional, how can you
>> describe one connection and have it remain meaningful?  I love "her" is
>> unlikely to lead to anything meaningful if "she" doesn't know you exist.
> 
> Maybe I misunderstood something.  I was thinking about typical
> cooperation models like manager-worker, producer-consumer or pipeline
> and thought that they are usually one-way relationships in terms of the
> wakeup.

I agree with Mike's point here: relaxing the restriction usually
benefits one model but damages more.

The whole wake_affine() logic is somewhat blind: we imagine that the
cache will benefit the wakee but cannot estimate by how much, and the
formula contains too many elements.  I'd prefer to gamble only when I'm
likely to win; that will win less money, but lose less too ;-)

Regards,
Michael Wang

> 
> Thanks,
> Namhyung
> 
> 



end of thread, other threads:[~2013-03-01  2:31 UTC | newest]

Thread overview: 16+ messages
2013-02-28  6:38 [RFC PATCH] sched: wakeup buddy Michael Wang
2013-02-28  7:18 ` Mike Galbraith
2013-02-28  7:40   ` Michael Wang
2013-02-28  7:42     ` Michael Wang
2013-02-28  8:06       ` Mike Galbraith
2013-02-28  8:04     ` Mike Galbraith
2013-02-28  8:14       ` Michael Wang
2013-02-28  8:24         ` Mike Galbraith
2013-02-28  8:49           ` Michael Wang
2013-02-28  9:18             ` Mike Galbraith
2013-03-01  2:18               ` Michael Wang
2013-02-28  9:25 ` Namhyung Kim
2013-02-28 10:06   ` Mike Galbraith
2013-02-28 15:31     ` Namhyung Kim
2013-03-01  2:30       ` Michael Wang
2013-03-01  2:18   ` Michael Wang
