* clock drift in set_task_cpu()
@ 2010-07-21 11:40 Jack Daniel
2010-07-22 5:34 ` Jack Daniel
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Jack Daniel @ 2010-07-21 11:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar; +Cc: LKML
Hi Peter/Ingo,
I have a query with the kernel code that was changed not too long time
back in v2.6.33-rc1 commit id 5afcdab706d6002cb02b567ba46e650215e694e8
[tip:sched/urgent] sched: Remove rq->clock coupling from
set_task_cpu()
void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
{
	int old_cpu = task_cpu(p);
	struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
	u64 clock_offset;

	clock_offset = old_rq->clock - new_rq->clock;
---
On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
sometimes larger than old_rq->clock, so clock_offset wraps around,
leading to incorrect values. You correctly noted in the commit header
that all functions that call set_task_cpu() must do so after a call
to sched_clock_remote(); in this case the caller is sched_fork(). I
validated this by adding update_rq_clock(old_rq); to set_task_cpu(),
and that seems to fix the issue. But I noticed that since
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
(sched_clock_stable) in sched_clock_cpu() evaluates to true and the
flow never reaches sched_clock_remote() or sched_clock_local().
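
To make the wrap-around concrete, here is a minimal userspace
illustration (my own toy example, not kernel code; the constants are
invented):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* old_rq->clock lagging behind new_rq->clock */
	uint64_t old_clock = 1000, new_clock = 1500;
	uint64_t clock_offset = old_clock - new_clock;

	/* unsigned subtraction wraps: prints 18446744073709551116 */
	printf("%llu\n", (unsigned long long)clock_offset);
	return 0;
}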
What do you think is the best way to approach the problem *assuming
the older kernel*, since I believe the problem still exists? That is,
how to reinstate your axiom "... which should ensure the observed time
between these two cpus is monotonic":
1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled, since it is
set by default for x86.
2) Does one create a new function with just this pseudo code?
fix_clock_drift()
{
	if (cpu != smp_processor_id())
		clock = sched_clock_remote(scd);
	else
		clock = sched_clock_local(scd);

	return clock;
}
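
For reference, the dispatch I am referring to in sched_clock_cpu()
looks roughly like this -- a simplified sketch of the v2.6.33-era
kernel/sched_clock.c, not a verbatim copy; note how the
sched_clock_stable early return bypasses the remote/local path:

u64 sched_clock_cpu(int cpu)
{
	struct sched_clock_data *scd;
	u64 clock;

	if (sched_clock_stable)
		return sched_clock();	/* early return: TSC deemed stable */

	if (unlikely(!sched_clock_running))
		return 0ull;

	scd = cpu_sdc(cpu);

	/* the remote/local split a fix_clock_drift() helper would reuse */
	if (cpu != smp_processor_id())
		clock = sched_clock_remote(scd);
	else
		clock = sched_clock_local(scd);

	return clock;
}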
Thanks and regards,
Jack
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: clock drift in set_task_cpu()
  2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
@ 2010-07-22  5:34 ` Jack Daniel
  2010-07-30 11:55 ` Jack Daniel
  2010-08-05  9:58 ` Peter Zijlstra
  2 siblings, 0 replies; 9+ messages in thread
From: Jack Daniel @ 2010-07-22 5:34 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar

Greetings,

I would be much obliged if anyone can answer my query below. Any
guidance or advice is much appreciated. I believe the problem still
exists in newer kernel versions.

Thanks and regards,
Jack

On Wed, Jul 21, 2010 at 5:10 PM, Jack Daniel <wanders.thirst@gmail.com> wrote:
> Hi Peter/Ingo,
>
> I have a query about kernel code that was changed not long ago, in
> v2.6.33-rc1, commit 5afcdab706d6002cb02b567ba46e650215e694e8
> [tip:sched/urgent] sched: Remove rq->clock coupling from
> set_task_cpu()
>
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> {
>         int old_cpu = task_cpu(p);
>         struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
>         struct cfs_rq *old_cfsrq = task_cfs_rq(p),
>                       *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
>         u64 clock_offset;
>
>         clock_offset = old_rq->clock - new_rq->clock;
> ---
>
> On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
> sometimes larger than old_rq->clock, so clock_offset wraps around,
> leading to incorrect values. You correctly noted in the commit header
> that all functions that call set_task_cpu() must do so after a call
> to sched_clock_remote(); in this case the caller is sched_fork(). I
> validated this by adding update_rq_clock(old_rq); to set_task_cpu(),
> and that seems to fix the issue. But I noticed that since
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
> (sched_clock_stable) in sched_clock_cpu() evaluates to true and the
> flow never reaches sched_clock_remote() or sched_clock_local().
>
> What do you think is the best way to approach the problem *assuming
> the older kernel*, since I believe the problem still exists? That is,
> how to reinstate your axiom "... which should ensure the observed time
> between these two cpus is monotonic":
>
> 1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled, since it is
> set by default for x86.
> 2) Does one create a new function with just this pseudo code?
> fix_clock_drift()
> {
>         if (cpu != smp_processor_id())
>                 clock = sched_clock_remote(scd);
>         else
>                 clock = sched_clock_local(scd);
>
>         return clock;
> }
>
> Thanks and regards,
> Jack

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: clock drift in set_task_cpu()
  2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
  2010-07-22  5:34 ` Jack Daniel
@ 2010-07-30 11:55 ` Jack Daniel
  2010-08-05  9:58 ` Peter Zijlstra
  2 siblings, 0 replies; 9+ messages in thread
From: Jack Daniel @ 2010-07-30 11:55 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: LKML

Hi Peter,

As a follow up on this...

On Wed, Jul 21, 2010 at 5:10 PM, Jack Daniel <wanders.thirst@gmail.com> wrote:
> Hi Peter/Ingo,
>
> I have a query about kernel code that was changed not long ago, in
> v2.6.33-rc1, commit 5afcdab706d6002cb02b567ba46e650215e694e8
> [tip:sched/urgent] sched: Remove rq->clock coupling from
> set_task_cpu()
>
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> {
>         int old_cpu = task_cpu(p);
>         struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu);
>         struct cfs_rq *old_cfsrq = task_cfs_rq(p),
>                       *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
>         u64 clock_offset;
>
>         clock_offset = old_rq->clock - new_rq->clock;
> ---
>
> On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
> sometimes larger than old_rq->clock, so clock_offset wraps around,
> leading to incorrect values. You correctly noted in the commit header
> that all functions that call set_task_cpu() must do so after a call
> to sched_clock_remote(); in this case the caller is sched_fork(). I
> validated this by adding update_rq_clock(old_rq); to set_task_cpu(),
> and that seems to fix the issue. But I noticed that since
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if
> (sched_clock_stable) in sched_clock_cpu() evaluates to true and the
> flow never reaches sched_clock_remote() or sched_clock_local().
>
> What do you think is the best way to approach the problem *assuming
> the older kernel*, since I believe the problem still exists? That is,
> how to reinstate your axiom "... which should ensure the observed time
> between these two cpus is monotonic":
>
> 1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled, since it is
> set by default for x86.
> 2) Does one create a new function with just this pseudo code?
> fix_clock_drift()
> {
>         if (cpu != smp_processor_id())
>                 clock = sched_clock_remote(scd);
>         else
>                 clock = sched_clock_local(scd);
>
>         return clock;
> }

I bet you had come across this problem yourself and hence chose to
surgically remove the offending code in commit 5afcdab. I now think
that was a good choice, but the right thing would have been to correct
the problem itself. I think this code should have solved the problem:

diff --git a/kernel/sched.c b/kernel/sched.c
index 1d39b00..5fd63f2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2068,6 +2068,13 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
 		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
 	u64 clock_offset;
+	unsigned long flags;
+
+	rmb();
+	local_irq_save(flags);
+	update_rq_clock(old_rq);
+	update_rq_clock(new_rq);
+	local_irq_restore(flags);

Thanks and regards,
Jack

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: clock drift in set_task_cpu()
  2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
  2010-07-22  5:34 ` Jack Daniel
  2010-07-30 11:55 ` Jack Daniel
@ 2010-08-05  9:58 ` Peter Zijlstra
  2010-08-09 13:17   ` Jack Daniel
  2010-08-20 14:16   ` [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2010-08-05 9:58 UTC (permalink / raw)
  To: Jack Daniel; +Cc: Ingo Molnar, LKML, Mike Galbraith

On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
> On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
> sometimes larger than old_rq->clock, so clock_offset wraps around,
> leading to incorrect values.

What values get incorrect, do you observe vruntime funnies or only the
schedstat values?

> You correctly noted in the commit header that all functions that call
> set_task_cpu() must do so after a call to sched_clock_remote(); in
> this case the caller is sched_fork(). I validated this by adding
> update_rq_clock(old_rq); to set_task_cpu(), and that seems to fix
> the issue.

Ah, so the problem is that task_fork_fair() does the task placement
without updated rq clocks? In which case I think we should at least do
an update_rq_clock(rq) in there (see the below patch).

> But I noticed that since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already
> set, if (sched_clock_stable) in sched_clock_cpu() evaluates to true
> and the flow never reaches sched_clock_remote() or sched_clock_local().

sched_clock_stable being true implies the clock is stable across cores
and thus it shouldn't matter. Or are you saying you're seeing it being
set and still have issues?

> I bet you had come across this problem yourself and hence chose to
> surgically remove the offending code in commit 5afcdab. I now think
> that was a good choice, but the right thing would have been to correct
> the problem itself. I think this code should have solved the problem:
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 1d39b00..5fd63f2 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2068,6 +2068,13 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  	struct cfs_rq *old_cfsrq = task_cfs_rq(p),
>  		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
>  	u64 clock_offset;
> +	unsigned long flags;
> +
> +	rmb();
> +	local_irq_save(flags);
> +	update_rq_clock(old_rq);
> +	update_rq_clock(new_rq);
> +	local_irq_restore(flags);

The problem here is that your patch introduces the exact race I was
trying to close. We can only access rq->clock when also holding the
appropriate rq->lock; disabling IRQs, while also required, is not
sufficient.

[ Also, that rmb() looks just plain wrong ]

Anyway, does the below cure your trouble? (Also, could you describe
your actually observed trouble in more detail?)

---
Subject: sched: ensure rq->clock gets synced when migrating tasks

sched_fork() -- we do task placement in ->task_fork_fair(), ensure we
update_rq_clock() so we work with current time. We leave the vruntime
in relative state, so the time delay until wake_up_new_task() doesn't
matter.

wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
relative state we can safely migrate; the activate_task() on the
remote rq will call update_rq_clock() and causes the clock to be
synced (enough).

try_to_wake_up() -- In case we'll migrate, we need to update the old
rq->clock; the activate_task() in ttwu_activate() will already update
the new rq->clock, and thus the clocks will get synced.

load-balance -- Migrating running tasks always happens with both rq's
locked, either through double_rq_lock() or double_lock_balance(). So
sync the rq->clock there.

The problem seems to be that this all could result in too many
rq->clock updates, which are expensive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c      |   20 ++++++++++++++++++--
 kernel/sched_fair.c |    2 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index d3c0262..69584b4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1756,13 +1756,22 @@ static int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
  */
 static int double_lock_balance(struct rq *this_rq, struct rq *busiest)
 {
+	int ret;
+
+#ifdef CONFIG_SCHED_DEBUG
 	if (unlikely(!irqs_disabled())) {
 		/* printk() doesn't work good under rq->lock */
 		raw_spin_unlock(&this_rq->lock);
 		BUG_ON(1);
 	}
+#endif
+
+	ret = _double_lock_balance(this_rq, busiest);
+
+	update_rq_clock(this_rq);
+	update_rq_clock(busiest);

-	return _double_lock_balance(this_rq, busiest);
+	return ret;
 }

 static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
@@ -1782,7 +1791,9 @@ static void double_rq_lock(struct rq *rq1, struct rq *rq2)
 	__acquires(rq1->lock)
 	__acquires(rq2->lock)
 {
+#ifdef CONFIG_SCHED_DEBUG
 	BUG_ON(!irqs_disabled());
+#endif
 	if (rq1 == rq2) {
 		raw_spin_lock(&rq1->lock);
 		__acquire(rq2->lock);	/* Fake it out ;) */
@@ -1795,6 +1806,9 @@ static void double_rq_lock(struct rq *rq1, struct rq *rq2)
 			raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
 		}
 	}
+
+	update_rq_clock(rq1);
+	update_rq_clock(rq2);
 }

 /*
@@ -2395,8 +2409,10 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	}

 	cpu = select_task_rq(rq, p, SD_BALANCE_WAKE, wake_flags);
-	if (cpu != orig_cpu)
+	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
+		update_rq_clock(rq);
+	}
 	__task_rq_unlock(rq);

 	rq = cpu_rq(cpu);

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..f816e74 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)

 	raw_spin_lock_irqsave(&rq->lock, flags);

+	update_rq_clock(rq);
+
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: clock drift in set_task_cpu()
  2010-08-05  9:58 ` Peter Zijlstra
@ 2010-08-09 13:17   ` Jack Daniel
  2010-08-09 14:56     ` Philby John
  1 sibling, 1 reply; 9+ messages in thread
From: Jack Daniel @ 2010-08-09 13:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML, Mike Galbraith

On Thu, Aug 5, 2010 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
>> On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
>> sometimes larger than old_rq->clock, so clock_offset wraps around,
>> leading to incorrect values.
>
> What values get incorrect, do you observe vruntime funnies or only the
> schedstat values?

Just the schedstat values; did not observe anything wrong with vruntime.

>> You correctly noted in the commit header that all functions that call
>> set_task_cpu() must do so after a call to sched_clock_remote(); in
>> this case the caller is sched_fork(). I validated this by adding
>> update_rq_clock(old_rq); to set_task_cpu(), and that seems to fix
>> the issue.
>
> Ah, so the problem is that task_fork_fair() does the task placement
> without updated rq clocks? In which case I think we should at least do
> an update_rq_clock(rq) in there (see the below patch).

Yes, this is indeed the problem, and your patch seems to fix the issue.

>> But I noticed that since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already
>> set, if (sched_clock_stable) in sched_clock_cpu() evaluates to true
>> and the flow never reaches sched_clock_remote() or sched_clock_local().
>
> sched_clock_stable being true implies the clock is stable across cores
> and thus it shouldn't matter. Or are you saying you're seeing it being
> set and still have issues?

Please ignore these comments; initial debugging set me on the wrong
foot, suggesting that the TSC was unstable.

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..f816e74 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
>
>  	raw_spin_lock_irqsave(&rq->lock, flags);
>
> +	update_rq_clock(rq);

As you rightly pointed out above, updating the clock in
task_fork_fair() fixes the issue. You can drop the rest of the
update_rq_clock() calls, since they are (as you said) expensive; I
tested with them commented out.

Thanks,
Jack

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: clock drift in set_task_cpu()
  2010-08-09 13:17 ` Jack Daniel
@ 2010-08-09 14:56   ` Philby John
  2010-09-06  6:34     ` Jack Daniel
  0 siblings, 1 reply; 9+ messages in thread
From: Philby John @ 2010-08-09 14:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Jack Daniel, Ingo Molnar, LKML, Mike Galbraith

On Mon, 2010-08-09 at 18:47 +0530, Jack Daniel wrote:
> On Thu, Aug 5, 2010 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
> >> On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
> >> sometimes larger than old_rq->clock, so clock_offset wraps around,
> >> leading to incorrect values.
> >
> > What values get incorrect, do you observe vruntime funnies or only the
> > schedstat values?
>
> Just the schedstat values; did not observe anything wrong with vruntime.
>
> >> You correctly noted in the commit header that all functions that call
> >> set_task_cpu() must do so after a call to sched_clock_remote(); in
> >> this case the caller is sched_fork(). I validated this by adding
> >> update_rq_clock(old_rq); to set_task_cpu(), and that seems to fix
> >> the issue.
> >
> > Ah, so the problem is that task_fork_fair() does the task placement
> > without updated rq clocks? In which case I think we should at least do
> > an update_rq_clock(rq) in there (see the below patch).
>
> Yes, this is indeed the problem, and your patch seems to fix the issue.
>
> >> But I noticed that since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already
> >> set, if (sched_clock_stable) in sched_clock_cpu() evaluates to true
> >> and the flow never reaches sched_clock_remote() or sched_clock_local().
> >
> > sched_clock_stable being true implies the clock is stable across cores
> > and thus it shouldn't matter. Or are you saying you're seeing it being
> > set and still have issues?
>
> Please ignore these comments; initial debugging set me on the wrong
> foot, suggesting that the TSC was unstable.
>
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 9910e1b..f816e74 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
> >
> >  	raw_spin_lock_irqsave(&rq->lock, flags);
> >
> > +	update_rq_clock(rq);
>
> As you rightly pointed out above, updating the clock in
> task_fork_fair() fixes the issue. You can drop the rest of the
> update_rq_clock() calls, since they are (as you said) expensive; I
> tested with them commented out.

From 1bc695bc2ac6c941724953b29f6c18196a474b8f Mon Sep 17 00:00:00 2001
From: Philby John <pjohn@mvista.com>
Date: Mon, 9 Aug 2010 18:19:08 +0530
Subject: [PATCH] sched: ensure rq->clock gets synced when migrating tasks

In sched_fork(), when we do task placement in ->task_fork_fair(),
ensure we update_rq_clock() so we work with current time. This has
been noted and verified on an Intel Greencity (Xeon 55xx).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Philby John <pjohn@mvista.com>
---
 kernel/sched_fair.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 806d1b2..48bc31c 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3751,7 +3751,7 @@ static void task_fork_fair(struct task_struct *p)
 	unsigned long flags;

 	raw_spin_lock_irqsave(&rq->lock, flags);
-
+	update_rq_clock(rq);
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);
--
1.6.3.3.333.g4d53f

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: clock drift in set_task_cpu()
  2010-08-09 14:56 ` Philby John
@ 2010-09-06  6:34   ` Jack Daniel
  2010-09-06  6:52     ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Jack Daniel @ 2010-09-06 6:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, LKML, Mike Galbraith, pjohn

Hi Ingo,

On Mon, Aug 9, 2010 at 8:26 PM, Philby John <pjohn@mvista.com> wrote:
> On Mon, 2010-08-09 at 18:47 +0530, Jack Daniel wrote:
>> On Thu, Aug 5, 2010 at 3:28 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, 2010-07-21 at 17:10 +0530, Jack Daniel wrote:
>> >> On a Xeon 55xx with 8 CPUs, I found that the new_rq->clock value is
>> >> sometimes larger than old_rq->clock, so clock_offset wraps around,
>> >> leading to incorrect values.
>> >
>> > What values get incorrect, do you observe vruntime funnies or only the
>> > schedstat values?
>>
>> Just the schedstat values; did not observe anything wrong with vruntime.
>>
>> >> You correctly noted in the commit header that all functions that call
>> >> set_task_cpu() must do so after a call to sched_clock_remote(); in
>> >> this case the caller is sched_fork(). I validated this by adding
>> >> update_rq_clock(old_rq); to set_task_cpu(), and that seems to fix
>> >> the issue.
>> >
>> > Ah, so the problem is that task_fork_fair() does the task placement
>> > without updated rq clocks? In which case I think we should at least do
>> > an update_rq_clock(rq) in there (see the below patch).
>>
>> Yes, this is indeed the problem, and your patch seems to fix the issue.
>>
>> >> But I noticed that since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already
>> >> set, if (sched_clock_stable) in sched_clock_cpu() evaluates to true
>> >> and the flow never reaches sched_clock_remote() or sched_clock_local().
>> >
>> > sched_clock_stable being true implies the clock is stable across cores
>> > and thus it shouldn't matter. Or are you saying you're seeing it being
>> > set and still have issues?
>>
>> Please ignore these comments; initial debugging set me on the wrong
>> foot, suggesting that the TSC was unstable.
>>
>> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> > index 9910e1b..f816e74 100644
>> > --- a/kernel/sched_fair.c
>> > +++ b/kernel/sched_fair.c
>> > @@ -3751,6 +3751,8 @@ static void task_fork_fair(struct task_struct *p)
>> >
>> >  	raw_spin_lock_irqsave(&rq->lock, flags);
>> >
>> > +	update_rq_clock(rq);
>>
>> As you rightly pointed out above, updating the clock in
>> task_fork_fair() fixes the issue. You can drop the rest of the
>> update_rq_clock() calls, since they are (as you said) expensive; I
>> tested with them commented out.
>
> From 1bc695bc2ac6c941724953b29f6c18196a474b8f Mon Sep 17 00:00:00 2001
> From: Philby John <pjohn@mvista.com>
> Date: Mon, 9 Aug 2010 18:19:08 +0530
> Subject: [PATCH] sched: ensure rq->clock gets synced when migrating tasks
>
> In sched_fork(), when we do task placement in ->task_fork_fair(),
> ensure we update_rq_clock() so we work with current time. This has
> been noted and verified on an Intel Greencity (Xeon 55xx).
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Philby John <pjohn@mvista.com>
> ---
>  kernel/sched_fair.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 806d1b2..48bc31c 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3751,7 +3751,7 @@ static void task_fork_fair(struct task_struct *p)
>  	unsigned long flags;
>
>  	raw_spin_lock_irqsave(&rq->lock, flags);
> -
> +	update_rq_clock(rq);
>  	if (unlikely(task_cpu(p) != this_cpu))
>  		__set_task_cpu(p, this_cpu);

Any chance you will be pulling in this fix?

Regards,
Jack

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: clock drift in set_task_cpu()
  2010-09-06  6:34 ` Jack Daniel
@ 2010-09-06  6:52   ` Ingo Molnar
  0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2010-09-06 6:52 UTC (permalink / raw)
  To: Jack Daniel; +Cc: Peter Zijlstra, LKML, Mike Galbraith, pjohn

* Jack Daniel <wanders.thirst@gmail.com> wrote:

> Hi Ingo,
>
> [...]
>
> Any chance you will be pulling in this fix?

It's already upstream, as of v2.6.36-rc3. See the commit notification
email below from Aug 19.

Thanks,

	Ingo

----- Forwarded message from tip-bot for Peter Zijlstra <peterz@infradead.org> -----

Date: Fri, 20 Aug 2010 14:16:14 GMT
From: tip-bot for Peter Zijlstra <peterz@infradead.org>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@redhat.com,
    a.p.zijlstra@chello.nl, peterz@infradead.org, pjohn@mvista.com,
    wanders.thirst@gmail.com, tglx@linutronix.de, mingo@elte.hu
Subject: [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks

Commit-ID:  861d034ee814917a83bd5de4b26e3b8336ddeeb8
Gitweb:     http://git.kernel.org/tip/861d034ee814917a83bd5de4b26e3b8336ddeeb8
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 19 Aug 2010 13:31:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 14:59:01 +0200

sched: Fix rq->clock synchronization when migrating tasks

sched_fork() -- we do task placement in ->task_fork_fair(), ensure we
update_rq_clock() so we work with current time. We leave the vruntime
in relative state, so the time delay until wake_up_new_task() doesn't
matter.

wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
relative state we can safely migrate; the activate_task() on the
remote rq will call update_rq_clock() and causes the clock to be
synced (enough).

Tested-by: Jack Daniel <wanders.thirst@gmail.com>
Tested-by: Philby John <pjohn@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281002322.1923.1708.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 806d1b2..ab661eb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3752,6 +3752,8 @@ static void task_fork_fair(struct task_struct *p)

 	raw_spin_lock_irqsave(&rq->lock, flags);

+	update_rq_clock(rq);
+
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);

----- End forwarded message -----

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks
  2010-08-05  9:58 ` clock drift in set_task_cpu() Peter Zijlstra
  2010-08-09 13:17   ` Jack Daniel
@ 2010-08-20 14:16   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-08-20 14:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz, pjohn,
      wanders.thirst, tglx, mingo

Commit-ID:  861d034ee814917a83bd5de4b26e3b8336ddeeb8
Gitweb:     http://git.kernel.org/tip/861d034ee814917a83bd5de4b26e3b8336ddeeb8
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 19 Aug 2010 13:31:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 20 Aug 2010 14:59:01 +0200

sched: Fix rq->clock synchronization when migrating tasks

sched_fork() -- we do task placement in ->task_fork_fair(), ensure we
update_rq_clock() so we work with current time. We leave the vruntime
in relative state, so the time delay until wake_up_new_task() doesn't
matter.

wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
relative state we can safely migrate; the activate_task() on the
remote rq will call update_rq_clock() and causes the clock to be
synced (enough).

Tested-by: Jack Daniel <wanders.thirst@gmail.com>
Tested-by: Philby John <pjohn@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281002322.1923.1708.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 806d1b2..ab661eb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3752,6 +3752,8 @@ static void task_fork_fair(struct task_struct *p)

 	raw_spin_lock_irqsave(&rq->lock, flags);

+	update_rq_clock(rq);
+
 	if (unlikely(task_cpu(p) != this_cpu))
 		__set_task_cpu(p, this_cpu);

^ permalink raw reply related	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-09-06  6:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-21 11:40 clock drift in set_task_cpu() Jack Daniel
2010-07-22  5:34 ` Jack Daniel
2010-07-30 11:55 ` Jack Daniel
2010-08-05  9:58 ` Peter Zijlstra
2010-08-09 13:17   ` Jack Daniel
2010-08-09 14:56     ` Philby John
2010-09-06  6:34       ` Jack Daniel
2010-09-06  6:52         ` Ingo Molnar
2010-08-20 14:16   ` [tip:sched/urgent] sched: Fix rq->clock synchronization when migrating tasks tip-bot for Peter Zijlstra