[PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"

* [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
@ 2017-04-20 12:44 ville.syrjala
  2017-04-20 13:07 ` Daniel Lezcano
  2017-04-20 13:08 ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: ville.syrjala @ 2017-04-20 12:44 UTC
  To: linux-pm
  Cc: linux-kernel, stable, Daniel Lezcano, Peter Zijlstra,
	Rafael J . Wysocki, Ville Syrjälä

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.

The TSC stops in deeper C states, so using local_clock() in cpuidle
to track the C state residency seems like a bad idea. With local_clock()
powertop is reporting mostly 0% residency for C states here. Presumably
the core is still spending most of its time in some deep C-state since
the totals typically add up to only 5% or so, so perhaps the governor
isn't getting totally confused by these bogus numbers. But let's go
back to using ktime_get() as that at least works correctly across the
board.

Note that the code has changed somewhat since the regression happened,
so this isn't a 1:1 revert of the offending commit.

Cc: stable@vger.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/cpuidle/cpuidle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 548b90be7685..24a52805527f 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -213,13 +213,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	sched_idle_set_state(target_state);

 	trace_cpu_idle_rcuidle(index, dev->cpu);
-	time_start = ns_to_ktime(local_clock());
+	time_start = ktime_get();

 	stop_critical_timings();
 	entered_state = target_state->enter(dev, drv, index);
 	start_critical_timings();

-	time_end = ns_to_ktime(local_clock());
+	time_end = ktime_get();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);

 	/* The cpu is no longer idle or about to enter idle. */
-- 
2.10.2


* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 12:44 [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()" ville.syrjala
@ 2017-04-20 13:07 ` Daniel Lezcano
  2017-04-20 13:08 ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Lezcano @ 2017-04-20 13:07 UTC
  To: ville.syrjala
  Cc: linux-pm, linux-kernel, stable, Peter Zijlstra,
	Rafael J . Wysocki

On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> 
> The TSC stops in deeper C states, so using local_clock() in cpuidle
> to track the C state residency seems like a bad idea. With local_clock()
> powertop is reporting mostly 0% residency for C states here. Presumably
> the core is still spending most of its time in some deep C-state since
> the totals typically add up to only 5% or so, so perhaps the governor
> isn't getting totally confused by these bogus numbers. But let's go
> back to using ktime_get() as that at least works correctly across the
> board.

The local clock is faster, more accurate and more stable. We saw ktime_get()
can be expensive, especially on slower CPUs.

Why not add a flag to the idle state indicating that the local clocksource
stops, and use ktime_get() only in that case?

This flag can be set on the idle state at init time in intel_idle.c around:

	...
        if (((mwait_cstate + 1) > 2) &&
              !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
               mark_tsc_unstable("TSC halts in idle"
               " states deeper than C2");
	...

and in processor_idle.c around:

	...
	tsc_check_state(cx->type);
	...

So we keep using local_clock() in most of the cases, for most of the boards.
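
A minimal user-space sketch of the flag idea above, not kernel code. The flag name IDLE_FLAG_LOCAL_CLOCK_STOPS, the struct, and both helpers are hypothetical names made up for illustration; only the "deeper than C2 without a nonstop TSC" condition is taken from the intel_idle.c snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag (not an existing cpuidle flag): the local clock
 * stops counting while the CPU sits in this idle state. */
#define IDLE_FLAG_LOCAL_CLOCK_STOPS (1u << 0)

/* Hypothetical, simplified stand-in for an idle state descriptor. */
struct idle_state_model {
	const char *name;
	unsigned int flags;
};

enum clock_choice { USE_LOCAL_CLOCK, USE_KTIME_GET };

/* Init-time marking: mirrors the quoted condition, where cstate is the
 * C-state number (mwait_cstate + 1 in the quoted snippet). */
void mark_state(struct idle_state_model *s, unsigned int cstate,
		bool nonstop_tsc)
{
	if (cstate > 2 && !nonstop_tsc)
		s->flags |= IDLE_FLAG_LOCAL_CLOCK_STOPS;
}

/* Pick the timestamp source used to measure residency in a state. */
enum clock_choice pick_residency_clock(const struct idle_state_model *s)
{
	if (s->flags & IDLE_FLAG_LOCAL_CLOCK_STOPS)
		return USE_KTIME_GET;	/* slower, but keeps counting */
	return USE_LOCAL_CLOCK;		/* cheap per-CPU clock */
}
```

With this shape, only the states that actually stop the clock would pay for ktime_get(); every other state keeps the cheaper local clock.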


 
> Note that the code has changed somewhat since the regression happened,
> so this isn't a 1:1 revert of the offending commit.
> 
> Cc: stable@vger.kernel.org
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/cpuidle/cpuidle.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..24a52805527f 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -213,13 +213,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	sched_idle_set_state(target_state);
>  
>  	trace_cpu_idle_rcuidle(index, dev->cpu);
> -	time_start = ns_to_ktime(local_clock());
> +	time_start = ktime_get();
>  
>  	stop_critical_timings();
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> -	time_end = ns_to_ktime(local_clock());
> +	time_end = ktime_get();
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
>  	/* The cpu is no longer idle or about to enter idle. */
> -- 
> 2.10.2
> 

-- 

 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 12:44 [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()" ville.syrjala
  2017-04-20 13:07 ` Daniel Lezcano
@ 2017-04-20 13:08 ` Peter Zijlstra
  2017-04-20 13:43   ` Ville Syrjälä
  2017-04-20 14:37   ` Daniel Lezcano
  1 sibling, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-04-20 13:08 UTC
  To: ville.syrjala
  Cc: linux-pm, linux-kernel, stable, Daniel Lezcano,
	Rafael J . Wysocki

On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> 
> The TSC stops in deeper C states, 

On some old hardware (Core2 era and before) only. You've forgotten to
mention what hardware you've observed problems with.

> so using local_clock() in cpuidle

But on said hardware, local_clock() isn't an immediate TSC user.

> to track the C state residency seems like a bad idea. With local_clock()
> powertop is reporting mostly 0% residency for C states here. Presumably
> the core is still spending most of its time in some deep C-state since
> the totals typically add up to only 5% or so, so perhaps the governor
> isn't getting totally confused by these bogus numbers. But let's go
> back to using ktime_get() as that at least works correctly across the
> board.

Does this cure it?

---
 drivers/cpuidle/cpuidle.c | 2 ++
 kernel/sched/clock.c      | 7 +++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 548b90be7685..e0d4ad108887 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	entered_state = target_state->enter(dev, drv, index);
 	start_critical_timings();
 
+	sched_clock_idle_wakeup_event(0);
+
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 00a45c45beca..15e848706be4 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -347,6 +347,9 @@ void sched_clock_tick(void)
 {
 	struct sched_clock_data *scd;
 
+	if (timekeeping_suspended)
+		return;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
 	/*
@@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
  */
 void sched_clock_idle_wakeup_event(u64 delta_ns)
 {
-	if (timekeeping_suspended)
-		return;
-
 	sched_clock_tick();
-	touch_softlockup_watchdog_sched();
 }
 EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
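
One possible reading of why the extra sched_clock_idle_wakeup_event(0) call helps, as a user-space model rather than the kernel implementation. It assumes that on unstable-TSC hardware the local clock is roughly "last GTOD snapshot plus raw-clock delta since that snapshot"; the real sched_clock_local() also clamps the result, which is left out here. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-CPU snapshot (models tick_raw/tick_gtod). */
struct scd_model {
	uint64_t tick_raw;	/* raw clock (e.g. TSC in ns) at last tick */
	uint64_t tick_gtod;	/* GTOD time in ns at last tick */
};

/* Models the effect of sched_clock_tick(): resnapshot both clocks. */
void tick_model(struct scd_model *scd, uint64_t raw_now, uint64_t gtod_now)
{
	scd->tick_raw = raw_now;
	scd->tick_gtod = gtod_now;
}

/* Simplified unstable-path local clock: GTOD snapshot plus raw delta. */
uint64_t local_clock_model(const struct scd_model *scd, uint64_t raw_now)
{
	return scd->tick_gtod + (raw_now - scd->tick_raw);
}
```

If the raw clock barely advances during a long idle period and no snapshot is taken on wakeup, the difference between two local_clock_model() readings stays tiny, which would match the near-0% residency reported. Resnapshotting on wakeup pulls the reading forward to GTOD time.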
 


* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 13:08 ` Peter Zijlstra
@ 2017-04-20 13:43   ` Ville Syrjälä
  2017-04-20 13:49     ` Peter Zijlstra
  2017-04-20 14:37   ` Daniel Lezcano
  1 sibling, 1 reply; 8+ messages in thread
From: Ville Syrjälä @ 2017-04-20 13:43 UTC
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, stable, Daniel Lezcano,
	Rafael J . Wysocki

On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > 
> > The TSC stops in deeper C states, 
> 
> On some old hardware (Core2 era and before) only. You've forgotten to
> mention what hardware you've observed problems with.

Yeah, Core2 is what I used when I finally decided to bisect this. I've
been plagued by the bogus powertop numbers on many machines, most
likely all of them were of some older vintage.

> 
> > so using local_clock() in cpuidle
> 
> But on said hardware, local_clock() isn't an immediate TSC user.
> 
> > to track the C state residency seems like a bad idea. With local_clock()
> > powertop is reporting mostly 0% residency for C states here. Presumably
> > the core is still spending most of its time in some deep C-state since
> > the totals typically add up to only 5% or so, so perhaps the governor
> > isn't getting totally confused by these bogus numbers. But let's go
> > back to using ktime_get() as that at least works correctly across the
> > board.
> 
> Does this cure it?

It does indeed.

Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> 
> ---
>  drivers/cpuidle/cpuidle.c | 2 ++
>  kernel/sched/clock.c      | 7 +++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..e0d4ad108887 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> +	sched_clock_idle_wakeup_event(0);
> +
>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 00a45c45beca..15e848706be4 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -347,6 +347,9 @@ void sched_clock_tick(void)
>  {
>  	struct sched_clock_data *scd;
>  
> +	if (timekeeping_suspended)
> +		return;
> +
>  	WARN_ON_ONCE(!irqs_disabled());
>  
>  	/*
> @@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
>   */
>  void sched_clock_idle_wakeup_event(u64 delta_ns)
>  {
> -	if (timekeeping_suspended)
> -		return;
> -
>  	sched_clock_tick();
> -	touch_softlockup_watchdog_sched();
>  }
>  EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>  

-- 
Ville Syrjälä
Intel OTC


* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 13:43   ` Ville Syrjälä
@ 2017-04-20 13:49     ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-04-20 13:49 UTC
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, stable, Daniel Lezcano,
	Rafael J . Wysocki

On Thu, Apr 20, 2017 at 04:43:45PM +0300, Ville Syrjälä wrote:
> On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > > 
> > > The TSC stops in deeper C states, 
> > 
> > On some old hardware (Core2 era and before) only. You've forgotten to
> > mention what hardware you've observed problems with.
> 
> Yeah, Core2 is what I used when I finally decided to bisect this. I've
> been plagued by the bogus powertop numbers on many machines, most
> likely all of them were of some older vintage.

> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

OK, thanks. I'm currently chasing some other Core2 issue that is
somewhat related. See:

  http://lkml.kernel.org/r/20170413132349.thxkwptdymsfsyxb@hirez.programming.kicks-ass.net

Once I have that sorted I'll post both patches.


* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 13:08 ` Peter Zijlstra
  2017-04-20 13:43   ` Ville Syrjälä
@ 2017-04-20 14:37   ` Daniel Lezcano
  2017-04-20 14:41     ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Lezcano @ 2017-04-20 14:37 UTC
  To: Peter Zijlstra
  Cc: ville.syrjala, linux-pm, linux-kernel, stable, Rafael J . Wysocki

On Thu, Apr 20, 2017 at 03:08:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:44:47PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > This reverts commit e93e59ce5b85e6c2b444f09fd1f707274ec066dc.
> > 
> > The TSC stops in deeper C states, 
> 
> On some old hardware (Core2 era and before) only. You've forgotten to
> mention what hardware you've observed problems with.
> 
> > so using local_clock() in cpuidle
> 
> But on said hardware, local_clock() isn't an immediate TSC user.
> 
> > to track the C state residency seems like a bad idea. With local_clock()
> > powertop is reporting mostly 0% residency for C states here. Presumably
> > the core is still spending most of its time in some deep C-state since
> > the totals typically add up to only 5% or so, so perhaps the governor
> > isn't getting totally confused by these bogus numbers. But let's go
> > back to using ktime_get() as that at least works correctly across the
> > board.
> 
> Does this cure it?
> 
> ---
>  drivers/cpuidle/cpuidle.c | 2 ++
>  kernel/sched/clock.c      | 7 +++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 548b90be7685..e0d4ad108887 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -219,6 +219,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	entered_state = target_state->enter(dev, drv, index);
>  	start_critical_timings();
>  
> +	sched_clock_idle_wakeup_event(0);
> +

Is it planned to skip this if the tsc is reliable?

>  	time_end = ns_to_ktime(local_clock());
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
>  
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 00a45c45beca..15e848706be4 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -347,6 +347,9 @@ void sched_clock_tick(void)
>  {
>  	struct sched_clock_data *scd;
>  
> +	if (timekeeping_suspended)
> +		return;
> +
>  	WARN_ON_ONCE(!irqs_disabled());
>  
>  	/*
> @@ -378,11 +381,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
>   */
>  void sched_clock_idle_wakeup_event(u64 delta_ns)
>  {
> -	if (timekeeping_suspended)
> -		return;
> -
>  	sched_clock_tick();
> -	touch_softlockup_watchdog_sched();
>  }
>  EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
>  



* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 14:37   ` Daniel Lezcano
@ 2017-04-20 14:41     ` Peter Zijlstra
  2017-04-20 14:47       ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2017-04-20 14:41 UTC
  To: Daniel Lezcano
  Cc: ville.syrjala, linux-pm, linux-kernel, stable, Rafael J . Wysocki

On Thu, Apr 20, 2017 at 04:37:51PM +0200, Daniel Lezcano wrote:

> >  
> > +	sched_clock_idle_wakeup_event(0);
> > +
> 
> Is it planned to skip this if the tsc is reliable?

Yes. Current code doesn't quite do that, but if you follow that link I
sent earlier, I'm about to fix that (again).

Now, if only this Core2 piece of crap would actually boot a recent
kernel, I could go figure out wth is wrong.. :/


* Re: [PATCH] Revert "cpuidle: Replace ktime_get() with local_clock()"
  2017-04-20 14:41     ` Peter Zijlstra
@ 2017-04-20 14:47       ` Peter Zijlstra
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-04-20 14:47 UTC
  To: Daniel Lezcano
  Cc: ville.syrjala, linux-pm, linux-kernel, stable, Rafael J . Wysocki

On Thu, Apr 20, 2017 at 04:41:46PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 04:37:51PM +0200, Daniel Lezcano wrote:
> 
> > >  
> > > +	sched_clock_idle_wakeup_event(0);
> > > +
> > 
> > Is it planned to skip this if the tsc is reliable?
> 
> Yes. Current code doesn't quite do that, but if you follow that link I
> sent earlier, I'm about to fix that (again).

Bugger, just noticed that never went to the list... see below.

Was reported to not work sufficiently well, now fighting my Core2 box
that hasn't been booted/updated in ages.

---
Subject: sched/clock,x86/tsc: Improve clock continuity for stable->unstable transition
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Apr 13 14:56:44 CEST 2017

Marta reported that commit:

  7b09cc5a9deb ("sched/clock: Fix broken stable to unstable transfer")

appeared to have broken things on a Core2Duo machine. While that patch
is in fact correct, it exposes a problem with commit:

  5680d8094ffa ("sched/clock: Provide better clock continuity")

Where we hoped that the TSC would not make big jumps after SMP bringup. Of
course, the TSC proves us wrong: Core2 comes up with a semi-stable TSC and
only goes funny once we probe the idle drivers, because Core2 stops the
TSC on idle.

Now we could of course delay the final switch to stable longer, but it
would be better to entirely remove the assumption that TSC doesn't
make big jumps and improve things all-round.

So instead we have the clocksource watchdog call a special function
when it finds the TSC is still good (there's a race, it could've
gotten bad between us determining it's still good and calling our
function, do we care?).

This function then updates the __gtod_offset using sane values, which
is the value needed for clock continuity when being marked unstable.

Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Martin Peres <martin.peres@linux.intel.com>
Reported-by: "Lofstedt, Marta" <marta.lofstedt@intel.com>
Fixes: 5680d8094ffa ("sched/clock: Provide better clock continuity")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/tsc.c       |   12 ++++++++++
 include/linux/clocksource.h |    1 
 include/linux/sched/clock.h |    2 -
 kernel/sched/clock.c        |   50 ++++++++++++++++++++++++--------------------
 kernel/time/clocksource.c   |    3 ++
 5 files changed, 45 insertions(+), 23 deletions(-)

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -374,6 +374,8 @@ static int __init tsc_setup(char *str)
 		tsc_clocksource_reliable = 1;
 	if (!strncmp(str, "noirqtime", 9))
 		no_sched_irq_time = 1;
+	if (!strcmp(str, "unstable"))
+		mark_tsc_unstable("boot parameter");
 	return 1;
 }
 
@@ -1127,6 +1129,15 @@ static void tsc_cs_mark_unstable(struct
 	pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
 
+static void tsc_cs_tick_stable(struct clocksource *cs)
+{
+	if (tsc_unstable)
+		return;
+
+	if (using_native_sched_clock())
+		sched_clock_tick_stable();
+}
+
 /*
  * .mask MUST be CLOCKSOURCE_MASK(64). See comment above read_tsc()
  */
@@ -1140,6 +1151,7 @@ static struct clocksource clocksource_ts
 	.archdata               = { .vclock_mode = VCLOCK_TSC },
 	.resume			= tsc_resume,
 	.mark_unstable		= tsc_cs_mark_unstable,
+	.tick_stable		= tsc_cs_tick_stable,
 };
 
 void mark_tsc_unstable(char *reason)
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -96,6 +96,7 @@ struct clocksource {
 	void (*suspend)(struct clocksource *cs);
 	void (*resume)(struct clocksource *cs);
 	void (*mark_unstable)(struct clocksource *cs);
+	void (*tick_stable)(struct clocksource *cs);
 
 	/* private: */
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -63,8 +63,8 @@ extern void clear_sched_clock_stable(voi
  */
 extern u64 __sched_clock_offset;
 
-
 extern void sched_clock_tick(void);
+extern void sched_clock_tick_stable(void);
 extern void sched_clock_idle_sleep_event(void);
 extern void sched_clock_idle_wakeup_event(u64 delta_ns);
 
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -152,25 +152,15 @@ static void __clear_sched_clock_stable(v
 {
 	struct sched_clock_data *scd = this_scd();
 
-	/*
-	 * Attempt to make the stable->unstable transition continuous.
-	 *
-	 * Trouble is, this is typically called from the TSC watchdog
-	 * timer, which is late per definition. This means the tick
-	 * values can already be screwy.
-	 *
-	 * Still do what we can.
-	 */
-	__gtod_offset = (scd->tick_raw + __sched_clock_offset) - (scd->tick_gtod);
+	if (!sched_clock_stable())
+		return;
 
 	printk(KERN_INFO "sched_clock: Marking unstable (%lld, %lld)<-(%lld, %lld)\n",
 			scd->tick_gtod, __gtod_offset,
 			scd->tick_raw,  __sched_clock_offset);
 
 	tick_dep_set(TICK_DEP_BIT_CLOCK_UNSTABLE);
-
-	if (sched_clock_stable())
-		schedule_work(&sched_clock_work);
+	schedule_work(&sched_clock_work);
 }
 
 void clear_sched_clock_stable(void)
@@ -347,21 +337,37 @@ void sched_clock_tick(void)
 {
 	struct sched_clock_data *scd;
 
+	if (sched_clock_stable())
+		return;
+
+	if (unlikely(!sched_clock_running))
+		return;
+
 	WARN_ON_ONCE(!irqs_disabled());
 
-	/*
-	 * Update these values even if sched_clock_stable(), because it can
-	 * become unstable at any point in time at which point we need some
-	 * values to fall back on.
-	 *
-	 * XXX arguably we can skip this if we expose tsc_clocksource_reliable
-	 */
 	scd = this_scd();
 	scd->tick_raw  = sched_clock();
 	scd->tick_gtod = ktime_get_ns();
+	sched_clock_local(scd);
+}
 
-	if (!sched_clock_stable() && likely(sched_clock_running))
-		sched_clock_local(scd);
+void sched_clock_tick_stable(void)
+{
+	u64 gtod, clock;
+
+	if (!sched_clock_stable())
+		return;
+
+	/*
+	 * Called under watchdog_lock.
+	 *
+	 * The watchdog just found this TSC to (still) be stable, so now is a
+	 * good moment to update our __gtod_offset. Because once we find the
+	 * TSC to be unstable, any computation will be computing crap.
+	 */
+	gtod = ktime_get_ns();
+	clock = sched_clock();
+	__gtod_offset = (clock + __sched_clock_offset) - gtod;
 }
 
 /*
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -233,6 +233,9 @@ static void clocksource_watchdog(unsigne
 			continue;
 		}
 
+		if (cs == curr_clocksource && cs->tick_stable)
+			cs->tick_stable(cs);
+
 		if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) &&
 		    (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS) &&
 		    (watchdog->flags & CLOCK_SOURCE_IS_CONTINUOUS)) {
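
A small user-space model of the offset update done in sched_clock_tick_stable() above, as a sketch rather than the kernel code. It assumes the stable clock is the raw clock plus __sched_clock_offset and that the unstable clock is, in simplified form, GTOD plus __gtod_offset. The struct and function names are hypothetical; only the offset formula is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two offsets named in the patch. */
struct clock_model {
	uint64_t sched_clock_offset;	/* models __sched_clock_offset */
	uint64_t gtod_offset;		/* models __gtod_offset */
};

/* Stable path: raw clock plus its offset. */
uint64_t stable_clock(const struct clock_model *m, uint64_t raw)
{
	return raw + m->sched_clock_offset;
}

/* Unstable path, simplified: GTOD plus its offset (no delta, no clamp). */
uint64_t unstable_clock(const struct clock_model *m, uint64_t gtod)
{
	return gtod + m->gtod_offset;
}

/* Same formula as the patch; unsigned wraparound keeps it correct even
 * when gtod is larger than raw + sched_clock_offset. */
void tick_stable_model(struct clock_model *m, uint64_t raw, uint64_t gtod)
{
	m->gtod_offset = (raw + m->sched_clock_offset) - gtod;
}
```

Taking the snapshot while the TSC is still known to be good means both paths agree at that instant, so a later switch to the unstable path continues from a sane value instead of one computed from an already-broken TSC reading.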
