linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
  2014-12-01  2:38 Cyril Bur
@ 2014-12-01  2:39 ` Cyril Bur
  0 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2014-12-01  2:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: mpe, akpm, Cyril Bur

On POWER8 virtualised kernels the VTB register can be read to have a view of
time that only increases while the guest is running. This will prevent guests
from seeing time jump if a guest is paused for significant amounts of time.

On POWER7 and below virtualised kernels stolen time is subtracted from
sched_clock as a best effort approximation. This will not eliminate spurious
warnings in the case of a suspended guest but may reduce the occurance in the
case of softlockups due to host over commit.

Bare metal kernels should avoid reading the VTB as KVM does not restore sane
values when not executing. sched_clock is returned in this case.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/kernel/time.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 7505599..839aeae 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
 	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
 }
 
+unsigned long long running_clock(void)
+{
+	/*
+	 * Don't read the VTB as a host since KVM does not switch in host timebase
+	 * into the VTB when it takes a guest off the CPU, reading the VTB would
+	 * result in reading 'last switched out' guest VTB.
+	 */
+
+	if (firmware_has_feature(FW_FEATURE_LPAR)) {
+		if (cpu_has_feature(CPU_FTR_ARCH_207S))
+			return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
+
+		/* This is a next best approximation without a VTB. */
+		return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
+	}
+
+	/*
+	 * On a host which doesn't do any virtualisation TB *should* equal VTB so
+	 * it makes no difference anyway.
+	 */
+
+	return sched_clock();
+}
+
 static int __init get_freq(char *name, int cells, unsigned long *val)
 {
 	struct device_node *cpu;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 0/2] Quieten softlockup detector on virtualised kernels
@ 2014-12-22  5:06 Cyril Bur
  2014-12-22  5:06 ` [PATCH 1/2] Add another clock for use with the soft lockup watchdog Cyril Bur
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Cyril Bur @ 2014-12-22  5:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mpe, drjones, dzickus, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, Cyril Bur

When the hypervisor pauses a virtualised kernel the kernel will observe a jump
in timebase, this can cause spurious messages from the softlockup detector.

Whilst these messages are harmless, they are accompanied with a stack trace
which causes undue concern and more problematically the stack trace in the
guest has nothing to do with the observed problem and can only be misleading.

Futhermore, on POWER8 this is completely avoidable with the introduction of
the Virtual Time Base (VTB) register.

Cyril Bur (2):
  Add another clock for use with the soft lockup watchdog.
  powerpc: add running_clock for powerpc to prevent spurious softlockup
    warnings

 arch/powerpc/kernel/time.c | 24 ++++++++++++++++++++++++
 include/linux/sched.h      |  1 +
 kernel/sched/clock.c       | 14 ++++++++++++++
 kernel/watchdog.c          |  2 +-
 4 files changed, 40 insertions(+), 1 deletion(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] Add another clock for use with the soft lockup watchdog.
  2014-12-22  5:06 [PATCH 0/2] Quieten softlockup detector on virtualised kernels Cyril Bur
@ 2014-12-22  5:06 ` Cyril Bur
  2015-01-05 22:09   ` Andrew Morton
  2014-12-22  5:06 ` [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings Cyril Bur
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2014-12-22  5:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mpe, drjones, dzickus, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, Cyril Bur

This permits the use of arch specific clocks for which virtualised kernels can
use their notion of 'running' time, not the elpased wall time which will
include host execution time.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 include/linux/sched.h |  1 +
 kernel/sched/clock.c  | 14 ++++++++++++++
 kernel/watchdog.c     |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef..e400162 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2145,6 +2145,7 @@ extern unsigned long long notrace sched_clock(void);
  */
 extern u64 cpu_clock(int cpu);
 extern u64 local_clock(void);
+extern u64 running_clock(void);
 extern u64 sched_clock_cpu(int cpu);
 
 
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index c27e4f8..c83af4f 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -74,6 +74,20 @@ unsigned long long __weak sched_clock(void)
 }
 EXPORT_SYMBOL_GPL(sched_clock);
 
+/*
+ * Running clock - returns the time that has elapsed while a guest has been
+ * running.
+ * On a guest this value should be sched_clock minus the time the
+ * guest was suspended by the hypervisor (for any reason).
+ * On bare metal this function should return the same as sched_clock.
+ * Architectures and sub-architectures can override this.
+ */
+unsigned long long __weak running_clock(void)
+{
+	return sched_clock();
+}
+EXPORT_SYMBOL_GPL(running_clock);
+
 __read_mostly int sched_clock_running;
 
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 70bf118..3174bf8 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -154,7 +154,7 @@ static int get_softlockup_thresh(void)
  */
 static unsigned long get_timestamp(void)
 {
-	return local_clock() >> 30LL;  /* 2^30 ~= 10^9 */
+	return running_clock() >> 30LL;  /* 2^30 ~= 10^9 */
 }
 
 static void set_sample_period(void)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
  2014-12-22  5:06 [PATCH 0/2] Quieten softlockup detector on virtualised kernels Cyril Bur
  2014-12-22  5:06 ` [PATCH 1/2] Add another clock for use with the soft lockup watchdog Cyril Bur
@ 2014-12-22  5:06 ` Cyril Bur
  2015-01-05 22:10   ` Andrew Morton
  2015-01-05 16:50 ` [PATCH 0/2] Quieten softlockup detector on virtualised kernels Don Zickus
  2015-01-05 22:09 ` Andrew Morton
  3 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2014-12-22  5:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mpe, drjones, dzickus, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, Cyril Bur

On POWER8 virtualised kernels the VTB register can be read to have a view of
time that only increases while the guest is running. This will prevent guests
from seeing time jump if a guest is paused for significant amounts of time.

On POWER7 and below virtualised kernels stolen time is subtracted from
sched_clock as a best effort approximation. This will not eliminate spurious
warnings in the case of a suspended guest but may reduce the occurance in the
case of softlockups due to host over commit.

Bare metal kernels should avoid reading the VTB as KVM does not restore sane
values when not executing. sched_clock is returned in this case.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/kernel/time.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index fa7c4f1..9ba13ec 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
 	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
 }
 
+unsigned long long running_clock(void)
+{
+	/*
+	 * Don't read the VTB as a host since KVM does not switch in host timebase
+	 * into the VTB when it takes a guest off the CPU, reading the VTB would
+	 * result in reading 'last switched out' guest VTB.
+	 */
+
+	if (firmware_has_feature(FW_FEATURE_LPAR)) {
+		if (cpu_has_feature(CPU_FTR_ARCH_207S))
+			return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
+
+		/* This is a next best approximation without a VTB. */
+		return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
+	}
+
+	/*
+	 * On a host which doesn't do any virtualisation TB *should* equal VTB so
+	 * it makes no difference anyway.
+	 */
+
+	return sched_clock();
+}
+
 static int __init get_freq(char *name, int cells, unsigned long *val)
 {
 	struct device_node *cpu;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2014-12-22  5:06 [PATCH 0/2] Quieten softlockup detector on virtualised kernels Cyril Bur
  2014-12-22  5:06 ` [PATCH 1/2] Add another clock for use with the soft lockup watchdog Cyril Bur
  2014-12-22  5:06 ` [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings Cyril Bur
@ 2015-01-05 16:50 ` Don Zickus
  2015-01-05 23:53   ` Cyril Bur
  2015-01-05 22:09 ` Andrew Morton
  3 siblings, 1 reply; 16+ messages in thread
From: Don Zickus @ 2015-01-05 16:50 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-kernel, mpe, drjones, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, mtosatti

cc'ing Marcelo

On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> in timebase, this can cause spurious messages from the softlockup detector.
> 
> Whilst these messages are harmless, they are accompanied with a stack trace
> which causes undue concern and more problematically the stack trace in the
> guest has nothing to do with the observed problem and can only be misleading.
> 
> Futhermore, on POWER8 this is completely avoidable with the introduction of
> the Virtual Time Base (VTB) register.

Hi Cyril,

Your solution seems simple and doesn't disturb the softlockup code as much
as the x86 solution does.  The only small issue I had was the use of
sched_clock instead of local_clock.  I keep forgetting the difference
(unstable clock is the biggest reason I think).

Other than that, I am not the biggest fan of putting multiple virtual
guest solutions for the same problem into the watchdog code.  I would
prefer a common solution/framework to leverage.

I have the x86 folks focusing on the steal_time stuff.  It started with
KVM and I believe VMWare is working on utilizing it too (and maybe Xen).

Not sure if that is useful or could be incoporated into the power8 code.
Though to be honest I am curious if the steal_time code could be ported to
your solution as it seems the watchdog code could remove all the
steal_time warts.

I have cc'd Marcelo into this discussion as he was the last person I
remember talking with about this problem.

Cheers,
Don

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2014-12-22  5:06 [PATCH 0/2] Quieten softlockup detector on virtualised kernels Cyril Bur
                   ` (2 preceding siblings ...)
  2015-01-05 16:50 ` [PATCH 0/2] Quieten softlockup detector on virtualised kernels Don Zickus
@ 2015-01-05 22:09 ` Andrew Morton
  2015-01-06  2:43   ` Cyril Bur
  3 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2015-01-05 22:09 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-kernel, mpe, drjones, dzickus, mingo, uobergfe, chaiw.fnst,
	cl, fabf, atomlin, benzh

On Mon, 22 Dec 2014 16:06:02 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:

> When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> in timebase, this can cause spurious messages from the softlockup detector.
> 
> Whilst these messages are harmless, they are accompanied with a stack trace
> which causes undue concern and more problematically the stack trace in the
> guest has nothing to do with the observed problem and can only be misleading.
> 
> Futhermore, on POWER8 this is completely avoidable with the introduction of
> the Virtual Time Base (VTB) register.

Does this problem apply to other KVM implementations and to Xen?  If
so, what would implementations of running_clock() for those look like? 
If not, why not?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] Add another clock for use with the soft lockup watchdog.
  2014-12-22  5:06 ` [PATCH 1/2] Add another clock for use with the soft lockup watchdog Cyril Bur
@ 2015-01-05 22:09   ` Andrew Morton
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2015-01-05 22:09 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-kernel, mpe, drjones, dzickus, mingo, uobergfe, chaiw.fnst,
	cl, fabf, atomlin, benzh

On Mon, 22 Dec 2014 16:06:03 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:

> This permits the use of arch specific clocks for which virtualised kernels can
> use their notion of 'running' time, not the elpased wall time which will
> include host execution time.
> 
> ...
>
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -74,6 +74,20 @@ unsigned long long __weak sched_clock(void)
>  }
>  EXPORT_SYMBOL_GPL(sched_clock);
>  
> +/*
> + * Running clock - returns the time that has elapsed while a guest has been
> + * running.
> + * On a guest this value should be sched_clock minus the time the
> + * guest was suspended by the hypervisor (for any reason).
> + * On bare metal this function should return the same as sched_clock.
> + * Architectures and sub-architectures can override this.
> + */
> +unsigned long long __weak running_clock(void)
> +{
> +	return sched_clock();
> +}
> +EXPORT_SYMBOL_GPL(running_clock);

Exporting a weak symbol seems a bit strange, but I guess it will work. 
However nothing *uses* this export and it seems unlikely to me that
anything will do so.

So I'm inclined to zap it, OK?

--- a/kernel/sched/clock.c~add-another-clock-for-use-with-the-soft-lockup-watchdog-fix
+++ a/kernel/sched/clock.c
@@ -86,7 +86,6 @@ unsigned long long __weak running_clock(
 {
 	return sched_clock();
 }
-EXPORT_SYMBOL_GPL(running_clock);
 
 __read_mostly int sched_clock_running;
 
_


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
  2014-12-22  5:06 ` [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings Cyril Bur
@ 2015-01-05 22:10   ` Andrew Morton
  2015-01-06  2:44     ` Cyril Bur
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2015-01-05 22:10 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-kernel, mpe, drjones, dzickus, mingo, uobergfe, chaiw.fnst,
	cl, fabf, atomlin, benzh

On Mon, 22 Dec 2014 16:06:04 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:

> On POWER8 virtualised kernels the VTB register can be read to have a view of
> time that only increases while the guest is running. This will prevent guests
> from seeing time jump if a guest is paused for significant amounts of time.
> 
> On POWER7 and below virtualised kernels stolen time is subtracted from
> sched_clock as a best effort approximation. This will not eliminate spurious
> warnings in the case of a suspended guest but may reduce the occurance in the
> case of softlockups due to host over commit.
> 
> Bare metal kernels should avoid reading the VTB as KVM does not restore sane
> values when not executing. sched_clock is returned in this case.
> 
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
>  	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
>  }
>  
> +unsigned long long running_clock(void)

Non-kvm kernels don't need this code.  Is there some appropriate
"#ifdef CONFIG_foo" we can wrap this in?


> +{
> +	/*
> +	 * Don't read the VTB as a host since KVM does not switch in host timebase
> +	 * into the VTB when it takes a guest off the CPU, reading the VTB would
> +	 * result in reading 'last switched out' guest VTB.
> +	 */
> +
> +	if (firmware_has_feature(FW_FEATURE_LPAR)) {
> +		if (cpu_has_feature(CPU_FTR_ARCH_207S))
> +			return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> +
> +		/* This is a next best approximation without a VTB. */
> +		return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);

Why is this result dependent on FW_FEATURE_LPAR?  It's all generic code.

In fact the kernel/sched/clock.c default implementation of
running_clock() could use this expression.  Would that be good or bad? :)

> +	}
> +
> +	/*
> +	 * On a host which doesn't do any virtualisation TB *should* equal VTB so
> +	 * it makes no difference anyway.
> +	 */
> +
> +	return sched_clock();
> +}


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2015-01-05 16:50 ` [PATCH 0/2] Quieten softlockup detector on virtualised kernels Don Zickus
@ 2015-01-05 23:53   ` Cyril Bur
  2015-01-06 15:01     ` Don Zickus
  0 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2015-01-05 23:53 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, mpe, drjones, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, mtosatti

On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> cc'ing Marcelo
> 
> On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > in timebase, this can cause spurious messages from the softlockup detector.
> > 
> > Whilst these messages are harmless, they are accompanied with a stack trace
> > which causes undue concern and more problematically the stack trace in the
> > guest has nothing to do with the observed problem and can only be misleading.
> > 
> > Futhermore, on POWER8 this is completely avoidable with the introduction of
> > the Virtual Time Base (VTB) register.
> 
> Hi Cyril,
> 
> Your solution seems simple and doesn't disturb the softlockup code as much
> as the x86 solution does.  The only small issue I had was the use of
> sched_clock instead of local_clock.  I keep forgetting the difference
> (unstable clock is the biggest reason I think).
My apologies there it appears I stuffed up, local_clock was used
initially in the softlockup code, I'll send a v2.

> Other than that, I am not the biggest fan of putting multiple virtual
> guest solutions for the same problem into the watchdog code.  I would
> prefer a common solution/framework to leverage.
Agreed.

> I have the x86 folks focusing on the steal_time stuff.  It started with
> KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
I'm not sure I've ever seen this, could you please point me towards
something I can look at?

> Not sure if that is useful or could be incoporated into the power8 code.
> Though to be honest I am curious if the steal_time code could be ported to
> your solution as it seems the watchdog code could remove all the
> steal_time warts.
Happy to help sus out the situation here, again, if you could pass on
what the x86 guys are working on, thanks.


Thanks,

Cyril
> I have cc'd Marcelo into this discussion as he was the last person I
> remember talking with about this problem.
> 
> Cheers,
> Don



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2015-01-05 22:09 ` Andrew Morton
@ 2015-01-06  2:43   ` Cyril Bur
  0 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2015-01-06  2:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mpe, drjones, dzickus, mingo, uobergfe, chaiw.fnst,
	cl, fabf, atomlin, benzh

On Mon, 2015-01-05 at 14:09 -0800, Andrew Morton wrote:
> On Mon, 22 Dec 2014 16:06:02 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:
> 
> > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > in timebase, this can cause spurious messages from the softlockup detector.
> > 
> > Whilst these messages are harmless, they are accompanied with a stack trace
> > which causes undue concern and more problematically the stack trace in the
> > guest has nothing to do with the observed problem and can only be misleading.
> > 
> > Futhermore, on POWER8 this is completely avoidable with the introduction of
> > the Virtual Time Base (VTB) register.
> 
> Does this problem apply to other KVM implementations and to Xen?  If
> so, what would implementations of running_clock() for those look like? 
> If not, why not?
Yes the problem should appear on other KVM implementations, not really
sure about Xen but I don't see why the problem wouldn't crop up.

x86 do have a method for dealing with it in the softlockup detector,
they've added a check in the softlockup using a paravirtualised clock
where the guest can discover if it had been paused, Xen could be using
too.
It doesn't appear s390 do anything.

Thanks,

Cyril
> 
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
  2015-01-05 22:10   ` Andrew Morton
@ 2015-01-06  2:44     ` Cyril Bur
  2015-01-07 10:20       ` Martin Schwidefsky
  0 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2015-01-06  2:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, mpe, drjones, dzickus, mingo, uobergfe, chaiw.fnst,
	cl, fabf, atomlin, benzh, schwidefsky, heiko.carstens

cc'd Martin Schwidefsky and Heiko Carstens.

On Mon, 2015-01-05 at 14:10 -0800, Andrew Morton wrote:
> On Mon, 22 Dec 2014 16:06:04 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:
> 
> > On POWER8 virtualised kernels the VTB register can be read to have a view of
> > time that only increases while the guest is running. This will prevent guests
> > from seeing time jump if a guest is paused for significant amounts of time.
> > 
> > On POWER7 and below virtualised kernels stolen time is subtracted from
> > sched_clock as a best effort approximation. This will not eliminate spurious
> > warnings in the case of a suspended guest but may reduce the occurance in the
> > case of softlockups due to host over commit.
> > 
> > Bare metal kernels should avoid reading the VTB as KVM does not restore sane
> > values when not executing. sched_clock is returned in this case.
> > 
> > --- a/arch/powerpc/kernel/time.c
> > +++ b/arch/powerpc/kernel/time.c
> > @@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
> >  	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> >  }
> >  
> > +unsigned long long running_clock(void)
> 
> Non-kvm kernels don't need this code.  Is there some appropriate
> "#ifdef CONFIG_foo" we can wrap this in?
CONFIG_PSERIES would work, having said that typical compilation for a
powernv kernel almost always includes CONFIG_PSERIES (although it
doesn't need to)... still, your point is valid, will add in v2.
> 
> 
> > +{
> > +	/*
> > +	 * Don't read the VTB as a host since KVM does not switch in host timebase
> > +	 * into the VTB when it takes a guest off the CPU, reading the VTB would
> > +	 * result in reading 'last switched out' guest VTB.
> > +	 */
> > +
> > +	if (firmware_has_feature(FW_FEATURE_LPAR)) {
> > +		if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > +			return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > +
> > +		/* This is a next best approximation without a VTB. */
> > +		return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
> 
> Why is this result dependent on FW_FEATURE_LPAR?  It's all generic code.
Good point, the reason it ended up there is because I wanted to avoid
behaviour changes.
> 
> In fact the kernel/sched/clock.c default implementation of
> running_clock() could use this expression.  Would that be good or bad? :)
For power I'm almost certain it would be fine, on platforms which don't
do stolen time cpustat[CPUTIME_STEAL] should always be zero and if not
then the value should always be sane (although as mentioned in the
comment, not as accurate as using the VTB).

Putting it in the default implementation could cause behavioural changes
for x86 and s390, I would want their views on doing that.
> 
> > +	}
> > +
> > +	/*
> > +	 * On a host which doesn't do any virtualisation TB *should* equal VTB so
> > +	 * it makes no difference anyway.
> > +	 */
> > +
> > +	return sched_clock();
> > +}
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2015-01-05 23:53   ` Cyril Bur
@ 2015-01-06 15:01     ` Don Zickus
  2015-01-09  3:15       ` Cyril Bur
  0 siblings, 1 reply; 16+ messages in thread
From: Don Zickus @ 2015-01-06 15:01 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-kernel, mpe, drjones, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, mtosatti

On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > cc'ing Marcelo
> > 
> > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > > in timebase, this can cause spurious messages from the softlockup detector.
> > > 
> > > Whilst these messages are harmless, they are accompanied with a stack trace
> > > which causes undue concern and more problematically the stack trace in the
> > > guest has nothing to do with the observed problem and can only be misleading.
> > > 
> > > Futhermore, on POWER8 this is completely avoidable with the introduction of
> > > the Virtual Time Base (VTB) register.
> > 
> > Hi Cyril,
> > 
> > Your solution seems simple and doesn't disturb the softlockup code as much
> > as the x86 solution does.  The only small issue I had was the use of
> > sched_clock instead of local_clock.  I keep forgetting the difference
> > (unstable clock is the biggest reason I think).
> My apologies there it appears I stuffed up, local_clock was used
> initially in the softlockup code, I'll send a v2.

Thanks!

> 
> > Other than that, I am not the biggest fan of putting multiple virtual
> > guest solutions for the same problem into the watchdog code.  I would
> > prefer a common solution/framework to leverage.
> Agreed.
> 
> > I have the x86 folks focusing on the steal_time stuff.  It started with
> > KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
> I'm not sure I've ever seen this, could you please point me towards
> something I can look at?

I am not too familar with it, but the kernel/watchdog.c code has calls to
kvm_check_and_clear_guest_paused(), which is probably a good place to
start.

Cheers,
Don

> 
> > Not sure if that is useful or could be incoporated into the power8 code.
> > Though to be honest I am curious if the steal_time code could be ported to
> > your solution as it seems the watchdog code could remove all the
> > steal_time warts.
> Happy to help sus out the situation here, again, if you could pass on
> what the x86 guys are working on, thanks.
> 
> 
> Thanks,
> 
> Cyril
> > I have cc'd Marcelo into this discussion as he was the last person I
> > remember talking with about this problem.
> > 
> > Cheers,
> > Don
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
  2015-01-06  2:44     ` Cyril Bur
@ 2015-01-07 10:20       ` Martin Schwidefsky
  2015-01-09  3:22         ` Cyril Bur
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Schwidefsky @ 2015-01-07 10:20 UTC (permalink / raw)
  To: Cyril Bur
  Cc: Andrew Morton, linux-kernel, mpe, drjones, dzickus, mingo,
	uobergfe, chaiw.fnst, cl, fabf, atomlin, benzh, heiko.carstens

On Tue, 06 Jan 2015 13:44:01 +1100
Cyril Bur <cyrilbur@gmail.com> wrote:

> On Mon, 2015-01-05 at 14:10 -0800, Andrew Morton wrote:
> > On Mon, 22 Dec 2014 16:06:04 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:
> > 
> > > On POWER8 virtualised kernels the VTB register can be read to have a view of
> > > time that only increases while the guest is running. This will prevent guests
> > > from seeing time jump if a guest is paused for significant amounts of time.
> > > 
> > > On POWER7 and below virtualised kernels stolen time is subtracted from
> > > sched_clock as a best effort approximation. This will not eliminate spurious
> > > warnings in the case of a suspended guest but may reduce the occurance in the
> > > case of softlockups due to host over commit.
> > > 
> > > Bare metal kernels should avoid reading the VTB as KVM does not restore sane
> > > values when not executing. sched_clock is returned in this case.
> > > 
> > > --- a/arch/powerpc/kernel/time.c
> > > +++ b/arch/powerpc/kernel/time.c
> > > @@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
> > >  	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > >  }
> > >  
> > > +unsigned long long running_clock(void)
> > 
> > Non-kvm kernels don't need this code.  Is there some appropriate
> > "#ifdef CONFIG_foo" we can wrap this in?
> CONFIG_PSERIES would work, having said that typical compilation for a
> powernv kernel almost always includes CONFIG_PSERIES (although it
> doesn't need to)... still, your point is valid, will add in v2.
> > 
> > 
> > > +{
> > > +	/*
> > > +	 * Don't read the VTB as a host since KVM does not switch in host timebase
> > > +	 * into the VTB when it takes a guest off the CPU, reading the VTB would
> > > +	 * result in reading 'last switched out' guest VTB.
> > > +	 */
> > > +
> > > +	if (firmware_has_feature(FW_FEATURE_LPAR)) {
> > > +		if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > > +			return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > > +
> > > +		/* This is a next best approximation without a VTB. */
> > > +		return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
> > 
> > Why is this result dependent on FW_FEATURE_LPAR?  It's all generic code.
> Good point, the reason it ended up there is because I wanted to avoid
> behaviour changes.
> > 
> > In fact the kernel/sched/clock.c default implementation of
> > running_clock() could use this expression.  Would that be good or bad? :)
> For power I'm almost certain it would be fine, on platforms which don't
> do stolen time cpustat[CPUTIME_STEAL] should always be zero and if not
> then the value should always be sane (although as mentioned in the
> comment, not as accurate as using the VTB).
> 
> Putting it in the default implementation could cause behavioural changes
> for x86 and s390, I would want their views on doing that.

I would prefer to make sched_clock do all the work. We have been thinking
about steal time vs sched_clock as well, our solution would be to exchange
the time source. Right now sched_clock is based on the TOD clock, the code
that takes steal time into account would use the CPU timer instead.
With the subtraction of kcpustat_this_cpu->cpustat[CPUTIME_STEAL] in
common code we would have to add the same value in the sched_clock
implementation as the steal time is already included in the CPU timer
deltas.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2015-01-06 15:01     ` Don Zickus
@ 2015-01-09  3:15       ` Cyril Bur
  2015-01-09 14:56         ` Don Zickus
  0 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2015-01-09  3:15 UTC (permalink / raw)
  To: Don Zickus
  Cc: linux-kernel, mpe, drjones, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, mtosatti


On Tue, 2015-01-06 at 10:01 -0500, Don Zickus wrote:
> On Tue, Jan 06, 2015 at 10:53:35AM +1100, Cyril Bur wrote:
> > On Mon, 2015-01-05 at 11:50 -0500, Don Zickus wrote:
> > > cc'ing Marcelo
> > > 
> > > On Mon, Dec 22, 2014 at 04:06:02PM +1100, Cyril Bur wrote:
> > > > When the hypervisor pauses a virtualised kernel the kernel will observe a jump
> > > > in timebase, this can cause spurious messages from the softlockup detector.
> > > > 
> > > > Whilst these messages are harmless, they are accompanied with a stack trace
> > > > which causes undue concern and more problematically the stack trace in the
> > > > guest has nothing to do with the observed problem and can only be misleading.
> > > > 
> > > > Futhermore, on POWER8 this is completely avoidable with the introduction of
> > > > the Virtual Time Base (VTB) register.
> > > 
> > > Hi Cyril,
> > > 
> > > Your solution seems simple and doesn't disturb the softlockup code as much
> > > as the x86 solution does.  The only small issue I had was the use of
> > > sched_clock instead of local_clock.  I keep forgetting the difference
> > > (unstable clock is the biggest reason I think).
> > My apologies there it appears I stuffed up, local_clock was used
> > initially in the softlockup code, I'll send a v2.
> 
> Thanks!
> 
> > 
> > > Other than that, I am not the biggest fan of putting multiple virtual
> > > guest solutions for the same problem into the watchdog code.  I would
> > > prefer a common solution/framework to leverage.
> > Agreed.
> > 
> > > I have the x86 folks focusing on the steal_time stuff.  It started with
> > > KVM and I believe VMWare is working on utilizing it too (and maybe Xen).
> > I'm not sure I've ever seen this, could you please point me towards
> > something I can look at?
> 
> I am not too familar with it, but the kernel/watchdog.c code has calls to
> kvm_check_and_clear_guest_paused(), which is probably a good place to
> start.
> 
Ah yes that, I did initially have a look at what it does when I
undertook to solve the problem on power and I suppose the two solutions
are similar in that they both just use a virtualised time source. The
similarities stop there though, the paravirtualised clock that x86 uses
provides (as the name of the function implies) a 'was paused' flag.
Obviously the flag isn't something the vtb register on power8 can
provide and since we have a vtb, its preferable to use that.
Perhaps x86 can do something with running_clock?

Regards,

Cyril

> Cheers,
> Don
> 
> > 
> > > Not sure if that is useful or could be incoporated into the power8 code.
> > > Though to be honest I am curious if the steal_time code could be ported to
> > > your solution as it seems the watchdog code could remove all the
> > > steal_time warts.
> > Happy to help sus out the situation here, again, if you could pass on
> > what the x86 guys are working on, thanks.
> > 
> > 
> > Thanks,
> > 
> > Cyril
> > > I have cc'd Marcelo into this discussion as he was the last person I
> > > remember talking with about this problem.
> > > 
> > > Cheers,
> > > Don
> > 
> > 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings
  2015-01-07 10:20       ` Martin Schwidefsky
@ 2015-01-09  3:22         ` Cyril Bur
  0 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2015-01-09  3:22 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Andrew Morton, linux-kernel, mpe, drjones, dzickus, mingo,
	uobergfe, chaiw.fnst, cl, fabf, atomlin, benzh, heiko.carstens

On Wed, 2015-01-07 at 11:20 +0100, Martin Schwidefsky wrote:
> On Tue, 06 Jan 2015 13:44:01 +1100
> Cyril Bur <cyrilbur@gmail.com> wrote:
> 
> > On Mon, 2015-01-05 at 14:10 -0800, Andrew Morton wrote:
> > > On Mon, 22 Dec 2014 16:06:04 +1100 Cyril Bur <cyrilbur@gmail.com> wrote:
> > > 
> > > > On POWER8 virtualised kernels the VTB register can be read to have a view of
> > > > time that only increases while the guest is running. This will prevent guests
> > > > from seeing time jump if a guest is paused for significant amounts of time.
> > > > 
> > > > On POWER7 and below virtualised kernels stolen time is subtracted from
> > > > sched_clock as a best effort approximation. This will not eliminate spurious
> > > > warnings in the case of a suspended guest but may reduce the occurance in the
> > > > case of softlockups due to host over commit.
> > > > 
> > > > Bare metal kernels should avoid reading the VTB as KVM does not restore sane
> > > > values when not executing. sched_clock is returned in this case.
> > > > 
> > > > --- a/arch/powerpc/kernel/time.c
> > > > +++ b/arch/powerpc/kernel/time.c
> > > > @@ -621,6 +621,30 @@ unsigned long long sched_clock(void)
> > > >  	return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > > >  }
> > > >  
> > > > +unsigned long long running_clock(void)
> > > 
> > > Non-kvm kernels don't need this code.  Is there some appropriate
> > > "#ifdef CONFIG_foo" we can wrap this in?
> > CONFIG_PSERIES would work, having said that typical compilation for a
> > powernv kernel almost always includes CONFIG_PSERIES (although it
> > doesn't need to)... still, your point is valid, will add in v2.
> > > 
> > > 
> > > > +{
> > > > +	/*
> > > > +	 * Don't read the VTB as a host since KVM does not switch in host timebase
> > > > +	 * into the VTB when it takes a guest off the CPU, reading the VTB would
> > > > +	 * result in reading 'last switched out' guest VTB.
> > > > +	 */
> > > > +
> > > > +	if (firmware_has_feature(FW_FEATURE_LPAR)) {
> > > > +		if (cpu_has_feature(CPU_FTR_ARCH_207S))
> > > > +			return mulhdu(get_vtb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > > > +
> > > > +		/* This is a next best approximation without a VTB. */
> > > > +		return sched_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
> > > 
> > > Why is this result dependent on FW_FEATURE_LPAR?  It's all generic code.
> > Good point, the reason it ended up there is because I wanted to avoid
> > behaviour changes.
> > > 
> > > In fact the kernel/sched/clock.c default implementation of
> > > running_clock() could use this expression.  Would that be good or bad? :)
> > For power I'm almost certain it would be fine, on platforms which don't
> > do stolen time cpustat[CPUTIME_STEAL] should always be zero and if not
> > then the value should always be sane (although as mentioned in the
> > comment, not as accurate as using the VTB).
> > 
> > Putting it in the default implementation could cause behavioural changes
> > for x86 and s390, I would want their views on doing that.
> 
> I would prefer to make sched_clock do all the work. We have been thinking
> about steal time vs sched_clock as well, our solution would be to exchange
> the time source. Right now sched_clock is based on the TOD clock, the code
> that takes steal time into account would use the CPU timer instead.
> With the subtraction of kcpustat_this_cpu->cpustat[CPUTIME_STEAL] in
> common code we would have to add the same value in the sched_clock
> implementation as the steal time is already included in the CPU timer
> deltas.

Thanks for the quick reply Martin,
Sound like you've got ideas and while I didn't really grasp all of that,
I gather we best leave the common code as is.
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Quieten softlockup detector on virtualised kernels
  2015-01-09  3:15       ` Cyril Bur
@ 2015-01-09 14:56         ` Don Zickus
  0 siblings, 0 replies; 16+ messages in thread
From: Don Zickus @ 2015-01-09 14:56 UTC (permalink / raw)
  To: Cyril Bur
  Cc: linux-kernel, mpe, drjones, akpm, mingo, uobergfe, chaiw.fnst, cl,
	fabf, atomlin, benzh, mtosatti

On Fri, Jan 09, 2015 at 02:15:00PM +1100, Cyril Bur wrote:
> > 
> > I am not too familar with it, but the kernel/watchdog.c code has calls to
> > kvm_check_and_clear_guest_paused(), which is probably a good place to
> > start.
> > 
> Ah yes that, I did initially have a look at what it does when I
> undertook to solve the problem on power and I suppose the two solutions
> are similar in that they both just use a virtualised time source. The
> similarities stop there though, the paravirtualised clock that x86 uses
> provides (as the name of the function implies) a 'was paused' flag.
> Obviously the flag isn't something the vtb register on power8 can
> provide and since we have a vtb, its preferable to use that.
> Perhaps x86 can do something with running_clock?

Marcello?  Drew?

Cheers,
Don

> 
> Regards,
> 
> Cyril
> 
> > Cheers,
> > Don
> > 
> > > 
> > > > Not sure if that is useful or could be incoporated into the power8 code.
> > > > Though to be honest I am curious if the steal_time code could be ported to
> > > > your solution as it seems the watchdog code could remove all the
> > > > steal_time warts.
> > > Happy to help sus out the situation here, again, if you could pass on
> > > what the x86 guys are working on, thanks.
> > > 
> > > 
> > > Thanks,
> > > 
> > > Cyril
> > > > I have cc'd Marcelo into this discussion as he was the last person I
> > > > remember talking with about this problem.
> > > > 
> > > > Cheers,
> > > > Don
> > > 
> > > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-01-09 14:57 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-22  5:06 [PATCH 0/2] Quieten softlockup detector on virtualised kernels Cyril Bur
2014-12-22  5:06 ` [PATCH 1/2] Add another clock for use with the soft lockup watchdog Cyril Bur
2015-01-05 22:09   ` Andrew Morton
2014-12-22  5:06 ` [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings Cyril Bur
2015-01-05 22:10   ` Andrew Morton
2015-01-06  2:44     ` Cyril Bur
2015-01-07 10:20       ` Martin Schwidefsky
2015-01-09  3:22         ` Cyril Bur
2015-01-05 16:50 ` [PATCH 0/2] Quieten softlockup detector on virtualised kernels Don Zickus
2015-01-05 23:53   ` Cyril Bur
2015-01-06 15:01     ` Don Zickus
2015-01-09  3:15       ` Cyril Bur
2015-01-09 14:56         ` Don Zickus
2015-01-05 22:09 ` Andrew Morton
2015-01-06  2:43   ` Cyril Bur
  -- strict thread matches above, loose matches on Subject: below --
2014-12-01  2:38 Cyril Bur
2014-12-01  2:39 ` [PATCH 2/2] powerpc: add running_clock for powerpc to prevent spurious softlockup warnings Cyril Bur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).