public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:10 [PATCH] raise tsc clocksource rating Glauber de Oliveira Costa
@ 2007-10-29 22:17 ` Thomas Gleixner
  2007-10-29 22:36   ` Ingo Molnar
                     ` (2 more replies)
  2007-10-29 22:42 ` Zachary Amsden
  2007-10-30  0:17 ` H. Peter Anvin
  2 siblings, 3 replies; 29+ messages in thread
From: Thomas Gleixner @ 2007-10-29 22:17 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: LKML, Rusty Russell, Jeremy Fitzhardinge, --cc, Ingo Molnar, avi,
	kvm-devel, John Stultz

On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:

CC'ed John and removed glauber@t60.localdomain :)

> From: Glauber de Oliveira Costa <glauber@t60.localdomain>
> 
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
> 
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.
> 
> This patch also touches the comments on clocksource.h, which suggests
> that 499 would be a limit on the rating values.
> 
> Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>

Acked-by: Thomas Gleixner <tglx@linutronix.de>

> ---
>  arch/x86/kernel/tsc_32.c    |    2 +-
>  arch/x86/kernel/tsc_64.c    |    2 +-
>  include/linux/clocksource.h |    2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 9ebc0da..4d91e59 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
>  
>  static struct clocksource clocksource_tsc = {
>  	.name			= "tsc",
> -	.rating			= 300,
> +	.rating			= 500,
>  	.read			= read_tsc,
>  	.mask			= CLOCKSOURCE_MASK(64),
>  	.mult			= 0, /* to be set */
> diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> index 9c70af4..4fd5b1b 100644
> --- a/arch/x86/kernel/tsc_64.c
> +++ b/arch/x86/kernel/tsc_64.c
> @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
>  
>  static struct clocksource clocksource_tsc = {
>  	.name			= "tsc",
> -	.rating			= 300,
> +	.rating			= 500,
>  	.read			= read_tsc,
>  	.mask			= CLOCKSOURCE_MASK(64),
>  	.shift			= 22,
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 107787a..5b0aadd 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -39,7 +39,7 @@ struct clocksource;
>   *				A correct and usable clocksource.
>   *			300-399: Desired.
>   *				A reasonably fast and accurate clocksource.
> - *			400-499: Perfect
> + *			>= 400 : Perfect
>   *				The ideal clocksource. A must-use where
>   *				available.
>   * @read:		returns a cycle value
> -- 
> 1.5.0.6
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:17 ` Thomas Gleixner
@ 2007-10-29 22:36   ` Ingo Molnar
  2007-10-30  1:26   ` john stultz
  2007-10-30  2:39   ` Rusty Russell
  2 siblings, 0 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-29 22:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Glauber de Oliveira Costa, LKML, Rusty Russell,
	Jeremy Fitzhardinge, --cc, avi, kvm-devel, John Stultz


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > Since it's being a tendency among paravirt clocksources to use 
> > values around 400, we should declare tsc as even better: So we use 
> > 500.
> > 
> > This patch also touches the comments on clocksource.h, which 
> > suggests that 499 would be a limit on the rating values.
> > 
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
> 
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:10 [PATCH] raise tsc clocksource rating Glauber de Oliveira Costa
  2007-10-29 22:17 ` Thomas Gleixner
@ 2007-10-29 22:42 ` Zachary Amsden
  2007-10-29 22:45   ` Jeremy Fitzhardinge
  2007-10-29 22:48   ` Ingo Molnar
  2007-10-30  0:17 ` H. Peter Anvin
  2 siblings, 2 replies; 29+ messages in thread
From: Zachary Amsden @ 2007-10-29 22:42 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: linux-kernel, tglx, rusty, jeremy, --cc, mingo, avi, kvm-devel,
	Glauber de Oliveira Costa, Dan Hecht, Garrett Smith

On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
> From: Glauber de Oliveira Costa <glauber@t60.localdomain>
> 
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
> 
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.

Why is the TSC better than a paravirt clocksource?  In our case this is
definitely inaccurate.  Paravirt clocksources should be preferred to
TSC, and both must be made available in hardware for platforms which do
not support paravirt.

Also, please cc all the paravirt developers on things related to
paravirt, especially things with such broad effect.  I think 400 is a
good value for a perfect native clocksource.  >400 should be reserved
for super-real (i.e. paravirt) sources that should always be chosen over
a hardware realistic implementation in a virtual environment.

Zach


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:42 ` Zachary Amsden
@ 2007-10-29 22:45   ` Jeremy Fitzhardinge
  2007-10-29 22:48   ` Ingo Molnar
  1 sibling, 0 replies; 29+ messages in thread
From: Jeremy Fitzhardinge @ 2007-10-29 22:45 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Glauber de Oliveira Costa, linux-kernel, tglx, rusty, --cc, mingo,
	avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith

Zachary Amsden wrote:
> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
>   
>> From: Glauber de Oliveira Costa <glauber@t60.localdomain>
>>
>> tsc is very good time source (when it does not have drifts, does not
>> change it's frequency, i.e. when it works), so it should have its rating
>> raised to a value greater than, or equal 400.
>>
>> Since it's being a tendency among paravirt clocksources to use values
>> around 400, we should declare tsc as even better: So we use 500.
>>     
>
> Why is the TSC better than a paravirt clocksource?  In our case this is
> definitely inaccurate.  Paravirt clocksources should be preferred to
> TSC, and both must be made available in hardware for platforms which do
> not support paravirt.
>
> Also, please cc all the paravirt developers on things related to
> paravirt, especially things with such broad effect.  I think 400 is a
> good value for a perfect native clocksource.  >400 should be reserved
> for super-real (i.e. paravirt) sources that should always be chosen over
> a hardware realistic implementation in a virtual environment.
>   

Yes, agreed.  The tsc is never the right thing to use if there's a
paravirt clocksource available.

What's wrong with rating it 300?  What inferior clocksource does it lose
out to?  Shouldn't that clocksource be lowered?  (Why don't we just use
1 to 10?)

    J

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:42 ` Zachary Amsden
  2007-10-29 22:45   ` Jeremy Fitzhardinge
@ 2007-10-29 22:48   ` Ingo Molnar
  2007-10-29 22:52     ` Jeremy Fitzhardinge
  2007-10-29 22:55     ` Zachary Amsden
  1 sibling, 2 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-29 22:48 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Glauber de Oliveira Costa, linux-kernel, tglx, rusty, jeremy,
	--cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith


* Zachary Amsden <zach@vmware.com> wrote:

> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
> > From: Glauber de Oliveira Costa <glauber@t60.localdomain>
> > 
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
> > 
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
> 
> Why is the TSC better than a paravirt clocksource?  In our case this 
> is definitely inaccurate.  Paravirt clocksources should be preferred 
> to TSC, and both must be made available in hardware for platforms 
> which do not support paravirt.

if it's inaccurate why are you exposing it to the guest then? Native 
only uses the TSC if it's safe and accurate to do so.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:48   ` Ingo Molnar
@ 2007-10-29 22:52     ` Jeremy Fitzhardinge
  2007-10-29 22:55       ` Ingo Molnar
  2007-10-29 22:55     ` Zachary Amsden
  1 sibling, 1 reply; 29+ messages in thread
From: Jeremy Fitzhardinge @ 2007-10-29 22:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, --cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith

Ingo Molnar wrote:
> * Zachary Amsden <zach@vmware.com> wrote:
>
>   
>> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
>>     
>>> From: Glauber de Oliveira Costa <glauber@t60.localdomain>
>>>
>>> tsc is very good time source (when it does not have drifts, does not
>>> change it's frequency, i.e. when it works), so it should have its rating
>>> raised to a value greater than, or equal 400.
>>>
>>> Since it's being a tendency among paravirt clocksources to use values
>>> around 400, we should declare tsc as even better: So we use 500.
>>>       
>> Why is the TSC better than a paravirt clocksource?  In our case this 
>> is definitely inaccurate.  Paravirt clocksources should be preferred 
>> to TSC, and both must be made available in hardware for platforms 
>> which do not support paravirt.
>>     
>
> if it's inaccurate why are you exposing it to the guest then? Native 
> only uses the TSC if it's safe and accurate to do so.
>   

It is used as part of the Xen clocksource as a short term extrapolator,
with correction parameters supplied by the hypervisor.  It should never
be used directly.

    J

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:48   ` Ingo Molnar
  2007-10-29 22:52     ` Jeremy Fitzhardinge
@ 2007-10-29 22:55     ` Zachary Amsden
  2007-10-29 23:02       ` Ingo Molnar
  2007-10-30 11:59       ` Glauber de Oliveira Costa
  1 sibling, 2 replies; 29+ messages in thread
From: Zachary Amsden @ 2007-10-29 22:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Glauber de Oliveira Costa, linux-kernel, tglx, rusty, jeremy,
	--cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith

On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
> * Zachary Amsden <zach@vmware.com> wrote:

> if it's inaccurate why are you exposing it to the guest then? Native 
> only uses the TSC if it's safe and accurate to do so.

Not every guest support paravirt, but for correctness, all guests
require TSC, which must be exposed all the way up to userspace, no
matter what the efficiency or accuracy may be.

Zach


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:52     ` Jeremy Fitzhardinge
@ 2007-10-29 22:55       ` Ingo Molnar
  2007-10-29 23:17         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 29+ messages in thread
From: Ingo Molnar @ 2007-10-29 22:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, --cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> > if it's inaccurate why are you exposing it to the guest then? Native 
> > only uses the TSC if it's safe and accurate to do so.
> 
> It is used as part of the Xen clocksource as a short term 
> extrapolator, with correction parameters supplied by the hypervisor.  
> It should never be used directly.

that's totally broken then. You cannot create an SMP-safe monotonic 
clocksource via interpolation - native does not do it either. Good thing 
this problem got exposed, it needs to be fixed.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:55     ` Zachary Amsden
@ 2007-10-29 23:02       ` Ingo Molnar
  2007-10-29 23:13         ` Zachary Amsden
  2007-10-29 23:24         ` Dan Hecht
  2007-10-30 11:59       ` Glauber de Oliveira Costa
  1 sibling, 2 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-29 23:02 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Glauber de Oliveira Costa, linux-kernel, tglx, rusty, jeremy,
	--cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith


* Zachary Amsden <zach@vmware.com> wrote:

> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
> > * Zachary Amsden <zach@vmware.com> wrote:
> 
> > if it's inaccurate why are you exposing it to the guest then? Native 
> > only uses the TSC if it's safe and accurate to do so.
> 
> Not every guest support paravirt, but for correctness, all guests 
> require TSC, which must be exposed all the way up to userspace, no 
> matter what the efficiency or accuracy may be.

but if there's a perfect TSC available (there is such hardware) then the 
TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
in essence.

anyway, that's at most an optimization issue. No strong feelings here, 
and we can certainly delay this patch until this gets all sorted out.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] raise tsc clocksource rating
@ 2007-10-29 23:10 Glauber de Oliveira Costa
  2007-10-29 22:17 ` Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Glauber de Oliveira Costa @ 2007-10-29 23:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, rusty, jeremy, --cc, mingo, avi, kvm-devel,
	Glauber de Oliveira Costa, Glauber de Oliveira Costa

From: Glauber de Oliveira Costa <glauber@t60.localdomain>

tsc is very good time source (when it does not have drifts, does not
change it's frequency, i.e. when it works), so it should have its rating
raised to a value greater than, or equal 400.

Since it's being a tendency among paravirt clocksources to use values
around 400, we should declare tsc as even better: So we use 500.

This patch also touches the comments on clocksource.h, which suggests
that 499 would be a limit on the rating values.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 arch/x86/kernel/tsc_32.c    |    2 +-
 arch/x86/kernel/tsc_64.c    |    2 +-
 include/linux/clocksource.h |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 9ebc0da..4d91e59 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
 
 static struct clocksource clocksource_tsc = {
 	.name			= "tsc",
-	.rating			= 300,
+	.rating			= 500,
 	.read			= read_tsc,
 	.mask			= CLOCKSOURCE_MASK(64),
 	.mult			= 0, /* to be set */
diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index 9c70af4..4fd5b1b 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
 
 static struct clocksource clocksource_tsc = {
 	.name			= "tsc",
-	.rating			= 300,
+	.rating			= 500,
 	.read			= read_tsc,
 	.mask			= CLOCKSOURCE_MASK(64),
 	.shift			= 22,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 107787a..5b0aadd 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -39,7 +39,7 @@ struct clocksource;
  *				A correct and usable clocksource.
  *			300-399: Desired.
  *				A reasonably fast and accurate clocksource.
- *			400-499: Perfect
+ *			>= 400 : Perfect
  *				The ideal clocksource. A must-use where
  *				available.
  * @read:		returns a cycle value
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:02       ` Ingo Molnar
@ 2007-10-29 23:13         ` Zachary Amsden
  2007-10-29 23:17           ` Ingo Molnar
  2007-10-30 12:02           ` Glauber de Oliveira Costa
  2007-10-29 23:24         ` Dan Hecht
  1 sibling, 2 replies; 29+ messages in thread
From: Zachary Amsden @ 2007-10-29 23:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Glauber de Oliveira Costa, linux-kernel, tglx, rusty, jeremy,
	--cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith

On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
> * Zachary Amsden <zach@vmware.com> wrote:

> > Not every guest support paravirt, but for correctness, all guests 
> > require TSC, which must be exposed all the way up to userspace, no 
> > matter what the efficiency or accuracy may be.
> 
> but if there's a perfect TSC available (there is such hardware) then the 
> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
> in essence.

No, if no paravirt clocksource is detected, nothing can override the
perfect TSC hardware clocksource rating of 400.  And if a paravirt
clocksource is detected, it is always better than TSC.

> anyway, that's at most an optimization issue. No strong feelings here, 
> and we can certainly delay this patch until this gets all sorted out.

This patch should be nacked, since it is just wrong.  This is not an
optimization issue.  It is an accuracy issue for all virtualization
environments that affects long term kernel clock stability, which is
important to fix, and the best way to do that is to use a paravirt
clocksource.

Zach


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:13         ` Zachary Amsden
@ 2007-10-29 23:17           ` Ingo Molnar
  2007-10-30 12:02           ` Glauber de Oliveira Costa
  1 sibling, 0 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-29 23:17 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Glauber de Oliveira Costa, linux-kernel, tglx, rusty, jeremy,
	--cc, avi, kvm-devel, Glauber de Oliveira Costa, Dan Hecht,
	Garrett Smith


* Zachary Amsden <zach@vmware.com> wrote:

> On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
> > * Zachary Amsden <zach@vmware.com> wrote:
> 
> > > Not every guest support paravirt, but for correctness, all guests 
> > > require TSC, which must be exposed all the way up to userspace, no 
> > > matter what the efficiency or accuracy may be.
> > 
> > but if there's a perfect TSC available (there is such hardware) then the 
> > TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
> > in essence.
> 
> No, if no paravirt clocksource is detected, nothing can override the
> perfect TSC hardware clocksource rating of 400.  And if a paravirt
> clocksource is detected, it is always better than TSC.
> 
> > anyway, that's at most an optimization issue. No strong feelings here, 
> > and we can certainly delay this patch until this gets all sorted out.
> 
> This patch should be nacked, since it is just wrong.  This is not an 
> optimization issue.  It is an accuracy issue for all virtualization 
> environments that affects long term kernel clock stability, which is 
> important to fix, and the best way to do that is to use a paravirt 
> clocksource.

i know it's not an optimization issue. Your current pessimisation of 
even perfect TSCs _is_ an optimization issue.

(and note that if the TSC is unstable the guest will/should notice it 
and will fall back to the hyper clocksource)

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:55       ` Ingo Molnar
@ 2007-10-29 23:17         ` Jeremy Fitzhardinge
  2007-10-29 23:21           ` Ingo Molnar
  0 siblings, 1 reply; 29+ messages in thread
From: Jeremy Fitzhardinge @ 2007-10-29 23:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, Avi Kivity, kvm-devel, Glauber de Oliveira Costa,
	Dan Hecht, Garrett Smith

Ingo Molnar wrote:
> that's totally broken then. You cannot create an SMP-safe monotonic 
> clocksource via interpolation - native does not do it either. Good thing 
> this problem got exposed, it needs to be fixed.
>   

Sigh, I don't really want to have this fight again.

I don't really see what point there is in raising the tsc's rating (why
is 300 insufficient again?), but regardless of the value, the Xen
clocksource rating needs to be higher.

    J

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:17         ` Jeremy Fitzhardinge
@ 2007-10-29 23:21           ` Ingo Molnar
  2007-10-29 23:33             ` Jeremy Fitzhardinge
  2007-10-30  0:45             ` Ian Pratt
  0 siblings, 2 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-29 23:21 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, Avi Kivity, kvm-devel, Glauber de Oliveira Costa,
	Dan Hecht, Garrett Smith


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
> > that's totally broken then. You cannot create an SMP-safe monotonic 
> > clocksource via interpolation - native does not do it either. Good thing 
> > this problem got exposed, it needs to be fixed.
> >   
> 
> Sigh, I don't really want to have this fight again.

i dont remember us having discussed this before, ever. If there's any 
"fight" about monotonicity and SMP then it would be a pretty onesided 
affair, with you being beaten up seriously ;-)

> I don't really see what point there is in raising the tsc's rating 
> (why is 300 insufficient again?), but regardless of the value, the Xen 
> clocksource rating needs to be higher.

anyway, i agree that this patch cannot go in in its current form.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:02       ` Ingo Molnar
  2007-10-29 23:13         ` Zachary Amsden
@ 2007-10-29 23:24         ` Dan Hecht
  2007-10-30  4:24           ` [kvm-devel] " Avi Kivity
  2007-10-30  7:14           ` Ingo Molnar
  1 sibling, 2 replies; 29+ messages in thread
From: Dan Hecht @ 2007-10-29 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, jeremy, --cc, avi, kvm-devel, Glauber de Oliveira Costa,
	Garrett Smith

On 10/29/2007 04:02 PM, Ingo Molnar wrote:
> * Zachary Amsden <zach@vmware.com> wrote:
> 
>> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
>>> * Zachary Amsden <zach@vmware.com> wrote:
>>> if it's inaccurate why are you exposing it to the guest then? Native 
>>> only uses the TSC if it's safe and accurate to do so.
>> Not every guest support paravirt, but for correctness, all guests 
>> require TSC, which must be exposed all the way up to userspace, no 
>> matter what the efficiency or accuracy may be.
> 
> but if there's a perfect TSC available (there is such hardware) then the 
> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
> in essence.
> 

Not really.  In the case hardware TSC is perfect, the paravirt time 
counter can be implemented directly in terms of hardware TSC; there is 
no loss in optimization.  This is done transparently.  And virtual TSC 
can be implemented this way too.

The real improvement that a paravirt clocksource offers over the TSC 
clocksource is that the guest does not need to measure the TSC frequency 
itself against some other constant frequency source (which is 
problematic on a virtual machine).  Instead, the paravirt clocksource 
queries the hypervisor for the frequency of the counter.  As you know, 
with clocksource style kernels, it's important to get this frequency 
correct, or else the guest will have long-term time drift.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:21           ` Ingo Molnar
@ 2007-10-29 23:33             ` Jeremy Fitzhardinge
  2007-10-30  0:45             ` Ian Pratt
  1 sibling, 0 replies; 29+ messages in thread
From: Jeremy Fitzhardinge @ 2007-10-29 23:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, Avi Kivity, kvm-devel, Glauber de Oliveira Costa,
	Dan Hecht, Garrett Smith

Ingo Molnar wrote:
> i dont remember us having discussed this before, ever. If there's any 
> "fight" about monotonicity and SMP then it would be a pretty onesided 
> affair, with you being beaten up seriously ;-)
>   

This is part of Xen's ABI, so it isn't easily changed.  You're right
that getting monotonicity is a pretty ugly affair of doing something
like "if (now < previous_return) return ++previous_return;", but its
better than using the tsc.

(Last time around, it was "Xen can't use the tsc because it can't ever
be used as a stable timebase" - though I don't think that was you
specifically.)

    J

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:10 [PATCH] raise tsc clocksource rating Glauber de Oliveira Costa
  2007-10-29 22:17 ` Thomas Gleixner
  2007-10-29 22:42 ` Zachary Amsden
@ 2007-10-30  0:17 ` H. Peter Anvin
  2 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2007-10-30  0:17 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: linux-kernel, tglx, rusty, jeremy, --cc, mingo, avi, kvm-devel,
	Glauber de Oliveira Costa

Glauber de Oliveira Costa wrote:
> From: Glauber de Oliveira Costa <glauber@t60.localdomain>
> 
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
> 
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.
> 
> This patch also touches the comments on clocksource.h, which suggests
> that 499 would be a limit on the rating values.
> 
> Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>

Are you sure these paravirt sources don't intentionally trump the TSC?

	-hpa


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH] raise tsc clocksource rating
  2007-10-29 23:21           ` Ingo Molnar
  2007-10-29 23:33             ` Jeremy Fitzhardinge
@ 2007-10-30  0:45             ` Ian Pratt
  2007-10-30  7:19               ` Ingo Molnar
  1 sibling, 1 reply; 29+ messages in thread
From: Ian Pratt @ 2007-10-30  0:45 UTC (permalink / raw)
  To: Ingo Molnar, Jeremy Fitzhardinge
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, Avi Kivity, kvm-devel, Glauber de Oliveira Costa,
	Dan Hecht, Garrett Smith, ian.pratt

> > Sigh, I don't really want to have this fight again.
> 
> i dont remember us having discussed this before, ever. If there's any
> "fight" about monotonicity and SMP then it would be a pretty onesided
> affair, with you being beaten up seriously ;-)

Actually, it is possible, even for NUMA systems with CPUs running off
completely different oscillators, and in the presence of CPU frequency
changes, power management, and even in the presence of thermal
throttling (though the latter introduces temporary inaccuracies it
doesn't affect monotonicity or rate).

Take a look at the Xen code to see how each physical CPU is
independently calibrated on an ongoing basis, how movement of VCPUs
between physical CPUs is tracked, and how shared variables are used to
ensure montonicity if a guest requires it. 

The fixed-rate TSCs on newer CPUs make some of this stuff easier, but
you still need to cope with different source oscillators and some power
management states.
 
Ian

> > I don't really see what point there is in raising the tsc's rating
> > (why is 300 insufficient again?), but regardless of the value, the
Xen
> > clocksource rating needs to be higher.
> 
> anyway, i agree that this patch cannot go in in its current form.
> 
> 	Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe
linux-kernel"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:17 ` Thomas Gleixner
  2007-10-29 22:36   ` Ingo Molnar
@ 2007-10-30  1:26   ` john stultz
  2007-10-30  2:39   ` Rusty Russell
  2 siblings, 0 replies; 29+ messages in thread
From: john stultz @ 2007-10-30  1:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Glauber de Oliveira Costa, LKML, Rusty Russell,
	Jeremy Fitzhardinge, --cc, Ingo Molnar, avi, kvm-devel

On Mon, 2007-10-29 at 23:17 +0100, Thomas Gleixner wrote:
> On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
> 
> CC'ed John and removed glauber@t60.localdomain :)
> 
> > From: Glauber de Oliveira Costa <glauber@t60.localdomain>
> > 
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.

But all of those qualities (and more) make it less then ideal.

What issue exactly does this patch resolve?

> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.

This is the rating inflation I was initially worried about and created
the advisory scale to avoid. If paravirt clocksources are rated too
high, they should be dropped. 

I strongly disagree that the TSC is a "perfect" clocksource, and I think
its rating is appropriate. 


> > This patch also touches the comments on clocksource.h, which suggests
> > that 499 would be a limit on the rating values.
> > 
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
> 
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

If a better justification can be provided, I might reconsider, but right
now I can't ack this.

thanks
-john

> > ---
> >  arch/x86/kernel/tsc_32.c    |    2 +-
> >  arch/x86/kernel/tsc_64.c    |    2 +-
> >  include/linux/clocksource.h |    2 +-
> >  3 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> > index 9ebc0da..4d91e59 100644
> > --- a/arch/x86/kernel/tsc_32.c
> > +++ b/arch/x86/kernel/tsc_32.c
> > @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
> >  
> >  static struct clocksource clocksource_tsc = {
> >  	.name			= "tsc",
> > -	.rating			= 300,
> > +	.rating			= 500,
> >  	.read			= read_tsc,
> >  	.mask			= CLOCKSOURCE_MASK(64),
> >  	.mult			= 0, /* to be set */
> > diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> > index 9c70af4..4fd5b1b 100644
> > --- a/arch/x86/kernel/tsc_64.c
> > +++ b/arch/x86/kernel/tsc_64.c
> > @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
> >  
> >  static struct clocksource clocksource_tsc = {
> >  	.name			= "tsc",
> > -	.rating			= 300,
> > +	.rating			= 500,
> >  	.read			= read_tsc,
> >  	.mask			= CLOCKSOURCE_MASK(64),
> >  	.shift			= 22,
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 107787a..5b0aadd 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -39,7 +39,7 @@ struct clocksource;
> >   *				A correct and usable clocksource.
> >   *			300-399: Desired.
> >   *				A reasonably fast and accurate clocksource.
> > - *			400-499: Perfect
> > + *			>= 400 : Perfect
> >   *				The ideal clocksource. A must-use where
> >   *				available.
> >   * @read:		returns a cycle value
> > -- 
> > 1.5.0.6
> > 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:17 ` Thomas Gleixner
  2007-10-29 22:36   ` Ingo Molnar
  2007-10-30  1:26   ` john stultz
@ 2007-10-30  2:39   ` Rusty Russell
  2007-10-30  7:37     ` Ingo Molnar
  2 siblings, 1 reply; 29+ messages in thread
From: Rusty Russell @ 2007-10-30  2:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Glauber de Oliveira Costa, LKML, Jeremy Fitzhardinge, --cc,
	Ingo Molnar, avi, kvm-devel, John Stultz

On Tuesday 30 October 2007 09:17:38 Thomas Gleixner wrote:
> On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
>
> CC'ed John and removed glauber@t60.localdomain :)
>
> > From: Glauber de Oliveira Costa <glauber@t60.localdomain>
> >
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
> >
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
> >
> > This patch also touches the comments on clocksource.h, which suggests
> > that 499 would be a limit on the rating values.
> >
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
>
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

No.  tsc is very good, it's not perfect.  If a paravirt clock registers 400 it 
really means "pick me over the tsc".

That's *why* they use > 400: it's in the documentation.

Rusty.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [kvm-devel] [PATCH] raise tsc clocksource rating
  2007-10-29 23:24         ` Dan Hecht
@ 2007-10-30  4:24           ` Avi Kivity
  2007-10-30  7:14           ` Ingo Molnar
  1 sibling, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2007-10-30  4:24 UTC (permalink / raw)
  To: Dan Hecht
  Cc: Ingo Molnar, Zachary Amsden, jeremy, avi, kvm-devel, linux-kernel,
	Glauber de Oliveira Costa, Glauber de Oliveira Costa, --cc,
	Garrett Smith

Dan Hecht wrote:
> Not really.  In the case hardware TSC is perfect, the paravirt time 
> counter can be implemented directly in terms of hardware TSC; there is 
> no loss in optimization.  This is done transparently.  And virtual TSC 
> can be implemented this way too.
>
> The real improvement that a paravirt clocksource offers over the TSC 
> clocksource is that the guest does not need to measure the TSC frequency 
> itself against some other constant frequency source (which is 
> problematic on a virtual machine).  Instead, the paravirt clocksource 
> queries the hypervisor for the frequency of the counter.  As you know, 
> with clocksource style kernels, it's important to get this frequency 
> correct, or else the guest will have long-term time drift.
>
>   

In addition, a paravirt clocksource can compensate for events like vcpu 
migration to another host cpu.  So I agree: a paravirt clocksource is 
always better than or equal to the tsc.


-- 
Any sufficiently difficult bug is indistinguishable from a feature.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:24         ` Dan Hecht
  2007-10-30  4:24           ` [kvm-devel] " Avi Kivity
@ 2007-10-30  7:14           ` Ingo Molnar
  1 sibling, 0 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-30  7:14 UTC (permalink / raw)
  To: Dan Hecht
  Cc: Zachary Amsden, Glauber de Oliveira Costa, linux-kernel, tglx,
	rusty, jeremy, --cc, avi, kvm-devel, Glauber de Oliveira Costa,
	Garrett Smith


* Dan Hecht <dhecht@vmware.com> wrote:

>> but if there's a perfect TSC available (there is such hardware) then 
>> the TSC _is_ the best clocksource. Paravirt now turns it off 
>> unconditionally in essence.
>
> Not really.  In the case hardware TSC is perfect, the paravirt time 
> counter can be implemented directly in terms of hardware TSC; there is 
> no loss in optimization.  This is done transparently.  And virtual TSC 
> can be implemented this way too.

Of course if you duplicate all (or part) of the TSC clocksource driver 
in the paravirt guest code then the "paravirt clocksource" is at least 
as good as the TSC. But that argument is playing word-games, _of course_ 
if you use the same (or similar) code it's at least as good. The real 
question are clocksources that communicate out to the hypervisor, and 
hence have higher overhead than a native, TSC based clocksource - and 
clocksources that use the TSC in a broken way.

> The real improvement that a paravirt clocksource offers over the TSC 
> clocksource is that the guest does not need to measure the TSC 
> frequency itself against some other constant frequency source (which 
> is problematic on a virtual machine). [...]

hey, you need not tell me, i've implemented a hyper-clocksource driver 
myself. But calibration is a boot only issue and there's no reason why 
calibration _has_ to be fragile. For example we could easily extend the 
TSC clocksource driver to not calibrate in the guest but take 
calibration information from the host. It's in essence a trivial and 
obvious extension to calibration. That way we get the highest possible 
performance _and_ we share much of the clocksource driver with the host.

also, the way the TSC is used by guests like Xen is fundamentally 
fragile on SMP. So i have a good reason to distrust the approach of 
hypervisors to timekeeping. The maintenance problem to me is that 
everyone in the paravirt space is busy coding away in their own (often 
broken) direction, replicating the essence of the TSC clocksource driver 
4 times over again and again, with subtle bugs in each variant, even in 
cases where the TSC readout can be trusted perfectly well. 
"Consolidation" and "sharing code" is not a particularly strong point of 
the paravirt projects ;-) (ok, KVM is a notable exception there.)

anyway, i do agree that this patch is wrong currently, mainly due to TSC 
calibration not being reliable in guest-space at the moment - but the 
whole concept of putting a separate clocksource driver into each 
paravirt guest, even in the case where the TSC is perfect, is madness. 
That code, once the hardware gets sane (and there are good signs for 
that), and once calibration can be passed from host to guest reliably, 
_will_ be consolidated, because it makes perfect technical sense.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-30  0:45             ` Ian Pratt
@ 2007-10-30  7:19               ` Ingo Molnar
  0 siblings, 0 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-30  7:19 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Jeremy Fitzhardinge, Zachary Amsden, Glauber de Oliveira Costa,
	linux-kernel, tglx, rusty, Avi Kivity, kvm-devel,
	Glauber de Oliveira Costa, Dan Hecht, Garrett Smith


* Ian Pratt <Ian.Pratt@cl.cam.ac.uk> wrote:

> > > Sigh, I don't really want to have this fight again.
> > 
> > i dont remember us having discussed this before, ever. If there's any
> > "fight" about monotonicity and SMP then it would be a pretty onesided
> > affair, with you being beaten up seriously ;-)
> 
> Actually, it is possible, even for NUMA systems with CPUs running off 
> completely different oscillators, and in the presence of CPU frequency 
> changes, power management, and even in the presence of thermal 
> throttling (though the latter introduces temporary inaccuracies it 
> doesn't affect monotonicity or rate).
> 
> Take a look at the Xen code to see how each physical CPU is 
> independently calibrated on an ongoing basis, how movement of VCPUs 
> between physical CPUs is tracked, and how shared variables are used to 
> ensure montonicity if a guest requires it.

I think that's wishful thinking. Check out:

    http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

change TEST_TSC to 0, run it on an SMP guest (on a reasonably fast 
machine) and let me know whether you can make SMP guests not come up 
with monotonicity violations in the CLOCK tests.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-30  2:39   ` Rusty Russell
@ 2007-10-30  7:37     ` Ingo Molnar
  2007-10-30 10:52       ` Rusty Russell
  2007-10-30 12:13       ` Glauber de Oliveira Costa
  0 siblings, 2 replies; 29+ messages in thread
From: Ingo Molnar @ 2007-10-30  7:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Thomas Gleixner, Glauber de Oliveira Costa, LKML,
	Jeremy Fitzhardinge, --cc, avi, kvm-devel, John Stultz


* Rusty Russell <rusty@rustcorp.com.au> wrote:

> No.  tsc is very good, it's not perfect.  If a paravirt clock 
> registers 400 it really means "pick me over the tsc".

often the TSC is not perfect, but _IF_ it's perfect, using the paravirt 
driver is a pessimisation in performance.

the main problem at the moment is that there's no mechanism at the 
moment to convey to the guest the information that the TSC is "perfect", 
and to convey the calibration values.

> That's *why* they use > 400: it's in the documentation.

static values do not capture conditional quality like that of the TSC.

and just in case it's not obvious: i am not arguing for the inclusion of 
the patch, i'm just pointing out the plain fact that in the case where 
the TSC _is_ reliable, 5 different clocksource drivers for has obvious 
disadvantages. Anyone arguing against that simple point needs his head 
examined :) Once we can pass around calibration information from the 
host to the guest (which we dont do at the moment) there will be reason 
not to use the native clocksource driver in the guest.

in the long run, the paravirt clocksource drivers must become _fallback_ 
drivers, for the case when the TSC is not perfect. Instead of the 
current "lets try to make it reliable but also nearly as fast as the 
TSC", which leads to inevitable breakage on SMP.

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-30  7:37     ` Ingo Molnar
@ 2007-10-30 10:52       ` Rusty Russell
  2007-10-30 12:13       ` Glauber de Oliveira Costa
  1 sibling, 0 replies; 29+ messages in thread
From: Rusty Russell @ 2007-10-30 10:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Glauber de Oliveira Costa, LKML,
	Jeremy Fitzhardinge, avi, kvm-devel, John Stultz

On Tuesday 30 October 2007 18:37:36 Ingo Molnar wrote:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> > No.  tsc is very good, it's not perfect.  If a paravirt clock
> > registers 400 it really means "pick me over the tsc".
>
> often the TSC is not perfect, but _IF_ it's perfect, using the paravirt
> driver is a pessimisation in performance.
>
> the main problem at the moment is that there's no mechanism at the
> moment to convey to the guest the information that the TSC is "perfect",
> and to convey the calibration values.

The host can communicate to the guest what clock to use: the guest can decide 
to register a paravirt clock or not depending on whether it wants to leave it 
to the TSC.

For a while we couldn't remove the TSC cpuid capability in the guest, because 
if you configured your kernel in some ways it was hardcoded on.  I think 
the "all 686+ have a tsc" assumption has now been fixed, so I should change 
the lguest clock to do as you said: register its clock at lower prio to the 
TSC and then the host can simply remove the TSC cpuid if it isn't suitable 
for the guest to use.

ISTR the core TSC timing code (which lguest could use) and various hardware 
manipulations (which lguest couldn't) were intertwined, but I'll have to go 
back and check exactly what the issue was.

> and just in case it's not obvious: i am not arguing for the inclusion of
> the patch

Unfortunately, you and Thomas both acked the patch.  This says v bad things 
about how much review kernel patches get.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 22:55     ` Zachary Amsden
  2007-10-29 23:02       ` Ingo Molnar
@ 2007-10-30 11:59       ` Glauber de Oliveira Costa
  1 sibling, 0 replies; 29+ messages in thread
From: Glauber de Oliveira Costa @ 2007-10-30 11:59 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Ingo Molnar, linux-kernel, tglx, rusty, jeremy, --cc, avi,
	kvm-devel, Glauber de Oliveira Costa, Dan Hecht, Garrett Smith

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Zachary Amsden escreveu:
> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
>> * Zachary Amsden <zach@vmware.com> wrote:
> 
>> if it's inaccurate why are you exposing it to the guest then? Native 
>> only uses the TSC if it's safe and accurate to do so.
> 
> Not every guest support paravirt, but for correctness, all guests
> require TSC, which must be exposed all the way up to userspace, no
> matter what the efficiency or accuracy may be.
> 
> Zach
> 
However, with such accuracy, it will fail the tsc clocksources tests. If
it passes, it'a good clocksource to use.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJxyujYI8LaFUWXMRAhI5AKCDl/O7iA3TdtpvElVg8NoSDVfKpgCeNvz3
ZEZDMxg8T7FA2R+5cgH/uKI=
=8qSH
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-29 23:13         ` Zachary Amsden
  2007-10-29 23:17           ` Ingo Molnar
@ 2007-10-30 12:02           ` Glauber de Oliveira Costa
  2007-10-30 17:58             ` Zachary Amsden
  1 sibling, 1 reply; 29+ messages in thread
From: Glauber de Oliveira Costa @ 2007-10-30 12:02 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Ingo Molnar, linux-kernel, tglx, rusty, jeremy, --cc, avi,
	kvm-devel, Glauber de Oliveira Costa, Dan Hecht, Garrett Smith

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Zachary Amsden escreveu:
> On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
>> * Zachary Amsden <zach@vmware.com> wrote:
> 
>>> Not every guest support paravirt, but for correctness, all guests 
>>> require TSC, which must be exposed all the way up to userspace, no 
>>> matter what the efficiency or accuracy may be.
>> but if there's a perfect TSC available (there is such hardware) then the 
>> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
>> in essence.
> 
> No, if no paravirt clocksource is detected, nothing can override the
> perfect TSC hardware clocksource rating of 400.  And if a paravirt
> clocksource is detected, it is always better than TSC.

Why always? tsc is the best possible alternative the _platform_ can
provide. So the paravirt clocksource will be at best, as good as tsc.
And if it is the case: why bother with communication mechanisms of all
kinds, calculations, etc, if we can just read the tsc?

Noting again: If the tsc does not arrive accurate to the guest, it will
fail the tsc clocksource tests. So it will be rated zero anyway



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJx1KjYI8LaFUWXMRAv0hAJ4sj0Z1FraYrgbU5Mj0pWOJGk6jtwCfc5xL
jpTC273X0oqPTCR7NcVHJOI=
=WPzV
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-30  7:37     ` Ingo Molnar
  2007-10-30 10:52       ` Rusty Russell
@ 2007-10-30 12:13       ` Glauber de Oliveira Costa
  1 sibling, 0 replies; 29+ messages in thread
From: Glauber de Oliveira Costa @ 2007-10-30 12:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rusty Russell, Thomas Gleixner, LKML, Jeremy Fitzhardinge, --cc,
	avi, kvm-devel, John Stultz

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ingo Molnar escreveu:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> and just in case it's not obvious: i am not arguing for the inclusion of 
> the patch, i'm just pointing out the plain fact that in the case where 
> the TSC _is_ reliable, 5 different clocksource drivers for has obvious 
> disadvantages. Anyone arguing against that simple point needs his head 
> examined :) Once we can pass around calibration information from the 
> host to the guest (which we dont do at the moment) there will be reason 
> not to use the native clocksource driver in the guest.
If you sustain that we cannot have a reliable synchronization test
mechanism, neither do I. All this is based in the assumption that a bad
tsc will fail such tests.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJyAHjYI8LaFUWXMRAtYoAJ9l5kNodRLfTsNHDvyioufM7SQzTACfarEy
XXq3sDq9uxZ/72hhA46YmgM=
=DwN/
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] raise tsc clocksource rating
  2007-10-30 12:02           ` Glauber de Oliveira Costa
@ 2007-10-30 17:58             ` Zachary Amsden
  0 siblings, 0 replies; 29+ messages in thread
From: Zachary Amsden @ 2007-10-30 17:58 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: Ingo Molnar, linux-kernel, tglx, rusty, jeremy, --cc, avi,
	kvm-devel, Glauber de Oliveira Costa, Dan Hecht, Garrett Smith

On Tue, 2007-10-30 at 10:02 -0200, Glauber de Oliveira Costa wrote:

> > No, if no paravirt clocksource is detected, nothing can override the
> > perfect TSC hardware clocksource rating of 400.  And if a paravirt
> > clocksource is detected, it is always better than TSC.
> 
> Why always? tsc is the best possible alternative the _platform_ can
> provide. So the paravirt clocksource will be at best, as good as tsc.

What is the justification for this claim?  The TSC provides no
information about frequency.  The frequency must be measured from a
known fixed rate source.  This measurement is error prone, both on real
hardware, and even more so in a VM.

The TSC can only provide a single measure of elapsed periods.  It has no
concept to differentiate between elapsed periods of time spent in the
guest, which is relevant to the scheduler, and elapsed periods of time
spent in real time, which is relevant to timekeeping.

So a TSC can not drive both timekeeping and scheduling in a virtual
machine, or if it does, it does so inaccurately.


> And if it is the case: why bother with communication mechanisms of all
> kinds, calculations, etc, if we can just read the tsc?
> 
> Noting again: If the tsc does not arrive accurate to the guest, it will
> fail the tsc clocksource tests. So it will be rated zero anyway

You always need to provide a TSC, and it will always initially appear to
be accurate.  Long term, as time is reported via the TSC is an
approximation somewhere between real time and elapsed running time,
measurements made using it in a VM will drift, and it is impossible to
remove the inaccuracy from one end of the spectrum (by providing real
time exactly) without compromising it at the other end (by skewing
relative time measurement).

A paravirt clocksource is always better than a TSC because it
compensates for these effects.

Zach


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2007-10-30 17:58 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-29 23:10 [PATCH] raise tsc clocksource rating Glauber de Oliveira Costa
2007-10-29 22:17 ` Thomas Gleixner
2007-10-29 22:36   ` Ingo Molnar
2007-10-30  1:26   ` john stultz
2007-10-30  2:39   ` Rusty Russell
2007-10-30  7:37     ` Ingo Molnar
2007-10-30 10:52       ` Rusty Russell
2007-10-30 12:13       ` Glauber de Oliveira Costa
2007-10-29 22:42 ` Zachary Amsden
2007-10-29 22:45   ` Jeremy Fitzhardinge
2007-10-29 22:48   ` Ingo Molnar
2007-10-29 22:52     ` Jeremy Fitzhardinge
2007-10-29 22:55       ` Ingo Molnar
2007-10-29 23:17         ` Jeremy Fitzhardinge
2007-10-29 23:21           ` Ingo Molnar
2007-10-29 23:33             ` Jeremy Fitzhardinge
2007-10-30  0:45             ` Ian Pratt
2007-10-30  7:19               ` Ingo Molnar
2007-10-29 22:55     ` Zachary Amsden
2007-10-29 23:02       ` Ingo Molnar
2007-10-29 23:13         ` Zachary Amsden
2007-10-29 23:17           ` Ingo Molnar
2007-10-30 12:02           ` Glauber de Oliveira Costa
2007-10-30 17:58             ` Zachary Amsden
2007-10-29 23:24         ` Dan Hecht
2007-10-30  4:24           ` [kvm-devel] " Avi Kivity
2007-10-30  7:14           ` Ingo Molnar
2007-10-30 11:59       ` Glauber de Oliveira Costa
2007-10-30  0:17 ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox