public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/3] kvmclock - the host part.
       [not found]   ` <11945615703593-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2007-11-08 22:39     ` Glauber de Oliveira Costa
       [not found]       ` <11945615751747-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-08 22:39 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: jeremy-TSDbQ3PG+2Y, hollisb-r/Jw6+rmf7HQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	avi-atKUWr5tajBWk0Htik3J/w, Glauber de Oliveira Costa

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 drivers/kvm/kvm_main.c |    1 +
 drivers/kvm/x86.c      |   32 ++++++++++++++++++++++++++++++++
 drivers/kvm/x86.h      |    4 ++++
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index d095002..c2c79b8 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1243,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
 		case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
 		case KVM_CAP_USER_MEMORY:
 		case KVM_CAP_SET_TSS_ADDR:
+		case KVM_CAP_CLOCKSOURCE:
 			r = 1;
 			break;
 		default:
diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..ef31fed 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -19,6 +19,7 @@
 #include "segment_descriptor.h"
 #include "irq.h"
 
+#include <linux/clocksource.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
 #include <linux/vmalloc.h>
@@ -1628,6 +1629,28 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
+static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
+{
+	struct timespec ts;
+	int r;
+
+	if (!vcpu->clock_gpa)
+		return;
+
+	/* Updates version to the next odd number, indicating we're writing */
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE);
+
+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
+	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
+	vcpu->hv_clock.wc_sec = get_seconds();
+	vcpu->hv_clock.version++;
+	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa, &vcpu->hv_clock, PAGE_SIZE);
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -1648,7 +1671,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}
 
+	ret = 0;
 	switch (nr) {
+	case  KVM_HCALL_REGISTER_CLOCK:
+
+		vcpu->clock_gpa = a0 << PAGE_SHIFT;
+		vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
+
+		break;
+
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -1924,6 +1955,7 @@ out:
 		goto preempted;
 	}
 
+	kvm_write_guest_time(vcpu);
 	post_kvm_run_save(vcpu, kvm_run);
 
 	return r;
diff --git a/drivers/kvm/x86.h b/drivers/kvm/x86.h
index 663b822..fd77b66 100644
--- a/drivers/kvm/x86.h
+++ b/drivers/kvm/x86.h
@@ -83,6 +83,10 @@ struct kvm_vcpu {
 	/* emulate context */
 
 	struct x86_emulate_ctxt emulate_ctxt;
+
+	struct kvm_hv_clock_s hv_clock;
+	gpa_t clock_gpa; /* guest frame number, physical addr */
+
 };
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code);
-- 
1.5.0.6


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]       ` <11945615751747-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2007-11-11 10:17         ` Avi Kivity
       [not found]           ` <4736D6C0.8040408-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-11-13  5:00         ` Dong, Eddie
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-11-11 10:17 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writting to it periodically.
>
>   

Missing live migration support  (a way for userspace to read and write 
the guest clock address).  Should probably be in a separate patch.

> @@ -1924,6 +1955,7 @@ out:
>  		goto preempted;
>  	}
>  
> +	kvm_write_guest_time(vcpu);
>  	post_kvm_run_save(vcpu, kvm_run);
>   

Why here?  Seems like we're leaving the guest for a while at this place.

Suggest putting it on top of __vcpu_run(), guarded by a flag, and 
setting the flag every time we put the vcpu.

 



-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]       ` <11945615751747-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2007-11-11 10:17         ` Avi Kivity
@ 2007-11-13  5:00         ` Dong, Eddie
       [not found]           ` <10EA09EFD8728347A513008B6B0DA77A025DF8A2-wq7ZOvIWXbNpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Dong, Eddie @ 2007-11-13  5:00 UTC (permalink / raw)
  To: Glauber de Oliveira Costa, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	avi-atKUWr5tajBWk0Htik3J/w, hollisb-r/Jw6+rmf7HQT0dZR+AlfA


> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
> +	struct timespec ts;
> +	int r;
> +
> +	if (!vcpu->clock_gpa)
> +		return;
> +
> +	/* Updates version to the next odd number, indicating
> we're writing */
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
> &vcpu->hv_clock, PAGE_SIZE);
> +
> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.last_tsc);
> +
> +	ktime_get_ts(&ts);

Do we need to disable preemption here?

> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
> (u64)ts.tv_sec); +	vcpu->hv_clock.wc_sec = get_seconds();
> +	vcpu->hv_clock.version++;
> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
> &vcpu->hv_clock, PAGE_SIZE);
> +}
> +
thx,eddie

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]           ` <4736D6C0.8040408-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-13 11:28             ` Glauber de Oliveira Costa
       [not found]               ` <47398A68.8080009-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-13 11:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Avi Kivity escreveu:
> Glauber de Oliveira Costa wrote:
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writting to it
>> periodically.
>>
>>   
> 
> Missing live migration support  (a way for userspace to read and write
> the guest clock address).  Should probably be in a separate patch.

I think it's a matter of issuing a hypercall for reading the clock
address. It's fair simple, and can be done in a later version of this patch.
As for writting, the register hypercall itself can be used. It has no
special side-effects we should care about.

>> @@ -1924,6 +1955,7 @@ out:
>>          goto preempted;
>>      }
>>  
>> +    kvm_write_guest_time(vcpu);
>>      post_kvm_run_save(vcpu, kvm_run);
>>   
> 
> Why here?  Seems like we're leaving the guest for a while at this place.
> 
> Suggest putting it on top of __vcpu_run(), guarded by a flag, and
> setting the flag every time we put the vcpu.

No special preference. It just sounded exity enough to me. I can move to
where you suggest.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHOYpojYI8LaFUWXMRApf8AJ4jQ/ZTBlub1IwFkJrYZyart+f7bwCfT+9m
l1Rblsmw96ZatCf60g2dNYY=
=DBpn
-----END PGP SIGNATURE-----

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]           ` <10EA09EFD8728347A513008B6B0DA77A025DF8A2-wq7ZOvIWXbNpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2007-11-13 11:54             ` Glauber de Oliveira Costa
       [not found]               ` <4739906B.2080103-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2007-11-13 11:54 UTC (permalink / raw)
  To: Dong, Eddie
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA, avi-atKUWr5tajBWk0Htik3J/w

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dong, Eddie escreveu:
>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>> +	struct timespec ts;
>> +	int r;
>> +
>> +	if (!vcpu->clock_gpa)
>> +		return;
>> +
>> +	/* Updates version to the next odd number, indicating
>> we're writing */
>> +	vcpu->hv_clock.version++;
>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>> &vcpu->hv_clock, PAGE_SIZE);
>> +
>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>> +			  &vcpu->hv_clock.last_tsc);
>> +
>> +	ktime_get_ts(&ts);
> 
> Do we need to disable preemption here?
After thinking for a little while, you are theoretically right.
In the current state, we could even be preempted between all operations
;-) Maybe after avi's suggestion of moving the call to it it will end up
in a preempt safe region, but anyway, it's safer to add the preempt
markers here.
I'll put it in next version, thanks

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHOZBrjYI8LaFUWXMRAo81AKCfbkzhLl7F6BUjzUHVyErCFeHxFACg1teB
eqsOnJiAqB3JiYf+2YdMZ4o=
=ENKU
-----END PGP SIGNATURE-----

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]               ` <4739906B.2080103-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2007-11-13 12:08                 ` Izik Eidus
  2007-11-13 14:47                 ` Avi Kivity
  1 sibling, 0 replies; 19+ messages in thread
From: Izik Eidus @ 2007-11-13 12:08 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: jeremy-TSDbQ3PG+2Y, hollisb-r/Jw6+rmf7HQT0dZR+AlfA,
	kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, avi-atKUWr5tajBWk0Htik3J/w

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dong, Eddie escreveu:
>   
>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>> +	struct timespec ts;
>>> +	int r;
>>> +
>>> +	if (!vcpu->clock_gpa)
>>> +		return;
>>> +
>>> +	/* Updates version to the next odd number, indicating
>>> we're writing */
>>> +	vcpu->hv_clock.version++;
>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>> &vcpu->hv_clock, PAGE_SIZE);
>>> +
>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>> +			  &vcpu->hv_clock.last_tsc);
>>> +
>>> +	ktime_get_ts(&ts);
>>>       
>> Do we need to disable preemption here?
>>     
> After thinking for a little while, you are theoretically right.
> In the current state, we could even be preempted between all operations
> ;-) Maybe after avi's suggestion of moving the call to it it will end up
> in a preempt safe region, but anyway, it's safer to add the preempt
> markers here.
> I'll put it in next version, thanks
>   
note that you cant call to kvm_write_guest with preemption disabled 
(witch you do few lines below)

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]               ` <47398A68.8080009-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2007-11-13 14:44                 ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2007-11-13 14:44 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Avi Kivity escreveu:
>   
>> Glauber de Oliveira Costa wrote:
>>     
>>> This is the host part of kvm clocksource implementation. As it does
>>> not include clockevents, it is a fairly simple implementation. We
>>> only have to register a per-vcpu area, and start writting to it
>>> periodically.
>>>
>>>   
>>>       
>> Missing live migration support  (a way for userspace to read and write
>> the guest clock address).  Should probably be in a separate patch.
>>     
>
> I think it's a matter of issuing a hypercall for reading the clock
> address. It's fair simple, and can be done in a later version of this patch.
> As for writting, the register hypercall itself can be used. It has no
> special side-effects we should care about.
>   

kvm live migration is done (at least thus far) without guest 
involvement.  So you need the host to be able to transfer this state.


>   
>>> @@ -1924,6 +1955,7 @@ out:
>>>          goto preempted;
>>>      }
>>>  
>>> +    kvm_write_guest_time(vcpu);
>>>      post_kvm_run_save(vcpu, kvm_run);
>>>   
>>>       
>> Why here?  Seems like we're leaving the guest for a while at this place.
>>
>> Suggest putting it on top of __vcpu_run(), guarded by a flag, and
>> setting the flag every time we put the vcpu.
>>     
>
> No special preference. It just sounded exity enough to me. I can move to
> where you suggest.
>
>   

It's more than a place, it's a set of rules:

- if the vcpu is migrated, we need a new timebase
- ditto if we're descheduled (well that's the same thing)
- if "some time passes"


-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]               ` <4739906B.2080103-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2007-11-13 12:08                 ` Izik Eidus
@ 2007-11-13 14:47                 ` Avi Kivity
       [not found]                   ` <4739B916.4000405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-11-13 14:47 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA

Glauber de Oliveira Costa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dong, Eddie escreveu:
>   
>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>> +	struct timespec ts;
>>> +	int r;
>>> +
>>> +	if (!vcpu->clock_gpa)
>>> +		return;
>>> +
>>> +	/* Updates version to the next odd number, indicating
>>> we're writing */
>>> +	vcpu->hv_clock.version++;
>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>> &vcpu->hv_clock, PAGE_SIZE);
>>> +
>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>> +			  &vcpu->hv_clock.last_tsc);
>>> +
>>> +	ktime_get_ts(&ts);
>>>       
>> Do we need to disable preemption here?
>>     
> After thinking for a little while, you are theoretically right.
> In the current state, we could even be preempted between all operations
> ;-) Maybe after avi's suggestion of moving the call to it it will end up
> in a preempt safe region, but anyway, it's safer to add the preempt
> markers here.
> I'll put it in next version, thanks
>
>   

Well, you can't kvm_write_guest() with preemption enabled.

preempt notifiers to the rescue!  We have a callout during preemption, 
so you can just zero out a flag there, and when we're scheduled again 
retry the whole thing.

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]                   ` <4739B916.4000405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-13 15:23                     ` Dong, Eddie
       [not found]                       ` <10EA09EFD8728347A513008B6B0DA77A025DFB75-wq7ZOvIWXbNpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Dong, Eddie @ 2007-11-13 15:23 UTC (permalink / raw)
  To: Avi Kivity, Glauber de Oliveira Costa
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA

Avi Kivity wrote:
> Glauber de Oliveira Costa wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> Dong, Eddie escreveu:
>> 
>>>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu) +{
>>>> +	struct timespec ts; +	int r;
>>>> +
>>>> +	if (!vcpu->clock_gpa)
>>>> +		return;
>>>> +
>>>> +	/* Updates version to the next odd number, indicating we're
>>>> writing */ +	vcpu->hv_clock.version++;
>>>> +	kvm_write_guest(vcpu->kvm, vcpu->clock_gpa,
>>>> &vcpu->hv_clock, PAGE_SIZE);
>>>> +
>>>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>>>> +			  &vcpu->hv_clock.last_tsc);
>>>> +
>>>> +	ktime_get_ts(&ts);
>>>> 
>>> Do we need to disable preemption here?
>>> 
>> After thinking for a little while, you are theoretically right.
>> In the current state, we could even be preempted between all
>> operations ;-) Maybe after avi's suggestion of moving the call to it
>> it will end up in a preempt safe region, but anyway, it's safer to
>> add the preempt markers here. I'll put it in next version, thanks
>> 
>> 
> 
> Well, you can't kvm_write_guest() with preemption enabled.
> 
> preempt notifiers to the rescue!  We have a callout during preemption,
> so you can just zero out a flag there, and when we're scheduled again
> retry the whole thing. 
> 

The preemption issue is within following code which need to be done in a
short enough period.

+	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.last_tsc);
+
+	ktime_get_ts(&ts);
+	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
(u64)ts.tv_sec);
+	vcpu->hv_clock.wc_sec = get_seconds();

I am even thinking we have to disable interrupt between these lines,
otherwise
guest wall clock may see backward time source when calculating the
delta TSC since last vcpu->hv_clock.now_ns update.

Also why we use both ktime_get_ts(&ts) & get_seconds() together? they
are not atomic
and may cause issues?

thx,eddie

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]                       ` <10EA09EFD8728347A513008B6B0DA77A025DFB75-wq7ZOvIWXbNpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2007-11-13 16:12                         ` Avi Kivity
       [not found]                           ` <4739CD03.90406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-11-13 16:12 UTC (permalink / raw)
  To: Dong, Eddie
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA, Glauber de Oliveira Costa

Dong, Eddie wrote:
>>>
>>> After thinking for a little while, you are theoretically right.
>>> In the current state, we could even be preempted between all
>>> operations ;-) Maybe after avi's suggestion of moving the call to it
>>> it will end up in a preempt safe region, but anyway, it's safer to
>>> add the preempt markers here. I'll put it in next version, thanks
>>>
>>>
>>>       
>> Well, you can't kvm_write_guest() with preemption enabled.
>>
>> preempt notifiers to the rescue!  We have a callout during preemption,
>> so you can just zero out a flag there, and when we're scheduled again
>> retry the whole thing. 
>>
>>     
>
> The preemption issue is within following code which need to be done in a
> short enough period.
>
> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.last_tsc);
> +
> +	ktime_get_ts(&ts);
> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
> (u64)ts.tv_sec);
> +	vcpu->hv_clock.wc_sec = get_seconds();
>
> I am even thinking we have to disable interrupt between these lines,
> otherwise
> guest wall clock may see backward time source when calculating the
> delta TSC since last vcpu->hv_clock.now_ns update.
>   

That's true.  While we do need to handle vcpu migration and 
descheduling, the code sequence you note needs to be as atomic as possible.


-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]                           ` <4739CD03.90406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-11-14  0:41                             ` Dong, Eddie
  0 siblings, 0 replies; 19+ messages in thread
From: Dong, Eddie @ 2007-11-14  0:41 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	hollisb-r/Jw6+rmf7HQT0dZR+AlfA, Glauber de Oliveira Costa

>> 
>> +	kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER,
>> +			  &vcpu->hv_clock.last_tsc);
>> +
>> +	ktime_get_ts(&ts);
>> +	vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC *
>> (u64)ts.tv_sec); +	vcpu->hv_clock.wc_sec = get_seconds();
>> 
>> I am even thinking we have to disable interrupt between these lines,
>> otherwise guest wall clock may see backward time source when
>> calculating the delta TSC since last vcpu->hv_clock.now_ns update.
>> 
> 
> That's true.  While we do need to handle vcpu migration and
> descheduling, the code sequence you note needs to be as atomic as
> possible. 
> 

Yes VCPU migration is another issue. Since the physical platform TSC is 
totally different across migration, so we can't expose host TSC to
guest.
For KVM, it should be simple to just use guest_TSC (=HOST_TSC + offset).

thanks, eddie

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 0/3] kvm clock - with the patches this time
@ 2008-01-15 15:10 Glauber de Oliveira Costa
       [not found] ` <1200409817411-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2008-01-15 15:10 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w

The 3 patches should be attached now.

userspace follows



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] put kvm_para.h include outside __KERNEL__
       [not found] ` <1200409817411-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-01-15 15:10   ` Glauber de Oliveira Costa
       [not found]     ` <120040983179-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2008-01-15 15:10 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w,
	Glauber de Oliveira Costa

kvm_para.h potentially contains definitions that are to be used by kvm-userspace,
so it should not be included inside the __KERNEL__ block. To protect its own data structures,
kvm_para.h already includes its own __KERNEL__ block.

Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: Amit Shah <amit.shah-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
---
 include/linux/kvm_para.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 6af91a5..5497aac 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -14,12 +14,12 @@
 
 #define KVM_HC_VAPIC_POLL_IRQ            1
 
-#ifdef __KERNEL__
 /*
  * hypercalls use architecture specific
  */
 #include <asm/kvm_para.h>
 
+#ifdef __KERNEL__
 static inline int kvm_para_has_feature(unsigned int feature)
 {
 	if (kvm_arch_para_features() & (1UL << feature))
-- 
1.5.0.6


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] kvmclock - the host part.
       [not found]     ` <120040983179-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-01-15 15:10       ` Glauber de Oliveira Costa
       [not found]         ` <12004098373625-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2008-01-16  8:41       ` [PATCH 1/3] put kvm_para.h include outside __KERNEL__ Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2008-01-15 15:10 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w,
	Glauber de Oliveira Costa

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

The area is binary compatible with xen, as we use the same shadow_info structure.

Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 arch/x86/kvm/x86.c         |   79 +++++++++++++++++++++++++++++++++++++++++++-
 include/asm-x86/kvm_host.h |    4 ++
 include/asm-x86/kvm_para.h |   37 ++++++++++++++++++++
 include/linux/kvm.h        |    1 +
 4 files changed, 120 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8a90403..53b5692 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -19,6 +19,7 @@
 #include "irq.h"
 #include "mmu.h"
 
+#include <linux/clocksource.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
 #include <linux/vmalloc.h>
@@ -412,7 +413,7 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
 	MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-	MSR_IA32_TIME_STAMP_COUNTER,
+	MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_PARAVIRT_CLOCK,
 };
 
 static unsigned num_msrs_to_save;
@@ -467,6 +468,60 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr(vcpu, index, *data);
 }
 
+#define WC_OFFSET offsetof(struct xen_shared_info, wc_version)
+
+static void kvm_write_guest_time(struct kvm_vcpu *v)
+{
+	struct timespec ts, wc_ts;
+	int wc_args[3]; /* version, wc_sec, wc_nsec */
+	unsigned long flags;
+	struct kvm_vcpu_arch *vcpu = &v->arch;
+	struct xen_shared_info *shared_kaddr;
+
+	if ((!vcpu->shared_page))
+		return;
+
+	/* Keep irq disabled to prevent changes to the clock */
+	local_irq_save(flags);
+	kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
+			  &vcpu->hv_clock.tsc_timestamp);
+	wc_ts = current_kernel_time();
+	ktime_get_ts(&ts);
+	local_irq_restore(flags);
+
+	/* With all the info we got, fill in the values */
+	wc_args[1] = wc_ts.tv_sec;
+	wc_args[2] = wc_ts.tv_nsec;
+
+	vcpu->hv_clock.system_time = ts.tv_nsec +
+				     (NSEC_PER_SEC * (u64)ts.tv_sec);
+	/*
+	 * The interface expects us to write an even number signaling that the
+	 * update is finished. Since the guest won't see the intermediate states,
+	 * we just write "2" at the end
+	 */
+	wc_args[0] = 2;
+	vcpu->hv_clock.version = 2;
+
+	preempt_disable();
+
+	shared_kaddr = kmap_atomic(vcpu->shared_page, KM_USER0);
+
+	/*
+	 * We could write everything at once, but it can break future
+	 * implementations. We're just a tiny and lonely clock, so let's
+	 * write only what matters here
+	 */
+	memcpy(&shared_kaddr->wc_version, wc_args, sizeof(wc_args));
+	memcpy(&shared_kaddr->vcpu_info[v->vcpu_id].time, &vcpu->hv_clock,
+		sizeof(vcpu->hv_clock));
+
+	kunmap_atomic(shared_kaddr, KM_USER0);
+	preempt_enable();
+
+	mark_page_dirty(v->kvm, vcpu->shared_info >> PAGE_SHIFT);
+}
+
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
@@ -494,6 +549,20 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	case MSR_IA32_MISC_ENABLE:
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
+	case MSR_KVM_PARAVIRT_CLOCK: {
+		vcpu->arch.shared_info = data;
+
+		vcpu->arch.hv_clock.tsc_to_system_mul =
+					clocksource_khz2mult(tsc_khz, 22);
+		vcpu->arch.hv_clock.tsc_shift = 22;
+
+		down_write(&current->mm->mmap_sem);
+		vcpu->arch.shared_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
+		up_write(&current->mm->mmap_sem);
+		if (is_error_page(vcpu->arch.shared_page))
+			vcpu->arch.shared_page = NULL;
+		break;
+	}
 	default:
 		pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
 		return 1;
@@ -553,6 +622,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 		data = vcpu->arch.shadow_efer;
 		break;
 #endif
+	case MSR_KVM_PARAVIRT_CLOCK:
+		data = vcpu->arch.shared_info;
+		break;
+
 	default:
 		pr_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
 		return 1;
@@ -680,6 +753,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_USER_MEMORY:
 	case KVM_CAP_SET_TSS_ADDR:
 	case KVM_CAP_EXT_CPUID:
+	case KVM_CAP_CLOCKSOURCE:
 		r = 1;
 		break;
 	case KVM_CAP_VAPIC:
@@ -737,6 +811,7 @@ out:
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	kvm_x86_ops->vcpu_load(vcpu, cpu);
+	kvm_write_guest_time(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -3190,6 +3265,8 @@ static void kvm_free_vcpus(struct kvm *kvm)
 			kvm_unload_vcpu_mmu(kvm->vcpus[i]);
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
+			if (kvm->vcpus[i]->arch.shared_page)
+				kvm_release_page_dirty(kvm->vcpus[i]->arch.shared_page);
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
 			kvm->vcpus[i] = NULL;
 		}
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index d6db0de..9a66b90 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -261,6 +261,10 @@ struct kvm_vcpu_arch {
 	/* emulate context */
 
 	struct x86_emulate_ctxt emulate_ctxt;
+
+	struct xen_vcpu_time_info hv_clock;
+	gpa_t shared_info;
+	struct page *shared_page;
 };
 
 struct kvm_mem_alias {
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..145107d 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,10 +10,47 @@
  * paravirtualization, the appropriate feature bit should be checked.
  */
 #define KVM_CPUID_FEATURES	0x40000001
+#define KVM_FEATURE_CLOCKSOURCE 0
+
+#define MSR_KVM_PARAVIRT_CLOCK  0x11
 
 #ifdef __KERNEL__
 #include <asm/processor.h>
 
+/* xen binary-compatible interfaces. See xen headers for details */
+struct xen_vcpu_time_info {
+	uint32_t version;
+	uint32_t pad0;
+	uint64_t tsc_timestamp;
+	uint64_t system_time;
+	uint32_t tsc_to_system_mul;
+	int8_t   tsc_shift;
+	int8_t   pad1[3];
+};
+
+struct xen_vcpu_info {
+	uint8_t  pad[32];
+	struct xen_vcpu_time_info time;
+};
+
+#define XEN_MAX_VIRT_CPUS	32
+
+struct xen_shared_info {
+	struct xen_vcpu_info vcpu_info[XEN_MAX_VIRT_CPUS];
+
+	unsigned long evt[2];
+
+	uint32_t wc_version;      /* Version counter: see vcpu_time_info_t. */
+	uint32_t wc_sec;          /* Secs  00:00:00 UTC, Jan 1, 1970.  */
+	uint32_t wc_nsec;         /* Nsecs 00:00:00 UTC, Jan 1, 1970.  */
+
+	unsigned long pad[12];
+};
+
+
+extern void kvmclock_init(void);
+
+
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
  */
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 4de4fd2..78ce53f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -232,6 +232,7 @@ struct kvm_vapic_addr {
 #define KVM_CAP_SET_TSS_ADDR 4
 #define KVM_CAP_EXT_CPUID 5
 #define KVM_CAP_VAPIC 6
+#define KVM_CAP_CLOCKSOURCE 7
 
 /*
  * ioctls for VM fds
-- 
1.5.0.6


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] [PATCH] kvmclock implementation, the guest part.
       [not found]         ` <12004098373625-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-01-15 15:10           ` Glauber de Oliveira Costa
  2008-01-16  8:39           ` [PATCH 2/3] kvmclock - the host part Avi Kivity
  1 sibling, 0 replies; 19+ messages in thread
From: Glauber de Oliveira Costa @ 2008-01-15 15:10 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
  Cc: jeremy-TSDbQ3PG+2Y, avi-atKUWr5tajBWk0Htik3J/w,
	Glauber de Oliveira Costa

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

Signed-off-by: Glauber de Oliveira Costa <gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 arch/x86/Kconfig            |   10 +++
 arch/x86/kernel/Makefile_32 |    1 +
 arch/x86/kernel/kvmclock.c  |  161 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup_32.c  |    5 ++
 4 files changed, 177 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab2df55..968315e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -350,6 +350,16 @@ config VMI
 	  at the moment), by linking the kernel to a GPL-ed ROM module
 	  provided by the hypervisor.
 
+config KVM_CLOCK
+	bool "KVM paravirtualized clock"
+	select PARAVIRT
+	help
+	  Turning on this option will allow you to run a paravirtualized clock
+	  when running over the KVM hypervisor. Instead of relying on a PIT
+	  (or probably other) emulation by the underlying device model, the host
+	  provides the guest with timing infrastructure, as time of day, and
+	  timer expiration.
+
 source "arch/x86/lguest/Kconfig"
 
 endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index a7bc93c..f6332b6 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
 
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK)		+= kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o
 obj-y				+= pcspeaker.o
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 0000000..317bee0
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,161 @@
+/*  KVM paravirtual clock driver. A clocksource implementation
+    Copyright (C) 2008 Glauber de Oliveira Costa, Red Hat Inc.
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+
+#include <linux/clocksource.h>
+#include <linux/kvm_para.h>
+#include <asm/arch_hooks.h>
+#include <asm/msr.h>
+
+
+#define KVM_SCALE 22
+
+static int kvmclock = 1;
+
+static int parse_no_kvmclock(char *arg)
+{
+	kvmclock = 0;
+	return 0;
+}
+early_param("no-kvmclock", parse_no_kvmclock);
+
+struct xen_shared_info shared_info __attribute__((__aligned__(PAGE_SIZE)));
+
+/* The hypervisor will put information about time periodically here */
+DEFINE_PER_CPU(struct xen_vcpu_time_info *, hv_clock);
+#define get_clock(cpu, field) per_cpu(hv_clock, cpu)->field
+
+static inline u64 kvm_get_delta(u64 last_tsc)
+{
+	int cpu = smp_processor_id();
+	u64 delta = native_read_tsc() - last_tsc;
+	return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE;
+}
+
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that. Even if the tsc is not accurate, it gives us a more accurate timing
+ * than not adjusting at all
+ */
+unsigned long kvm_get_wallclock(void)
+{
+	u32 wc_sec, wc_nsec;
+	u64 delta, last_tsc;
+	struct timespec ts;
+	int version, nsec, cpu = smp_processor_id();
+
+	do {
+		version = shared_info.wc_version;
+		rmb();
+		wc_sec = shared_info.wc_sec;
+		wc_nsec = shared_info.wc_nsec;
+		last_tsc = get_clock(cpu, tsc_timestamp);
+		rmb();
+	} while ((shared_info.wc_version != version) || (version & 1));
+
+	delta = kvm_get_delta(last_tsc);
+	delta += wc_nsec;
+	nsec = do_div(delta, NSEC_PER_SEC);
+	set_normalized_timespec(&ts, wc_sec + delta, nsec);
+	/*
+	 * Of all mechanisms of time adjustment I've tested, this one
+	 * was the champion!
+	 */
+	return ts.tv_sec + 1;
+}
+
+int kvm_set_wallclock(unsigned long now)
+{
+	return 0;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc timestamp each time
+ * it updates a new time, and then we can use it to derive a slightly more
+ * precise notion of elapsed time, converted to nanoseconds.
+ */
+static cycle_t kvm_clock_read(void)
+{
+	u64 last_tsc, now;
+	u32 version;
+	int cpu;
+
+	preempt_disable();
+	cpu = smp_processor_id();
+
+	do {
+		version = get_clock(cpu, version);
+		rmb();
+		last_tsc = get_clock(cpu, tsc_timestamp);
+		now = get_clock(cpu, system_time);
+		rmb();
+	} while ((get_clock(cpu, version) != version) || (version & 1));
+
+	now += kvm_get_delta(last_tsc);
+	preempt_enable();
+
+	return now;
+}
+
+static struct clocksource kvm_clock = {
+	.name = "kvm-clock",
+	.read = kvm_clock_read,
+	.rating = 400,
+	.mask = CLOCKSOURCE_MASK(64),
+	.mult = 1 << KVM_SCALE,
+	.shift = KVM_SCALE,
+	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
+};
+
+static int kvm_register_clock(void)
+{
+	int cpu = smp_processor_id();
+	u64 kvm_clock_info = (u64)(u64 *)&shared_info;
+
+	kvm_clock_info = __pa(kvm_clock_info);
+	__get_cpu_var(hv_clock) = &(shared_info.vcpu_info[cpu].time);
+
+	return native_write_msr_safe(MSR_KVM_PARAVIRT_CLOCK, kvm_clock_info);
+}
+
+static void kvm_setup_secondary_clock(void)
+{
+	/*
+	 * Now that the first cpu already had this clocksource initialized,
+	 * we shouldn't fail.
+	 */
+	WARN_ON(kvm_register_clock());
+	/* ok, done with our trickery, call native */
+	setup_secondary_APIC_clock();
+}
+
+void __init kvmclock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+
+	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
+		if (kvm_register_clock())
+			return;
+		pv_time_ops.get_wallclock = kvm_get_wallclock;
+		pv_time_ops.set_wallclock = kvm_set_wallclock;
+		pv_time_ops.sched_clock = kvm_clock_read;
+		pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock;
+		clocksource_register(&kvm_clock);
+	}
+}
diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c
index 9c24b45..89c0eb2 100644
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -44,6 +44,7 @@
 #include <linux/crash_dump.h>
 #include <linux/dmi.h>
 #include <linux/pfn.h>
+#include <linux/kvm_para.h>
 
 #include <video/edid.h>
 
@@ -614,6 +615,10 @@ void __init setup_arch(char **cmdline_p)
 
 	max_low_pfn = setup_memory();
 
+#ifdef CONFIG_KVM_CLOCK
+	kvmclock_init();
+#endif
+
 #ifdef CONFIG_VMI
 	/*
 	 * Must be after max_low_pfn is determined, and before kernel
-- 
1.5.0.6


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]         ` <12004098373625-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2008-01-15 15:10           ` [PATCH 3/3] [PATCH] kvmclock implementation, the guest part Glauber de Oliveira Costa
@ 2008-01-16  8:39           ` Avi Kivity
       [not found]             ` <478DC2CF.4040803-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2008-01-16  8:39 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y

Glauber de Oliveira Costa wrote:

> +static void kvm_write_guest_time(struct kvm_vcpu *v)
> +{
> +	struct timespec ts, wc_ts;
> +	int wc_args[3]; /* version, wc_sec, wc_nsec */
> +	unsigned long flags;
> +	struct kvm_vcpu_arch *vcpu = &v->arch;
> +	struct xen_shared_info *shared_kaddr;
> +
> +	if ((!vcpu->shared_page))
> +		return;
> +
> +	/* Keep irq disabled to prevent changes to the clock */
> +	local_irq_save(flags);
> +	kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
> +			  &vcpu->hv_clock.tsc_timestamp);
> +	wc_ts = current_kernel_time();
> +	ktime_get_ts(&ts);
> +	local_irq_restore(flags);
> +
> +	/* With all the info we got, fill in the values */
> +	wc_args[1] = wc_ts.tv_sec;
> +	wc_args[2] = wc_ts.tv_nsec;
> +
> +	vcpu->hv_clock.system_time = ts.tv_nsec +
> +				     (NSEC_PER_SEC * (u64)ts.tv_sec);
> +	/*
> +	 * The interface expects us to write an even number signaling that the
> +	 * update is finished. Since the guest won't see the intermediate states,
> +	 * we just write "2" at the end
> +	 */
> +	wc_args[0] = 2;
> +	vcpu->hv_clock.version = 2;
> +
> +	preempt_disable();
> +
> +	shared_kaddr = kmap_atomic(vcpu->shared_page, KM_USER0);
> +
> +	/*
> +	 * We could write everything at once, but it can break future
> +	 * implementations. We're just a tiny and lonely clock, so let's
> +	 * write only what matters here
> +	 */
> +	memcpy(&shared_kaddr->wc_version, wc_args, sizeof(wc_args));
>   


We want to avoid updating wall clock all the time.  As far as I 
understand, wall clock is just a base which doesn't change.  To get the 
real wall clock, you read the shared_info wall clock and add the current 
system time.  This means that you avoid writing to a shared global  
(which is expensive in cache lines).  The shared_info wall clock is only 
updated if the host clocked is moved (other than in the way you expect 
it to).

Also, when you write to the shared clock, you must respect the protocol 
since it can be read concurrently:

- increment the version
- smp_wmb()
- copy the goodies
- smp_wmb()
- increment the version again

[I think this is the protocol, but better read the sources to double-check]

>  		}
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index d6db0de..9a66b90 100644
> --- a/include/asm-x86/kvm_host.h
> +++ b/include/asm-x86/kvm_host.h
> @@ -261,6 +261,10 @@ struct kvm_vcpu_arch {
>  	/* emulate context */
>  
>  	struct x86_emulate_ctxt emulate_ctxt;
> +
> +	struct xen_vcpu_time_info hv_clock;
> +	gpa_t shared_info;
> +	struct page *shared_page;
>  };
>   

shared_{info,page} is too generic a name for just a clock.

>  
> +/* xen binary-compatible interfaces. See xen headers for details */
> +struct xen_vcpu_time_info {
> +	uint32_t version;
> +	uint32_t pad0;
> +	uint64_t tsc_timestamp;
> +	uint64_t system_time;
> +	uint32_t tsc_to_system_mul;
> +	int8_t   tsc_shift;
> +	int8_t   pad1[3];
> +};
> +
> +struct xen_vcpu_info {
> +	uint8_t  pad[32];
> +	struct xen_vcpu_time_info time;
> +};
>   

Please drop xen_vcpu_info...

> +
> +#define XEN_MAX_VIRT_CPUS	32
> +
> +struct xen_shared_info {
> +	struct xen_vcpu_info vcpu_info[XEN_MAX_VIRT_CPUS];
> +
> +	unsigned long evt[2];
> +
> +	uint32_t wc_version;      /* Version counter: see vcpu_time_info_t. */
> +	uint32_t wc_sec;          /* Secs  00:00:00 UTC, Jan 1, 1970.  */
> +	uint32_t wc_nsec;         /* Nsecs 00:00:00 UTC, Jan 1, 1970.  */
> +
> +	unsigned long pad[12];
> +};
>   

... and everything non-time-related in here.  Yes, in means we need two 
msrs (for wall clock and system time), but it also means we don't impose 
any layout upon the guest, and do not (for example) restrict the number 
of vcpus.  We could easily put the vcpu clock in a per_cpu() area.


-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] put kvm_para.h include outside __KERNEL__
       [not found]     ` <120040983179-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2008-01-15 15:10       ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
@ 2008-01-16  8:41       ` Avi Kivity
  1 sibling, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2008-01-16  8:41 UTC (permalink / raw)
  To: Glauber de Oliveira Costa
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y

Glauber de Oliveira Costa wrote:
> kvm_para.h potentially contains definitions that are to be used by kvm-userspace,
> so it should not be included inside the __KERNEL__ block. To protect its own data structures,
> kvm_para.h already includes its own __KERNEL__ block.
>
>   

Applied this one, thanks.

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]             ` <478DC2CF.4040803-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-16  8:54               ` Gerd Hoffmann
       [not found]                 ` <478DC662.6010902-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Gerd Hoffmann @ 2008-01-16  8:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	Glauber de Oliveira Costa

  Hi,

> We want to avoid updating wall clock all the time.  As far as I 
> understand, wall clock is just a base which doesn't change.

Yep, it is.  Got that wrong first in xenner, with the result that guest
time ran at double speed ;)

>> +/* xen binary-compatible interfaces. See xen headers for details */
>> +struct xen_vcpu_time_info {
>> +	uint32_t version;
>> +	uint32_t pad0;
>> +	uint64_t tsc_timestamp;
>> +	uint64_t system_time;
>> +	uint32_t tsc_to_system_mul;
>> +	int8_t   tsc_shift;
>> +	int8_t   pad1[3];
>> +};

>> +struct xen_vcpu_info {
>> +	uint8_t  pad[32];
>> +	struct xen_vcpu_time_info time;
>> +};
>>   
> 
> Please drop xen_vcpu_info...

Oh, yeah.  No point in assembling the whole xen shared info page.  Just
xen_vcpu_time_info is enougth, it will work just fine for xenner.

cheers,
  Gerd

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] kvmclock - the host part.
       [not found]                 ` <478DC662.6010902-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2008-01-16  9:32                   ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2008-01-16  9:32 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, jeremy-TSDbQ3PG+2Y,
	Glauber de Oliveira Costa

Gerd Hoffmann wrote:
>   Hi,
>
>   
>> We want to avoid updating wall clock all the time.  As far as I 
>> understand, wall clock is just a base which doesn't change.
>>     
>
> Yep, it is.  Got that wrong first in xenner, with the result that guest
> time ran at double speed ;)
>
>   
>>> +/* xen binary-compatible interfaces. See xen headers for details */
>>> +struct xen_vcpu_time_info {
>>> +	uint32_t version;
>>> +	uint32_t pad0;
>>> +	uint64_t tsc_timestamp;
>>> +	uint64_t system_time;
>>> +	uint32_t tsc_to_system_mul;
>>> +	int8_t   tsc_shift;
>>> +	int8_t   pad1[3];
>>> +};
>>>       
>
>   
>>> +struct xen_vcpu_info {
>>> +	uint8_t  pad[32];
>>> +	struct xen_vcpu_time_info time;
>>> +};
>>>   
>>>       
>> Please drop xen_vcpu_info...
>>     
>
> Oh, yeah.  No point in assembling the whole xen shared info page.  Just
> xen_vcpu_time_info is enougth, it will work just fine for xenner.
>
>   

We should also not use the xen_ namespace, that can only cause conflicts.

-- 
error compiling committee.c: too many arguments to function


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-01-16  9:32 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-15 15:10 [PATCH 0/3] kvm clock - with the patches this time Glauber de Oliveira Costa
     [not found] ` <1200409817411-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-01-15 15:10   ` [PATCH 1/3] put kvm_para.h include outside __KERNEL__ Glauber de Oliveira Costa
     [not found]     ` <120040983179-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-01-15 15:10       ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
     [not found]         ` <12004098373625-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-01-15 15:10           ` [PATCH 3/3] [PATCH] kvmclock implementation, the guest part Glauber de Oliveira Costa
2008-01-16  8:39           ` [PATCH 2/3] kvmclock - the host part Avi Kivity
     [not found]             ` <478DC2CF.4040803-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-16  8:54               ` Gerd Hoffmann
     [not found]                 ` <478DC662.6010902-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2008-01-16  9:32                   ` Avi Kivity
2008-01-16  8:41       ` [PATCH 1/3] put kvm_para.h include outside __KERNEL__ Avi Kivity
  -- strict thread matches above, loose matches on Subject: below --
2007-11-08 22:39 [PATCH 0/3] Kvm clocksource, new spin Glauber de Oliveira Costa
2007-11-08 22:39 ` [PATCH 1/3] include files for kvmclock Glauber de Oliveira Costa
     [not found]   ` <11945615703593-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-11-08 22:39     ` [PATCH 2/3] kvmclock - the host part Glauber de Oliveira Costa
     [not found]       ` <11945615751747-git-send-email-gcosta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-11-11 10:17         ` Avi Kivity
     [not found]           ` <4736D6C0.8040408-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-13 11:28             ` Glauber de Oliveira Costa
     [not found]               ` <47398A68.8080009-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-11-13 14:44                 ` Avi Kivity
2007-11-13  5:00         ` Dong, Eddie
     [not found]           ` <10EA09EFD8728347A513008B6B0DA77A025DF8A2-wq7ZOvIWXbNpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2007-11-13 11:54             ` Glauber de Oliveira Costa
     [not found]               ` <4739906B.2080103-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2007-11-13 12:08                 ` Izik Eidus
2007-11-13 14:47                 ` Avi Kivity
     [not found]                   ` <4739B916.4000405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-13 15:23                     ` Dong, Eddie
     [not found]                       ` <10EA09EFD8728347A513008B6B0DA77A025DFB75-wq7ZOvIWXbNpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2007-11-13 16:12                         ` Avi Kivity
     [not found]                           ` <4739CD03.90406-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-11-14  0:41                             ` Dong, Eddie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox