From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeremy Fitzhardinge Subject: Re: [RFC] Paravirt timer for KVM Date: Fri, 12 Oct 2007 11:39:20 -0700 Message-ID: <470FBF58.2080701@goop.org> References: <5d6222a80710120908s6b1f5845head84e7b7a463cd1@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: kvm-devel , Avi Kivity To: Glauber de Oliveira Costa Return-path: In-Reply-To: <5d6222a80710120908s6b1f5845head84e7b7a463cd1-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org Errors-To: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org List-Id: kvm.vger.kernel.org Glauber de Oliveira Costa wrote: > My next TODOs with it are: > * Get SMP working > * Try something for stolen time, as jeremy's last suggestion for anthony's patch > * Measure the time it takes for a hypercall, and subtract this time > for calculating the expiry time for the timer event. > I don't think there's much point in trying to do stuff like this. The guest can be preempted at any time, so there's an arbitrary amount of time between deciding to set a timeout, and the time the timeout actually happens. In theory you can mitigate this by using an absolute rather than relative timeout value, but in practice I don't think it makes much difference. > + > +/* > + * This is our read_clock function. The host puts an tsc timestamp each time > + * it updates a new time, and then we can use it to derive a slightly more > + * precise notion of elapsed time, converted to nanoseconds. > + * > + * If the platform provides a stable tsc, we just use it, and there is no need > + * for the host to update anything. How would you deal with suspend/resume/migrate? Also, do you assume that stable_tsc also means synchronized tsc on an SMP host? > + */ > +static cycle_t kvm_clock_read(void) { > + > + u64 delta, last_tsc; > + struct timespec *now; > + > + if (hv_clock.stable_tsc) { > + rdtscll(last_tsc); > + return last_tsc; > So this returns a tsc here? ----> > + } > + > + do { > + last_tsc = hv_clock.last_tsc; > + rmb(); > + now = &hv_clock.now; > Shouldn't this be taking a copy of now, rather than a pointer to it? Otherwise what's the point of this loop? > + rmb(); > + } while (hv_clock.last_tsc != last_tsc); > This won't be an atomic compare on 32-bit; it could get confused by seeing a half-updated tsc value. > + > + delta = native_read_tsc() - last_tsc; > + delta = (delta * hv_clock.tsc_mult) >> KVM_SCALE; > + > + return (cycle_t)now->tv_sec * NSEC_PER_SEC + now->tv_nsec + delta; > ---> But returns ns here? > +} > + > +static void kvm_timer_set_mode(enum clock_event_mode mode, > + struct clock_event_device *evt) > +{ > + WARN_ON(!irqs_disabled()); > + > + switch (mode) { > + case CLOCK_EVT_MODE_ONESHOT: > + /* this is what we want */ > + break; > + case CLOCK_EVT_MODE_RESUME: > + break; > + case CLOCK_EVT_MODE_PERIODIC: > + WARN_ON(1); > + break; > + case CLOCK_EVT_MODE_UNUSED: > + case CLOCK_EVT_MODE_SHUTDOWN: > + kvm_hypercall0(KVM_HCALL_STOP_ONESHOT); > + break; > + default: > + break; > + } > +} > + > +/* > + * Programming the next event is just a matter of asking the host > + * to generate us an interrupt when the time expires. We pass the > + * delta on, and hypervisor will do all remaining tricks. For a more > + * precise timing, we can just subtract the time spent by the hypercall > Not worthwhile. It would be better to make the hypercall take an absolute time, and pass it now+delta. At least then if you get preempted past the timeout period you can return -ETIME, and the clock subsystem will know what to do. > + */ > +static int kvm_timer_next_event(unsigned long delta, > + struct clock_event_device *evt) > +{ > + WARN_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT); > + kvm_hypercall1(KVM_HCALL_SET_ALARM, delta); > + return 0; > +} > + > diff --git a/arch/i386/kernel/setup.c b/arch/i386/kernel/setup.c > index d474cd6..fd758f9 100644 > --- a/arch/i386/kernel/setup.c > +++ b/arch/i386/kernel/setup.c > @@ -46,6 +46,7 @@ > #include > #include > #include > +#include > > #include