From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: kvm guest loops_per_jiffy miscalibration under host load Date: Mon, 07 Jul 2008 18:35:52 -0300 Message-ID: <48728C38.9060303@redhat.com> References: <20080702164021.GA31751@dmt.cnet> <486CD151.8020004@redhat.com> <487177CB.60104@us.ibm.com> <5d6222a80807071127s1ed157eaje2f1fcf511a34813@mail.gmail.com> <20080707184843.GA14634@dmt.cnet> <48726CC9.1070008@us.ibm.com> <5d6222a80807071232m1031ebdao9f93997fcaefd958@mail.gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------080409050303080106020303" Cc: Anthony Liguori , Marcelo Tosatti , kvm-devel , kraxel@redhat.com, chrisw@redhat.com To: Glauber Costa Return-path: Received: from mx1.redhat.com ([66.187.233.31]:35965 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758206AbYGGVh2 (ORCPT ); Mon, 7 Jul 2008 17:37:28 -0400 In-Reply-To: <5d6222a80807071232m1031ebdao9f93997fcaefd958@mail.gmail.com> Sender: kvm-owner@vger.kernel.org List-ID: This is a multi-part message in MIME format. --------------080409050303080106020303 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Glauber Costa wrote: > On Mon, Jul 7, 2008 at 4:21 PM, Anthony Liguori wrote: >> Marcelo Tosatti wrote: >>> On Mon, Jul 07, 2008 at 03:27:16PM -0300, Glauber Costa wrote: >>> >>>>> I agree. A paravirt solution solves the problem. >>>>> >>>> Please, look at the patch I've attached. >>>> >>>> It does __delay with host help. This may have the nice effect of not >>>> busy waiting for long-enough delays, and may well. >>>> >>>> It is _completely_ PoC, just to show the idea. It's ugly, broken, >>>> obviously have to go through pv-ops, etc. >>>> >>>> Also, I intend to add a lpj field in the kvm clock memory area. We >>>> could do just this later, do both, etc. >>>> >>>> If we agree this is a viable solution, I'll start working on a patch >>>> >>> This stops interrupts from being processed during the delay. And also >>> there are cases like this recently introduced break: >>> >>> /* Allow RT tasks to run */ >>> preempt_enable(); >>> rep_nop(); >>> preempt_disable(); >>> >>> I think it would be better to just pass the lpj value via paravirt and >>> let the guest busy-loop as usual. >>> >> I agree. VMI and Xen already pass a cpu_khz paravirt value. Something >> similar would probably do the trick. > > yeah, there is a pv-op for this, so I won't have to mess with the > clock interface. I'll draft a patch for it, and sent it. > >> It may be worthwhile having udelay() or spinlocks call into KVM if they've >> been spinning long enough but I think that's a separate discussion. > > I think it is, but I'd have to back it up with numbers. measurements > are on the way. >> Regards, >> >> Anthony Liguori >> >> > > > How about this? RFC only for now --------------080409050303080106020303 Content-Type: text/plain; name="preset.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="preset.patch" diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 87edf1c..8514b04 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -78,6 +78,14 @@ static cycle_t kvm_clock_read(void) return ret; } +static unsigned long kvm_get_cpu_khz(void) +{ + struct pvclock_vcpu_time_info *info; + + info = &per_cpu(hv_clock, 0); + return pv_cpu_khz(info); +} + static struct clocksource kvm_clock = { .name = "kvm-clock", .read = kvm_clock_read, @@ -153,6 +161,7 @@ void __init kvmclock_init(void) pv_time_ops.get_wallclock = kvm_get_wallclock; pv_time_ops.set_wallclock = kvm_set_wallclock; pv_time_ops.sched_clock = kvm_clock_read; + pv_time_ops.get_cpu_khz = kvm_get_cpu_khz; #ifdef CONFIG_X86_LOCAL_APIC pv_apic_ops.setup_secondary_clock = kvm_setup_secondary_clock; #endif diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 05fbe9a..2d325ae 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -97,6 +97,18 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst, return dst->version; } +unsigned long pv_cpu_khz(struct pvclock_vcpu_time_info *info) +{ + u64 cpu_khz = 1000000ULL << 32; + + do_div(cpu_khz, info->tsc_to_system_mul); + if (info->tsc_shift < 0) + cpu_khz <<= -info->tsc_shift; + else + cpu_khz >>= info->tsc_shift; + return cpu_khz; +} + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { struct pvclock_shadow_time shadow; diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 64f0038..074cabb 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -200,17 +200,10 @@ unsigned long long xen_sched_clock(void) /* Get the CPU speed from Xen */ unsigned long xen_cpu_khz(void) { - u64 xen_khz = 1000000ULL << 32; - const struct pvclock_vcpu_time_info *info = + struct pvclock_vcpu_time_info *info = &HYPERVISOR_shared_info->vcpu_info[0].time; - do_div(xen_khz, info->tsc_to_system_mul); - if (info->tsc_shift < 0) - xen_khz <<= -info->tsc_shift; - else - xen_khz >>= info->tsc_shift; - - return xen_khz; + return pv_cpu_khz(info); } static cycle_t xen_clocksource_read(void) diff --git a/include/asm-x86/pvclock.h b/include/asm-x86/pvclock.h index 85b1bba..41d816f 100644 --- a/include/asm-x86/pvclock.h +++ b/include/asm-x86/pvclock.h @@ -6,6 +6,7 @@ /* some helper functions for xen and kvm pv clock sources */ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src); +unsigned long pv_cpu_khz(struct pvclock_vcpu_time_info *info); void pvclock_read_wallclock(struct pvclock_wall_clock *wall, struct pvclock_vcpu_time_info *vcpu, struct timespec *ts); --------------080409050303080106020303--