From mboxrd@z Thu Jan  1 00:00:00 1970
From: Glauber Costa <glommer@parallels.com>
Subject: Re: [patch 02/18] x86: kvmclock: allocate pvclock shared memory area
Date: Thu, 15 Nov 2012 21:05:55 +0400
Message-ID: <50A520F3.8060105@parallels.com>
References: <20121115000823.285102321@redhat.com> <20121115000944.034452554@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Cc: <kvm@vger.kernel.org>, <johnstul@us.ibm.com>, <jeremy@goop.org>,
	<zamsden@gmail.com>, <gleb@redhat.com>, <avi@redhat.com>,
	<pbonzini@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx2.parallels.com ([64.131.90.16]:38946 "EHLO mx2.parallels.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754528Ab2KOJFy (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 15 Nov 2012 04:05:54 -0500
In-Reply-To: <20121115000944.034452554@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/15/2012 04:08 AM, Marcelo Tosatti wrote:
> We want to expose the pvclock shared memory areas, which 
> the hypervisor periodically updates, to userspace.
> 
> For a linear mapping from userspace, it is necessary that
> entire page sized regions are used for array of pvclock 
> structures.
> 
> There is no such guarantee with per cpu areas, therefore move
> to memblock_alloc based allocation.
Ok.

> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For the concept:

Acked-by: Glauber Costa <glommer@parallels.com>

I do have one comment.

> 
> Index: vsyscall/arch/x86/kernel/kvmclock.c
> ===================================================================
> --- vsyscall.orig/arch/x86/kernel/kvmclock.c
> +++ vsyscall/arch/x86/kernel/kvmclock.c
> @@ -23,6 +23,7 @@
>  #include <asm/apic.h>
>  #include <linux/percpu.h>
>  #include <linux/hardirq.h>
> +#include <linux/memblock.h>
>  
>  #include <asm/x86_init.h>
>  #include <asm/reboot.h>
> @@ -39,7 +40,11 @@ static int parse_no_kvmclock(char *arg)
>  early_param("no-kvmclock", parse_no_kvmclock);
>  
>  /* The hypervisor will put information about time periodically here */
> -static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
> +struct pvclock_aligned_vcpu_time_info {
> +	struct pvclock_vcpu_time_info clock;
> +} __attribute__((__aligned__(SMP_CACHE_BYTES)));
> +
> +static struct pvclock_aligned_vcpu_time_info *hv_clock;
>  static struct pvclock_wall_clock wall_clock;
>  
>  /*
> @@ -52,15 +57,20 @@ static unsigned long kvm_get_wallclock(v
>  	struct pvclock_vcpu_time_info *vcpu_time;
>  	struct timespec ts;
>  	int low, high;
> +	int cpu;
> +
> +	preempt_disable();
> +	cpu = smp_processor_id();
>  
>  	low = (int)__pa_symbol(&wall_clock);
>  	high = ((u64)__pa_symbol(&wall_clock) >> 32);
>  
>  	native_write_msr(msr_kvm_wall_clock, low, high);
>  
> -	vcpu_time = &get_cpu_var(hv_clock);
> +	vcpu_time = &hv_clock[cpu].clock;


You are leaving preempt disabled for a lot longer than in should be. In
particular, you'll have a round trip to the hypervisor in the middle. It
doesn't matter *that* much because wallclock is mostly a one-time thing,
so I won't oppose in this basis.

But if you have the chance, either fix it, or tell us why you need
preemption on for longer than it was before.

>  void __init kvmclock_init(void)
>  {
> +	unsigned long mem;
> +
>  	if (!kvm_para_available())
>  		return;
>  
> +	mem = memblock_alloc(sizeof(struct pvclock_aligned_vcpu_time_info) * NR_CPUS,
> +			     PAGE_SIZE);
> +	if (!mem)
> +		return;
> +	hv_clock = __va(mem);
> +
>  	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
>  		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
>  		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
> 
>
If you don't have kvmclock enabled, which the next line in here would
test for, you will exit this function with allocated memory. How about
the following?

if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
    msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
    msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
    return;

mem = memblock_alloc(sizeof(struct pvclock_aligned_vcpu_time_info)
                     * NR_CPUS, PAGE_SIZE);
if (!mem)
    return;
hv_clock = __va(mem);

printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
       msr_kvm_system_time, msr_kvm_wall_clock);

if (kvm_register_clock("boot clock")) {
    memblock_free();
    return;
}