LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Matt Evans @ 2010-06-28  4:39 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-19-git-send-email-agraf@suse.de>

Howdy Alex!

Alexander Graf wrote:
> We will soon start and replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those patches
> I split out the generic looping and magic page mapping code out.
> 
> This patch still only contains stubs. But at least it loops through the
> text section :).
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  arch/powerpc/kernel/kvm.c |   59 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 59 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index 2d8dd73..d873bc6 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -32,3 +32,62 @@
>  #define KVM_MAGIC_PAGE		(-4096L)
>  #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
>  
> +static bool kvm_patching_worked = true;
> +
> +static void kvm_map_magic_page(void *data)
> +{
> +	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
> +		       KVM_MAGIC_PAGE,  /* Physical Address */
> +		       KVM_MAGIC_PAGE); /* Effective Address */
> +}
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
> +	u32 inst_rt = _inst & KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +
> +	start = (void*)_stext;
> +	end = (void*)_etext;
> +
> +	for (p = start; p < end; p++)
> +		kvm_check_ins(p);
> +}

Could you do something similar in module_finalize() to patch loaded modules' .text sections?

> +
> +static int __init kvm_guest_init(void)
> +{
> +	char *p;
> +
> +	if (!kvm_para_available())
> +		return 0;
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
> +		kvm_use_magic_page();
> +
> +	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
> +			 kvm_patching_worked ? "worked" : "failed");
> +
> +	return 0;
> +}
> +
> +postcore_initcall(kvm_guest_init);

Cheers,


Matt

^ permalink raw reply

* Re: [PATCH 1/2] KVM: PPC: Add generic hpte management functions
From: Benjamin Herrenschmidt @ 2010-06-27 22:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-ppc, linuxppc-dev, Alexander Graf, kvm
In-Reply-To: <4C27035D.1010604@redhat.com>

On Sun, 2010-06-27 at 10:53 +0300, Avi Kivity wrote:
> On 06/27/2010 01:58 AM, Benjamin Herrenschmidt wrote:
> >
> >> Then mmu intensive loads can expect to be slow.
> >>      
> > Well, depends. ppc64 indeed requires the hash to be managed by the
> > hypervisor, so inserting or invalidating translations will mean a
> > roundtrip to the hypervisor, though there are ways at least the
> > insertion could be alleviated (for example, the HV could service the
> > hash misses directly walking the guest page tables).
> >    
> 
> But the guest page tables are software defined, no?  That means the 
> interface will break if the page table format changes.

Yes. Unless the hypervisor or architecture defines the format to be
used :-) IE. That's what Niagara 1 did. But we don't do that indeed
currently.

> > But that's due in part to a design choice (whether it's a good one or
> > not I'm not going to argue here) which favors huge reasonably static
> > workloads where the hash is expected to contain all translations for
> > everything.
> >    
> 
> What about when you have memory pressure?  The hash will have to reflect 
> those pte_clear_flush_young(), no?

Well, our architects would argue that the kind of workloads we target
don't have memory pressure :-)

But yes, I agree, harvesting of dirty and young bits is going to force a
hash flush which can be pretty expensive. Heh, we've been trying to
convince our own architects at designers that the MMU sucks for long
enough...

> It seems horribly expensive.
> 
> > However, note that BookE (the embedded variant of the architecture) uses
> > a different model for virtualization, including options in its latest
> > variant for a HW logical->real translation (via a small dedicated TLB)
> > and direct access to some TLB ops from the guest.
> >    
> 
> I'm somewhat familiar with it, yes.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Benjamin Herrenschmidt @ 2010-06-27 22:04 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-ppc@vger.kernel.org, linuxppc-dev, Avi Kivity, KVM list
In-Reply-To: <0E529B3E-541C-4E3B-81E7-AACCD96CBF2C@suse.de>

On Sun, 2010-06-27 at 11:47 +0200, Alexander Graf wrote:
> I did that at first. It breaks. During the patching we may take  
> interrupts (pahe faults for example) that contain just patched  
> instructions. And really, hell breaks loose if we don't flush it  
> immediately :). I was hoping at first a 32 bit replace would be
> atomic  
> in cache, but the cpu tried to execute invalid instructions, so it  
> must have gotten some intermediate state.

A 32-bit aligned store -is- atomic. The other threads/cpu will see
either the old or the new instruction, nothing in between.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Benjamin Herrenschmidt @ 2010-06-27 22:03 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-ppc@vger.kernel.org, linuxppc-dev, Avi Kivity, KVM list
In-Reply-To: <CF78C6B8-F600-4781-82CA-27865F745A60@suse.de>

On Sun, 2010-06-27 at 14:06 +0200, Alexander Graf wrote:
> > Right, I was agreeing with you.  I meant there was no inc/dec mem  
> > insns.
> 
> Ah, I see :). I think that's really the only point where I deem the  
> x86 isa superior :). 

Heh, well, that's a typical RISC thing tho. I think ARM and MIPS are
like PowerPC here at least :-)

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 0/2 v3] mpc5200 ac97 gpio reset
From: Mark Brown @ 2010-06-27 22:01 UTC (permalink / raw)
  To: Grant Likely; +Cc: linuxppc-dev@lists.ozlabs.org, Eric Millbrandt
In-Reply-To: <AANLkTiluyeGSwVSXgTeRxS3hVWTe3Gkn8qHrqbHJFMun@mail.gmail.com>


On 26 Jun 2010, at 00:04, Grant Likely <grant.likely@secretlab.ca> =
wrote:

> On Tue, Jun 22, 2010 at 5:00 PM, Mark Brown
>> On Tue, Jun 15, 2010 at 12:05:05PM -0400, Eric Millbrandt wrote:
>>> These patches reimplement the reset fuction in the ac97 to use gpio =
pins
>>> instead of using the mpc5200 ac97 reset functionality in the psc.  =
This
>>> avoids a problem in which attached ac97 devices go into "test" mode =
appear
>>> unresponsive.
>>>=20
>>> These patches were tested on a pcm030 baseboard and on custom =
hardware with
>>> a wm9715 audio codec/touchscreen controller.
>>=20
>> Grant, are you OK with this series?
>=20
> Yes, I'm going to pick it up.

I'm a little concerned with a collision with multi codec here. It'd
be handy if you could keep it separate in case it needs merging
into both trees (or we could merge via ASoC only).=

^ permalink raw reply

* Re: [PATCH 01/26] KVM: PPC: Introduce shared page
From: Avi Kivity @ 2010-06-27 12:12 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-2-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> For transparent variable sharing between the hypervisor and guest, I introduce
> a shared page. This shared page will contain all the registers the guest can
> read and write safely without exiting guest context.
>
> This patch only implements the stubs required for the basic structure of the
> shared page. The actual register moving follows.
>
>
> @@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
>   	if (err)
>   		goto free_vcpu;
>
> +	vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
> +	if (!vcpu->arch.shared)
> +		goto uninit_vcpu;
> +
>   	return vcpu;
>
> +uninit_vcpu:
> +	kvm_vcpu_uninit(vcpu);
>   free_vcpu:
>   	kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
>   out:
> @@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
>   {
>   	struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
>
> +	free_page((unsigned long)vcpu->arch.shared);
>   	kvm_vcpu_uninit(vcpu);
>   	kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
>   }
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 884d4a5..ba79b35 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
>   	if (err)
>   		goto free_shadow_vcpu;
>
> +	vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
> +	if (!vcpu->arch.shared)
> +		goto uninit_vcpu;
> +
>   	vcpu->arch.host_retip = kvm_return_point;
>   	vcpu->arch.host_msr = mfmsr();
>   #ifdef CONFIG_PPC_BOOK3S_64
> @@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
>
>   	return vcpu;
>
> +uninit_vcpu:
> +	kvm_vcpu_uninit(vcpu);
>   free_shadow_vcpu:
>   	kfree(vcpu_book3s->shadow_vcpu);
>   free_vcpu:
> @@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
>   {
>   	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
>
> +	free_page((unsigned long)vcpu->arch.shared);
>   	kvm_vcpu_uninit(vcpu);
>   	kfree(vcpu_book3s->shadow_vcpu);
>   	vfree(vcpu_book3s);
> diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
> index e8a00b0..71750f2 100644
> --- a/arch/powerpc/kvm/e500.c
> +++ b/arch/powerpc/kvm/e500.c
> @@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
>   	if (err)
>   		goto uninit_vcpu;
>
> +	vcpu->arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO);
> +	if (!vcpu->arch.shared)
> +		goto uninit_tlb;
> +
>   	return vcpu;
>
> +uninit_tlb:
> +	kvmppc_e500_tlb_uninit(vcpu_e500);
>   uninit_vcpu:
>   	kvm_vcpu_uninit(vcpu);
>   free_vcpu:
> @@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
>   {
>   	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>
> +	free_page((unsigned long)vcpu->arch.shared);
>   	kvmppc_e500_tlb_uninit(vcpu_e500);
>   	kvm_vcpu_uninit(vcpu);
>   	kmem_cache_free(kvm_vcpu_cache, vcpu_e500);
>    

Code repeats 3x.  Share please.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27 12:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C273BAD.2090305@redhat.com>


Am 27.06.2010 um 13:53 schrieb Avi Kivity <avi@redhat.com>:

> On 06/27/2010 02:49 PM, Alexander Graf wrote:
>>
>> Am 27.06.2010 um 12:59 schrieb Avi Kivity <avi@redhat.com>:
>>
>>> On 06/27/2010 01:33 PM, Alexander Graf wrote:
>>>>> Or inc/dec...
>>>>
>>>>
>>>> Uh - huh? How would that help?
>>>
>>> inc shared->critical   # disable interrupts
>>
>> There is no opcode in the ppc isa that acts on momery without  
>> involving a register.
>>
>> An inc would be:
>>
>> ld rX, A(0)
>> addi rX, rX, 1
>> std rX, A(0)
>
> Right, I was agreeing with you.  I meant there was no inc/dec mem  
> insns.

Ah, I see :). I think that's really the only point where I deem the  
x86 isa superior :).

Alex

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27 11:53 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <C90A3ABA-60E4-48D4-8332-3B49F3554447@suse.de>

On 06/27/2010 02:49 PM, Alexander Graf wrote:
>
> Am 27.06.2010 um 12:59 schrieb Avi Kivity <avi@redhat.com>:
>
>> On 06/27/2010 01:33 PM, Alexander Graf wrote:
>>>> Or inc/dec...
>>>
>>>
>>> Uh - huh? How would that help?
>>
>> inc shared->critical   # disable interrupts
>
> There is no opcode in the ppc isa that acts on momery without 
> involving a register.
>
> An inc would be:
>
> ld rX, A(0)
> addi rX, rX, 1
> std rX, A(0)

Right, I was agreeing with you.  I meant there was no inc/dec mem insns.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27 11:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C272F08.8090709@redhat.com>


Am 27.06.2010 um 12:59 schrieb Avi Kivity <avi@redhat.com>:

> On 06/27/2010 01:33 PM, Alexander Graf wrote:
>>> Or inc/dec...
>>
>>
>> Uh - huh? How would that help?
>
> inc shared->critical   # disable interrupts

There is no opcode in the ppc isa that acts on momery without  
involving a register.

An inc would be:

ld rX, A(0)
addi rX, rX, 1
std rX, A(0)

Alex
>

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27 10:59 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <DFC77851-3BE7-4746-93DE-287D5E27EF7D@suse.de>

On 06/27/2010 01:33 PM, Alexander Graf wrote:
>> Or inc/dec...
>
>
> Uh - huh? How would that help?

inc shared->critical   # disable interrupts

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Alexander Graf @ 2010-06-27 10:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C271EE5.1060401@redhat.com>


Am 27.06.2010 um 11:50 schrieb Avi Kivity <avi@redhat.com>:

> On 06/27/2010 12:38 PM, Alexander Graf wrote:
>>
>> Am 27.06.2010 um 10:16 schrieb Avi Kivity <avi@redhat.com>:
>>
>>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>>> One of the most obvious registers to share with the guest  
>>>> directly is the
>>>> MSR. The MSR contains the "interrupts enabled" flag which the  
>>>> guest has to
>>>> toggle in critical sections.
>>>>
>>>> So in order to bring the overhead of interrupt en- and disabling  
>>>> down, let's
>>>> put msr into the shared page. Keep in mind that even though you  
>>>> can fully read
>>>> its contents, writing to it doesn't always update all state.  
>>>> There are a few
>>>> safe fields that don't require hypervisor interaction. See the  
>>>> guest
>>>> implementation that follows later for reference.
>>>>
>>>
>>>
>>> You mean, see the documentation for reference.
>>>
>>> It should be possible to write the guest code looking only at the  
>>> documentation.
>>
>> *shrug* since we're writing open source I don't mind telling people  
>> to read code for a reference implemenration.
>
> It's impossible to infer from the source what's a guaranteed part of  
> the interface and what is just an implementation artifact.  So  
> people rely on implementation artifacts (or even bugs) and that  
> reduces our ability to change things.
>
>> If well written, that's more comprehensible than documentation  
>> anyways :).
>
> If the documentation is poorly written, yes.

I think I start to agree. I guess i should just list all fields of the  
MSR that are ok to modify inside the guest context.

Alex

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-27 10:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C272503.7030605@redhat.com>


Am 27.06.2010 um 12:16 schrieb Avi Kivity <avi@redhat.com>:

> On 06/27/2010 12:47 PM, Alexander Graf wrote:
>>
>> Am 27.06.2010 um 10:28 schrieb Avi Kivity <avi@redhat.com>:
>>
>>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>>>> We will soon start and replace instructions from the text section  
>>>> with
>>>> other, paravirtualized versions. To ease the readability of those  
>>>> patches
>>>> I split out the generic looping and magic page mapping code out.
>>>>
>>>> This patch still only contains stubs. But at least it loops  
>>>> through the
>>>> text section :).
>>>>
>>>>
>>>> +
>>>> +static void kvm_check_ins(u32 *inst)
>>>> +{
>>>> +    u32 _inst = *inst;
>>>> +    u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
>>>> +    u32 inst_rt = _inst&  KVM_MASK_RT;
>>>> +
>>>> +    switch (inst_no_rt) {
>>>> +    }
>>>> +
>>>> +    switch (_inst) {
>>>> +    }
>>>> +
>>>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>>>> +}
>>>>
>>>
>>> Shouldn't we flush only if we patched something?
>>
>> We introduce the patching in the next patches. This is only a  
>> preparation stub.
>
> Well, unless I missed something, this remains unconditional after  
> all the patches.
>
> A helper patch(pc, replacement) could patch and flush in one go.

Oh I see what you mean. While not necessary, it would save a few  
cycles on guest bootup.

>
>>
>>>
>>>> +
>>>> +static void kvm_use_magic_page(void)
>>>> +{
>>>> +    u32 *p;
>>>> +    u32 *start, *end;
>>>> +
>>>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>>>> +
>>>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>>>> +
>>>> +    /* Now loop through all code and find instructions */
>>>> +
>>>> +    start = (void*)_stext;
>>>> +    end = (void*)_etext;
>>>> +
>>>> +    for (p = start; p<  end; p++)
>>>> +        kvm_check_ins(p);
>>>> +}
>>>> +
>>>>
>>>
>>> Or, flush the entire thing here.
>>
>> I did that at first. It breaks. During the patching we may take  
>> interrupts (pahe faults for example) that contain just patched  
>> instructions. And really, hell breaks loose if we don't flush it  
>> immediately :). I was hoping at first a 32 bit replace would be  
>> atomic in cache, but the cpu tried to execute invalid instructions,  
>> so it must have gotten some intermediate state.
>
> Surprising.  Maybe you need a flush after writing to the out-of-line  
> code?

I do that too now :). Better flush too often that too rarely. It's not  
_that_ expensive after all.

Alex

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27 10:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C27220D.7090508@redhat.com>


Am 27.06.2010 um 12:03 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> When running in hooked code we need a way to disable interrupts  
>> without
>> clobbering any interrupts or exiting out to the hypervisor.
>>
>> To achieve this, we have an additional critical field in the shared  
>> page. If
>> that field is equal to the r1 register of the guest, it tells the  
>> hypervisor
>> that we're in such a critical section and thus may not receive any  
>> interrupts.
>>
>>
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -251,14 +251,25 @@ int kvmppc_book3s_irqprio_deliver(struct  
>> kvm_vcpu *vcpu, unsigned int priority)
>>      int deliver = 1;
>>      int vec = 0;
>>      ulong flags = 0ULL;
>> +    ulong crit_raw = vcpu->arch.shared->critical;
>> +    ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
>> +    bool crit;
>> +
>> +    /* Truncate crit indicators in 32 bit mode */
>> +    if (!(vcpu->arch.shared->msr&  MSR_SF)) {
>> +        crit_raw&= 0xffffffff;
>> +        crit_r1&= 0xffffffff;
>> +    }
>> +
>> +    crit = (crit_raw == crit_r1);
>>
>
> I think you need to qualify that for supervisor mode only.   
> Otherwise guest userspace can guess the value of shared->critical  
> and disable interrupts.


Yes, you're right. Good catch!

Alex

>

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27 10:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C271F5A.1030409@redhat.com>


Am 27.06.2010 um 11:52 schrieb Avi Kivity <avi@redhat.com>:

> On 06/27/2010 12:40 PM, Alexander Graf wrote:
>>
>> Am 27.06.2010 um 10:21 schrieb Avi Kivity <avi@redhat.com>:
>>
>>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>>> When running in hooked code we need a way to disable interrupts  
>>>> without
>>>> clobbering any interrupts or exiting out to the hypervisor.
>>>>
>>>> To achieve this, we have an additional critical field in the  
>>>> shared page. If
>>>> that field is equal to the r1 register of the guest, it tells the  
>>>> hypervisor
>>>> that we're in such a critical section and thus may not receive  
>>>> any interrupts.
>>>>
>>>
>>> Is r1 reserved for this purpose?  Can't it match accidentally?
>>
>> r1 is defined by the abi to be the stack.
>
> Neat trick!
>
>>>
>>> Why won't zero/nonzero work for this?
>>
>> Because there is no store immediate opcode on powerpc :(.
>
> Or inc/dec...

Uh - huh? How would that help?

Alex

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Avi Kivity @ 2010-06-27 10:16 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <0E529B3E-541C-4E3B-81E7-AACCD96CBF2C@suse.de>

On 06/27/2010 12:47 PM, Alexander Graf wrote:
>
> Am 27.06.2010 um 10:28 schrieb Avi Kivity <avi@redhat.com>:
>
>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>>> We will soon start and replace instructions from the text section with
>>> other, paravirtualized versions. To ease the readability of those 
>>> patches
>>> I split out the generic looping and magic page mapping code out.
>>>
>>> This patch still only contains stubs. But at least it loops through the
>>> text section :).
>>>
>>>
>>> +
>>> +static void kvm_check_ins(u32 *inst)
>>> +{
>>> +    u32 _inst = *inst;
>>> +    u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
>>> +    u32 inst_rt = _inst&  KVM_MASK_RT;
>>> +
>>> +    switch (inst_no_rt) {
>>> +    }
>>> +
>>> +    switch (_inst) {
>>> +    }
>>> +
>>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>>> +}
>>>
>>
>> Shouldn't we flush only if we patched something?
>
> We introduce the patching in the next patches. This is only a 
> preparation stub.

Well, unless I missed something, this remains unconditional after all 
the patches.

A helper patch(pc, replacement) could patch and flush in one go.

>
>>
>>> +
>>> +static void kvm_use_magic_page(void)
>>> +{
>>> +    u32 *p;
>>> +    u32 *start, *end;
>>> +
>>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>>> +
>>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>>> +
>>> +    /* Now loop through all code and find instructions */
>>> +
>>> +    start = (void*)_stext;
>>> +    end = (void*)_etext;
>>> +
>>> +    for (p = start; p<  end; p++)
>>> +        kvm_check_ins(p);
>>> +}
>>> +
>>>
>>
>> Or, flush the entire thing here.
>
> I did that at first. It breaks. During the patching we may take 
> interrupts (pahe faults for example) that contain just patched 
> instructions. And really, hell breaks loose if we don't flush it 
> immediately :). I was hoping at first a 32 bit replace would be atomic 
> in cache, but the cpu tried to execute invalid instructions, so it 
> must have gotten some intermediate state.

Surprising.  Maybe you need a flush after writing to the out-of-line code?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27 10:03 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-9-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> When running in hooked code we need a way to disable interrupts without
> clobbering any interrupts or exiting out to the hypervisor.
>
> To achieve this, we have an additional critical field in the shared page. If
> that field is equal to the r1 register of the guest, it tells the hypervisor
> that we're in such a critical section and thus may not receive any interrupts.
>
>
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -251,14 +251,25 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>   	int deliver = 1;
>   	int vec = 0;
>   	ulong flags = 0ULL;
> +	ulong crit_raw = vcpu->arch.shared->critical;
> +	ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
> +	bool crit;
> +
> +	/* Truncate crit indicators in 32 bit mode */
> +	if (!(vcpu->arch.shared->msr&  MSR_SF)) {
> +		crit_raw&= 0xffffffff;
> +		crit_r1&= 0xffffffff;
> +	}
> +
> +	crit = (crit_raw == crit_r1);
>    

I think you need to qualify that for supervisor mode only.  Otherwise 
guest userspace can guess the value of shared->critical and disable 
interrupts.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 09/26] KVM: PPC: Add PV guest scratch registers
From: Avi Kivity @ 2010-06-27  9:53 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <27BB673F-F34E-4CC6-A22D-02CF95E7529F@suse.de>

On 06/27/2010 12:41 PM, Alexander Graf wrote:
>
> Am 27.06.2010 um 10:22 schrieb Avi Kivity <avi@redhat.com>:
>
>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>> While running in hooked code we need to store register contents out 
>>> because
>>> we must not clobber any registers.
>>>
>>> So let's add some fields to the shared page we can just happily 
>>> write to.
>>>
>>>
>>
>> How are these protected during interrupts?
>
> By the 'critical section' bit. When in a critical section (read: using 
> scratch registers), we don't issue interrupts.

Ok.  I thought you needed scratch registers to set up the critical 
section, but you don't.  Neat stuff.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27  9:52 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <77DBE095-884F-4986-BE2B-15B2EEAD8CAC@suse.de>

On 06/27/2010 12:40 PM, Alexander Graf wrote:
>
> Am 27.06.2010 um 10:21 schrieb Avi Kivity <avi@redhat.com>:
>
>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>> When running in hooked code we need a way to disable interrupts without
>>> clobbering any interrupts or exiting out to the hypervisor.
>>>
>>> To achieve this, we have an additional critical field in the shared 
>>> page. If
>>> that field is equal to the r1 register of the guest, it tells the 
>>> hypervisor
>>> that we're in such a critical section and thus may not receive any 
>>> interrupts.
>>>
>>
>> Is r1 reserved for this purpose?  Can't it match accidentally?
>
> r1 is defined by the abi to be the stack.

Neat trick!

>>
>> Why won't zero/nonzero work for this?
>
> Because there is no store immediate opcode on powerpc :(.

Or inc/dec...

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Avi Kivity @ 2010-06-27  9:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <651805F1-54AB-466F-8D23-D053D8082177@suse.de>

On 06/27/2010 12:38 PM, Alexander Graf wrote:
>
> Am 27.06.2010 um 10:16 schrieb Avi Kivity <avi@redhat.com>:
>
>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>> One of the most obvious registers to share with the guest directly 
>>> is the
>>> MSR. The MSR contains the "interrupts enabled" flag which the guest 
>>> has to
>>> toggle in critical sections.
>>>
>>> So in order to bring the overhead of interrupt en- and disabling 
>>> down, let's
>>> put msr into the shared page. Keep in mind that even though you can 
>>> fully read
>>> its contents, writing to it doesn't always update all state. There 
>>> are a few
>>> safe fields that don't require hypervisor interaction. See the guest
>>> implementation that follows later for reference.
>>>
>>
>>
>> You mean, see the documentation for reference.
>>
>> It should be possible to write the guest code looking only at the 
>> documentation.
>
> *shrug* since we're writing open source I don't mind telling people to 
> read code for a reference implemenration. 

It's impossible to infer from the source what's a guaranteed part of the 
interface and what is just an implementation artifact.  So people rely 
on implementation artifacts (or even bugs) and that reduces our ability 
to change things.

> If well written, that's more comprehensible than documentation anyways 
> :).

If the documentation is poorly written, yes.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-27  9:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270CFE.2040600@redhat.com>


Am 27.06.2010 um 10:34 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We just introduced a new PV interface that screams for  
>> documentation. So here
>> it is - a shiny new and awesome text file describing the internal  
>> works of
>> the PPC KVM paravirtual interface.
>>
>>
>> +Querying for existence
>> +======================
>> +
>> +To find out if we're running on KVM or not, we overlay the PVR  
>> register. Usually
>> +the PVR register contains an id that identifies your CPU type. If,  
>> however, you
>> +pass KVM_PVR_PARA in the register that you want the PVR result in,  
>> the register
>> +still contains KVM_PVR_PARA after the mfpvr call.
>> +
>> +    LOAD_REG_IMM(r5, KVM_PVR_PARA)
>> +    mfpvr    r5
>> +    [r5 still contains KVM_PVR_PARA]
>> +
>> +Once determined to run under a PV capable KVM, you can now use  
>> hypercalls as
>> +described below.
>>
>
> On x86 we allow host userspace to determine whether the guest sees  
> the paravirt interface (and what features are exposed).  This allows  
> you to live migrate from a newer host to an older host, by not  
> exposing the newer features.

A very good idea indeed. Let's postpone that to when we expose enough  
state to make live migration possible.

Alex

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-27  9:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270BB8.60404@redhat.com>


Am 27.06.2010 um 10:28 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We will soon start and replace instructions from the text section  
>> with
>> other, paravirtualized versions. To ease the readability of those  
>> patches
>> I split out the generic looping and magic page mapping code out.
>>
>> This patch still only contains stubs. But at least it loops through  
>> the
>> text section :).
>>
>>
>> +
>> +static void kvm_check_ins(u32 *inst)
>> +{
>> +    u32 _inst = *inst;
>> +    u32 inst_no_rt = _inst&  ~KVM_MASK_RT;
>> +    u32 inst_rt = _inst&  KVM_MASK_RT;
>> +
>> +    switch (inst_no_rt) {
>> +    }
>> +
>> +    switch (_inst) {
>> +    }
>> +
>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>> +}
>>
>
> Shouldn't we flush only if we patched something?

We introduce the patching in the next patches. This is only a  
preparation stub.

>
>> +
>> +static void kvm_use_magic_page(void)
>> +{
>> +    u32 *p;
>> +    u32 *start, *end;
>> +
>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>> +
>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>> +
>> +    /* Now loop through all code and find instructions */
>> +
>> +    start = (void*)_stext;
>> +    end = (void*)_etext;
>> +
>> +    for (p = start; p<  end; p++)
>> +        kvm_check_ins(p);
>> +}
>> +
>>
>
> Or, flush the entire thing here.

I did that at first. It breaks. During the patching we may take  
interrupts (pahe faults for example) that contain just patched  
instructions. And really, hell breaks loose if we don't flush it  
immediately :). I was hoping at first a 32 bit replace would be atomic  
in cache, but the cpu tried to execute invalid instructions, so it  
must have gotten some intermediate state.

Alex

^ permalink raw reply

* Re: [PATCH 12/26] KVM: PPC: First magic page steps
From: Alexander Graf @ 2010-06-27  9:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270AA1.5030801@redhat.com>


Am 27.06.2010 um 10:24 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We will be introducing a method to project the shared page in guest  
>> context.
>> As soon as we're talking about this coupling, the shared page is  
>> colled magic
>> page.
>>
>> This patch introduces simple defines, so the follow-up patches are  
>> easier to
>> read.
>>
>>
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/ 
>> include/asm/kvm_host.h
>> index e35c1ac..5f8c214 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
>>      u64 dec_jiffies;
>>      unsigned long pending_exceptions;
>>      struct kvm_vcpu_arch_shared *shared;
>> +    unsigned long magic_page_pa; /* phys addr to map the magic  
>> page to */
>> +    unsigned long magic_page_ea; /* effect. addr to map the magic  
>> page to */
>>
>
> Is ea like a va?  If so, can't the guest specify it by manipulating  
> the hash table (or tlb)?

ea in ppc speech is va in x86 speech. Yes, the guest could map it  
itself, but I couldn't find out how. This way I at least know what's  
happening :).


Alex

^ permalink raw reply

* Re: [PATCH 09/26] KVM: PPC: Add PV guest scratch registers
From: Alexander Graf @ 2010-06-27  9:41 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270A34.4020706@redhat.com>


Am 27.06.2010 um 10:22 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> While running in hooked code we need to store register contents out  
>> because
>> we must not clobber any registers.
>>
>> So let's add some fields to the shared page we can just happily  
>> write to.
>>
>>
>
> How are these protected during interrupts?

By the 'critical section' bit. When in a critical section (read: using  
scratch registers), we don't issue interrupts.

Alex

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27  9:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2709F4.10805@redhat.com>


Am 27.06.2010 um 10:21 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> When running in hooked code we need a way to disable interrupts  
>> without
>> clobbering any interrupts or exiting out to the hypervisor.
>>
>> To achieve this, we have an additional critical field in the shared  
>> page. If
>> that field is equal to the r1 register of the guest, it tells the  
>> hypervisor
>> that we're in such a critical section and thus may not receive any  
>> interrupts.
>>
>
> Is r1 reserved for this purpose?  Can't it match accidentally?

r1 is defined by the abi to be the stack.

>
> Why won't zero/nonzero work for this?

Because there is no store immediate opcode on powerpc :(.

Alex

^ permalink raw reply

* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Alexander Graf @ 2010-06-27  9:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2708EB.9020500@redhat.com>


Am 27.06.2010 um 10:16 schrieb Avi Kivity <avi@redhat.com>:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> One of the most obvious registers to share with the guest directly  
>> is the
>> MSR. The MSR contains the "interrupts enabled" flag which the guest  
>> has to
>> toggle in critical sections.
>>
>> So in order to bring the overhead of interrupt en- and disabling  
>> down, let's
>> put msr into the shared page. Keep in mind that even though you can  
>> fully read
>> its contents, writing to it doesn't always update all state. There  
>> are a few
>> safe fields that don't require hypervisor interaction. See the guest
>> implementation that follows later for reference.
>>
>
>
> You mean, see the documentation for reference.
>
> It should be possible to write the guest code looking only at the  
> documentation.

*shrug* since we're writing open source I don't mind telling people to  
read code for a reference implemenration. If well written, that's more  
comprehensible than documentation anyways :).

But either way, you can take a look at both - documentation and code,  
yes.

What I really meant here is that the list of registers we patch should  
be taken from the patch code. I didn't want to write out all of them  
in the description.


Alex

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox