LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-27 10:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C272503.7030605@redhat.com>


On 27.06.2010 at 12:16, Avi Kivity <avi@redhat.com> wrote:

> On 06/27/2010 12:47 PM, Alexander Graf wrote:
>>
>> On 27.06.2010 at 10:28, Avi Kivity <avi@redhat.com> wrote:
>>
>>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>>>> We will soon start replacing instructions in the text section with
>>>> other, paravirtualized versions. To ease the readability of those patches
>>>> I split out the generic looping and magic page mapping code.
>>>>
>>>> This patch still only contains stubs. But at least it loops through the
>>>> text section :).
>>>>
>>>>
>>>> +
>>>> +static void kvm_check_ins(u32 *inst)
>>>> +{
>>>> +    u32 _inst = *inst;
>>>> +    u32 inst_no_rt = _inst & ~KVM_MASK_RT;
>>>> +    u32 inst_rt = _inst & KVM_MASK_RT;
>>>> +
>>>> +    switch (inst_no_rt) {
>>>> +    }
>>>> +
>>>> +    switch (_inst) {
>>>> +    }
>>>> +
>>>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>>>> +}
>>>>
>>>
>>> Shouldn't we flush only if we patched something?
>>
>> We introduce the patching in the next patches. This is only a  
>> preparation stub.
>
> Well, unless I missed something, this remains unconditional after  
> all the patches.
>
> A helper patch(pc, replacement) could patch and flush in one go.

Oh I see what you mean. While not necessary, it would save a few  
cycles on guest bootup.
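
A minimal user-space sketch of the patch(pc, replacement) helper suggested above. The name patch_ins is made up for illustration, and __builtin___clear_cache (a GCC/Clang builtin) stands in for the kernel's flush_icache_range(); the point is only that the flush happens if and only if a word was actually rewritten.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store the replacement instruction and flush the
 * instruction cache for that one word in the same step.
 * Returns 1 if the word was patched, 0 if it already held the value.
 */
static int patch_ins(uint32_t *pc, uint32_t replacement)
{
    if (*pc == replacement)
        return 0;   /* nothing changed, so no flush is needed */

    *pc = replacement;

    /* User-space stand-in for flush_icache_range(pc, pc + 4) */
    __builtin___clear_cache((char *)pc, (char *)(pc + 1));
    return 1;
}
```

With a helper like this, the scanning loop would only pay for a flush on the instructions it really replaces, while each patched word is still flushed immediately.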

>
>>
>>>
>>>> +
>>>> +static void kvm_use_magic_page(void)
>>>> +{
>>>> +    u32 *p;
>>>> +    u32 *start, *end;
>>>> +
>>>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>>>> +
>>>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>>>> +
>>>> +    /* Now loop through all code and find instructions */
>>>> +
>>>> +    start = (void*)_stext;
>>>> +    end = (void*)_etext;
>>>> +
>>>> +    for (p = start; p < end; p++)
>>>> +        kvm_check_ins(p);
>>>> +}
>>>> +
>>>>
>>>
>>> Or, flush the entire thing here.
>>
>> I did that at first. It breaks. During the patching we may take
>> interrupts (page faults for example) whose handlers contain just-patched
>> instructions. And really, hell breaks loose if we don't flush it
>> immediately :). I was hoping at first a 32 bit replace would be
>> atomic in cache, but the cpu tried to execute invalid instructions,
>> so it must have gotten some intermediate state.
>
> Surprising.  Maybe you need a flush after writing to the out-of-line  
> code?

I do that too now :). Better to flush too often than too rarely. It's not
_that_ expensive after all.

Alex


* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27 10:35 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C27220D.7090508@redhat.com>


On 27.06.2010 at 12:03, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> When running in hooked code we need a way to disable interrupts  
>> without
>> clobbering any interrupts or exiting out to the hypervisor.
>>
>> To achieve this, we have an additional critical field in the shared  
>> page. If
>> that field is equal to the r1 register of the guest, it tells the  
>> hypervisor
>> that we're in such a critical section and thus may not receive any  
>> interrupts.
>>
>>
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -251,14 +251,25 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>>      int deliver = 1;
>>      int vec = 0;
>>      ulong flags = 0ULL;
>> +    ulong crit_raw = vcpu->arch.shared->critical;
>> +    ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
>> +    bool crit;
>> +
>> +    /* Truncate crit indicators in 32 bit mode */
>> +    if (!(vcpu->arch.shared->msr & MSR_SF)) {
>> +        crit_raw &= 0xffffffff;
>> +        crit_r1 &= 0xffffffff;
>> +    }
>> +
>> +    crit = (crit_raw == crit_r1);
>>
>
> I think you need to qualify that for supervisor mode only.   
> Otherwise guest userspace can guess the value of shared->critical  
> and disable interrupts.


Yes, you're right. Good catch!
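
One way the supervisor-only qualification could look, as a stand-alone sketch rather than the actual follow-up patch. The function name in_critical_section is an assumption; MSR_SF (bit 63) and MSR_PR (bit 14, problem state) use the standard PowerPC MSR bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_SF (1ULL << 63)   /* 64-bit mode */
#define MSR_PR (1ULL << 14)   /* problem state, i.e. guest user mode */

/*
 * Hypothetical check: the guest counts as being in a critical section
 * only if shared->critical matches r1 AND it runs in supervisor mode.
 */
static bool in_critical_section(uint64_t crit_raw, uint64_t r1, uint64_t msr)
{
    /* Truncate both indicators in 32 bit mode, as in the quoted patch */
    if (!(msr & MSR_SF)) {
        crit_raw &= 0xffffffffULL;
        r1 &= 0xffffffffULL;
    }

    /* Guest userspace must never be able to hold off interrupts */
    return crit_raw == r1 && !(msr & MSR_PR);
}
```

With the MSR_PR test in place, userspace guessing the value of shared->critical no longer has any effect on interrupt delivery.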

Alex

>


* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27 10:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C271F5A.1030409@redhat.com>


On 27.06.2010 at 11:52, Avi Kivity <avi@redhat.com> wrote:

> On 06/27/2010 12:40 PM, Alexander Graf wrote:
>>
>> On 27.06.2010 at 10:21, Avi Kivity <avi@redhat.com> wrote:
>>
>>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>>> When running in hooked code we need a way to disable interrupts  
>>>> without
>>>> clobbering any interrupts or exiting out to the hypervisor.
>>>>
>>>> To achieve this, we have an additional critical field in the  
>>>> shared page. If
>>>> that field is equal to the r1 register of the guest, it tells the  
>>>> hypervisor
>>>> that we're in such a critical section and thus may not receive  
>>>> any interrupts.
>>>>
>>>
>>> Is r1 reserved for this purpose?  Can't it match accidentally?
>>
>> r1 is defined by the abi to be the stack.
>
> Neat trick!
>
>>>
>>> Why won't zero/nonzero work for this?
>>
>> Because there is no store immediate opcode on powerpc :(.
>
> Or inc/dec...

Uh - huh? How would that help?

Alex


* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Avi Kivity @ 2010-06-27 10:16 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <0E529B3E-541C-4E3B-81E7-AACCD96CBF2C@suse.de>

On 06/27/2010 12:47 PM, Alexander Graf wrote:
>
> On 27.06.2010 at 10:28, Avi Kivity <avi@redhat.com> wrote:
>
>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>>> We will soon start replacing instructions in the text section with
>>> other, paravirtualized versions. To ease the readability of those patches
>>> I split out the generic looping and magic page mapping code.
>>>
>>> This patch still only contains stubs. But at least it loops through the
>>> text section :).
>>>
>>>
>>> +
>>> +static void kvm_check_ins(u32 *inst)
>>> +{
>>> +    u32 _inst = *inst;
>>> +    u32 inst_no_rt = _inst & ~KVM_MASK_RT;
>>> +    u32 inst_rt = _inst & KVM_MASK_RT;
>>> +
>>> +    switch (inst_no_rt) {
>>> +    }
>>> +
>>> +    switch (_inst) {
>>> +    }
>>> +
>>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>>> +}
>>>
>>
>> Shouldn't we flush only if we patched something?
>
> We introduce the patching in the next patches. This is only a 
> preparation stub.

Well, unless I missed something, this remains unconditional after all 
the patches.

A helper patch(pc, replacement) could patch and flush in one go.

>
>>
>>> +
>>> +static void kvm_use_magic_page(void)
>>> +{
>>> +    u32 *p;
>>> +    u32 *start, *end;
>>> +
>>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>>> +
>>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>>> +
>>> +    /* Now loop through all code and find instructions */
>>> +
>>> +    start = (void*)_stext;
>>> +    end = (void*)_etext;
>>> +
>>> +    for (p = start; p < end; p++)
>>> +        kvm_check_ins(p);
>>> +}
>>> +
>>>
>>
>> Or, flush the entire thing here.
>
> I did that at first. It breaks. During the patching we may take
> interrupts (page faults for example) whose handlers contain just-patched
> instructions. And really, hell breaks loose if we don't flush it
> immediately :). I was hoping at first a 32 bit replace would be atomic
> in cache, but the cpu tried to execute invalid instructions, so it
> must have gotten some intermediate state.

Surprising.  Maybe you need a flush after writing to the out-of-line code?

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27 10:03 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-9-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> When running in hooked code we need a way to disable interrupts without
> clobbering any interrupts or exiting out to the hypervisor.
>
> To achieve this, we have an additional critical field in the shared page. If
> that field is equal to the r1 register of the guest, it tells the hypervisor
> that we're in such a critical section and thus may not receive any interrupts.
>
>
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -251,14 +251,25 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>   	int deliver = 1;
>   	int vec = 0;
>   	ulong flags = 0ULL;
> +	ulong crit_raw = vcpu->arch.shared->critical;
> +	ulong crit_r1 = kvmppc_get_gpr(vcpu, 1);
> +	bool crit;
> +
> +	/* Truncate crit indicators in 32 bit mode */
> +	if (!(vcpu->arch.shared->msr & MSR_SF)) {
> +		crit_raw &= 0xffffffff;
> +		crit_r1 &= 0xffffffff;
> +	}
> +
> +	crit = (crit_raw == crit_r1);
>    

I think you need to qualify that for supervisor mode only.  Otherwise 
guest userspace can guess the value of shared->critical and disable 
interrupts.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH 09/26] KVM: PPC: Add PV guest scratch registers
From: Avi Kivity @ 2010-06-27  9:53 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <27BB673F-F34E-4CC6-A22D-02CF95E7529F@suse.de>

On 06/27/2010 12:41 PM, Alexander Graf wrote:
>
> On 27.06.2010 at 10:22, Avi Kivity <avi@redhat.com> wrote:
>
>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>> While running in hooked code we need to store register contents out 
>>> because
>>> we must not clobber any registers.
>>>
>>> So let's add some fields to the shared page we can just happily 
>>> write to.
>>>
>>>
>>
>> How are these protected during interrupts?
>
> By the 'critical section' bit. When in a critical section (read: using 
> scratch registers), we don't issue interrupts.

Ok.  I thought you needed scratch registers to set up the critical 
section, but you don't.  Neat stuff.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27  9:52 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <77DBE095-884F-4986-BE2B-15B2EEAD8CAC@suse.de>

On 06/27/2010 12:40 PM, Alexander Graf wrote:
>
> On 27.06.2010 at 10:21, Avi Kivity <avi@redhat.com> wrote:
>
>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>> When running in hooked code we need a way to disable interrupts without
>>> clobbering any interrupts or exiting out to the hypervisor.
>>>
>>> To achieve this, we have an additional critical field in the shared 
>>> page. If
>>> that field is equal to the r1 register of the guest, it tells the 
>>> hypervisor
>>> that we're in such a critical section and thus may not receive any 
>>> interrupts.
>>>
>>
>> Is r1 reserved for this purpose?  Can't it match accidentally?
>
> r1 is defined by the abi to be the stack.

Neat trick!

>>
>> Why won't zero/nonzero work for this?
>
> Because there is no store immediate opcode on powerpc :(.

Or inc/dec...

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Avi Kivity @ 2010-06-27  9:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <651805F1-54AB-466F-8D23-D053D8082177@suse.de>

On 06/27/2010 12:38 PM, Alexander Graf wrote:
>
> On 27.06.2010 at 10:16, Avi Kivity <avi@redhat.com> wrote:
>
>> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>>> One of the most obvious registers to share with the guest directly 
>>> is the
>>> MSR. The MSR contains the "interrupts enabled" flag which the guest 
>>> has to
>>> toggle in critical sections.
>>>
>>> So in order to bring the overhead of interrupt en- and disabling 
>>> down, let's
>>> put msr into the shared page. Keep in mind that even though you can 
>>> fully read
>>> its contents, writing to it doesn't always update all state. There 
>>> are a few
>>> safe fields that don't require hypervisor interaction. See the guest
>>> implementation that follows later for reference.
>>>
>>
>>
>> You mean, see the documentation for reference.
>>
>> It should be possible to write the guest code looking only at the 
>> documentation.
>
> *shrug* since we're writing open source I don't mind telling people to 
> read code for a reference implementation. 

It's impossible to infer from the source what's a guaranteed part of the 
interface and what is just an implementation artifact.  So people rely 
on implementation artifacts (or even bugs) and that reduces our ability 
to change things.

> If well written, that's more comprehensible than documentation anyways 
> :).

If the documentation is poorly written, yes.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-27  9:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270CFE.2040600@redhat.com>


On 27.06.2010 at 10:34, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We just introduced a new PV interface that screams for  
>> documentation. So here
>> it is - a shiny new and awesome text file describing the internal  
>> works of
>> the PPC KVM paravirtual interface.
>>
>>
>> +Querying for existence
>> +======================
>> +
>> +To find out if we're running on KVM or not, we overlay the PVR  
>> register. Usually
>> +the PVR register contains an id that identifies your CPU type. If,  
>> however, you
>> +pass KVM_PVR_PARA in the register that you want the PVR result in,  
>> the register
>> +still contains KVM_PVR_PARA after the mfpvr call.
>> +
>> +    LOAD_REG_IMM(r5, KVM_PVR_PARA)
>> +    mfpvr    r5
>> +    [r5 still contains KVM_PVR_PARA]
>> +
>> +Once determined to run under a PV capable KVM, you can now use  
>> hypercalls as
>> +described below.
>>
>
> On x86 we allow host userspace to determine whether the guest sees  
> the paravirt interface (and what features are exposed).  This allows  
> you to live migrate from a newer host to an older host, by not  
> exposing the newer features.

A very good idea indeed. Let's postpone that to when we expose enough  
state to make live migration possible.

Alex


* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Alexander Graf @ 2010-06-27  9:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270BB8.60404@redhat.com>


On 27.06.2010 at 10:28, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We will soon start replacing instructions in the text section with
>> other, paravirtualized versions. To ease the readability of those patches
>> I split out the generic looping and magic page mapping code.
>>
>> This patch still only contains stubs. But at least it loops through the
>> text section :).
>>
>>
>> +
>> +static void kvm_check_ins(u32 *inst)
>> +{
>> +    u32 _inst = *inst;
>> +    u32 inst_no_rt = _inst & ~KVM_MASK_RT;
>> +    u32 inst_rt = _inst & KVM_MASK_RT;
>> +
>> +    switch (inst_no_rt) {
>> +    }
>> +
>> +    switch (_inst) {
>> +    }
>> +
>> +    flush_icache_range((ulong)inst, (ulong)inst + 4);
>> +}
>>
>
> Shouldn't we flush only if we patched something?

We introduce the patching in the next patches. This is only a  
preparation stub.
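
To illustrate what the two switch statements in the stub are preparing for, here is a hedged sketch of one translation, mfmsr rX to ld rX, magic_page->msr. The opcode values are the architectural encodings (mfmsr = 0x7c0000a6, ld = 0xe8000000, RT field mask = 0x03e00000); the function name translate_ins and passing the displacement as a parameter are assumptions for illustration, not the code of the later patches.

```c
#include <stdint.h>

#define INST_MFMSR 0x7c0000a6u   /* mfmsr rX with the RT field cleared */
#define INST_LD    0xe8000000u   /* ld rX, disp(rA) with all fields cleared */
#define MASK_RT    0x03e00000u   /* RT (target register) field */

/*
 * Hypothetical translation step: return the paravirtualized replacement
 * for inst, or inst itself if there is nothing to replace.
 * msr_disp is the displacement of the msr field relative to address 0.
 */
static uint32_t translate_ins(uint32_t inst, int16_t msr_disp)
{
    uint32_t inst_no_rt = inst & ~MASK_RT;
    uint32_t inst_rt = inst & MASK_RT;

    switch (inst_no_rt) {
    case INST_MFMSR:
        /* rA = 0 means "no base register"; ld needs disp % 4 == 0 */
        return INST_LD | inst_rt | ((uint16_t)msr_disp & 0xfffcu);
    }

    return inst;
}
```

For example, with msr at offset 88 of a page mapped at -4096, the displacement is -4008, and mfmsr r5 (0x7ca000a6) would become 0xe8a0f058.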

>
>> +
>> +static void kvm_use_magic_page(void)
>> +{
>> +    u32 *p;
>> +    u32 *start, *end;
>> +
>> +    /* Tell the host to map the magic page to -4096 on all CPUs */
>> +
>> +    on_each_cpu(kvm_map_magic_page, NULL, 1);
>> +
>> +    /* Now loop through all code and find instructions */
>> +
>> +    start = (void*)_stext;
>> +    end = (void*)_etext;
>> +
>> +    for (p = start; p < end; p++)
>> +        kvm_check_ins(p);
>> +}
>> +
>>
>
> Or, flush the entire thing here.

I did that at first. It breaks. During the patching we may take
interrupts (page faults for example) whose handlers contain just-patched
instructions. And really, hell breaks loose if we don't flush it
immediately :). I was hoping at first a 32 bit replace would be atomic
in cache, but the cpu tried to execute invalid instructions, so it
must have gotten some intermediate state.

Alex


* Re: [PATCH 12/26] KVM: PPC: First magic page steps
From: Alexander Graf @ 2010-06-27  9:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270AA1.5030801@redhat.com>


On 27.06.2010 at 10:24, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We will be introducing a method to project the shared page in guest context.
>> As soon as we're talking about this coupling, the shared page is called magic
>> page.
>>
>> This patch introduces simple defines, so the follow-up patches are easier to
>> read.
>>
>>
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index e35c1ac..5f8c214 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
>>      u64 dec_jiffies;
>>      unsigned long pending_exceptions;
>>      struct kvm_vcpu_arch_shared *shared;
>> +    unsigned long magic_page_pa; /* phys addr to map the magic page to */
>> +    unsigned long magic_page_ea; /* effect. addr to map the magic page to */
>>
>
> Is ea like a va?  If so, can't the guest specify it by manipulating  
> the hash table (or tlb)?

ea in ppc speak is va in x86 speak. Yes, the guest could map it
itself, but I couldn't find out how. This way I at least know what's
happening :).


Alex


* Re: [PATCH 09/26] KVM: PPC: Add PV guest scratch registers
From: Alexander Graf @ 2010-06-27  9:41 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270A34.4020706@redhat.com>


On 27.06.2010 at 10:22, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> While running in hooked code we need to store register contents out  
>> because
>> we must not clobber any registers.
>>
>> So let's add some fields to the shared page we can just happily  
>> write to.
>>
>>
>
> How are these protected during interrupts?

By the 'critical section' bit. When in a critical section (read: using  
scratch registers), we don't issue interrupts.

Alex


* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Alexander Graf @ 2010-06-27  9:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2709F4.10805@redhat.com>


On 27.06.2010 at 10:21, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> When running in hooked code we need a way to disable interrupts  
>> without
>> clobbering any interrupts or exiting out to the hypervisor.
>>
>> To achieve this, we have an additional critical field in the shared  
>> page. If
>> that field is equal to the r1 register of the guest, it tells the  
>> hypervisor
>> that we're in such a critical section and thus may not receive any  
>> interrupts.
>>
>
> Is r1 reserved for this purpose?  Can't it match accidentally?

r1 is defined by the abi to be the stack.

>
> Why won't zero/nonzero work for this?

Because there is no store immediate opcode on powerpc :(.

Alex


* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Alexander Graf @ 2010-06-27  9:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C2708EB.9020500@redhat.com>

On 27.06.2010 at 10:16, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:24 AM, Alexander Graf wrote:
>> One of the most obvious registers to share with the guest directly  
>> is the
>> MSR. The MSR contains the "interrupts enabled" flag which the guest  
>> has to
>> toggle in critical sections.
>>
>> So in order to bring the overhead of interrupt en- and disabling  
>> down, let's
>> put msr into the shared page. Keep in mind that even though you can  
>> fully read
>> its contents, writing to it doesn't always update all state. There  
>> are a few
>> safe fields that don't require hypervisor interaction. See the guest
>> implementation that follows later for reference.
>>
>
>
> You mean, see the documentation for reference.
>
> It should be possible to write the guest code looking only at the  
> documentation.

*shrug* since we're writing open source I don't mind telling people to
read code for a reference implementation. If well written, that's more
comprehensible than documentation anyways :).

But either way, you can take a look at both - documentation and code,  
yes.

What I really meant here is that the list of registers we patch should  
be taken from the patch code. I didn't want to write out all of them  
in the description.

Alex


* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-06-27  9:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <4C270876.2050806@redhat.com>


On 27.06.2010 at 10:14, Avi Kivity <avi@redhat.com> wrote:

> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>> We just introduced a new PV interface that screams for  
>> documentation. So here
>> it is - a shiny new and awesome text file describing the internal  
>> works of
>> the PPC KVM paravirtual interface.
>>
>
> Good, that lets people who have no idea what they're talking about  
> participate in the review.

Heh, I knew you'd like this :).

>
>> +
>> +PPC hypercalls
>> +==============
>> +
>> +The only viable ways to reliably get from guest context to host  
>> context are:
>> +
>> +    1) Call an invalid instruction
>> +    2) Call the "sc" instruction with a parameter to "sc"
>> +    3) Call the "sc" instruction with parameters in GPRs
>> +
>> +Method 1 is always a bad idea. Invalid instructions can be  
>> replaced later on
>> +by valid instructions, rendering the interface broken.
>> +
>> +Method 2 also has downfalls. If the parameter to "sc" is != 0 the  
>> spec is
>> +rather unclear if the sc is targeted directly for the hypervisor  
>> or the
>> +supervisor. It would also require that we read the syscall issuing  
>> instruction
>> +every time a syscall is issued, slowing down guest syscalls.
>> +
>> +Method 3 is what KVM uses. We pass magic constants  
>> (KVM_SC_MAGIC_R3 and
>> +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall  
>> instruction with these
>> +magic values arrives from the guest's kernel mode, we take the  
>> syscall as a
>> +hypercall.
>>
>
> Is there any chance a normal syscall will have those values in r3  
> and r4?

r3 is the syscall number. So as long as the guest doesn't reuse that  
value, we're safe. Since in general syscall numbers are not randomly  
scattered throughout the number range, we should be ok here.
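
A rough sketch of the host-side decision described in the quoted documentation: an sc is treated as a hypercall only if it comes from guest kernel mode and carries both magic values. The two EXAMPLE_ constants are placeholders invented for this sketch, not the real KVM_SC_MAGIC_R3/R4 values, and sc_is_hypercall is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic values for illustration only */
#define EXAMPLE_SC_MAGIC_R3 0x11111111ULL
#define EXAMPLE_SC_MAGIC_R4 0x22222222ULL

#define MSR_PR (1ULL << 14)   /* problem state, i.e. guest user mode */

/*
 * Hypothetical classifier for a trapped "sc": true means hypercall,
 * false means the syscall is reflected back into the guest.
 */
static bool sc_is_hypercall(uint64_t r3, uint64_t r4, uint64_t msr)
{
    if (msr & MSR_PR)
        return false;   /* guest userspace can never issue hypercalls */

    return r3 == EXAMPLE_SC_MAGIC_R3 && r4 == EXAMPLE_SC_MAGIC_R4;
}
```

A normal syscall would only be misread if the guest kernel itself issued sc with r3 equal to the magic value and r4 matching as well, which is the collision discussed above.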

>
> If so, maybe it's better to use pc as the key for hypercalls.  Let
> the guest designate one instruction address as the hypercall call
> point; kvm can easily check it and reflect it back to the guest if
> it doesn't match.
>

You mean the guest would tell the hv where the hypercall lies? That  
would require a hypercall, no? Defining it statically is tricky. I  
want to PV'nize osx using a kernel module later, so I don't have  
control over the physical layout.

> Is it valid and useful to issue sc from privileged mode anyway,  
> except for calling the hypervisor?

Same as a syscall on x86 really. The kernel can and does issue  
syscalls within itself.

>
>> +
>> +The parameters are as follows:
>> +
>> +    r3        KVM_SC_MAGIC_R3
>> +    r4        KVM_SC_MAGIC_R4
>> +    r5        Hypercall number
>> +    r6        First parameter
>> +    r7        Second parameter
>> +    r8        Third parameter
>> +    r9        Fourth parameter
>> +
>> +Hypercall definitions are shared in generic code, so the same  
>> hypercall numbers
>> +apply for x86 and powerpc alike.
>>
>
> Addresses passed in hypercall parameters are guest physical addresses.
>
> Do you have >32 bit physical addresses on 32-bit guests?  if so,  
> you'll need to pass physical addresses in two registers.

I think theoretically it's possible. Will we ever support it?
Doubtful. Do we need to pass high memory addresses to the hv? Even
more doubtful.

If we hit such a case, I'd just disable the hypercall for 32 bit. Or
define param1 and param2 to contain the address if the guest is in
32-bit mode. No need to always make all params 64 bit imho.
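
If the param1/param2 variant were ever needed, it could look roughly like this. The struct and function names are made up, and the choice of putting the high half in the first parameter is an assumption; nothing in the thread fixes that order.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit hypercall parameters carrying one address */
struct hc_addr_params {
    uint32_t hi;   /* would go into the first parameter register */
    uint32_t lo;   /* would go into the second parameter register */
};

/* Guest side: split a physical address wider than 32 bits */
static struct hc_addr_params split_phys_addr(uint64_t pa)
{
    struct hc_addr_params p;

    p.hi = (uint32_t)(pa >> 32);
    p.lo = (uint32_t)pa;
    return p;
}

/* Host side: reassemble the guest physical address */
static uint64_t join_phys_addr(struct hc_addr_params p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```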

>
>> +
>> +The magic page
>> +==============
>> +
>> +To enable communication between the hypervisor and guest there is  
>> a new shared
>> +page that contains parts of supervisor visible register state. The  
>> guest can
>> +map this shared page using the KVM hypercall  
>> KVM_HC_PPC_MAP_MAGIC_PAGE.
>> +
>> +With this hypercall issued the guest always gets the magic page  
>> mapped at the
>> +desired location in effective and physical address space. For now,  
>> we always
>> +map the page to -4096. This way we can access it using absolute  
>> load and store
>> +functions. The following instruction reads the first field of the  
>> magic page:
>> +
>> +    ld    rX, -4096(0)
>>
>
> Is the address guest controlled or host controlled?

Guest controlled. It's passed in to the map_magic_page hypercall.

>
>> +
>> +The interface is designed to be extensible should there be need  
>> later to add
>> +additional registers to the magic page. If you add fields to the  
>> magic page,
>> +also define a new hypercall feature to indicate that the host can  
>> give you more
>> +registers. Only if the host supports the additional features, make  
>> use of them.
>> +
>> +The magic page has the following layout as described in
>> +arch/powerpc/include/asm/kvm_para.h:
>> +
>> +struct kvm_vcpu_arch_shared {
>> +    __u64 scratch1;
>> +    __u64 scratch2;
>> +    __u64 scratch3;
>> +    __u64 critical;        /* Guest may not get interrupts if == r1 */
>>
>
> Elaborate?

I think I have a description in the respective patch. Probably a good  
idea to add it to the documentation.

>
>> +    __u64 sprg0;
>> +    __u64 sprg1;
>> +    __u64 sprg2;
>> +    __u64 sprg3;
>> +    __u64 srr0;
>> +    __u64 srr1;
>> +    __u64 dar;
>> +    __u64 msr;
>> +    __u32 dsisr;
>> +    __u32 int_pending;    /* Tells the guest if we have an interrupt */
>> +};
>> +
>> +Additions to the page must only occur at the end. Struct fields  
>> are always 32
>> +bit aligned.
>> +
>> +Patched instructions
>> +====================
>> +
>> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
>> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
>> +endianness.
>>
>
> Who does the patching? guest or host?

All patching is done by the guest. Probably worth mentioning, yeah.
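
As a sketch of the displacements the guest would patch in, assuming the struct layout quoted above and the magic page at -4096: on 64-bit the ld/std displacement is simply -4096 plus the field offset, while on 32-bit big-endian the lwz/stw replacement adds 4 to reach the low word of a 64-bit field. The helper names are hypothetical and the struct is restated with stdint types so the block compiles in user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as given in the quoted documentation */
struct kvm_vcpu_arch_shared {
    uint64_t scratch1;
    uint64_t scratch2;
    uint64_t scratch3;
    uint64_t critical;
    uint64_t sprg0;
    uint64_t sprg1;
    uint64_t sprg2;
    uint64_t sprg3;
    uint64_t srr0;
    uint64_t srr1;
    uint64_t dar;
    uint64_t msr;
    uint32_t dsisr;
    uint32_t int_pending;
};

#define MAGIC_PAGE_EA (-4096L)   /* where the page is mapped for now */

/* Displacement for "ld/std rX, disp(0)" on a 64-bit guest */
static long magic_disp64(size_t field_off)
{
    return MAGIC_PAGE_EA + (long)field_off;
}

/*
 * Displacement for "lwz/stw rX, disp(0)" on a 32-bit guest, valid for
 * the 64-bit fields only: big endian keeps the low word 4 bytes in.
 */
static long magic_disp32(size_t field_off)
{
    return MAGIC_PAGE_EA + (long)field_off + 4;
}
```

For msr this gives -4008 for the 64-bit form and -4004 for the 32-bit form.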

>
>> +
>> +From            To
>> +====            ==
>> +
>> +mfmsr    rX        ld    rX, magic_page->msr
>> +mfsprg    rX, 0        ld    rX, magic_page->sprg0
>> +mfsprg    rX, 1        ld    rX, magic_page->sprg1
>> +mfsprg    rX, 2        ld    rX, magic_page->sprg2
>> +mfsprg    rX, 3        ld    rX, magic_page->sprg3
>> +mfsrr0    rX        ld    rX, magic_page->srr0
>> +mfsrr1    rX        ld    rX, magic_page->srr1
>> +mfdar    rX        ld    rX, magic_page->dar
>> +mfdsisr    rX        ld    rX, magic_page->dsisr
>> +
>> +mtmsr    rX        std    rX, magic_page->msr
>> +mtsprg    0, rX        std    rX, magic_page->sprg0
>> +mtsprg    1, rX        std    rX, magic_page->sprg1
>> +mtsprg    2, rX        std    rX, magic_page->sprg2
>> +mtsprg    3, rX        std    rX, magic_page->sprg3
>> +mtsrr0    rX        std    rX, magic_page->srr0
>> +mtsrr1    rX        std    rX, magic_page->srr1
>> +mtdar    rX        std    rX, magic_page->dar
>> +mtdsisr    rX        std    rX, magic_page->dsisr
>> +
>> +tlbsync            nop
>> +
>> +mtmsrd    rX, 0        b    <special mtmsr section>
>> +mtmsr            b    <special mtmsr section>
>> +
>> +mtmsrd    rX, 1        b    <special mtmsrd section>
>> +
>> +[BookE only]
>> +wrteei    [0|1]        b    <special wrteei section>
>>
>
> Probably the guest, as only it can arrange for special * sections.   
> Good.
>
>> +
>> +Some instructions require more logic to determine what's going on  
>> than a load
>> +or store instruction can deliver. To enable patching of those, we  
>> keep some
>> +RAM around where we can live translate instructions to. What  
>> happens is the
>> +following:
>> +
>> +    1) copy emulation code to memory
>> +    2) patch that code to fit the emulated instruction
>> +    3) patch that code to return to the original pc + 4
>> +    4) patch the original instruction to branch to the new code
>> +
>> +That way we can inject an arbitrary amount of code as replacement  
>> for a single
>> +instruction. This allows us to check for pending interrupts when  
>> setting EE=1
>> +for example.
>> +
>>
>
> Or not.
>
> What about transitions from paravirt to non-paravirt?  For example,  
> a system reset.

That ... eh ... good question. It would leave the map pending, but  
everything still continues working.

I don't really know in kvm when a reset occurred. So we have to make
qemu set the map to 0 on reset. Let's add that when we add migration
support and actually expose all those missing states to userspace.
Currently we only expose half the necessary state for migration
anyway :).


Alex


* Re: [PATCH 24/26] KVM: PPC: PV mtmsrd L=0 and mtmsr
From: Alexander Graf @ 2010-06-27  9:10 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <EDF0A567-C440-4F1B-9AF5-2E0F8203D566@kernel.crashing.org>


On 26.06.2010 at 19:03, Segher Boessenkool <segher@kernel.crashing.org> wrote:

>> There is also a form of mtmsr where all bits need to be addressed. While the
>> PPC64 Linux kernel behaves reasonably well here, the PPC32 one never uses the
>> L=1 form but does mtmsr even for simple things like only changing EE.
>
> You make it sound like the 32-bit kernel does something stupid, while
> there is no other choice.  The "L=1" thing only exists for 64-bit.

Oh, so that's why :). That doesn't really change the fact that it's  
very hard to distinguish between a mtmsr that only changes MSR_EE vs  
one that changes MSR_IR for example :).

Alex

>


* Re: [PATCH 11/26] KVM: PPC: Make RMO a define
From: Alexander Graf @ 2010-06-27  9:08 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, KVM list, kvm-ppc@vger.kernel.org
In-Reply-To: <2078D8A9-7D36-4B5D-A779-9BBAB545A53D@kernel.crashing.org>

On 26.06.2010 at 18:52, Segher Boessenkool <segher@kernel.crashing.org> wrote:

>> On PowerPC it's very normal to not support all of the physical RAM  
>> in real mode.
>
> Oh?  Are you referring to "real mode limit", or 32-bit  
> implementations with
> more than 32 address lines, or something else?

The former.

>
> Either way, RMO is a really bad name for this, since that name is already
> used for a similar but different concept.

It's the same concept, no? Not all physical memory is accessible from  
real mode.

>
> Also, it seems you construct the physical address by masking out bits from
> the effective address.  Most implementations will trap or machine check if
> you address outside of physical address space, instead.

Well, the only case where I remember hitting a real RMO limit is on  
the PS3 - that issues a data/instruction storage interrupt when  
accessing anything > 8MB in real mode.

So I'd argue this is heavily implementation specific.

Apart from that, what I'm trying to cover is that on ppc64 accessing  
0xc000000000000000 in real mode gets you 0x0. Is there a better name for  
this?
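
The behaviour being described amounts to dropping the top bits of the effective address. A minimal sketch in C, assuming the top four bits are the ones ignored; the mask value and both names are illustrative, not taken from the patch:

```c
#include <stdint.h>

/* Assumption: the top four address bits are ignored in real mode. */
#define RM_ADDR_MASK 0x0fffffffffffffffULL

/* Address actually accessed for effective address ea in real mode. */
static uint64_t real_mode_addr(uint64_t ea)
{
	return ea & RM_ADDR_MASK;
}
```

With this mask, real_mode_addr(0xc000000000000000ULL) yields 0. An implementation that traps on out-of-range addresses, as the PS3 does above 8MB, would need a bounds check rather than a mask.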

Alex

>

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Avi Kivity @ 2010-06-27  8:34 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-27-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We just introduced a new PV interface that screams for documentation. So here
> it is - a shiny new and awesome text file describing the internal works of
> the PPC KVM paravirtual interface.
>
>
> +Querying for existence
> +======================
> +
> +To find out if we're running on KVM or not, we overlay the PVR register. Usually
> +the PVR register contains an id that identifies your CPU type. If, however, you
> +pass KVM_PVR_PARA in the register that you want the PVR result in, the register
> +still contains KVM_PVR_PARA after the mfpvr call.
> +
> +	LOAD_REG_IMM(r5, KVM_PVR_PARA)
> +	mfpvr	r5
> +	[r5 still contains KVM_PVR_PARA]
> +
> +Once determined to run under a PV capable KVM, you can now use hypercalls as
> +described below.
>    

On x86 we allow host userspace to determine whether the guest sees the 
paravirt interface (and what features are exposed).  This allows you to 
live migrate from a newer host to an older host, by not exposing the 
newer features.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 18/26] KVM: PPC: KVM PV guest stubs
From: Avi Kivity @ 2010-06-27  8:28 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-19-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We will soon start to replace instructions from the text section with
> other, paravirtualized versions. To ease the readability of those patches
> I split out the generic looping and magic page mapping code.
>
> This patch still only contains stubs. But at least it loops through the
> text section :).
>
>
> +
> +static void kvm_check_ins(u32 *inst)
> +{
> +	u32 _inst = *inst;
> +	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
> +	u32 inst_rt = _inst & KVM_MASK_RT;
> +
> +	switch (inst_no_rt) {
> +	}
> +
> +	switch (_inst) {
> +	}
> +
> +	flush_icache_range((ulong)inst, (ulong)inst + 4);
> +}
>    

Shouldn't we flush only if we patched something?
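
One way to get there is to route every modification through a single helper that writes the new instruction and flushes just that word, so untouched instructions never cause a flush. The following is only a sketch under that assumption: patch_ins(), check_ins() and the counting flush stub are hypothetical names, and in the kernel the stub would be flush_icache_range().

```c
#include <stdint.h>

static unsigned int flush_count; /* lets the sketch observe flushes */

/* Stub; the kernel would call flush_icache_range() here. */
static void flush_one_insn(uint32_t *inst)
{
	(void)inst;
	flush_count++;
}

/* Hypothetical helper: write the replacement and flush in one go. */
static void patch_ins(uint32_t *inst, uint32_t new_inst)
{
	*inst = new_inst;
	flush_one_insn(inst);
}

/* Hypothetical caller: only instructions that match get patched. */
static void check_ins(uint32_t *inst, uint32_t match, uint32_t replacement)
{
	if (*inst == match)
		patch_ins(inst, replacement);
}
```

Walking a text section with check_ins() then flushes once per patched instruction instead of once per instruction inspected.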

> +
> +static void kvm_use_magic_page(void)
> +{
> +	u32 *p;
> +	u32 *start, *end;
> +
> +	/* Tell the host to map the magic page to -4096 on all CPUs */
> +
> +	on_each_cpu(kvm_map_magic_page, NULL, 1);
> +
> +	/* Now loop through all code and find instructions */
> +
> +	start = (void*)_stext;
> +	end = (void*)_etext;
> +
> +	for (p = start; p < end; p++)
> +		kvm_check_ins(p);
> +}
> +
>    

Or, flush the entire thing here.


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 12/26] KVM: PPC: First magic page steps
From: Avi Kivity @ 2010-06-27  8:24 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-13-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We will be introducing a method to project the shared page in guest context.
> As soon as we're talking about this coupling, the shared page is called magic
> page.
>
> This patch introduces simple defines, so the follow-up patches are easier to
> read.
>
>
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index e35c1ac..5f8c214 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
>   	u64 dec_jiffies;
>   	unsigned long pending_exceptions;
>   	struct kvm_vcpu_arch_shared *shared;
> +	unsigned long magic_page_pa; /* phys addr to map the magic page to */
> +	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
>    

Is ea like a va?  If so, can't the guest specify it by manipulating the 
hash table (or tlb)?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 09/26] KVM: PPC: Add PV guest scratch registers
From: Avi Kivity @ 2010-06-27  8:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-10-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> While running in hooked code we need to store register contents out because
> we must not clobber any registers.
>
> So let's add some fields to the shared page we can just happily write to.
>
>    

How are these protected during interrupts?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 08/26] KVM: PPC: Add PV guest critical sections
From: Avi Kivity @ 2010-06-27  8:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-9-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> When running in hooked code we need a way to disable interrupts without
> clobbering any registers or exiting out to the hypervisor.
>
> To achieve this, we have an additional critical field in the shared page. If
> that field is equal to the r1 register of the guest, it tells the hypervisor
> that we're in such a critical section and thus may not receive any interrupts.
>    

Is r1 reserved for this purpose?  Can't it match accidentally?

Why won't zero/nonzero work for this?
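
For reference, the check as described in the patch text boils down to a single comparison on the host side. The sketch below assumes nothing beyond that description; the struct is cut down to the one field that matters and the function name is invented. Presumably the appeal of comparing against r1 is that the guest can enter the section by storing a register it already holds, without needing a scratch register for a constant, but that reading is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the field relevant here; the real shared page has more. */
struct shared_page {
	uint64_t critical;
};

/* Critical section if and only if the field equals the guest's r1. */
static bool guest_in_critical_section(const struct shared_page *sp,
				      uint64_t r1)
{
	return sp->critical == r1;
}
```

An accidental match would require the field to hold the current r1 value by chance, which is the case the question above is probing.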

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 02/26] KVM: PPC: Convert MSR to shared page
From: Avi Kivity @ 2010-06-27  8:16 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-3-git-send-email-agraf@suse.de>

On 06/26/2010 02:24 AM, Alexander Graf wrote:
> One of the most obvious registers to share with the guest directly is the
> MSR. The MSR contains the "interrupts enabled" flag which the guest has to
> toggle in critical sections.
>
> So in order to bring the overhead of interrupt en- and disabling down, let's
> put msr into the shared page. Keep in mind that even though you can fully read
> its contents, writing to it doesn't always update all state. There are a few
> safe fields that don't require hypervisor interaction. See the guest
> implementation that follows later for reference.
>    


You mean, see the documentation for reference.

It should be possible to write the guest code looking only at the 
documentation.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 26/26] KVM: PPC: Add Documentation about PV interface
From: Avi Kivity @ 2010-06-27  8:14 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277508314-915-27-git-send-email-agraf@suse.de>

On 06/26/2010 02:25 AM, Alexander Graf wrote:
> We just introduced a new PV interface that screams for documentation. So here
> it is - a shiny new and awesome text file describing the internal works of
> the PPC KVM paravirtual interface.
>    

Good, that lets people who have no idea what they're talking about 
participate in the review.

> +
> +PPC hypercalls
> +==============
> +
> +The only viable ways to reliably get from guest context to host context are:
> +
> +	1) Call an invalid instruction
> +	2) Call the "sc" instruction with a parameter to "sc"
> +	3) Call the "sc" instruction with parameters in GPRs
> +
> +Method 1 is always a bad idea. Invalid instructions can be replaced later on
> +by valid instructions, rendering the interface broken.
> +
> +Method 2 also has downsides. If the parameter to "sc" is != 0 the spec is
> +rather unclear whether the sc is targeted directly for the hypervisor or the
> +supervisor. It would also require that we read the syscall issuing instruction
> +every time a syscall is issued, slowing down guest syscalls.
> +
> +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
> +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
> +magic values arrives from the guest's kernel mode, we take the syscall as a
> +hypercall.
>    

Is there any chance a normal syscall will have those values in r3 and r4?

If so, maybe it's better to use pc as the key for hypercalls.  Let the 
guest designate one instruction address as the hypercall call point; kvm 
can easily check it and reflect it back to the guest if it doesn't match.

Is it valid and useful to issue sc from privileged mode anyway, except 
for calling the hypervisor?
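
The two detection schemes under discussion can be put side by side in a small C sketch. Everything here is illustrative: the struct, the function names and especially the magic values are placeholders, not the constants from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, not the real constants from the patch. */
#define EXAMPLE_SC_MAGIC_R3 0x11111111u
#define EXAMPLE_SC_MAGIC_R4 0x22222222u

struct sc_regs {
	uint64_t r3, r4;
	uint64_t pc;      /* address of the sc instruction */
	bool kernel_mode; /* sc was issued from guest kernel mode */
};

/* Scheme from the documentation: magic values in r3 and r4. */
static bool is_hypercall_by_magic(const struct sc_regs *r)
{
	return r->kernel_mode &&
	       r->r3 == EXAMPLE_SC_MAGIC_R3 &&
	       r->r4 == EXAMPLE_SC_MAGIC_R4;
}

/* Alternative suggested above: one registered call-site address. */
static bool is_hypercall_by_pc(const struct sc_regs *r, uint64_t hcall_pc)
{
	return r->kernel_mode && r->pc == hcall_pc;
}
```

In either scheme an sc that fails the test would be reflected back to the guest as an ordinary syscall.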

> +
> +The parameters are as follows:
> +
> +	r3		KVM_SC_MAGIC_R3
> +	r4		KVM_SC_MAGIC_R4
> +	r5		Hypercall number
> +	r6		First parameter
> +	r7		Second parameter
> +	r8		Third parameter
> +	r9		Fourth parameter
> +
> +Hypercall definitions are shared in generic code, so the same hypercall numbers
> +apply for x86 and powerpc alike.
>    

Addresses passed in hypercall parameters are guest physical addresses.

Do you have >32 bit physical addresses on 32-bit guests?  If so, you'll 
need to pass physical addresses in two registers.
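
Passing a wide address in two registers is a plain split and join. A minimal sketch in C, with hypothetical helper names and no claim about which parameter registers would carry the halves:

```c
#include <stdint.h>

/* Guest side: split a physical address into two 32-bit parameters. */
static void split_pa(uint64_t pa, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(pa >> 32);
	*lo = (uint32_t)pa;
}

/* Host side: rebuild the address from the two parameters. */
static uint64_t join_pa(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

For example, the 36-bit address 0xf12345000 travels as hi = 0xf and lo = 0x12345000.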

> +
> +The magic page
> +==============
> +
> +To enable communication between the hypervisor and guest there is a new shared
> +page that contains parts of supervisor visible register state. The guest can
> +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
> +
> +With this hypercall issued the guest always gets the magic page mapped at the
> +desired location in effective and physical address space. For now, we always
> +map the page to -4096. This way we can access it using absolute load and store
> +functions. The following instruction reads the first field of the magic page:
> +
> +	ld	rX, -4096(0)
>    

Is the address guest controlled or host controlled?
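
The choice of -4096 in the quoted text works because loads and stores with a base of 0 take only a signed 16-bit displacement (for ld it must also be a multiple of 4), and -4096 sign-extends to the last page of the address space. The C sketch below just checks that arithmetic; the names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAGIC_PAGE_EA (-4096L)

/* Does disp fit the signed 16-bit displacement of a load or store? */
static bool fits_disp16(long disp)
{
	return disp >= INT16_MIN && disp <= INT16_MAX;
}

/* Effective address formed for "disp(0)", where the base counts as 0. */
static uint64_t ea_64(long disp) { return (uint64_t)(int64_t)disp; }
static uint32_t ea_32(long disp) { return (uint32_t)(int32_t)disp; }
```

ea_64(MAGIC_PAGE_EA) is 0xfffffffffffff000 and ea_32(MAGIC_PAGE_EA) is 0xfffff000, so every field of a 4096-byte page stays reachable without a base register.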

> +
> +The interface is designed to be extensible should there be need later to add
> +additional registers to the magic page. If you add fields to the magic page,
> +also define a new hypercall feature to indicate that the host can give you more
> +registers. Only if the host supports the additional features, make use of them.
> +
> +The magic page has the following layout as described in
> +arch/powerpc/include/asm/kvm_para.h:
> +
> +struct kvm_vcpu_arch_shared {
> +	__u64 scratch1;
> +	__u64 scratch2;
> +	__u64 scratch3;
> +	__u64 critical;		/* Guest may not get interrupts if == r1 */
>    

Elaborate?

> +	__u64 sprg0;
> +	__u64 sprg1;
> +	__u64 sprg2;
> +	__u64 sprg3;
> +	__u64 srr0;
> +	__u64 srr1;
> +	__u64 dar;
> +	__u64 msr;
> +	__u32 dsisr;
> +	__u32 int_pending;	/* Tells the guest if we have an interrupt */
> +};
> +
> +Additions to the page must only occur at the end. Struct fields are always 32
> +bit aligned.
> +
> +Patched instructions
> +====================
> +
> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
> +endianness.
>    

Who does the patching? guest or host?
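
The offset of 4 in the quoted text follows from byte order: in a big-endian 64-bit field the low 32 bits occupy the last four bytes. A small sketch in C that makes this visible regardless of the machine it runs on; the helper names are illustrative.

```c
#include <stdint.h>

/* Store a 64-bit value in big-endian byte order, as in the magic page. */
static void store_be64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load 32 bits in big-endian byte order, like lwz on a big-endian CPU. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

After store_be64(field, 0x9032), load_be32(field + 4) returns 0x9032 while load_be32(field) returns 0, which is why a 32-bit guest reads each 64-bit field at its offset plus 4.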

> +
> +From			To
> +====			==
> +
> +mfmsr	rX		ld	rX, magic_page->msr
> +mfsprg	rX, 0		ld	rX, magic_page->sprg0
> +mfsprg	rX, 1		ld	rX, magic_page->sprg1
> +mfsprg	rX, 2		ld	rX, magic_page->sprg2
> +mfsprg	rX, 3		ld	rX, magic_page->sprg3
> +mfsrr0	rX		ld	rX, magic_page->srr0
> +mfsrr1	rX		ld	rX, magic_page->srr1
> +mfdar	rX		ld	rX, magic_page->dar
> +mfdsisr	rX		ld	rX, magic_page->dsisr
> +
> +mtmsr	rX		std	rX, magic_page->msr
> +mtsprg	0, rX		std	rX, magic_page->sprg0
> +mtsprg	1, rX		std	rX, magic_page->sprg1
> +mtsprg	2, rX		std	rX, magic_page->sprg2
> +mtsprg	3, rX		std	rX, magic_page->sprg3
> +mtsrr0	rX		std	rX, magic_page->srr0
> +mtsrr1	rX		std	rX, magic_page->srr1
> +mtdar	rX		std	rX, magic_page->dar
> +mtdsisr	rX		std	rX, magic_page->dsisr
> +
> +tlbsync			nop
> +
> +mtmsrd	rX, 0		b	<special mtmsr section>
> +mtmsr			b	<special mtmsr section>
> +
> +mtmsrd	rX, 1		b	<special mtmsrd section>
> +
> +[BookE only]
> +wrteei	[0|1]		b	<special wrteei section>
>    

Probably the guest, as only it can arrange for special * sections.  Good.

> +
> +Some instructions require more logic to determine what's going on than a load
> +or store instruction can deliver. To enable patching of those, we keep some
> +RAM around where we can live translate instructions to. What happens is the
> +following:
> +
> +	1) copy emulation code to memory
> +	2) patch that code to fit the emulated instruction
> +	3) patch that code to return to the original pc + 4
> +	4) patch the original instruction to branch to the new code
> +
> +That way we can inject an arbitrary amount of code as replacement for a single
> +instruction. This allows us to check for pending interrupts when setting EE=1
> +for example.
> +
>    

Or not.
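
Steps 3 and 4 of the quoted list both come down to writing an unconditional branch over an existing instruction word. A minimal sketch of that encoding in C, assuming a relative branch without link; make_branch() is a hypothetical name and the sketch leaves out the range check a real patcher would need.

```c
#include <stdint.h>

/*
 * Encode "b target" for an instruction located at pc:
 * primary opcode 18, 24-bit word offset, AA=0, LK=0.
 * The caller must ensure the target is within +/-32MB of pc.
 */
static uint32_t make_branch(uint32_t pc, uint32_t target)
{
	uint32_t offset = target - pc;

	return 0x48000000u | (offset & 0x03fffffcu);
}
```

Step 4 would store make_branch(pc, trampoline) at the original instruction, and step 3 would store make_branch(trampoline_end, pc + 4) at the end of the copied code. The 32MB reach limits how far the trampoline memory can sit from the patched text.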

What about transitions from paravirt to non-paravirt?  For example, a 
system reset.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH 1/2] KVM: PPC: Add generic hpte management functions
From: Avi Kivity @ 2010-06-27  7:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: kvm-ppc, linuxppc-dev, Alexander Graf, kvm
In-Reply-To: <1277593118.4200.122.camel@pasglop>

On 06/27/2010 01:58 AM, Benjamin Herrenschmidt wrote:
>
>> Then mmu intensive loads can expect to be slow.
>>      
> Well, depends. ppc64 indeed requires the hash to be managed by the
> hypervisor, so inserting or invalidating translations will mean a
> roundtrip to the hypervisor, though there are ways at least the
> insertion could be alleviated (for example, the HV could service the
> hash misses directly walking the guest page tables).
>    

But the guest page tables are software defined, no?  That means the 
interface will break if the page table format changes.

> But that's due in part to a design choice (whether it's a good one or
> not I'm not going to argue here) which favors huge reasonably static
> workloads where the hash is expected to contain all translations for
> everything.
>    

What about when you have memory pressure?  The hash will have to reflect 
those pte_clear_flush_young(), no?

It seems horribly expensive.

> However, note that BookE (the embedded variant of the architecture) uses
> a different model for virtualization, including options in its latest
> variant for a HW logical->real translation (via a small dedicated TLB)
> and direct access to some TLB ops from the guest.
>    

I'm somewhat familiar with it, yes.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply
