* Userspace MSR handling
@ 2009-05-22 20:11 Ed Swierk
  2009-05-23  8:57 ` Alexander Graf
  2009-05-25 11:16 ` Gerd Hoffmann
  0 siblings, 2 replies; 18+ messages in thread
From: Ed Swierk @ 2009-05-22 20:11 UTC (permalink / raw)
  To: kvm

I'm experimenting with Gerd's excellent work on integrating Xenner
into Qemu (http://git.et.redhat.com/?p=qemu-kraxel.git). I'm using it
to boot a FreeBSD guest that uses the Xen paravirtual network drivers.
Decoupling the Xen PV guest support from the hypervisor really
simplifies deployment.

The current implementation doesn't yet support KVM, as KVM has to
handle a Xen-specific MSR in order to map hypercall pages into the
guest physical address space. A recent thread on this list discussed
the issue but didn't come to a resolution.

Does it make sense to implement a generic mechanism for handling MSRs
in userspace? I imagine a mechanism analogous to PIO, adding a
KVM_EXIT_MSR code and a msr type in the kvm_run struct.
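
For concreteness, a minimal sketch of what that could look like (all
names and values here are hypothetical, just to show the shape of the
interface):

#include <linux/types.h>

#define KVM_EXIT_MSR 20         /* hypothetical exit_reason value */

/* hypothetical member of the kvm_run union, mirroring KVM_EXIT_IO:
 * the kernel fills it in on an unhandled rdmsr/wrmsr */
struct kvm_msr_exit {
    __u8  is_write;             /* 0 = rdmsr, 1 = wrmsr */
    __u32 index;                /* MSR number, from ECX */
    __u64 data;                 /* value written / value to return */
};

Userspace would handle the exit in its vcpu run loop, supply data for
reads, and re-enter the guest, much like KVM_EXIT_IO today.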

I'm happy to take a stab at implementing this if no one else is
already working on it.

--Ed


* Re: Userspace MSR handling
  2009-05-22 20:11 Userspace MSR handling Ed Swierk
@ 2009-05-23  8:57 ` Alexander Graf
  2009-05-24 12:07   ` Avi Kivity
  2009-05-25 11:16 ` Gerd Hoffmann
  1 sibling, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2009-05-23  8:57 UTC (permalink / raw)
  To: Ed Swierk; +Cc: kvm@vger.kernel.org

On 22.05.2009, at 22:11, Ed Swierk <eswierk@aristanetworks.com> wrote:

> I'm experimenting with Gerd's excellent work on integrating Xenner
> into Qemu (http://git.et.redhat.com/?p=qemu-kraxel.git). I'm using it
> to boot a FreeBSD guest that uses the Xen paravirtual network drivers.
> Decoupling the Xen PV guest support from the hypervisor really
> simplifies deployment.
>
> The current implementation doesn't yet support KVM, as KVM has to
> handle a Xen-specific MSR in order to map hypercall pages into the
> guest physical address space. A recent thread on this list discussed
> the issue but didn't come to a resolution.
>
> Does it make sense to implement a generic mechanism for handling MSRs
> in userspace? I imagine a mechanism analogous to PIO, adding a
> KVM_EXIT_MSR code and a msr type in the kvm_run struct.
>
> I'm happy to take a stab at implementing this if no one else is
> already working on it.

I think it's a great idea.
I was thinking of doing something similar for ppc's HIDs/SPRs too, so  
a userspace app can complement the kernel's vcpu support.

Also by falling back to userspace all those MSR read/write patches I  
send wouldn't have to go in-kernel anymore :)

Alex

>
> --Ed


* Re: Userspace MSR handling
  2009-05-23  8:57 ` Alexander Graf
@ 2009-05-24 12:07   ` Avi Kivity
  2009-05-24 16:15     ` Alexander Graf
  2009-05-25 11:03     ` Gerd Hoffmann
  0 siblings, 2 replies; 18+ messages in thread
From: Avi Kivity @ 2009-05-24 12:07 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Ed Swierk, kvm@vger.kernel.org

Alexander Graf wrote:
> On 22.05.2009, at 22:11, Ed Swierk <eswierk@aristanetworks.com> wrote:
>
>> I'm experimenting with Gerd's excellent work on integrating Xenner
>> into Qemu (http://git.et.redhat.com/?p=qemu-kraxel.git). I'm using it
>> to boot a FreeBSD guest that uses the Xen paravirtual network drivers.
>> Decoupling the Xen PV guest support from the hypervisor really
>> simplifies deployment.
>>
>> The current implementation doesn't yet support KVM, as KVM has to
>> handle a Xen-specific MSR in order to map hypercall pages into the
>> guest physical address space. A recent thread on this list discussed
>> the issue but didn't come to a resolution.
>>
>> Does it make sense to implement a generic mechanism for handling MSRs
>> in userspace? I imagine a mechanism analogous to PIO, adding a
>> KVM_EXIT_MSR code and a msr type in the kvm_run struct.
>>
>> I'm happy to take a stab at implementing this if no one else is
>> already working on it.
>
> I think it's a great idea.
> I was thinking of doing something similar for ppc's HIDs/SPRs too, so 
> a userspace app can complement the kernel's vcpu support.
>
> Also by falling back to userspace all those MSR read/write patches I 
> send wouldn't have to go in-kernel anymore :)

I'm wary of this.  It spreads the burden of implementing the cpu 
emulation across the kernel/user boundary.  We don't really notice with 
qemu as userspace, because we have a cpu emulator on both sides, but 
consider an alternative userspace that only emulates devices and has no 
cpu emulation support.  We want to support that scenario well.

Moreover, your patches only stub out those MSRs.  As soon as you 
implement the more interesting bits, you'll find yourself back in the 
kernel.

I agree however that the Xen hypercall page protocol has no business in 
kvm.ko.  But can't we implement it in emu?  Xenner conveniently places a 
ring 0 stub in the guest, we could trap the MSR there and emulate it 
entirely in the guest.

-- 
error compiling committee.c: too many arguments to function



* Re: Userspace MSR handling
  2009-05-24 12:07   ` Avi Kivity
@ 2009-05-24 16:15     ` Alexander Graf
  2009-05-26 11:31       ` Avi Kivity
  2009-05-25 11:03     ` Gerd Hoffmann
  1 sibling, 1 reply; 18+ messages in thread
From: Alexander Graf @ 2009-05-24 16:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ed Swierk, kvm@vger.kernel.org

On 24.05.2009, at 14:07, Avi Kivity <avi@redhat.com> wrote:

> Alexander Graf wrote:
>> On 22.05.2009, at 22:11, Ed Swierk <eswierk@aristanetworks.com>  
>> wrote:
>>
>>> I'm experimenting with Gerd's excellent work on integrating Xenner
>>> into Qemu (http://git.et.redhat.com/?p=qemu-kraxel.git). I'm using  
>>> it
>>> to boot a FreeBSD guest that uses the Xen paravirtual network  
>>> drivers.
>>> Decoupling the Xen PV guest support from the hypervisor really
>>> simplifies deployment.
>>>
>>> The current implementation doesn't yet support KVM, as KVM has to
>>> handle a Xen-specific MSR in order to map hypercall pages into the
>>> guest physical address space. A recent thread on this list discussed
>>> the issue but didn't come to a resolution.
>>>
>>> Does it make sense to implement a generic mechanism for handling  
>>> MSRs
>>> in userspace? I imagine a mechanism analogous to PIO, adding a
>>> KVM_EXIT_MSR code and a msr type in the kvm_run struct.
>>>
>>> I'm happy to take a stab at implementing this if no one else is
>>> already working on it.
>>
>> I think it's a great idea.
>> I was thinking of doing something similar for ppc's HIDs/SPRs too,  
>> so a userspace app can complement the kernel's vcpu support.
>>
>> Also by falling back to userspace all those MSR read/write patches  
>> I send wouldn't have to go in-kernel anymore :)
>
> I'm wary of this.  It spreads the burden of implementing the cpu  
> emulation across the kernel/user boundary.  We don't really notice  
> with qemu as userspace, because we have a cpu emulator on both  
> sides, but consider an alternative userspace that only emulates  
> devices and has no cpu emulation support.  We want to support that  
> scenario well.
>
> Moreover, your patches only stub out those MSRs.  As soon as you  
> implement the more interesting bits, you'll find yourself back in  
> the kernel.
>

Agreed. The one thing that always makes my life hard is the default
policy on what to do for unknown MSRs. So if I could (by having a
userspace fallback) either #GP or do nothing, I'd be able to mimic
qemu's behavior more closely depending on what I need.

I definitely don't see those approaches conflicting; rather, they
complement each other. If your kvm-using userspace app needs to act
on a user-defined msr, you wouldn't want them to have to contact
Red Hat to implement an ioctl for RHEL5 just for this msr, would you?

So IMHO, instead of #GP'ing, falling back to userspace would be great.

Alex

> I agree however that the Xen hypercall page protocol has no business  
> in kvm.ko.  But can't we implement it in emu?  Xenner conveniently  
> places a ring 0 stub in the guest, we could trap the MSR there and  
> emulate it entirely in the guest.
>
> -- 
> error compiling committee.c: too many arguments to function
>


* Re: Userspace MSR handling
  2009-05-24 12:07   ` Avi Kivity
  2009-05-24 16:15     ` Alexander Graf
@ 2009-05-25 11:03     ` Gerd Hoffmann
  2009-05-25 11:20       ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Gerd Hoffmann @ 2009-05-25 11:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Ed Swierk, kvm@vger.kernel.org

On 05/24/09 14:07, Avi Kivity wrote:
> I agree however that the Xen hypercall page protocol has no business in
> kvm.ko. But can't we implement it in emu? Xenner conveniently places a
> ring 0 stub in the guest, we could trap the MSR there and emulate it
> entirely in the guest.

No.  The case where handling the msr writes is needed is pv-on-hvm 
driver support.  For pv kernels it could be handled by emu if needed, 
but pv kernels don't need the msr stuff in the first place. 
There should be a longish mail about that in the list archive ...

cheers,
   Gerd


* Re: Userspace MSR handling
  2009-05-22 20:11 Userspace MSR handling Ed Swierk
  2009-05-23  8:57 ` Alexander Graf
@ 2009-05-25 11:16 ` Gerd Hoffmann
  1 sibling, 0 replies; 18+ messages in thread
From: Gerd Hoffmann @ 2009-05-25 11:16 UTC (permalink / raw)
  To: Ed Swierk; +Cc: kvm

On 05/22/09 22:11, Ed Swierk wrote:
> Does it make sense to implement a generic mechanism for handling MSRs
> in userspace?

I see no other way to handle the xen pv msr writes.

> I imagine a mechanism analogous to PIO, adding a
> KVM_EXIT_MSR code and a msr type in the kvm_run struct.

Sounds sensible to me.  It probably must be off by default for backward 
compatibility, then enabled by ioctl.  I think it would be best to 
enable it for specific msrs.  We probably also want a way for userspace 
to figure out whether a specific msr is implemented in-kernel, to 
handle the case of an msr emulation moving from userspace to 
kernelspace gracefully.
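
Hypothetically, the enable ioctl could take a per-msr filter along
these lines (struct and ioctl names invented for illustration, KVMIO
as defined in <linux/kvm.h>):

#include <linux/types.h>
#include <linux/ioctl.h>

/* illustrative only -- not an existing KVM ioctl */
struct kvm_userspace_msr {
    __u32 index;    /* msr to route to userspace via KVM_EXIT_MSR */
    __u32 flags;    /* e.g. route reads only, writes only, or both */
};

#define KVM_ENABLE_USERSPACE_MSR \
    _IOW(KVMIO, 0xf0, struct kvm_userspace_msr)

Returning, say, -EEXIST when the msr is already handled in-kernel
would also answer the "is it implemented in-kernel" question.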

> I'm happy to take a stab at implementing this if no one else is
> already working on it.

It is somewhere on my todo list, but I haven't looked at it (yet).
Feel free to go ahead ;)

cheers,
   Gerd


* Re: Userspace MSR handling
  2009-05-25 11:03     ` Gerd Hoffmann
@ 2009-05-25 11:20       ` Avi Kivity
  2009-05-25 11:29         ` Gerd Hoffmann
  2009-05-27 16:12         ` Ed Swierk
  0 siblings, 2 replies; 18+ messages in thread
From: Avi Kivity @ 2009-05-25 11:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Alexander Graf, Ed Swierk, kvm@vger.kernel.org

Gerd Hoffmann wrote:
> On 05/24/09 14:07, Avi Kivity wrote:
>> I agree however that the Xen hypercall page protocol has no business in
>> kvm.ko. But can't we implement it in emu? Xenner conveniently places a
>> ring 0 stub in the guest, we could trap the MSR there and emulate it
>> entirely in the guest.
>
> No.  The case where handling the msr writes is needed is the pv-on-hvm 
> driver support.  For pv kernels it could be handled by emu if it would 
> be needed, but pv kernels don't need the msr stuff in the first place. 
> There should be a longish mail about that in the list archive ...

Yes, I forgot.

Device drivers have no business writing to cpu model specific 
registers.  I hate to bring that fugliness to kvm but I do want to 
support Xen guests.

It should have been implemented as mmio.  Maybe implement an ioctl that 
converts rdmsr/wrmsr to equivalent mmios?

struct kvm_msr_mmio {
    __u32 msr;      /* first msr in the range */
    __u32 nr;       /* number of msrs in the range */
    __u64 mmio;     /* guest-physical address to forward accesses to */
    __u32 flags;
    __u32 pad[3];
};

In any case it should reject the standard msr ranges to prevent Alex 
from implementing cpu emulation in userspace.
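
For illustration, registration from userspace could look like this (the
ioctl name and exact semantics are invented here; 0x40000000 is the msr
number the Xen hypercall page protocol uses):

struct kvm_msr_mmio xen_msr = {
    .msr   = 0x40000000,    /* Xen hypercall page msr */
    .nr    = 1,
    .mmio  = 0xf8000000,    /* arbitrary unclaimed guest-physical addr */
    .flags = 0,
};

/* hypothetical ioctl: a wrmsr then surfaces as an 8-byte mmio write
 * at .mmio, i.e. an ordinary KVM_EXIT_MMIO exit userspace handles */
ioctl(vm_fd, KVM_SET_MSR_MMIO, &xen_msr);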

-- 
error compiling committee.c: too many arguments to function



* Re: Userspace MSR handling
  2009-05-25 11:20       ` Avi Kivity
@ 2009-05-25 11:29         ` Gerd Hoffmann
  2009-05-25 11:31           ` Avi Kivity
  2009-05-27 16:12         ` Ed Swierk
  1 sibling, 1 reply; 18+ messages in thread
From: Gerd Hoffmann @ 2009-05-25 11:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Ed Swierk, kvm@vger.kernel.org

On 05/25/09 13:20, Avi Kivity wrote:
> It should have been implemented as mmio. Maybe implement an ioctl that
> converts rdmsr/wrmsr to equivalent mmios?
>
> struct kvm_msr_mmio {
>     __u32 msr;
>     __u32 nr;
>     __u64 mmio;
>     __u32 flags;
>     __u32 pad[3];
> };

Funny way to tunnel msr access through the existing interface.
Should work though I think.

> In any case it should reject the standard msr ranges to prevent Alex
> from implementing cpu emulation in userspace.

;)

cheers,
   Gerd


* Re: Userspace MSR handling
  2009-05-25 11:29         ` Gerd Hoffmann
@ 2009-05-25 11:31           ` Avi Kivity
  0 siblings, 0 replies; 18+ messages in thread
From: Avi Kivity @ 2009-05-25 11:31 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Alexander Graf, Ed Swierk, kvm@vger.kernel.org

Gerd Hoffmann wrote:
> On 05/25/09 13:20, Avi Kivity wrote:
>> It should have been implemented as mmio. Maybe implement an ioctl that
>> converts rdmsr/wrmsr to equivalent mmios?
>>
>> struct kvm_msr_mmio {
>>     __u32 msr;
>>     __u32 nr;
>>     __u64 mmio;
>>     __u32 flags;
>>     __u32 pad[3];
>> };
>
> Funny way to tunnel msr access through the existing interface.
> Should work though I think.

I'm not happy with it.

-- 
error compiling committee.c: too many arguments to function



* Re: Userspace MSR handling
  2009-05-24 16:15     ` Alexander Graf
@ 2009-05-26 11:31       ` Avi Kivity
  0 siblings, 0 replies; 18+ messages in thread
From: Avi Kivity @ 2009-05-26 11:31 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Ed Swierk, kvm@vger.kernel.org

Alexander Graf wrote:
>>>> Does it make sense to implement a generic mechanism for handling MSRs
>>>> in userspace? I imagine a mechanism analogous to PIO, adding a
>>>> KVM_EXIT_MSR code and a msr type in the kvm_run struct.
>>>>
>>>> I'm happy to take a stab at implementing this if no one else is
>>>> already working on it.
>>>
>>> I think it's a great idea.
>>> I was thinking of doing something similar for ppc's HIDs/SPRs too, 
>>> so a userspace app can complement the kernel's vcpu support.
>>>
>>> Also by falling back to userspace all those MSR read/write patches I 
>>> send wouldn't have to go in-kernel anymore :)
>>
>> I'm wary of this.  It spreads the burden of implementing the cpu 
>> emulation across the kernel/user boundary.  We don't really notice 
>> with qemu as userspace, because we have a cpu emulator on both sides, 
>> but consider an alternative userspace that only emulates devices and 
>> has no cpu emulation support.  We want to support that scenario well.
>>
>> Moreover, your patches only stub out those MSRs.  As soon as you 
>> implement the more interesting bits, you'll find yourself back in the 
>> kernel.
>>
>
> Agreed. The one thing that always makes my life hard is the default 
> policy on what to do for unknown MSRs. So if I could (by having a 
> userspace fallback) either #GP or do nothing, I'd be able to mimic 
> qemu's behavior more closely depending on what I need.

I'm not interested in mimicking qemu, I'm interested in mimicking a real 
cpu.  kvm is not part of qemu.

Many (most?) msrs cannot be emulated in userspace.

> I definitely don't see those approaches conflicting; rather, they 
> complement each other. If your kvm-using userspace app needs to act 
> on a user-defined msr, you wouldn't want them to have to contact 
> Red Hat to implement an ioctl for RHEL5 just for this msr, would you?

An msr is a cpu resource.  I don't see how you can define a new cpu 
resource without changing the cpu implementation, which is in the 
kernel.  If they want to communicate with userspace, let them use mmio 
or pio.

Look at the Xen interface.  You write to one cpu's MSR, and the cpu 
writes a page in guest memory.  It doesn't fit; a much better interface 
would have been mmio.

I don't want to break layering just for Xen, so I'm trying to find an 
alternative.

-- 
error compiling committee.c: too many arguments to function



* Re: Userspace MSR handling
  2009-05-25 11:20       ` Avi Kivity
  2009-05-25 11:29         ` Gerd Hoffmann
@ 2009-05-27 16:12         ` Ed Swierk
  2009-05-27 16:28           ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Ed Swierk @ 2009-05-27 16:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gerd Hoffmann, Alexander Graf, kvm@vger.kernel.org

On Mon, May 25, 2009 at 4:20 AM, Avi Kivity <avi@redhat.com> wrote:
> Device drivers have no business writing to cpu model specific registers.  I
> hate to bring that fugliness to kvm but I do want to support Xen guests.
>
> It should have been implemented as mmio.  Maybe implement an ioctl that
> converts rdmsr/wrmsr to equivalent mmios?

Converting MSRs to IO sounds fine, but a generic mechanism, with a new
ioctl type and all the bookkeeping for a dynamically-sized list of
MSR-to-MMIO mappings, seems like overkill given the puny scope of the
problem. All the Xen HVM guest needs is a single, arbitrary MSR that
when written generates an MMIO or PIO write handled by userspace. If
this requirement is unique and we don't expect to find other guests
that similarly abuse MSRs, could we get away with a less flexible but
simpler mechanism?

What I have in mind is choosing an unused legacy IO port range, say,
0x28-0x2f, and implementing a KVM-specific MSR, say, MSR_KVM_IO_28,
that maps rdmsr/wrmsr to a pair of inl/outl operations on these ports.
Either MMIO or PIO would work, but I'm assuming it's safer to grab
currently-unused IO ports than particular memory addresses.
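
Concretely, the in-kernel side would just split the 64-bit wrmsr value
into two 32-bit port writes, along these lines (a sketch; the helper
name and the port choice are invented, not existing KVM code):

/* hypothetical handler for MSR_KVM_IO_28: forward the value as two
 * 32-bit writes that userspace sees as ordinary KVM_EXIT_IO exits */
static int msr_kvm_io_28_write(struct kvm_vcpu *vcpu, u64 data)
{
    kvm_pio_write(vcpu, 0x28, (u32)data);           /* low half  */
    kvm_pio_write(vcpu, 0x2c, (u32)(data >> 32));   /* high half */
    return 0;
}

rdmsr would do the mirror-image pair of inl operations.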

That odor you smell is the aroma of hardcoded goop, but I'm trying to
find a solution that doesn't burden KVM with a big chunk of code to
solve a one-off problem.

--Ed


* Re: Userspace MSR handling
  2009-05-27 16:12         ` Ed Swierk
@ 2009-05-27 16:28           ` Avi Kivity
  2009-05-27 17:09             ` Ed Swierk
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2009-05-27 16:28 UTC (permalink / raw)
  To: Ed Swierk; +Cc: Gerd Hoffmann, Alexander Graf, kvm@vger.kernel.org

Ed Swierk wrote:
> On Mon, May 25, 2009 at 4:20 AM, Avi Kivity <avi@redhat.com> wrote:
>   
>> Device drivers have no business writing to cpu model specific registers.  I
>> hate to bring that fugliness to kvm but I do want to support Xen guests.
>>
>> It should have been implemented as mmio.  Maybe implement an ioctl that
>> converts rdmsr/wrmsr to equivalent mmios?
>>     
>
> Converting MSRs to IO sounds fine, but a generic mechanism, with a new
> ioctl type and all the bookkeeping for a dynamically-sized list of
> MSR-to-MMIO mappings, seems like overkill given the puny scope of the
> problem. All the Xen HVM guest needs is a single, arbitrary MSR that
> when written generates an MMIO or PIO write handled by userspace. If
> this requirement is unique and we don't expect to find other guests
> that similarly abuse MSRs, could we get away with a less flexible but
> simpler mechanism?
>   

I agree, it's stupid.

> What I have in mind is choosing an unused legacy IO port range, say,
> 0x28-0x2f, and implementing a KVM-specific MSR, say, MSR_KVM_IO_28,
> that maps rdmsr/wrmsr to a pair of inl/outl operations on these ports.
> Either MMIO or PIO would work, but I'm assuming it's safer to grab
> currently-unused IO ports than particular memory addresses.
>   

It's just as bad.

> That odor you smell is the aroma of hardcoded goop, but I'm trying to
> find a solution that doesn't burden KVM with a big chunk of code to
> solve a one-off problem.
>   

Will it actually solve the problem?

- can all hypercalls that can be issued with 
pv-on-hvm-on-kvm-with-a-side-order-of-fries be satisfied from userspace?
- what about connecting the guest driver to xen netback one day?  we 
don't want to go through userspace for that.

We can consider catering to Xen and implementing that MSR in the kernel, 
if it's truly one off.

-- 
error compiling committee.c: too many arguments to function



* Re: Userspace MSR handling
  2009-05-27 16:28           ` Avi Kivity
@ 2009-05-27 17:09             ` Ed Swierk
  2009-05-27 19:16               ` Gerd Hoffmann
  0 siblings, 1 reply; 18+ messages in thread
From: Ed Swierk @ 2009-05-27 17:09 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gerd Hoffmann, Alexander Graf, kvm@vger.kernel.org

On Wed, May 27, 2009 at 9:28 AM, Avi Kivity <avi@redhat.com> wrote:
> Will it actually solve the problem?
>
> - can all hypercalls that can be issued with
> pv-on-hvm-on-kvm-with-a-side-order-of-fries be satisfied from userspace?
> - what about connecting the guest driver to xen netback one day?  we don't
> want to go through userspace for that.

In Gerd's current implementation, the code in the hypercall page
(which the guest maps in using that pesky MSR) handles all hypercalls
either internally or by invoking userspace (via another magic IO
port).

I'm too ignorant of Xen to claim that my proposal solves the problem
completely, but after hacking in support for delegating the magic MSR
to userspace, I got an unmodified FreeBSD disk image to boot in
PV-on-HVM-on-KVM+Qemu and use Xen PV network devices.

> We can consider catering to Xen and implementing that MSR in the kernel, if
> it's truly one off.

One way or another, the MSR somehow has to map in a chunk of data
supplied by userspace. Are you suggesting an alternative to the PIO
hack?

--Ed


* Re: Userspace MSR handling
  2009-05-27 17:09             ` Ed Swierk
@ 2009-05-27 19:16               ` Gerd Hoffmann
  2009-05-27 23:00                 ` Ed Swierk
  2009-05-28  8:53                 ` Avi Kivity
  0 siblings, 2 replies; 18+ messages in thread
From: Gerd Hoffmann @ 2009-05-27 19:16 UTC (permalink / raw)
  To: Ed Swierk; +Cc: Avi Kivity, Alexander Graf, kvm@vger.kernel.org

On 05/27/09 19:09, Ed Swierk wrote:
> On Wed, May 27, 2009 at 9:28 AM, Avi Kivity<avi@redhat.com>  wrote:
>> Will it actually solve the problem?
>>
>> - can all hypercalls that can be issued with
>> pv-on-hvm-on-kvm-with-a-side-order-of-fries be satisfied from userspace?

Yes.

>> - what about connecting the guest driver to xen netback one day?  we don't
>> want to go through userspace for that.

You can't without emulating tons of xen stuff in-kernel.

Current situation:
  * Guest does xen hypercalls.  We can handle that just fine.
  * Host userspace (backends) calls libxen*, where the xen hypercall
    calls are hidden.  We can redirect the library calls via LD_PRELOAD
    (standalone xenner) or function pointers (qemuified xenner) and do
    something else instead.
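
A minimal preload shim, purely illustrative (the libxenctrl entry
point and its signature are assumed here):

/* preload.c -- override one libxen* call so the "hypercall" is
 * serviced by xenner instead of a real hypervisor */
#include <stdio.h>

int xc_evtchn_notify(int xce_handle, int port)
{
    /* hand the event to xenner's emulation instead of xen */
    fprintf(stderr, "xenner: evtchn notify, port %d\n", port);
    return 0;
}

/* build: gcc -shared -fPIC -o preload.so preload.c
 * run:   LD_PRELOAD=./preload.so xenner ... */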

Trying to use in-kernel xen netback driver adds this problem:
  * Host kernel does xen hypercalls.  Ouch.  We have to emulate them
    in-kernel (otherwise using in-kernel netback would be a quite
    pointless exercise).

> One way or another, the MSR somehow has to map in a chunk of data
> supplied by userspace. Are you suggesting an alternative to the PIO
> hack?

Well, the "chunk of data" is on disk anyway:
$libdir/xenner/hvm{32,64}.bin

So a possible plan of attack could be "ln -s $libdir/xenner 
/lib/firmware", let kvm.ko grab it if needed using 
request_firmware("xenner/hvm${bits}.bin"), and a few lines of kernel 
code handling the wrmsr.  The logic is just this:

void xenner_wrmsr(uint64_t val, int longmode)
{
     uint32_t page = val & ~PAGE_MASK;   /* low bits: blob page index */
     uint64_t paddr = val & PAGE_MASK;   /* high bits: guest-physical
                                            destination of the page */
     uint8_t *blob = longmode ? hvm64 : hvm32;
     cpu_physical_memory_write(paddr, blob + page * PAGE_SIZE,
                               PAGE_SIZE);
}

Well, you'll have to sprinkle in blob loading and caching and some error 
checking.  But even with that it is probably hard to beat in actual code 
size.  An additional plus is that we get away without a new ioctl.

Comments?

cheers,
   Gerd



* Re: Userspace MSR handling
  2009-05-27 19:16               ` Gerd Hoffmann
@ 2009-05-27 23:00                 ` Ed Swierk
  2009-05-28  8:53                 ` Avi Kivity
  1 sibling, 0 replies; 18+ messages in thread
From: Ed Swierk @ 2009-05-27 23:00 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Avi Kivity, Alexander Graf, kvm@vger.kernel.org

On Wed, 2009-05-27 at 21:16 +0200, Gerd Hoffmann wrote:
> Well, the "chunk of data" is on disk anyway:
> $libdir/xenner/hvm{32,64}.bin
> 
> So a possible plan of attack could be "ln -s $libdir/xenner 
> /lib/firmware", let kvm.ko grab it if needed using 
> request_firmware("xenner/hvm${bits}.bin"), and a few lines of kernel 
> code handling the wrmsr.  Logic is just this:
> 
> void xenner_wrmsr(uint64_t val, int longmode)
> {
>      uint32_t page = val & ~PAGE_MASK;
>      uint64_t paddr = val & PAGE_MASK;
>      uint8_t *blob = longmode ? hvm64 : hvm32;
>      cpu_physical_memory_write(paddr, blob + page * PAGE_SIZE,
>                                PAGE_SIZE);
> }
> 
> Well, you'll have to sprinkle in blob loading and caching and some error 
> checking.  But even with that it is probably hard to beat in actual code 
> size.  Additional plus is we get away without a new ioctl then.
> 
> Comments?

I like it.

Here's a first attempt.  One obvious improvement would be to cache the
reference to the firmware blob to avoid re-reading it on every wrmsr.

---
diff -BurN kvm-kmod-2.6.30-rc6/include/asm-x86/kvm_para.h kvm-kmod-2.6.30-rc6.new/include/asm-x86/kvm_para.h
--- kvm-kmod-2.6.30-rc6/include/asm-x86/kvm_para.h	2009-05-21 02:10:14.000000000 -0700
+++ kvm-kmod-2.6.30-rc6.new/include/asm-x86/kvm_para.h	2009-05-27 14:44:42.252004038 -0700
@@ -56,6 +56,7 @@
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
+#define MSR_KVM_LOAD_XENNER_FIRMWARE 0x40000000
 
 #define KVM_MAX_MMU_OP_BATCH           32
 
diff -BurN kvm-kmod-2.6.30-rc6/include/linux/kvm_host.h kvm-kmod-2.6.30-rc6.new/include/linux/kvm_host.h
--- kvm-kmod-2.6.30-rc6/include/linux/kvm_host.h	2009-05-21 02:10:14.000000000 -0700
+++ kvm-kmod-2.6.30-rc6.new/include/linux/kvm_host.h	2009-05-27 14:16:47.839529841 -0700
@@ -192,6 +192,7 @@
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
 #endif
+	struct device *kvm_dev;
 };
 
 /* The guest did something we don't support. */
diff -BurN kvm-kmod-2.6.30-rc6/x86/kvm_main.c kvm-kmod-2.6.30-rc6.new/x86/kvm_main.c
--- kvm-kmod-2.6.30-rc6/x86/kvm_main.c	2009-05-21 02:10:18.000000000 -0700
+++ kvm-kmod-2.6.30-rc6.new/x86/kvm_main.c	2009-05-27 15:22:43.463251834 -0700
@@ -816,6 +816,8 @@
 };
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
+static struct miscdevice kvm_dev;
+
 static struct kvm *kvm_create_vm(void)
 {
 	struct kvm *kvm = kvm_arch_create_vm();
@@ -869,6 +871,7 @@
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 	kvm_coalesced_mmio_init(kvm);
 #endif
+	kvm->kvm_dev = kvm_dev.this_device;
 out:
 	return kvm;
 }
diff -BurN kvm-kmod-2.6.30-rc6/x86/x86.c kvm-kmod-2.6.30-rc6.new/x86/x86.c
--- kvm-kmod-2.6.30-rc6/x86/x86.c	2009-05-21 02:10:18.000000000 -0700
+++ kvm-kmod-2.6.30-rc6.new/x86/x86.c	2009-05-27 15:17:42.798002879 -0700
@@ -77,6 +77,7 @@
 #include <linux/iommu.h>
 #include <linux/intel-iommu.h>
 #include <linux/cpufreq.h>
+#include <linux/firmware.h>
 
 #include <asm/uaccess.h>
 #include <asm/msr.h>
@@ -846,6 +847,22 @@
 		kvm_request_guest_time_update(vcpu);
 		break;
 	}
+	case MSR_KVM_LOAD_XENNER_FIRMWARE: {
+		const char *fw_name = (vcpu->arch.shadow_efer & EFER_LME
+				       ? "xenner/hvm64.bin"
+				       : "xenner/hvm32.bin");
+		const struct firmware *firmware;
+		uint32_t page = data & ~PAGE_MASK;
+		uint64_t paddr = data & PAGE_MASK;
+		if (request_firmware(&firmware, fw_name, vcpu->kvm->kvm_dev))
+			return 1;
+		printk(KERN_INFO "kvm: loading %s page %d to %llx\n",
+		       fw_name, page, paddr);
+		kvm_write_guest(vcpu->kvm, paddr,
+				firmware->data + page * PAGE_SIZE, PAGE_SIZE);
+		release_firmware(firmware);
+		break;
+	}
 	default:
 		pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
 		return 1;




* Re: Userspace MSR handling
  2009-05-27 19:16               ` Gerd Hoffmann
  2009-05-27 23:00                 ` Ed Swierk
@ 2009-05-28  8:53                 ` Avi Kivity
  2009-05-29  9:47                   ` Gerd Hoffmann
  1 sibling, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2009-05-28  8:53 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Ed Swierk, Alexander Graf, kvm@vger.kernel.org

Gerd Hoffmann wrote:
>
>>> - what about connecting the guest driver to xen netback one day?  we 
>>> don't
>>> want to go through userspace for that.
>
> You can't without emulating tons of xen stuff in-kernel.
>
> Current situation:
>  * Guest does xen hypercalls.  We can handle that just fine.
>  * Host userspace (backends) calls libxen*, where the xen hypercall
>    calls are hidden.  We can redirect the library calls via LD_PRELOAD
>    (standalone xenner) or function pointers (qemuified xenner) and do
>    something else instead.
>
> Trying to use in-kernel xen netback driver adds this problem:
>  * Host kernel does xen hypercalls.  Ouch.  We have to emulate them
>    in-kernel (otherwise using in-kernel netback would be a quite
>    pointless exercise).

Or do the standard function pointer trick.  Event channel notifications 
change to eventfd_signal, grant table ops change to copy_to_user().
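
I.e. an ops table selected at setup time, something like this (all
names invented, a sketch of the indirection only):

/* hypothetical indirection so netback/blkback run unchanged on
 * bare xen or on kvm+xenner */
struct xen_backend_ops {
    void (*evtchn_notify)(int port);
    int  (*copy_to_guest)(void *dst, const void *src, size_t len);
};

/* kvm flavor: evtchn_notify wraps eventfd_signal(), copy_to_guest
 * wraps copy_to_user(); the xen flavor keeps the real hypercalls.
 * (definitions elided -- this only shows the shape) */
static const struct xen_backend_ops kvm_xen_ops = {
    .evtchn_notify = kvm_evtchn_notify,
    .copy_to_guest = kvm_copy_to_guest,
};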

>
>> One way or another, the MSR somehow has to map in a chunk of data
>> supplied by userspace. Are you suggesting an alternative to the PIO
>> hack?
>
> Well, the "chunk of data" is on disk anyway:
> $libdir/xenner/hvm{32,64}.bin
>
> So a possible plan of attack could be "ln -s $libdir/xenner 
> /lib/firmware", let kvm.ko grab it if needed using 
> request_firmware("xenner/hvm${bits}.bin"), and a few lines of kernel 
> code handling the wrmsr.  Logic is just this:
>
> void xenner_wrmsr(uint64_t val, int longmode)
> {
>     uint32_t page = val & ~PAGE_MASK;
>     uint64_t paddr = val & PAGE_MASK;
>     uint8_t *blob = longmode ? hvm64 : hvm32;
>     cpu_physical_memory_write(paddr, blob + page * PAGE_SIZE,
>                               PAGE_SIZE);
> }
>
> Well, you'll have to sprinkle in blob loading and caching and some 
> error checking.  But even with that it is probably hard to beat in 
> actual code size.  

This ties all guests to one hypercall page implementation installed in 
one root-only place.

> An additional plus is that we get away without a new ioctl.


Minimizing the amount of ioctls is an important non-goal.  If you 
replace request_firmware with an ioctl that defines the location and 
size of the hypercall page in host userspace, this would work well.
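
Such an ioctl wouldn't need to carry more than this (illustrative
only):

/* hypothetical: userspace hands KVM its own hypercall-page blob, so
 * each guest can carry its own implementation */
struct kvm_hypercall_blob {
    __u64 userspace_addr;   /* blob location in host userspace */
    __u64 size;             /* total size, a multiple of PAGE_SIZE */
};

On wrmsr, kvm.ko would then copy page N of that blob into the guest,
in place of the request_firmware() call.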


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: Userspace MSR handling
  2009-05-28  8:53                 ` Avi Kivity
@ 2009-05-29  9:47                   ` Gerd Hoffmann
  2009-05-31  8:21                     ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Gerd Hoffmann @ 2009-05-29  9:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ed Swierk, Alexander Graf, kvm@vger.kernel.org

On 05/28/09 10:53, Avi Kivity wrote:
> Gerd Hoffmann wrote:
>> Trying to use in-kernel xen netback driver adds this problem:
>> * Host kernel does xen hypercalls. Ouch. We have to emulate them
>> in-kernel (otherwise using in-kernel netback would be a quite
>> pointless exercise).
>
> Or do the standard function pointer trick. Event channel notifications
> change to eventfd_signal, grant table ops change to copy_to_user().

grant table ops include mapping pages of the guest (aka domU) into the 
host (aka dom0) address space, filling the pointer into some struct (a 
bio for blkback, an skb for netback), and sending it down the road for 
processing.  netback and blkback do quite a lot of low-level xen memory 
management to get that done, including m2p and p2m table updates due to 
direct paging mode.  It isn't as easy as s/hypercall/get_user_pages/ ...

cheers,
   Gerd


* Re: Userspace MSR handling
  2009-05-29  9:47                   ` Gerd Hoffmann
@ 2009-05-31  8:21                     ` Avi Kivity
  0 siblings, 0 replies; 18+ messages in thread
From: Avi Kivity @ 2009-05-31  8:21 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Ed Swierk, Alexander Graf, kvm@vger.kernel.org

Gerd Hoffmann wrote:
>>
>> Or do the standard function pointer trick. Event channel notifications
>> change to eventfd_signal, grant table ops change to copy_to_user().
>
> grant table ops include mapping pages of the guest (aka domU) into the 
> host (aka dom0) address space, fill the pointer into some struct (bio 
> for blkback, skb for netback), send it down the road for processing. 
> netback and blkback do quite some lowlevel xen memory management to 
> get that done, including m2p and p2m table updates due to direct 
> paging mode.  It isn't as easy as s/hypercall/get_user_pages/ ...

So we need to plug it in at a slightly higher level.  Conceptually we 
can provide the same interfaces as xen, so it's just a matter of 
abstracting it nicely.

-- 
error compiling committee.c: too many arguments to function


