* porting lguest to x86_64
@ 2007-02-12 17:29 Steven Rostedt
  2007-02-12 18:46 ` Andi Kleen
  2007-02-13  1:02 ` Rusty Russell
  0 siblings, 2 replies; 6+ messages in thread
From: Steven Rostedt @ 2007-02-12 17:29 UTC (permalink / raw)
  To: virtualization; +Cc: Andi Kleen

Hi all,

Glauber and I have been looking into porting lguest over to the x86_64.
We've spent the last couple of weeks just trying lguest out and seeing
how far we can "force" it over to x86_64. This was more of just a
learning experience to get our feet wet in lguest since we are still
very green at it.  I also notice that lguest moves very fast (we were
still working on drivers/lguest when I now see it has moved to
arch/i386/lguest).

Anyway, we've decided that the work we have done so far was just a
learning prototype and have thrown it out for some better ideas. But
before getting too deep into coding, we want to ask the giants of lguest
for their ideas, and their thoughts on what we want.

Glauber has been focusing more on paravirt_ops for x86_64 and I've been
focusing on lguest as an HV.  Since x86_64 is not as limited in address
space as i386, we've decided to redesign things differently.

Terminology:

  Host: the Linux HV kernel. (Xen terms would be dom0 plus HV).
  Guest: Linux that is run as paravirt on a Host (domU).


Host always mapped:

        Since the virtual address space is very large, it would be much
        simpler to just keep the Host always mapped in the Guests
        address space.  So the Guest will be more like a process here.
        So instead of just mapping the HV in both the Guest and Host as
        a hypervisor_blob, the entire Host will continually remain
        mapped.  This simplifies things tremendously.
        
        Now, we're thinking of moving the guest's PAGE_OFFSET instead of
        the Host's, but this hasn't been decided yet.
        
Add PDA VCPU Field:

        Add another field to the per-CPU PDA structure that points to a
        VCPU descriptor (described below). A VCPU pointer will also be
        added to the task structure, and the PDA field will be updated
        from it on context switch (we could instead keep the field only
        in the task structure and not the PDA, since the task structure
        is reachable from the PDA, but the extra indirection on every
        check might cost too much).
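
        Roughly something like this (just a sketch; the field and the
        exact helper names are illustrative, not actual code):

                /* per-cpu PDA, gaining a hypothetical vcpu pointer */
                struct x8664_pda {
                        /* ... existing fields ... */
                        struct lguest_vcpu *vcpu; /* NULL: no guest running */
                };

                /* on context switch, mirror the task's pointer into
                 * the PDA so entry code can test it with one access */
                write_pda(vcpu, next_p->lguest_vcpu);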
        
The VCPU descriptor:

        This will hold function pointers for system calls and fault
        handlers. It will also hold a pointer to the per-guest-CPU info
        (allowing for SMP guests) and a pointer to a generic lguest
        structure for the global guest info. This structure will be
        examined in assembly, so it must be compact.
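
        Something along these lines (again just a sketch, the names are
        made up and not actual code):

                struct lguest_vcpu {
                        void (*system_call)(void);       /* guest syscall / hypercall entry */
                        void (*do_exception)(int trapnr); /* guest fault/trap entry */
                        void *guest_cpu;                  /* per-guest-CPU state (SMP later) */
                        struct lguest *lg;                /* global guest info */
                };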
        
System Calls:

        On all system calls (host users or guest users) the VCPU field
        of the PDA will be checked. If it is NULL, nothing different
        will happen than what the host already does today (see why it's
        better to have the field in the PDA). But if it is not NULL it
        will jump to the system_call function pointer of the VCPU
        structure to perform the guest operations.
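
        In C-ish pseudo code the check would look roughly like this
        (pure sketch, assuming the PDA field above; the real thing would
        live in the entry path, and host_system_call() stands in for
        today's unchanged host path):

                struct lguest_vcpu *vcpu = read_pda(vcpu);

                if (!vcpu)
                        /* plain host system call, unchanged */
                        host_system_call();
                else
                        /* guest running: dispatch into the lguest module */
                        vcpu->system_call();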
        
        The VCPU field of the PDA will only be non-NULL when a guest is
        running.  The pointer can point to code in the lguest module,
        and, placed in the right position, it can call C code, making
        this even simpler.
        
        The system-call function can check to see if it is a hypercall
        or a system call made by a guest user process.  If the guest
        kernel makes a hypercall, it needs to set a flag in shared data
        between the guest and the host, saying it's making a hypercall.
        This shared data must be per VCPU.
        
        If the system call came from a normal guest user process, the
        host will load the registers back onto the guest's stack and
        return to the guest, where the guest will know that the user
        process's registers have already been stored on the stack.
        Since %rcx will point to a guest kernel address on return, the
        guest will need to read the %rcx that is stored on the stack to
        get the %rip of the guest process to return to.
        
Exceptions/Traps:

        Exceptions and traps will be handled the same way as system
        calls, except that there is no need to check for hypercalls.  On
        an exception, a check is made to see if the PDA contains a VCPU
        pointer. If this pointer is NULL, nothing different is done from
        what the host does today; otherwise, it jumps to the exception
        function pointer in the VCPU structure.  Depending on where this
        jump is made, we can probably jump to C code in the lguest
        module.

        The handler can check whether the guest can handle its own
        exception, or whether we should just kill the guest (triple
        fault?).  It can return to the guest the same way that it
        returns from a system call.
        
Interrupts:

        Since the host kernel is always mapped in, even when the guest
        is running, we can let the host handle the interrupts with no
        changes whatsoever (but see below).
        
IDT / GDT:

        This is where we're not 100% sure what to do. Should the Guest
        have a different CS/DS when compiled as paravirt?  Or should it
        keep the same ones, with the host kernel's CS/DS switched when
        entering and leaving a guest?

        Changing CS/DS on guest switches may be a problem when the
        host takes an interrupt. As mentioned above, we don't want to
        change any of the interrupt handling. I'm not sure how much the
        interrupt paths depend on CS == __KERNEL_CS (have to look at
        the code).
        
        If we do change the host GDT, we will also have to change the
        IDT to reflect those changes. So maybe, at the beginning of
        development, we'll have the paravirt kernel use a different
        CS/DS than the host, and not modify the host's at all.
        

OK, this is just a brief overview of some of the things we came up with.
Please let us know of any problems you have with this approach. Tell us
how stupid we are and show us the correct way :)

We really want to get involved, and we want to do it right, right from
the start.  As mentioned earlier, we are new to the workings of lguest,
and want to help out on the x86_64 front, even while it's still being
developed on the i386 front.  We feel that because of the lack of
limitations that x86_64 gives, the work on the x86_64 will be a large
fork from what lguest does on i386.

Comments?

Thanks for your time

-- Steve

* Re: porting lguest to x86_64
  2007-02-12 17:29 porting lguest to x86_64 Steven Rostedt
@ 2007-02-12 18:46 ` Andi Kleen
  2007-02-12 19:14   ` Steven Rostedt
  2007-02-13  1:02 ` Rusty Russell
  1 sibling, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2007-02-12 18:46 UTC (permalink / raw)
  To: virtualization

On Monday 12 February 2007 18:29, Steven Rostedt wrote:

> Host always mapped:
> 
>         Since the virtual address space is very large, it would be much
>         simpler to just keep the Host always mapped in the Guests
>         address space.  So the Guest will be more like a process here.
>         So instead of just mapping the HV in both the Guest and Host as
>         a hypervisor_blob, the entire Host will continually remain
>         mapped.  This simplifies things tremendously.

How do you protect the host from the guest kernel then?
 
Segment limits as used by i386 lguest won't work.

[there is one way I know of but it has some drawbacks
and wouldn't work with a fully mapped linux kernel host]

The Xen method is to run the guest kernel and guest userspace both
at ring 3 with different address spaces. Or you can use VT/SVM.
         
> The VCPU descriptor:
> 
>         This will hold function pointers for system calls and fault
>         handlers. 

These would be better just mapped to a known address? 

> System Calls:
> 
>         On all system calls (host users or guest users) the VCPU field
>         of the PDA will be checked. If it is NULL, nothing different
>         will happen than what the host already does today (see why it's
>         better to have the field in the PDA). But if it is not NULL it
>         will jump to the system_call function pointer of the VCPU
>         structure to perform the guest operations.

What is the point of this? Just to optimize hypercalls or something else?
Do you expect hypercalls from user space to be common? 
     
> We really want to get involved, and we want to do it right, right from
> the start.  As mentioned earlier, we are new to the workings of lguest,
> and want to help out on the x86_64 front, even while it's still being
> developed on the i386 front.  We feel that because of the lack of
> limitations that x86_64 gives, the work on the x86_64 will be a large
> fork from what lguest does on i386.

It will certainly be quite different, except for the drivers.

-Andi

* Re: porting lguest to x86_64
  2007-02-12 18:46 ` Andi Kleen
@ 2007-02-12 19:14   ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2007-02-12 19:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: virtualization

Andi,

Thanks for the response!

On Mon, 2007-02-12 at 19:46 +0100, Andi Kleen wrote:
> On Monday 12 February 2007 18:29, Steven Rostedt wrote:
> 
> > Host always mapped:
> > 
> >         Since the virtual address space is very large, it would be much
> >         simpler to just keep the Host always mapped in the Guests
> >         address space.  So the Guest will be more like a process here.
> >         So instead of just mapping the HV in both the Guest and Host as
> >         a hypervisor_blob, the entire Host will continually remain
> >         mapped.  This simplifies things tremendously.
> 
> How do you protect the host from the guest kernel then?
>  
> Segment limits as used by i386 lguest won't work.
> 
> [there is one way I know of but it has some drawbacks
> and wouldn't work with a fully mapped linux kernel host]
> 
> The Xen method is to run the guest kernel and guest userspace both
> at ring 3 with different address spaces. Or you can use VT/SVM.

Well, lguest is not for VT/SVM; that's where KVM comes in :)

OK, I left out an important part.  We plan on running the guest kernel
in ring 3. Of course this means we will need a way to protect the guest
kernel from the guest processes, so they would probably need to run in
different address spaces, which has its own drawbacks.

>          
> > The VCPU descriptor:
> > 
> >         This will hold function pointers for system calls and fault
> >         handlers. 
> 
> These would be better just mapped to a known address? 

We could. But we would like to have modules for different hypervisors,
so you can load two different hypervisor modules at the same time and,
depending on which hypervisor's guest is running, have those pointers
point to different functions.

> 
> > System Calls:
> > 
> >         On all system calls (host users or guest users) the VCPU field
> >         of the PDA will be checked. If it is NULL, nothing different
> >         will happen than what the host already does today (see why it's
> >         better to have the field in the PDA). But if it is not NULL it
> >         will jump to the system_call function pointer of the VCPU
> >         structure to perform the guest operations.
> 
> What is the point of this? Just to optimize hypercalls or something else?
> Do you expect hypercalls from user space to be common? 

No, but wouldn't the syscall from guest userspace still jump to the same
code in the host as would a guest doing a hypercall (assuming that the
guest uses syscall for hypercalls)?

>      
> > We really want to get involved, and we want to do it right, right from
> > the start.  As mentioned earlier, we are new to the workings of lguest,
> > and want to help out on the x86_64 front, even while it's still being
> > developed on the i386 front.  We feel that because of the lack of
> > limitations that x86_64 gives, the work on the x86_64 will be a large
> > fork from what lguest does on i386.
> 
> It will certainly be quite different, except for the drivers.

Right! :)

Thanks for your time.

-- Steve

* Re: porting lguest to x86_64
  2007-02-12 17:29 porting lguest to x86_64 Steven Rostedt
  2007-02-12 18:46 ` Andi Kleen
@ 2007-02-13  1:02 ` Rusty Russell
  2007-02-13  1:34   ` Glauber de Oliveira Costa
  1 sibling, 1 reply; 6+ messages in thread
From: Rusty Russell @ 2007-02-13  1:02 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: virtualization, Andi Kleen

On Mon, 2007-02-12 at 12:29 -0500, Steven Rostedt wrote:
> Hi all,
> 
> Glauber and I have been looking into porting lguest over to the x86_64.
> We've spent the last couple of weeks just trying lguest out and seeing
> how far we can "force" it over to x86_64. This was more of just a
> learning experience to get our feet wet in lguest since we are still
> very green at it.  I also notice that lguest moves very fast (we were
> still working on drivers/lguest when I now see it has moved to
> arch/i386/lguest).

Yeah, sorry about that.  My very initial intention was to have x86-64
and PowerPC ports, but since the code is so arch-specific I decided that
it didn't make much sense at this point, so hence the move.

Plus, being in a single directory gives it that nice self-contained
feeling which makes upstream inclusion easier.

Now, at some point that decision might well be reversed...

> Anyway, we've decided that the work we have done so far was just a
> learning prototype and have thrown it out for some better ideas. But
> before getting too deep into coding, we want to ask the giants of lguest
> for their ideas, and their thoughts on what we want.

Well, there are many ways to write this.  Yours is very different,
that's for sure!

A few general points:
1) The entire point of the paravirt_ops infrastructure is to allow a
single kernel to adapt to different hypervisors at runtime.  This is a
real feature which should not be ignored, IMHO.  Also, the "modprobe and
go" model of host kernels is extremely attractive.  So changing
PAGE_OFFSET or what segments the kernel uses is not the trivial matter
it would otherwise be.

2) I would start really simple: no guest SMP, for example.  I would also
look hard at stealing KVM's mmu code: lguest's is much simpler, *but*
that's because it's only a simple 2-level.

3) The purpose of the high-loaded switcher code in lguest is to switch
the world back in a place where it can't be reached by the guest, due to
segment limits.  While this is not generally possible on x86-64, Andi
and Zach pointed out to me that a very similar approach is: use a
read-only page for this code, and a rw page to save & restore state.
When you go SMP for guests, you need a different page for each virtual
CPU of course, but that's later.  I haven't completely thought this
through, but it should work.

One benefit of this approach is that it *will* be v. v. similar to
32-bit lguest.  In fact, 32-bit lguest could probably be changed to use
the same technique without any real harm; indeed, it solves the problem
of 4G segments very nicely.... hm....

Cheers!
Rusty.

* Re: porting lguest to x86_64
  2007-02-13  1:02 ` Rusty Russell
@ 2007-02-13  1:34   ` Glauber de Oliveira Costa
  2007-02-13  2:17     ` Rusty Russell
  0 siblings, 1 reply; 6+ messages in thread
From: Glauber de Oliveira Costa @ 2007-02-13  1:34 UTC (permalink / raw)
  To: Rusty Russell; +Cc: virtualization, Andi Kleen

On Tue, Feb 13, 2007 at 12:02:47PM +1100, Rusty Russell wrote:
> On Mon, 2007-02-12 at 12:29 -0500, Steven Rostedt wrote:
> > Hi all,
> > 
> > Glauber and I have been looking into porting lguest over to the x86_64.
> > We've spent the last couple of weeks just trying lguest out and seeing
> > how far we can "force" it over to x86_64. This was more of just a
> > learning experience to get our feet wet in lguest since we are still
> > very green at it.  I also notice that lguest moves very fast (we were
> > still working on drivers/lguest when I now see it has moved to
> > arch/i386/lguest).
> 
> Yeah, sorry about that.  My very initial intention was to have x86-64
> and PowerPC ports, but since the code is so arch-specific I decided that
> it didn't make much sense at this point, so hence the move.
> 
> Plus, being in a single directory gives it that nice self-contained
> feeling which makes upstream inclusion easier.
> 
> Now, at some point that decision might well be reversed...

Steven Roasted forgot to mention that simplicity was not the main reason
why we chose to pick up lguest. For me at least, the puppies were the
one true reason.

Other than that, our first attempt already put it in a separate drivers/x86_64
directory. As Steven pointed out, there will probably be very little overlap
between architectures. IMHO, the move to arch/<arch>/lguest is very
sane.

> A few general points:
> 1) The entire point of the paravirt_ops infrastructure is to allow a
> single kernel to adapt to different hypervisors at runtime.  This is a
> real feature which should not be ignored, IMHO.  Also, the "modprobe and
> go" model of host kernels is extremely attractive.  So changing
> PAGE_OFFSET or what segments the kernel uses is not the trivial matter
> it would otherwise be.
Although they are not included yet in mainline (for 64-bit), we think that 
relocatable kernel capabilities would help a lot in this. Besides, we
don't plan to move PAGE_OFFSET for the host, but rather for the guest,
which needs to have compiled-in provisions anyway.

> 
> 2) I would start really simple: no guest SMP, for example.  I would also
> look hard at stealing KVM's mmu code: lguest's is much simpler, *but*
> that's because it's only a simple 2-level.

I would agree with you if guest SMP were a hard problem. I think it is
not. The current read-in-a-loop could be replicated in user-space
threads, each running a different vcpu.
For example, we could start up the first and get interrupted when it is
time to initialize the other vcpus during kernel initialization. It also
simplifies user-space management a lot. We gain, for example,
vcpu pinning for free from the sched_setaffinity() syscall.
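
Something like this per vcpu thread in the launcher (just a sketch):

        #define _GNU_SOURCE
        #include <sched.h>

        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);            /* pin this vcpu thread to CPU "cpu" */
        sched_setaffinity(0, sizeof(mask), &mask);  /* 0 == current thread */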

Regarding the 4-level pagetable, it is definitely much more complicated.
The tip is appreciated, thanks!
 

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

* Re: porting lguest to x86_64
  2007-02-13  1:34   ` Glauber de Oliveira Costa
@ 2007-02-13  2:17     ` Rusty Russell
  0 siblings, 0 replies; 6+ messages in thread
From: Rusty Russell @ 2007-02-13  2:17 UTC (permalink / raw)
  To: Glauber de Oliveira Costa; +Cc: virtualization, Andi Kleen

On Mon, 2007-02-12 at 23:34 -0200, Glauber de Oliveira Costa wrote:
> On Tue, Feb 13, 2007 at 12:02:47PM +1100, Rusty Russell wrote:
> > Yeah, sorry about that.  My very initial intention was to have x86-64
> > and PowerPC ports, but since the code is so arch-specific I decided that
> > it didn't make much sense at this point, so hence the move.
> 
> Steven Roasted forgot to mention that simplicity was not the main reason
> why we chose to pick up lguest. For me at least, the puppies were the
> one true reason.

8)

> Other than that, our first attempt already put it in a separate drivers/x86_64
> directory. As Steven pointed out, there will probably be very little overlap
> between architectures. IMHO, the move to arch/<arch>/lguest is very
> sane.

Indeed.

> > A few general points:
> > 1) The entire point of the paravirt_ops infrastructure is to allow a
> > single kernel to adapt to different hypervisors at runtime.  This is a
> > real feature which should not be ignored, IMHO.  Also, the "modprobe and
> > go" model of host kernels is extremely attractive.  So changing
> > PAGE_OFFSET or what segments the kernel uses is not the trivial matter
> > it would otherwise be.
> Although they are not included yet in mainline (for 64-bit), we think that 
> relocatable kernel capabilities would help a lot in this. Besides, we
> don't plan to move PAGE_OFFSET for the host, but rather for the guest,
> which needs to have compiled-in provisions anyway.

If you talk to distributions, they want the guest and host kernels to be
the same.  The overhead of including lguest support is under 5k, which
is why it's always compiled in, even in the host kernel.  I currently
don't allow the guest drivers as modules, because they're only 9k, but I
could.

A universal kernel is a really, really good idea.  You can make most of
the lguest-specific code __init, too...

> > 2) I would start really simple: no guest SMP, for example.  I would also
> > look hard at stealing KVM's mmu code: lguest's is much simpler, *but*
> > that's because it's only a simple 2-level.
> 
> I would agree with you if guest SMP were a hard problem. I think it is
> not. The current read-in-a-loop could be replicated in user-space
> threads, each running a different vcpu.

Sure, it's easy, *but* it's a good idea to get the basics done first,
and this is a very easy thing to cut.

Good luck!
Rusty.
