[RFC 0/8]KVM: swap out guest pages

public inbox for linux-kernel@vger.kernel.org

* [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23  6:51 Shaohua Li
  2007-07-23 10:27 ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2007-07-23  6:51 UTC (permalink / raw)
  To: kvm-devel, lkml; +Cc: Avi Kivity, Ingo Molnar

This patch series makes kvm guest pages swappable and dynamically
allocated. Without it, all guest memory is allocated at guest start
time.

The patches are against the latest git, and you first need to apply
Avi's kvm-sch integration patch
(http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).

The patches are quite stable in my testing. With them, I can run a
256M guest on a host with 300M of memory. If the guest is idle, it can
use less than 10M. I did a simple performance test (measuring kernel
build time in the guest): when there is little swapping, the difference
in performance with and without the patches isn't significant. If you
have a better way to measure, please let me know and I'll try it.

Unresolved issues:
1. swapoff doesn't work; we need a hook.
2. SMP guests might not work, as kvm doesn't support SMP yet.
3. We need a better algorithm for selecting which guest pages to swap
out, based on the guest's memory usage.
There may be more.

Any suggestions and comments are appreciated.

Thanks,
Shaohua


* Re: [RFC 0/8]KVM: swap out guest pages
  2007-07-23  6:51 [RFC 0/8]KVM: swap out guest pages Shaohua Li
@ 2007-07-23 10:27 ` Avi Kivity
  2007-07-23 12:25   ` [kvm-devel] " Christoph Hellwig
                     ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Avi Kivity @ 2007-07-23 10:27 UTC (permalink / raw)
  To: Shaohua Li; +Cc: kvm-devel, lkml, Ingo Molnar

Shaohua Li wrote:
> This patch series make kvm guest pages be able to be swapped out and
> dynamically allocated. Without it, all guest memory is allocated at
> guest start time.
>
> patches are against latest git, and you need first patch Avi's kvm-sch
> integration patch
> (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).
>
> Patch is quite stable in my test. With the patch, I can run a 256M
> memory guest in a 300M memory host. 

What about the opposite?

> If guest is idle, the memory it used
> can be less than 10M. I did a simple performance test (measure kernel
> build time in guest), if there is few swap, the performance w/wo the
> patch difference isn't significent. If you have better measurement
> approach, please let me try.
>
> Unresolved issue:
> 1. swapoff doesn't work, we need a hook.
> 2. SMP guest might not work, as kvm doesn't support smp till now.
> 3. better algorithm to select swaped out guest pages according to
> guest's memory usage.
> Maybe more.
>
> Any suggests and comments are appreciated.
>   

The big question is whether to have kvm's own address_space or not.

Having an address_space (like your patch does) is remarkably simple, and 
requires few hooks from the current vm.  However using existing vmas 
mapped by the user has many advantages:

- compatible with s390 requirements
- allows the user to use hugetlbfs pages, which have a performance 
advantage using ept/npt (but which are unswappable)
- allows the user to map a file (which can be regarded as a way to specify 
the swap device)
- better integration with the rest of the vm

I am quite torn between the simplicity of your approach and the 
advantages of using generic vmas.  However, s390 pretty much forces our 
hand.

What is your opinion of extending generic vmas to back kvm guest memory?

-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 10:27 ` Avi Kivity
@ 2007-07-23 12:25   ` Christoph Hellwig
  2007-07-23 12:29     ` Avi Kivity
  2007-07-23 20:06   ` Jeff Dike
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2007-07-23 12:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml

On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and 
> requires few hooks from the current vm.  However using existing vmas 
> mapped by the user has many advantages:

Actually it requires lots of deep down VM internals symbols that'll never
get exported.



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 12:25   ` [kvm-devel] " Christoph Hellwig
@ 2007-07-23 12:29     ` Avi Kivity
  2007-07-23 12:34       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-23 12:29 UTC (permalink / raw)
  To: Christoph Hellwig, Avi Kivity, Shaohua Li, kvm-devel, lkml

Christoph Hellwig wrote:
> On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
>   
>> Having an address_space (like your patch does) is remarkably simple, and 
>> requires few hooks from the current vm.  However using existing vmas 
>> mapped by the user has many advantages:
>>     
>
> Actually it requires lots of deep down VM internals symbols that'll never
> get exported.
>
>   

What's "it" here?  The kvm-specific address space or generic vmas?

Generic vmas will be more intrusive AFAICT.

-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 12:29     ` Avi Kivity
@ 2007-07-23 12:34       ` Christoph Hellwig
  2007-07-23 12:39         ` Avi Kivity
  2007-07-24  2:00         ` Shaohua Li
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2007-07-23 12:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christoph Hellwig, Shaohua Li, kvm-devel, lkml

On Mon, Jul 23, 2007 at 03:29:36PM +0300, Avi Kivity wrote:
> >Actually it requires lots of deep down VM internals symbols that'll never
> >get exported.
> >
> >  
> 
> What's "it" here?  kvm-specific address space or generic vmas.

The patches in this thread.

> Generic vmas will be more intrusive AFAICT.

People use "intrusive" differently.  Doing big changes to core code is not
a problem if we actually get a proper interface.  Just exporting core
functions without other changes and then writing code in modules that
pokes into internals is much, much worse.



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 12:34       ` Christoph Hellwig
@ 2007-07-23 12:39         ` Avi Kivity
  2007-07-24  2:00         ` Shaohua Li
  1 sibling, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2007-07-23 12:39 UTC (permalink / raw)
  To: Christoph Hellwig, Avi Kivity, Shaohua Li, kvm-devel, lkml

Christoph Hellwig wrote:
>
>> Generic vmas will be more intrusive AFAICT.
>>     
>
> People use intrusive differently.  Doing big changes to core code is not
> a problem if we actually get a proper interface.  Just exporting core
> function without other changes and then writing code in modules that
> pokes into internals is much much worse.
>   

Agree 100%.

-- 
error compiling committee.c: too many arguments to function



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 12:34       ` Christoph Hellwig
  2007-07-23 12:39         ` Avi Kivity
@ 2007-07-24  2:00         ` Shaohua Li
  1 sibling, 0 replies; 20+ messages in thread
From: Shaohua Li @ 2007-07-24  2:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Avi Kivity, kvm-devel, lkml

On Mon, 2007-07-23 at 20:34 +0800, Christoph Hellwig wrote:
> On Mon, Jul 23, 2007 at 03:29:36PM +0300, Avi Kivity wrote:
> > >Actually it requires lots of deep down VM internals symbols that'll never
> > >get exported.
> > >
> > > 
> >
> > What's "it" here?  kvm-specific address space or generic vmas.
> 
> The patches in this thread.
> 
> > Generic vmas will be more intrusive AFAICT.
> 
> People use intrusive differently.  Doing big changes to core code is not
> a problem if we actually get a proper interface.  Just exporting core
> function without other changes and then writing code in modules that
> pokes into internals is much much worse.
The patch swaps out pages the same way shm does. The only difference
is that kvm is a module and shm is not. Why can't kvm use the symbols
shm uses?

Sure, it's possible to write guest memory to a file and so avoid using
the symbols. If you really hate this, I'll consider the alternative
method.

Thanks,
Shaohua


* Re: [RFC 0/8]KVM: swap out guest pages
  2007-07-23 10:27 ` Avi Kivity
  2007-07-23 12:25   ` [kvm-devel] " Christoph Hellwig
@ 2007-07-23 20:06   ` Jeff Dike
  2007-07-24  5:22     ` Avi Kivity
  2007-07-23 23:10   ` Rusty Russell
  2007-07-24  1:42   ` Shaohua Li
  3 siblings, 1 reply; 20+ messages in thread
From: Jeff Dike @ 2007-07-23 20:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml, Ingo Molnar

On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and 
> requires few hooks from the current vm.  However using existing vmas 
> mapped by the user has many advantages:

It's also needed for a SKAS-like UML client, where the host side will
need to make system calls on behalf of the guest.

				Jeff

-- 
Work email - jdike at linux dot intel dot com


* Re: [RFC 0/8]KVM: swap out guest pages
  2007-07-23 20:06   ` Jeff Dike
@ 2007-07-24  5:22     ` Avi Kivity
  2007-07-25 16:15       ` Jeff Dike
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24  5:22 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Shaohua Li, kvm-devel, lkml, Ingo Molnar

Jeff Dike wrote:
> On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
>   
>> Having an address_space (like your patch does) is remarkably simple, and 
>> requires few hooks from the current vm.  However using existing vmas 
>> mapped by the user has many advantages:
>>     
>
> It's also needed for a SKAS-like UML client, where the host side will
> need to make system calls on behalf of the guest.
>
>   

Even in the current model, guest physical memory is mmap()ed into host
userspace.  The kernel cannot enforce this, however.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC 0/8]KVM: swap out guest pages
  2007-07-24  5:22     ` Avi Kivity
@ 2007-07-25 16:15       ` Jeff Dike
  2007-07-25 17:12         ` [kvm-devel] " Carsten Otte
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Dike @ 2007-07-25 16:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml, Ingo Molnar

On Tue, Jul 24, 2007 at 08:22:53AM +0300, Avi Kivity wrote:
> Even in the current model, guest physical memory is mmap()ed into host
> userspace.

I want it to be identity-mapped, which a single address space would
guarantee.  For things which change mappings, like vmalloc, I need to
be in the same address space as the guest.

				Jeff

-- 
Work email - jdike at linux dot intel dot com


* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-25 16:15       ` Jeff Dike
@ 2007-07-25 17:12         ` Carsten Otte
  0 siblings, 0 replies; 20+ messages in thread
From: Carsten Otte @ 2007-07-25 17:12 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Avi Kivity, kvm-devel, lkml

Jeff Dike wrote:
> I want it to be identity-mapped, which a single address space would
> guarantee.  For things which change mappings, like vmalloc, I need to
> be in the same address space as the guest.
That will also be a mandatory hardware requirement when porting this to s390.


* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 10:27 ` Avi Kivity
  2007-07-23 12:25   ` [kvm-devel] " Christoph Hellwig
  2007-07-23 20:06   ` Jeff Dike
@ 2007-07-23 23:10   ` Rusty Russell
  2007-07-24  5:30     ` Avi Kivity
  2007-07-24  1:42   ` Shaohua Li
  3 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2007-07-23 23:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml

On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and 
> requires few hooks from the current vm.  However using existing vmas 
> mapped by the user has many advantages:
> 
> - compatible with s390 requirements
> - allows the user to use hugetlbfs pages, which have a performance 
> advantage using ept/npt (but which are unswappable)
> - allows the user to map a file (which can be regarded as way to specify 
> the swap device)
> - better ingration with the rest of the vm

You don't need to expose the vmas.  You just have userspace point out
the start+len of each region of memory it wants the guest to be able to
access, and the address it wants it to appear in the guest.

This is a slight superset of what lguest does in two ways:

1) my guest address == user address, but I'm looking at adding an offset
so I don't have to link the launcher binary specially.
2) I have only one contiguous region of guest-physical memory, since I
can place device memory immediately above "normal" mem.

But the result is pretty sweet, and doesn't require any new symbols to
be exported.

Cheers,
Rusty.


* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-23 23:10   ` Rusty Russell
@ 2007-07-24  5:30     ` Avi Kivity
  2007-07-24  6:11       ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24  5:30 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml

Rusty Russell wrote:
> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>   
>> Having an address_space (like your patch does) is remarkably simple, and 
>> requires few hooks from the current vm.  However using existing vmas 
>> mapped by the user has many advantages:
>>
>> - compatible with s390 requirements
>> - allows the user to use hugetlbfs pages, which have a performance 
>> advantage using ept/npt (but which are unswappable)
>> - allows the user to map a file (which can be regarded as way to specify 
>> the swap device)
>> - better ingration with the rest of the vm
>>     
>
> You don't need to expose the vmas.  You just have userspace point out
> the start+len of each region of memory it wants the guest to be able to
> access, and the address it wants it to appear in the guest.
>
> This is a slight superset of what lguest does in two ways:
>
> 1) my guest address == user address, but I'm looking at adding an offset
> so I don't have to link the launcher binary specially.
> 2) I have only one contiguous region of guest-physical memory, since I
> can place device memory immediately above "normal" mem.
>
>   

My intent was to allow userspace to assign a virtual address range to a
memory slot.

So long as you don't do swapping, all is simple, since you can do a
get_user_pages() on initialization or when installing a shadow pte.  But
if you want to swap, you need:

- a way to transfer the dirty bit from the shadow ptes to the struct page
- a way to let the vm rmap know that there are shadow ptes that point to
the page in addition to Linux ptes.  These shadow ptes may be in a
different format than Linux ptes.
- a different tlb invalidation method with ASIDs

It's not going to be simple.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-24  5:30     ` Avi Kivity
@ 2007-07-24  6:11       ` Rusty Russell
  2007-07-24  6:21         ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2007-07-24  6:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml

On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
> >   
> >> Having an address_space (like your patch does) is remarkably simple, and 
> >> requires few hooks from the current vm.  However using existing vmas 
> >> mapped by the user has many advantages:
> >>
> >> - compatible with s390 requirements
> >> - allows the user to use hugetlbfs pages, which have a performance 
> >> advantage using ept/npt (but which are unswappable)
> >> - allows the user to map a file (which can be regarded as way to specify 
> >> the swap device)
> >> - better ingration with the rest of the vm
> >>     
> >
> > You don't need to expose the vmas.  You just have userspace point out
> > the start+len of each region of memory it wants the guest to be able to
> > access, and the address it wants it to appear in the guest.
> >
> > This is a slight superset of what lguest does in two ways:
> >
> > 1) my guest address == user address, but I'm looking at adding an offset
> > so I don't have to link the launcher binary specially.
> > 2) I have only one contiguous region of guest-physical memory, since I
> > can place device memory immediately above "normal" mem.
> >
> >   
> 
> My intent was to allow userspace to establish assign a virtual address
> range into a memory slot.
> 
> So long as you don't do swapping, all is simple, since you can do a
> get_user_pages() on initialization or when installing a shadow pte.  But
> if you want to swap, you need:
> 
> - a way to transfer the dirty bit from the shadow ptes to the struct page

Actually, get_user_pages() does that for you.  You have to make R/O any
writable pte where the guest doesn't set the dirty bit (so you can trap
it later) but last I put a printk in there, Linux doesn't do that.

> - a way to let the vm rmap know that there are shadow ptes that point to
> the page in addition to Linux ptes.  These shadow ptes may be in a
> different format than Linux ptes.
> - a different tlb invalidation method with ASIDs

Well first I was just going to see how well hooking into the shrinker
works.  That might be sufficient: just throw out shadow refs to pages
when there's pressure.
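
As a rough illustration of the shrinker idea above, here is a minimal
userspace sketch in C. It is not lguest or kvm code: struct page,
struct shadow_entry and shrink_shadow_cache() are hypothetical names,
and the refcount decrement only stands in for what put_page() would do
in the kernel. It shows the shape of "throw out shadow refs under
pressure" and nothing more.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    int refcount;
};

/* Hypothetical shadow cache entry: holds one reference on a guest page. */
struct shadow_entry {
    struct page *page;   /* NULL when the entry is unused */
};

/*
 * Called under memory pressure: drop up to nr_to_scan shadow references
 * so the pages they pin become candidates for reclaim.
 * Returns the number of references actually dropped.
 */
static size_t shrink_shadow_cache(struct shadow_entry *cache, size_t n,
                                  size_t nr_to_scan)
{
    size_t freed = 0;

    for (size_t i = 0; i < n && freed < nr_to_scan; i++) {
        if (cache[i].page) {
            cache[i].page->refcount--;   /* stands in for put_page() */
            cache[i].page = NULL;        /* shadow entry no longer maps it */
            freed++;
        }
    }
    return freed;
}
```

A real implementation would also have to invalidate the corresponding
shadow ptes and flush the guest TLB before dropping the reference.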

If not, it does get harder.  A callback in the mm struct to say "I want
to swap your page out" is required if we don't take a reference to the
page.  Dirty bit handling would be an interesting issue (maybe the
callback can say "No!" and dirty the page again?).

I fear mm code.

> It's not going to be simple.

Yeah, but it's one thing stopping lguest from being non-root usable, so
I want it there, too.

Cheers,
Rusty.




* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-24  6:11       ` Rusty Russell
@ 2007-07-24  6:21         ` Avi Kivity
  2007-07-24  6:45           ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24  6:21 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml

Rusty Russell wrote:
> On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
>   
>> Rusty Russell wrote:
>>     
>>> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>>>   
>>>       
>>>> Having an address_space (like your patch does) is remarkably simple, and 
>>>> requires few hooks from the current vm.  However using existing vmas 
>>>> mapped by the user has many advantages:
>>>>
>>>> - compatible with s390 requirements
>>>> - allows the user to use hugetlbfs pages, which have a performance 
>>>> advantage using ept/npt (but which are unswappable)
>>>> - allows the user to map a file (which can be regarded as way to specify 
>>>> the swap device)
>>>> - better ingration with the rest of the vm
>>>>     
>>>>         
>>> You don't need to expose the vmas.  You just have userspace point out
>>> the start+len of each region of memory it wants the guest to be able to
>>> access, and the address it wants it to appear in the guest.
>>>
>>> This is a slight superset of what lguest does in two ways:
>>>
>>> 1) my guest address == user address, but I'm looking at adding an offset
>>> so I don't have to link the launcher binary specially.
>>> 2) I have only one contiguous region of guest-physical memory, since I
>>> can place device memory immediately above "normal" mem.
>>>
>>>   
>>>       
>> My intent was to allow userspace to establish assign a virtual address
>> range into a memory slot.
>>
>> So long as you don't do swapping, all is simple, since you can do a
>> get_user_pages() on initialization or when installing a shadow pte.  But
>> if you want to swap, you need:
>>
>> - a way to transfer the dirty bit from the shadow ptes to the struct page
>>     
>
> Actually, get_user_pages() does that for you.  You have to make R/O any
> writable pte where the guest doesn't set the dirty bit (so you can trap
> it later) but last I put a printk in there, Linux doesn't do that.
>
>   

Don't understand.  You mean Linux always sets the dirty bit when it
makes a page writable?  Surely some mistake.

It probably does do so on demand write faults, but I'm sure the dirty
bit can get cleaned out by the swapper.

>> - a way to let the vm rmap know that there are shadow ptes that point to
>> the page in addition to Linux ptes.  These shadow ptes may be in a
>> different format than Linux ptes.
>> - a different tlb invalidation method with ASIDs
>>     
>
> Well first I was just going to see how well hooking into the shrinker
> works.  That might be sufficient: just throw out shadow refs to pages
> when there's pressure.
>   

Ah, interesting.  Yes, you trim the shadow page table cache which unrefs
pages for you.

Maybe that's a good way to get things started.

> If not, it does get harder.  A callback in the mm struct to say "I want
> to swap your page out" is required if we don't take a reference to the
> page.  Dirty bit handling would be an interesting issue (maybe the
> callback can say "No!" and dirty the page again?).
>   

Since we have rmap, I don't see that as an issue.  Given a page, we can
easily drop all refs.  Though lguest doesn't do that, right?

I'm also concerned with picking the correct page, but there's no good
solution here.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-24  6:21         ` Avi Kivity
@ 2007-07-24  6:45           ` Rusty Russell
  2007-07-24  6:59             ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2007-07-24  6:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml

On Tue, 2007-07-24 at 09:21 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > Actually, get_user_pages() does that for you.  You have to make R/O any
> > writable pte where the guest doesn't set the dirty bit (so you can trap
> > it later) but last I put a printk in there, Linux doesn't do that.
> 
> Don't understand.  You mean Linux always sets the dirty bit when it
> makes a page writable?  Surely some mistake.
> 
> It probably does do so on demand write faults, but I'm sure the dirty
> bit can get cleaned out by the swapper.

Yeah, me dumb.  I should put that printk back and try doing a kernel
compile.

> > If not, it does get harder.  A callback in the mm struct to say "I want
> > to swap your page out" is required if we don't take a reference to the
> > page.  Dirty bit handling would be an interesting issue (maybe the
> > callback can say "No!" and dirty the page again?).
> 
> Since we have rmap, I don't see that as an issue.  Given a page, we can
> easily drop all refs.  Though lguest doesn't do that, right?

Yeah, rmap might maul some puppies.  I could do poor man's rmap tho with
one backref and a bit to say "there are more".  Then if that bit is set,
I just drop all 4 shadows 8)
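
The "one backref and a bit" scheme could be sketched roughly as below.
This is a userspace illustration under assumptions, not lguest code:
struct page_backref, backref_add() and backref_unmap() are hypothetical
names, and the caller is assumed to know how to clear a single shadow
pte or flush every shadow page table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page record: one back-pointer plus an overflow bit. */
struct page_backref {
    void *spte;   /* the single recorded shadow pte, NULL if none */
    bool more;    /* set when other shadow ptes exist but were not recorded */
};

/* What the caller must do to remove all shadow mappings of the page. */
enum unmap_action { UNMAP_NOTHING, UNMAP_ONE, UNMAP_FLUSH_ALL };

/* Record that spte now maps the page. */
static void backref_add(struct page_backref *b, void *spte)
{
    if (!b->spte)
        b->spte = spte;
    else if (b->spte != spte)
        b->more = true;   /* a second mapping: we can no longer find them all */
}

/*
 * Decide how to unmap the page and reset the record.
 * *spte_out is set to the recorded pte only for UNMAP_ONE.
 */
static enum unmap_action backref_unmap(struct page_backref *b, void **spte_out)
{
    enum unmap_action act;

    *spte_out = NULL;
    if (!b->spte) {
        act = UNMAP_NOTHING;
    } else if (b->more) {
        act = UNMAP_FLUSH_ALL;   /* unknown mappings: drop every shadow */
    } else {
        act = UNMAP_ONE;
        *spte_out = b->spte;
    }
    b->spte = NULL;
    b->more = false;
    return act;
}
```

The trade-off is visible in the overflow case: a page mapped more than
once costs a full shadow flush to unmap.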

> I'm also concerned with picking the correct page, but there's no good
> solution here.

But since you have rmap, if there was a cb when the page was
undirtied, you could undirty the ptes.  When the "I want to kick this
page out" cb comes along, see if one of the ptes is now dirty, dirty the
page and return "no".

Maybe it's too simplistic, but it might work.
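
A minimal userspace sketch of those two callbacks, under stated
assumptions: struct page here is a toy with a small fixed rmap array,
page_undirtied() and page_may_evict() are hypothetical names (no such
hooks exist in the mm today), and the shadow ptes are assumed to keep
the dirty flag in bit 6 as x86 ptes do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPTE_DIRTY 0x40u   /* bit 6: dirty flag, assuming x86 pte layout */

/* Toy page with a tiny reverse map of the shadow ptes pointing at it. */
struct page {
    bool dirty;
    uint64_t *sptes[4];    /* hypothetical rmap: shadow ptes mapping this page */
    size_t nsptes;
};

/* Callback 1: the page was just cleaned, so clear the shadow dirty bits. */
static void page_undirtied(struct page *pg)
{
    pg->dirty = false;
    for (size_t i = 0; i < pg->nsptes; i++)
        *pg->sptes[i] &= ~(uint64_t)SPTE_DIRTY;
}

/*
 * Callback 2: "I want to kick this page out".
 * Returns true if eviction may go ahead. If the guest dirtied the page
 * through any shadow pte since the last clean, re-dirty it and say no.
 */
static bool page_may_evict(struct page *pg)
{
    for (size_t i = 0; i < pg->nsptes; i++) {
        if (*pg->sptes[i] & SPTE_DIRTY) {
            pg->dirty = true;
            return false;
        }
    }
    return true;
}
```

The sketch ignores the race between checking the bits and the guest
writing again; a real version would have to unmap or write-protect the
shadow ptes before answering "yes".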
Rusty.



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-24  6:45           ` Rusty Russell
@ 2007-07-24  6:59             ` Avi Kivity
  2007-07-24  7:17               ` Rusty Russell
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24  6:59 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml

Rusty Russell wrote:
>
>>> If not, it does get harder.  A callback in the mm struct to say "I want
>>> to swap your page out" is required if we don't take a reference to the
>>> page.  Dirty bit handling would be an interesting issue (maybe the
>>> callback can say "No!" and dirty the page again?).
>>>       
>> Since we have rmap, I don't see that as an issue.  Given a page, we can
>> easily drop all refs.  Though lguest doesn't do that, right?
>>     
>
> Yeah, rmap might maul some puppies.  I could do poor man's rmap tho with
> one backref and a bit to say "there are more".  Then if that bit is set,
> I just drop all 4 shadows 8)
>
>   

It's too poor.  A long running guest will eventually map all of memory
using the kernel page tables and a large proportion with user page
tables, so many pages will have that bit set.

However, you can probably work around that by not setting an rmap for
the kernel mappings, and instead have the guest teach the host where the
kernel page tables live.  You'd only be left with shared libraries,
until the kernel can share page tables for them too.

>> I'm also concerned with picking the correct page, but there's no good
>> solution here.
>>     
>
> But since you have rmap, if there was a cb when the the page was
> undirtied, you could undirty the ptes.  When there "I want to kick this
> page out" cb comes along, see if one of the ptes is now dirty, dirty the
> page and return "no".
>
> Maybe it's too simplistic, but it might work.
>   

Ah, I see what you mean now.  It could work, as far as I can tell (which
isn't very far, though).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
  2007-07-24  6:59             ` Avi Kivity
@ 2007-07-24  7:17               ` Rusty Russell
  0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2007-07-24  7:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml

On Tue, 2007-07-24 at 09:59 +0300, Avi Kivity wrote:
> However, you can probably work around that by not setting an rmap for
> the kernel mappings, and instead have the guest teach the host where the
> kernel page tables live.  You'd only be left with shared libraries,
> until the kernel can share page tables for them too.

Well, I already treat kernel mappings specially (effectively I know the
guest's PAGE_OFFSET): they're kept identical in all the 4 shadows, and
need explicit guest flushing.

Whether the guest shares (non-kernel) page tables or not, I will shadow
them dumb as separate page table pages the way things stand.  So, yes,
shared libs will be my main issue.  Address space randomization means I
can't even use a heuristic such as looking for the page at the same
address in other shadows.  I'll come up with something.

Anyway, virtio is what I'm *supposed* to be doing today...

Thanks,
Rusty.


* Re: [RFC 0/8]KVM: swap out guest pages
  2007-07-23 10:27 ` Avi Kivity
                     ` (2 preceding siblings ...)
  2007-07-23 23:10   ` Rusty Russell
@ 2007-07-24  1:42   ` Shaohua Li
  2007-07-24  5:42     ` Avi Kivity
  3 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2007-07-24  1:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel, lkml, Ingo Molnar

On Mon, 2007-07-23 at 18:27 +0800, Avi Kivity wrote:
> Shaohua Li wrote:
> > This patch series make kvm guest pages be able to be swapped out and
> > dynamically allocated. Without it, all guest memory is allocated at
> > guest start time.
> >
> > patches are against latest git, and you need first patch Avi's kvm-sch
> > integration patch
> > (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).
> >
> > Patch is quite stable in my test. With the patch, I can run a 256M
> > memory guest in a 300M memory host.
> 
> What about the opposite?
> 
> > If guest is idle, the memory it used
> > can be less than 10M. I did a simple performance test (measure kernel
> > build time in guest), if there is few swap, the performance w/wo the
> > patch difference isn't significent. If you have better measurement
> > approach, please let me try.
> >
> > Unresolved issue:
> > 1. swapoff doesn't work, we need a hook.
> > 2. SMP guest might not work, as kvm doesn't support smp till now.
> > 3. better algorithm to select swaped out guest pages according to
> > guest's memory usage.
> > Maybe more.
> >
> > Any suggests and comments are appreciated.
> >  
> 
> The big question is whether to have kvm's own address_space or not.
> 
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm.  However using existing vmas
> mapped by the user has many advantages:
> 
> - compatible with s390 requirements
> - allows the user to use hugetlbfs pages, which have a performance
> advantage using ept/npt (but which are unswappable)
> - allows the user to map a file (which can be regarded as way to specify
> the swap device)
> - better ingration with the rest of the vm
> 
> I am quite torn between the simplicity of your approach and the
> advantages of using generic vmas.  However, s390 pretty much forces our
> hand.
> 
> What is your opinion of extending generic vmas to back kvm guest memory?
There are several issues:
1. A vma manages userspace addresses, but a kvm guest uses the full
address space.
2. qemu itself must use some of the address space.
3. kvm needs special page fault handling for the shadow page table, so
generic page table operations can't be used directly for the guest.
I have no idea whether your idea is feasible. The s390 guys said their
shadow page table is the same as the host's, which is why they can
implement swap easily. x86 is hard.

Thanks,
Shaohua


* Re: [RFC 0/8]KVM: swap out guest pages
  2007-07-24  1:42   ` Shaohua Li
@ 2007-07-24  5:42     ` Avi Kivity
  0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2007-07-24  5:42 UTC (permalink / raw)
  To: Shaohua Li; +Cc: kvm-devel, lkml, Ingo Molnar

Shaohua Li wrote:
> On Mon, 2007-07-23 at 18:27 +0800, Avi Kivity wrote:
>   
>> Shaohua Li wrote:
>>     
>>> This patch series make kvm guest pages be able to be swapped out and
>>> dynamically allocated. Without it, all guest memory is allocated at
>>> guest start time.
>>>
>>> patches are against latest git, and you need first patch Avi's
>>>       
>> kvm-sch
>>     
>>> integration patch
>>>
>>>       
>> (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).
>>     
>>> Patch is quite stable in my test. With the patch, I can run a 256M
>>> memory guest in a 300M memory host.
>>>       
>> What about the opposite?
>>
>>     
>>> If guest is idle, the memory it used
>>> can be less than 10M. I did a simple performance test (measure
>>>       
>> kernel
>>     
>>> build time in guest), if there is few swap, the performance w/wo the
>>> patch difference isn't significent. If you have better measurement
>>> approach, please let me try.
>>>
>>> Unresolved issue:
>>> 1. swapoff doesn't work, we need a hook.
>>> 2. SMP guest might not work, as kvm doesn't support smp till now.
>>> 3. better algorithm to select swaped out guest pages according to
>>> guest's memory usage.
>>> Maybe more.
>>>
>>> Any suggests and comments are appreciated.
>>>  
>>>       
>> The big question is whether to have kvm's own address_space or not.
>>
>> Having an address_space (like your patch does) is remarkably simple,
>> and
>> requires few hooks from the current vm.  However using existing vmas
>> mapped by the user has many advantages:
>>
>> - compatible with s390 requirements
>> - allows the user to use hugetlbfs pages, which have a performance
>> advantage using ept/npt (but which are unswappable)
>> - allows the user to map a file (which can be regarded as way to
>> specify
>> the swap device)
>> - better ingration with the rest of the vm
>>
>> I am quite torn between the simplicity of your approach and the
>> advantages of using generic vmas.  However, s390 pretty much forces
>> our
>> hand.
>>
>> What is your opinion of extending generic vmas to back kvm guest
>> memory?
>>     
> several issues:
> 1. vma is to manage usersapce address, kvm guest uses full address
> space.
> 2. qemu itself must use some address space.
>   

My idea is to keep the current slot concept, but instead of having kvm
allocate pages for a slot, it would call get_user_pages() for a virtual
address range.  Userspace doesn't directly talk about vmas, just virtual
address ranges.
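
To make the slot idea concrete, here is a small userspace sketch in C.
It only illustrates the address arithmetic; struct mem_slot and
gpa_to_user_addr() are hypothetical names, not the kvm interface, and
a 4K page size is assumed. In the kernel, the resulting user address is
what would be handed to get_user_pages().

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assuming 4K pages */

/* Hypothetical slot: a guest-physical range backed by a user virtual range. */
struct mem_slot {
    uint64_t base_gfn;    /* first guest frame number covered by the slot */
    uint64_t npages;      /* slot length in pages */
    uintptr_t user_addr;  /* start of the backing range in userspace */
};

/*
 * Translate a guest-physical address to the user virtual address that
 * backs it. Returns 0 if no slot covers the address.
 */
static uintptr_t gpa_to_user_addr(const struct mem_slot *slots, size_t nslots,
                                  uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;
    uint64_t offset = gpa & ((UINT64_C(1) << PAGE_SHIFT) - 1);

    for (size_t i = 0; i < nslots; i++) {
        const struct mem_slot *s = &slots[i];

        if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
            return s->user_addr +
                   (uintptr_t)(((gfn - s->base_gfn) << PAGE_SHIFT) + offset);
    }
    return 0;
}
```

With this layout userspace never names a vma; it only supplies
(guest frame, length, user address) triples, and whatever is mapped at
that user address (anonymous memory, a file, hugetlbfs) backs the guest.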


> 3. kvm need special page fault for shadow page table. generic page table
> operations can't be directly used for guest.
> I have no idea if your idea is feasible. The s390 guys said their shadow
> page table is the same as host, this is why they can easily implement
> swap, x86 is hard.
>   

No question that it is hard.  I'd like to explore just how hard it is.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


