* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 10:27 ` Avi Kivity
@ 2007-07-23 12:25 ` Christoph Hellwig
2007-07-23 12:29 ` Avi Kivity
2007-07-23 20:06 ` Jeff Dike
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2007-07-23 12:25 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml
On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:
Actually, it requires lots of symbols from deep down in the VM internals
that'll never get exported.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 12:25 ` [kvm-devel] " Christoph Hellwig
@ 2007-07-23 12:29 ` Avi Kivity
2007-07-23 12:34 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-23 12:29 UTC (permalink / raw)
To: Christoph Hellwig, Avi Kivity, Shaohua Li, kvm-devel, lkml
Christoph Hellwig wrote:
> On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>
> Actually, it requires lots of symbols from deep down in the VM internals
> that'll never get exported.
>
>
What's "it" here? kvm-specific address space or generic vmas.
Generic vmas will be more intrusive AFAICT.
--
error compiling committee.c: too many arguments to function
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 12:29 ` Avi Kivity
@ 2007-07-23 12:34 ` Christoph Hellwig
2007-07-23 12:39 ` Avi Kivity
2007-07-24 2:00 ` Shaohua Li
0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2007-07-23 12:34 UTC (permalink / raw)
To: Avi Kivity; +Cc: Christoph Hellwig, Shaohua Li, kvm-devel, lkml
On Mon, Jul 23, 2007 at 03:29:36PM +0300, Avi Kivity wrote:
> >Actually, it requires lots of symbols from deep down in the VM internals
> >that'll never get exported.
> >
> >
>
> What's "it" here? kvm-specific address space or generic vmas.
The patches in this thread.
> Generic vmas will be more intrusive AFAICT.
People use "intrusive" differently. Making big changes to core code is not
a problem if we actually get a proper interface. Just exporting core
functions without other changes, and then writing code in modules that
pokes into internals, is much, much worse.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 12:34 ` Christoph Hellwig
@ 2007-07-23 12:39 ` Avi Kivity
2007-07-24 2:00 ` Shaohua Li
1 sibling, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2007-07-23 12:39 UTC (permalink / raw)
To: Christoph Hellwig, Avi Kivity, Shaohua Li, kvm-devel, lkml
Christoph Hellwig wrote:
>
>> Generic vmas will be more intrusive AFAICT.
>>
>
> People use "intrusive" differently. Making big changes to core code is not
> a problem if we actually get a proper interface. Just exporting core
> functions without other changes, and then writing code in modules that
> pokes into internals, is much, much worse.
>
Agree 100%.
--
error compiling committee.c: too many arguments to function
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 12:34 ` Christoph Hellwig
2007-07-23 12:39 ` Avi Kivity
@ 2007-07-24 2:00 ` Shaohua Li
1 sibling, 0 replies; 20+ messages in thread
From: Shaohua Li @ 2007-07-24 2:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Avi Kivity, kvm-devel, lkml
On Mon, 2007-07-23 at 20:34 +0800, Christoph Hellwig wrote:
> On Mon, Jul 23, 2007 at 03:29:36PM +0300, Avi Kivity wrote:
> > >Actually, it requires lots of symbols from deep down in the VM internals
> > >that'll never get exported.
> > >
> > >
> >
> > What's "it" here? kvm-specific address space or generic vmas.
>
> The patches in this thread.
>
> > Generic vmas will be more intrusive AFAICT.
>
> People use "intrusive" differently. Making big changes to core code is not
> a problem if we actually get a proper interface. Just exporting core
> functions without other changes, and then writing code in modules that
> pokes into internals, is much, much worse.
The patch swaps out pages the same way shm does. The only difference
is that kvm is a module and shm is not. Why can't kvm use the symbols
shm uses? Sure, it's possible to write guest memory to a file and avoid
using those symbols; if you really hate this, I'll consider that
alternative method.
Thanks,
Shaohua
* Re: [RFC 0/8]KVM: swap out guest pages
2007-07-23 10:27 ` Avi Kivity
2007-07-23 12:25 ` [kvm-devel] " Christoph Hellwig
@ 2007-07-23 20:06 ` Jeff Dike
2007-07-24 5:22 ` Avi Kivity
2007-07-23 23:10 ` Rusty Russell
2007-07-24 1:42 ` Shaohua Li
3 siblings, 1 reply; 20+ messages in thread
From: Jeff Dike @ 2007-07-23 20:06 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml, Ingo Molnar
On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:
It's also needed for a SKAS-like UML client, where the host side will
need to make system calls on behalf of the guest.
Jeff
--
Work email - jdike at linux dot intel dot com
* Re: [RFC 0/8]KVM: swap out guest pages
2007-07-23 20:06 ` Jeff Dike
@ 2007-07-24 5:22 ` Avi Kivity
2007-07-25 16:15 ` Jeff Dike
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 5:22 UTC (permalink / raw)
To: Jeff Dike; +Cc: Shaohua Li, kvm-devel, lkml, Ingo Molnar
Jeff Dike wrote:
> On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>
> It's also needed for a SKAS-like UML client, where the host side will
> need to make system calls on behalf of the guest.
>
>
Even in the current model, guest physical memory is mmap()ed into host
userspace. The kernel cannot enforce this, however.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC 0/8]KVM: swap out guest pages
2007-07-24 5:22 ` Avi Kivity
@ 2007-07-25 16:15 ` Jeff Dike
2007-07-25 17:12 ` [kvm-devel] " Carsten Otte
0 siblings, 1 reply; 20+ messages in thread
From: Jeff Dike @ 2007-07-25 16:15 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml, Ingo Molnar
On Tue, Jul 24, 2007 at 08:22:53AM +0300, Avi Kivity wrote:
> Even in the current model, guest physical memory is mmap()ed into host
> userspace.
I want it to be identity-mapped, which a single address space would
guarantee. For things which change mappings, like vmalloc, I need to
be in the same address space as the guest.
Jeff
--
Work email - jdike at linux dot intel dot com
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-25 16:15 ` Jeff Dike
@ 2007-07-25 17:12 ` Carsten Otte
0 siblings, 0 replies; 20+ messages in thread
From: Carsten Otte @ 2007-07-25 17:12 UTC (permalink / raw)
To: Jeff Dike; +Cc: Avi Kivity, kvm-devel, lkml
Jeff Dike wrote:
> I want it to be identity-mapped, which a single address space would
> guarantee. For things which change mappings, like vmalloc, I need to
> be in the same address space as the guest.
That will also be required by the hardware when porting this to s390.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 10:27 ` Avi Kivity
2007-07-23 12:25 ` [kvm-devel] " Christoph Hellwig
2007-07-23 20:06 ` Jeff Dike
@ 2007-07-23 23:10 ` Rusty Russell
2007-07-24 5:30 ` Avi Kivity
2007-07-24 1:42 ` Shaohua Li
3 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2007-07-23 23:10 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml
On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:
>
> - compatible with s390 requirements
> - allows the user to use hugetlbfs pages, which have a performance
> advantage using ept/npt (but which are unswappable)
> - allows the user to map a file (which can be regarded as a way to specify
> the swap device)
> - better integration with the rest of the vm
You don't need to expose the vmas. You just have userspace point out
the start+len of each region of memory it wants the guest to be able to
access, and the address it wants it to appear in the guest.
This is a slight superset of what lguest does in two ways:
1) my guest address == user address, but I'm looking at adding an offset
so I don't have to link the launcher binary specially.
2) I have only one contiguous region of guest-physical memory, since I
can place device memory immediately above "normal" mem.
But the result is pretty sweet, and doesn't require any new symbols to
be exported.
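Rusty's proposal can be illustrated with a minimal userspace sketch, assuming a small fixed-size table: userspace registers (guest-physical start, user address, length) triples, and every guest-physical address is translated through that table. The names guest_region, guest_mem, add_region() and guest_to_user() are hypothetical; this is not the lguest or kvm interface. lguest's current layout is the special case of a single region whose user address equals its guest-physical address, and the offset Rusty mentions is simply a nonzero difference between the two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* Hypothetical descriptor: a range of launcher (userspace) memory the
 * guest may access, and where it appears in guest-physical space. */
struct guest_region {
    uint64_t guest_phys;   /* guest-physical start address */
    uintptr_t user_addr;   /* start of the backing range in userspace */
    uint64_t len;          /* length in bytes */
};

struct guest_mem {
    struct guest_region regions[MAX_REGIONS];
    size_t nregions;
};

/* Register a region. Fails if the table is full, the region is empty,
 * or it overlaps an existing region in guest-physical space. */
bool add_region(struct guest_mem *m, uint64_t guest_phys,
                uintptr_t user_addr, uint64_t len)
{
    if (m->nregions == MAX_REGIONS || len == 0)
        return false;
    for (size_t i = 0; i < m->nregions; i++) {
        const struct guest_region *r = &m->regions[i];
        if (guest_phys < r->guest_phys + r->len &&
            r->guest_phys < guest_phys + len)
            return false;
    }
    m->regions[m->nregions++] =
        (struct guest_region){ guest_phys, user_addr, len };
    return true;
}

/* Translate a guest-physical address into the userspace address backing
 * it, or return 0 if no registered region covers it. */
uintptr_t guest_to_user(const struct guest_mem *m, uint64_t gpa)
{
    for (size_t i = 0; i < m->nregions; i++) {
        const struct guest_region *r = &m->regions[i];
        if (gpa >= r->guest_phys && gpa - r->guest_phys < r->len)
            return r->user_addr + (uintptr_t)(gpa - r->guest_phys);
    }
    return 0;
}
```

On the kernel side, the translated user address is what would be handed to get_user_pages(), which is presumably why this approach needs no new exported symbols.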
Cheers,
Rusty.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-23 23:10 ` Rusty Russell
@ 2007-07-24 5:30 ` Avi Kivity
2007-07-24 6:11 ` Rusty Russell
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 5:30 UTC (permalink / raw)
To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml
Rusty Russell wrote:
> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>> - compatible with s390 requirements
>> - allows the user to use hugetlbfs pages, which have a performance
>> advantage using ept/npt (but which are unswappable)
>> - allows the user to map a file (which can be regarded as a way to specify
>> the swap device)
>> - better integration with the rest of the vm
>>
>
> You don't need to expose the vmas. You just have userspace point out
> the start+len of each region of memory it wants the guest to be able to
> access, and the address it wants it to appear in the guest.
>
> This is a slight superset of what lguest does in two ways:
>
> 1) my guest address == user address, but I'm looking at adding an offset
> so I don't have to link the launcher binary specially.
> 2) I have only one contiguous region of guest-physical memory, since I
> can place device memory immediately above "normal" mem.
>
>
My intent was to allow userspace to assign a virtual address
range to a memory slot.
So long as you don't do swapping, all is simple, since you can do a
get_user_pages() on initialization or when installing a shadow pte. But
if you want to swap, you need:
- a way to transfer the dirty bit from the shadow ptes to the struct page
- a way to let the vm rmap know that there are shadow ptes that point to
the page in addition to Linux ptes. These shadow ptes may be in a
different format than Linux ptes.
- a different tlb invalidation method with ASIDs
It's not going to be simple.
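The first requirement can be made concrete with a small userspace sketch that moves hardware dirty bits from shadow ptes into a per-page dirty flag. It assumes x86 pte bit positions (present is bit 0, writable bit 1, dirty bit 6). struct fake_page stands in for struct page, and the array of pte pointers stands in for what an rmap lookup would return, which is why the second requirement goes hand in hand with the first. harvest_dirty() is a hypothetical name, not an existing kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 page-table entry bits. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_DIRTY    (1ULL << 6)

/* Hypothetical stand-in for struct page. */
struct fake_page {
    bool dirty;
};

/* Fold the dirty bits of every shadow pte mapping @page into page->dirty
 * and clear them, so later guest writes can be detected again.
 * Returns true if any bit was cleared: the caller must then flush the
 * TLB, because the CPU may still hold a cached translation that is
 * already marked dirty and will not set the pte's dirty bit again. */
bool harvest_dirty(struct fake_page *page, uint64_t *sptes[], size_t n)
{
    bool cleared = false;

    for (size_t i = 0; i < n; i++) {
        if (*sptes[i] & PTE_DIRTY) {
            *sptes[i] &= ~PTE_DIRTY;
            page->dirty = true;
            cleared = true;
        }
    }
    return cleared;
}
```

The TLB flush that the return value asks for is where the third requirement (a different invalidation method with ASIDs) comes in.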
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-24 5:30 ` Avi Kivity
@ 2007-07-24 6:11 ` Rusty Russell
2007-07-24 6:21 ` Avi Kivity
0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2007-07-24 6:11 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml
On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
> >
> >> Having an address_space (like your patch does) is remarkably simple, and
> >> requires few hooks from the current vm. However using existing vmas
> >> mapped by the user has many advantages:
> >>
> >> - compatible with s390 requirements
> >> - allows the user to use hugetlbfs pages, which have a performance
> >> advantage using ept/npt (but which are unswappable)
> >> - allows the user to map a file (which can be regarded as a way to specify
> >> the swap device)
> >> - better integration with the rest of the vm
> >>
> >
> > You don't need to expose the vmas. You just have userspace point out
> > the start+len of each region of memory it wants the guest to be able to
> > access, and the address it wants it to appear in the guest.
> >
> > This is a slight superset of what lguest does in two ways:
> >
> > 1) my guest address == user address, but I'm looking at adding an offset
> > so I don't have to link the launcher binary specially.
> > 2) I have only one contiguous region of guest-physical memory, since I
> > can place device memory immediately above "normal" mem.
> >
> >
>
> My intent was to allow userspace to assign a virtual address
> range to a memory slot.
>
> So long as you don't do swapping, all is simple, since you can do a
> get_user_pages() on initialization or when installing a shadow pte. But
> if you want to swap, you need:
>
> - a way to transfer the dirty bit from the shadow ptes to the struct page
Actually, get_user_pages() does that for you. You have to make read-only
any writable pte where the guest hasn't set the dirty bit (so you can trap
the write later), but the last time I put a printk in there, Linux didn't
do that.
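The write-protection trick described here (a shadow pte whose guest pte is writable but not yet dirty is made read-only, so the first guest write traps) might look like this as a userspace sketch. It assumes x86 bit positions and copies the guest pte's address bits unchanged; a real shadow pte would carry a host frame number instead. make_shadow() and handle_write_fault() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry bits. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)

/* Derive a shadow pte from a guest pte. If the guest pte is writable but
 * clean, drop write permission so the first write faults to the host. */
uint64_t make_shadow(uint64_t gpte)
{
    uint64_t spte = gpte;   /* address translation omitted in this sketch */

    if ((gpte & PTE_WRITABLE) && !(gpte & PTE_DIRTY))
        spte &= ~PTE_WRITABLE;
    return spte;
}

/* Handle a write fault on a shadow pte. If the guest pte permits the
 * write, emulate the hardware by setting the guest's accessed and dirty
 * bits, then rebuild the shadow pte, which is now writable. Returns
 * false if the guest pte itself forbids the write, i.e. the fault must
 * be reflected to the guest. */
bool handle_write_fault(uint64_t *gpte, uint64_t *spte)
{
    if (!(*gpte & PTE_PRESENT) || !(*gpte & PTE_WRITABLE))
        return false;
    *gpte |= PTE_ACCESSED | PTE_DIRTY;
    *spte = make_shadow(*gpte);
    return true;
}
```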
> - a way to let the vm rmap know that there are shadow ptes that point to
> the page in addition to Linux ptes. These shadow ptes may be in a
> different format than Linux ptes.
> - a different tlb invalidation method with ASIDs
Well first I was just going to see how well hooking into the shrinker
works. That might be sufficient: just throw out shadow refs to pages
when there's pressure.
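The shrinker idea amounts to this: every cached shadow entry pins a guest page, and under memory pressure the host discards the oldest entries, dropping their references so the pages become reclaimable; the shadows are rebuilt on the guest's next fault. The sketch below models that in userspace with an explicit reference count. struct sim_page, struct shadow_cache, cache_add() and shrink_shadow_cache() are hypothetical, and returning the number of entries left is only loosely in the spirit of a kernel shrinker callback, not its actual signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 16

/* Hypothetical guest page with a reference count. */
struct sim_page {
    int refcount;
};

/* Hypothetical shadow cache: each entry pins one guest page.
 * Entries are kept oldest first. */
struct shadow_cache {
    struct sim_page *pinned[CACHE_SIZE];
    size_t n;
};

/* Record a new shadow mapping of @p, taking a reference on it. */
bool cache_add(struct shadow_cache *c, struct sim_page *p)
{
    if (c->n == CACHE_SIZE)
        return false;
    p->refcount++;
    c->pinned[c->n++] = p;
    return true;
}

/* Under memory pressure, throw out up to @nr_to_scan of the oldest
 * entries, dropping their page references. A page whose count reaches
 * zero is no longer pinned by the guest and may be swapped.
 * Returns the number of entries left. */
size_t shrink_shadow_cache(struct shadow_cache *c, size_t nr_to_scan)
{
    size_t drop = nr_to_scan < c->n ? nr_to_scan : c->n;

    for (size_t i = 0; i < drop; i++)
        c->pinned[i]->refcount--;
    memmove(c->pinned, c->pinned + drop,
            (c->n - drop) * sizeof(c->pinned[0]));
    c->n -= drop;
    return c->n;
}
```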
If not, it does get harder. A callback in the mm struct to say "I want
to swap your page out" is required if we don't take a reference to the
page. Dirty bit handling would be an interesting issue (maybe the
callback can say "No!" and dirty the page again?).
I fear mm code.
> It's not going to be simple.
Yeah, but it's one thing stopping lguest from being usable by non-root
users, so I want it there, too.
Cheers,
Rusty.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-24 6:11 ` Rusty Russell
@ 2007-07-24 6:21 ` Avi Kivity
2007-07-24 6:45 ` Rusty Russell
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 6:21 UTC (permalink / raw)
To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml
Rusty Russell wrote:
> On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>>>
>>>
>>>> Having an address_space (like your patch does) is remarkably simple, and
>>>> requires few hooks from the current vm. However using existing vmas
>>>> mapped by the user has many advantages:
>>>>
>>>> - compatible with s390 requirements
>>>> - allows the user to use hugetlbfs pages, which have a performance
>>>> advantage using ept/npt (but which are unswappable)
>>>> - allows the user to map a file (which can be regarded as a way to specify
>>>> the swap device)
>>>> - better integration with the rest of the vm
>>>>
>>>>
>>> You don't need to expose the vmas. You just have userspace point out
>>> the start+len of each region of memory it wants the guest to be able to
>>> access, and the address it wants it to appear in the guest.
>>>
>>> This is a slight superset of what lguest does in two ways:
>>>
>>> 1) my guest address == user address, but I'm looking at adding an offset
>>> so I don't have to link the launcher binary specially.
>>> 2) I have only one contiguous region of guest-physical memory, since I
>>> can place device memory immediately above "normal" mem.
>>>
>>>
>>>
>> My intent was to allow userspace to assign a virtual address
>> range to a memory slot.
>>
>> So long as you don't do swapping, all is simple, since you can do a
>> get_user_pages() on initialization or when installing a shadow pte. But
>> if you want to swap, you need:
>>
>> - a way to transfer the dirty bit from the shadow ptes to the struct page
>>
>
> Actually, get_user_pages() does that for you. You have to make read-only
> any writable pte where the guest hasn't set the dirty bit (so you can trap
> the write later), but the last time I put a printk in there, Linux didn't
> do that.
>
>
Don't understand. You mean Linux always sets the dirty bit when it
makes a page writable? Surely some mistake.
It probably does do so on demand write faults, but I'm sure the dirty
bit can get cleaned out by the swapper.
>> - a way to let the vm rmap know that there are shadow ptes that point to
>> the page in addition to Linux ptes. These shadow ptes may be in a
>> different format than Linux ptes.
>> - a different tlb invalidation method with ASIDs
>>
>
> Well first I was just going to see how well hooking into the shrinker
> works. That might be sufficient: just throw out shadow refs to pages
> when there's pressure.
>
Ah, interesting. Yes, you trim the shadow page table cache which unrefs
pages for you.
Maybe that's a good way to get things started.
> If not, it does get harder. A callback in the mm struct to say "I want
> to swap your page out" is required if we don't take a reference to the
> page. Dirty bit handling would be an interesting issue (maybe the
> callback can say "No!" and dirty the page again?).
>
Since we have rmap, I don't see that as an issue. Given a page, we can
easily drop all refs. Though lguest doesn't do that, right?
I'm also concerned with picking the correct page, but there's no good
solution here.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-24 6:21 ` Avi Kivity
@ 2007-07-24 6:45 ` Rusty Russell
2007-07-24 6:59 ` Avi Kivity
0 siblings, 1 reply; 20+ messages in thread
From: Rusty Russell @ 2007-07-24 6:45 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml
On Tue, 2007-07-24 at 09:21 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > Actually, get_user_pages() does that for you. You have to make read-only
> > any writable pte where the guest hasn't set the dirty bit (so you can trap
> > the write later), but the last time I put a printk in there, Linux didn't
> > do that.
>
> Don't understand. You mean Linux always sets the dirty bit when it
> makes a page writable? Surely some mistake.
>
> It probably does do so on demand write faults, but I'm sure the dirty
> bit can get cleaned out by the swapper.
Yeah, me dumb. I should put that printk back and try doing a kernel
compile.
> > If not, it does get harder. A callback in the mm struct to say "I want
> > to swap your page out" is required if we don't take a reference to the
> > page. Dirty bit handling would be an interesting issue (maybe the
> > callback can say "No!" and dirty the page again?).
>
> Since we have rmap, I don't see that as an issue. Given a page, we can
> easily drop all refs. Though lguest doesn't do that, right?
Yeah, rmap might maul some puppies. I could do poor man's rmap tho with
one backref and a bit to say "there are more". Then if that bit is set,
I just drop all 4 shadows 8)
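The "poor man's rmap" could be sketched as follows, assuming four shadow page-table sets as stated above: each page remembers only the first shadow pte that maps it plus a "there are more" bit, and unmapping a page with that bit set falls back to flushing every shadow. The structures, the tiny shadow size and the function names are hypothetical. If most pages end up with the bit set, nearly every unmap takes the expensive path, which is the cost Avi's reply worries about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SHADOWS     4   /* number of shadow page-table sets */
#define PTES_PER_SHADOW 8   /* tiny, for illustration only */

struct shadows {
    uint64_t pte[NUM_SHADOWS][PTES_PER_SHADOW];
};

/* Hypothetical per-page reverse map: one backref plus an overflow bit. */
struct poor_rmap {
    uint64_t *backref;  /* first shadow pte known to map the page */
    bool more;          /* set once a second, untracked mapping appears */
};

/* Note that @spte now maps the page. */
void rmap_add(struct poor_rmap *r, uint64_t *spte)
{
    if (r->backref == NULL && !r->more)
        r->backref = spte;
    else if (r->backref != spte)
        r->more = true;
}

/* Remove every shadow mapping of the page. With a single tracked mapping
 * only that pte is cleared; otherwise all shadows are thrown away.
 * Returns true if the cheap path was taken. */
bool rmap_unmap(struct poor_rmap *r, struct shadows *s)
{
    bool cheap = !r->more;

    if (cheap) {
        if (r->backref != NULL)
            *r->backref = 0;
    } else {
        memset(s, 0, sizeof(*s));
    }
    r->backref = NULL;
    r->more = false;
    return cheap;
}
```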
> I'm also concerned with picking the correct page, but there's no good
> solution here.
But since you have rmap, if there were a callback when the page was
undirtied, you could undirty the ptes. When the "I want to kick this
page out" callback comes along, check whether any of the ptes is now
dirty; if so, dirty the page and return "no".
Maybe it's too simplistic, but it might work.
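A single-threaded userspace sketch of this two-callback protocol, assuming a full rmap (an array of the shadow ptes mapping each page) and the x86 dirty bit (bit 6). on_page_cleaned() and on_evict_request() are hypothetical callbacks, not existing mm hooks. A real version would also need TLB flushes after clearing dirty bits, and locking against guest writes that race with the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_DIRTY    (1ULL << 6)
#define MAX_MAPPINGS 4

/* Hypothetical page with a full rmap of the shadow ptes mapping it. */
struct rpage {
    bool dirty;
    uint64_t *sptes[MAX_MAPPINGS];
    size_t nmappings;
};

/* Callback 1: the VM has written the page back and marked it clean.
 * Clear the dirty bits in every shadow pte so a later guest write
 * becomes visible again. */
void on_page_cleaned(struct rpage *p)
{
    p->dirty = false;
    for (size_t i = 0; i < p->nmappings; i++)
        *p->sptes[i] &= ~PTE_DIRTY;
}

/* Callback 2: the VM wants to evict the page. If any shadow pte was
 * written since the last clean, re-dirty the page and refuse ("No!");
 * otherwise drop every shadow mapping and agree. */
bool on_evict_request(struct rpage *p)
{
    for (size_t i = 0; i < p->nmappings; i++) {
        if (*p->sptes[i] & PTE_DIRTY) {
            p->dirty = true;
            return false;
        }
    }
    for (size_t i = 0; i < p->nmappings; i++)
        *p->sptes[i] = 0;
    p->nmappings = 0;
    return true;
}
```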
Rusty.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-24 6:45 ` Rusty Russell
@ 2007-07-24 6:59 ` Avi Kivity
2007-07-24 7:17 ` Rusty Russell
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 6:59 UTC (permalink / raw)
To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml
Rusty Russell wrote:
>
>>> If not, it does get harder. A callback in the mm struct to say "I want
>>> to swap your page out" is required if we don't take a reference to the
>>> page. Dirty bit handling would be an interesting issue (maybe the
>>> callback can say "No!" and dirty the page again?).
>>>
>> Since we have rmap, I don't see that as an issue. Given a page, we can
>> easily drop all refs. Though lguest doesn't do that, right?
>>
>
> Yeah, rmap might maul some puppies. I could do poor man's rmap tho with
> one backref and a bit to say "there are more". Then if that bit is set,
> I just drop all 4 shadows 8)
>
>
It's too poor. A long-running guest will eventually map all of memory
using the kernel page tables and a large proportion with user page
tables, so many pages will have that bit set.
However, you can probably work around that by not setting an rmap for
the kernel mappings, and instead having the guest teach the host where the
kernel page tables live. You'd only be left with shared libraries,
until the kernel can share page tables for them too.
>> I'm also concerned with picking the correct page, but there's no good
>> solution here.
>>
>
> But since you have rmap, if there were a callback when the page was
> undirtied, you could undirty the ptes. When the "I want to kick this
> page out" callback comes along, check whether any of the ptes is now
> dirty; if so, dirty the page and return "no".
>
> Maybe it's too simplistic, but it might work.
>
Ah, I see what you mean now. It could work, as far as I can tell (which
isn't very far, though).
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
2007-07-24 6:59 ` Avi Kivity
@ 2007-07-24 7:17 ` Rusty Russell
0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2007-07-24 7:17 UTC (permalink / raw)
To: Avi Kivity; +Cc: Shaohua Li, kvm-devel, lkml
On Tue, 2007-07-24 at 09:59 +0300, Avi Kivity wrote:
> However, you can probably work around that by not setting an rmap for
> the kernel mappings, and instead having the guest teach the host where the
> kernel page tables live. You'd only be left with shared libraries,
> until the kernel can share page tables for them too.
Well, I already treat kernel mappings specially (effectively I know the
guest's PAGE_OFFSET): they're kept identical in all the 4 shadows, and
need explicit guest flushing.
Whether the guest shares (non-kernel) page tables or not, I will shadow
them dumb as separate page table pages the way things stand. So, yes,
shared libs will be my main issue. Address space randomization means I
can't even use a heuristic such as looking for the page at the same
address in other shadows. I'll come up with something.
Anyway, virtio is what I'm *supposed* to be doing today...
Thanks,
Rusty.
* Re: [RFC 0/8]KVM: swap out guest pages
2007-07-23 10:27 ` Avi Kivity
` (2 preceding siblings ...)
2007-07-23 23:10 ` Rusty Russell
@ 2007-07-24 1:42 ` Shaohua Li
2007-07-24 5:42 ` Avi Kivity
3 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2007-07-24 1:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, lkml, Ingo Molnar
On Mon, 2007-07-23 at 18:27 +0800, Avi Kivity wrote:
> Shaohua Li wrote:
> > This patch series makes kvm guest pages swappable and dynamically
> > allocated. Without it, all guest memory is allocated at guest start time.
> >
> > The patches are against the latest git, and you first need Avi's kvm-sch
> > integration patch
> > (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel).
> >
> > The patch is quite stable in my testing. With it, I can run a 256M
> > memory guest on a 300M memory host.
>
> What about the opposite?
>
> > If the guest is idle, the memory it uses can be less than 10M. I did a
> > simple performance test (measuring kernel build time in the guest); if
> > there is little swapping, the performance difference with and without
> > the patch isn't significant. If you have a better measurement approach,
> > please let me try it.
> >
> > Unresolved issues:
> > 1. swapoff doesn't work; we need a hook.
> > 2. SMP guests might not work, as kvm doesn't support SMP yet.
> > 3. A better algorithm is needed to select which guest pages to swap
> > out, based on the guest's memory usage.
> > Maybe more.
> >
> > Any suggestions and comments are appreciated.
> >
>
> The big question is whether to have kvm's own address_space or not.
>
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:
>
> - compatible with s390 requirements
> - allows the user to use hugetlbfs pages, which have a performance
> advantage using ept/npt (but which are unswappable)
> - allows the user to map a file (which can be regarded as a way to
> specify the swap device)
> - better integration with the rest of the vm
>
> I am quite torn between the simplicity of your approach and the
> advantages of using generic vmas. However, s390 pretty much forces our
> hand.
>
> What is your opinion of extending generic vmas to back kvm guest memory?
Several issues:
1. vmas manage userspace addresses, but a kvm guest uses the full
address space.
2. qemu itself must use some of the address space.
3. kvm needs special page fault handling for the shadow page table;
generic page table operations can't be used directly for the guest.
I have no idea whether your idea is feasible. The s390 guys said their
shadow page table is the same as the host's, which is why they can
easily implement swap; on x86 it is hard.
Thanks,
Shaohua
* Re: [RFC 0/8]KVM: swap out guest pages
2007-07-24 1:42 ` Shaohua Li
@ 2007-07-24 5:42 ` Avi Kivity
0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 5:42 UTC (permalink / raw)
To: Shaohua Li; +Cc: kvm-devel, lkml, Ingo Molnar
Shaohua Li wrote:
> On Mon, 2007-07-23 at 18:27 +0800, Avi Kivity wrote:
>
>> Shaohua Li wrote:
>>
>>> This patch series makes kvm guest pages swappable and dynamically
>>> allocated. Without it, all guest memory is allocated at guest start time.
>>>
>>> The patches are against the latest git, and you first need Avi's kvm-sch
>>> integration patch
>>> (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel).
>>>
>>> The patch is quite stable in my testing. With it, I can run a 256M
>>> memory guest on a 300M memory host.
>>>
>> What about the opposite?
>>
>>> If the guest is idle, the memory it uses can be less than 10M. I did a
>>> simple performance test (measuring kernel build time in the guest); if
>>> there is little swapping, the performance difference with and without
>>> the patch isn't significant. If you have a better measurement approach,
>>> please let me try it.
>>>
>>> Unresolved issues:
>>> 1. swapoff doesn't work; we need a hook.
>>> 2. SMP guests might not work, as kvm doesn't support SMP yet.
>>> 3. A better algorithm is needed to select which guest pages to swap
>>> out, based on the guest's memory usage.
>>> Maybe more.
>>>
>>> Any suggestions and comments are appreciated.
>>>
>> The big question is whether to have kvm's own address_space or not.
>>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>> - compatible with s390 requirements
>> - allows the user to use hugetlbfs pages, which have a performance
>> advantage using ept/npt (but which are unswappable)
>> - allows the user to map a file (which can be regarded as a way to
>> specify the swap device)
>> - better integration with the rest of the vm
>>
>> I am quite torn between the simplicity of your approach and the
>> advantages of using generic vmas. However, s390 pretty much forces our
>> hand.
>>
>> What is your opinion of extending generic vmas to back kvm guest memory?
>>
> Several issues:
> 1. vmas manage userspace addresses, but a kvm guest uses the full
> address space.
> 2. qemu itself must use some of the address space.
>
My idea is to keep the current slot concept, but instead of having kvm
allocate pages for a slot, it would call get_user_pages() for a virtual
address range. Userspace doesn't directly talk about vmas, just virtual
address ranges.
> 3. kvm needs special page fault handling for the shadow page table;
> generic page table operations can't be used directly for the guest.
> I have no idea whether your idea is feasible. The s390 guys said their
> shadow page table is the same as the host's, which is why they can
> easily implement swap; on x86 it is hard.
>
No question that it is hard. I'd like to explore just how hard it is.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.