* [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 6:51 Shaohua Li
From: Shaohua Li @ 2007-07-23 6:51 UTC
To: kvm-devel, lkml
This patch series makes kvm guest pages swappable and dynamically
allocated. Without it, all guest memory is allocated at guest start
time.

The patches are against the latest git; you first need to apply Avi's
kvm-sch integration patch
(http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).

The patches are quite stable in my tests. With them, I can run a 256M
guest on a 300M host. If the guest is idle, the memory it uses can be
less than 10M. I did a simple performance test (measuring kernel build
time in the guest): when there is little swapping, the performance
difference with and without the patches isn't significant. If you have
a better measurement approach, please let me know and I'll try it.

Unresolved issues:
1. swapoff doesn't work; we need a hook.
2. SMP guests might not work, as kvm doesn't support SMP yet.
3. We need a better algorithm for selecting which guest pages to swap
out, based on the guest's memory usage.
Maybe more.

Any suggestions and comments are appreciated.
Thanks,
Shaohua
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 10:27 Avi Kivity
From: Avi Kivity @ 2007-07-23 10:27 UTC
To: Shaohua Li; +Cc: kvm-devel, lkml

Shaohua Li wrote:
> This patch series make kvm guest pages be able to be swapped out and
> dynamically allocated. Without it, all guest memory is allocated at
> guest start time.
>
> patches are against latest git, and you need first patch Avi's kvm-sch
> integration patch
> (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).
>
> Patch is quite stable in my test. With the patch, I can run a 256M
> memory guest in a 300M memory host.

What about the opposite?

> If guest is idle, the memory it used
> can be less than 10M. I did a simple performance test (measure kernel
> build time in guest), if there is few swap, the performance w/wo the
> patch difference isn't significant. If you have better measurement
> approach, please let me try.
>
> Unresolved issue:
> 1. swapoff doesn't work, we need a hook.
> 2. SMP guest might not work, as kvm doesn't support smp till now.
> 3. better algorithm to select swapped out guest pages according to
> guest's memory usage.
> Maybe more.
>
> Any suggests and comments are appreciated.
>

The big question is whether to have kvm's own address_space or not.

Having an address_space (like your patch does) is remarkably simple, and
requires few hooks from the current vm. However using existing vmas
mapped by the user has many advantages:

- compatible with s390 requirements
- allows the user to use hugetlbfs pages, which have a performance
  advantage using ept/npt (but which are unswappable)
- allows the user to map a file (which can be regarded as way to specify
  the swap device)
- better integration with the rest of the vm

I am quite torn between the simplicity of your approach and the
advantages of using generic vmas. However, s390 pretty much forces our
hand.

What is your opinion of extending generic vmas to back kvm guest memory?

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 12:25 Christoph Hellwig
From: Christoph Hellwig @ 2007-07-23 12:25 UTC
To: Avi Kivity; +Cc: kvm-devel, lkml

On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:

Actually it requires lots of deep down VM internals symbols that'll never
get exported.
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 12:29 Avi Kivity
From: Avi Kivity @ 2007-07-23 12:29 UTC
To: Christoph Hellwig, Avi Kivity, Shaohua Li, kvm-devel, lkml

Christoph Hellwig wrote:
> On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>
> Actually it requires lots of deep down VM internals symbols that'll never
> get exported.
>

What's "it" here? kvm-specific address space or generic vmas.

Generic vmas will be more intrusive AFAICT.

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 12:34 Christoph Hellwig
From: Christoph Hellwig @ 2007-07-23 12:34 UTC
To: Avi Kivity; +Cc: Christoph Hellwig, lkml, kvm-devel

On Mon, Jul 23, 2007 at 03:29:36PM +0300, Avi Kivity wrote:
> >Actually it requires lots of deep down VM internals symbols that'll never
> >get exported.
> >
>
> What's "it" here? kvm-specific address space or generic vmas.

The patches in this thread.

> Generic vmas will be more intrusive AFAICT.

People use intrusive differently. Doing big changes to core code is not
a problem if we actually get a proper interface. Just exporting core
function without other changes and then writing code in modules that
pokes into internals is much much worse.
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 12:39 Avi Kivity
From: Avi Kivity @ 2007-07-23 12:39 UTC
To: Christoph Hellwig, Avi Kivity, Shaohua Li, kvm-devel, lkml

Christoph Hellwig wrote:
>
>> Generic vmas will be more intrusive AFAICT.
>>
>
> People use intrusive differently. Doing big changes to core code is not
> a problem if we actually get a proper interface. Just exporting core
> function without other changes and then writing code in modules that
> pokes into internals is much much worse.
>

Agree 100%.

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-24 2:00 Shaohua Li
From: Shaohua Li @ 2007-07-24 2:00 UTC
To: Christoph Hellwig; +Cc: kvm-devel, lkml

On Mon, 2007-07-23 at 20:34 +0800, Christoph Hellwig wrote:
> On Mon, Jul 23, 2007 at 03:29:36PM +0300, Avi Kivity wrote:
> > >Actually it requires lots of deep down VM internals symbols that'll never
> > >get exported.
> > >
> >
> > What's "it" here? kvm-specific address space or generic vmas.
>
> The patches in this thread.
>
> > Generic vmas will be more intrusive AFAICT.
>
> People use intrusive differently. Doing big changes to core code is not
> a problem if we actually get a proper interface. Just exporting core
> function without other changes and then writing code in modules that
> pokes into internals is much much worse.

The patch swaps out pages the same way shm does. The only difference is
that kvm is a module and shm is not. Why can't kvm use the symbols shm
uses? Sure, it's possible to write guest memory to a file and so avoid
using the symbols; if you really hate this, I'll consider that
alternative.

Thanks,
Shaohua
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 20:06 Jeff Dike
From: Jeff Dike @ 2007-07-23 20:06 UTC
To: Avi Kivity; +Cc: kvm-devel, lkml

On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:

It's also needed for a SKAS-like UML client, where the host side will
need to make system calls on behalf of the guest.

Jeff

-- 
Work email - jdike at linux dot intel dot com
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-24 5:22 Avi Kivity
From: Avi Kivity @ 2007-07-24 5:22 UTC
To: Jeff Dike; +Cc: kvm-devel, lkml

Jeff Dike wrote:
> On Mon, Jul 23, 2007 at 01:27:40PM +0300, Avi Kivity wrote:
>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>
> It's also needed for a SKAS-like UML client, where the host side will
> need to make system calls on behalf of the guest.
>

Even in the current model, guest physical memory is mmap()ed into host
userspace. The kernel cannot enforce this, however.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-25 16:15 Jeff Dike
From: Jeff Dike @ 2007-07-25 16:15 UTC
To: Avi Kivity; +Cc: kvm-devel, lkml

On Tue, Jul 24, 2007 at 08:22:53AM +0300, Avi Kivity wrote:
> Even in the current model, guest physical memory is mmap()ed into host
> userspace.

I want it to be identity-mapped, which a single address space would
guarantee. For things which change mappings, like vmalloc, I need to
be in the same address space as the guest.

Jeff

-- 
Work email - jdike at linux dot intel dot com
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-25 17:12 Carsten Otte
From: Carsten Otte @ 2007-07-25 17:12 UTC
To: Jeff Dike; +Cc: kvm-devel, lkml

Jeff Dike wrote:
> I want it to be identity-mapped, which a single address space would
> guarantee. For things which change mappings, like vmalloc, I need to
> be in the same address space as the guest.

That will also be required by the hardware when porting this to s390.
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-23 23:10 Rusty Russell
From: Rusty Russell @ 2007-07-23 23:10 UTC
To: Avi Kivity; +Cc: kvm-devel, lkml

On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:
>
> - compatible with s390 requirements
> - allows the user to use hugetlbfs pages, which have a performance
> advantage using ept/npt (but which are unswappable)
> - allows the user to map a file (which can be regarded as way to specify
> the swap device)
> - better integration with the rest of the vm

You don't need to expose the vmas. You just have userspace point out
the start+len of each region of memory it wants the guest to be able to
access, and the address it wants it to appear in the guest.

This is a slight superset of what lguest does in two ways:

1) my guest address == user address, but I'm looking at adding an offset
so I don't have to link the launcher binary specially.
2) I have only one contiguous region of guest-physical memory, since I
can place device memory immediately above "normal" mem.

But the result is pretty sweet, and doesn't require any new symbols to
be exported.

Cheers,
Rusty.
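The region scheme Rusty describes (userspace names a start, a length and
a guest address for each region) can be illustrated with a toy model.
This is only a sketch in Python, not lguest or kvm code: MemoryRegion,
guest_to_user and the example addresses are all assumptions made up for
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemoryRegion:
    """Hypothetical description of one region handed over by userspace."""
    guest_start: int  # guest-physical address where the region appears
    user_start: int   # start of the region in the launcher's address space
    length: int       # size of the region in bytes


def guest_to_user(regions: List[MemoryRegion], guest_addr: int) -> Optional[int]:
    """Translate a guest-physical address to a user address, or None."""
    for region in regions:
        offset = guest_addr - region.guest_start
        if 0 <= offset < region.length:
            return region.user_start + offset
    return None  # the guest touched memory it was never given


# Example layout (made-up addresses): 256M of "normal" memory, with one
# page of device memory placed immediately above it.
regions = [
    MemoryRegion(guest_start=0x0, user_start=0x7F0000000000, length=0x10000000),
    MemoryRegion(guest_start=0x10000000, user_start=0x7F0020000000, length=0x1000),
]
```

With a single region and guest_start equal to user_start this collapses
to the identity mapping that lguest used at the time; the per-region
offset is the generalisation discussed above.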
* Re: [kvm-devel] [RFC 0/8]KVM: swap out guest pages
@ 2007-07-24 5:30 Avi Kivity
From: Avi Kivity @ 2007-07-24 5:30 UTC
To: Rusty Russell; +Cc: Shaohua Li, kvm-devel, lkml

Rusty Russell wrote:
> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>> - compatible with s390 requirements
>> - allows the user to use hugetlbfs pages, which have a performance
>> advantage using ept/npt (but which are unswappable)
>> - allows the user to map a file (which can be regarded as way to specify
>> the swap device)
>> - better integration with the rest of the vm
>>
>
> You don't need to expose the vmas. You just have userspace point out
> the start+len of each region of memory it wants the guest to be able to
> access, and the address it wants it to appear in the guest.
>
> This is a slight superset of what lguest does in two ways:
>
> 1) my guest address == user address, but I'm looking at adding an offset
> so I don't have to link the launcher binary specially.
> 2) I have only one contiguous region of guest-physical memory, since I
> can place device memory immediately above "normal" mem.
>

My intent was to allow userspace to assign a virtual address range to a
memory slot.

So long as you don't do swapping, all is simple, since you can do a
get_user_pages() on initialization or when installing a shadow pte. But
if you want to swap, you need:

- a way to transfer the dirty bit from the shadow ptes to the struct page
- a way to let the vm rmap know that there are shadow ptes that point to
  the page in addition to Linux ptes. These shadow ptes may be in a
  different format than Linux ptes.
- a different tlb invalidation method with ASIDs

It's not going to be simple.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
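The first two requirements in Avi's list (a reverse map from a page to
its shadow ptes, and folding shadow dirty bits back into the page) can
be pictured with a toy model. This is a sketch in Python under stated
assumptions, not kernel code: ShadowPte, Page and ShadowRmap are
hypothetical names, and TLB/ASID invalidation is not modelled at all.

```python
from collections import defaultdict


class ShadowPte:
    """Toy shadow pte: just the bits this sketch needs."""
    def __init__(self, page_id):
        self.page_id = page_id
        self.present = True
        self.dirty = False  # would be set by hardware on a guest write


class Page:
    """Toy stand-in for the host's per-page state."""
    def __init__(self, page_id):
        self.page_id = page_id
        self.dirty = False


class ShadowRmap:
    """Hypothetical reverse map: page id -> shadow ptes pointing at it."""
    def __init__(self):
        self._ptes = defaultdict(list)

    def add(self, pte):
        self._ptes[pte.page_id].append(pte)

    def unmap_page(self, page):
        """Drop every shadow pte for `page`, transferring dirty bits first.

        Returns the number of shadow ptes that were dropped.
        """
        ptes = self._ptes.pop(page.page_id, [])
        for pte in ptes:
            if pte.dirty:
                page.dirty = True  # the guest wrote through this mapping
            pte.present = False
            pte.dirty = False
        return len(ptes)
```

After unmap_page() the host can treat the page like any other: if it
ended up dirty it must be written to swap before the frame is reused.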
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-24 6:11 Rusty Russell
From: Rusty Russell @ 2007-07-24 6:11 UTC
To: Avi Kivity; +Cc: kvm-devel, lkml

On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
> >
> >> Having an address_space (like your patch does) is remarkably simple, and
> >> requires few hooks from the current vm. However using existing vmas
> >> mapped by the user has many advantages:
> >>
> >> - compatible with s390 requirements
> >> - allows the user to use hugetlbfs pages, which have a performance
> >> advantage using ept/npt (but which are unswappable)
> >> - allows the user to map a file (which can be regarded as way to specify
> >> the swap device)
> >> - better integration with the rest of the vm
> >>
> >
> > You don't need to expose the vmas. You just have userspace point out
> > the start+len of each region of memory it wants the guest to be able to
> > access, and the address it wants it to appear in the guest.
> >
> > This is a slight superset of what lguest does in two ways:
> >
> > 1) my guest address == user address, but I'm looking at adding an offset
> > so I don't have to link the launcher binary specially.
> > 2) I have only one contiguous region of guest-physical memory, since I
> > can place device memory immediately above "normal" mem.
> >
>
> My intent was to allow userspace to assign a virtual address range to a
> memory slot.
>
> So long as you don't do swapping, all is simple, since you can do a
> get_user_pages() on initialization or when installing a shadow pte. But
> if you want to swap, you need:
>
> - a way to transfer the dirty bit from the shadow ptes to the struct page

Actually, get_user_pages() does that for you. You have to make R/O any
writable pte where the guest doesn't set the dirty bit (so you can trap
it later) but last I put a printk in there, Linux doesn't do that.

> - a way to let the vm rmap know that there are shadow ptes that point to
> the page in addition to Linux ptes. These shadow ptes may be in a
> different format than Linux ptes.
> - a different tlb invalidation method with ASIDs

Well first I was just going to see how well hooking into the shrinker
works. That might be sufficient: just throw out shadow refs to pages
when there's pressure.

If not, it does get harder. A callback in the mm struct to say "I want
to swap your page out" is required if we don't take a reference to the
page. Dirty bit handling would be an interesting issue (maybe the
callback can say "No!" and dirty the page again?). I fear mm code.

> It's not going to be simple.

Yeah, but it's one thing stopping lguest from being non-root usable, so
I want it there, too.

Cheers,
Rusty.
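The shrinker idea above (throw out shadow references to pages when
there is memory pressure) can be sketched as a small cache model. This
is a Python toy under stated assumptions, not the kernel's shrinker
interface: ShadowCache, install and shrink are hypothetical names, and
least-recently-used eviction is just one plausible policy.

```python
from collections import Counter, OrderedDict


class ShadowCache:
    """Toy cache of shadow page tables, each pinning the pages it maps."""

    def __init__(self):
        self._tables = OrderedDict()  # table id -> set of page ids, LRU first
        self.page_refs = Counter()    # page id -> number of tables pinning it

    def install(self, table_id, page_id):
        """Record that shadow table `table_id` now maps `page_id`."""
        pages = self._tables.setdefault(table_id, set())
        self._tables.move_to_end(table_id)  # mark as most recently used
        if page_id not in pages:
            pages.add(page_id)
            self.page_refs[page_id] += 1

    def shrink(self, nr_tables):
        """Under pressure, drop up to `nr_tables` least recently used tables.

        Dropping a table drops its references, so pages mapped only by
        that table become ordinary, reclaimable pages again.
        """
        freed = 0
        while freed < nr_tables and self._tables:
            _, pages = self._tables.popitem(last=False)
            for page_id in pages:
                self.page_refs[page_id] -= 1
                if self.page_refs[page_id] == 0:
                    del self.page_refs[page_id]
            freed += 1
        return freed

    def is_pinned(self, page_id):
        return page_id in self.page_refs
```

The appeal of this approach is that it needs no per-page reverse map:
the host never asks about one specific page, it only asks the cache to
give something back.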
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-24 6:21 Avi Kivity
From: Avi Kivity @ 2007-07-24 6:21 UTC
To: Rusty Russell; +Cc: kvm-devel, lkml

Rusty Russell wrote:
> On Tue, 2007-07-24 at 08:30 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> On Mon, 2007-07-23 at 13:27 +0300, Avi Kivity wrote:
>>>
>>>> Having an address_space (like your patch does) is remarkably simple, and
>>>> requires few hooks from the current vm. However using existing vmas
>>>> mapped by the user has many advantages:
>>>>
>>>> - compatible with s390 requirements
>>>> - allows the user to use hugetlbfs pages, which have a performance
>>>> advantage using ept/npt (but which are unswappable)
>>>> - allows the user to map a file (which can be regarded as way to specify
>>>> the swap device)
>>>> - better integration with the rest of the vm
>>>>
>>> You don't need to expose the vmas. You just have userspace point out
>>> the start+len of each region of memory it wants the guest to be able to
>>> access, and the address it wants it to appear in the guest.
>>>
>>> This is a slight superset of what lguest does in two ways:
>>>
>>> 1) my guest address == user address, but I'm looking at adding an offset
>>> so I don't have to link the launcher binary specially.
>>> 2) I have only one contiguous region of guest-physical memory, since I
>>> can place device memory immediately above "normal" mem.
>>>
>> My intent was to allow userspace to assign a virtual address range to a
>> memory slot.
>>
>> So long as you don't do swapping, all is simple, since you can do a
>> get_user_pages() on initialization or when installing a shadow pte. But
>> if you want to swap, you need:
>>
>> - a way to transfer the dirty bit from the shadow ptes to the struct page
>>
>
> Actually, get_user_pages() does that for you. You have to make R/O any
> writable pte where the guest doesn't set the dirty bit (so you can trap
> it later) but last I put a printk in there, Linux doesn't do that.
>

Don't understand. You mean Linux always sets the dirty bit when it
makes a page writable? Surely some mistake.

It probably does do so on demand write faults, but I'm sure the dirty
bit can get cleaned out by the swapper.

>> - a way to let the vm rmap know that there are shadow ptes that point to
>> the page in addition to Linux ptes. These shadow ptes may be in a
>> different format than Linux ptes.
>> - a different tlb invalidation method with ASIDs
>>
>
> Well first I was just going to see how well hooking into the shrinker
> works. That might be sufficient: just throw out shadow refs to pages
> when there's pressure.
>

Ah, interesting. Yes, you trim the shadow page table cache which unrefs
pages for you. Maybe that's a good way to get things started.

> If not, it does get harder. A callback in the mm struct to say "I want
> to swap your page out" is required if we don't take a reference to the
> page. Dirty bit handling would be an interesting issue (maybe the
> callback can say "No!" and dirty the page again?).
>

Since we have rmap, I don't see that as an issue. Given a page, we can
easily drop all refs. Though lguest doesn't do that, right?

I'm also concerned with picking the correct page, but there's no good
solution here.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC 0/8]KVM: swap out guest pages
@ 2007-07-24 6:45 Rusty Russell
From: Rusty Russell @ 2007-07-24 6:45 UTC
To: Avi Kivity; +Cc: kvm-devel, lkml

On Tue, 2007-07-24 at 09:21 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > Actually, get_user_pages() does that for you. You have to make R/O any
> > writable pte where the guest doesn't set the dirty bit (so you can trap
> > it later) but last I put a printk in there, Linux doesn't do that.
>
> Don't understand. You mean Linux always sets the dirty bit when it
> makes a page writable? Surely some mistake.
>
> It probably does do so on demand write faults, but I'm sure the dirty
> bit can get cleaned out by the swapper.

Yeah, me dumb. I should put that printk back and try doing a kernel
compile.

> > If not, it does get harder. A callback in the mm struct to say "I want
> > to swap your page out" is required if we don't take a reference to the
> > page. Dirty bit handling would be an interesting issue (maybe the
> > callback can say "No!" and dirty the page again?).
>
> Since we have rmap, I don't see that as an issue. Given a page, we can
> easily drop all refs. Though lguest doesn't do that, right?

Yeah, rmap might maul some puppies. I could do poor man's rmap tho with
one backref and a bit to say "there are more". Then if that bit is set,
I just drop all 4 shadows 8)

> I'm also concerned with picking the correct page, but there's no good
> solution here.

But since you have rmap, if there was a cb when the page was
undirtied, you could undirty the ptes. When the "I want to kick this
page out" cb comes along, see if one of the ptes is now dirty, dirty the
page and return "no".

Maybe it's too simplistic, but it might work.

Rusty.
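The two-callback protocol Rusty proposes here can be written down as a
toy model. This is a Python sketch of the idea only, not mm or kvm code:
on_page_cleaned and on_swap_out_request are hypothetical callback names,
and the Pte and Page classes carry just the bits the protocol needs.

```python
class Pte:
    """Toy shadow pte; `dirty` would be set by hardware on a guest write."""
    def __init__(self):
        self.dirty = False


class Page:
    """Toy page with an rmap-style list of shadow ptes pointing at it."""
    def __init__(self):
        self.dirty = False
        self.shadow_ptes = []


def on_page_cleaned(page):
    """Hypothetical callback: the host has just undirtied the page.

    Clearing the shadow dirty bits too means any bit found set later
    reflects a guest write that happened after this point.
    """
    page.dirty = False
    for pte in page.shadow_ptes:
        pte.dirty = False


def on_swap_out_request(page):
    """Hypothetical "I want to kick this page out" callback.

    Returns False (the "no" answer) if the guest dirtied the page since
    it was last cleaned; otherwise drops the shadow ptes and agrees.
    """
    if any(pte.dirty for pte in page.shadow_ptes):
        page.dirty = True  # hand the dirty bit back to the host
        return False
    page.shadow_ptes.clear()
    return True
```

A refused page simply goes around again: the host writes it back, calls
on_page_cleaned(), and a later request succeeds unless the guest has
written to the page in the meantime.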
[parent not found: <1185259509.1803.237.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* Re: [RFC 0/8]KVM: swap out guest pages
[not found] ` <1185259509.1803.237.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2007-07-24 6:59 ` Avi Kivity
[not found] ` <46A5A36E.8000409-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 6:59 UTC (permalink / raw)
To: Rusty Russell; +Cc: kvm-devel, lkml

Rusty Russell wrote:
>
>>> If not, it does get harder. A callback in the mm struct to say "I want
>>> to swap your page out" is required if we don't take a reference to the
>>> page. Dirty bit handling would be an interesting issue (maybe the
>>> callback can say "No!" and dirty the page again?).
>>>
>> Since we have rmap, I don't see that as an issue. Given a page, we can
>> easily drop all refs. Though lguest doesn't do that, right?
>>
>
> Yeah, rmap might maul some puppies. I could do poor man's rmap tho with
> one backref and a bit to say "there are more". Then if that bit is set,
> I just drop all 4 shadows 8)
>

It's too poor. A long running guest will eventually map all of memory
using the kernel page tables and a large proportion with user page
tables, so many pages will have that bit set.

However, you can probably work around that by not setting an rmap for
the kernel mappings, and instead have the guest teach the host where the
kernel page tables live. You'd only be left with shared libraries,
until the kernel can share page tables for them too.

>> I'm also concerned with picking the correct page, but there's no good
>> solution here.
>>
>
> But since you have rmap, if there was a cb when the page was
> undirtied, you could undirty the ptes. When the "I want to kick this
> page out" cb comes along, see if one of the ptes is now dirty, dirty the
> page and return "no".
>
> Maybe it's too simplistic, but it might work.
>

Ah, I see what you mean now. It could work, as far as I can tell (which
isn't very far, though).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply [flat|nested] 20+ messages in thread
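[Editorial sketch] The "poor man's rmap" discussed above (one back-reference per guest page plus a bit meaning "there are more") could look roughly like the C below. This is only an illustration under stated assumptions, not lguest or kvm code: struct poor_rmap, struct shadows, poor_rmap_add(), poor_rmap_unmap() and the table sizes are all hypothetical names invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SHADOWS  4  /* the "4 shadows" mentioned in the thread */
#define SHADOW_PTES 8  /* tiny table size, chosen only for illustration */

typedef uint64_t pte_t;

/* Hypothetical per-guest-page state: one backref plus an overflow bit. */
struct poor_rmap {
    pte_t *backref;  /* the one shadow pte known to map this page, or NULL */
    bool   more;     /* set once a second, unrecorded mapping exists */
};

/* Hypothetical set of shadow page tables, flattened to arrays of ptes. */
struct shadows {
    pte_t pte[NR_SHADOWS][SHADOW_PTES];
};

/* Record that *pte now maps the page tracked by r. */
static void poor_rmap_add(struct poor_rmap *r, pte_t *pte)
{
    assert(pte != NULL);
    if (r->backref == NULL)
        r->backref = pte;      /* first mapping: remember it exactly */
    else if (r->backref != pte)
        r->more = true;        /* further mappings: only remember "more" */
}

/*
 * Remove every shadow mapping of the page tracked by r.
 * Returns true if the overflow bit forced all shadows to be dropped.
 */
static bool poor_rmap_unmap(struct poor_rmap *r, struct shadows *s)
{
    bool dropped_all = r->more;

    if (r->more) {
        /* We do not know where the other mappings are: drop everything. */
        for (size_t i = 0; i < NR_SHADOWS; i++)
            for (size_t j = 0; j < SHADOW_PTES; j++)
                s->pte[i][j] = 0;
    } else if (r->backref != NULL) {
        *r->backref = 0;       /* exactly one mapping: clear just that pte */
    }
    r->backref = NULL;
    r->more = false;
    return dropped_all;
}
```

The sketch also shows the cost raised in the reply: once a page is mapped from a second place, evicting it throws away every shadow, and a real implementation would then have to reset the rmap state of all other pages too (omitted here).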
[parent not found: <46A5A36E.8000409-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC 0/8]KVM: swap out guest pages
[not found] ` <46A5A36E.8000409-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-07-24 7:17 ` Rusty Russell
0 siblings, 0 replies; 20+ messages in thread
From: Rusty Russell @ 2007-07-24 7:17 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, lkml

On Tue, 2007-07-24 at 09:59 +0300, Avi Kivity wrote:
> However, you can probably work around that by not setting an rmap for
> the kernel mappings, and instead have the guest teach the host where the
> kernel page tables live. You'd only be left with shared libraries,
> until the kernel can share page tables for them too.

Well, I already treat kernel mappings specially (effectively I know the
guest's PAGE_OFFSET): they're kept identical in all the 4 shadows, and
need explicit guest flushing.

Whether the guest shares (non-kernel) page tables or not, I will shadow
them dumb as separate page table pages the way things stand. So, yes,
shared libs will be my main issue. Address space randomization means I
can't even use a heuristic such as looking for the page at the same
address in other shadows.

I'll come up with something. Anyway, virtio is what I'm *supposed* to be
doing today...

Thanks,
Rusty.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC 0/8]KVM: swap out guest pages
[not found] ` <46A4829C.9080104-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
` (2 preceding siblings ...)
2007-07-23 23:10 ` Rusty Russell
@ 2007-07-24 1:42 ` Shaohua Li
[not found] ` <1185241357.24201.12.camel-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
3 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2007-07-24 1:42 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, lkml

On Mon, 2007-07-23 at 18:27 +0800, Avi Kivity wrote:
> Shaohua Li wrote:
> > This patch series make kvm guest pages be able to be swapped out and
> > dynamically allocated. Without it, all guest memory is allocated at
> > guest start time.
> >
> > patches are against latest git, and you need first patch Avi's kvm-sch
> > integration patch
> > (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).
> >
> > Patch is quite stable in my test. With the patch, I can run a 256M
> > memory guest in a 300M memory host.
>
> What about the opposite?
>
> > If guest is idle, the memory it used
> > can be less than 10M. I did a simple performance test (measure kernel
> > build time in guest), if there is few swap, the performance w/wo the
> > patch difference isn't significant. If you have better measurement
> > approach, please let me try.
> >
> > Unresolved issue:
> > 1. swapoff doesn't work, we need a hook.
> > 2. SMP guest might not work, as kvm doesn't support smp till now.
> > 3. better algorithm to select swapped out guest pages according to
> > guest's memory usage.
> > Maybe more.
> >
> > Any suggestions and comments are appreciated.
> >
>
> The big question is whether to have kvm's own address_space or not.
>
> Having an address_space (like your patch does) is remarkably simple, and
> requires few hooks from the current vm. However using existing vmas
> mapped by the user has many advantages:
>
> - compatible with s390 requirements
> - allows the user to use hugetlbfs pages, which have a performance
>   advantage using ept/npt (but which are unswappable)
> - allows the user to map a file (which can be regarded as way to specify
>   the swap device)
> - better integration with the rest of the vm
>
> I am quite torn between the simplicity of your approach and the
> advantages of using generic vmas. However, s390 pretty much forces our
> hand.
>
> What is your opinion of extending generic vmas to back kvm guest
> memory?

Several issues:
1. A vma manages userspace addresses, but a kvm guest uses the full
address space.
2. qemu itself must use some address space.
3. kvm needs a special page fault path for the shadow page table; generic
page table operations can't be used directly for the guest.

I have no idea whether your idea is feasible. The s390 guys said their
shadow page table is the same as the host's, which is why they can easily
implement swap; x86 is hard.

Thanks,
Shaohua

^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <1185241357.24201.12.camel-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>]
* Re: [RFC 0/8]KVM: swap out guest pages
[not found] ` <1185241357.24201.12.camel-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
@ 2007-07-24 5:42 ` Avi Kivity
0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2007-07-24 5:42 UTC (permalink / raw)
To: Shaohua Li; +Cc: kvm-devel, lkml

Shaohua Li wrote:
> On Mon, 2007-07-23 at 18:27 +0800, Avi Kivity wrote:
>> Shaohua Li wrote:
>>> This patch series make kvm guest pages be able to be swapped out and
>>> dynamically allocated. Without it, all guest memory is allocated at
>>> guest start time.
>>>
>>> patches are against latest git, and you need first patch Avi's kvm-sch
>>> integration patch
>>> (http://sourceforge.net/mailarchive/forum.php?thread_name=11841693332609-git-send-email-avi%40qumranet.com&forum_name=kvm-devel ).
>>>
>>> Patch is quite stable in my test. With the patch, I can run a 256M
>>> memory guest in a 300M memory host.
>>>
>> What about the opposite?
>>
>>> If guest is idle, the memory it used
>>> can be less than 10M. I did a simple performance test (measure kernel
>>> build time in guest), if there is few swap, the performance w/wo the
>>> patch difference isn't significant. If you have better measurement
>>> approach, please let me try.
>>>
>>> Unresolved issue:
>>> 1. swapoff doesn't work, we need a hook.
>>> 2. SMP guest might not work, as kvm doesn't support smp till now.
>>> 3. better algorithm to select swapped out guest pages according to
>>> guest's memory usage.
>>> Maybe more.
>>>
>>> Any suggestions and comments are appreciated.
>>>
>> The big question is whether to have kvm's own address_space or not.
>>
>> Having an address_space (like your patch does) is remarkably simple, and
>> requires few hooks from the current vm. However using existing vmas
>> mapped by the user has many advantages:
>>
>> - compatible with s390 requirements
>> - allows the user to use hugetlbfs pages, which have a performance
>>   advantage using ept/npt (but which are unswappable)
>> - allows the user to map a file (which can be regarded as way to specify
>>   the swap device)
>> - better integration with the rest of the vm
>>
>> I am quite torn between the simplicity of your approach and the
>> advantages of using generic vmas. However, s390 pretty much forces our
>> hand.
>>
>> What is your opinion of extending generic vmas to back kvm guest
>> memory?
>>
> Several issues:
> 1. A vma manages userspace addresses, but a kvm guest uses the full
> address space.
> 2. qemu itself must use some address space.
>

My idea is to keep the current slot concept, but instead of having kvm
allocate pages for a slot, it would call get_user_pages() for a virtual
address range. Userspace doesn't directly talk about vmas, just virtual
address ranges.

> 3. kvm needs a special page fault path for the shadow page table; generic
> page table operations can't be used directly for the guest.
>
> I have no idea whether your idea is feasible. The s390 guys said their
> shadow page table is the same as the host's, which is why they can easily
> implement swap; x86 is hard.
>

No question that it is hard. I'd like to explore just how hard it is.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply [flat|nested] 20+ messages in thread
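[Editorial sketch] The slot idea proposed above (keep memory slots, but back each one with a userspace virtual address range instead of kvm-allocated pages) can be illustrated with a small userspace-only C model. It is an assumption-laden sketch, not the kvm implementation: struct mem_slot, find_slot() and gfn_to_hva() are hypothetical names, and the page pinning that would be done with get_user_pages() in the kernel is only mentioned in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4K pages assumed for the illustration */

typedef uint64_t gfn_t;  /* guest frame number */

/* Hypothetical slot: a run of guest frames backed by a user virtual range. */
struct mem_slot {
    gfn_t     base_gfn;        /* first guest frame covered by the slot */
    uint64_t  npages;          /* number of frames in the slot */
    uintptr_t userspace_addr;  /* page-aligned start of the backing range */
};

/* Find the slot containing gfn, or NULL if the frame is not backed. */
static const struct mem_slot *find_slot(const struct mem_slot *slots,
                                        size_t nslots, gfn_t gfn)
{
    for (size_t i = 0; i < nslots; i++)
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return &slots[i];
    return NULL;
}

/*
 * Translate a guest frame number to the host virtual address backing it.
 * In the kernel, this address is what would be handed to get_user_pages()
 * to fault the page in (possibly from swap) before mapping it for the guest.
 */
static bool gfn_to_hva(const struct mem_slot *slots, size_t nslots,
                       gfn_t gfn, uintptr_t *hva)
{
    const struct mem_slot *s = find_slot(slots, nslots, gfn);

    assert(hva != NULL);
    if (s == NULL)
        return false;
    *hva = s->userspace_addr +
           ((uintptr_t)(gfn - s->base_gfn) << PAGE_SHIFT);
    return true;
}
```

With this shape, the host's generic vm decides how the range is backed (anonymous memory, a mapped file, hugetlbfs), and the slot only records where in the user address space the guest's memory lives.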
end of thread, other threads:[~2007-07-25 17:12 UTC | newest]
Thread overview: 20+ messages
2007-07-23 6:51 [RFC 0/8]KVM: swap out guest pages Shaohua Li
[not found] ` <1185173489.2645.64.camel-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
2007-07-23 10:27 ` Avi Kivity
[not found] ` <46A4829C.9080104-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-23 12:25 ` Christoph Hellwig
[not found] ` <20070723122510.GA3674-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2007-07-23 12:29 ` Avi Kivity
[not found] ` <46A49F30.5010206-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-23 12:34 ` Christoph Hellwig
[not found] ` <20070723123443.GB3674-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2007-07-23 12:39 ` Avi Kivity
2007-07-24 2:00 ` Shaohua Li
2007-07-23 20:06 ` Jeff Dike
[not found] ` <20070723200659.GA13508-1LLyehjZOUUZWFFyALql+T+iFHGzDt/a@public.gmane.org>
2007-07-24 5:22 ` Avi Kivity
[not found] ` <46A58CAD.7070807-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-25 16:15 ` Jeff Dike
[not found] ` <20070725161544.GA7747-1LLyehjZOUUZWFFyALql+T+iFHGzDt/a@public.gmane.org>
2007-07-25 17:12 ` Carsten Otte
2007-07-23 23:10 ` Rusty Russell
2007-07-24 5:30 ` [kvm-devel] " Avi Kivity
[not found] ` <46A58E8B.8050507-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-24 6:11 ` Rusty Russell
[not found] ` <1185257474.1803.216.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-07-24 6:21 ` Avi Kivity
[not found] ` <46A59A75.8050501-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-24 6:45 ` Rusty Russell
[not found] ` <1185259509.1803.237.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2007-07-24 6:59 ` Avi Kivity
[not found] ` <46A5A36E.8000409-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-24 7:17 ` Rusty Russell
2007-07-24 1:42 ` Shaohua Li
[not found] ` <1185241357.24201.12.camel-yAZKuqJtXNMXR+D7ky4Foa2pdiUAq4bhAL8bYrjMMd8@public.gmane.org>
2007-07-24 5:42 ` Avi Kivity