* [RFC][PATCH 0/5] Memory merging driver for Linux
@ 2008-01-21 16:05 Izik Eidus
[not found] ` <4794C2E1.8040607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Izik Eidus @ 2008-01-21 16:05 UTC (permalink / raw)
To: kvm-devel, andrea-atKUWr5tajBWk0Htik3J/w,
avi-atKUWr5tajBWk0Htik3J/w, dor.laor-atKUWr5tajBWk0Htik3J/w,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, yaniv-atKUWr5tajBWk0Htik3J/w
When KVM is used on production servers, it often runs the same guest
operating system more than once.
The idea of this module is to find identical pages in different guests
and share them, so that we can save memory.
Because many guests run identical operating systems, a lot of the data
in RAM is equal between the guests.
This module finds this identical data (pages) and merges it into one
single page.
The new page is write-protected, so whenever a guest tries to write to
it, do_wp_page() will duplicate the page.
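The copy-on-write behaviour described above can be illustrated with a small userspace toy. This is only a sketch in Python, not kernel code: `write_byte`, the `frames` list, and the reference counts are hypothetical names made up for the illustration and are not part of the patch or of do_wp_page().

```python
def write_byte(frames, refcount, mapping, owner, offset, value):
    """Write one byte for `owner`, copying a shared frame first.

    frames   -- list of bytearray objects, one per physical frame (toy model)
    refcount -- list of ints, how many owners map each frame
    mapping  -- dict: owner name -> frame index
    """
    frame = mapping[owner]
    if refcount[frame] > 1:
        # The frame is shared, so it is treated as write-protected:
        # give the writer a private copy before modifying anything.
        new_frame = len(frames)
        frames.append(bytearray(frames[frame]))
        refcount[frame] -= 1
        refcount.append(1)
        mapping[owner] = new_frame
        frame = new_frame
    frames[frame][offset] = value
    return frame


# Two guests share one merged, all-zero frame.
frames = [bytearray(8)]
refcount = [2]
mapping = {"guest_a": 0, "guest_b": 0}

# guest_a writes: it gets its own copy, guest_b keeps the original.
write_byte(frames, refcount, mapping, "guest_a", 0, 0xFF)
```

After the write, `guest_a` maps a new private frame while `guest_b` still maps the untouched original, which is the property the merged page relies on.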
The module simply goes over a list of pages that were registered and
finds the identical ones (using a hash table).
The pages that it scans are anonymous. Each time it finds identical
pages, it creates a file-mapped page (right now it is just a
kernel-allocated page) that will be the shared page.
For now I am missing swapping support (I will add it soon using
non-linear VMAs).
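The hash-table scan described above can be sketched in userspace as follows. This is a minimal illustration in Python under stated assumptions, not the driver's code: the function name `merge_identical_pages`, the use of SHA-1 as the hash, and the 4096-byte page size are all choices made for the example. The post does not say which hash the module uses.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the illustration


def merge_identical_pages(pages):
    """Map each registered page index to the index of its shared copy.

    `pages` is a list of bytes objects, one per registered page.
    Pages with identical content end up pointing at the same index.
    """
    buckets = {}  # content hash -> indices of pages kept as shared copies
    mapping = {}  # page index -> index of the page it now shares
    for idx, data in enumerate(pages):
        digest = hashlib.sha1(data).digest()
        for candidate in buckets.get(digest, []):
            # A hash match is only a hint; compare the full contents
            # before treating two pages as identical.
            if pages[candidate] == data:
                mapping[idx] = candidate
                break
        else:
            # No identical page seen yet: this page becomes a candidate.
            buckets.setdefault(digest, []).append(idx)
            mapping[idx] = idx
    return mapping


zero = bytes(PAGE_SIZE)
other = b"\x01" * PAGE_SIZE
mapping = merge_identical_pages([zero, other, zero, zero])
# Number of pages that no longer need their own copy.
freed = len(mapping) - len(set(mapping.values()))
```

The full byte comparison after the hash lookup matters: two different pages can land in the same bucket, and merging them would corrupt guest memory.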
This module can be used for any other purpose and works without KVM
(I used it with QEMU).
To make it work for KVM, the mmu notifiers sent by Andrea should be used.
I added two new functions to the kernel.
First:
page_wrprotect() - makes the page read-only by setting the ptes that
point to it as read-only.
Second:
replace_page() - replaces the pte mapping related to a vm area between
two pages.
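To make the roles of the two helpers concrete, here is a toy model in Python. It only borrows the names page_wrprotect() and replace_page() from the description above. The `Pte` class, the argument lists, the return values, and the choice to leave the repointed pte read-only are assumptions for illustration and do not reflect the signatures in the actual patches.

```python
class Pte:
    """Toy page-table entry: a frame number plus a writable bit."""

    def __init__(self, frame, writable=True):
        self.frame = frame
        self.writable = writable


def page_wrprotect(ptes, frame):
    """Clear the writable bit on every pte that points at `frame`.

    Returns how many ptes were changed (toy return value).
    """
    changed = 0
    for pte in ptes:
        if pte.frame == frame and pte.writable:
            pte.writable = False
            changed += 1
    return changed


def replace_page(pte, old_frame, new_frame):
    """Repoint one pte from `old_frame` to `new_frame`, read-only."""
    if pte.frame != old_frame:
        return False  # the mapping changed underneath us; do nothing
    pte.frame = new_frame
    pte.writable = False  # assumption: the shared mapping stays read-only
    return True


# Two guests map different frames (10 and 11) with identical content.
guest_a = Pte(frame=10)
guest_b = Pte(frame=11)

page_wrprotect([guest_a, guest_b], 10)  # freeze the page that is kept
replace_page(guest_b, 11, 10)           # point the duplicate at it
```

After these two calls both entries point at frame 10 and neither is writable, so a later write would have to go through a copy-on-write fault.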
A few numbers:
For freshly started Windows guests I can share almost the whole memory
(as Windows zeroes all the pages),
so I can start many more Windows guests than I have memory for (as long
as no one touches it).
For Linux guests I was able to share 800 MB+ across 4 CentOS guests that
each had 512 MB of memory allocated
(again, this was without workload, and they ran X).
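As a rough reading of the Linux figures above, the arithmetic can be written out as follows. This Python snippet assumes that "shared" means memory that no longer needs its own copy and uses 800 MB as a lower bound, since the post says "800mb+"; both are interpretations, not measurements.

```python
guests = 4
allocated_mb = guests * 512          # total memory given to the guests
shared_mb = 800                      # lower bound quoted in the post
saved_fraction = shared_mb / allocated_mb
resident_mb = allocated_mb - shared_mb

print(f"allocated: {allocated_mb} MB")
print(f"still resident: at most {resident_mb} MB")
print(f"saved: at least {saved_fraction:.1%}")
```

Under these assumptions, roughly two fifths of the memory allocated to the four idle guests was recovered.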
--
woof.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <4794C2E1.8040607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <4794C2E1.8040607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-23 17:05 ` Rik van Riel
[not found] ` <20080123120510.4014e382-Fuq27k0DHcCSkoNiqTzCLQ@public.gmane.org>
2008-01-23 23:10 ` Chris Wright
1 sibling, 1 reply; 8+ messages in thread
From: Rik van Riel @ 2008-01-23 17:05 UTC (permalink / raw)
To: Izik Eidus
Cc: andrea-atKUWr5tajBWk0Htik3J/w, yaniv-atKUWr5tajBWk0Htik3J/w, kvm-devel,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, avi-atKUWr5tajBWk0Htik3J/w

On Mon, 21 Jan 2008 18:05:53 +0200
Izik Eidus <izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:

> i added 2 new functions to the kernel
> one:
> page_wrprotect() make the page as read only by setting the ptes point to
> it as read only.
> second:
> replace_page() - replace the pte mapping related to vm area between two
> pages

How will this work on CPUs with nested paging support, where the
CPU does the guest -> physical address translation? (opposed to
having shadow page tables)

Is it sufficient to mark the page read-only in the guest->physical
translation page table?

--
All rights reversed.
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20080123120510.4014e382-Fuq27k0DHcCSkoNiqTzCLQ@public.gmane.org>]
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <20080123120510.4014e382-Fuq27k0DHcCSkoNiqTzCLQ@public.gmane.org>
@ 2008-01-23 17:54 ` Andrea Arcangeli
[not found] ` <20080123175444.GH7141-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2008-01-24 5:38 ` Avi Kivity
1 sibling, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2008-01-23 17:54 UTC (permalink / raw)
To: Rik van Riel
Cc: yaniv-atKUWr5tajBWk0Htik3J/w, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
kvm-devel, avi-atKUWr5tajBWk0Htik3J/w

On Wed, Jan 23, 2008 at 12:05:10PM -0500, Rik van Riel wrote:
> On Mon, 21 Jan 2008 18:05:53 +0200
> Izik Eidus <izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
> > i added 2 new functions to the kernel
> > one:
> > page_wrprotect() make the page as read only by setting the ptes point to
> > it as read only.
> > second:
> > replace_page() - replace the pte mapping related to vm area between two
> > pages
>
> How will this work on CPUs with nested paging support, where the
> CPU does the guest -> physical address translation? (opposed to
> having shadow page tables)

sptes resolve guest addresses to host physical addresses (what is
different is only which kind of guest address is being translated).

sptes are in fact faster than nptes for non-pte-mangling,
non-context-switching, memory-intensive number-crunching workloads.
(A DBMS will appreciate nptes instead ;)

> Is it sufficient to mark the page read-only in the guest->physical
> translation page table?

Yes, just like with sptes too. I guess nptes will also be managed as a
tlb even if they won't require many changes, but the mmu notifier
already firing in those two calls is what will keep both sptes and
nptes in sync with the main linux VM. The serialization against
get_user_pages that refills the spte/npte layer with
nonpresent-nofault case of course happens through the PT lock, just
like for the regular linux page fault against the pte that is pte_none
for a little while but with the lock held (and set to write protect or
new value before releasing it).

This in fact shows how the mmu notifiers that connect the linux pte to
the spte/npte work for more than swapping.
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20080123175444.GH7141-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>]
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <20080123175444.GH7141-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
@ 2008-01-23 18:11 ` Izik Eidus
0 siblings, 0 replies; 8+ messages in thread
From: Izik Eidus @ 2008-01-23 18:11 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: yaniv-atKUWr5tajBWk0Htik3J/w, kvm-devel,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, avi-atKUWr5tajBWk0Htik3J/w

Andrea Arcangeli wrote:
> On Wed, Jan 23, 2008 at 12:05:10PM -0500, Rik van Riel wrote:
>
>> On Mon, 21 Jan 2008 18:05:53 +0200
>> Izik Eidus <izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>>
>>> i added 2 new functions to the kernel
>>> one:
>>> page_wrprotect() make the page as read only by setting the ptes point to
>>> it as read only.
>>> second:
>>> replace_page() - replace the pte mapping related to vm area between two
>>> pages
>>>
>> How will this work on CPUs with nested paging support, where the
>> CPU does the guest -> physical address translation? (opposed to
>> having shadow page tables)
>>

Thanks for reviewing.
Nested page tables are somewhat different from shadow page tables:
instead of keeping another page table, as we do in the shadow code, we
keep another layer that translates the physical memory of the guest
into the physical memory of the host. We are allowed to add access
permissions to this new layer, so we can mark the shared pages as
read-only and take a vmexit on writes to them, so it should work with
that.

>
> sptes resolve guest addresses to host physical addresses (what is
> different is only which kind of guest address is being translated).
>
> sptes are faster than nptes for non pte-mangling non-context-switching
> memory intensive number crunching workloads infact. (DBMS will
> appreciate ntpes instead ;)
>
>> Is it sufficient to mark the page read-only in the guest->physical
>> translation page table?
>>
>
> Yes, just like with sptes too. I guess ntpes will also be managed as a
> tlb even if they won't require many changes, but the mmu notifier
> already firing in those two calls is what will keep both sptes and
> nptes in sync with the main linux VM. The serialization against
> get_user_pages that refills the spte/npte layer with
> nonpresent-nofault case of course happens through the PT lock, just
> like for the regular linux page fault against the pte that is pte_none
> for a little while but with the lock held (and set to write protect or
> new value before releasing it). This infact shows how the mmu
> notifiers that connects the linux pte to the spte/npte works for more
> than swapping.
>

Yes, without mmu notifiers this driver can't work safely and
effectively for KVM; without them it can only work for normal
applications such as QEMU.

--
woof.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <20080123120510.4014e382-Fuq27k0DHcCSkoNiqTzCLQ@public.gmane.org>
2008-01-23 17:54 ` Andrea Arcangeli
@ 2008-01-24 5:38 ` Avi Kivity
1 sibling, 0 replies; 8+ messages in thread
From: Avi Kivity @ 2008-01-24 5:38 UTC (permalink / raw)
To: Rik van Riel
Cc: andrea-atKUWr5tajBWk0Htik3J/w, yaniv-atKUWr5tajBWk0Htik3J/w,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, kvm-devel

Rik van Riel wrote:
> On Mon, 21 Jan 2008 18:05:53 +0200
> Izik Eidus <izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
>
>> i added 2 new functions to the kernel
>> one:
>> page_wrprotect() make the page as read only by setting the ptes point to
>> it as read only.
>> second:
>> replace_page() - replace the pte mapping related to vm area between two
>> pages
>>
>
> How will this work on CPUs with nested paging support, where the
> CPU does the guest -> physical address translation? (opposed to
> having shadow page tables)
>

Nested page tables are very similar to real-mode shadow paging: both
translate guest physical addresses to host physical addresses.

In any case, the merge driver is oblivious to the paging method used;
it works at the Linux pte level and relies on mmu notifiers to keep
everything in sync.

--
Any sufficiently difficult bug is indistinguishable from a feature.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <4794C2E1.8040607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-23 17:05 ` Rik van Riel
@ 2008-01-23 23:10 ` Chris Wright
[not found] ` <20080123231037.GA3629-JyIX8gxvWYPr2PDY2+4mTGD2FQJk+8+b@public.gmane.org>
1 sibling, 1 reply; 8+ messages in thread
From: Chris Wright @ 2008-01-23 23:10 UTC (permalink / raw)
To: Izik Eidus
Cc: andrea-atKUWr5tajBWk0Htik3J/w, yaniv-atKUWr5tajBWk0Htik3J/w, kvm-devel,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, avi-atKUWr5tajBWk0Htik3J/w

* Izik Eidus (izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org) wrote:
> this module find this identical data (pages) and merge them into one
> single page
> this new page is write protected so in any case the guest will try to
> write to it do_wp_page will duplicate the page

What happens if you've merged more pages than you can recover on write
faults?
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20080123231037.GA3629-JyIX8gxvWYPr2PDY2+4mTGD2FQJk+8+b@public.gmane.org>]
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <20080123231037.GA3629-JyIX8gxvWYPr2PDY2+4mTGD2FQJk+8+b@public.gmane.org>
@ 2008-01-24 5:40 ` Avi Kivity
[not found] ` <479824EA.7070603-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2008-01-24 5:40 UTC (permalink / raw)
To: Chris Wright
Cc: andrea-atKUWr5tajBWk0Htik3J/w, yaniv-atKUWr5tajBWk0Htik3J/w,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, kvm-devel

Chris Wright wrote:
> * Izik Eidus (izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org) wrote:
>
>> this module find this identical data (pages) and merge them into one
>> single page
>> this new page is write protected so in any case the guest will try to
>> write to it do_wp_page will duplicate the page
>>
>
> What happens if you've merged more pages than you can recover on write
> faults?
>

You start to swap. Just like Linux when you start to write on
fork()ed memory.

A management application may start taking measures, like inflating
balloons and migrating to other hosts, but swapping is needed as a
last resort measure.

--
Any sufficiently difficult bug is indistinguishable from a feature.
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <479824EA.7070603-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: [RFC][PATCH 0/5] Memory merging driver for Linux
[not found] ` <479824EA.7070603-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2008-01-24 9:26 ` Izik Eidus
0 siblings, 0 replies; 8+ messages in thread
From: Izik Eidus @ 2008-01-24 9:26 UTC (permalink / raw)
To: Avi Kivity
Cc: andrea-atKUWr5tajBWk0Htik3J/w, yaniv-atKUWr5tajBWk0Htik3J/w, kvm-devel,
Chris Wright, linux-mm-Bw31MaZKKs3YtjvyW6yDsg

Avi Kivity wrote:
> Chris Wright wrote:
>> * Izik Eidus (izike-atKUWr5tajBWk0Htik3J/w@public.gmane.org) wrote:
>>
>>> this module find this identical data (pages) and merge them into one
>>> single page
>>> this new page is write protected so in any case the guest will try
>>> to write to it do_wp_page will duplicate the page
>>>
>>
>> What happens if you've merged more pages than you can recover on write
>> faults?
>>
>
> You start to swap. Just like Linux when you start to write on
> fork()ed memory.
>
> A management application may start taking measures, like inflating
> balloons and migrating to other hosts, but swapping is needed as a
> last resort measure.
>

Yes, write faults go into do_wp_page(), which in turn creates a new
anonymous/swappable page, so it is safe.

--
woof.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-01-24 9:26 UTC | newest]
Thread overview: 8+ messages
2008-01-21 16:05 [RFC][PATCH 0/5] Memory merging driver for Linux Izik Eidus
[not found] ` <4794C2E1.8040607-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-23 17:05 ` Rik van Riel
[not found] ` <20080123120510.4014e382-Fuq27k0DHcCSkoNiqTzCLQ@public.gmane.org>
2008-01-23 17:54 ` Andrea Arcangeli
[not found] ` <20080123175444.GH7141-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org>
2008-01-23 18:11 ` Izik Eidus
2008-01-24 5:38 ` Avi Kivity
2008-01-23 23:10 ` Chris Wright
[not found] ` <20080123231037.GA3629-JyIX8gxvWYPr2PDY2+4mTGD2FQJk+8+b@public.gmane.org>
2008-01-24 5:40 ` Avi Kivity
[not found] ` <479824EA.7070603-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-01-24 9:26 ` Izik Eidus