* copy on write memory
@ 2004-11-15 22:01 Peri Hankey
2004-11-16 0:35 ` Rik van Riel
0 siblings, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-15 22:01 UTC (permalink / raw)
To: xen-devel
Hello
UML has developed a SKAS mode of operation, in which (as far as I
understand it) a process can be created much in the same way as any
other Linux process, except that it runs (under a UML kernel) with a
separate kernel address space.
It occurred to me that the equivalent in the Xen world would be to use
one Linux xenU domain purely as a page-table manager for a collection of
separate xenU domains that are expected or known to have similar process
populations. The process creation domain would have a large allocation
of memory which it would use to populate page tables applying standard
copy on write semantics, but the processes to which these page tables
belong would effectively run in the separate execution domains to which
they belong.
A similar arrangement of one page-table management domain with multiple
separate execution domains could be used for other kernels such as
NetBSD, which have similar copy on write semantics.
Xen itself would only need to provide a mechanism for managing the trade
between a page-manager domain and its execution domains, and would not
need to replicate the functionality of any particular system.
As the page-table manager domain also knows which disk pages are clean,
and which have been written to, it is also in a good position to manage
copy on write semantics for filesystem storage.
Is this a feasible way of looking at it?
Regards
Peri Hankey
-------------------------------------------------------
This SF.Net email is sponsored by: InterSystems CACHE
FREE OODBMS DOWNLOAD - A multidimensional database that combines
robust object and relational technologies, making it a perfect match
for Java, C++,COM, XML, ODBC and JDBC. www.intersystems.com/match8
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-15 22:01 copy on write memory Peri Hankey
@ 2004-11-16 0:35 ` Rik van Riel
2004-11-16 9:44 ` Peri Hankey
2004-11-16 18:09 ` Adam Heath
0 siblings, 2 replies; 26+ messages in thread
From: Rik van Riel @ 2004-11-16 0:35 UTC (permalink / raw)
To: Peri Hankey; +Cc: xen-devel
On Mon, 15 Nov 2004, Peri Hankey wrote:
> It occurred to me that the equivalent in the Xen world would be to use
> one Linux xenU domain purely as a page-table manager for a collection of
> separate xenU domains that are expected or known to have similar process
> populations.
UML copy on write is only for filesystems, isn't it ?
The Xen equivalent would be cloning the xenU root filesystem
as an LVM snapshot, from a read-only LVM snapshot. Then each
xenU virtual system would only use the disk space it writes
to and no more.
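
As a rough illustration of "only use the disk space it writes to", here
is a toy Python model of a snapshot layered over a read-only origin. It
is a sketch only: SnapshotVolume and its methods are made-up names for
this example, and a real LVM snapshot is implemented in the kernel's
block layer, not with Python lists.

```python
class SnapshotVolume:
    """Toy copy-on-write snapshot: reads fall through to the origin
    until a block has been written locally."""

    def __init__(self, origin):
        self.origin = origin      # shared, read-only list of blocks
        self.delta = {}           # block number -> privately written data

    def read(self, n):
        return self.delta.get(n, self.origin[n])

    def write(self, n, data):
        self.delta[n] = data      # the origin itself is never modified

    def private_blocks(self):
        return len(self.delta)


origin = [b"boot", b"libc", b"httpd", b"conf"]   # the read-only root image
guest_a = SnapshotVolume(origin)
guest_b = SnapshotVolume(origin)
guest_a.write(3, b"conf-a")       # only guest_a pays for this block
```

Each guest's extra storage is just the size of its delta, however large
the shared origin is.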
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 0:35 ` Rik van Riel
@ 2004-11-16 9:44 ` Peri Hankey
2004-11-16 9:51 ` Peri Hankey
` (2 more replies)
2004-11-16 18:09 ` Adam Heath
1 sibling, 3 replies; 26+ messages in thread
From: Peri Hankey @ 2004-11-16 9:44 UTC (permalink / raw)
To: Rik van Riel; +Cc: xen-devel
Certainly, UML is best known for copy on write filesystems. But UML's
SKAS mode is a different way of managing memory, and that was the
starting point for this proposal, which is about using the copy on write
semantics of Linux memory management to share memory pages between Xen
domains. I see now that one person's starting point may prove to be
another's red herring.
The notion is that there are applications of Xen where there would be
very many virtual computers running the same set of applications for
much of the time (eg standard web hosting, honeypots).
In the standard case of a single OS on a single machine, all processes
are loaded from a common filesystem, so the OS knows which page sets
start out with shared information. It can use this information to share
pages between processes so long as those pages are not written to,
allocating distinct pages to distinct processes when those pages are
written to - the copy on write semantics.
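
For anyone who wants the single-OS case spelled out, the following is a
minimal Python sketch of copy on write between two address spaces. It is
a toy model under my own assumptions (Frame, AddressSpace and the
reference count are illustrative names), not a description of how the
Linux kernel actually lays out its page tables.

```python
class Frame:
    """A physical page frame with a count of the mappings that share it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 0


class AddressSpace:
    def __init__(self):
        self.pages = {}                      # virtual page number -> Frame

    def map(self, vpn, frame):
        frame.refs += 1
        self.pages[vpn] = frame

    def fork(self):
        child = AddressSpace()
        for vpn, frame in self.pages.items():
            child.map(vpn, frame)            # share the frame, do not copy it
        return child

    def read(self, vpn):
        return bytes(self.pages[vpn].data)

    def write(self, vpn, offset, value):
        frame = self.pages[vpn]
        if frame.refs > 1:                   # still shared: copy before writing
            frame.refs -= 1
            frame = Frame(frame.data)
            self.map(vpn, frame)
        frame.data[offset] = value


parent = AddressSpace()
parent.map(0, Frame(b"text"))
parent.map(1, Frame(b"data"))
child = parent.fork()
child.write(1, 0, ord("D"))                  # triggers the private copy
```

After the write, page 0 is still one frame seen by both address spaces,
while page 1 exists twice, once per writer.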
In the Xen case, you don't want to reinvent and reimplement existing
mechanisms, especially as these may differ in subtle ways from one guest
operating system to another. So I suggest it would make sense to create
mechanisms that allow some Xen domains to operate as memory management
servers to groups of related domains.
In effect we create a memory manager privilege. Suppose we have a memory
manager domain M1 with a collection of memory client domains M1-x, where
each memory client domain has its own kernel address space KASx, a set
of modified pages Wx and a set of shared pages Rx. Then the situation we
want to see is this:
M1 doesn't actually execute any application code, just manages memory for its clients
M1-a executes application code in (KASa, Ra, Wa) calling on M1 for memory management
M1-b executes application code in (KASb, Rb, Wb) calling on M1 for memory management
M1-c executes application code in (KASc, Rc, Wc) calling on M1 for memory management
...
There is a connection with copy-on-write storage. The execution state of
a client domain x can be frozen as:
'Rx' which identifies a set of pages that are shared read-only with similar clients
KASx which is the kernel address space page set for this domain
Wx which is the set of user address space pages that have been written to by this domain
The total long-term state of a client domain can be characterised by adding
'SRx' which identifies blocks in read-only storage that are shared with similar clients
SWx which identifies which files in read-write storage that belong to this client domain.
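
To make the bookkeeping concrete, here is a small Python sketch of the
per-client state using the names above. Everything in it is hypothetical
(ClientDomain, pages_needed and the page identifiers are invented for
the example); it only shows why the manager's memory bill counts each
shared page once and each private page per client.

```python
from dataclasses import dataclass, field


@dataclass
class ClientDomain:
    """State of one memory client M1-x, using the names from the text."""
    name: str
    kas: set = field(default_factory=set)   # KASx: kernel address space pages
    w: set = field(default_factory=set)     # Wx: user pages written by this domain
    r: set = field(default_factory=set)     # Rx: ids of pages shared read-only
    sr: set = field(default_factory=set)    # SRx: ids of shared read-only blocks
    sw: set = field(default_factory=set)    # SWx: read-write blocks of this domain

    def private_pages(self):
        return len(self.kas) + len(self.w)


def pages_needed(clients):
    """Pages the manager must back: shared pages once, private pages per client."""
    shared = set().union(*(c.r for c in clients))
    return len(shared) + sum(c.private_pages() for c in clients)


a = ClientDomain("M1-a", kas={"ka0", "ka1"}, w={"wa0"}, r={"libc", "httpd"})
b = ClientDomain("M1-b", kas={"kb0", "kb1"}, r={"libc", "httpd"})
total = pages_needed([a, b])                 # 2 shared + 3 private + 2 private
```

Without sharing, the same two clients would need 9 pages instead of 7,
and the gap grows with every additional client that maps the same Rx.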
It seems to me that a memory manager domain, which pretty much has to
serve pages initially drawn from a filesystem that is shared read-only
between its clients, is also in a position to manage copy-on-write use
of that file system for its clients, as it already knows which blocks
are clean and which are dirty.
Peri
> On Mon, 15 Nov 2004, Peri Hankey wrote:
>
>> It occurred to me that the equivalent in the Xen world would be to
>> use one Linux xenU domain purely as a page-table manager for a
>> collection of separate xenU domains that are expected or known to
>> have similar process populations.
>
>
> UML copy on write is only for filesystems, isn't it ?
>
> The Xen equivalent would be cloning the xenU root filesystem
> as an LVM snapshot, from a read-only LVM snapshot. Then each
> xenU virtual system would only use the disk space it writes
> to and no more.
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 9:44 ` Peri Hankey
@ 2004-11-16 9:51 ` Peri Hankey
2004-11-16 15:27 ` urmk
2004-11-16 18:10 ` copy on write memory Adam Heath
2 siblings, 0 replies; 26+ messages in thread
From: Peri Hankey @ 2004-11-16 9:51 UTC (permalink / raw)
To: Rik van Riel; +Cc: xen-devel
Sorry, I had a copy-and-pasto (variant of typo) in my previous message -
The total long-term state of a client domain can be characterised by adding
'SRx' which identifies blocks in read-only storage that are shared with
similar clients
SWx which is the set of blocks in read-write storage that belong to
this client domain.
Peri Hankey wrote:
> Certainly, UML is best known for copy on write filesystems. But UML's
> SKAS mode is a different way of managing memory, and that was the
> starting point for this proposal, which is about using the copy on
> write semantics of Linux memory management to share memory pages
> between Xen domains. I see now that one person's starting point may
> prove to be another's red herring.
>
> The notion is that there are applications of Xen where there would be
> very many virtual computers running the same set of applications for
> much of the time (eg standard web hosting, honeypots).
>
> In the standard case of a single OS on a single machine, all processes
> are loaded from a common filesystem, so the OS knows which page sets
> start out with shared information. It can use this information to
> share pages between processes so long as those pages are not written
> to, allocating distinct pages to distinct processes when those pages
> are written to - the copy on write semantics.
>
> In the Xen case, you don't want to reinvent and reimplement existing
> mechanisms, especially as these may differ in subtle ways from one
> guest operating system to another. So I suggest it would make sense to
> create mechanisms that allow some Xen domains to operate as memory
> management servers to groups of related domains.
>
> In effect we create a memory manager privilege. Suppose we have a
> memory manager domain M1 with a collection of memory client domains
> M1-x, where each memory client domain has its own kernel address space
> KASx, a set of modified pages Wx and a set of shared pages Rx. Then
> the situation we want to see is this:
>
> M1 doesn't actually execute any application code, just manages
> memory for its clients
> M1-a executes application code in (KASa, Ra, Wa) calling on M1 for
> memory management
> M1-b executes application code in (KASb, Rb, Wb) calling on M1 for
> memory management
> M1-c executes application code in (KASc, Rc, Wc) calling on M1 for
> memory management
> ...
>
> There is a connection with copy-on-write storage. The execution state
> of a client domain x can be frozen as:
>
> 'Rx' which identifies a set of pages that are shared read-only with
> similar clients
> KASx which is the kernel address space page set for this domain
> Wx which is the set of user address space pages that have been
> written to by this domain
>
> The total long-term state of a client domain can be characterised by
> adding
>
> 'SRx' which identifies blocks in read-only storage that are shared
> with similar clients
> SWx which identifies which files in read-write storage that belong
> to this client domain.
>
> It seems to me that a memory manager domain, which pretty much has to
> serve pages initially drawn from a filesystem that is shared read-only
> between its clients, is also in a position to manage copy-on-write use
> of that file system for its clients, as it already knows which blocks
> are clean and which are dirty.
>
> Peri
>
>> On Mon, 15 Nov 2004, Peri Hankey wrote:
>>
>>> It occurred to me that the equivalent in the Xen world would be to
>>> use one Linux xenU domain purely as a page-table manager for a
>>> collection of separate xenU domains that are expected or known to have
>>> similar process populations.
>>
>>
>>
>> UML copy on write is only for filesystems, isn't it ?
>>
>> The Xen equivalent would be cloning the xenU root filesystem
>> as an LVM snapshot, from a read-only LVM snapshot. Then each
>> xenU virtual system would only use the disk space it writes
>> to and no more.
>>
>
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 9:44 ` Peri Hankey
2004-11-16 9:51 ` Peri Hankey
@ 2004-11-16 15:27 ` urmk
2004-11-16 16:17 ` Mark A. Williamson
2004-11-18 16:56 ` Peri Hankey
2004-11-16 18:10 ` copy on write memory Adam Heath
2 siblings, 2 replies; 26+ messages in thread
From: urmk @ 2004-11-16 15:27 UTC (permalink / raw)
To: Peri Hankey; +Cc: Rik van Riel, xen-devel
> The notion is that there are applications of Xen where there would be
> very many virtual computers running the same set of applications for
> much of the time (eg standard web hosting, honeypots).
> * snip *
> In the Xen case, you don't want to reinvent and reimplement existing
> mechanisms, especially as these may differ in subtle ways from one guest
> operating system to another. So I suggest it would make sense to create
> mechanisms that allow some Xen domains to operate as memory management
> servers to groups of related domains.
> *snip*
> It seems to me that a memory manager domain, which pretty much has to
> serve pages initially drawn from a filesystem that is shared read-only
> between its clients, is also in a position to manage copy-on-write use
> of that file system for its clients, as it already knows which blocks
> are clean and which are dirty.
I'd thought I'd sent this before but it's not in the archives so I'll
send it again... If this made it to the list once already, my
apologies:
On the s/390 platform, we have a new filesystem called XIP2. This is a
shared-memory filesystem based on ext2, which can be shared among any
number of guests. Basically you populate the XIP2 fs and then "freeze"
it and share it.
That's all pretty standard, but here comes the magic: Any data in the
XIP2 filesystem is not copied into the cached memory of the guest.
XIP = eXecute In Place. Binaries are run directly from the shared
memory and not cached locally, so if you throw common services and
libraries (like apache, JVM, etc from your example) into it you get the
binaries themselves shared with basically no cost to the guests.
Not to say an automatic memory manager to determine when it could do
COW of ram isn't a good avenue to pursue as well, but XIP is a fairly
good starting point for most situations that you'd want shared memory
like this, I think.
The XIP2 source code is in the IBM patches to the kernel:
http://oss.software.ibm.com/linux390/linux-2.6.5-s390-04-april2004.shtml
and by this point it's quite likely already in the bitkeeper tree as
well, they've been pushing updates upstream.
-m
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 15:27 ` urmk
@ 2004-11-16 16:17 ` Mark A. Williamson
2004-11-18 16:56 ` Peri Hankey
1 sibling, 0 replies; 26+ messages in thread
From: Mark A. Williamson @ 2004-11-16 16:17 UTC (permalink / raw)
To: xen-devel; +Cc: urmk, Peri Hankey, Rik van Riel
> That's all pretty standard, but here comes the magic: Any data in the
> XIP2 filesystem is not copied into the cached memory of the guest.
> XIP = eXecute In Place. Binaries are run directly from the shared
> memory and not cached locally, so if you throw common services and
> libraries (like apache, JVM, etc from your example) into it you get the
> binaries themselves shared with basically no cost to the guests.
We've been talking about implementing something similar (though we'd talked
about implementing it at the block level). We'd have a shared buffer cache
for read-only data, which domains could opt in to. If they are sharing
filesystem blocks (e.g. from a CoW LVM volume) the domains would get a shared
read only mapping to the same memory.
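
A toy sketch of that shared read-only buffer cache, in Python, might
look like the following. It is only an illustration of the idea:
SharedBufferCache, read_block and the "rootfs" volume name are
assumptions made for the example, not existing Xen interfaces.

```python
class SharedBufferCache:
    """Toy read-only block cache shared by the domains that opt in."""

    def __init__(self, read_block):
        self.read_block = read_block     # function (volume, block_no) -> bytes
        self.buffers = {}                # (volume, block_no) -> cached bytes
        self.loads = 0                   # how many times the disk was read

    def get(self, volume, block_no):
        key = (volume, block_no)
        if key not in self.buffers:
            self.buffers[key] = self.read_block(volume, block_no)
            self.loads += 1
        return self.buffers[key]


def read_block(volume, block_no):
    # Stand-in for a real disk read; builds a fresh bytes object each time.
    return ("%s:%d" % (volume, block_no)).encode()


cache = SharedBufferCache(read_block)
page_for_dom1 = cache.get("rootfs", 7)
page_for_dom2 = cache.get("rootfs", 7)   # same buffer, no second disk read
```

Both domains end up holding a reference to the same buffer, which is the
analogue of a shared read-only mapping to the same memory.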
Cheers for the info,
Mark
> Not to say an automatic memory manager to determine when it could do
> COW of ram isn't a good avenue to pursue as well, but XIP is a fairly
> good starting point for most situations that you'd want shared memory
> like this, I think.
>
> The XIP2 source code is in the IBM patches to the kernel:
> http://oss.software.ibm.com/linux390/linux-2.6.5-s390-04-april2004.shtml
>
> and by this point it's quite likely already in the bitkeeper tree as
> well, they've been pushing updates upstream.
>
> -m
>
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 0:35 ` Rik van Riel
2004-11-16 9:44 ` Peri Hankey
@ 2004-11-16 18:09 ` Adam Heath
2004-11-16 18:39 ` Matt Ayres
1 sibling, 1 reply; 26+ messages in thread
From: Adam Heath @ 2004-11-16 18:09 UTC (permalink / raw)
Cc: xen-devel@lists.sourceforge.net
On Mon, 15 Nov 2004, Rik van Riel wrote:
> On Mon, 15 Nov 2004, Peri Hankey wrote:
>
> > It occurred to me that the equivalent in the Xen world would be to use
> > one Linux xenU domain purely as a page-table manager for a collection of
> > separate xenU domains that are expected or known to have similar process
> > populations.
>
> UML copy on write is only for filesystems, isn't it ?
And, since UML can use mmap access, if there are shared filesystems, it can
reduce memory pressure. Maybe that is something that can be worked on for
xen.
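
For reference, this is roughly what mmap access looks like from user
space, sketched in Python. The temporary file merely stands in for a
shared filesystem image, and the sketch says nothing about how UML wires
this up internally; the general point is that a read-only mapping lets
the reader use the host's cached pages instead of filling a private
buffer with read().

```python
import mmap
import os
import tempfile

# Write a small file to stand in for a shared filesystem image.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"shared library text" * 100)
    path = f.name

with open(path, "rb") as f:
    # Map the whole file read-only; no private copy of the data is requested.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = view[:6]                 # slicing reads through the mapping
    size = len(view)
    view.close()

os.unlink(path)
```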
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 9:44 ` Peri Hankey
2004-11-16 9:51 ` Peri Hankey
2004-11-16 15:27 ` urmk
@ 2004-11-16 18:10 ` Adam Heath
2 siblings, 0 replies; 26+ messages in thread
From: Adam Heath @ 2004-11-16 18:10 UTC (permalink / raw)
To: Peri Hankey; +Cc: xen-devel@lists.sourceforge.net
On Tue, 16 Nov 2004, Peri Hankey wrote:
> Certainly, UML is best known for copy on write filesystems. But UML's
> SKAS mode is a different way of managing memory, and that was the
> starting point for this proposal, which is about using the copy on write
> semantics of Linux memory management to share memory pages between Xen
> domains. I see now that one person's starting point may prove to be
> another's red herring.
SKAS does *not* do COW for shared memory. That's completely separate.
You're thinking of mmap access for ubd, and for hostfs.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 18:09 ` Adam Heath
@ 2004-11-16 18:39 ` Matt Ayres
0 siblings, 0 replies; 26+ messages in thread
From: Matt Ayres @ 2004-11-16 18:39 UTC (permalink / raw)
To: Adam Heath; +Cc: xen-devel@lists.sourceforge.net
On Tue, 2004-11-16 at 12:09 -0600, Adam Heath wrote:
> On Mon, 15 Nov 2004, Rik van Riel wrote:
>
> > On Mon, 15 Nov 2004, Peri Hankey wrote:
> >
> > > It occurred to me that the equivalent in the Xen world would be to use
> > > one Linux xenU domain purely as a page-table manager for a collection of
> > > separate xenU domains that are expected or known to have similar process
> > > populations.
> >
> > UML copy on write is only for filesystems, isn't it ?
>
> And, since UML can use mmap access, if there are shared filesystems, it can
> reduce memory pressure. Maybe that is something that can be worked on for
> xen.
>
If you're going to design a system like this, I believe it's important to
also consider a method of re-sharing filesystems. With UML, after a
length of time the software is no longer shared and you see no memory
gains. There really needs to be a way for things to be re-linked.
VServer does this at the filesystem level by comparing name,
permissions, and SHA1 hash. I think for a CoW file-backed disk image
you'd have to do it at the block level, though. That way, if the httpd
binary gets updated (i.e. via yum/apt/up2date) on all domains that have
a shared backing store, there would be a method to re-share that httpd
binary.
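
A block-level re-sharing pass could be sketched like this in Python.
This is a toy under my own assumptions (reshare and the in-memory
"volumes" dictionary are invented for illustration, and it is not how
VServer does its file-level unification): blocks are grouped by SHA-1
digest, and equal blocks are pointed at one shared copy.

```python
import hashlib


def reshare(volumes):
    """Point identical blocks in different volumes at one shared copy.

    volumes maps a domain name to its list of blocks (bytes).
    Returns the number of block references that were re-linked.
    """
    canonical = {}          # SHA-1 digest -> the one copy to keep
    relinked = 0
    for blocks in volumes.values():
        for i, block in enumerate(blocks):
            digest = hashlib.sha1(block).digest()
            keep = canonical.setdefault(digest, block)
            # Compare contents too, so a hash collision can never merge
            # two different blocks.
            if keep is not block and keep == block:
                blocks[i] = keep
                relinked += 1
    return relinked


new_httpd = b"httpd-2.0.52"
volumes = {
    # bytes(bytearray(...)) forces a separate copy per domain, as after
    # each domain has installed the same update independently.
    "dom1": [bytes(bytearray(new_httpd)), b"dom1-config"],
    "dom2": [bytes(bytearray(new_httpd)), b"dom2-config"],
}
count = reshare(volumes)
```

After the pass both domains reference a single copy of the updated
block, while their differing config blocks stay private.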
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-16 15:27 ` urmk
2004-11-16 16:17 ` Mark A. Williamson
@ 2004-11-18 16:56 ` Peri Hankey
2004-11-18 17:11 ` urmk
1 sibling, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-18 16:56 UTC (permalink / raw)
To: urmk; +Cc: Rik van Riel, xen-devel
It's true, you did mention it before, but I was looking for something
else at the time. What I have in mind doesn't require so much
configuration. On the other hand it doesn't exist, and this does.
But the patch is against quite an old source, and it doesn't compile
straight out of the box. Do you know if there are updated patches
against 2.6.9?
I get this error (which I haven't yet examined in detail):
CC [M] fs/xip2fs/file.o
fs/xip2fs/file.c: In function `xip2_do_file_read':
fs/xip2fs/file.c:69: error: structure has no member named `buf'
fs/xip2fs/file.c: In function `__xip2_file_aio_read':
fs/xip2fs/file.c:119: error: structure has no member named `buf'
fs/xip2fs/file.c: In function `xip2_file_sendfile':
fs/xip2fs/file.c:302: error: structure has no member named `buf'
This was against xen-2.0.1 as of today 18 Nov 2004
Regards
Peri
urmk@reason.marist.edu wrote:
>>The notion is that there are applications of Xen where there would be
>>very many virtual computers running the same set of applications for
>>much of the time (eg standard web hosting, honeypots).
>>* snip *
>>In the Xen case, you don't want to reinvent and reimplement existing
>>mechanisms, especially as these may differ in subtle ways from one guest
>>operating system to another. So I suggest it would make sense to create
>>mechanisms that allow some Xen domains to operate as memory management
>>servers to groups of related domains.
>>*snip*
>>It seems to me that a memory manager domain, which pretty much has to
>>serve pages initially drawn from a filesystem that is shared read-only
>>between its clients, is also in a position to manage copy-on-write use
>>of that file system for its clients, as it already knows which blocks
>>are clean and which are dirty.
>>
>>
>
>I'd thought I'd sent this before but it's not in the archives so I'll
>send it again... If this made it to the list once already, my
>apologies:
>
>On the s/390 platform, we have a new filesystem called XIP2. This is a
>shared-memory filesystem based on ext2, which can be shared among any
>number of guests. Basically you populate the XIP2 fs and then "freeze"
>it and share it.
>
>That's all pretty standard, but here comes the magic: Any data in the
>XIP2 filesystem is not copied into the cached memory of the guest.
>XIP = eXecute In Place. Binaries are run directly from the shared
>memory and not cached locally, so if you throw common services and
>libraries (like apache, JVM, etc from your example) into it you get the
>binaries themselves shared with basically no cost to the guests.
>
>Not to say an automatic memory manager to determine when it could do
>COW of ram isn't a good avenue to pursue as well, but XIP is a fairly
>good starting point for most situations that you'd want shared memory
>like this, I think.
>
>The XIP2 source code is in the IBM patches to the kernel:
>http://oss.software.ibm.com/linux390/linux-2.6.5-s390-04-april2004.shtml
>
>and by this point it's quite likely already in the bitkeeper tree as
>well, they've been pushing updates upstream.
>
>-m
>
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 16:56 ` Peri Hankey
@ 2004-11-18 17:11 ` urmk
2004-11-18 17:25 ` Keir Fraser
2004-11-18 18:15 ` Peri Hankey
0 siblings, 2 replies; 26+ messages in thread
From: urmk @ 2004-11-18 17:11 UTC (permalink / raw)
To: Peri Hankey; +Cc: Rik van Riel, xen-devel
> It's true, you did mention it before, but I was looking for something
> else at the time. What I have in mind doesn't require so much
> configuration. On the other hand it doesn't exist, and this does.
Ah. I couldn't remember if I'd sent it or not (or if I'd even tried to send
it from an address that was on the list, a few go to the same mailbox at
the moment)
> But the patch is against quite an old source, and it doesn't compile
> straight out of the box. Do you know if there are updated patches
> against 2.6.9?
I can check.
> I get this error (which I haven't yet examined in detail):
>
> CC [M] fs/xip2fs/file.o
> fs/xip2fs/file.c: In function `xip2_do_file_read':
> fs/xip2fs/file.c:69: error: structure has no member named `buf'
> fs/xip2fs/file.c: In function `__xip2_file_aio_read':
> fs/xip2fs/file.c:119: error: structure has no member named `buf'
> fs/xip2fs/file.c: In function `xip2_file_sendfile':
> fs/xip2fs/file.c:302: error: structure has no member named `buf'
>
> This was against xen-2.0.1 as of today 18 Nov 2004
I highly doubt that it will be directly applicable to xen - the entire backend
mechanism is linked into the z/VM shared memory system between guests. I was
more pointing it out as a probable jumping off point (most of the work is done,
it just needs to use the xen memory sharing instead) and as a workable
concept for a less-cpu-intensive copy-on-write mechanism.
I'll take a look for a newer patch and see if I can scrape up some time to
apply the backend to xen, but I don't know when I'll get a chance -- don't let
me hold anyone else up who was considering working on it.
-m
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 17:11 ` urmk
@ 2004-11-18 17:25 ` Keir Fraser
2004-11-18 18:41 ` Kip Macy
2004-11-18 18:15 ` Peri Hankey
1 sibling, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-18 17:25 UTC (permalink / raw)
To: urmk; +Cc: Peri Hankey, Rik van Riel, xen-devel
> I highly doubt that it will be directly applicable to xen - the entire backend
> mechanism is linked into the z/VM shared memory system between guests. I was
> more pointing it out as a probable jumping off point (most of the work is done,
> it just needs to use the xen memory sharing instead) and as a workable
> concept for a less-cpu-intensive copy-on-write mechanism.
Yeah, I spotted uses of hypervisor services and things called
"DCSS"s. I guessed that xip2 must be rather z/VM specific. :-)
Is there any documentation on the semantics of DCSS and the hypervisor
services that are used? It would be interesting to see if Xen ought to
have some similar concept.
-- Keir
> I'll take a look for a newer patch and see if I can scrape up some time to
> apply the backend to xen, but I don't know when I'll get a chance -- don't let
> me hold anyone else up who was considering working on it.
>
> -m
>
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 17:11 ` urmk
2004-11-18 17:25 ` Keir Fraser
@ 2004-11-18 18:15 ` Peri Hankey
2004-11-19 10:35 ` Jacob Gorm Hansen
1 sibling, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-18 18:15 UTC (permalink / raw)
To: urmk; +Cc: Rik van Riel, xen-devel
I tried applying it in a spirit of pure optimism (naivety). You never know.
To return for one moment to the thing I was chasing. Let's assume that
in its standard configuration the Linux kernel (or indeed some other kernel)
makes a very good job of sharing memory across processes by means of the
copy-on-write fork semantics and by exploiting the overall view that it
has of how pages/blocks are read from the filesystem and modified by
processes. Let's hope that it (or they) get even better. We would like
separate xenU domains to benefit from the excellence that exists and the
improvements that may be.
My idea is to use a memory-privileged xenU domain as a page-table
manager for a group of client domains. It would have to know about the
memory/page/block usage of every process in each of its clients as if
they were all running within it. But it would never operate as the
actual kernel for any of these processes. For those purposes each
process would operate under the supervision of a memory-client domain
that has its own kernel address space.
I haven't at present got the intimate knowledge of memory management
traffic between xen, xen0 and xenU domains that I would need to see
whether or not this idea has any legs. But I am very interested to know
what anyone thinks. No doubt the best answer is to go and study the code.
Anyway, thanks for your help, and for your quick response. Much quicker
than mine.
Regards
Peri
urmk@reason.marist.edu wrote:
>>It's true, you did mention it before, but I was looking for something
>>else at the time. What I have in mind doesn't require so much
>>configuration. On the other hand it doesn't exist, and this does.
>>
>>
>
>Ah. I couldn't remember if I'd sent it or not (or if I'd even tried to send
>it from an address that was on the list, a few go to the same mailbox at
>the moment)
>
>
>
>>But the patch is against quite an old source, and it doesn't compile
>>straight out of the box. Do you know if there are updated patches
>>against 2.6.9?
>>
>>
>
>I can check.
>
>
>
>>I get this error (which I haven't yet examined in detail):
>>
>> CC [M] fs/xip2fs/file.o
>>fs/xip2fs/file.c: In function `xip2_do_file_read':
>>fs/xip2fs/file.c:69: error: structure has no member named `buf'
>>fs/xip2fs/file.c: In function `__xip2_file_aio_read':
>>fs/xip2fs/file.c:119: error: structure has no member named `buf'
>>fs/xip2fs/file.c: In function `xip2_file_sendfile':
>>fs/xip2fs/file.c:302: error: structure has no member named `buf'
>>
>>This was against xen-2.0.1 as of today 18 Nov 2004
>>
>>
>
>I highly doubt that it will be directly applicable to xen - the entire backend
>mechanism is linked into the z/VM shared memory system between guests. I was
>more pointing it out as a probable jumping off point (most of the work is done,
>it just needs to use the xen memory sharing instead) and as a workable
>concept for a less-cpu-intensive copy-on-write mechanism.
>
>I'll take a look for a newer patch and see if I can scrape up some time to
>apply the backend to xen, but I don't know when I'll get a chance -- don't let
>me hold anyone else up who was considering working on it.
>
>-m
>
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 17:25 ` Keir Fraser
@ 2004-11-18 18:41 ` Kip Macy
2004-11-18 18:55 ` Keir Fraser
0 siblings, 1 reply; 26+ messages in thread
From: Kip Macy @ 2004-11-18 18:41 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
>
> Is there any documentation on the semantics of DCSS and the hypervisor
> services that are used? It would be interesting to see if Xen ought to
> have some similar concept.
This doesn't really answer your question - but the closest you're likely
to get is at their pubs:
http://www.vm.ibm.com/pubs/
>
> > I'll take a look for a newer patch and see if I can scrape up some time to
> > apply the backend to xen, but I don't know when I'll get a chance -- don't let
> > me hold anyone else up who was considering working on it.
> >
> > -m
> >
> >
> >
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 18:41 ` Kip Macy
@ 2004-11-18 18:55 ` Keir Fraser
2004-11-18 19:16 ` Kip Macy
0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-18 18:55 UTC (permalink / raw)
To: Kip Macy; +Cc: Keir Fraser, xen-devel
> >
> > Is there any documentation on the semantics of DCSS and the hypervisor
> > services that are used? It would be interesting to see if Xen ought to
> > have some similar concept.
>
> This doesn't really answer your question - but the closest you're likely
> to get is at their pubs:
>
> http://www.vm.ibm.com/pubs/
There's a manual called "How to Improve the Performance of Linux on
z/VM with Execute-In-Place Technology", which is easily googled for.
Turns out that DCSSs are pretty simple things -- just a blob of
physmem that is mapped into phys address space of every Linux
instance, and contains a read-only filesystem image. xip2fs is a
simple read-only fs that serves mmap() requests directly out of the
shared DCSS rather than the private block cache. Looks tedious to
manage because the whole filesystem takes up space in every Linux
memory map all the time, no matter whether a particular file/block is
being used. So you have to be careful to put only frequently accessed
and highly shared files in the filesystem and use twisted symlinks to
link in from their usual location into the mounted xip2fs filesystem.
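As a loose analogy only, the effect of serving reads directly out of a shared read-only image can be sketched in ordinary user space with Python's mmap module. This is not z/VM or xip2fs code: the temporary file stands in for the DCSS image, and the two mappings stand in for two Linux instances looking at the same read-only data.

```python
import mmap
import os
import tempfile


def map_read_only(path):
    """Map a whole file read-only; reads come straight from the mapping."""
    with open(path, "rb") as f:
        # mmap duplicates the file descriptor, so closing f here is safe.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Build a small stand-in for the shared read-only filesystem image.
fd, image_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"shared text " * 1024)

guest_a = map_read_only(image_path)   # first "guest" view of the image
guest_b = map_read_only(image_path)   # second view of the same image

first_a = guest_a[:11]
first_b = guest_b[:11]

# The mapping is read-only: an attempted store is refused.
try:
    guest_a[0] = 0
    write_rejected = False
except TypeError:
    write_rejected = True

guest_a.close()
guest_b.close()
os.remove(image_path)
```

Note that the whole image is mapped regardless of which parts are read, which mirrors the management cost described above.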
-- Keir
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 18:55 ` Keir Fraser
@ 2004-11-18 19:16 ` Kip Macy
0 siblings, 0 replies; 26+ messages in thread
From: Kip Macy @ 2004-11-18 19:16 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
It doesn't sound like the performance benefit would justify the added
management complexity for xen.
-Kip
On Thu, 18 Nov 2004, Keir Fraser wrote:
> > >
> > > Is there any documentation on the semantics of DCSS and the hypervisor
> > > services that are used? It would be interesting to see if Xen ought to
> > > have some similar concept.
> >
> > This doesn't really answer your question - but the closest you're likely
> > to get is at their pubs:
> >
> > http://www.vm.ibm.com/pubs/
>
> There's a manual called "How to Improve the Performance of Linux on
> z/VM with Execute-In-Place Technology", which is easily googled for.
> Turns out that DCSSs are pretty simple things -- just a blob of
> physmem that is mapped into phys address space of every Linux
> instance, and contains a read-only filesystem image. xip2fs is a
> simple read-only fs that serves mmap() requests directly out of the
> shared DCSS rather than the private block cache. Looks tedious to
> manage because the whole filesystem takes up space in every Linux
> memory map all the time, no matter whether a particular file/block is
> being used. So you have to be careful to put only frequently accessed
> and highly shared files in the filesystem and use twisted symlinks to
> link in from their usual location into the mounted xip2fs filesystem.
>
> -- Keir
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-18 18:15 ` Peri Hankey
@ 2004-11-19 10:35 ` Jacob Gorm Hansen
2004-11-19 10:59 ` Keir Fraser
0 siblings, 1 reply; 26+ messages in thread
From: Jacob Gorm Hansen @ 2004-11-19 10:35 UTC (permalink / raw)
To: Peri Hankey; +Cc: urmk, Rik van Riel, xen-devel
Peri Hankey wrote:
> My idea is to use a memory-privileged xenU domain as a page-table
> manager for a group of client domains. It would have to know about the
> memory/page/block usage of every process in each of its clients as if
> they were all running within it. But it would never operate as the
> actual kernel for any of these processes. For those purposes each
> process would operate under the supervision of a memory-client domain
> that has its own kernel address space.
>
> I haven't at present got the intimate knowledge of memory management
> traffic between xen, xen0 and xenU domains that I would need to see
> whether or not this idea has any legs. But I am very interested to know
> what anyone thinks. No doubt the best answer is to go and study the code.
I think the main point distinguishing the design of Xen from that of
older microkernels (such as L4), is that there is no memory management
traffic between domains, and it should probably stay that way to retain
the good performance and performance isolation that Xen has to offer.
I would like to re-plug the idea I proposed earlier, on the topic of
sharing read-only pages across domains:
Create a new variant of the HYPERVISOR_update_va_mapping (or of the
batched variant, or both), let's call it HYPERVISOR_make_ro_and_share.
When Xen gets this hypercall, it does the following:
- Turn the mapping into a read-only mapping.
- Look up the contents of the machine page in the mapping in a
machine-global hash table or other suitable type of associative data
structure.
- If there is a match, make the user's mapping point to the matching
page. Also increase a reference/sharing count of the matching page.
- If there is no match, insert the user's page into the hash table.
- Revoke the user's ownership of the old page.
If the mapping is write-accessed inside the user's domain, the user will
be responsible for CoWing the page into a newly allocated one. As an
optimisation, one of the AVL bits can be used for tracking this type of
fault. When you update the mapping, Xen will decrease the
reference/sharing count of the previously shared page, and if the
refcount reaches zero free the page, and remove it from the hash table.
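A minimal sketch of the bookkeeping described above, as a toy Python model rather than hypervisor code. The names SharedPageTable, make_ro_and_share and drop_mapping are hypothetical, SHA-256 merely stands in for whatever hash the table would use, and a real implementation would also compare page contents on a hash match before sharing.

```python
import hashlib

PAGE_SIZE = 4096


class SharedPageTable:
    """Toy machine-global table of read-only pages keyed by content hash."""

    def __init__(self):
        self.pages = {}     # digest -> page contents (bytes)
        self.refcount = {}  # digest -> number of mappings sharing the page

    def make_ro_and_share(self, contents):
        """Share a page by content and return the key of the shared page."""
        key = hashlib.sha256(contents).hexdigest()
        if key in self.pages:
            # Match: the caller's mapping now points at the existing page.
            self.refcount[key] += 1
        else:
            # No match: the caller's page becomes the shared copy.
            self.pages[key] = contents
            self.refcount[key] = 1
        return key

    def drop_mapping(self, key):
        """Called when a domain replaces its mapping, e.g. after CoWing."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            # Last sharer gone: free the page and remove it from the table.
            del self.pages[key]
            del self.refcount[key]


table = SharedPageTable()
zero_page = b"\x00" * PAGE_SIZE
key_a = table.make_ro_and_share(zero_page)   # first domain inserts the page
key_b = table.make_ro_and_share(zero_page)   # second domain matches it
```

Copying the page on a write fault is left to the guest in this model, as in the proposal; the table only tracks sharing counts.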
Jacob
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-19 10:35 ` Jacob Gorm Hansen
@ 2004-11-19 10:59 ` Keir Fraser
2004-11-19 12:02 ` Jacob Gorm Hansen
0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-19 10:59 UTC (permalink / raw)
To: Jacob Gorm Hansen; +Cc: Peri Hankey, urmk, Rik van Riel, xen-devel
This would end up pushing policy into Xen -- what happens when memory
is fully committed, some domain has given up a bunch of his
exclusively-owned pages by buying into the shared table, and now he
has a slew of CoW faults and wants to get some of his exclusive pages
back from Xen, thank you very much?
At this point Xen needs some reclamation policy (saying that Xen will
guarantee to have enough pages around to satisfy these requests is not
possible, since the point of the sharing is to be able to
"over-reserve" memory). It needs to decide which pages to reclaim,
then have a mechanism for reclaiming them which will probably involve
communicating up to the domains concerned in advance and setting
timeouts by when they must relinquish their mappings.
This is the kind of thing I would prefer to implement outside Xen.
-- Keir
> I would like to re-plug the idea I proposed earlier, on the topic of
> sharing read-only pages across domains:
>
> Create a new variant of the HYPERVISOR_update_va_mapping (or of the
> batched variant, or both), let's call it HYPERVISOR_make_ro_and_share.
>
> When Xen gets this hypercall, it does the following:
>
> - Turn the mapping into a read-only mapping.
>
> - Look up the contents of the machine page in the mapping in a
> machine-global hash table or other suitable type of associative data
> structure.
>
> - If there is a match, make the user's mapping point to the matching
> page. Also increase a reference/sharing count of the matching page.
>
> - If there is no match, insert the user's page into the hash table.
>
> - Revoke the user's ownership of the old page.
>
> If the mapping is write-accessed inside the user's domain, the user will
> be responsible for CoWing the page into a newly allocated one. As an
> optimisation, one of the AVL bits can be used for tracking this type of
> fault. When you update the mapping, Xen will decrease the
> reference/sharing count of the previously shared page, and if the
> refcount reaches zero free the page, and remove it from the hash table.
>
> Jacob
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-19 10:59 ` Keir Fraser
@ 2004-11-19 12:02 ` Jacob Gorm Hansen
2004-11-19 14:50 ` Keir Fraser
0 siblings, 1 reply; 26+ messages in thread
From: Jacob Gorm Hansen @ 2004-11-19 12:02 UTC (permalink / raw)
To: Keir Fraser; +Cc: Peri Hankey, urmk, Rik van Riel, xen-devel
Keir Fraser wrote:
> This would end up pushing policy into Xen -- what happens when memory
> is fully committed, some domain has given up a bunch of his
> exclusively-owned pages by buying into the shared table, and now he
> has a slew of CoW faults and wants to get some of his exclusive pages
> back from Xen, thank you very much?
>
> At this point Xen needs some reclamation policy (saying that Xen will
> guarantee to have enough pages around to satisfy these requests is not
> possible, since the point of the sharing is to be able to
> "over-reserve" memory). It needs to decide which pages to reclaim,
> then have a mechanism for reclaiming them which will probably involve
> communicating up to the domains concerned in advance and setting
> timeouts by when they must relinquish their mappings.
>
> This is the kind of thing I would prefer to implement outside Xen.
Could the same thing not work using an event-channel rather than a
hypercall then? I guess you basically do the same when giving your
pages away for a driver to fill them up with data?
My main point is that the domains have better knowledge about what pages
are likely to be shareable than dom0 or Xen has, and so should volunteer
to share them, and somehow be rewarded.
The problem of reclamation-policy will exist for any solution that
over-reserves memory, including the transparent VMWare system. For some
pages, like the guest OS kernel text area, it would be ok to remove
these pages from the domain's allowance for good -- it will not need to
CoW these, and the domain builder could simply build that part of the
domain from shared pages.
Perhaps this should just be a one-way street, you give up pages to be
nice to others (and get cheaper hosting or whatever kind of reward you
can think of in return), and then you lose the right to write to them
for good. Should you need more writable pages, you will have to re-grow
your reservation, and if that fails you will need to flush some slabs or
buffer caches or page stuff to disk or whatever you do in Linux when
you have memory pressure. Ultimately you may want to migrate to a less
loaded machine.
It seems to me any other kind of solution will allow a malicious domain
to affect the performance of innocent domains by repeatedly sharing and
unsharing its pages (whether by explicit hypercall or by placing popular
vs random data in them).
Jacob
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-19 12:02 ` Jacob Gorm Hansen
@ 2004-11-19 14:50 ` Keir Fraser
2004-11-22 12:42 ` Jacob Gorm Hansen
0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-19 14:50 UTC (permalink / raw)
To: Jacob Gorm Hansen; +Cc: Keir Fraser, Peri Hankey, urmk, Rik van Riel, xen-devel
> Could the same thing not work using an event-channel rather than a
> hypercall then? I guess you basically do the same when giving your
> pages away for a driver to fill them up with data?
> My main point is that the domains have better knowledge about what pages
> are likely to be shareable than dom0 or Xen has, and so should volunteer
> to share them, and somehow be rewarded.
Equally, a centralised "buffer cache" domain can see request traffic
and observe empirically what pages are most beneficial to share. :-)
Both ways round could be interesting to experiment with though.
> The problem of reclamation-policy will exist for any solution that
> over-reserves memory, including the transparent VMWare system. For some
> pages, like the guest OS kernel text area, it would be ok to remove
> these pages from the domain's allowance for good -- it will not need to
> CoW these, and the domain builder could simply build that part of the
> domain from shared pages.
Well, you also can over-commit on stuff that is read-only and fault in
on demand, just as you can demand-CoW writable stuff e.g., no need to
have all of kernel or glibc in memory all the time -- only hot parts
of both will be in use by the system at any time.
i.e.,
1. There is fault in from no page -> shareable page on read accesses.
2. There is fault from shareable page -> shareable page + exclusive
page on write accesses.
Both of these require extra allocation of memory.
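To make the accounting concrete, here is a small Python toy model of the two fault types. It is a sketch under stated assumptions, not Xen code: Machine, Domain and the page names are hypothetical, and a write to an absent page is simplified to a read fault followed by a write fault.

```python
class Machine:
    """Toy pool of free machine pages plus the set of resident shared pages."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.shared_resident = set()

    def allocate(self):
        if self.free_pages == 0:
            raise MemoryError("memory fully committed; reclamation needed")
        self.free_pages -= 1


class Domain:
    def __init__(self, machine):
        self.machine = machine
        self.mappings = {}   # page name -> "shared" or "exclusive"

    def read(self, page):
        if page in self.mappings:
            return
        # Fault 1: no page -> shareable page. Costs memory only if no
        # other domain has already faulted the shared copy in.
        if page not in self.machine.shared_resident:
            self.machine.allocate()
            self.machine.shared_resident.add(page)
        self.mappings[page] = "shared"

    def write(self, page):
        self.read(page)
        if self.mappings[page] == "shared":
            # Fault 2: shareable page -> shareable page + exclusive page.
            self.machine.allocate()
            self.mappings[page] = "exclusive"


machine = Machine(free_pages=3)
dom_a, dom_b = Domain(machine), Domain(machine)
dom_a.read("glibc:0")    # allocates the shared copy
dom_b.read("glibc:0")    # shares it, no new allocation
dom_a.write("glibc:0")   # CoW: one exclusive page
dom_b.write("glibc:0")   # CoW: one more exclusive page
```

With only three pages in the pool, the next fault of either kind has nothing left to allocate, which is the point at which some reclamation policy would have to step in.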
> Perhaps this should just be a one-way street, you give up pages to be
> nice to others (and get cheaper hosting or whatever kind of reward you
> can think of in return), and then you lose the right to write to them
> for good. Should you need more writable pages, you will have to re-grow
> your reservation, and if that fails you will need to flush some slabs or
> buffer caches or page stuff to disk or whatever you do in Linux when
> you have memory pressure. Ultimately you may want to migrate to a less
> loaded machine.
It's another way of looking at the problem (end-to-end style I
suppose). Potentially worth investigating. :-)
Cheers,
Keir
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: copy on write memory
2004-11-19 14:50 ` Keir Fraser
@ 2004-11-22 12:42 ` Jacob Gorm Hansen
2004-11-25 15:01 ` of cows and clones: creating domains as clones of saved state Peri Hankey
0 siblings, 1 reply; 26+ messages in thread
From: Jacob Gorm Hansen @ 2004-11-22 12:42 UTC (permalink / raw)
To: Keir Fraser; +Cc: Xen-devel
Keir Fraser wrote:
>
> Well, you also can over-commit on stuff that is read-only and fault in
> on demand, just as you can demand-CoW writable stuff e.g., no need to
> have all of kernel or glibc in memory all the time -- only hot parts
> of both will be in use by the system at any time.
>
> i.e.,
> 1. There is fault in from no page -> shareable page on read accesses.
> 2. There is fault from shareable page -> shareable page + exclusive
> page on write accesses.
> Both of these require extra allocation of memory.
But I will need some external service to give them to me right when I
need them, or I may run into the 'paged-the-pager' problem and die?
>>Perhaps this should just be a one-way street, you give up pages to be
>>nice to others (and get cheaper hosting or whatever kind of reward you
>>can think of in return), and then you lose the right to write to them
>>for good. Should you need more writable pages, you will have to re-grow
>>your reservation, and if that fails you will need to flush some slabs or
>>buffer caches or page stuff to disk or whatever you do in Linux when
>>you have memory pressure. Ultimately you may want to migrate to a less
>>loaded machine.
>
>
> It's another way of looking at the problem (end-to-end style I
> suppose). Potentially worth investigating. :-)
Perhaps I will have a go at some point. If going in this direction
perhaps it will make sense to do this in Xen anyway.
Jacob
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
^ permalink raw reply [flat|nested] 26+ messages in thread
* of cows and clones: creating domains as clones of saved state
2004-11-22 12:42 ` Jacob Gorm Hansen
@ 2004-11-25 15:01 ` Peri Hankey
2004-11-25 21:19 ` Keir Fraser
0 siblings, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-25 15:01 UTC (permalink / raw)
To: xen-devel
Xen can already save the state of an established domain and restore it
(in theory at least - so far it crashes for me). It seems to me that
most of the objectives discussed for a copy-on-write memory system can
be achieved by providing a mechanism for creating domains by cloning
them from an initial state that is shared read-only between them with a
copy-on-write mapping for each.
In this scheme a collection of clone domains can be created by starting
from the saved state X of some original domain, which may be configured
specifically for this purpose. The saved state X encapsulates the memory
and filesystem state of the originating domain. The memory and
filesystem components of the state X are shared read-only between the
clone domains of X, with each clone domain superimposing its own
copy-on-write mapping of the memory and filesystem states.
When a clone domain is started, it immediately reconfigures itself as a
distinct independent domain by writing its own configuration data to the
copy-on-write mapping of its initial memory and filesystem state.
Once it has reconfigured itself, the state of a clone domain is the
state represented by the copy-on-write mapping of its private data
as an overlay on the shared state from which it was first created.
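The overlay idea can be illustrated with a short Python sketch. It is only a toy model of the scheme, with hypothetical names throughout (CloneDomain, the keys in state_x); real state would be memory pages and disk blocks rather than dictionary entries.

```python
from types import MappingProxyType


class CloneDomain:
    """Toy clone: private copy-on-write overlay on a shared read-only base."""

    def __init__(self, base):
        self.base = base      # shared state X, never modified by a clone
        self.overlay = {}     # this clone's private, written entries

    def read(self, key):
        # Private data wins; otherwise fall through to the shared state.
        if key in self.overlay:
            return self.overlay[key]
        return self.base[key]

    def write(self, key, value):
        # Writes never touch the base; they land in the overlay.
        self.overlay[key] = value


# Saved state X, wrapped so that it cannot be modified through this view.
state_x = MappingProxyType({"hostname": "template", "kernel": "2.6.9-xenU"})

clone1 = CloneDomain(state_x)
clone2 = CloneDomain(state_x)
clone1.write("hostname", "clone1")   # clone1 reconfigures itself
```

After the write, clone1 sees its own hostname while clone2 and the shared state still see the original, and clone1's private state consists of a single entry.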
This scheme has the added merit that creating a new domain of a
particular kind is simply a matter of creating a new clone domain and
reconfiguring it as an independent domain. Creating a new domain should
take not much longer than restarting a migrating domain in a different
machine.
Once it has reconfigured itself as an independent domain, each clone
domain operates in the same way as any other domain. In particular, each
has a fixed allocation of memory pages, but these are available to it
over and above the pages that it shares read-only with other clones of
the same initial state.
As far as I can see, the only mechanism that is required from xen is a
mechanism for sharing memory pages read-only between domains with a
copy-on-write mapping for each domain. It may be that the copy-on-write
part of this mechanism is best handled by each guest operating system,
although it would obviously be best to provide that as a service of the
xen system itself.
There is an implied requirement that a clone domain created from a state
X can only migrate to a machine where the shared state X is available.
Regards
Peri Hankey
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: of cows and clones: creating domains as clones of saved state
2004-11-25 15:01 ` of cows and clones: creating domains as clones of saved state Peri Hankey
@ 2004-11-25 21:19 ` Keir Fraser
2004-11-25 22:13 ` Peri Hankey
0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-25 21:19 UTC (permalink / raw)
To: Peri Hankey; +Cc: xen-devel
> Xen can already save the state of an established domain and restore it
> (in theory at least - so far it crashes for me).
How does it crash for you? I can save/restore a 2.6.9-xenU domain
built from latest 2.0-testing tree with no problems.
-- Keir
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: of cows and clones: creating domains as clones of saved state
2004-11-25 21:19 ` Keir Fraser
@ 2004-11-25 22:13 ` Peri Hankey
2004-11-25 22:36 ` Keir Fraser
2004-11-25 22:37 ` Ian Pratt
0 siblings, 2 replies; 26+ messages in thread
From: Peri Hankey @ 2004-11-25 22:13 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
Keir
You had in fact seen this in an earlier thread: 'crash when domain is
restored', which was confirmed as occurring by Charles Coffing. You
suggested I add instrumentation. I haven't yet had a chance to do that,
but I'll first try it with xen-unstable.
But I'll be interested to hear whether you think there is any mileage in
my clone proposal.
Regards
Peri
your comment:
>Looks easy to fix. You might want to instrument up time_resume in
>arch/xen/i386/kernel/time.c, or I may find some time to look later in
>the week.
>
> -- Keir
>
my message:
> I hadn't yet investigated save/restore, but it seems not to work for
> me. Running Mandrakelinux 10.1 (ish - updated not reinstalled) on an
> AMD ATHLON 2400 I have a crash restoring the domain:
>
> root@xen0# xm save aaa aaa.saved
> ...
> root@xen0# xm restore aaa.saved
>
> The console output is as follows:
>
> Unable to handle kernel paging request at virtual address 0000c000
> printing eip:
> *pde = ma 00000000 pa 55555000
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: loop
> CPU: 0
> EIP: 0061:[<c0311000>] Not tainted VLI
> EFLAGS: 00010206 (2.6.9-xenU)
> EIP is at early_serial_init+0x1c/0x1d5
> eax: c02ca510 ebx: 0000c000 ecx: fbffc000 edx: 00000001
> esi: 00000010 edi: c0102000 ebp: 00000000 esp: c10ebf20
> ds: 0069 es: 0069 ss: 0069
> Process events/0 (pid: 3, threadinfo=c10ea000 task=c10d9020)
> Stack: c010e7c8 00000000 c0109e4d fbffc000 000001d3 00000063 c36a6000
> c10ea000
> 00000000 c02c9580 00000000 c012cd23 00000000 c10ebf74 00000000
> c10ad278
> c0109f4f c10ea000 c10ad268 ffffffff ffffffff 00000001 00000000
> c01195c6
> Call Trace:
> [<c010e7c8>] time_resume+0x12/0x52
>
> [<c0109e4d>] __do_suspend+0x1a0/0x1e1
>
> [<c012cd23>] worker_thread+0x1ea/0x2fe
>
> [<c0109f4f>] __shutdown_handler+0x0/0x48
>
> [<c01195c6>] default_wake_function+0x0/0x12
>
> [<c01195c6>] default_wake_function+0x0/0x12
>
> [<c012cb39>] worker_thread+0x0/0x2fe
>
> [<c0130f94>] kthread+0xa8/0xde
>
> [<c0130eec>] kthread+0x0/0xde
>
> [<c010f091>] kernel_thread_helper+0x5/0xb
>
> Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <c0>
> 3b 39 98 00 00 00 02 00 00 1c 7e 00 00 00 00 00 00 00 00 00
Keir Fraser wrote:
>>Xen can already save the state of an established domain and restore it
>>(in theory at least - so far it crashes for me).
>>
>>
>
>How does it crash for you? I can save/restore a 2.6.9-xenU domain
>built from latest 2.0-testing tree with no problems.
>
> -- Keir
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: of cows and clones: creating domains as clones of saved state
2004-11-25 22:13 ` Peri Hankey
@ 2004-11-25 22:36 ` Keir Fraser
2004-11-25 22:37 ` Ian Pratt
1 sibling, 0 replies; 26+ messages in thread
From: Keir Fraser @ 2004-11-25 22:36 UTC (permalink / raw)
To: Peri Hankey; +Cc: Keir Fraser, xen-devel
> You had in fact seen this in an earlier thread: 'crash when domain is
> restored', which was confirmed as occurring by Charles Coffing. You
> suggested I add instrumentation. I haven't yet had a chance to do that,
> but I'll first try it with xen-unstable.
>
> But I'll be interested to hear whether you think there is any mileage in
> my clone proposal.
It sounds rather like VM forking, except you want to be able to can
the base image for later re-instantiation. I guess you would create
the in-memory read-only base VM on demand from the canned image when
the first CoW VM is created, and garbage-collect it when the last CoW
VM is destroyed.
The idea of taking a small basic set of VM images and customising
their configuration when instantiating them is sane. For the memory
sharing we would like to investigate a more general mechanism (e.g.,
shared buffer cache indexed by content hash) which would optimise
memory usage not just in your scenario but also in a whole bunch of
others.
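To make the content-hash idea concrete, here is a rough sketch of a store that keeps one copy of each distinct page, keyed by a digest of its contents. The `SharedPageStore` name, the 4096-byte page size and the choice of SHA-256 are all assumptions for illustration, not a description of any existing Xen mechanism:

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the illustration


class SharedPageStore:
    """Hypothetical store holding one copy of each distinct page."""

    def __init__(self):
        self._pages = {}  # digest -> [page bytes, refcount]

    def put(self, data):
        """Register a page and return its digest; identical pages share storage."""
        if len(data) != PAGE_SIZE:
            raise ValueError("expected exactly one page of data")
        digest = hashlib.sha256(data).hexdigest()
        entry = self._pages.get(digest)
        if entry is None:
            self._pages[digest] = [bytes(data), 1]
        else:
            entry[1] += 1
        return digest

    def get(self, digest):
        """Return the shared read-only contents for a digest."""
        return self._pages[digest][0]

    def drop(self, digest):
        """Release one reference; free the page when nobody uses it."""
        entry = self._pages[digest]
        entry[1] -= 1
        if entry[1] == 0:
            del self._pages[digest]

    def unique_pages(self):
        """Number of distinct pages actually stored."""
        return len(self._pages)
```

Two domains that cache the same file block end up holding the same digest, so the store keeps a single copy regardless of whether the domains were cloned from a common image. A write to a shared page would still need the usual CoW break, which this sketch leaves out.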
-- Keir
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: of cows and clones: creating domains as clones of saved state
2004-11-25 22:13 ` Peri Hankey
2004-11-25 22:36 ` Keir Fraser
@ 2004-11-25 22:37 ` Ian Pratt
1 sibling, 0 replies; 26+ messages in thread
From: Ian Pratt @ 2004-11-25 22:37 UTC (permalink / raw)
To: Peri Hankey; +Cc: Keir Fraser, xen-devel, Ian.Pratt
> You had in fact seen this in an earlier thread: 'crash when domain is
> restored', which was confirmed as occurring by Charles Coffing. You
> suggested I add instrumentation. I haven't yet had a chance to do that,
> but I'll first try it with xen-unstable.
>
> But I'll be interested to hear whether you think there is any mileage in
> my clone proposal.
This has been on the research roadmap for some time.
If you want something sooner, an approximation to the same effect
can be achieved without CoW memory, simply using CoW disk and the
existing save/restore/migrate code.
The only extra code that's required is to hook into Xen's current
resume code such that it fakes out e.g. an ACPI resume event so
that user space code gets to run to cope with the change of IP
address etc. You'll need to hack the xend migrate code such that
it doesn't kill the previous domain after migrating it, sets up
the new CoW volume and configures the new domain's devices
appropriately.
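For the "new CoW volume" step, the behaviour wanted from the disk is roughly the following: every clone reads through to a shared read-only base and keeps its own writes in a private delta. This is only a toy model with a hypothetical `CowVolume` class; it says nothing about how xend or any particular block backend would implement it:

```python
class CowVolume:
    """Toy copy-on-write volume: shared read-only base plus private writes."""

    def __init__(self, base):
        self._base = base  # shared sequence of blocks, never modified here
        self._delta = {}   # block index -> this clone's private copy

    def read(self, index):
        """Return the private copy if the block was written, else the base block."""
        return self._delta.get(index, self._base[index])

    def write(self, index, data):
        """Record a write privately; the base and other clones are unaffected."""
        if not 0 <= index < len(self._base):
            raise IndexError("block index out of range")
        self._delta[index] = data

    def dirty_blocks(self):
        """Indices of blocks this clone has diverged on."""
        return sorted(self._delta)
```

The original domain and each domain restored from its saved state would get their own `CowVolume` over the same base, so the disk cost of a clone is only the blocks it changes after the restore.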
Ian
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2004-11-25 22:37 UTC | newest]
Thread overview: 26+ messages
2004-11-15 22:01 copy on write memory Peri Hankey
2004-11-16 0:35 ` Rik van Riel
2004-11-16 9:44 ` Peri Hankey
2004-11-16 9:51 ` Peri Hankey
2004-11-16 15:27 ` urmk
2004-11-16 16:17 ` Mark A. Williamson
2004-11-18 16:56 ` Peri Hankey
2004-11-18 17:11 ` urmk
2004-11-18 17:25 ` Keir Fraser
2004-11-18 18:41 ` Kip Macy
2004-11-18 18:55 ` Keir Fraser
2004-11-18 19:16 ` Kip Macy
2004-11-18 18:15 ` Peri Hankey
2004-11-19 10:35 ` Jacob Gorm Hansen
2004-11-19 10:59 ` Keir Fraser
2004-11-19 12:02 ` Jacob Gorm Hansen
2004-11-19 14:50 ` Keir Fraser
2004-11-22 12:42 ` Jacob Gorm Hansen
2004-11-25 15:01 ` of cows and clones: creating domains as clones of saved state Peri Hankey
2004-11-25 21:19 ` Keir Fraser
2004-11-25 22:13 ` Peri Hankey
2004-11-25 22:36 ` Keir Fraser
2004-11-25 22:37 ` Ian Pratt
2004-11-16 18:10 ` copy on write memory Adam Heath
2004-11-16 18:09 ` Adam Heath
2004-11-16 18:39 ` Matt Ayres