* copy on write memory
@ 2004-11-15 22:01 Peri Hankey
  2004-11-16  0:35 ` Rik van Riel
  0 siblings, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-15 22:01 UTC (permalink / raw)
  To: xen-devel

Hello

UML has developed a SKAS mode of operation, in which (as far as I
understand it) a process can be created in much the same way as any
other Linux process, except that it runs (under a UML kernel) with a
separate kernel address space.

It occurred to me that the equivalent in the Xen world would be to use 
one Linux xenU domain purely as a page-table manager for a collection of 
separate xenU domains that are expected or known to have similar process
populations. The process creation domain would have a large allocation 
of memory which it would use to populate page tables applying standard 
copy on write semantics, but the processes to which these page tables 
belong would effectively run in the separate execution domains to which 
they belong.

A similar arrangement of one page-table management domain with multiple 
separate execution domains could be used for other kernels such as 
NetBSD, which have similar copy on write semantics.

Xen itself would only need to provide a mechanism for managing the trade 
between a page-manager domain and its execution domains, and would not 
need to replicate the functionality of any particular system.

As the page-table manager domain also knows which disk pages are clean, 
and which have been written to, it is also in a good position to manage 
copy on write semantics for filesystem storage.

Is this a feasible way of looking at it?

Regards
Peri Hankey



-------------------------------------------------------
This SF.Net email is sponsored by: InterSystems CACHE
FREE OODBMS DOWNLOAD - A multidimensional database that combines
robust object and relational technologies, making it a perfect match
for Java, C++,COM, XML, ODBC and JDBC. www.intersystems.com/match8

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-15 22:01 copy on write memory Peri Hankey
@ 2004-11-16  0:35 ` Rik van Riel
  2004-11-16  9:44   ` Peri Hankey
  2004-11-16 18:09   ` Adam Heath
  0 siblings, 2 replies; 26+ messages in thread
From: Rik van Riel @ 2004-11-16  0:35 UTC (permalink / raw)
  To: Peri Hankey; +Cc: xen-devel

On Mon, 15 Nov 2004, Peri Hankey wrote:

> It occurred to me that the equivalent in the Xen world would be to use 
> one Linux xenU domain purely as a page-table manager for a collection of 
> separate xenU domains that are expected or known to have similar process
> populations.

UML copy on write is only for filesystems, isn't it?

The Xen equivalent would be cloning the xenU root filesystem
as an LVM snapshot, from a read-only LVM snapshot.  Then each
xenU virtual system would only use the disk space it writes
to and no more.
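
A minimal sketch of the snapshot behaviour described above, written as a
toy model rather than as LVM's actual interface. The class names
BaseVolume and CowSnapshot are hypothetical; the point is only that each
guest's snapshot stores the blocks it has written and nothing else.

```python
class BaseVolume:
    """Read-only origin volume shared by every guest (hypothetical model)."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)  # immutable: nobody writes to the origin

    def read(self, n):
        return self._blocks[n]

    def __len__(self):
        return len(self._blocks)


class CowSnapshot:
    """Writable per-guest view that stores only the blocks the guest wrote."""

    def __init__(self, base):
        self._base = base
        self._delta = {}  # block number -> data written by this guest

    def read(self, n):
        # Prefer the guest's own copy; fall back to the shared origin.
        if n in self._delta:
            return self._delta[n]
        return self._base.read(n)

    def write(self, n, data):
        if not 0 <= n < len(self._base):
            raise IndexError("block out of range")
        self._delta[n] = data  # the origin is never modified

    def blocks_used(self):
        return len(self._delta)


base = BaseVolume([b"boot", b"libc", b"httpd", b"etc"])
guest_a = CowSnapshot(base)
guest_b = CowSnapshot(base)
guest_a.write(3, b"etc-a")  # only guest_a pays for this block
```

Here guest_b still reads the origin's block 3 and uses no space of its
own, while guest_a uses exactly one block.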

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16  0:35 ` Rik van Riel
@ 2004-11-16  9:44   ` Peri Hankey
  2004-11-16  9:51     ` Peri Hankey
                       ` (2 more replies)
  2004-11-16 18:09   ` Adam Heath
  1 sibling, 3 replies; 26+ messages in thread
From: Peri Hankey @ 2004-11-16  9:44 UTC (permalink / raw)
  To: Rik van Riel; +Cc: xen-devel

Certainly, UML is best known for copy on write filesystems. But UML's 
SKAS mode is a different way of managing memory, and that was the 
starting point for this proposal, which is about using the copy on write 
semantics of Linux memory management to share memory pages between Xen 
domains. I see now that one person's starting point may prove to be 
another's red herring.

The notion is that there are applications of Xen where there would be 
very many virtual computers running the same set of applications for 
much of the time (e.g. standard web hosting, honeypots).

In the standard case of a single OS on a single machine, all processes 
are loaded from a common filesystem, so the OS knows which page sets 
start out with shared information. It can use this information to share 
pages between processes so long as those pages are not written to, 
allocating distinct pages to distinct processes when those pages are 
written to - the copy on write semantics.

In the Xen case, you don't want to reinvent and reimplement existing 
mechanisms, especially as these may differ in subtle ways from one guest 
operating system to another. So I suggest it would make sense to create 
mechanisms that allow some Xen domains to operate as memory management 
servers to groups of related domains.

In effect we create a memory manager privilege. Suppose we have a memory 
manager domain M1 with a collection of memory client domains M1-x, where 
each memory client domain has its own kernel address space KASx, a set 
of modified pages Wx and a set of shared pages Rx. Then the situation we 
want to see is this:

M1      doesn't actually execute any application code, just manages memory for its clients
M1-a    executes application code in (KASa, Ra, Wa) calling on M1 for memory management
M1-b    executes application code in (KASb, Rb, Wb) calling on M1 for memory management
M1-c    executes application code in (KASc, Rc, Wc) calling on M1 for memory management
...

There is a connection with copy-on-write storage. The execution state of 
a client domain x can be frozen as:

'Rx'  which identifies a set of pages that are shared read-only with similar clients
KASx  which is the kernel address space page set for this domain
Wx    which is the set of user address space pages that have been written to by this domain

The total long-term state of a client domain can be characterised by adding

'SRx' which identifies blocks in read-only storage that are shared with similar clients
SWx   which identifies which files in read-write storage that belong to this client domain.

It seems to me that a memory manager domain, which pretty much has to 
serve pages initially drawn from a filesystem that is shared read-only 
between its clients, is also in a position to manage copy-on-write use 
of that file system for its clients, as it already knows which blocks 
are clean and which are dirty.
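
A minimal sketch of the arrangement above, as a toy model and not as
anything Xen provides. MemoryManager stands in for M1 and ClientDomain
for an M1-x; both names, and every method on them, are hypothetical.
KASx is left out, since it is private to each client and never shared.
Each client's "shared" set plays the role of Rx and its "written" set
the role of Wx.

```python
class ClientDomain:
    """Stands in for a memory client M1-x (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.page_table = {}  # virtual page -> frame id
        self.shared = set()   # Rx: pages still shared read-only
        self.written = set()  # Wx: pages this domain has written to


class MemoryManager:
    """Stands in for M1: owns the frames, runs no application code."""

    def __init__(self):
        self.frames = {}    # frame id -> page contents
        self.refcount = {}  # frame id -> number of client mappings
        self._next = 0

    def _alloc(self, data):
        fid = self._next
        self._next += 1
        self.frames[fid] = data
        self.refcount[fid] = 0
        return fid

    def load_shared(self, data):
        """Load a page from the shared read-only filesystem."""
        return self._alloc(data)

    def map_shared(self, client, vpage, fid):
        client.page_table[vpage] = fid
        client.shared.add(vpage)
        self.refcount[fid] += 1

    def read(self, client, vpage):
        return self.frames[client.page_table[vpage]]

    def write(self, client, vpage, data):
        fid = client.page_table[vpage]
        if vpage in client.shared:
            # Copy on write: give this client a private frame and
            # leave the shared frame untouched for the others.
            self.refcount[fid] -= 1
            fid = self._alloc(data)
            self.refcount[fid] = 1
            client.page_table[vpage] = fid
            client.shared.discard(vpage)
            client.written.add(vpage)
        else:
            self.frames[fid] = data  # already private: write in place


m1 = MemoryManager()
a, b = ClientDomain("M1-a"), ClientDomain("M1-b")
text = m1.load_shared(b"httpd text")
for client in (a, b):
    m1.map_shared(client, 0, text)
m1.write(a, 0, b"patched")  # moves page 0 from Ra to Wa
```

After the write, M1-b still maps the original frame, and M1 knows from
the two sets which pages are clean and which are dirty.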

Peri





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16  9:44   ` Peri Hankey
@ 2004-11-16  9:51     ` Peri Hankey
  2004-11-16 15:27     ` urmk
  2004-11-16 18:10     ` copy on write memory Adam Heath
  2 siblings, 0 replies; 26+ messages in thread
From: Peri Hankey @ 2004-11-16  9:51 UTC (permalink / raw)
  To: Rik van Riel; +Cc: xen-devel

Sorry, I had a copy-and-pasto (variant of typo) in my previous message -

The total long-term state of a client domain can be characterised by adding

'SRx' which identifies blocks in read-only storage that are shared with 
similar clients
SWx  which is the set of blocks in read-write storage that belong to 
this client domain.






^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16  9:44   ` Peri Hankey
  2004-11-16  9:51     ` Peri Hankey
@ 2004-11-16 15:27     ` urmk
  2004-11-16 16:17       ` Mark A. Williamson
  2004-11-18 16:56       ` Peri Hankey
  2004-11-16 18:10     ` copy on write memory Adam Heath
  2 siblings, 2 replies; 26+ messages in thread
From: urmk @ 2004-11-16 15:27 UTC (permalink / raw)
  To: Peri Hankey; +Cc: Rik van Riel, xen-devel

> The notion is that there are applications of Xen where there would be 
> very many virtual computers running the same set of applications for 
> much of the time (eg standard web hosting, honeypots).
> * snip *
> In the Xen case, you don't want to reinvent and reimplement existing 
> mechanisms, especially as these may differ in subtle ways from one guest 
> operating system to another. So I suggest it would make sense to create 
> mechanisms that allow some Xen domains to operate as memory management 
> servers to groups of related domains.
> *snip*
> It seems to me that a memory manager domain, which pretty much has to 
> serve pages initially drawn from a filesystem that is shared read-only 
> between its clients, is also in a position to manage copy-on-write use 
> of that file system for its clients, as it already knows which blocks 
> are clean and which are dirty.

I'd thought I'd sent this before but it's not in the archives so I'll
send it again...  If this made it to the list once already, my 
apologies:

On the s/390 platform, we have a new filesystem called XIP2.  This is a 
shared-memory filesystem based on ext2, which can be shared among any
number of guests.  Basically you populate the XIP2 fs and then "freeze"
it and share it.

That's all pretty standard, but here comes the magic:  Any data in the
XIP2 filesystem is not copied into the cached memory of the guest.
XIP = eXecute In Place.  Binaries are run directly from the shared
memory and not cached locally, so if you throw common services and 
libraries (like apache, JVM, etc from your example) into it you get the
binaries themselves shared with basically no cost to the guests.

Not to say an automatic memory manager to determine when it could do
COW of ram isn't a good avenue to pursue as well, but XIP is a fairly
good starting point for most situations that you'd want shared memory
like this, I think.

The XIP2 source code is in the IBM patches to the kernel:
http://oss.software.ibm.com/linux390/linux-2.6.5-s390-04-april2004.shtml

and by this point it's quite likely already in the bitkeeper tree as
well, they've been pushing updates upstream.

-m




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16 15:27     ` urmk
@ 2004-11-16 16:17       ` Mark A. Williamson
  2004-11-18 16:56       ` Peri Hankey
  1 sibling, 0 replies; 26+ messages in thread
From: Mark A. Williamson @ 2004-11-16 16:17 UTC (permalink / raw)
  To: xen-devel; +Cc: urmk, Peri Hankey, Rik van Riel

> That's all pretty standard, but here comes the magic:  Any data in the
> XIP2 filesystem is not copied into the cached memory of the guest.
> XIP = eXecute In Place.  Binaries are run directly from the shared
> memory and not cached locally, so if you throw common services and
> libraries (like apache, JVM, etc from your example) into it you get the
> binaries themselves shared with basically no cost to the guests.

We've been talking about implementing something similar (though we'd talked 
about implementing it at the block level).  We'd have a shared buffer cache 
for read-only data, which domains could opt in to.  If they are sharing 
filesystem blocks (e.g. from a CoW LVM volume) the domains would get a shared 
read only mapping to the same memory.
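
A minimal sketch of the opt-in shared cache idea, as a toy model only.
SharedBufferCache, opt_in and the read_block callback are hypothetical
names, and Python's immutable bytes objects stand in for a read-only
mapping of the same memory.

```python
class SharedBufferCache:
    """Block-level cache of read-only data shared by opted-in domains."""

    def __init__(self, read_block):
        self._read_block = read_block  # callable (volume, block_no) -> bytes
        self._cache = {}               # (volume, block_no) -> cached bytes
        self._members = set()          # domains that have opted in
        self.disk_reads = 0

    def opt_in(self, domain):
        self._members.add(domain)

    def get(self, domain, volume, block_no):
        if domain not in self._members:
            # Not sharing: the domain gets its own private copy.
            self.disk_reads += 1
            return self._read_block(volume, block_no)
        key = (volume, block_no)
        if key not in self._cache:
            self.disk_reads += 1
            self._cache[key] = self._read_block(volume, block_no)
        # Every opted-in domain receives the same immutable object.
        return self._cache[key]


def backing_store(volume, block_no):
    """Pretend disk read; builds a fresh bytes object on every call."""
    return f"{volume}:{block_no}".encode()


cache = SharedBufferCache(backing_store)
cache.opt_in("dom1")
cache.opt_in("dom2")
page1 = cache.get("dom1", "root", 7)
page2 = cache.get("dom2", "root", 7)  # served from the shared cache
page3 = cache.get("dom3", "root", 7)  # dom3 never opted in
```

The two opted-in domains cost one read between them and hold the same
object; the third domain pays for its own read and its own copy.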

Cheers for the info,
Mark




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16  0:35 ` Rik van Riel
  2004-11-16  9:44   ` Peri Hankey
@ 2004-11-16 18:09   ` Adam Heath
  2004-11-16 18:39     ` Matt Ayres
  1 sibling, 1 reply; 26+ messages in thread
From: Adam Heath @ 2004-11-16 18:09 UTC (permalink / raw)
  Cc: xen-devel@lists.sourceforge.net

On Mon, 15 Nov 2004, Rik van Riel wrote:

> On Mon, 15 Nov 2004, Peri Hankey wrote:
>
> > It occurred to me that the equivalent in the Xen world would be to use
> > one Linux xenU domain purely as a page-table manager for a collection of
> > separate xenU domains that are expected or known to have similar process
> > populations.
>
> UML copy on write is only for filesystems, isn't it ?

And, since UML can use mmap access, if there are shared filesystems, it can
reduce memory pressure.  Maybe that is something that can be worked on for
xen.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16  9:44   ` Peri Hankey
  2004-11-16  9:51     ` Peri Hankey
  2004-11-16 15:27     ` urmk
@ 2004-11-16 18:10     ` Adam Heath
  2 siblings, 0 replies; 26+ messages in thread
From: Adam Heath @ 2004-11-16 18:10 UTC (permalink / raw)
  To: Peri Hankey; +Cc: xen-devel@lists.sourceforge.net

On Tue, 16 Nov 2004, Peri Hankey wrote:

> Certainly, UML is best known for copy on write filesystems. But UML's
> SKAS mode is a different way of managing memory, and that was the
> starting point for this proposal, which is about using the copy on write
> semantics of Linux memory management to share memory pages between Xen
> domains. I see now that one person's starting point may prove to be
> another's red herring.

SKAS does *not* do COW for shared memory.  That's completely separate.

You're thinking of mmap access for ubd, and for hostfs.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16 18:09   ` Adam Heath
@ 2004-11-16 18:39     ` Matt Ayres
  0 siblings, 0 replies; 26+ messages in thread
From: Matt Ayres @ 2004-11-16 18:39 UTC (permalink / raw)
  To: Adam Heath; +Cc: xen-devel@lists.sourceforge.net

On Tue, 2004-11-16 at 12:09 -0600, Adam Heath wrote:
> On Mon, 15 Nov 2004, Rik van Riel wrote:
> 
> > On Mon, 15 Nov 2004, Peri Hankey wrote:
> >
> > > It occurred to me that the equivalent in the Xen world would be to use
> > > one Linux xenU domain purely as a page-table manager for a collection of
> > > separate xenU domains that are expected or known to have similar process
> > > populations.
> >
> > UML copy on write is only for filesystems, isn't it ?
> 
> And, since UML can use mmap access, if there are shared filesystems, it can
> reduce memory pressure.  Maybe that is something that can be worked on for
> xen.
> 

If you're going to design a system like this I believe it's important to
also consider a method of re-sharing filesystems.  With UML, after a
length of time the software will no longer be shared and you see no
memory gains.  There really needs to be a way for things to be
re-linked.  VServer does this on a filesystem level by doing a name,
permissions, and SHA1 hash comparison.  I think for a CoW file-backed
disk image you'd have to do it at the block level though.  This way, if
the httpd binary gets updated (e.g. via yum/apt/up2date) on all domains
that have a shared backing store, there should be a method to re-share
that httpd binary.
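
A minimal sketch of block-level re-sharing, assuming disk images can be
read as lists of fixed-size blocks. The reshare function and its data
layout are hypothetical. It keys blocks by SHA-1 digest, and compares
the bytes as well so that a digest match alone is never trusted.

```python
import hashlib


def reshare(images):
    """Collapse identical blocks across images into one shared copy.

    images: dict mapping image name -> list of block contents (bytes).
    Returns (store, tables): store maps digest -> the single kept block,
    tables maps image name -> list of digests, one per block.
    """
    store = {}
    tables = {}
    for name, blocks in images.items():
        table = []
        for block in blocks:
            digest = hashlib.sha1(block).hexdigest()
            kept = store.setdefault(digest, block)
            if kept != block:
                # Same digest, different bytes: refuse to share.
                raise ValueError("hash collision, blocks differ")
            table.append(digest)
        tables[name] = table
    return store, tables


# Both domains upgraded httpd separately, so their copy-on-write
# overlays hold private but identical blocks.
images = {
    "dom1": [b"kernel", b"httpd-new", b"conf-1"],
    "dom2": [b"kernel", b"httpd-new", b"conf-2"],
}
store, tables = reshare(images)
```

Six blocks collapse to four stored copies: the kernel and the updated
httpd block are shared again, and only the two config blocks stay
private.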







^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-16 15:27     ` urmk
  2004-11-16 16:17       ` Mark A. Williamson
@ 2004-11-18 16:56       ` Peri Hankey
  2004-11-18 17:11         ` urmk
  1 sibling, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-18 16:56 UTC (permalink / raw)
  To: urmk; +Cc: Rik van Riel, xen-devel

It's true, you did mention it before, but I was looking for something 
else at the time. What I have in mind doesn't require so much 
configuration. On the other hand it doesn't exist, and this does.

But the patch is against quite an old source, and it doesn't compile 
straight out of the box. Do you know if there are updated patches 
against 2.6.9?

I get this error (which I haven't yet examined in detail):

  CC [M]  fs/xip2fs/file.o
fs/xip2fs/file.c: In function `xip2_do_file_read':
fs/xip2fs/file.c:69: error: structure has no member named `buf'
fs/xip2fs/file.c: In function `__xip2_file_aio_read':
fs/xip2fs/file.c:119: error: structure has no member named `buf'
fs/xip2fs/file.c: In function `xip2_file_sendfile':
fs/xip2fs/file.c:302: error: structure has no member named `buf'

This was against xen-2.0.1 as of today 18 Nov 2004

Regards
Peri




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 16:56       ` Peri Hankey
@ 2004-11-18 17:11         ` urmk
  2004-11-18 17:25           ` Keir Fraser
  2004-11-18 18:15           ` Peri Hankey
  0 siblings, 2 replies; 26+ messages in thread
From: urmk @ 2004-11-18 17:11 UTC (permalink / raw)
  To: Peri Hankey; +Cc: Rik van Riel, xen-devel

> It's true, you did mention it before, but I was looking for something 
> else at the time. What I have in mind doesn't require so much 
> configuration. On the other hand it doesn't exist, and this does.

Ah.  I couldn't remember if I'd sent it or not (or if I'd even tried to send
it from an address that was on the list; a few go to the same mailbox at
the moment).

> But the patch is against quite an old source, and it doesn't compile 
> straight out of the box. Do you know if there are updated patches 
> against 2.6.9?

I can check.

> I get this error (which I haven't yet examined in detail):
> 
>  CC [M]  fs/xip2fs/file.o
> fs/xip2fs/file.c: In function `xip2_do_file_read':
> fs/xip2fs/file.c:69: error: structure has no member named `buf'
> fs/xip2fs/file.c: In function `__xip2_file_aio_read':
> fs/xip2fs/file.c:119: error: structure has no member named `buf'
> fs/xip2fs/file.c: In function `xip2_file_sendfile':
> fs/xip2fs/file.c:302: error: structure has no member named `buf'
> 
> This was against xen-2.0.1 as of today 18 Nov 2004

I highly doubt that it will be directly applicable to xen - the entire backend
mechanism is linked into the z/VM shared memory system between guests.  I was
more pointing it out as a probable jumping off point (most of the work is done,
it just needs to use the xen memory sharing instead) and as a workable
concept for a less-cpu-intensive copy-on-write mechanism.

I'll take a look for a newer patch and see if I can scrape up some time to
apply the backend to xen, but I don't know when I'll get a chance -- don't let
me hold anyone else up who was considering working on it.

-m




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 17:11         ` urmk
@ 2004-11-18 17:25           ` Keir Fraser
  2004-11-18 18:41             ` Kip Macy
  2004-11-18 18:15           ` Peri Hankey
  1 sibling, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-18 17:25 UTC (permalink / raw)
  To: urmk; +Cc: Peri Hankey, Rik van Riel, xen-devel

> I highly doubt that it will be directly applicable to xen - the entire backend
> mechanism is linked into the z/VM shared memory system between guests.  I was
> more pointing it out as a probable jumping off point (most of the work is done,
> it just needs to use the xen memory sharing instead) and as a workable
> concept for a less-cpu-intensive copy-on-write mechanism.

Yeah, I spotted uses of hypervisor services and things called
"DCSS"s. I guessed that xip2 must be rather z/VM specific. :-)

Is there any documentation on the semantics of DCSS and the hypervisor
services that are used? It would be interesting to see if Xen ought to
have some similar concept.

 -- Keir






^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 17:11         ` urmk
  2004-11-18 17:25           ` Keir Fraser
@ 2004-11-18 18:15           ` Peri Hankey
  2004-11-19 10:35             ` Jacob Gorm Hansen
  1 sibling, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-18 18:15 UTC (permalink / raw)
  To: urmk; +Cc: Rik van Riel, xen-devel

I tried applying it in a spirit of pure optimism (naivety). You never know.

To return for one moment to the thing I was chasing. Let's assume that
in its standard configuration Linux (or indeed some other kernel)
makes a very good job of sharing memory across processes by means of the
copy-on-write fork semantics and by exploiting the overall view that it
has of how pages/blocks are read from the filesystem and modified by
processes. Let's hope that it (or they) get even better. We would like
separate xenU domains to benefit from the excellence that exists and the
improvements that may be.

My idea is to use a memory-privileged xenU domain as a page-table 
manager for a group of client domains. It would have to know about the 
memory/page/block usage of every process in each of its clients as if 
they were all running within it. But it would never operate as the 
actual kernel for any of these processes. For those purposes each 
process would operate under the supervision of a memory-client domain 
that has its own kernel address space.

I haven't at present got the intimate knowledge of memory management 
traffic between xen, xen0 and xenU domains that I would need to see 
whether or not this idea has any legs. But I am very interested to know 
what anyone thinks. No doubt the best answer is to go and study the code.

Anyway, thanks for your help, and for your quick response. Much quicker 
than mine.

Regards
Peri

urmk@reason.marist.edu wrote:

>>It's true, you did mention it before, but I was looking for something 
>>else at the time. What I have in mind doesn't require so much 
>>configuration. On the other hand it doesn't exist, and this does.
>>    
>>
>
>Ah.  I couldn't remember if I'd sent it or not (or if I'd even tried to send
>it from an address that was on the list, a few go to the same mailbox at 
>the moment)
>
>  
>
>>But the patch is against quite an old source, and it doesn't compile 
>>straight out of the box. Do you know if there are updated patches 
>>against 2.6.9?
>>    
>>
>
>I can check.
>
>  
>
>>I get this error (which I haven't yet examined in detail):
>>
>> CC [M]  fs/xip2fs/file.o
>>fs/xip2fs/file.c: In function `xip2_do_file_read':
>>fs/xip2fs/file.c:69: error: structure has no member named `buf'
>>fs/xip2fs/file.c: In function `__xip2_file_aio_read':
>>fs/xip2fs/file.c:119: error: structure has no member named `buf'
>>fs/xip2fs/file.c: In function `xip2_file_sendfile':
>>fs/xip2fs/file.c:302: error: structure has no member named `buf'
>>
>>This was against xen-2.0.1 as of today 18 Nov 2004
>>    
>>
>
>I highly doubt that it will be directly applicable to xen - the entire backend
>mechanism is linked into the z/VM shared memory system between guests.  I was
>more pointing it out as a probable jumping off point (most of the work is done,
>it just needs to use the xen memory sharing instead) and as a workable
>concept for a less-cpu-intensive copy-on-write mechanism.
>
>I'll take a look for a newer patch and see if I can scrape up some time to
>apply the backend to xen, but I don't know when I'll get a chance -- don't let
>me hold anyone else up who was considering working on it.
>
>-m
>
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 17:25           ` Keir Fraser
@ 2004-11-18 18:41             ` Kip Macy
  2004-11-18 18:55               ` Keir Fraser
  0 siblings, 1 reply; 26+ messages in thread
From: Kip Macy @ 2004-11-18 18:41 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

>
> Is there any documentation on the semantics of DCSS and the hypervisor
> services that are used? It would be interesting to see if Xen ought to
> have some similar concept.

This doesn't really answer your question - but the closest you're likely
to get is at their pubs:

http://www.vm.ibm.com/pubs/

>
> > I'll take a look for a newer patch and see if I can scrape up some time to
> > apply the backend to xen, but I don't know when I'll get a chance -- don't let
> > me hold anyone else up who was considering working on it.
> >
> > -m
> >
> >
> >
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 18:41             ` Kip Macy
@ 2004-11-18 18:55               ` Keir Fraser
  2004-11-18 19:16                 ` Kip Macy
  0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-18 18:55 UTC (permalink / raw)
  To: Kip Macy; +Cc: Keir Fraser, xen-devel

> >
> > Is there any documentation on the semantics of DCSS and the hypervisor
> > services that are used? It would be interesting to see if Xen ought to
> > have some similar concept.
> 
> This doesn't really answer your question - but the closest you're likely
> to get is at their pubs:
> 
> http://www.vm.ibm.com/pubs/

There's a manual called "How to Improve the Performance of Linux on
z/VM with Execute-In-Place Technology", which is easily googled for.
Turns out that DCSSs are pretty simple things -- just a blob of
physmem that is mapped into phys address space of every Linux
instance, and contains a read-only filesystem image. xip2fs is a
simple read-only fs that serves mmap() requests directly out of the
shared DCSS rather than the private block cache. Looks tedious to
manage because the whole filesystem takes up space in every Linux
memory map all the time, no matter whether a particular file/block is
being used. So you have to be careful to put only frequently accessed
and highly shared files in the filesystem and use twisted symlinks to
link in from their usual location into the mounted xip2fs filesystem. 
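
To make the arrangement described above concrete, here is a rough Python
sketch. It is only an analogy, not z/VM or xip2fs code: a temporary file
stands in for the DCSS, and the names build_image and Guest are made up
for illustration. Each "guest" maps the whole image read-only, whether or
not it ever touches a given file, and reads are served straight from that
mapping.

```python
import mmap
import os
import tempfile


def build_image(files):
    """Pack files back to back into one image; return (path, index).

    index maps a file name to its (offset, length) inside the image.
    """
    index = {}
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as image:
        offset = 0
        for name, data in files.items():
            image.write(data)
            index[name] = (offset, len(data))
            offset += len(data)
    return path, index


class Guest:
    """A guest that maps the entire shared image read-only at start-up."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole image, used or not.
        self.segment = mmap.mmap(self._file.fileno(), 0,
                                 access=mmap.ACCESS_READ)

    def read(self, index, name):
        """Serve a read directly out of the shared mapping."""
        offset, length = index[name]
        return self.segment[offset:offset + length]

    def close(self):
        self.segment.close()
        self._file.close()
```

The cost mentioned above shows up as len(guest.segment): it is the size
of the whole image for every guest, however few files that guest reads.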

 -- Keir



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 18:55               ` Keir Fraser
@ 2004-11-18 19:16                 ` Kip Macy
  0 siblings, 0 replies; 26+ messages in thread
From: Kip Macy @ 2004-11-18 19:16 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

It doesn't sound like the performance benefit would justify the added
management complexity for xen.

				-Kip

On Thu, 18 Nov 2004, Keir Fraser wrote:

> > >
> > > Is there any documentation on the semantics of DCSS and the hypervisor
> > > services that are used? It would be interesting to see if Xen ought to
> > > have some similar concept.
> >
> > This doesn't really answer your question - but the closest you're likely
> > to get is at their pubs:
> >
> > http://www.vm.ibm.com/pubs/
>
> There's a manual called "How to Improve the Performance of Linux on
> z/VM with Execute-In-Place Technology", which is easily googled for.
> Turns out that DCSSs are pretty simple things -- just a blob of
> physmem that is mapped into phys address space of every Linux
> instance, and contains a read-only filesystem image. xip2fs is a
> simple read-only fs that serves mmap() requests directly out of the
> shared DCSS rather than the private block cache. Looks tedious to
> manage because the whole filesystem takes up space in every Linux
> memory map all the time, no matter whether a particular file/block is
> being used. So you have to be careful to put only frequently accessed
> and highly shared files in the filesystem and use twisted symlinks to
> link in from their usual location into the mounted xip2fs filesystem.
>
>  -- Keir
>
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-18 18:15           ` Peri Hankey
@ 2004-11-19 10:35             ` Jacob Gorm Hansen
  2004-11-19 10:59               ` Keir Fraser
  0 siblings, 1 reply; 26+ messages in thread
From: Jacob Gorm Hansen @ 2004-11-19 10:35 UTC (permalink / raw)
  To: Peri Hankey; +Cc: urmk, Rik van Riel, xen-devel

Peri Hankey wrote:

> My idea is to use a memory-privileged xenU domain as a page-table 
> manager for a group of client domains. It would have to know about the 
> memory/page/block usage of every process in each of its clients as if 
> they were all running within it. But it would never operate as the 
> actual kernel for any of these processes. For those purposes each 
> process would operate under the supervision of a memory-client domain 
> that has its own kernel address space.
> 
> I haven't at present got the intimate knowledge of memory management 
> traffic between xen, xen0 and xenU domains that I would need to see 
> whether or not this idea has any legs. But I am very interested to know 
> what anyone thinks. No doubt the best answer is to go and study the code.

I think the main point distinguishing the design of Xen from that of 
older microkernels (such as L4) is that there is no memory management 
traffic between domains, and it should probably stay that way to retain 
the good performance and performance isolation that Xen has to offer.

I would like to re-plug the idea I proposed earlier, on the topic of 
sharing read-only pages across domains:

Create a new variant of the HYPERVISOR_update_va_mapping (or of the 
batched variant, or both), let's call it HYPERVISOR_make_ro_and_share.

When Xen gets this hypercall, it does the following:

- Turn the mapping into a read-only mapping.

- Look up the contents of the machine page in the mapping in a 
machine-global hash table or other suitable type of associative data 
structure.

- If there is a match, make the user's mapping point to the matching 
page. Also increase a reference/sharing count of the matching page.

- If there is no match, insert the user's page into the hash table.

- Revoke the user's ownership of the old page.

If the mapping is write-accessed inside the user's domain, the user will 
be responsible for CoWing the page into a newly allocated one. As an 
optimisation, one of the AVL (software-available) bits in the page-table 
entry can be used for tracking this type of fault.  When you update the 
mapping, Xen will decrease the reference/sharing count of the previously 
shared page and, if the refcount reaches zero, free the page and remove 
it from the hash table.
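
The steps above can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not Xen code: SharingTable and Domain are
hypothetical names, a Python dict keyed by page contents stands in for
the machine-global hash table, and "pages" are just byte strings.

```python
class SharingTable:
    """Machine-global table: page contents -> reference/sharing count."""

    def __init__(self):
        self.refcount = {}

    def share(self, contents):
        # Match: bump the count. No match: insert the page with count 1.
        self.refcount[contents] = self.refcount.get(contents, 0) + 1
        return contents

    def release(self, contents):
        self.refcount[contents] -= 1
        if self.refcount[contents] == 0:
            # Last sharer gone: free the page and drop it from the table.
            del self.refcount[contents]


class Domain:
    def __init__(self, table):
        self.table = table
        self.private = {}  # va -> bytearray, exclusively owned and writable
        self.shared = {}   # va -> bytes, read-only mapping of a shared page

    def load(self, va, contents):
        self.private[va] = bytearray(contents)

    def make_ro_and_share(self, va):
        # The domain gives up ownership of its old page here.
        page = bytes(self.private.pop(va))
        self.shared[va] = self.table.share(page)

    def read(self, va):
        if va in self.shared:
            return self.shared[va]
        return bytes(self.private[va])

    def write(self, va, offset, value):
        if va in self.shared:
            # Write fault on a read-only mapping: the guest CoWs the page
            # into a newly allocated one, then the old reference is dropped.
            contents = self.shared.pop(va)
            self.private[va] = bytearray(contents)
            self.table.release(contents)
        self.private[va][offset] = value
```

Two domains that share identical pages end up with a single table entry
of count 2; a write in one of them drops the count back to 1 and leaves
the other domain's view unchanged.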

Jacob



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-19 10:35             ` Jacob Gorm Hansen
@ 2004-11-19 10:59               ` Keir Fraser
  2004-11-19 12:02                 ` Jacob Gorm Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-19 10:59 UTC (permalink / raw)
  To: Jacob Gorm Hansen; +Cc: Peri Hankey, urmk, Rik van Riel, xen-devel


This would end up pushing policy into Xen -- what happens when memory
is fully committed, some domain has given up a bunch of his
exclusively-owned pages by buying into the shared table, and now he
has a slew of CoW faults and wants to get some of his exclusive pages
back from Xen, thank you very much?

At this point Xen needs some reclamation policy (saying that Xen will
guarantee to have enough pages around to satisfy these requests is not
possible, since the point of the sharing is to be able to
"over-reserve" memory). It needs to decide which pages to reclaim,
then have a mechanism for reclaiming them which will probably involve
communicating up to the domains concerned in advance and setting
timeouts by when they must relinquish their mappings.

This is the kind of thing I would prefer to implement outside Xen.

 -- Keir

> I would like to re-plug the idea I proposed earlier, on the topic of 
> sharing read-only pages across domains:
> 
> Create a new variant of the HYPERVISOR_update_va_mapping (or of the 
> batched variant, or both), let's call it HYPERVISOR_make_ro_and_share.
> 
> When Xen gets this hypercall, it does the following:
> 
> - Turn the mapping into a read-only mapping.
> 
> - Look up the contents of the machine page in the mapping in a 
> machine-global hash table or other suitable type of associative data 
> structure.
> 
> - If there is a match, make the user's mapping point to the matching 
> page. Also increase a reference/sharing count of the matching page.
> 
> - If there is no match, insert the user's page into the hash table.
> 
> - Revoke the user's ownership of the old page.
> 
> If the mapping is write-accessed inside the user's domain, the user will 
> be responsible for CoWing the page into a newly allocated one. As an 
> optimisation, one of the AVL bits can be used for tracking this type of 
> fault.  When you update the mapping, Xen will decrease the 
> reference/sharing count of the previously shared page, and if the 
> refcount reaches zero free the page, and remove it from the hash table.
> 
> Jacob
> 
> 




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-19 10:59               ` Keir Fraser
@ 2004-11-19 12:02                 ` Jacob Gorm Hansen
  2004-11-19 14:50                   ` Keir Fraser
  0 siblings, 1 reply; 26+ messages in thread
From: Jacob Gorm Hansen @ 2004-11-19 12:02 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Peri Hankey, urmk, Rik van Riel, xen-devel

Keir Fraser wrote:
> This would end up pushing policy into Xen -- what happens when memory
> is fully committed, some domain has given up a bunch of his
> exclusively-owned pages by buying into the shared table, and now he
> has a slew of CoW faults and wants to get some of his exclusive pages
> back from Xen, thankyou very much?
> 
> At this point Xen needs some reclamation policy (saying that Xen will
> guarantee to have enough pages around to satisfy these requests is not
> possible, since the point of the sharing is to be able to
> "over-reserve" memory). It needs to decide which pages to reclaim,
> then have a mechanism for reclaiming them which will probably involve
> communicating up to the domains concerned in advance and setting
> timeouts by when they must relinquish their mappings.
> 
> This is the kind of thing I would prefer to implement outside Xen.

Could the same thing not work using an event-channel rather than a 
hypercall then?  I guess you basically do the same when giving your 
pages away for a driver to fill them up with data?

My main point is that the domains have better knowledge about what pages 
are likely to be shareable than dom0 or Xen has, and so should volunteer 
to share them, and somehow be rewarded.

The problem of reclamation-policy will exist for any solution that 
over-reserves memory, including the transparent VMWare system. For some 
pages, like the guest OS kernel text area, it would be ok to remove 
these pages from the domain's allowance for good -- it will not need to 
CoW these, and the domain builder could simply build that part of the 
domain from shared pages.

Perhaps this should just be a one-way street, you give up pages to be 
nice to others (and get cheaper hosting or whatever kind of reward you 
can think of in return), and then you lose the right to write to them 
for good.  Should you need more writable pages, you will have to re-grow 
your reservation, and if that fails you will need to flush some slabs or 
buffer caches or page stuff to disk or whatever you do in Linux when 
you have memory pressure.  Ultimately you may want to migrate to a less 
loaded machine.

It seems to me any other kind of solution will allow a malicious domain 
to affect the performance of innocent domains by repeatedly sharing and 
unsharing its pages (whether by explicit hypercall or by placing popular 
vs random data in them).

Jacob



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-19 12:02                 ` Jacob Gorm Hansen
@ 2004-11-19 14:50                   ` Keir Fraser
  2004-11-22 12:42                     ` Jacob Gorm Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-19 14:50 UTC (permalink / raw)
  To: Jacob Gorm Hansen; +Cc: Keir Fraser, Peri Hankey, urmk, Rik van Riel, xen-devel

> Could the same thing not work using an event-channel rather than a 
> hypercall then?  I guess you basically do the same when giving your 
> pages away for a driver to fill them up with data?
> My main point is that the domains have better knowledge about what pages 
> are likely to be shareable than dom0 or Xen has, and so should volunteer 
> to share them, and somehow be rewarded.

Equally, a centralised "buffer cache" domain can see request traffic
and observe empirically what pages are most beneficial to share. :-)
Both ways round could be interesting to experiment with though.

> The problem of reclamation-policy will exist for any solution that 
> over-reserves memory, including the transparent VMWare system. For some 
> pages, like the guest OS kernel text area, it would be ok to remove 
> these pages from the domain's allowance for good -- it will not need to 
> CoW these, and the domain builder could simply build that part of the 
> domain from shared pages.

Well, you can also over-commit on stuff that is read-only and fault it in
on demand, just as you can demand-CoW writable stuff; e.g., there is no
need to have all of kernel or glibc in memory all the time -- only hot
parts of both will be in use by the system at any time.

i.e.,
 1. Read accesses fault in a shareable page where there was no page.
 2. Write accesses fault from a shareable page to a shareable page
    plus an exclusive page.
 Both of these require extra allocation of memory.
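
A small accounting sketch of the two fault cases, in Python. It is an
illustration only: Host, OutOfMemory and the page identifiers are made-up
names, and the model counts pages rather than managing real memory. The
point it shows is that each case takes a page from the free pool, so an
over-committed host can run dry on an ordinary fault.

```python
class OutOfMemory(Exception):
    """Raised when a fault cannot be satisfied from the free pool."""


class Host:
    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.resident = set()  # shareable pages currently in memory

    def _allocate(self):
        if self.free_pages == 0:
            raise OutOfMemory("no free page to satisfy the fault")
        self.free_pages -= 1

    def read_fault(self, page_id):
        """Case 1: no page -> shareable page."""
        if page_id not in self.resident:
            self._allocate()
            self.resident.add(page_id)

    def write_fault(self, page_id):
        """Case 2: shareable page -> shareable page + exclusive page."""
        self.read_fault(page_id)  # the shareable copy must be resident
        self._allocate()          # the faulting guest's exclusive copy
```

With Host(2), a first read of a page costs one page, a second read of
the same page by another guest costs nothing, one write fault on it
takes the last free page, and the next write fault raises OutOfMemory.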

> Perhaps this should just be a one-way street, you give up pages to be 
> nice to others (and get cheaper hosting or whatever kind of reward you 
> can think of in return), and then you lose the right to write to them 
> for good.  Should you need more writable pages, you will have to re-grow 
> your reservation, and if that fails you will need to flush some slabs or 
> buffer caches or page stuff to disk or whatever you do in Linux when 
> you have memory pressure.  Ultimately you may want to migrate to a less 
> loaded machine.

It's another way of looking at the problem (end-to-end style I
suppose). Potentially worth investigating. :-)

 Cheers,
 Keir



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: copy on write memory
  2004-11-19 14:50                   ` Keir Fraser
@ 2004-11-22 12:42                     ` Jacob Gorm Hansen
  2004-11-25 15:01                       ` of cows and clones: creating domains as clones of saved state Peri Hankey
  0 siblings, 1 reply; 26+ messages in thread
From: Jacob Gorm Hansen @ 2004-11-22 12:42 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel

Keir Fraser wrote:
> 
> Well, you also can over-commit on stuff that is read-only and fault in
> on demand, just as you can demand-CoW writable stuff e.g., no need to
> have all of kernel or glibc in memory all the time -- only hot parts
> of both will be in use by the system at any time.
> 
> i.e., 
>  1. There is fault in from no page -> shareable page on read accesses.
>  2. There is fault from shareable page -> shareable page + exclusive
>     page on write accesses. 
>  Both of these require extra allocation of memory.

But I will need some external service to give them to me right when I 
need them, or I may run into the 'paged-the-pager' problem and die?

>>Perhaps this should just be a one-way street, you give up pages to be 
>>nice to others (and get cheaper hosting or whatever kind of reward you 
>>can think of in return), and then you lose the right to write to them 
>>for good.  Should you need more writable pages, you will have to re-grow 
>>your reservation, and if that fails you will need to flush some slabs or 
>>buffer caches or page stuff to disk or whatever you do in Linux when 
>>you have memory pressure.  Ultimately you may want to migrate to a less 
>>loaded machine.
> 
> 
> It's another way of looking at the problem (end-to-end style I
> suppose). Potentially worth investigating. :-)

Perhaps I will have a go at some point. If going in this direction 
perhaps it will make sense to do this in Xen anyway.

Jacob



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. 
http://productguide.itmanagersjournal.com/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* of cows and clones: creating domains as clones of saved state
  2004-11-22 12:42                     ` Jacob Gorm Hansen
@ 2004-11-25 15:01                       ` Peri Hankey
  2004-11-25 21:19                         ` Keir Fraser
  0 siblings, 1 reply; 26+ messages in thread
From: Peri Hankey @ 2004-11-25 15:01 UTC (permalink / raw)
  To: xen-devel

Xen can already save the state of an established domain and restore it 
(in theory at least - so far it crashes for me).  It seems to me that 
most of the objectives discussed for a copy-on-write memory system can 
be achieved by providing a mechanism for creating domains by cloning 
them from an initial state that is shared read-only between them with a 
copy-on-write mapping for each.

In this scheme a collection of clone domains can be created by starting 
from the saved state X of some original domain, which may be configured 
specifically for this purpose. The saved state X encapsulates the memory 
and filesystem state of the originating domain. The memory and 
filesystem components of the state X are shared read-only between the 
clone domains of X, with each clone domain superimposing its own 
copy-on-write mapping of the memory and filesystem states.

When a clone domain is started, it immediately reconfigures itself as a 
distinct independent domain by writing its own configuration data to the 
copy-on-write mapping of its initial memory and filesystem state.

Once it has reconfigured itself, the state of a clone domain is the 
state represented by the copy-on-write mapping of its private data 
as an overlay on the shared state from which it was first created.
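
As a rough illustration of the overlay idea (a Python sketch, not a
proposal for how xen would implement it): the saved state X is modelled
as a read-only mapping from page names to contents, and each clone is a
private overlay on top of it. The function names save_state, clone and
private_pages are invented for this example; ChainMap and
MappingProxyType are from the Python standard library.

```python
from collections import ChainMap
from types import MappingProxyType


def save_state(pages):
    """Freeze a domain's pages into a read-only saved state X."""
    return MappingProxyType(dict(pages))


def clone(saved_state):
    """Create a clone: an empty private overlay on top of the shared state."""
    # ChainMap sends every write to the first mapping (the overlay) and
    # falls through to the shared state for pages never written.
    return ChainMap({}, saved_state)


def private_pages(domain):
    """Pages the clone has written, i.e. its copy-on-write overlay."""
    return domain.maps[0]
```

A clone that writes its own hostname page sees the new value, while the
saved state and every other clone still see the original. Pages that are
never written are the very same objects in all clones.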

This scheme has the added merit that creating a new domain of a 
particular kind is simply a matter of creating a new clone domain and 
reconfiguring it as an independent domain. Creating a new domain should 
take not much longer than restarting a migrating domain in a different 
machine.

Once it has reconfigured itself as an independent domain, each clone 
domain operates in the same way as any other domain. In particular, each 
has a fixed allocation of memory pages, but these are available to it 
over and above the pages that it shares read-only with other clones of 
the same initial state.

As far as I can see, the only mechanism that is required from xen is a 
mechanism for sharing memory pages read-only between domains with a 
copy-on-write mapping for each domain. It may be that the copy-on-write 
part of this mechanism is best handled by each guest operating system, 
although it would obviously be best to provide that as a service of the 
xen system itself.

There is an implied requirement that a clone domain created from a state 
X can only migrate to a machine where the shared state X is available.

Regards
Peri Hankey







^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: of cows and clones: creating domains as clones of saved state
  2004-11-25 15:01                       ` of cows and clones: creating domains as clones of saved state Peri Hankey
@ 2004-11-25 21:19                         ` Keir Fraser
  2004-11-25 22:13                           ` Peri Hankey
  0 siblings, 1 reply; 26+ messages in thread
From: Keir Fraser @ 2004-11-25 21:19 UTC (permalink / raw)
  To: Peri Hankey; +Cc: xen-devel

> Xen can already save the state of an established domain and restore it 
> (in theory at least - so far it crashes for me).

How does it crash for you? I can save/restore a 2.6.9-xenU domain
built from latest 2.0-testing tree with no problems.

 -- Keir



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: of cows and clones: creating domains as clones of saved state
  2004-11-25 21:19                         ` Keir Fraser
@ 2004-11-25 22:13                           ` Peri Hankey
  2004-11-25 22:36                             ` Keir Fraser
  2004-11-25 22:37                             ` Ian Pratt
  0 siblings, 2 replies; 26+ messages in thread
From: Peri Hankey @ 2004-11-25 22:13 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

Keir

You had in fact seen this in an earlier thread: 'crash when domain is 
restored', which was confirmed as occurring by Charles Coffing.  You 
suggested I add instrumentation. I haven't yet had a chance to do that, 
but I'll first try it with xen-unstable.

But I'll be interested to hear whether you think there is any mileage in 
my clone proposal.

Regards
Peri

your comment:

>Looks easy to fix. You might want to instrument up time_resume in
>arch/xen/i386/kernel/time.c, or I may find some time to look later in
>the week.
>
> -- Keir
>

my message:

> I hadn't yet investigated save/restore, but it seems not to work for 
> me. Running Mandrakelinux 10.1 (ish - updated not reinstalled) on an 
> AMD ATHLON 2400 I have a crash restoring the domain:
>
>    root@xen0# xm save aaa aaa.saved
>    ...
>    root@xen0# xm restore aaa.saved
>
> The console output is as follows:
>
> Unable to handle kernel paging request at virtual address 0000c000
> printing eip:
> *pde = ma 00000000 pa 55555000
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: loop
> CPU:    0
> EIP:    0061:[<c0311000>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.9-xenU)
> EIP is at early_serial_init+0x1c/0x1d5
> eax: c02ca510   ebx: 0000c000   ecx: fbffc000   edx: 00000001
> esi: 00000010   edi: c0102000   ebp: 00000000   esp: c10ebf20
> ds: 0069   es: 0069   ss: 0069
> Process events/0 (pid: 3, threadinfo=c10ea000 task=c10d9020)
> Stack: c010e7c8 00000000 c0109e4d fbffc000 000001d3 00000063 c36a6000 
> c10ea000
>       00000000 c02c9580 00000000 c012cd23 00000000 c10ebf74 00000000 
> c10ad278
>       c0109f4f c10ea000 c10ad268 ffffffff ffffffff 00000001 00000000 
> c01195c6
> Call Trace:
> [<c010e7c8>] time_resume+0x12/0x52
>
> [<c0109e4d>] __do_suspend+0x1a0/0x1e1
>
> [<c012cd23>] worker_thread+0x1ea/0x2fe
>
> [<c0109f4f>] __shutdown_handler+0x0/0x48
>
> [<c01195c6>] default_wake_function+0x0/0x12
>
> [<c01195c6>] default_wake_function+0x0/0x12
>
> [<c012cb39>] worker_thread+0x0/0x2fe
>
> [<c0130f94>] kthread+0xa8/0xde
>
> [<c0130eec>] kthread+0x0/0xde
>
> [<c010f091>] kernel_thread_helper+0x5/0xb
>
> Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <c0> 
> 3b 39 98 00 00 00 02 00 00 1c 7e 00 00 00 00 00 00 00 00 00



Keir Fraser wrote:

>>Xen can already save the state of an established domain and restore it 
>>(in theory at least - so far it crashes for me).
>>    
>>
>
>How does it crash for you? I can save/restore a 2.6.9-xenU domain
>built from latest 2.0-testing tree with no problems.
>
> -- Keir
>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: of cows and clones: creating domains as clones of saved state
  2004-11-25 22:13                           ` Peri Hankey
@ 2004-11-25 22:36                             ` Keir Fraser
  2004-11-25 22:37                             ` Ian Pratt
  1 sibling, 0 replies; 26+ messages in thread
From: Keir Fraser @ 2004-11-25 22:36 UTC (permalink / raw)
  To: Peri Hankey; +Cc: Keir Fraser, xen-devel

> You had in fact seen this in an earlier thread: 'crash when domain is 
> restored', which was confirmed as occurring by Charles Coffing.  You 
> suggested I add instrumentation. I haven't yet had a chance to do that, 
> but I'll first try it with xen-unstable.
> 
> But I'll be interested to hear whether you think there is any mileage in 
> my clone proposal.

It sounds rather like VM forking, except you want to be able to can
the base image for later re-instantiation. I guess you would create
the in-memory read-only base VM on demand from the canned image when
the first CoW VM is created, and garbage-collect it when the last CoW
VM is destroyed.
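
A minimal sketch of that lifecycle, assuming a simple reference count per base image. The names `BaseImagePool`, `acquire`, `release` and the `loader` callback are hypothetical and are not part of Xen or xend; the "image" here is just whatever object the loader returns.

```python
class BaseImagePool:
    """Hypothetical manager for read-only base images shared by CoW clones."""

    def __init__(self, loader):
        # loader: callable taking an image name and returning the in-memory
        # base image built from the canned (saved) state. Assumed, not a Xen API.
        self._loader = loader
        self._images = {}  # name -> (image, number of clones using it)

    def acquire(self, name):
        """Called when a CoW clone is created; loads the base on first use."""
        image, refs = self._images.get(name, (None, 0))
        if refs == 0:
            image = self._loader(name)
        self._images[name] = (image, refs + 1)
        return image

    def release(self, name):
        """Called when a CoW clone is destroyed; drops the base after last use."""
        image, refs = self._images[name]
        if refs == 1:
            del self._images[name]  # last clone gone: garbage-collect the base
        else:
            self._images[name] = (image, refs - 1)

    def resident(self, name):
        """True while at least one clone still references the base image."""
        return name in self._images
```

The canned image itself stays on disk throughout, so a later `acquire` after the pool has dropped the base simply loads it again.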

The idea of taking a small basic set of VM images and customising
their configuration when instantiating them is sane. For the memory
sharing we would like to investigate a more general mechanism (e.g.,
shared buffer cache indexed by content hash) which would optimise
memory usage not just in your scenario but also in a whole bunch of
others. 
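
As an illustration only, a store of pages indexed by content hash might look like the sketch below. It is a toy model under stated assumptions (SHA-256 as the hash, pages as immutable byte strings, a hypothetical `SharedPageStore` class), not a description of any existing or planned Xen mechanism.

```python
import hashlib


class SharedPageStore:
    """Toy page store: identical pages are kept once, keyed by content hash."""

    def __init__(self):
        self._pages = {}  # digest -> [page bytes, reference count]

    def put(self, data):
        """Store a page (or share an existing identical one); return its key."""
        digest = hashlib.sha256(data).hexdigest()
        entry = self._pages.get(digest)
        if entry is None:
            self._pages[digest] = [bytes(data), 1]
        else:
            entry[1] += 1
        return digest

    def get(self, digest):
        return self._pages[digest][0]

    def drop(self, digest):
        """Release one reference; free the page when nobody uses it."""
        entry = self._pages[digest]
        entry[1] -= 1
        if entry[1] == 0:
            del self._pages[digest]

    def write(self, digest, offset, patch):
        """Copy on write: the shared page is never modified in place.

        A patched copy is stored under its own hash and the writer's
        reference to the old page is released. Returns the new key.
        """
        old = self._pages[digest][0]
        new = old[:offset] + patch + old[offset + len(patch):]
        new_digest = self.put(new)
        self.drop(digest)
        return new_digest

    def unique_pages(self):
        return len(self._pages)
```

Because the index is the page content rather than its origin, two domains that never shared a parent image would still share any identical pages, which is what makes this more general than cloning from a common base.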

 -- Keir



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: of cows and clones: creating domains as clones of saved state
  2004-11-25 22:13                           ` Peri Hankey
  2004-11-25 22:36                             ` Keir Fraser
@ 2004-11-25 22:37                             ` Ian Pratt
  1 sibling, 0 replies; 26+ messages in thread
From: Ian Pratt @ 2004-11-25 22:37 UTC (permalink / raw)
  To: Peri Hankey; +Cc: Keir Fraser, xen-devel, Ian.Pratt


> You had in fact seen this in an earlier thread: 'crash when domain is 
> restored', which was confirmed as occurring by Charles Coffing.  You 
> suggested I add instrumentation. I haven't yet had a chance to do that, 
> but I'll first try it with xen-unstable.
> 
> But I'll be interested to hear whether you think there is any mileage in 
> my clone proposal.

This has been on the research roadmap for some time.

If you want something sooner, an approximation to the same effect
can be achieved without CoW memory, simply using CoW disk and the
existing save/restore/migrate code.

The only extra code that's required is to hook into Xen's current
resume code such that it fakes out e.g. an ACPI resume event so
that user space code gets to run to cope with the change of IP
address etc. You'll need to hack the xend migrate code such that
it doesn't kill the previous domain after migrating it, sets up
the new CoW volume and configures the new domain's devices
appropriately.
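
To make the CoW disk part concrete, here is a rough sketch of the idea only: each clone reads from a shared, unmodified base volume and keeps its own writes in a private delta. The `CowVolume` class is hypothetical; it does not model xend, the migrate code, or any particular volume manager.

```python
class CowVolume:
    """Toy copy-on-write volume: shared read-only base plus private delta."""

    def __init__(self, base):
        # base: sequence of equal-sized byte blocks, shared by all clones
        # and never written through this class.
        self._base = base
        self._delta = {}  # block index -> this clone's private copy

    def read(self, index):
        """Return the private copy if the block was written, else the base."""
        if index in self._delta:
            return self._delta[index]
        return self._base[index]

    def write(self, index, data):
        """Record the write privately; the base block is left untouched."""
        if len(data) != len(self._base[index]):
            raise ValueError("block size mismatch")
        self._delta[index] = bytes(data)

    def private_blocks(self):
        """Number of blocks this clone has diverged from the base."""
        return len(self._delta)
```

Each restored clone would get its own `CowVolume` over the same base, so only the blocks a clone actually changes (for example its network configuration) cost extra storage.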

Ian



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2004-11-25 22:37 UTC | newest]

Thread overview: 26+ messages
2004-11-15 22:01 copy on write memory Peri Hankey
2004-11-16  0:35 ` Rik van Riel
2004-11-16  9:44   ` Peri Hankey
2004-11-16  9:51     ` Peri Hankey
2004-11-16 15:27     ` urmk
2004-11-16 16:17       ` Mark A. Williamson
2004-11-18 16:56       ` Peri Hankey
2004-11-18 17:11         ` urmk
2004-11-18 17:25           ` Keir Fraser
2004-11-18 18:41             ` Kip Macy
2004-11-18 18:55               ` Keir Fraser
2004-11-18 19:16                 ` Kip Macy
2004-11-18 18:15           ` Peri Hankey
2004-11-19 10:35             ` Jacob Gorm Hansen
2004-11-19 10:59               ` Keir Fraser
2004-11-19 12:02                 ` Jacob Gorm Hansen
2004-11-19 14:50                   ` Keir Fraser
2004-11-22 12:42                     ` Jacob Gorm Hansen
2004-11-25 15:01                       ` of cows and clones: creating domains as clones of saved state Peri Hankey
2004-11-25 21:19                         ` Keir Fraser
2004-11-25 22:13                           ` Peri Hankey
2004-11-25 22:36                             ` Keir Fraser
2004-11-25 22:37                             ` Ian Pratt
2004-11-16 18:10     ` copy on write memory Adam Heath
2004-11-16 18:09   ` Adam Heath
2004-11-16 18:39     ` Matt Ayres
