Improving domU restore time

* Improving domU restore time
@ 2010-05-25 10:35 Rafal Wojtczuk
  2010-05-25 10:58 ` Joanna Rutkowska
  2010-05-25 11:50 ` Keir Fraser
  0 siblings, 2 replies; 18+ messages in thread
From: Rafal Wojtczuk @ 2010-05-25 10:35 UTC (permalink / raw)
  To: xen-devel

Hello,
I would be grateful for comments on possible methods to improve domain
restore performance, focusing on the PV case, if it matters.
1) xen-4.0.0
I see a problem similar to the one reported in the thread at
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00677.html

Dom0 is 2.6.32.9-7.pvops0 x86_64, xen-4.0.0 x86_64. 
[user@qubes ~]$ xm create /dev/null
	kernel=/boot/vmlinuz-2.6.32.9-7.pvops0.qubes.x86_64 
	root=/dev/mapper/dmroot extra="rootdelay=1000" memory=400
...wait a second...
[user@qubes ~]$ xm save null nullsave
[user@qubes ~]$ time cat nullsave >/dev/null
...
[user@qubes ~]$ time cat nullsave >/dev/null
...
[user@qubes ~]$ time cat nullsave >/dev/null
real    0m0.173s
user    0m0.010s
sys     0m0.164s
/* sits nicely in the cache, let's restore... */
[user@qubes ~]$ time xm restore nullsave
real    0m9.189s
user    0m0.151s
sys     0m0.039s

According to systemtap, xc_restore uses 3.812s of CPU time; besides that being
a lot, what uses the remaining 6s? Just as reported previously, there are
some errors in xend.log:

[2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0,
_static_max=0x19000000, _static_min=0x0, 
[2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:305) [xc_restore]:
/usr/lib64/xen/bin/xc_restore 39 3 1 2 0 0 0 0
[2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) xc_domain_restore
start: p2m_size = 19000
[2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) Reloading memory pages:
0%
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error:
Error when reading batch size
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error:
error when buffering batch, finishing
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) 
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) 100%
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Memory reloaded (0
pages)
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) read VCPU 0
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Completed checkpoint
load
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Domain ready to be
built.
[2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Restore exit with rc=0

Note, xc_restore on xen-3.4.3 works much faster (and with no warnings in the
log), with the same dom0 pvops kernel.

Ok, so there is some issue here. Some more generic thoughts below.

2) xen-3.4.3
Firstly, /etc/xen/scripts/block in xen-3.4.3 tries to do something like
for i in /dev/loop* ; do
	losetup $i
done
so it spawns one losetup process for each existing /dev/loopX; this hogs the
CPU, especially if your system comes with maxloops=255 :). So,
let's replace it with the xen-4.0.0 version, where this problem is fixed (it
uses losetup -a, hurray).
Then, restore time for a 400MB domain, with the restore file in the cache,
with 4 vbds backed by /dev/loopX, with one vif, is ca 2.7s real time.
According to systemtap, the CPU time requirements are
xend threads- 0.363s
udevd(in dom0) - 0.007s
/etc/xen/scripts/block and its children - 1.075s
xc_restore - 1.368s
/etc/xen/scripts/vif-bridge (in netvm) - 0.130s

The obvious idea to improve /etc/xen/scripts/block shell script execution time 
is to recode it, in some other language that will not spawn hundreds of 
processes to do its job.
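
As a rough illustration of the difference (not the actual logic of
/etc/xen/scripts/block), the sketch below parses the output of a single
"losetup -a" call instead of spawning one losetup per device. The helper
name loop_devices_for, the sample paths and the assumed line format
("/dev/loopN: [dev]:inode (path)") are assumptions made for the example.

```python
import re

# Assumed line format of `losetup -a`: "/dev/loop0: [0805]:131 (/path/to/file)"
LOSETUP_LINE = re.compile(r"^(/dev/loop\d+): .*\((.*)\)\s*$")


def loop_devices_for(losetup_output, backing_file):
    """Return the loop devices whose backing file is `backing_file`."""
    devices = []
    for line in losetup_output.splitlines():
        match = LOSETUP_LINE.match(line)
        if match and match.group(2) == backing_file:
            devices.append(match.group(1))
    return devices


# Hypothetical output; a real script would capture it with one call such as
# subprocess.check_output(["losetup", "-a"]).
SAMPLE = """\
/dev/loop0: [0805]:131 (/var/lib/xen/images/root.img)
/dev/loop1: [0805]:132 (/var/lib/xen/images/private.img)
/dev/loop2: [0805]:131 (/var/lib/xen/images/root.img)
"""

print(loop_devices_for(SAMPLE, "/var/lib/xen/images/root.img"))
```

The point is only that one process and one pass over its output can answer
the "is this file already attached?" question for all loop devices at once.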

Now, xc_restore.
a) Is it correct that when xc_restore runs, the target domain memory is already
zeroed (because hypervisor scrubs free memory, before it is assigned to a
new domain) ? So, xc_save could check whether a given page contains only
zeroes and if so, omit it in the savefile. This could result in quite
significant savings when
- we save a freshly booted domain, or if we can zero out free memory in the 
  domain before saving
- we plan to restore multiple times from the same savefile (yes, vbd must be
restored in this case too).

b) xen-3.4.3/xc_restore reads data from the savefile in 4k portions, i.e. one
read syscall per page. It should read in larger chunks. It looks like this is
fixed in xen-4.0.0, is this correct?
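
To make idea a) concrete, here is a minimal sketch of the zero-page check. It
assumes that the restored domain's memory starts out zeroed, which is exactly
the open question. The function name and the in-memory "domain" are made up
for illustration; the real xc_save works on mapped guest pages in C.

```python
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)


def nonzero_pages(memory):
    """Yield (pfn, page) for every page that is not entirely zero."""
    for offset in range(0, len(memory), PAGE_SIZE):
        page = memory[offset:offset + PAGE_SIZE]
        if page != ZERO_PAGE:
            yield offset // PAGE_SIZE, page


# Toy "domain memory": 4 pages, only page 2 holds any data.
memory = bytearray(4 * PAGE_SIZE)
memory[2 * PAGE_SIZE] = 0xFF
kept = list(nonzero_pages(bytes(memory)))
print([pfn for pfn, _ in kept])
```

Only the pages yielded here would need to be written to the savefile; the
omitted pfns would be left with whatever content the allocator provides.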

Also, it looks really excessive that basically copying 400MB of memory takes 
over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
dom0 kernel code ? Xen mm code ? hypercall overhead ? ), anything 
else ?
I am aware that in the usual cases, xc_restore is not the bottleneck 
(savefile reads from the disk or the network is), but in case we can fetch 
savefile quickly, it matters.
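
For scale, the figures above can be turned into a per-page cost. This is only
back-of-the-envelope arithmetic: it assumes every page of the 400MB domain is
actually copied and that all of the 1.368s is spent on those pages.

```python
DOMAIN_MIB = 400          # memory=400, as in the xm create line
PAGE_SIZE = 4096          # x86 page size in bytes
XC_RESTORE_CPU_S = 1.368  # systemtap figure for xc_restore quoted above

pages = DOMAIN_MIB * 1024 * 1024 // PAGE_SIZE
us_per_page = XC_RESTORE_CPU_S / pages * 1e6
mib_per_s = DOMAIN_MIB / XC_RESTORE_CPU_S

print(pages)                  # number of 4k pages in the domain
print(round(us_per_page, 2))  # CPU microseconds spent per page
print(round(mib_per_s, 1))    # effective copy rate in MiB/s
```

That works out to roughly 13 microseconds of CPU per page, or a bit under
300 MiB/s.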

Is 3.4.3 branch still being developed, or pure maintenance mode only, so new 
code should be prepared for 4.0.0 ? 

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project

* Re: Improving domU restore time
  2010-05-25 10:35 Improving domU restore time Rafal Wojtczuk
@ 2010-05-25 10:58 ` Joanna Rutkowska
  2010-05-25 11:50 ` Keir Fraser
  1 sibling, 0 replies; 18+ messages in thread
From: Joanna Rutkowska @ 2010-05-25 10:58 UTC (permalink / raw)
  To: Rafal Wojtczuk; +Cc: xen-devel


A bit of background to Rafal's post: we plan to implement a feature
that we call "Disposable VMs" in Qubes, which would essentially allow for
super-fast creation of small, single-purpose VMs (DomUs), e.g. just for
opening a PDF or Word document. The point is: the creation and resume
of such a VM must be really fast, i.e. well below 1s.

And this seems possible, especially if we use sparse files for storing
the VM's save-image and for the restore operation (the VMs we're talking
about here would have around 100-150MB of actual data recorded in a
sparse savefile). But, as Rafal pointed out, some operations that Xen
performs seem to be implemented inefficiently, and we wanted to get your
opinion before we start optimizing them (i.e. the xc_restore and
/etc/xen/scripts/block optimizations that Rafal mentioned).

Thanks,
j.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: Improving domU restore time
  2010-05-25 10:35 Improving domU restore time Rafal Wojtczuk
  2010-05-25 10:58 ` Joanna Rutkowska
@ 2010-05-25 11:50 ` Keir Fraser
  2010-05-25 12:50   ` Rafal Wojtczuk
  2010-05-31  9:42   ` Rafal Wojtczuk
  1 sibling, 2 replies; 18+ messages in thread
From: Keir Fraser @ 2010-05-25 11:50 UTC (permalink / raw)
  To: Rafal Wojtczuk, xen-devel@lists.xensource.com

On 25/05/2010 11:35, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:

> a) Is it correct that when xc_restore runs, the target domain memory is
> already
> zeroed (because hypervisor scrubs free memory, before it is assigned to a
> new domain)

There is no guarantee that the memory will be zeroed.

> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> read syscall per page. Make it read in larger chunks. It looks it is fixed in
> xen-4.0.0, is this correct ?

It got changed a lot for Remus. I expect performance was on their mind.
Normally kernel's file readahead heuristic would get back most of the
performance of not reading in larger chunks.

> Also, it looks really excessive that basically copying 400MB of memory takes
> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> dom0 kernel code ? Xen mm code ? hypercall overhead ? ), anything
> else ?

I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
that loop.

 -- Keir

* Re: Improving domU restore time
  2010-05-25 11:50 ` Keir Fraser
@ 2010-05-25 12:50   ` Rafal Wojtczuk
  2010-05-25 12:59     ` Keir Fraser
  2010-05-25 13:02     ` Improving domU restore time Keir Fraser
  2010-05-31  9:42   ` Rafal Wojtczuk
  1 sibling, 2 replies; 18+ messages in thread
From: Rafal Wojtczuk @ 2010-05-25 12:50 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com

On Tue, May 25, 2010 at 12:50:40PM +0100, Keir Fraser wrote:
> On 25/05/2010 11:35, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:
> 
> > a) Is it correct that when xc_restore runs, the target domain memory is
> > already
> > zeroed (because hypervisor scrubs free memory, before it is assigned to a
> > new domain)
> 
> There is no guarantee that the memory will be zeroed.
Interesting.
For my education, could you explain who is responsible for clearing memory
of a newborn domain ? Xend ? Could you point me to the relevant code
fragments ?
It looks sensible to clear free memory in hypervisor context in its idle 
cycles; if non-temporal instructions (movnti) were used for this, it would 
not pollute caches, and it must be done anyway ?

> > b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> > read syscall per page. Make it read in larger chunks. It looks it is fixed in
> > xen-4.0.0, is this correct ?
> 
> It got changed a lot for Remus. I expect performance was on their mind.
> Normally kernel's file readahead heuristic would get back most of the
> performance of not reading in larger chunks.
Yes, readahead would keep the disk request queue full, but I was just
thinking of lowering the syscall overhead. 1e5 syscalls is a lot :)
[user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4k count=102400
102400+0 records in
102400+0 records out
419430400 bytes (419 MB) copied, 0.307211 s, 1.4 GB/s
[user@qubes ~]$ dd if=/dev/zero of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 0.25347 s, 1.7 GB/s
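
Taking the two dd runs at face value, the difference is about 54 ms across
roughly 1e5 extra 4k blocks, i.e. around half a microsecond per block (each
block costing dd a read plus a write). The sketch below only counts read
calls for the two chunk sizes; it uses a small scratch file as a stand-in for
a savefile and does not touch xc_restore itself.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read the whole file with os.read() and return how many calls it took."""
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)
            calls += 1
            if not data:  # empty result means end of file
                break
    finally:
        os.close(fd)
    return calls


# 4 MiB scratch file standing in for a savefile.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(4 * 1024 * 1024))
    path = tmp.name

small = count_reads(path, 4096)             # one call per 4k page
large = count_reads(path, 4 * 1024 * 1024)  # whole file in one request
os.unlink(path)
print(small, large)
```

The call count drops by about three orders of magnitude; whether that is
measurable against the rest of the restore path is a separate question.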

RW

* Re: Improving domU restore time
  2010-05-25 12:50   ` Rafal Wojtczuk
@ 2010-05-25 12:59     ` Keir Fraser
  2010-05-25 13:33       ` scrubbing free'd pages James Harper
  2010-05-25 14:12       ` scrubbing pages on vm pause Joanna Rutkowska
  2010-05-25 13:02     ` Improving domU restore time Keir Fraser
  1 sibling, 2 replies; 18+ messages in thread
From: Keir Fraser @ 2010-05-25 12:59 UTC (permalink / raw)
  To: Rafal Wojtczuk; +Cc: xen-devel@lists.xensource.com

On 25/05/2010 13:50, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:

>> There is no guarantee that the memory will be zeroed.
> Interesting.
> For my education, could you explain who is responsible for clearing memory
> of a newborn domain ? Xend ? Could you point me to the relevant code
> fragments ?

New domains are not guaranteed to receive zeroed memory. The only guarantee
Xen provides is that when it frees memory for a *dead* domain, it will scrub
the contents before reallocation (it may not write zeroes however, in a
debug build of Xen for example!). For any other memory pages, the domain
freeing the pages must scrub them itself before freeing them back to Xen.

> It looks sensible to clear free memory in hypervisor context in its idle
> cycles; if non-temporal instructions (movnti) were used for this, it would
> not pollute caches, and it must be done anyway ?

Only for that one case (freeing pages of a dead domain). In that one case we
currently do it synchronously. But that is because it was better than my
previous crappy asynchronous scrubbing code. :-)

>>> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
>>> read syscall per page. Make it read in larger chunks. It looks it is fixed
>>> in
>>> xen-4.0.0, is this correct ?
>> 
>> It got changed a lot for Remus. I expect performance was on their mind.
>> Normally kernel's file readahead heuristic would get back most of the
>> performance of not reading in larger chunks.
> Yes, readahead would keep the disk request queue full, but I was just
> thinking of lowering the syscall overhead. 1e5 syscalls is a lot :)

Well, the code looks like it batches now anyway. If it doesn't, it would be
interesting to see if making batches would measurably improve performance.

 -- Keir

* RE: scrubbing free'd pages
  2010-05-25 12:59     ` Keir Fraser
@ 2010-05-25 13:33       ` James Harper
  2010-05-25 13:39         ` Keir Fraser
  2010-05-25 14:12       ` scrubbing pages on vm pause Joanna Rutkowska
  1 sibling, 1 reply; 18+ messages in thread
From: James Harper @ 2010-05-25 13:33 UTC (permalink / raw)
  To: Keir Fraser, Rafal Wojtczuk; +Cc: xen-devel

> Other memory pages the domain freeing the
> pages must scrub them itself before freeing them back to Xen.

Is that true for a HVM domain making a decrease_reservation hypercall?
If so I should modify my code accordingly... it also means I need to
know if the page I'm decreasing is an unpopulated PoD page or not too.

James

* Re: scrubbing free'd pages
  2010-05-25 13:33       ` scrubbing free'd pages James Harper
@ 2010-05-25 13:39         ` Keir Fraser
  2010-05-25 13:48           ` Paul Durrant
  0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2010-05-25 13:39 UTC (permalink / raw)
  To: James Harper, Rafal Wojtczuk; +Cc: xen-devel@lists.xensource.com

On 25/05/2010 14:33, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Other memory pages the domain freeing the
>> pages must scrub them itself before freeing them back to Xen.
> 
> Is that true for a HVM domain making a decrease_reservation hypercall?
> If so I should modify my code accordingly...

Yes you should.

> it also means I need to
> know if the page I'm decreasing is an unpopulated PoD page or not too.

Certainly you could avoid it in that case. Actually, I think the PoD code can
detect and reclaim allocated-but-zeroed pages. But I am not sure whether you
really have to rely on that.

 -- Keir

* RE: scrubbing free'd pages
  2010-05-25 13:39         ` Keir Fraser
@ 2010-05-25 13:48           ` Paul Durrant
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Durrant @ 2010-05-25 13:48 UTC (permalink / raw)
  To: Keir Fraser, James Harper, Rafal Wojtczuk; +Cc: xen-devel@lists.xensource.com

> On 25/05/2010 14:33, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
> 
> >> Other memory pages the domain freeing the
> >> pages must scrub them itself before freeing them back to Xen.
> >
> > Is that true for a HVM domain making a decrease_reservation
> hypercall?
> > If so I should modify my code accordingly...
> 
> Yes you should.
> 
> > it also means I need to
> > know if the page I'm decreasing is an unpopulated PoD page or not
> too.
> 
> Certainly you could avoid it in that case. Actually I think the PoD
> code can
> detect and reclaim allocated-but-zeroed pages however. But not sure
> if you
> really have to rely on that or not.
> 

Yes, that's true, but it would be better if we didn't have to scrub pages and cause a populate immediately before an invalidate.

  Paul

* scrubbing pages on vm pause
  2010-05-25 12:59     ` Keir Fraser
  2010-05-25 13:33       ` scrubbing free'd pages James Harper
@ 2010-05-25 14:12       ` Joanna Rutkowska
  2010-05-25 14:13         ` Keir Fraser
  1 sibling, 1 reply; 18+ messages in thread
From: Joanna Rutkowska @ 2010-05-25 14:12 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com, Rafal Wojtczuk


On 05/25/2010 02:59 PM, Keir Fraser wrote:
> On 25/05/2010 13:50, "Rafal Wojtczuk" <rafal@invisiblethingslab.com> wrote:
> 
>>> There is no guarantee that the memory will be zeroed.
>> Interesting.
>> For my education, could you explain who is responsible for clearing memory
>> of a newborn domain ? Xend ? Could you point me to the relevant code
>> fragments ?
> 
> New domains are not guaranteed to receive zeroed memory. The only guarantee
> Xen provides is that when it frees memory for a *dead* domain, it will scrub
> the contents before reallocation (it may not write zeroes however, in a
> debug build of Xen for example!). Other memory pages the domain freeing the
> pages must scrub them itself before freeing them back to Xen.
> 

And what happens when we pause and save a domain? Are the pages zeroed out
by Xen in that case?

joanna.


* Re: scrubbing pages on vm pause
  2010-05-25 14:12       ` scrubbing pages on vm pause Joanna Rutkowska
@ 2010-05-25 14:13         ` Keir Fraser
  2010-05-25 14:19           ` Joanna Rutkowska
  0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2010-05-25 14:13 UTC (permalink / raw)
  To: Joanna Rutkowska; +Cc: xen-devel@lists.xensource.com, Rafal Wojtczuk

On 25/05/2010 15:12, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
wrote:

>> New domains are not guaranteed to receive zeroed memory. The only guarantee
>> Xen provides is that when it frees memory for a *dead* domain, it will scrub
>> the contents before reallocation (it may not write zeroes however, in a
>> debug build of Xen for example!). Other memory pages the domain freeing the
>> pages must scrub them itself before freeing them back to Xen.
>> 
> 
> And what happens when we pause and save a domain? Are the pages zero-out
> by xen in that case?

If the original domain is subsequently destroyed then yes, Xen zeroes the
pages.

 -- Keir

* Re: scrubbing pages on vm pause
  2010-05-25 14:13         ` Keir Fraser
@ 2010-05-25 14:19           ` Joanna Rutkowska
  2010-05-25 14:19             ` Keir Fraser
  0 siblings, 1 reply; 18+ messages in thread
From: Joanna Rutkowska @ 2010-05-25 14:19 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com, Rafal Wojtczuk


On 05/25/2010 04:13 PM, Keir Fraser wrote:
> On 25/05/2010 15:12, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
> wrote:
> 
>>> New domains are not guaranteed to receive zeroed memory. The only guarantee
>>> Xen provides is that when it frees memory for a *dead* domain, it will scrub
>>> the contents before reallocation (it may not write zeroes however, in a
>>> debug build of Xen for example!). Other memory pages the domain freeing the
>>> pages must scrub them itself before freeing them back to Xen.
>>>
>>
>> And what happens when we pause and save a domain? Are the pages zero-out
>> by xen in that case?
> 
> If the original domain is subsequently destroyed then yes, Xen zeroes the
> pages.
> 

Let's consider this scenario:

xm save domain1

xm create domain2

Can the domain2 get *unscrubbed* pages that were previously used by
domain1, but were not scrubbed properly by domain1?

j.


* Re: scrubbing pages on vm pause
  2010-05-25 14:19           ` Joanna Rutkowska
@ 2010-05-25 14:19             ` Keir Fraser
  2010-05-25 14:24               ` Joanna Rutkowska
  0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2010-05-25 14:19 UTC (permalink / raw)
  To: Joanna Rutkowska; +Cc: xen-devel@lists.xensource.com, Rafal Wojtczuk

On 25/05/2010 15:19, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
wrote:

> Let's consider this scenario:
> 
> xm save domain1
> 
> xm create domain2
> 
> Can the domain2 get *unscrubbed* pages that were previously used by
> domain1, but were not scrubbed properly by domain1?

Generally speaking a domain loses pages to the free pool in only two ways:
via a decrease_reservation hypercall, and via domain destruction. In the
former case the domain itself is responsible for first scrubbing the page.
In the latter case Xen is responsible. With both avenues covered, domain2
cannot get unscrubbed pages from domain1.

 -- Keir

* Re: scrubbing pages on vm pause
  2010-05-25 14:19             ` Keir Fraser
@ 2010-05-25 14:24               ` Joanna Rutkowska
  0 siblings, 0 replies; 18+ messages in thread
From: Joanna Rutkowska @ 2010-05-25 14:24 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com, Rafal Wojtczuk


On 05/25/2010 04:19 PM, Keir Fraser wrote:
> On 25/05/2010 15:19, "Joanna Rutkowska" <joanna@invisiblethingslab.com>
> wrote:
> 
>> Let's consider this scenario:
>>
>> xm save domain1
>>
>> xm create domain2
>>
>> Can the domain2 get *unscrubbed* pages that were previously used by
>> domain1, but were not scrubbed properly by domain1?
> 
> Generally speaking a domain loses pages to the free pool in only two ways:
> via a decrease_reservation hypercall, and via domain destruction. In the
> former case the domain itself is responsible for first scrubbing the page.
> In the latter case Xen is responsible. With both avenues covered, domain2
> cannot get unscrubbed pages from domain1.
> 
Makes sense.

Thanks,
j.


* Re: Improving domU restore time
  2010-05-25 11:50 ` Keir Fraser
  2010-05-25 12:50   ` Rafal Wojtczuk
@ 2010-05-31  9:42   ` Rafal Wojtczuk
  2010-06-01 17:00     ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 18+ messages in thread
From: Rafal Wojtczuk @ 2010-05-31  9:42 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel@lists.xensource.com

Hello,
> I would be grateful for the comments on possible methods to improve domain
> restore performance. Focusing on the PV case, if it matters.
Continuing the topic; thank you to everyone that responded so far.

Focusing on xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops x86_64. 
Let me just reiterate that for our purposes, the domain save time (and 
possible related post-processing) is not critical, it 
is only the restore time that matters. I did some experiments; they involve:
1) before saving a domain, have domU allocate all free memory in an userland
process, then fill it with some MAGIC_PATTERN. Save domU, then process the
savefile, removing all pfns (and their page content) that refer to a page 
containing MAGIC_PATTERN.
This reduces the savefile size.
2) instead of executing "xm restore savefile", just poke the xmlrpc request
to Xend unix socket via socat
3) change the /etc/xen/scripts/block so that in the "add file:" case, it calls
only 3 processes (xenstore-read, losetup, xenstore-write); assuming the
sharing check can be done elsewhere, this should provide a realistic lower
bound for the execution time
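
As an illustration of the filtering step in 1), here is a minimal Python
sketch. It does not parse the real savefile format; it assumes the pages have
already been extracted into (pfn, page content) pairs. The names
is_magic_page and drop_magic_pages, the 4-byte pattern used in the example,
and the 4096-byte page size are assumptions made for the sketch only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def is_magic_page(page: bytes, pattern: bytes) -> bool:
    """Return True if the page consists solely of repetitions of pattern."""
    if len(page) != PAGE_SIZE or not pattern or PAGE_SIZE % len(pattern):
        return False
    return page == pattern * (PAGE_SIZE // len(pattern))


def drop_magic_pages(pages, pattern):
    """Keep only the (pfn, content) pairs whose content is not the filler.

    pages is any iterable of (pfn, bytes) pairs; this is a hypothetical
    in-memory representation, not the on-disk savefile layout.
    """
    return [(pfn, data) for pfn, data in pages
            if not is_magic_page(data, pattern)]
```

Only pages that match the filler exactly are dropped; a page that the guest
touched after the fill no longer compares equal and is kept.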

For a domain with 400MB RAM and 4 vbds, with the savefile in the fs cache, 
this cuts down the restore real time from 2700 ms to 1153 ms. Some questions:
a) is the 1) method safe ? Normally, xc_domain_restore() allocates mfns via 
xc_domain_memory_populate_physmap() and then calls 
xc_add_mmu_update(MMU_MACHPHYS_UPDATE) on
the pfn/mfn pairs. If we remove some pfns from the savefile, this will not
happen. Instead, the mfn for the removed pfn (referring to memory whose
content we don't care for) will be allocated in uncanonicalize_pagetable(),
because there will be a pte entry for this page. But uncanonicalize_pagetable()
does not call xc_add_mmu_update(). Still, the domain seems to be restored 
properly (naturally the buffer filled previously with MAGIC_PATTERN now 
contains junk, but this is the whole purpose of it).
Again, is xc_add_mmu_update(MMU_MACHPHYS_UPDATE) really needed in the above
scenario ? It basically does
set_gpfn_from_mfn(mfn, gpfn)
but this should already be taken care of by 
xc_domain_memory_populate_physmap() ?

b) There still seems to be some discrepancy between the real time (1153ms) and
the CPU time (970ms); considering this is a machine with 2 cores (and at
least the hotplug scripts execute in parallel), it is notable. What can cause 
the involved processes to sleep (we read the savefile from fs cache, so there 
should be no disk reads at all) ? Is the single threaded nature of xenstored 
the possible cause for the delays ?
Generally xenstored seems to be quite busy during the restore. Do you think
some of the queries (from Xend?) are redundant ? Is there anything else
that can be removed from the relevant Xend code with no harm ? This question
may sound too blunt; but given the fact that "xm restore savefile" wastes 220
ms of CPU time doing apparently nothing useful, I would assume there is some
overhead in Xend too. 
The systemtap trace is in the attachment; it does not contain a line about the 
xenstored CPU ticks (259ms, really a lot?), as xenstored does not terminate 
any thread. 

c) 
>> Also, it looks really excessive that basically copying 400MB of memory takes
>> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
> that loop.
Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
mfn_count, mfn* mfn_array, char * pages_content).
Would it make xc_restore faster if, instead of using the xc_map_foreign_batch()
interface, it called the above hypercall ? On x86_64 all the physical
memory is already mapped in the hypervisor (is this correct?), so this could 
be quicker, as no page table setup would be necessary ?

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project

[-- Attachment #2: probe.systemtap --]
[-- Type: text/plain, Size: 634 bytes --]

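# Wall-clock start time (us) of each process, keyed by pid, recorded at exec.
global process_start
# Per-thread sample counters filled in by the profiling timer.
global uticks, kticks, tids

# Record the time at which each newly exec'ed program starts.
probe kernel.function("do_execve").return {
	p=pid()
	t=gettimeofday_us()
	process_start[p]=t
	printf("executed pid %d parent %d time=%d name=%s\n", p, ppid(), t, execname())
}

# On every profiling tick, charge one sample to the current thread,
# split into kernel-mode and user-mode ticks.
probe timer.profile {
	tid=tid()
	if (!user_mode())
		kticks[tid] <<< 1
	else
		uticks[tid] <<< 1
	tids[tid] <<< 1
}

# Print the tick total and the elapsed real time for the exiting thread.
# For a pid that did not exec while tracing, process_start[p] reads as 0,
# so "real" then equals the absolute timestamp.
function logit() {
	tid=tid()
	p=pid()
	t=gettimeofday_us()
	elapsed=t-process_start[p]
	printf("finishing tid %d pid %d parent %d ticks %d real %d time=%d name=%s\n", 
		tid, p, ppid(), @count(kticks[tid])+@count(uticks[tid]), elapsed, t, execname())
}

probe syscall.exit {
	logit()
}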

[-- Attachment #3: probeoutput.txt --]
[-- Type: text/plain, Size: 3909 bytes --]

# slightly postprocessed, to include [delta_from_prev, abstime] values before
# each event logged
0 0
executed pid 10521 parent 2232 time=1275054562565050 name=socat
254543 254543
executed pid 10523 parent 10215 time=1275054562819593 name=block
28076 282619
executed pid 10524 parent 10220 time=1275054562847669 name=block
4585 287204
executed pid 10525 parent 10523 time=1275054562852254 name=xenstore-read
2645 289849
executed pid 10526 parent 10524 time=1275054562854899 name=xenstore-read
13441 303290
finishing tid 10525 pid 10525 parent 10523 ticks 3 real 16086 time=1275054562868340 name=xenstore-read
6564 309854
executed pid 10528 parent 10523 time=1275054562874904 name=losetup
4772 314626
finishing tid 10526 pid 10526 parent 10524 ticks 3 real 24777 time=1275054562879676 name=xenstore-read
2782 317408
executed pid 10530 parent 10524 time=1275054562882458 name=losetup
1715 319123
executed pid 10529 parent 10527 time=1275054562884173 name=block
1816 320939
finishing tid 10530 pid 10530 parent 10524 ticks 2 real 3531 time=1275054562885989 name=losetup
4658 325597
executed pid 10532 parent 10524 time=1275054562890647 name=xenstore-write
1820 327417
executed pid 10533 parent 10529 time=1275054562892467 name=xenstore-read
4007 331424
finishing tid 10532 pid 10532 parent 10524 ticks 2 real 5827 time=1275054562896474 name=xenstore-write
538 331962
finishing tid 10524 pid 10524 parent 10220 ticks 7 real 49343 time=1275054562897012 name=block
1133 333095
finishing tid 10528 pid 10528 parent 10523 ticks 4 real 23241 time=1275054562898145 name=losetup
3605 336700
finishing tid 10533 pid 10533 parent 10529 ticks 2 real 9283 time=1275054562901750 name=xenstore-read
1714 338414
executed pid 10535 parent 10523 time=1275054562903464 name=xenstore-write
3556 341970
executed pid 10536 parent 10529 time=1275054562907020 name=losetup
2064 344034
finishing tid 10536 pid 10536 parent 10529 ticks 3 real 2064 time=1275054562909084 name=losetup
1382 345416
finishing tid 10535 pid 10535 parent 10523 ticks 3 real 7002 time=1275054562910466 name=xenstore-write
591 346007
finishing tid 10523 pid 10523 parent 10215 ticks 7 real 91464 time=1275054562911057 name=block
3332 349339
executed pid 10538 parent 10529 time=1275054562914389 name=xenstore-write
4557 353896
finishing tid 10538 pid 10538 parent 10529 ticks 2 real 4557 time=1275054562918946 name=xenstore-write
549 354445
finishing tid 10529 pid 10529 parent 10527 ticks 7 real 35322 time=1275054562919495 name=block
25937 380382
executed pid 10539 parent 10215 time=1275054562945432 name=block
6636 387018
executed pid 10540 parent 10539 time=1275054562952068 name=xenstore-read
4327 391345
finishing tid 10540 pid 10540 parent 10539 ticks 3 real 4327 time=1275054562956395 name=xenstore-read
3895 395240
executed pid 10541 parent 10539 time=1275054562960290 name=losetup
1603 396843
finishing tid 10541 pid 10541 parent 10539 ticks 3 real 1603 time=1275054562961893 name=losetup
2141 398984
executed pid 10543 parent 10539 time=1275054562964034 name=xenstore-write
5343 404327
finishing tid 10543 pid 10543 parent 10539 ticks 2 real 5343 time=1275054562969377 name=xenstore-write
577 404904
finishing tid 10539 pid 10539 parent 10215 ticks 7 real 24522 time=1275054562969954 name=block
67293 472197
executed pid 10544 parent 8826 time=1275054563037247 name=xc_restore
407415 879612
finishing tid 10544 pid 10544 parent 8826 ticks 387 real 407415 time=1275054563444662 name=xc_restore
2571 882183
finishing tid 10545 pid 8826 parent 1 ticks 15 real 1275054563447233 time=1275054563447233 name=xend
271673 1153856
finishing tid 10521 pid 10521 parent 2232 ticks 8 real 1153856 time=1275054563718906 name=socat
73 1153929
finishing tid 10522 pid 8826 parent 1 ticks 238 real 1275054563718979 time=1275054563718979 name=xend
2258682 3412611
finishing tid 10215 pid 10215 parent 748 ticks 5 real 1275054565977661 time=1275054565977661 name=udevd
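
To compare the CPU ticks per program with the real time, the "finishing"
lines above can be summed by process name. The following Python sketch does
that; the function name ticks_by_name is an assumption, and the regular
expression only relies on the printf format used in probe.systemtap. How long
one tick lasts depends on the kernel's timer frequency; the xc_restore line
(387 ticks over 407415 us) suggests roughly 1 ms per tick on this machine,
but that is an inference, not something the trace states.

```python
import re
from collections import defaultdict

# Matches the "finishing ..." lines printed by logit() in probe.systemtap.
FINISH_RE = re.compile(
    r"^finishing tid (\d+) pid (\d+) parent (\d+) "
    r"ticks (\d+) real (\d+) time=(\d+) name=(\S+)$"
)


def ticks_by_name(lines):
    """Sum the profiling ticks of all finished threads, keyed by execname."""
    totals = defaultdict(int)
    for line in lines:
        match = FINISH_RE.match(line.strip())
        if match:  # "executed ..." and delta/abstime lines are skipped
            totals[match.group(7)] += int(match.group(4))
    return dict(totals)
```

Note that the "real" field is only meaningful for processes that were exec'ed
while the probe was running; for the xend and udevd threads it equals the
absolute timestamp, so the sketch ignores it.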


* Re: Re: Improving domU restore time
  2010-05-31  9:42   ` Rafal Wojtczuk
@ 2010-06-01 17:00     ` Jeremy Fitzhardinge
  2010-06-02 16:24       ` Rafal Wojtczuk
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-01 17:00 UTC (permalink / raw)
  To: Rafal Wojtczuk; +Cc: xen-devel@lists.xensource.com, Keir Fraser

On 05/31/2010 02:42 AM, Rafal Wojtczuk wrote:
> Hello,
>   
>> I would be grateful for the comments on possible methods to improve domain
>> restore performance. Focusing on the PV case, if it matters.
>>     
> Continuing the topic; thank you to everyone that responded so far.
>
> Focusing on xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops x86_64. 
> Let me just reiterate that for our purposes, the domain save time (and 
> possible related post-processing) is not critical, it 
> is only the restore time that matters. I did some experiments; they involve:
> 1) before saving a domain, have domU allocate all free memory in a userland
> process, then fill it with some MAGIC_PATTERN. Save domU, then process the
> savefile, removing all pfns (and their page content) that refer to a page 
> containing MAGIC_PATTERN.
> This reduces the savefile size.
>   

Why not just balloon the domain down?

> 2) instead of executing "xm restore savefile", just poke the xmlrpc request
> to Xend unix socket via socat
>   

I would seek alternatives to the xend/xm toolset.  I've been doing my
bit to make libxenlight/xl useful, though it still needs a lot of work
to get it to anything remotely production-ready...

> 3) change the /etc/xen/scripts/block so that in the "add file:" case, it calls
> only 3 processes (xenstore-read, losetup, xenstore-write); assuming the
> sharing check can be done elsewhere, this should provide realistic lower
> bound for the execution time
>
> For a domain with 400MB RAM and 4 vbds, with the savefile in the fs cache, 
> this cuts down the restore real time from 2700 ms to 1153 ms. Some questions:
> a) is the 1) method safe ? Normally, xc_domain_restore() allocates mfns via 
> xc_domain_memory_populate_physmap() and then calls 
> xc_add_mmu_update(MMU_MACHPHYS_UPDATE) on
> the pfn/mfn pairs. If we remove some pfns from the savefile, this will not
> happen. Instead, the mfn for the removed pfn (referring to memory whose
> content we don't care for) will be allocated in uncanonicalize_pagetable(),
> because there will be a pte entry for this page. But uncanonicalize_pagetable()
> does not call xc_add_mmu_update(). Still, the domain seems to be restored 
> properly (naturally the buffer filled previously with MAGIC_PATTERN now 
> contains junk, but this is the whole purpose of it).
> Again, is xc_add_mmu_update(MMU_MACHPHYS_UPDATE) really needed in the above
> scenario ? It basically does
> set_gpfn_from_mfn(mfn, gpfn)
> but this should already be taken care of by 
> xc_domain_memory_populate_physmap() ?
>
> b) There still seems to be some discrepancy between the real time (1153ms) and
> the CPU time (970ms); considering this is a machine with 2 cores (and at
> least the hotplug scripts execute in parallel), it is notable. What can cause 
> the involved processes to sleep (we read the savefile from fs cache, so there 
> should be no disk reads at all). Is the single threaded nature of xenstored 
> the possible cause for the delays ?
>   

Have you tried oxenstored?  It works well for me, and seems to be a lot
faster.

> Generally xenstored seems to be quite busy during the restore. Do you think
> some of the queries (from Xend?) are redundant ? Is there anything else
> that can be removed from the relevant Xend code with no harm ? This question
> may sound too blunt; but given the fact that "xm restore savefile" wastes 220
> ms of CPU time doing apparently nothing useful, I would assume there is some
> overhead in Xend too. 
> The systemtap trace in the attachment; it does not contain a line about the 
> xenstored CPU ticks (259ms, really a lot?), as xenstored does not terminate 
> any thread. 
>
> c) 
>   
>>> Also, it looks really excessive that basically copying 400MB of memory takes
>>> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
>>>       
>> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
>> that loop.
>>     
> Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
> mfn_count, mfn* mfn_array, char * pages_content).
> Would it make xc_restore faster if instead of using the xc_map_foreign_batch()
> interface, it would call the above hypercall ? On x86_64 all the physical
> memory is already mapped in the hypervisor (is this correct?), so this could 
> be quicker, as no page table setup would be necessary ?
>   

The main cost of pagetable manipulations is the tlb flush; if you can
batch all your setups together to amortize the cost of the tlb flush, it
should be pretty quick.  But if batching is not being used properly,
then it could get very expensive.  My own observation of "strace xl
restore" is that it seems to do a *lot* of ioctls on privcmd, but I
haven't looked more closely to see what those calls are, and whether
they're being done in an optimal way.
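
The amortization argument can be put as a toy cost model. This is a sketch
only: the function name mapping_cost_us and both per-operation costs are
hypothetical parameters, not measurements of Xen or of any real hardware.

```python
import math


def mapping_cost_us(n_pages, batch_size, per_page_us, flush_us):
    """Toy model: every page costs per_page_us, every batch one TLB flush.

    per_page_us and flush_us are made-up inputs; only the shape of the
    formula matters, not the absolute numbers.
    """
    batches = math.ceil(n_pages / batch_size)
    return n_pages * per_page_us + batches * flush_us
```

With any positive flush cost, the flush term shrinks in proportion to the
batch size, while the per-page term stays the same; that is the sense in
which batching amortizes the flush.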

     J


* Re: Re: Improving domU restore time
  2010-06-01 17:00     ` Jeremy Fitzhardinge
@ 2010-06-02 16:24       ` Rafal Wojtczuk
  2010-06-02 16:33         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Rafal Wojtczuk @ 2010-06-02 16:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com, Keir Fraser

On Tue, Jun 01, 2010 at 10:00:09AM -0700, Jeremy Fitzhardinge wrote:
> On 05/31/2010 02:42 AM, Rafal Wojtczuk wrote:
> > Hello,
> >   
> >> I would be grateful for the comments on possible methods to improve domain
> >> restore performance. Focusing on the PV case, if it matters.
> >>     
> > Continuing the topic; thank you to everyone that responded so far.
> >
> > Focusing on xen-3.4.3 case for now, dom0/domU still 2.6.32.x pvops x86_64. 
> > Let me just reiterate that for our purposes, the domain save time (and 
> > possible related post-processing) is not critical, it 
> > is only the restore time that matters. I did some experiments; they involve:
> > 1) before saving a domain, have domU allocate all free memory in a userland
> > process, then fill it with some MAGIC_PATTERN. Save domU, then process the
> > savefile, removing all pfns (and their page content) that refer to a page 
> > containing MAGIC_PATTERN.
> > This reduces the savefile size.
> Why not just balloon the domain down?
I thought it (well, rather the matching balloon up after restore) would cost 
quite some CPU time; it used to, AFAIR. But nowadays it looks sensible, in the
90ms range. Yes, that is much cleaner, thank you for the hint.
 
> > should be no disk reads at all). Is the single threaded nature of xenstored 
> > the possible cause for the delays ?
> Have you tried oxenstored?  It works well for me, and seems to be a lot
> faster.
Do you mean 
http://xenbits.xensource.com/ext/xen-ocaml-tools.hg
?
After some tweaks to Makefiles (-fPIC is required on x86_64 for libs sources) 
it compiles, but then it bails during startup with 
fatal error: exception Failure("ioctl bind_interdomain failed")
This happens under xen-3.4.3; does it require 4.0.0 ?

> >> I would expect IOCTL_PRIVCMD_MMAPBATCH to be the most significant part of
> >> that loop.
> > Let's imagine there is a hypercall do_direct_memcpy_from_dom0_to_mfn(int
> > mfn_count, mfn* mfn_array, char * pages_content).
> The main cost of pagetable manipulations is the tlb flush; if you can
> batch all your setups together to amortize the cost of the tlb flush, it
> should be pretty quick.  But if batching is not being used properly,
> then it could get very expensive.  My own observation of "strace xl
> restore" is that it seems to do a *lot* of ioctls on privcmd, but I
> haven't looked more closely to see what those calls are, and whether
> they're being done in an optimal way.
Well, it looks like xc_restore should _usually_ call 
xc_map_foreign_batch once per pages batch (once per 1024 read pages), which
looks sensible. xc_add_mmu_update also tries to batch requests. There are 
432 occurrences of the ioctl syscall in the xc_restore strace output; I am not 
sure whether that is excessive. 
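
For scale, the number of map calls expected for the full domain can be
estimated as follows. This is a sketch assuming 4 KiB pages and that all
400MB of domain memory are present in the savefile; expected_batches is a
hypothetical helper, not part of libxc.

```python
def expected_batches(mem_bytes, page_size=4096, batch_pages=1024):
    """Number of 1024-page batches needed to cover mem_bytes of memory."""
    pages = -(-mem_bytes // page_size)   # ceiling division
    return -(-pages // batch_pages)      # ceiling division


# 400MB domain, as in the experiment above
batches_400mb = expected_batches(400 * 1024 * 1024)
```

Under these assumptions the 400MB domain has 102400 pages, i.e. 100 batches,
so the map calls would account for only part of the 432 ioctls; the strace
output would have to be broken down to see what the rest are.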

Regards,
Rafal Wojtczuk
Principal Researcher
Invisible Things Lab, Qubes-os project


* Re: Re: Improving domU restore time
  2010-06-02 16:24       ` Rafal Wojtczuk
@ 2010-06-02 16:33         ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-02 16:33 UTC (permalink / raw)
  To: Rafal Wojtczuk; +Cc: xen-devel@lists.xensource.com, Keir Fraser

On 06/02/2010 09:24 AM, Rafal Wojtczuk wrote:
>> Why not just balloon the domain down?
>>     
> I thought it (well, rather the matching balloon up after restore) would cost 
> quite some CPU time; it used to AFAIR. But nowadays it looks sensible, in 90ms
> range. Yes, that is much cleaner, thank you for the hint.
>   

Aside from the cost of the hypercalls to actually give up the pages,
ballooning is just the same as memory allocation from the system's
perspective.

>>> should be no disk reads at all). Is the single threaded nature of xenstored 
>>> the possible cause for the delays ?
>>>       
>> Have you tried oxenstored?  It works well for me, and seems to be a lot
>> faster.
>>     
> Do you mean 
> http://xenbits.xensource.com/ext/xen-ocaml-tools.hg
> ?
> After some tweaks to Makefiles (-fPIC is required on x86_64 for libs sources) 
> it compiles,

It builds out of the box for me on my x86-64 machine.

>  but then it bails during startup with 
> fatal error: exception Failure("ioctl bind_interdomain failed")
> This happens under xen-3.4.3; does it require 4.0.0 ?
>   

No, I don't think so, but it does have to be the first xenstore you run
after boot.  Ah, but Xen 4 probably has oxenstored build and other fixes
which aren't in 3.4.3.  In particular, I think it has been brought into
the main xen-unstable repo, rather than living off to the side.

But it is much quicker than the C one, I think primarily because it is
entirely memory resident.

> Well, it looks like xc_restore should _usually_ call 
> xc_map_foreign_batch once per pages batch (once per 1024 read pages), which
> looks sensible. xc_add_mmu_update also tries to batch requests. There are 
> 432 occurrences of ioctl syscall in the xc_restore strace output; I am not 
> sure if it is damagingly numerous. 
>   

Time for some profiling to see where the time is going then.

    J


Thread overview: 18+ messages
2010-05-25 10:35 Improving domU restore time Rafal Wojtczuk
2010-05-25 10:58 ` Joanna Rutkowska
2010-05-25 11:50 ` Keir Fraser
2010-05-25 12:50   ` Rafal Wojtczuk
2010-05-25 12:59     ` Keir Fraser
2010-05-25 13:33       ` scrubbing free'd pages James Harper
2010-05-25 13:39         ` Keir Fraser
2010-05-25 13:48           ` Paul Durrant
2010-05-25 14:12       ` scrubbing pages on vm pause Joanna Rutkowska
2010-05-25 14:13         ` Keir Fraser
2010-05-25 14:19           ` Joanna Rutkowska
2010-05-25 14:19             ` Keir Fraser
2010-05-25 14:24               ` Joanna Rutkowska
2010-05-25 13:02     ` Improving domU restore time Keir Fraser
2010-05-31  9:42   ` Rafal Wojtczuk
2010-06-01 17:00     ` Jeremy Fitzhardinge
2010-06-02 16:24       ` Rafal Wojtczuk
2010-06-02 16:33         ` Jeremy Fitzhardinge