From: Joanna Rutkowska
Subject: Re: Improving domU restore time
Date: Tue, 25 May 2010 12:58:28 +0200
Message-ID: <4BFBAD54.8020003@invisiblethingslab.com>
In-Reply-To: <20100525103557.GC23903@emperor2.itldev.org>
To: Rafal Wojtczuk
Cc: xen-devel@lists.xensource.com

A bit of background to Rafal's post: we plan to implement a feature in
Qubes that we call "Disposable VMs", which would essentially allow for
super-fast creation of small, single-purpose VMs (DomUs), e.g. just for
opening a PDF or a Word document.

The point is: the creation & resume of such a VM must be really fast,
i.e. well below 1s. And this seems possible, especially if we use sparse
files for storing the VM's save-image and for the restore operation (the
VMs we're talking about here would have around 100-150MB of actual data
recorded in a sparse savefile).

But, as Rafal pointed out, some operations that Xen does seem to be
implemented inefficiently, and we wanted to get your opinion before we
start optimizing them (i.e. xc_restore and the /etc/xen/scripts/block
optimization that Rafal mentioned).

Thanks,
j.

On 05/25/2010 12:35 PM, Rafal Wojtczuk wrote:
> Hello,
> I would be grateful for the comments on possible methods to improve domain
> restore performance. Focusing on the PV case, if it matters.
> 1) xen-4.0.0
> I see a similar problem to the one reported at the thread at
> http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00677.html
>
> Dom0 is 2.6.32.9-7.pvops0 x86_64, xen-4.0.0 x86_64.
> [user@qubes ~]$ xm create /dev/null
> kernel=/boot/vmlinuz-2.6.32.9-7.pvops0.qubes.x86_64
> root=/dev/mapper/dmroot extra="rootdelay=1000" memory=400
> ...wait a second...
> [user@qubes ~]$ xm save null nullsave
> [user@qubes ~]$ time cat nullsave >/dev/null
> ...
> [user@qubes ~]$ time cat nullsave >/dev/null
> ...
> [user@qubes ~]$ time cat nullsave >/dev/null
> real 0m0.173s
> user 0m0.010s
> sys 0m0.164s
> /* sits nicely in the cache, let's restore... */
> [user@qubes ~]$ time xm restore nullsave
> real 0m9.189s
> user 0m0.151s
> sys 0m0.039s
>
> According to systemtap, xc_restore uses 3.812s of CPU time; besides it being
> a lot, what uses the remaining 6s ?
> Just as reported previously, there are
> some errors in xend.log
>
> [2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0, _static_max=0x19000000, _static_min=0x0,
> [2010-05-25 10:49:02 2392] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib64/xen/bin/xc_restore 39 3 1 2 0 0 0 0
> [2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) xc_domain_restore start: p2m_size = 19000
> [2010-05-25 10:49:02 2392] INFO (XendCheckpoint:423) Reloading memory pages: 0%
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423)
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:4100%
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Memory reloaded (0 pages)
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) read VCPU 0
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Completed checkpoint load
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Domain ready to be built.
> [2010-05-25 10:49:11 2392] INFO (XendCheckpoint:423) Restore exit with rc=0
>
> Note, xc_restore on xen-3.4.3 works much faster (and with no warnings in the
> log), with the same dom0 pvops kernel.
>
> Ok, so there is some issue here. Some more generic thoughts below.
>
> 2) xen-3.4.3
> Firstly, /etc/xen/scripts/block in xen-3.4.3 tries to do something like
> for i in /dev/loop* ; do
>     losetup $i
> so, spawn one losetup process per each existing /dev/loopX; it hogs CPU,
> especially if your system comes with maxloops=255 :). So,
> let's replace it with the xen-4.0.0 version, where this problem is fixed (it
> uses losetup -a, hurray).
> Then, restore time for a 400MB domain, with the restore file in the cache,
> with 4 vbds backed by /dev/loopX, with one vif, is ca 2.7s real time.
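
To make the losetup point concrete, here is a minimal sketch of the
"one process for all loop devices" approach, in Python purely for
illustration. It is not the code of /etc/xen/scripts/block. The function
names are hypothetical, and the line format it parses
("/dev/loopN: [dev]:inode (backing file)") is an assumption about what
the installed losetup prints for "losetup -a"; check it against the local
util-linux before relying on it.

```python
import re
import subprocess

# Assumed shape of one `losetup -a` output line, e.g.
#   /dev/loop0: [0803]:131 (/var/lib/xen/images/root.img)
# Anything after the closing parenthesis (such as ", offset N") is ignored.
_LINE = re.compile(r"^(/dev/loop\d+):.*\((.*)\)")


def parse_losetup_output(text):
    """Map each attached loop device to its backing file."""
    attached = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            attached[match.group(1)] = match.group(2)
    return attached


def attached_loop_devices():
    """Query every loop device with a single process spawn."""
    result = subprocess.run(
        ["losetup", "-a"], capture_output=True, text=True, check=True
    )
    return parse_losetup_output(result.stdout)


def find_loop_for(path, attached):
    """Return the loop device already backed by `path`, or None."""
    for device, backing in attached.items():
        if backing == path:
            return device
    return None
```

With maxloops=255 this costs one fork/exec instead of up to 255, which is
the saving the quoted paragraph describes.
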
> According to systemtap, the CPU time requirements are
> xend threads - 0.363s
> udevd (in dom0) - 0.007s
> /etc/xen/scripts/block and its children - 1.075s
> xc_restore - 1.368s
> /etc/xen/scripts/vif-bridge (in netvm) - 0.130s
>
> The obvious idea to improve /etc/xen/scripts/block shell script execution time
> is to recode it, in some other language that will not spawn hundreds of
> processes to do its job.
>
> Now, xc_restore.
> a) Is it correct that when xc_restore runs, the target domain memory is already
> zeroed (because hypervisor scrubs free memory, before it is assigned to a
> new domain) ? So, xc_save could check whether a given page contains only
> zeroes and if so, omit it in the savefile. This could result in quite
> significant savings when
> - we save a freshly booted domain, or if we can zero out free memory in the
> domain before saving
> - we plan to restore multiple times from the same savefile (yes, vbd must be
> restored in this case too).
>
> b) xen-3.4.3/xc_restore reads data from savefile in 4k portions - so, one
> read syscall per page. Make it read in larger chunks. It looks it is fixed in
> xen-4.0.0, is this correct ?
>
> Also, it looks really excessive that basically copying 400MB of memory takes
> over 1.3s cpu time. Is IOCTL_PRIVCMD_MMAPBATCH the culprit (its
> dom0 kernel code ? Xen mm code ? hypercall overhead ? ), anything
> else ?
> I am aware that in the usual cases, xc_restore is not the bottleneck
> (savefile reads from the disk or the network is), but in case we can fetch
> savefile quickly, it matters.
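
Points a) and b) can be sketched together as follows. This is only an
illustration in Python, not the libxc code and not the real savefile
format: the function names, the 256-page chunk size and the
(index, page) representation are all hypothetical. It also takes for
granted the premise that a) asks about, namely that the target memory is
already zeroed when the restore starts.

```python
import io

PAGE_SIZE = 4096    # the 4k page size mentioned in b)
CHUNK_PAGES = 256   # hypothetical: read 1 MiB per call, not one page per call


def iter_nonzero_pages(stream, page_size=PAGE_SIZE, chunk_pages=CHUNK_PAGES):
    """Yield (page_index, page_bytes) for each page that is not all zeroes.

    Assumes a buffered stream whose read(n) returns n bytes except at EOF.
    """
    zero = bytes(page_size)
    index = 0
    while True:
        chunk = stream.read(page_size * chunk_pages)
        if not chunk:
            break
        for offset in range(0, len(chunk), page_size):
            page = chunk[offset:offset + page_size]
            # A short trailing page is compared against a zero run of equal length.
            if page != zero[:len(page)]:
                yield index, page
            index += 1


def restore_pages(pages, total_pages, page_size=PAGE_SIZE):
    """Copy the saved pages into a zero-filled buffer.

    The zero-filled bytearray stands in for scrubbed domain memory, so the
    omitted pages need no work at all.
    """
    memory = bytearray(total_pages * page_size)
    for index, page in pages:
        start = index * page_size
        memory[start:start + len(page)] = page
    return bytes(memory)


if __name__ == "__main__":
    # Five pages, of which only pages 1 and 4 carry data.
    image = (bytes(PAGE_SIZE) + b"\x01" * PAGE_SIZE
             + bytes(2 * PAGE_SIZE) + b"\xff" * PAGE_SIZE)
    saved = list(iter_nonzero_pages(io.BytesIO(image)))
    print(len(saved), "of 5 pages would be written to the savefile")
```

For a freshly booted domain most pages would be skipped by the first
function, and the restore side touches only the pages that were kept.
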
>
> Is 3.4.3 branch still being developed, or pure maintenance mode only, so new
> code should be prepared for 4.0.0 ?
>
> Regards,
> Rafal Wojtczuk
> Principal Researcher
> Invisible Things Lab, Qubes-os project
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel