From: Joanna Rutkowska
Subject: Re: Xen 4.0.0x allows for data corruption in Dom0
Date: Mon, 08 Mar 2010 23:34:59 +0100
Message-ID: <4B957B93.4060401@invisiblethingslab.com>
References: <4B922A89.2060105@invisiblethingslab.com> <4B957914.4050408@goop.org>
In-Reply-To: <4B957914.4050408@goop.org>
To: Jeremy Fitzhardinge
Cc: xen-devel@lists.xensource.com

On 03/08/2010 11:24 PM, Jeremy Fitzhardinge wrote:
> On 03/06/2010 02:12 AM, Joanna Rutkowska wrote:
>> There is a nasty data corruption problem most likely allowed by a bug in
>> the Xen 4.0.0-x hypervisors.
>>
>> The problem occurs with a frequency of "a few chunks per 10 GB of data
>> copied", and only when running a VM (PV domU) with a specific kernel.
>> The problem, however, affects not only the VM but also the Dom0, which
>> is of significant importance.
>>
>> How to reproduce:
>>
>> 1) Start at least one Xen PV VM with a pvops0 kernel. One kernel known
>> to demonstrate the problem is the one built by Michael Young, based on
>> xen/master git from Dec 23.
>> It has recently been replaced by a newer
>> kernel, which doesn't always show the problem, but I uploaded the
>> previous one at the URL below, so people can use it for testing:
>>
>> http://invisiblethingslab.com/pub/kernel-2.6.31.9-1.2.82.xendom0.fc12.x86_64.rpm
>>
>>
>> Now you can start a dummy VM with this kernel, e.g.:
>>
>> # xm create -c /dev/null memory=400 kernel=
>> extra="rootdelay=1000"
>>
>> 2) Now, in Dom0, after having started this dummy VM, create a big test
>> file, filled entirely with zeros. Make sure to choose a size bigger than your
>> DRAM size, to avoid fs caching effects, e.g.:
>>
>> $ dd if=/dev/zero of=test bs=1M count=10000
>>
>> That should create a 10GB file. Make sure to use /dev/zero and not
>> /dev/null!
>>
>> 3) Once the test file has been created, check if it really consists of zeros
>> only:
>>
>> $ xxd test | grep -v "0000 0000 0000 0000 0000 0000 0000 0000"
>>
>> Normally you should not get any output. However, I consistently get
>> something like this:
>>
>> 4593a000:940d 0000 0000 0000 2d40 d6fc c803 0000 ........-@......
>> 4593a010:00f6 1f52 b301 0000 b620 dcd5 ff00 0000 ...R..... ......
>> a5df0000:e542 712c 77da c9f9 a429 4b85 ecc4 9395 .Bq,w....)K.....
>> a5df0010:d9d6 971f 0d58 5c70 aba6 387d 805f 09e2 .....X\p..8}._..
>> ceecb000:f80d 0000 0000 0000 096e 1cdc e403 0000 .........n......
>> ceecb010:2460 7ef6 be01 0000 b620 dcd5 ff00 0000 $`~...... ......
>> 148432000580e 0000 0000 0000 5665 ed9d ff03 0000 X.......Ve......
>> 1484320107bcc a023 ca01 0000 b620 dcd5 ff00 0000 {..#..... ......
>> 1c548b000bc0e 0000 0000 0000 6942 387d 1b04 0000 ........iB8}....
>> 1c548b010872b 01c8 d501 0000 b620 dcd5 ff00 0000 .+....... ......
>> 225d450004448 27cd b966 b37e 1f0c e9e3 c2db b6ee DH'..f.~........
>> 225d45010d2b2 55b8 9ef1 e818 a7e3 364d 2322 dc75 ..U.......6M#".u
>> 242056000140f 0000 0000 0000 0bb0 3704 3404 0000 ..........7.4...
>> 2420560109601 b606 e001 0000 b620 dcd5 ff00 0000 ......... ......
>>
>> The actual data vary between tests; however, the "dcd5 ff00 0000"
>> pattern seems to be repeatable on a given system with a given hypervisor
>> binary (the above numbers are for Xen-4.0.0-rc5 built from Michael
>> Young's SRPM). The errors always occur in chunks of 32 bytes.
>>
>> We have tested this in our lab on three different machines, with various
>> Dom0 kernels -- based on xen/master (AKA xen/stable-2.6.31) and
>> xen/stable (AKA xen/stable-2.6.32) -- and with a few Xen 4 hypervisors
>> (rc2, rc4, rc5). Not every kernel allows for reproducing the error with
>> such a simple "dummy" VM as the one given above -- e.g. the 2.6.32-based
>> kernels required some more regular VMs to be started for the problem to
>> be noticeable. However, with the previously mentioned kernel (M. Young
>> Dec23), the problem has been 100% reproducible for us.
>>
>> When downgraded to Xen 3.4.2 the problem went away.
>>
>> Of course this problem cannot be attributed to a buggy VM kernel, as the
>> hypervisor should be resistant to any kind of "wrong" software (buggy or
>> malicious) that executes in a VM.
>>
>
> Why "of course"? Your report looks to me like a bug in dom0 which is
> causing data corruption when there's another domain running.

Please note that the "of course" sentence refers to the *VM* kernel, not Dom0.

> I don't see anything that specifically implicates Xen. The fact that
> the symptoms change with a different Xen version could mean a kernel
> bug is affected by the Xen version (different memory layout, for
> example, or different paths in the kernel caused by different feature
> availability).
>

Sure, it can theoretically be anything, perhaps even a generic bug in
IA32 just accidentally triggered by some magic value in a register ;)

As I said in the first sentence it seems (to me) "most likely" to be a
bug in the hypervisor, but there is only one way to find out where it is
for sure...
(to nail it down (and I'm very sorry that I cannot help with the quest
right now))

joanna.
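
The zero check from step 3 of the quoted report can also be sketched as a small Python script. It avoids piping a 10 GB hexdump through grep and reports offsets at the 32-byte granularity mentioned in the report. This is only an illustrative sketch and not something posted in the thread: the function name find_nonzero_chunks, the 1 MiB read size, and the output format are assumptions made here.

```python
import sys

CHUNK = 32  # corruption granularity described in the report


def find_nonzero_chunks(path, chunk=CHUNK, block=1 << 20):
    """Yield (offset, data) for each chunk-sized slice holding a nonzero byte.

    Hypothetical helper: the name and the 1 MiB read size are illustrative.
    """
    if block % chunk != 0:
        raise ValueError("block must be a multiple of chunk")
    zero_block = bytes(block)
    offset = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(block)
            if not buf:
                break
            # Fast path: skip blocks that are entirely zero.
            if buf != zero_block[:len(buf)]:
                for i in range(0, len(buf), chunk):
                    piece = buf[i:i + chunk]
                    if any(piece):
                        yield offset + i, piece
            offset += len(buf)


if __name__ == "__main__":
    # Each argument is a file expected to contain only zeros,
    # e.g. the file written by dd in step 2.
    for name in sys.argv[1:]:
        for off, piece in find_nonzero_chunks(name):
            print(f"{name}: {off:#x}: {piece.hex()}")
```

Run it with the test file as its only argument. No output means every 32-byte chunk was zero, which matches the "no output" expectation of the xxd and grep pipeline.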