From: Joanna Rutkowska <joanna@invisiblethingslab.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: xen-devel@lists.xensource.com
Subject: Re: Xen 4.0.0x allows for data corruption in Dom0
Date: Mon, 08 Mar 2010 23:34:59 +0100
Message-ID: <4B957B93.4060401@invisiblethingslab.com>
In-Reply-To: <4B957914.4050408@goop.org>


On 03/08/2010 11:24 PM, Jeremy Fitzhardinge wrote:
> On 03/06/2010 02:12 AM, Joanna Rutkowska wrote:
>> There is a nasty data corruption problem most likely allowed by a bug in
>> the Xen 4.0.0-x hypervisors.
>>
>> The problem occurs with a frequency of "a few chunks per 10 GB of data
>> copied", and only when running a VM (PV domU) with a specific kernel.
>> The problem, however, affects not only the VM but also the Dom0, which
>> is of significant importance.
>>
>> How to reproduce:
>>
>> 1) Start at least one Xen PV VM with a pvops0 kernel. One kernel known
>> to demonstrate the problem is the one built by Michael Young, based on
>> xen/master git from Dec 23. It has recently been replaced by a newer
>> kernel, which doesn't always show the problem, but I uploaded the
>> previous one at the URL below, so people can use it for testing:
>>
>> http://invisiblethingslab.com/pub/kernel-2.6.31.9-1.2.82.xendom0.fc12.x86_64.rpm
>>
>>
>> Now you can start a dummy VM with this kernel, e.g.:
>>
>> # xm create -c /dev/null memory=400 kernel=<path/to/kernel> \
>>     extra="rootdelay=1000"
>>
>> 2) Now, in Dom0, after having started this dummy VM, create a big test
>> file filled entirely with zeros. Make sure to choose a size bigger than
>> your DRAM size, to avoid file-system caching effects, e.g.:
>>
>> $ dd if=/dev/zero of=test.bin bs=1M count=10000
>>
>> That should create a roughly 10 GB file. Make sure to use /dev/zero and
>> not /dev/null!
>>
>> 3) Once the test file has been created, check that it really consists
>> only of zeros:
>>
>> $ xxd test.bin | grep -v "0000 0000 0000 0000 0000 0000 0000 0000"
>>
>> Normally you should not get any output. However, I consistently get
>> something like this:
>>
>> 4593a000:940d 0000 0000 0000 2d40 d6fc c803 0000  ........-@......
>> 4593a010:00f6 1f52 b301 0000 b620 dcd5 ff00 0000  ...R..... ......
>> a5df0000:e542 712c 77da c9f9 a429 4b85 ecc4 9395  .Bq,w....)K.....
>> a5df0010:d9d6 971f 0d58 5c70 aba6 387d 805f 09e2  .....X\p..8}._..
>> ceecb000:f80d 0000 0000 0000 096e 1cdc e403 0000  .........n......
>> ceecb010:2460 7ef6 be01 0000 b620 dcd5 ff00 0000  $`~...... ......
>> 148432000:580e 0000 0000 0000 5665 ed9d ff03 0000  X.......Ve......
>> 148432010:7bcc a023 ca01 0000 b620 dcd5 ff00 0000  {..#..... ......
>> 1c548b000:bc0e 0000 0000 0000 6942 387d 1b04 0000  ........iB8}....
>> 1c548b010:872b 01c8 d501 0000 b620 dcd5 ff00 0000  .+....... ......
>> 225d45000:4448 27cd b966 b37e 1f0c e9e3 c2db b6ee  DH'..f.~........
>> 225d45010:d2b2 55b8 9ef1 e818 a7e3 364d 2322 dc75  ..U.......6M#".u
>> 242056000:140f 0000 0000 0000 0bb0 3704 3404 0000  ..........7.4...
>> 242056010:9601 b606 e001 0000 b620 dcd5 ff00 0000  ......... ......
>>
>> The actual data vary between tests; however, the "dcd5 ff00 0000"
>> pattern seems to be repeatable on a given system with a given hypervisor
>> binary (the above numbers are for Xen-4.0.0-rc5 built from Michael
>> Young's SRPM). The errors always occur in 32-byte chunks.
>>
>> We have tested this in our lab on three different machines, with various
>> Dom0 kernels -- based on xen/master (AKA xen/stable-2.6.31) and
>> xen/stable (AKA xen/stable-2.6.32) -- and with a few Xen 4 hypervisors
>> (rc2, rc4, rc5). Not every kernel allows for reproducing the error with
>> such a simple "dummy" VM as the one given above -- e.g. the 2.6.32-based
>> kernels required some more regular VMs to be started for the problem to
>> be noticeable. However, with the previously mentioned kernel (M. Young
>> Dec23), the problem has been 100% reproducible for us.
>>
>> When downgraded to Xen 3.4.2 the problem went away.
>>
>> Of course this problem cannot be attributed to a buggy VM kernel, as the
>> hypervisor should be resistant to any kind of "wrong" software (buggy or
>> malicious) that executes in a VM.
>>    
> 
> Why "of course"?  Your report looks to me like a bug in dom0 which is
> causing data corruption when there's another domain running.

Please note that the "of course" sentence refers to the *VM* kernel, not Dom0.

> I don't see anything that specifically implicates Xen.  The fact that
> the symptoms change with a different Xen version could mean a kernel
> bug is affected by the Xen version (different memory layout, for
> example, or different paths in the kernel caused by different feature
> availability).
> 

Sure, it can theoretically be anything, perhaps even a generic bug in
IA32 just accidentally triggered by some magic value in a register ;) As
I said in the first sentence, it seems (to me) "most likely" to be a bug
in the hypervisor, but there is only one way to find out for sure where
it is: to nail it down (and I'm very sorry that I cannot help with the
quest right now).

joanna.
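
For anyone repeating the check in step 3 of the quoted report, the xxd/grep
pipeline can also be sketched as a small script that reports the offset of
every non-zero 32-byte chunk directly. This is only an illustrative sketch,
not something from the original report: the function name
find_nonzero_chunks, the 1 MiB read size, and the default 32-byte chunk
size (taken from the observation that errors occur in 32-byte chunks) are
all assumptions.

```python
import sys

CHUNK = 32          # assumed granularity, from the "32-byte chunks" observation
BLOCK = 1 << 20     # hypothetical read size (1 MiB); must be a multiple of CHUNK


def find_nonzero_chunks(path, chunk=CHUNK, block=BLOCK):
    """Yield (offset, data) for each chunk-sized slice that is not all zeros."""
    zeros = bytes(block)
    offset = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(block)
            if not data:
                break
            # Fast path: skip blocks that are entirely zero.
            if data != zeros[:len(data)]:
                for i in range(0, len(data), chunk):
                    piece = data[i:i + chunk]
                    if any(piece):
                        yield offset + i, piece
            offset += len(data)


if __name__ == "__main__":
    # Usage (hypothetical script name): python zerocheck.py test.bin
    for path in sys.argv[1:]:
        for off, piece in find_nonzero_chunks(path):
            print(f"{off:09x}: {piece.hex()}")
```

A clean file produces no output, matching the expectation that the grep in
step 3 prints nothing; any printed offset marks a chunk that would also
show up in the xxd listing.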

