public inbox for kvm@vger.kernel.org
* Data corruption in guest using KVM
@ 2007-07-21 17:22 Aurelien Jarno
       [not found] ` <20070721172248.GA1555-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-21 17:22 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi all,

I have been seeing data corruption in guests when using KVM for a long
time, but only today did I become convinced that the problem comes from
KVM.

The symptom is a few bytes mangled to 0x00 in a file that has been
written. So far I have only seen 2 or 4 consecutive bytes mangled, but
that may be a statistical artifact given the limited number of samples.

The problem appears very rarely. I only see it when doing huge
compilations (for example gcc or glibc), and not on every build. Note
that I only detect build failures, so I may be missing some corruptions.

Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
(always multi-core).

I have never seen such corruption using QEMU, so I would say the
problem does not come from the disk emulation, though that may be down
to statistics. Note that I have done a lot of compilation in a MIPS QEMU
guest (a few hundred hours) without any problem. This platform uses
the same IDE controller as the one in KVM.

Has anybody seen the same kind of problem? Without a way to
reproduce the corruption, I think it will be very difficult to debug
the problem.

Regards,
Aurelien

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

* Re: Data corruption in guest using KVM
       [not found] ` <20070721172248.GA1555-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-21 17:46   ` Anthony Liguori
       [not found]     ` <46A24675.1010506-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  2007-07-22  7:52   ` Data corruption in guest using KVM Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2007-07-21 17:46 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> Hi all,
>
> For a long time I am seeing data corruption in guests when using KVM,
> but I am convinced only since today that the problem comes from KVM.
>
> The symptoms are a few bytes that are mangled to 0x00 in a file that has
> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> but that may due to statistics given the limited samples.
>
> The problem appears very rarely. I am only seeing it when doing huge 
> compilations (for example gcc or glibc), and not for every build. Note
> that I am only detecting build failures, so I can miss some corruptions.
>
> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPU
> (always multi-core).
>
> I have never seen such corruptions using QEMU, so I would say the
> problem does not comes from the disk emulation, though it may be due to
> statistics. Note that I have made a lot of compilation in a MIPS QEMU
> guest (a few hundred of hours), without any problem. This platform uses
> the same IDE controller as the one in KVM.
>
> Does anybody have seen the same kind of problem? Without a way to 
> reproduce the corruption, I think it will be very difficult to debug 
> the problem.
>   

What sort of disk are you using (qcow2?)

Regards,

Anthony Liguori

> Regards,
> Aurelien
>
>   


* Re: Data corruption in guest using KVM
       [not found]     ` <46A24675.1010506-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-07-21 17:54       ` Aurelien Jarno
       [not found]         ` <20070721175404.GA3665-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-21 17:54 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, Jul 21, 2007 at 12:46:29PM -0500, Anthony Liguori wrote:
> Aurelien Jarno wrote:
> >Hi all,
> >
> >For a long time I am seeing data corruption in guests when using KVM,
> >but I am convinced only since today that the problem comes from KVM.
> >
> >The symptoms are a few bytes that are mangled to 0x00 in a file that has
> >been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> >but that may due to statistics given the limited samples.
> >
> >The problem appears very rarely. I am only seeing it when doing huge 
> >compilations (for example gcc or glibc), and not for every build. Note
> >that I am only detecting build failures, so I can miss some corruptions.
> >
> >Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> >plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
> >hosts, and I have seen the problem on both Core 2 and Athlon 64 CPU
> >(always multi-core).
> >
> >I have never seen such corruptions using QEMU, so I would say the
> >problem does not comes from the disk emulation, though it may be due to
> >statistics. Note that I have made a lot of compilation in a MIPS QEMU
> >guest (a few hundred of hours), without any problem. This platform uses
> >the same IDE controller as the one in KVM.
> >
> >Does anybody have seen the same kind of problem? Without a way to 
> >reproduce the corruption, I think it will be very difficult to debug 
> >the problem.
> >  
> 
> What sort of disk are you using (qcow2?)
> 

I am using raw files for the disk in all cases.

Note that I have just seen a three-byte corruption. Building glibc
seems to be a good way to reproduce the bug, as a lot of source files
are generated on the fly during the build, and GCC does not like
source files containing 0x00.
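
Since a build failure only reveals corruption in files the compiler
happens to read, a scan of the build tree for unexpected 0x00 bytes may
catch more cases. A minimal sketch in Python follows; the helper names
and the list of file suffixes are assumptions made for illustration,
not something used in this thread.

```python
import os
from pathlib import Path

# Assumption: files with these suffixes are plain text in the build tree,
# so any 0x00 byte inside them is suspicious.
TEXT_SUFFIXES = {".c", ".h", ".S", ".d", ".mk", ".sh"}


def find_nul_runs(data: bytes):
    """Return (offset, length) for every run of consecutive 0x00 bytes."""
    runs = []
    start = None
    for offset, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = offset
        elif start is not None:
            runs.append((start, offset - start))
            start = None
    if start is not None:
        # The data ended while still inside a run of zero bytes.
        runs.append((start, len(data) - start))
    return runs


def scan_tree(root):
    """Map each text-like file under root to its NUL runs, if it has any."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix not in TEXT_SUFFIXES:
                continue
            runs = find_nul_runs(path.read_bytes())
            if runs:
                hits[str(path)] = runs
    return hits
```

Calling scan_tree() on the build directory after each build would list
the affected files together with the offset and length of each zeroed
run, which also shows whether runs of 2, 3 or 4 bytes are the only
sizes that occur.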

I will try the same compilation on an NFS mount, to see whether or not
the problem comes from the IDE controller emulation.

Regards,
Aurelien

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net

* Re: Data corruption in guest using KVM
       [not found]         ` <20070721175404.GA3665-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-21 18:03           ` Anthony Liguori
       [not found]             ` <46A24A7E.6040104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Anthony Liguori @ 2007-07-21 18:03 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> On Sat, Jul 21, 2007 at 12:46:29PM -0500, Anthony Liguori wrote:
>   
>> Aurelien Jarno wrote:
>>     
>>> Hi all,
>>>
>>> For a long time I am seeing data corruption in guests when using KVM,
>>> but I am convinced only since today that the problem comes from KVM.
>>>
>>> The symptoms are a few bytes that are mangled to 0x00 in a file that has
>>> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
>>> but that may due to statistics given the limited samples.
>>>
>>> The problem appears very rarely. I am only seeing it when doing huge 
>>> compilations (for example gcc or glibc), and not for every build. Note
>>> that I am only detecting build failures, so I can miss some corruptions.
>>>
>>> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
>>> plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
>>> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPU
>>> (always multi-core).
>>>
>>> I have never seen such corruptions using QEMU, so I would say the
>>> problem does not comes from the disk emulation, though it may be due to
>>> statistics. Note that I have made a lot of compilation in a MIPS QEMU
>>> guest (a few hundred of hours), without any problem. This platform uses
>>> the same IDE controller as the one in KVM.
>>>
>>> Does anybody have seen the same kind of problem? Without a way to 
>>> reproduce the corruption, I think it will be very difficult to debug 
>>> the problem.
>>>  
>>>       
>> What sort of disk are you using (qcow2?)
>>
>>     
>
> I am using raw files for the disk in all cases.
>
> Note that I have just seen a three bytes corruption. Building the glibc
> seems to be a good way to reproduce the bug, as a lot of source files
> are generated on the fly during the build, and as GCC does not like
> source files with 0x00.
>
> I will try to do the same compilation using a NFS mount, to see if it
> comes or not from the IDE controller emulation.
>   

Can you do the same build on the host without corruption?  Are you sure 
it's not a bad disk?

Regards,

Anthony Liguori

> Regards,
> Aurelien
>
>   


* Re: Data corruption in guest using KVM
       [not found]             ` <46A24A7E.6040104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
@ 2007-07-21 18:39               ` Aurelien Jarno
       [not found]                 ` <20070721183924.GA5108-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-21 18:39 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, Jul 21, 2007 at 01:03:42PM -0500, Anthony Liguori wrote:
> >I am using raw files for the disk in all cases.
> >
> >Note that I have just seen a three bytes corruption. Building the glibc
> >seems to be a good way to reproduce the bug, as a lot of source files
> >are generated on the fly during the build, and as GCC does not like
> >source files with 0x00.
> >
> >I will try to do the same compilation using a NFS mount, to see if it
> >comes or not from the IDE controller emulation.
> >  
> 
> Can you do the same build on the host without corruption?  Are you sure 
> it's not a bad disk?
> 

I have no problem building glibc and gcc on the two hosts (Core 2
Quad and Athlon 64 X2), and the disks are working correctly on both
machines.

Note that I am running multiple guests on the same system. Could that be
the problem? I will test with only one guest when my current tests have
finished.

Regards,
Aurelien

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net

* Missing my posts to this lists
       [not found]                 ` <20070721183924.GA5108-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-21 21:00                   ` Simon Gao
       [not found]                     ` <46A273F4.2040001-g4dUTk+gKbW4mfPA/iJWtA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Simon Gao @ 2007-07-21 21:00 UTC (permalink / raw)
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Just curious, is kvm-devel a moderated list? I sent two emails to the
list, but neither made it to the list. If it is a moderated list, please
at least let me know why they were rejected. If not, then there is
something wrong with the list server receiving emails.

Simon

* Re: Data corruption in guest using KVM
       [not found] ` <20070721172248.GA1555-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  2007-07-21 17:46   ` Anthony Liguori
@ 2007-07-22  7:52   ` Avi Kivity
       [not found]     ` <46A30CA3.3090100-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-07-22  7:52 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> Hi all,
>
> For a long time I am seeing data corruption in guests when using KVM,
> but I am convinced only since today that the problem comes from KVM.
>
> The symptoms are a few bytes that are mangled to 0x00 in a file that has
> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> but that may due to statistics given the limited samples.
>
> The problem appears very rarely. I am only seeing it when doing huge 
> compilations (for example gcc or glibc), and not for every build. Note
> that I am only detecting build failures, so I can miss some corruptions.
>
> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPU
> (always multi-core).
>
> I have never seen such corruptions using QEMU, so I would say the
> problem does not comes from the disk emulation, though it may be due to
> statistics. Note that I have made a lot of compilation in a MIPS QEMU
> guest (a few hundred of hours), without any problem. This platform uses
> the same IDE controller as the one in KVM.
>
> Does anybody have seen the same kind of problem? Without a way to 
> reproduce the corruption, I think it will be very difficult to debug 
> the problem.

Did you observe anything about the corruption?  For example, are the 
offsets at page boundary?  Can you provide a corrupted file and the 
same, non-corrupted file as a reference?

For the 32-bit case, were the guests pae, nonpae, or both?

How would I go about reproducing this?   Is a single ./configure; make 
clean; make in a loop compiling gcc sufficient?

-- 
error compiling committee.c: too many arguments to function


* Re: Missing my posts to this lists
       [not found]                     ` <46A273F4.2040001-g4dUTk+gKbW4mfPA/iJWtA@public.gmane.org>
@ 2007-07-22  7:53                       ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2007-07-22  7:53 UTC (permalink / raw)
  To: Simon Gao; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Simon Gao wrote:
> Just curious, is kvm-devel  a moderated list? I sent two emails to the
> list, but neither made to the list.  If it's moderated list, please at
> least let me know why they are rejected. If not, then there is something
> wrong with the list server receiving emails.
>
>   

The list is not moderated.  However, it defaults to not sending you 
messages that you originate.


-- 
error compiling committee.c: too many arguments to function


* Re: Data corruption in guest using KVM
       [not found]     ` <46A30CA3.3090100-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-07-22 13:38       ` Aurelien Jarno
       [not found]         ` <20070722133818.GG16993-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 13:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

[-- Attachment #1: Type: text/plain, Size: 2936 bytes --]

On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > Hi all,
> >
> > For a long time I am seeing data corruption in guests when using KVM,
> > but I am convinced only since today that the problem comes from KVM.
> >
> > The symptoms are a few bytes that are mangled to 0x00 in a file that has
> > been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> > but that may due to statistics given the limited samples.
> >
> > The problem appears very rarely. I am only seeing it when doing huge 
> > compilations (for example gcc or glibc), and not for every build. Note
> > that I am only detecting build failures, so I can miss some corruptions.
> >
> > Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> > plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
> > hosts, and I have seen the problem on both Core 2 and Athlon 64 CPU
> > (always multi-core).
> >
> > I have never seen such corruptions using QEMU, so I would say the
> > problem does not comes from the disk emulation, though it may be due to
> > statistics. Note that I have made a lot of compilation in a MIPS QEMU
> > guest (a few hundred of hours), without any problem. This platform uses
> > the same IDE controller as the one in KVM.
> >
> > Does anybody have seen the same kind of problem? Without a way to 
> > reproduce the corruption, I think it will be very difficult to debug 
> > the problem.
> 
> Did you observe anything about the corruption?  For example, are the 
> offsets at page boundary?  Can you provide a corrupted file and the 
> same, non-corrupted file as a reference?

For now I am still trying to find an easy way to reproduce it. You will
find below a sample of a bad and a good file. I have gzipped them to
make sure they will not be mangled once more by an MUA or an MTA.

What is strange with this sample is that the two files do not have the
same size. I will try to get more corrupted files.

I have been able to reproduce the bug with one or multiple guests
running, so it does not depend on the number of guests running.


> For the 32-bit case, were the guests pae, nonpae, or both?

I am using nonpae guests (I only give 1GB of memory to the guests).


> How would I go about reproducing this?   Is a single ./configure; make 
> clean; make in a loop compiling gcc sufficient?

Yes, basically that's what I am doing, but on the glibc sources, as I
have more "success" reproducing the bug there. Note that you should run
configure in a directory different from the sources.

I generally observe the bug once every 10 to 15 builds. One build takes
about 45 minutes here.
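
At about 45 minutes per build and one failure every 10 to 15 builds,
that is roughly 7.5 to 11 hours of building per hit, so an unattended
loop helps. The following Python sketch runs configure, make clean and
make from a separate build directory and stops at the first failing
command. The function names, the configure_args parameter and the
injectable run argument are assumptions made for illustration; a
failing build is only a hint of corruption, not proof.

```python
import subprocess
from pathlib import Path


def build_commands(src_dir, configure_args=(), jobs=1):
    """Commands for one iteration, meant to run inside the build directory."""
    # configure is invoked by absolute path because the working directory
    # is the separate build directory, not the source tree.
    configure = str(Path(src_dir).resolve() / "configure")
    make = ["make"] if jobs == 1 else ["make", f"-j{jobs}"]
    return [[configure, *configure_args], ["make", "clean"], make]


def run_build_loop(src_dir, build_dir, iterations,
                   configure_args=(), jobs=1, run=subprocess.run):
    """Repeat the build; return the first failing iteration (1-based) or None.

    build_dir must already exist and must differ from src_dir.
    """
    for iteration in range(1, iterations + 1):
        for command in build_commands(src_dir, configure_args, jobs):
            result = run(command, cwd=build_dir)
            if result.returncode != 0:
                # Stop immediately so the build tree is left for inspection.
                return iteration
    return None
```

For example, run_build_loop("glibc-2.6", "glibc-build", 30) would cover
about a day of builds; whatever options configure needs on a given
system go in configure_args.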

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net

[-- Attachment #2: sem_close.o.d.bad.gz --]
[-- Type: application/octet-stream, Size: 1494 bytes --]

[-- Attachment #3: sem_close.o.d.good.gz --]
[-- Type: application/octet-stream, Size: 1471 bytes --]

[-- Attachment #5: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

* Re: Data corruption in guest using KVM
       [not found]         ` <20070722133818.GG16993-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-22 13:46           ` Avi Kivity
       [not found]             ` <46A35FAB.701-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-07-22 15:14           ` Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-07-22 13:46 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
>   
>> Aurelien Jarno wrote:
>>     
>>> Hi all,
>>>
>>> For a long time I am seeing data corruption in guests when using KVM,
>>> but I am convinced only since today that the problem comes from KVM.
>>>
>>> The symptoms are a few bytes that are mangled to 0x00 in a file that has
>>> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
>>> but that may due to statistics given the limited samples.
>>>
>>> The problem appears very rarely. I am only seeing it when doing huge 
>>> compilations (for example gcc or glibc), and not for every build. Note
>>> that I am only detecting build failures, so I can miss some corruptions.
>>>
>>> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
>>> plain FreeBSD, for both 32 and 64-bit guests. I always used 64-bit 
>>> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPU
>>> (always multi-core).
>>>
>>> I have never seen such corruptions using QEMU, so I would say the
>>> problem does not comes from the disk emulation, though it may be due to
>>> statistics. Note that I have made a lot of compilation in a MIPS QEMU
>>> guest (a few hundred of hours), without any problem. This platform uses
>>> the same IDE controller as the one in KVM.
>>>
>>> Does anybody have seen the same kind of problem? Without a way to 
>>> reproduce the corruption, I think it will be very difficult to debug 
>>> the problem.
>>>       
>> Did you observe anything about the corruption?  For example, are the 
>> offsets at page boundary?  Can you provide a corrupted file and the 
>> same, non-corrupted file as a reference?
>>     
>
> For now I am still trying to find an easy way to reproduce it. You will
> find below a sample of a bad and a good file. I have gzipped them to
> make sure they will not be mangled once more by a MUA or a MTA.
>
> What is strange with this sample is that the size of the file is not the
> same. I will try to get more corrupted file.
>   

I guess that this is because the corruption is in some userspace data 
structure, not pagecache, so there is not a 1:1 correspondence between 
the area corrupted and the output file.

If you do happen to get a same-size corruption, that may tell us more.

>
>> How would I go about reproducing this?   Is a single ./configure; make 
>> clean; make in a loop compiling gcc sufficient?
>>     
>
> Yes basically that's what I am doing but on the glibc sources as I get 
> more "success" to reproduce the bug. Note that you should run the 
> configure in a different directory from the sources.
>
> I generally observed the bug every 10 to 15 builds. One build takes
> about 45 minutes here.
>
>   

Okay, I am running a glibc build on a 384 MB x86-64 guest, in a loop.  
We'll see how it goes.

-- 
error compiling committee.c: too many arguments to function


* Re: Data corruption in guest using KVM
       [not found]         ` <20070722133818.GG16993-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  2007-07-22 13:46           ` Avi Kivity
@ 2007-07-22 15:14           ` Avi Kivity
       [not found]             ` <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-07-22 15:14 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
>   
>> How would I go about reproducing this?   Is a single ./configure; make 
>> clean; make in a loop compiling gcc sufficient?
>>     
>
> Yes basically that's what I am doing but on the glibc sources as I get 
> more "success" to reproduce the bug. Note that you should run the 
> configure in a different directory from the sources.
>   

btw, are you running a parallel make (-jN)?


-- 
error compiling committee.c: too many arguments to function


* Re: Data corruption in guest using KVM
       [not found]             ` <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-07-22 15:19               ` Aurelien Jarno
       [not found]                 ` <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 15:19 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 06:14:16PM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> >  
> >>How would I go about reproducing this?   Is a single ./configure; make 
> >>clean; make in a loop compiling gcc sufficient?
> >>    
> >
> >Yes basically that's what I am doing but on the glibc sources as I get 
> >more "success" to reproduce the bug. Note that you should run the 
> >configure in a different directory from the sources.
> >  
> 
> btw, are you running a parallel make (-jN)?

I wasn't using -j until a few hours ago. I am now trying -j2 with SMP
guests. I have already got what I think is a corruption: a parse error
from ld on a file generated by gcc. Unfortunately I don't have the
corrupted file, as it has been removed by gcc.

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net

* Re: Data corruption in guest using KVM
       [not found]                 ` <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-22 15:24                   ` Avi Kivity
  0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2007-07-22 15:24 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
>>
>> btw, are you running a parallel make (-jN)?
>>     
>
> I wasn't using -j until a few hours. I am now trying -j2 with SMP
> guests. I already get what I think is a corruption, a parse error from 
> ld on a file generated by gcc. Unfortunately I don't have the corrupted
> file, as it as been removed by gcc.
>
>   

Well, that may be a different bug.

-- 
error compiling committee.c: too many arguments to function


* Re: Data corruption in guest using KVM
       [not found]             ` <46A35FAB.701-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-07-22 16:44               ` Aurelien Jarno
       [not found]                 ` <20070722164454.GA26166-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 16:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

[-- Attachment #1: Type: text/plain, Size: 630 bytes --]

On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> If you do happen to get a same-size corruption, that may tell us more.
> 

I have just got one same-size corruption building glibc 2.6 on 
GNU/kFreeBSD i386 (32-bit nonpae).

One byte at address 0x9000 has been replaced by 0x00. Please find the
good and the bad file attached.
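
As a worked check, 0x9000 is 36864 = 9 x 4096, so with 4 KiB pages the
mangled byte is the very first byte of the tenth page of the file. The
Python sketch below compares two gzipped samples of equal decompressed
size and reports where each differing byte falls relative to a page
boundary. The function names are hypothetical and the 4 KiB page size
is an assumption (the usual size for an i386 nonpae guest).

```python
import gzip

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def diff_offsets(good: bytes, bad: bytes):
    """Offsets at which two equal-length byte strings differ."""
    if len(good) != len(bad):
        raise ValueError("sizes differ; offsets would not line up")
    return [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]


def describe(offset):
    """Return (page number, offset within that page) for a file offset."""
    return divmod(offset, PAGE_SIZE)


def compare_gzipped(good_path, bad_path):
    """Decompress both samples and list (offset, page, in-page offset)."""
    with gzip.open(good_path, "rb") as f:
        good = f.read()
    with gzip.open(bad_path, "rb") as f:
        bad = f.read()
    return [(off, *describe(off)) for off in diff_offsets(good, bad)]
```

An in-page offset of 0 (or one close to PAGE_SIZE for a multi-byte run)
means the damage touches a page boundary.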

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net

[-- Attachment #2: sysd-rules.bad.gz --]
[-- Type: application/octet-stream, Size: 8307 bytes --]

[-- Attachment #3: sysd-rules.good.gz --]
[-- Type: application/octet-stream, Size: 8324 bytes --]

* Re: Data corruption in guest using KVM
       [not found]                 ` <20070722164454.GA26166-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-22 17:34                   ` Avi Kivity
       [not found]                     ` <46A3952D.2020009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-07-22 17:34 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
>   
>> If you do happen to get a same-size corruption, that may tell us more.
>>
>>     
>
> I have just got one same-size corruption building glibc 2.6 on 
> GNU/kFreeBSD i386 (32-bit nonpae).
>
> One byte at address 0x9000 has been replaced by 0x00. Please find the
> good and the bad file attached.
>
>   

Good.  We have one or two cross-page-boundary bugs.  The
corruption-chase branch already fixes one (which is much more likely to
be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
correctly).
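
As an illustration of what "cross-page-boundary" means for an emulated write, here is a minimal sketch. It assumes 4 KiB pages; `SKETCH_PAGE_SIZE`, `crosses_page` and `bytes_before_boundary` are hypothetical names chosen for this example, not KVM identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Returns nonzero if the range [addr, addr + bytes) touches more than one page. */
static int crosses_page(uint64_t addr, size_t bytes)
{
    assert(bytes > 0);  /* an empty range has no last byte to test */
    return addr / SKETCH_PAGE_SIZE != (addr + bytes - 1) / SKETCH_PAGE_SIZE;
}

/* Number of bytes from addr up to (not including) the next page boundary. */
static size_t bytes_before_boundary(uint64_t addr)
{
    return (size_t)(SKETCH_PAGE_SIZE - addr % SKETCH_PAGE_SIZE);
}
```

A 4-byte write at 0x8ffe would cross into the page starting at 0x9000, with 2 bytes landing on each side, while the same write at 0x9000 stays within one page.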

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Data corruption in guest using KVM
       [not found]                     ` <46A3952D.2020009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-07-22 18:14                       ` Aurelien Jarno
  2007-07-22 23:34                       ` Aurelien Jarno
  1 sibling, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 18:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 08:34:37PM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> >   
> >> If you do happen to get a same-size corruption, that may tell us more.
> >>
> >>     
> >
> > I have just got one same-size corruption building glibc 2.6 on 
> > GNU/kFreeBSD i386 (32-bit nonpae).
> >
> > One byte at address 0x9000 has been replaced by 0x00. Please find the
> > good and the bad file attached.
> >
> >   
> 
> Good.  We have one or two cross-page-boundary bugs.  The
> corruption-chase branch already fixes one (which is much more likely to
> be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
> correctly).

Since then I have had 3 more failures, all at page boundaries, each
mangling 1 to 3 bytes.
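
A check like this can be automated by comparing a good and a bad copy of a file. The sketch below assumes both copies are already loaded into equal-size buffers and that pages are 4 KiB; `struct diff_run`, `find_diff_runs` and `run_touches_page_boundary` are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* One run of consecutive bytes that differ between the two copies. */
struct diff_run {
    size_t offset;
    size_t length;
};

/*
 * Scans two buffers of the same size and records up to max_runs runs of
 * differing bytes. Returns the total number of runs found, which may be
 * larger than max_runs.
 */
static size_t find_diff_runs(const unsigned char *good, const unsigned char *bad,
                             size_t size, struct diff_run *runs, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;

    while (i < size) {
        size_t start;

        if (good[i] == bad[i]) {
            i++;
            continue;
        }
        start = i;
        while (i < size && good[i] != bad[i])
            i++;
        if (count < max_runs) {
            runs[count].offset = start;
            runs[count].length = i - start;
        }
        count++;
    }
    return count;
}

/* Nonzero if the run starts at, ends at, or spans a page boundary. */
static int run_touches_page_boundary(const struct diff_run *run)
{
    assert(run->length > 0);
    return run->offset % SKETCH_PAGE_SIZE == 0 ||
           (run->offset + run->length) / SKETCH_PAGE_SIZE >
               run->offset / SKETCH_PAGE_SIZE;
}
```

For the earlier report, a single mangled byte at 0x9000 would show up as one run of length 1 that touches a page boundary.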

I am currently trying the patch from the corruption-chase branch.

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Data corruption in guest using KVM
       [not found]                     ` <46A3952D.2020009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  2007-07-22 18:14                       ` Aurelien Jarno
@ 2007-07-22 23:34                       ` Aurelien Jarno
       [not found]                         ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 23:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 08:34:37PM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> >   
> >> If you do happen to get a same-size corruption, that may tell us more.
> >>
> >>     
> >
> > I have just got one same-size corruption building glibc 2.6 on 
> > GNU/kFreeBSD i386 (32-bit nonpae).
> >
> > One byte at address 0x9000 has been replaced by 0x00. Please find the
> > good and the bad file attached.
> >
> >   
> 
> Good.  We have one or two cross-page-boundary bugs.  The
> corruption-chase branch already fixes one (which is much more likely to
> be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
> correctly).
> 

I have tried this branch, and the data still gets corrupted, but in a
different way. This is because the source address is not incremented
for the second write. The patch below fixes that.

With this patch, I haven't been able to reproduce any corruption. I
added a printk to the code and saw 21 cross-boundary writes being
triggered without any problem in the various guests.

So I think this bug is now fixed. Thanks for your help.

Aurelien


Signed-off-by: Aurelien Jarno <aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org>

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 5b317c1..e7c9ca7 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1141,6 +1141,7 @@ static int emulator_write_emulated(unsigned long addr,
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
+		val += now;
 		bytes -= now;
 	}
 	return emulator_write_emulated_onepage(addr, val, bytes, ctxt);
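
The effect of the missing source increment can be seen in a small userspace sketch. This is not the KVM code: `write_one_page` and `write_split` are hypothetical stand-ins, guest memory is modelled as a flat array, pages are assumed to be 4 KiB, and a write is assumed to be no larger than one page. The `advance_source` flag exists only to compare the behaviour with and without the added line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Stand-in for a write that must stay within a single page. */
static void write_one_page(unsigned char *mem, size_t addr,
                           const unsigned char *val, size_t bytes)
{
    assert(bytes == 0 ||
           addr / SKETCH_PAGE_SIZE == (addr + bytes - 1) / SKETCH_PAGE_SIZE);
    memcpy(mem + addr, val, bytes);
}

/*
 * Splits a write that crosses a page boundary into two single-page writes.
 * With advance_source == 0 the second write reuses the start of the source
 * buffer, which is the behaviour described before the patch.
 */
static void write_split(unsigned char *mem, size_t addr,
                        const unsigned char *val, size_t bytes,
                        int advance_source)
{
    assert(bytes <= SKETCH_PAGE_SIZE);  /* assumption: spans at most two pages */

    if (bytes > 0 &&
        addr / SKETCH_PAGE_SIZE != (addr + bytes - 1) / SKETCH_PAGE_SIZE) {
        size_t now = SKETCH_PAGE_SIZE - addr % SKETCH_PAGE_SIZE;

        write_one_page(mem, addr, val, now);
        addr += now;
        if (advance_source)
            val += now;  /* the step the patch adds */
        bytes -= now;
    }
    write_one_page(mem, addr, val, bytes);
}
```

Writing the bytes 1, 2, 3, 4 at offset 4094 gives 1, 2 | 3, 4 across the boundary when the source pointer is advanced, and 1, 2 | 1, 2 when it is not.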


-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Data corruption in guest using KVM
       [not found]                         ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-23  6:08                           ` Aurelien Jarno
  2007-07-23  8:04                           ` Avi Kivity
  1 sibling, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-23  6:08 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Mon, Jul 23, 2007 at 01:34:29AM +0200, Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 08:34:37PM +0300, Avi Kivity wrote:
> > Aurelien Jarno wrote:
> > > On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> > >   
> > >> If you do happen to get a same-size corruption, that may tell us more.
> > >>
> > >>     
> > >
> > > I have just got one same-size corruption building glibc 2.6 on 
> > > GNU/kFreeBSD i386 (32-bit nonpae).
> > >
> > > One byte at address 0x9000 has been replaced by 0x00. Please find the
> > > good and the bad file attached.
> > >
> > >   
> > 
> > Good.  We have one or two cross-page-boundary bugs.  The
> > corruption-chase branch already fixes one (which is much more likely to
> > be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
> > correctly).
> > 
> 
> I have tried this branch, and the data get corrupted another way. This
> is due to the fact that the source address is not incremented for the
> second write. The patch below fixes that.
> 
> With this patch, I haven't been able to make any corruption. I have added
> a printk in the code to see that I have been able to trigger 21 
> cross-boundary writes without any problem in the various guests.
> 

After a night of tests, 172 cross-boundary writes have been triggered,
and there is still nothing wrong in the guests.

-- 
  .''`.  Aurelien Jarno	            | GPG: 1024D/F1BCDB73
 : :' :  Debian developer           | Electrical Engineer
 `. `'   aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org         | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
   `-    people.debian.org/~aurel32 | www.aurel32.net


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Data corruption in guest using KVM
       [not found]                         ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
  2007-07-23  6:08                           ` Aurelien Jarno
@ 2007-07-23  8:04                           ` Avi Kivity
  1 sibling, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2007-07-23  8:04 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> I have tried this branch, and the data get corrupted another way. This
> is due to the fact that the source address is not incremented for the
> second write. The patch below fixes that.
>
> With this patch, I haven't been able to make any corruption. I have added
> a printk in the code to see that I have been able to trigger 21 
> cross-boundary writes without any problem in the various guests.
>
> So I think this bug is now fixed. Thanks for your help.
>   

Thanks for the fix, and for such a good report and testing.


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2007-07-23  8:04 UTC | newest]

Thread overview: 19+ messages
2007-07-21 17:22 Data corruption in guest using KVM Aurelien Jarno
     [not found] ` <20070721172248.GA1555-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-21 17:46   ` Anthony Liguori
     [not found]     ` <46A24675.1010506-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-07-21 17:54       ` Aurelien Jarno
     [not found]         ` <20070721175404.GA3665-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-21 18:03           ` Anthony Liguori
     [not found]             ` <46A24A7E.6040104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-07-21 18:39               ` Aurelien Jarno
     [not found]                 ` <20070721183924.GA5108-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-21 21:00                   ` Missing my posts to this lists Simon Gao
     [not found]                     ` <46A273F4.2040001-g4dUTk+gKbW4mfPA/iJWtA@public.gmane.org>
2007-07-22  7:53                       ` Avi Kivity
2007-07-22  7:52   ` Data corruption in guest using KVM Avi Kivity
     [not found]     ` <46A30CA3.3090100-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 13:38       ` Aurelien Jarno
     [not found]         ` <20070722133818.GG16993-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 13:46           ` Avi Kivity
     [not found]             ` <46A35FAB.701-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 16:44               ` Aurelien Jarno
     [not found]                 ` <20070722164454.GA26166-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 17:34                   ` Avi Kivity
     [not found]                     ` <46A3952D.2020009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 18:14                       ` Aurelien Jarno
2007-07-22 23:34                       ` Aurelien Jarno
     [not found]                         ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-23  6:08                           ` Aurelien Jarno
2007-07-23  8:04                           ` Avi Kivity
2007-07-22 15:14           ` Avi Kivity
     [not found]             ` <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 15:19               ` Aurelien Jarno
     [not found]                 ` <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 15:24                   ` Avi Kivity
