* Data corruption in guest using KVM
From: Aurelien Jarno @ 2007-07-21 17:22 UTC
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi all,

For a long time I have been seeing data corruption in guests when using KVM,
but only today have I become convinced that the problem comes from KVM.

The symptoms are a few bytes that are mangled to 0x00 in a file that has
been written. For now I have only seen 2 or 4 consecutive bytes mangled,
but that may be due to statistics given the limited samples.

The problem appears very rarely. I only see it when doing huge
compilations (for example gcc or glibc), and not for every build. Note
that I only detect build failures, so I may miss some corruptions.

Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
(always multi-core).

I have never seen such corruptions using QEMU, so I would say the
problem does not come from the disk emulation, though it may be due to
statistics. Note that I have done a lot of compilation in a MIPS QEMU
guest (a few hundred hours) without any problem. This platform uses
the same IDE controller as the one in KVM.

Has anybody seen the same kind of problem? Without a way to
reproduce the corruption, I think it will be very difficult to debug
the problem.

Regards,
Aurelien

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
* Re: Data corruption in guest using KVM
From: Anthony Liguori @ 2007-07-21 17:46 UTC
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> Hi all,
>
> For a long time I have been seeing data corruption in guests when using KVM,
> but only today have I become convinced that the problem comes from KVM.
>
> The symptoms are a few bytes that are mangled to 0x00 in a file that has
> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> but that may be due to statistics given the limited samples.
>
> The problem appears very rarely. I only see it when doing huge
> compilations (for example gcc or glibc), and not for every build. Note
> that I only detect build failures, so I may miss some corruptions.
>
> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
> (always multi-core).
>
> I have never seen such corruptions using QEMU, so I would say the
> problem does not come from the disk emulation, though it may be due to
> statistics. Note that I have done a lot of compilation in a MIPS QEMU
> guest (a few hundred hours) without any problem. This platform uses
> the same IDE controller as the one in KVM.
>
> Has anybody seen the same kind of problem? Without a way to
> reproduce the corruption, I think it will be very difficult to debug
> the problem.
>

What sort of disk are you using (qcow2?)

Regards,

Anthony Liguori

> Regards,
> Aurelien

* Re: Data corruption in guest using KVM
From: Aurelien Jarno @ 2007-07-21 17:54 UTC
To: Anthony Liguori; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, Jul 21, 2007 at 12:46:29PM -0500, Anthony Liguori wrote:
> Aurelien Jarno wrote:
> >Hi all,
> >
> >For a long time I have been seeing data corruption in guests when using KVM,
> >but only today have I become convinced that the problem comes from KVM.
> >
> >The symptoms are a few bytes that are mangled to 0x00 in a file that has
> >been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> >but that may be due to statistics given the limited samples.
> >
> >The problem appears very rarely. I only see it when doing huge
> >compilations (for example gcc or glibc), and not for every build. Note
> >that I only detect build failures, so I may miss some corruptions.
> >
> >Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> >plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
> >hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
> >(always multi-core).
> >
> >I have never seen such corruptions using QEMU, so I would say the
> >problem does not come from the disk emulation, though it may be due to
> >statistics. Note that I have done a lot of compilation in a MIPS QEMU
> >guest (a few hundred hours) without any problem. This platform uses
> >the same IDE controller as the one in KVM.
> >
> >Has anybody seen the same kind of problem? Without a way to
> >reproduce the corruption, I think it will be very difficult to debug
> >the problem.
> >
>
> What sort of disk are you using (qcow2?)
>

I am using raw files for the disk in all cases.

Note that I have just seen a three-byte corruption. Building glibc
seems to be a good way to reproduce the bug, as a lot of source files
are generated on the fly during the build, and GCC does not like
source files containing 0x00.

I will try to do the same compilation using an NFS mount, to see whether
or not it comes from the IDE controller emulation.

Regards,
Aurelien

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net

* Re: Data corruption in guest using KVM
From: Anthony Liguori @ 2007-07-21 18:03 UTC
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> On Sat, Jul 21, 2007 at 12:46:29PM -0500, Anthony Liguori wrote:
>
>> Aurelien Jarno wrote:
>>
>>> Hi all,
>>>
>>> For a long time I have been seeing data corruption in guests when using KVM,
>>> but only today have I become convinced that the problem comes from KVM.
>>>
>>> The symptoms are a few bytes that are mangled to 0x00 in a file that has
>>> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
>>> but that may be due to statistics given the limited samples.
>>>
>>> The problem appears very rarely. I only see it when doing huge
>>> compilations (for example gcc or glibc), and not for every build. Note
>>> that I only detect build failures, so I may miss some corruptions.
>>>
>>> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
>>> plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
>>> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
>>> (always multi-core).
>>>
>>> I have never seen such corruptions using QEMU, so I would say the
>>> problem does not come from the disk emulation, though it may be due to
>>> statistics. Note that I have done a lot of compilation in a MIPS QEMU
>>> guest (a few hundred hours) without any problem. This platform uses
>>> the same IDE controller as the one in KVM.
>>>
>>> Has anybody seen the same kind of problem? Without a way to
>>> reproduce the corruption, I think it will be very difficult to debug
>>> the problem.
>>>
>>
>> What sort of disk are you using (qcow2?)
>>
>
> I am using raw files for the disk in all cases.
>
> Note that I have just seen a three-byte corruption. Building glibc
> seems to be a good way to reproduce the bug, as a lot of source files
> are generated on the fly during the build, and GCC does not like
> source files containing 0x00.
>
> I will try to do the same compilation using an NFS mount, to see whether
> or not it comes from the IDE controller emulation.
>

Can you do the same build on the host without corruption? Are you sure
it's not a bad disk?

Regards,

Anthony Liguori

> Regards,
> Aurelien

* Re: Data corruption in guest using KVM
From: Aurelien Jarno @ 2007-07-21 18:39 UTC
To: Anthony Liguori; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, Jul 21, 2007 at 01:03:42PM -0500, Anthony Liguori wrote:
> >I am using raw files for the disk in all cases.
> >
> >Note that I have just seen a three-byte corruption. Building glibc
> >seems to be a good way to reproduce the bug, as a lot of source files
> >are generated on the fly during the build, and GCC does not like
> >source files containing 0x00.
> >
> >I will try to do the same compilation using an NFS mount, to see whether
> >or not it comes from the IDE controller emulation.
> >
>
> Can you do the same build on the host without corruption? Are you sure
> it's not a bad disk?
>

I have had no problem building glibc and gcc on the two hosts (Core 2
Quad and Athlon 64 X2), and the disks are working correctly on both
machines.

Note that I am using multiple guests on the same system. Could that be
the problem? I will do a test with only one guest when my current tests
are finished.

Regards,
Aurelien

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net

* Missing my posts to this lists
From: Simon Gao @ 2007-07-21 21:00 UTC
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Just curious, is kvm-devel a moderated list? I sent two emails to the
list, but neither made it to the list. If it's a moderated list, please at
least let me know why they were rejected. If not, then there is something
wrong with the list server receiving emails.

Simon

* Re: Missing my posts to this lists
From: Avi Kivity @ 2007-07-22 7:53 UTC
To: Simon Gao; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Simon Gao wrote:
> Just curious, is kvm-devel a moderated list? I sent two emails to the
> list, but neither made it to the list. If it's a moderated list, please at
> least let me know why they were rejected. If not, then there is something
> wrong with the list server receiving emails.
>

The list is not moderated. However, it defaults to not sending you
messages that you originate.

--
error compiling committee.c: too many arguments to function

* Re: Data corruption in guest using KVM
From: Avi Kivity @ 2007-07-22 7:52 UTC
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> Hi all,
>
> For a long time I have been seeing data corruption in guests when using KVM,
> but only today have I become convinced that the problem comes from KVM.
>
> The symptoms are a few bytes that are mangled to 0x00 in a file that has
> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> but that may be due to statistics given the limited samples.
>
> The problem appears very rarely. I only see it when doing huge
> compilations (for example gcc or glibc), and not for every build. Note
> that I only detect build failures, so I may miss some corruptions.
>
> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
> (always multi-core).
>
> I have never seen such corruptions using QEMU, so I would say the
> problem does not come from the disk emulation, though it may be due to
> statistics. Note that I have done a lot of compilation in a MIPS QEMU
> guest (a few hundred hours) without any problem. This platform uses
> the same IDE controller as the one in KVM.
>
> Has anybody seen the same kind of problem? Without a way to
> reproduce the corruption, I think it will be very difficult to debug
> the problem.

Did you observe anything about the corruption? For example, are the
offsets at a page boundary? Can you provide a corrupted file and the
same, non-corrupted file as a reference?

For the 32-bit case, were the guests pae, nonpae, or both?

How would I go about reproducing this? Is a single ./configure; make
clean; make in a loop compiling gcc sufficient?

--
error compiling committee.c: too many arguments to function

* Re: Data corruption in guest using KVM
From: Aurelien Jarno @ 2007-07-22 13:38 UTC
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > Hi all,
> >
> > For a long time I have been seeing data corruption in guests when using KVM,
> > but only today have I become convinced that the problem comes from KVM.
> >
> > The symptoms are a few bytes that are mangled to 0x00 in a file that has
> > been written. For now I have only seen 2 or 4 consecutive bytes mangled,
> > but that may be due to statistics given the limited samples.
> >
> > The problem appears very rarely. I only see it when doing huge
> > compilations (for example gcc or glibc), and not for every build. Note
> > that I only detect build failures, so I may miss some corruptions.
> >
> > Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
> > plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
> > hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
> > (always multi-core).
> >
> > I have never seen such corruptions using QEMU, so I would say the
> > problem does not come from the disk emulation, though it may be due to
> > statistics. Note that I have done a lot of compilation in a MIPS QEMU
> > guest (a few hundred hours) without any problem. This platform uses
> > the same IDE controller as the one in KVM.
> >
> > Has anybody seen the same kind of problem? Without a way to
> > reproduce the corruption, I think it will be very difficult to debug
> > the problem.
>
> Did you observe anything about the corruption? For example, are the
> offsets at a page boundary? Can you provide a corrupted file and the
> same, non-corrupted file as a reference?

For now I am still trying to find an easy way to reproduce it. You will
find below a sample of a bad and a good file. I have gzipped them to
make sure they will not be mangled once more by an MUA or an MTA.

What is strange with this sample is that the two files do not have the
same size. I will try to get more corrupted files.

I have been able to reproduce the bug with one or multiple guests
running, so it does not depend on the number of guests running.

> For the 32-bit case, were the guests pae, nonpae, or both?

I am using nonpae guests (I only give 1GB of memory to the guests).

> How would I go about reproducing this? Is a single ./configure; make
> clean; make in a loop compiling gcc sufficient?

Yes, basically that's what I am doing, but on the glibc sources, as I have
more "success" reproducing the bug with them. Note that you should run
configure in a directory different from the sources.

I generally observe the bug every 10 to 15 builds. One build takes
about 45 minutes here.

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net

[-- Attachment #2: sem_close.o.d.bad.gz --]
[-- Type: application/octet-stream, Size: 1494 bytes --]

[-- Attachment #3: sem_close.o.d.good.gz --]
[-- Type: application/octet-stream, Size: 1471 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

* Re: Data corruption in guest using KVM
From: Avi Kivity @ 2007-07-22 13:46 UTC
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 10:52:03AM +0300, Avi Kivity wrote:
>
>> Aurelien Jarno wrote:
>>
>>> Hi all,
>>>
>>> For a long time I have been seeing data corruption in guests when using KVM,
>>> but only today have I become convinced that the problem comes from KVM.
>>>
>>> The symptoms are a few bytes that are mangled to 0x00 in a file that has
>>> been written. For now I have only seen 2 or 4 consecutive bytes mangled,
>>> but that may be due to statistics given the limited samples.
>>>
>>> The problem appears very rarely. I only see it when doing huge
>>> compilations (for example gcc or glibc), and not for every build. Note
>>> that I only detect build failures, so I may miss some corruptions.
>>>
>>> Note that I have observed the problem on GNU/Linux, GNU/kFreeBSD and
>>> plain FreeBSD, for both 32 and 64-bit guests. I have always used 64-bit
>>> hosts, and I have seen the problem on both Core 2 and Athlon 64 CPUs
>>> (always multi-core).
>>>
>>> I have never seen such corruptions using QEMU, so I would say the
>>> problem does not come from the disk emulation, though it may be due to
>>> statistics. Note that I have done a lot of compilation in a MIPS QEMU
>>> guest (a few hundred hours) without any problem. This platform uses
>>> the same IDE controller as the one in KVM.
>>>
>>> Has anybody seen the same kind of problem? Without a way to
>>> reproduce the corruption, I think it will be very difficult to debug
>>> the problem.
>>>
>> Did you observe anything about the corruption? For example, are the
>> offsets at a page boundary? Can you provide a corrupted file and the
>> same, non-corrupted file as a reference?
>>
>
> For now I am still trying to find an easy way to reproduce it. You will
> find below a sample of a bad and a good file. I have gzipped them to
> make sure they will not be mangled once more by an MUA or an MTA.
>
> What is strange with this sample is that the two files do not have the
> same size. I will try to get more corrupted files.
>

I guess that this is because the corruption is in some userspace data
structure, not pagecache, so there is not a 1:1 correspondence between
the area corrupted and the output file.

If you do happen to get a same-size corruption, that may tell us more.

>
>> How would I go about reproducing this? Is a single ./configure; make
>> clean; make in a loop compiling gcc sufficient?
>>
>
> Yes, basically that's what I am doing, but on the glibc sources, as I have
> more "success" reproducing the bug with them. Note that you should run
> configure in a directory different from the sources.
>
> I generally observe the bug every 10 to 15 builds. One build takes
> about 45 minutes here.
>

Okay, I am running a glibc build on a 384 MB x86-64 guest, in a loop.
We'll see how it goes.

--
error compiling committee.c: too many arguments to function

* Re: Data corruption in guest using KVM
From: Aurelien Jarno @ 2007-07-22 16:44 UTC
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> If you do happen to get a same-size corruption, that may tell us more.
>

I have just got one same-size corruption building glibc 2.6 on
GNU/kFreeBSD i386 (32-bit nonpae).

One byte at address 0x9000 has been replaced by 0x00. Please find the
good and the bad file attached.

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net

[-- Attachment #2: sysd-rules.bad.gz --]
[-- Type: application/octet-stream, Size: 8307 bytes --]

[-- Attachment #3: sysd-rules.good.gz --]
[-- Type: application/octet-stream, Size: 8324 bytes --]

* Re: Data corruption in guest using KVM
From: Avi Kivity @ 2007-07-22 17:34 UTC
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
>
>> If you do happen to get a same-size corruption, that may tell us more.
>>
>
> I have just got one same-size corruption building glibc 2.6 on
> GNU/kFreeBSD i386 (32-bit nonpae).
>
> One byte at address 0x9000 has been replaced by 0x00. Please find the
> good and the bad file attached.
>

Good. We have one or two cross-page-boundary bugs. The
corruption-chase branch already fixes one (which is much more likely to
be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
correctly).

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

* Re: Data corruption in guest using KVM
From: Aurelien Jarno @ 2007-07-22 18:14 UTC
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 08:34:37PM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> >
> >> If you do happen to get a same-size corruption, that may tell us more.
> >>
> >
> > I have just got one same-size corruption building glibc 2.6 on
> > GNU/kFreeBSD i386 (32-bit nonpae).
> >
> > One byte at address 0x9000 has been replaced by 0x00. Please find the
> > good and the bad file attached.
> >
>
> Good. We have one or two cross-page-boundary bugs. The
> corruption-chase branch already fixes one (which is much more likely to
> be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
> correctly).

Since then I have had 3 more failures, all at page boundaries, mangling
1 to 3 bytes. I am currently trying the patch from the corruption-chase
branch.

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net

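The page-boundary pattern reported above can be checked mechanically when a good and a bad copy of the same size are available. The following is only a minimal sketch and not something posted in the thread: it assumes 4 KiB pages and that both files have already been read into memory, and the helper names first_diff, diff_run and touches_page_boundary are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumption: 4 KiB pages, as on x86 */

/* Offset of the first byte that differs between two buffers of equal
 * length, or len if the buffers are identical. */
static size_t first_diff(const unsigned char *good,
                         const unsigned char *bad, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (good[i] != bad[i])
            return i;
    return len;
}

/* Number of consecutive differing bytes starting at offset off. */
static size_t diff_run(const unsigned char *good,
                       const unsigned char *bad, size_t len, size_t off)
{
    size_t n = 0;

    while (off + n < len && good[off + n] != bad[off + n])
        n++;
    return n;
}

/* Non-zero if the corrupted run [off, off + run) starts on a page
 * boundary, ends on one, or straddles one. */
static int touches_page_boundary(size_t off, size_t run)
{
    assert(run > 0);
    return off % PAGE_SIZE == 0 ||
           (off + run) % PAGE_SIZE == 0 ||
           off / PAGE_SIZE != (off + run - 1) / PAGE_SIZE;
}
```

For a sample like the same-size one described above, a single differing byte at offset 0x9000 counts as touching a boundary, because 0x9000 is a multiple of 4096.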
* Re: Data corruption in guest using KVM
[not found] ` <46A3952D.2020009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 18:14 ` Aurelien Jarno
@ 2007-07-22 23:34 ` Aurelien Jarno
[not found] ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 23:34 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 08:34:37PM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> > On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> >
> >> If you do happen to get a same-size corruption, that may tell us more.
> >
> > I have just got one same-size corruption building glibc 2.6 on
> > GNU/kFreeBSD i386 (32-bit nonpae).
> >
> > One byte at address 0x9000 has been replaced by 0x00. Please find the
> > good and the bad file attached.
>
> Good. We have one or two cross-page-boundary bugs. The
> corruption-chase branch already fixes one (which is much more likely to
> be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
> correctly).
>

I have tried this branch, and the data gets corrupted in another way.
This is because the source address is not incremented for the second
write. The patch below fixes that.

With this patch, I haven't been able to trigger any corruption. I have
added a printk in the code, which shows that I have been able to trigger
21 cross-boundary writes without any problem in the various guests.

So I think this bug is now fixed. Thanks for your help.

Aurelien

Signed-off-by: Aurelien Jarno <aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org>

diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 5b317c1..e7c9ca7 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -1141,6 +1141,7 @@ static int emulator_write_emulated(unsigned long addr,
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
+		val += now;
 		bytes -= now;
 	}
 	return emulator_write_emulated_onepage(addr, val, bytes, ctxt);

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net
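To see why the one-line fix matters, here is a small user-space model of a write that is split at a page boundary. It is a sketch only and not the kernel code: `guest_mem`, `write_onepage`, `write_split` and the `advance_source` flag are hypothetical names, and a 4096-byte page size is assumed. With `advance_source` set to 0 it mimics the behaviour described above (the second chunk is read from the start of the source buffer again); with 1 it mimics the patched behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for guest memory: two adjacent pages. */
static unsigned char guest_mem[2 * PAGE_SIZE];

/* Copy a write that must stay within a single page. */
static int write_onepage(unsigned long addr, const unsigned char *val,
                         size_t bytes)
{
    assert(bytes > 0 && addr + bytes <= sizeof(guest_mem));
    assert(addr / PAGE_SIZE == (addr + bytes - 1) / PAGE_SIZE);
    memcpy(guest_mem + addr, val, bytes);
    return 0;
}

/*
 * Write `bytes` bytes from `val` to `addr`, splitting the write in two
 * if it crosses a page boundary. When `advance_source` is 0 the source
 * pointer is not moved after the first chunk, which models the bug.
 */
static int write_split(unsigned long addr, const unsigned char *val,
                       size_t bytes, int advance_source)
{
    assert(bytes > 0);
    if (addr / PAGE_SIZE != (addr + bytes - 1) / PAGE_SIZE) {
        /* Number of bytes that still fit in the first page. */
        size_t now = PAGE_SIZE - addr % PAGE_SIZE;
        int rc = write_onepage(addr, val, now);

        if (rc != 0)
            return rc;
        addr += now;
        if (advance_source)
            val += now;     /* the step added by the patch */
        bytes -= now;
    }
    return write_onepage(addr, val, bytes);
}
```

Writing the four bytes 0x11 0x22 0x33 0x44 at offset `PAGE_SIZE - 2` in this model leaves 0x11 0x22 0x11 0x22 in memory when the source is not advanced, and the expected 0x11 0x22 0x33 0x44 when it is.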
[parent not found: <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>]
* Re: Data corruption in guest using KVM
[not found] ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-23 6:08 ` Aurelien Jarno
2007-07-23 8:04 ` Avi Kivity
1 sibling, 0 replies; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-23 6:08 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Mon, Jul 23, 2007 at 01:34:29AM +0200, Aurelien Jarno wrote:
> On Sun, Jul 22, 2007 at 08:34:37PM +0300, Avi Kivity wrote:
> > Aurelien Jarno wrote:
> > > On Sun, Jul 22, 2007 at 04:46:19PM +0300, Avi Kivity wrote:
> > >
> > >> If you do happen to get a same-size corruption, that may tell us more.
> > >
> > > I have just got one same-size corruption building glibc 2.6 on
> > > GNU/kFreeBSD i386 (32-bit nonpae).
> > >
> > > One byte at address 0x9000 has been replaced by 0x00. Please find the
> > > good and the bad file attached.
> >
> > Good. We have one or two cross-page-boundary bugs. The
> > corruption-chase branch already fixes one (which is much more likely to
> > be triggered by FreeBSD than Linux, if I understand the FreeBSD VM
> > correctly).
> >
>
> I have tried this branch, and the data gets corrupted in another way.
> This is because the source address is not incremented for the second
> write. The patch below fixes that.
>
> With this patch, I haven't been able to trigger any corruption. I have
> added a printk in the code, which shows that I have been able to trigger
> 21 cross-boundary writes without any problem in the various guests.
>

After a night of tests, 172 cross-boundary writes have been triggered,
still nothing wrong in the guests.

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net
* Re: Data corruption in guest using KVM
[not found] ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-23 6:08 ` Aurelien Jarno
@ 2007-07-23 8:04 ` Avi Kivity
1 sibling, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2007-07-23 8:04 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
> I have tried this branch, and the data gets corrupted in another way.
> This is because the source address is not incremented for the second
> write. The patch below fixes that.
>
> With this patch, I haven't been able to trigger any corruption. I have
> added a printk in the code, which shows that I have been able to trigger
> 21 cross-boundary writes without any problem in the various guests.
>
> So I think this bug is now fixed. Thanks for your help.
>

Thanks for the fix, and for such a good report and testing.

--
error compiling committee.c: too many arguments to function
* Re: Data corruption in guest using KVM
[not found] ` <20070722133818.GG16993-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 13:46 ` Avi Kivity
@ 2007-07-22 15:14 ` Avi Kivity
[not found] ` <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2007-07-22 15:14 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
>
>> How would I go about reproducing this? Is a single ./configure; make
>> clean; make in a loop compiling gcc sufficient?
>>
>
> Yes basically that's what I am doing but on the glibc sources as I get
> more "success" to reproduce the bug. Note that you should run the
> configure in a different directory from the sources.
>

btw, are you running a parallel make (-jN)?

--
error compiling committee.c: too many arguments to function
[parent not found: <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: Data corruption in guest using KVM
[not found] ` <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-07-22 15:19 ` Aurelien Jarno
[not found] ` <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Aurelien Jarno @ 2007-07-22 15:19 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sun, Jul 22, 2007 at 06:14:16PM +0300, Avi Kivity wrote:
> Aurelien Jarno wrote:
> >
> >>How would I go about reproducing this? Is a single ./configure; make
> >>clean; make in a loop compiling gcc sufficient?
> >>
> >
> >Yes basically that's what I am doing but on the glibc sources as I get
> >more "success" to reproduce the bug. Note that you should run the
> >configure in a different directory from the sources.
> >
>
> btw, are you running a parallel make (-jN)?

I wasn't using -j until a few hours ago. I am now trying -j2 with SMP
guests. I have already got what I think is a corruption: a parse error
from ld on a file generated by gcc. Unfortunately I don't have the
corrupted file, as it has been removed by gcc.

--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org | aurelien-rXXEIb44qovR7s880joybQ@public.gmane.org
`- people.debian.org/~aurel32 | www.aurel32.net
[parent not found: <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>]
* Re: Data corruption in guest using KVM
[not found] ` <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
@ 2007-07-22 15:24 ` Avi Kivity
0 siblings, 0 replies; 19+ messages in thread
From: Avi Kivity @ 2007-07-22 15:24 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Aurelien Jarno wrote:
>>
>> btw, are you running a parallel make (-jN)?
>>
>
> I wasn't using -j until a few hours ago. I am now trying -j2 with SMP
> guests. I have already got what I think is a corruption: a parse error
> from ld on a file generated by gcc. Unfortunately I don't have the
> corrupted file, as it has been removed by gcc.
>

Well, that may be a different bug.

--
error compiling committee.c: too many arguments to function
end of thread
Thread overview: 19+ messages
2007-07-21 17:22 Data corruption in guest using KVM Aurelien Jarno
[not found] ` <20070721172248.GA1555-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-21 17:46 ` Anthony Liguori
[not found] ` <46A24675.1010506-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-07-21 17:54 ` Aurelien Jarno
[not found] ` <20070721175404.GA3665-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-21 18:03 ` Anthony Liguori
[not found] ` <46A24A7E.6040104-rdkfGonbjUSkNkDKm+mE6A@public.gmane.org>
2007-07-21 18:39 ` Aurelien Jarno
[not found] ` <20070721183924.GA5108-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-21 21:00 ` Missing my posts to this lists Simon Gao
[not found] ` <46A273F4.2040001-g4dUTk+gKbW4mfPA/iJWtA@public.gmane.org>
2007-07-22 7:53 ` Avi Kivity
2007-07-22 7:52 ` Data corruption in guest using KVM Avi Kivity
[not found] ` <46A30CA3.3090100-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 13:38 ` Aurelien Jarno
[not found] ` <20070722133818.GG16993-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 13:46 ` Avi Kivity
[not found] ` <46A35FAB.701-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 16:44 ` Aurelien Jarno
[not found] ` <20070722164454.GA26166-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 17:34 ` Avi Kivity
[not found] ` <46A3952D.2020009-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 18:14 ` Aurelien Jarno
2007-07-22 23:34 ` Aurelien Jarno
[not found] ` <20070722233429.GA10146-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-23 6:08 ` Aurelien Jarno
2007-07-23 8:04 ` Avi Kivity
2007-07-22 15:14 ` Avi Kivity
[not found] ` <46A37448.1010008-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-07-22 15:19 ` Aurelien Jarno
[not found] ` <20070722151913.GA22621-OqXK5JiLQY5aJl8KAwiEcA@public.gmane.org>
2007-07-22 15:24 ` Avi Kivity