From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:49550)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TNh4M-0001oQ-ML for qemu-devel@nongnu.org;
	Mon, 15 Oct 2012 05:34:48 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1TNh4G-0005iH-KF for qemu-devel@nongnu.org;
	Mon, 15 Oct 2012 05:34:42 -0400
Received: from mail.stepping-stone.ch ([194.176.109.206]:37467)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TNh4G-0005iB-Dl for qemu-devel@nongnu.org;
	Mon, 15 Oct 2012 05:34:36 -0400
Message-ID: <1350293667.4696.159.camel@storm>
From: Tiziano Müller
Date: Mon, 15 Oct 2012 11:34:27 +0200
In-Reply-To: <20121015074819.GB24883@stefanha-thinkpad.redhat.com>
References: <1349962403.4696.51.camel@storm>
	<20121012083315.GB14822@stefanha-thinkpad.redhat.com>
	<1350032009.4696.84.camel@storm>
	<20121015074819.GB24883@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Silent filesystem/qcow2 corruptions with qemu-kvm-1.0 and 1.1.1
Reply-To: tiziano.mueller@stepping-stone.ch
To: Stefan Hajnoczi
Cc: qemu-devel

On Monday, 15.10.2012 at 09:48 +0200, Stefan Hajnoczi wrote:
> Okay, that's consistent with the other symptoms you've reported.
>
> It's not clear whether the corruption arises inside qcow2 or if
> something else is causing corruption and qcow2/xfs get upset. That is
> the next step to debugging this - hopefully it will become possible to
> reproduce it reliably, at which point it is much easier to debug :).

Yes, we now have three different VM configurations deployed on the
server where we have seen the most corruptions so far:

 * config 1: as-is
 * config 2: raw image format instead of qcow2, as you suggested
 * config 3: vhost=off for the network

The third test case exists because we figured out that one major
difference between the two servers is that the server with the most
corruptions has vhost=on (turned on automatically by qemu/libvirt
because the vhost-net module was loaded). I know this does not make a
lot of sense. In fact, on the server without vhost-net (and with
qemu-1.0) we saw only one corruption in total (where the qcow2 got
really messed up), so one could assume that this was a single event and
may have been a user error.

To summarize:

 * server 1 (qemu-kvm-1.1, host kernel 3.5.2, vhost-net loaded and
   used) has had 5+ machines with corrupt filesystems (ext4 and xfs)
   and 1 machine with a corrupt qcow2.
 * server 2 (qemu-kvm-1.0, host kernel 3.2.6, vhost-net not loaded) has
   had only 1 machine with a corrupt qcow2.

The test VMs mentioned above have been running since Friday, and
unfortunately none of them has had the decency to go corrupt again.

We will keep you posted, thanks a lot for your help.

Best regards,
Tiziano

-- 
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern
Phone: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@stepping-stone.ch