From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50C8899D.2050308@redhat.com>
Date: Wed, 12 Dec 2012 14:41:49 +0100
From: Kevin Wolf
MIME-Version: 1.0
References: <1339767219-24297-1-git-send-email-kwolf@redhat.com> <1339767219-24297-29-git-send-email-kwolf@redhat.com> <201212121425.41850.hahn@univention.de>
In-Reply-To: <201212121425.41850.hahn@univention.de>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [BUG] qemu-1.1.2 [FIXED-BY] qcow2: Fix avail_sectors in cluster allocation code
To: Philipp Hahn
Cc: Michael Tokarev, qemu-devel@nongnu.org

Hi Philipp,

On 12.12.2012 14:25, Philipp Hahn wrote:
> Hello Kevin, hello Michael, hello *,
>
> we noticed a data corruption bug in qemu-1.1.2, which will be shipped by
> Debian and our own Debian-based distribution.
> The corruption mostly manifests while installing large Debian package files
> and seems to be related to memory pressure: As long as the file is still in
> the page cache, everything looks fine, but when the file is re-read from the
> virtual hard disk using a qcow2 file backed by another qcow2 file, the file
> is corrupted: dpkg complains that the .tar.gz file inside the Debian archive
> file is corrupted and the md5sum no longer matches.
>
> I tracked this down using "git bisect" to your patch attached below, which
> fixed this bug, so everything is fine with qemu-kvm-1.2.0.
> From my reading this seems to explain our problems, since during my own
> testing during development I never used backing chains and the problem only
> showed up when my colleagues started using qemu-kvm-1.1.2 with their VMs using
> backing chains.
>
> @Kevin: Do you think that's a valid explanation and your patch should fix
> that problem?
> I'd like to get your expertise before filing a bug with Debian and asking
> Michael to include that patch with his next stable update for 1.1.

As you can see in the commit message of that patch, I was convinced that no
bug existed in practice and that this was only dangerous with respect to
future changes.

Therefore my first question is whether you're using an unmodified upstream
qemu or whether some backported patches are applied to it. If it's indeed
unmodified, we should probably review the code once again to understand why
it makes a difference.

In any case, this is the cluster allocation code. It's probably not related
to rereading things from disk, but rather to the writeout of the page cache.

Kevin