From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LYejN-00050r-MC
	for qemu-devel@nongnu.org; Sun, 15 Feb 2009 06:00:13 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LYejL-0004zm-5d
	for qemu-devel@nongnu.org; Sun, 15 Feb 2009 06:00:12 -0500
Received: from [199.232.76.173] (port=59623 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LYejK-0004zY-JL
	for qemu-devel@nongnu.org; Sun, 15 Feb 2009 06:00:10 -0500
Received: from mx2.redhat.com ([66.187.237.31]:34394)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <gleb@redhat.com>) id 1LYejJ-000504-Gy
	for qemu-devel@nongnu.org; Sun, 15 Feb 2009 06:00:10 -0500
Date: Sun, 15 Feb 2009 12:57:18 +0200
From: Gleb Natapov <gleb@redhat.com>
Message-ID: <20090215105718.GH25994@redhat.com>
References: <20090211070049.GA27821@shareable.org>
	<loom.20090213T060937-534@post.gmane.org>
	<49955681.9070301@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <49955681.9070301@suse.de>
Subject: [Qemu-devel] Re: qcow2 corruption observed,
	fixed by reverting old change
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@suse.de>
Cc: Marc Bevand <m.bevand@gmail.com>, qemu-devel@nongnu.org, kvm@vger.kernel.org

> > I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because of the
> > qcow2 performance regression caused by the default writethrough caching policy)
> > but it randomly triggers an even worse bug: the moment I shut down a guest by
> > typing "quit" in the monitor, it sometimes overwrites the first 4kB of the disk
> > image with mostly NUL bytes (!) which completely destroys it. I am familiar with
> > the qcow2 format and apparently this 4kB block seems to be an L2 table with most
> > entries set to zero. I have had to restore at least 6 or 7 disk images from
> > backup after occurrences of that bug. My intuition tells me this may be the qcow2
> > code trying to allocate a cluster to write a new L2 table, but not noticing the
> > allocation failed (represented by a 0 offset), and writing the L2 table at that
> > 0 offset, overwriting the qcow2 header.
> > 
> > Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c reverted
> > to its kvm-72 version.
> > 
> > Basically qcow2 in kvm-73 or newer is completely unreliable.
> > 
> > -marc
> 
> I think the corruption is a completely unrelated bug. I would suspect it
> was introduced in one of Gleb's patches in December. Adding him to CC.
> 
I am not able to reproduce this. After more than a hundred "boot Linux;
generate disk I/O; quit" loops, all I've got is an image with 7 leaked
blocks and a couple of filesystem corruptions that were fixed by fsck.

--
			Gleb.