Date: Sun, 15 Feb 2009 12:57:18 +0200
From: Gleb Natapov
To: Kevin Wolf
Cc: Marc Bevand, qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: [Qemu-devel] Re: qcow2 corruption observed, fixed by reverting old change
Message-ID: <20090215105718.GH25994@redhat.com>
References: <20090211070049.GA27821@shareable.org> <49955681.9070301@suse.de>
In-Reply-To: <49955681.9070301@suse.de>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org

> > I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because of the
> > qcow2 performance regression caused by the default writethrough caching policy)
> > but it randomly triggers an even worse bug: the moment I shut down a guest by
> > typing "quit" in the monitor, it sometimes overwrite the first 4kB of the disk
> > image with mostly NUL bytes (!) which completely destroys it. I am familiar with
> > the qcow2 format and apparently this 4kB block seems to be an L2 table with most
> > entries set to zero. I have had to restore at least 6 or 7 disk images from
> > backup after occurences of that bug.
> > My intuition tells me this may be the qcow2
> > code trying to allocate a cluster to write a new L2 table, but not noticing the
> > allocation failed (represented by a 0 offset), and writing the L2 table at that
> > 0 offset, overwriting the qcow2 header.
> >
> > Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c reverted
> > to its kvm-72 version.
> >
> > Basically qcow2 in kvm-73 or newer is completely unreliable.
> >
> > -marc
>
> I think the corruption is a completely unrelated bug. I would suspect it
> was introduced in one of Gleb's patches in December. Adding him to CC.
>
I am not able to reproduce this. After more than a hundred
"boot Linux; generate disk I/O; quit" loops, all I've got is an image
with 7 leaked blocks and a couple of filesystem corruptions that were
fixed by fsck.

--
Gleb.
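
For reference, the failure mode hypothesized in the quoted report (an
allocator that signals failure with a 0 offset, and a caller that writes
the L2 table there anyway) can be illustrated with a toy model. This is
only a sketch under stated assumptions: ToyImage, alloc_cluster,
write_l2_unchecked and write_l2_checked are hypothetical names invented
for illustration and are not taken from block-qcow2.c. The only real
detail used is the qcow2 magic at the start of the image.

```python
CLUSTER_SIZE = 4096            # matches the 4kB block mentioned in the report
QCOW2_MAGIC = b"QFI\xfb"       # first four bytes of a qcow2 image


class ToyImage:
    """Hypothetical in-memory image: cluster 0 holds the header."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.data = bytearray(clusters * CLUSTER_SIZE)
        self.data[0:4] = QCOW2_MAGIC
        self.next_free = 1     # cluster 0 is reserved for the header

    def alloc_cluster(self):
        """Return the byte offset of a fresh cluster, or 0 on failure.

        Using 0 as the failure value is the assumption under test here.
        """
        if self.next_free >= self.clusters:
            return 0
        offset = self.next_free * CLUSTER_SIZE
        self.next_free += 1
        return offset

    def header_intact(self):
        return bytes(self.data[0:4]) == QCOW2_MAGIC


def write_l2_unchecked(img, table):
    """Buggy pattern: the 0 'failure' offset is used as a real offset."""
    offset = img.alloc_cluster()
    img.data[offset:offset + CLUSTER_SIZE] = table
    return offset


def write_l2_checked(img, table):
    """Safer pattern: refuse to write when allocation failed."""
    offset = img.alloc_cluster()
    if offset == 0:
        raise OSError("cluster allocation failed")
    img.data[offset:offset + CLUSTER_SIZE] = table
    return offset
```

In this model, once the image is full, write_l2_unchecked() lands a
mostly-zero table on top of the header, which is consistent with the
symptom described above, while write_l2_checked() leaves the header
alone. Whether the real code path behaves this way is not established
in this thread.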