From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gleb Natapov
Subject: Re: qcow2 corruption observed, fixed by reverting old change
Date: Sun, 15 Feb 2009 12:57:18 +0200
Message-ID: <20090215105718.GH25994@redhat.com>
References: <20090211070049.GA27821@shareable.org> <49955681.9070301@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Marc Bevand , kvm@vger.kernel.org, qemu-devel@nongnu.org
To: Kevin Wolf
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:59669 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751866AbZBOLAN (ORCPT ); Sun, 15 Feb 2009 06:00:13 -0500
Content-Disposition: inline
In-Reply-To: <49955681.9070301@suse.de>
Sender: kvm-owner@vger.kernel.org
List-ID:

> > I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because of the
> > qcow2 performance regression caused by the default writethrough caching policy)
> > but it randomly triggers an even worse bug: the moment I shut down a guest by
> > typing "quit" in the monitor, it sometimes overwrite the first 4kB of the disk
> > image with mostly NUL bytes (!) which completely destroys it. I am familiar with
> > the qcow2 format and apparently this 4kB block seems to be an L2 table with most
> > entries set to zero. I have had to restore at least 6 or 7 disk images from
> > backup after occurences of that bug. My intuition tells me this may be the qcow2
> > code trying to allocate a cluster to write a new L2 table, but not noticing the
> > allocation failed (represented by a 0 offset), and writing the L2 table at that
> > 0 offset, overwriting the qcow2 header.
> >
> > Fortunately this bug is also fixed by running kvm-75 with block-qcow2.c reverted
> > to its kvm-72 version.
> >
> > Basically qcow2 in kvm-73 or newer is completely unreliable.
> >
> > -marc
>
> I think the corruption is a completely unrelated bug. I would suspect it
> was introduced in one of Gleb's patches in December. Adding him to CC.
>
I am not able to reproduce this. After more than a hundred "boot Linux;
generate disk I/O; quit" loops, all I've got is an image with 7 leaked
blocks and a couple of filesystem corruptions that were fixed by fsck.

--
			Gleb.