From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:35764)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1QpF75-0002Au-MG
	for qemu-devel@nongnu.org; Fri, 05 Aug 2011 03:46:36 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1QpF72-0005oW-7M
	for qemu-devel@nongnu.org; Fri, 05 Aug 2011 03:46:35 -0400
Received: from mx1.redhat.com ([209.132.183.28]:8259)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1QpF71-0005oJ-Ub
	for qemu-devel@nongnu.org; Fri, 05 Aug 2011 03:46:32 -0400
Message-ID: <4E3BA085.80302@redhat.com>
Date: Fri, 05 Aug 2011 09:49:25 +0200
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
References: <CAHt6W4eQxmf=2A4YtKJbBYZA86dw0nR_hxg4Hv3xwXiA1JFC3A@mail.gmail.com>
In-Reply-To: <CAHt6W4eQxmf=2A4YtKJbBYZA86dw0nR_hxg4Hv3xwXiA1JFC3A@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Possible corruption on qcow2_snapshot_goto
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Frediano Ziglio <freddy77@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>

Am 05.08.2011 09:34, schrieb Frediano Ziglio:
> This function seems to be not that safe.
> 
> The first problem is that the first thing it does if free refcount
> calling qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
> s->l1_size, -1). Now if something goes wrong (crash, disk errors) this
> could lead to refcount == 0 (depending on L1 size, cache, open flags).
> 
> If table grow and qcow2_grow_l1_table needs to resize on disk it
> probably will found some reference counter set to 0 (cause we already
> decremented it), fail than qcow2_snapshot_goto return -EIO leaving
> refcounts deallocated.

I can't say I'm surprised.

This kind of problem used to be common all over the place in qcow2. I
fixed it for most paths, but internal snapshots aren't considered
supported for RHEL, so I never got around to fixing the snapshot code.

So, yes, the order must be changed, so that in the worst case we have
cluster leaks.

Kevin