From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1L4edF-0006bB-2K
	for qemu-devel@nongnu.org; Mon, 24 Nov 2008 11:49:53 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1L4edD-0006aR-C3
	for qemu-devel@nongnu.org; Mon, 24 Nov 2008 11:49:52 -0500
Received: from [199.232.76.173] (port=43223 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1L4edD-0006aI-2E
	for qemu-devel@nongnu.org; Mon, 24 Nov 2008 11:49:51 -0500
Received: from yw-out-1718.google.com ([74.125.46.158]:34177)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1L4edC-0001IN-Ka
	for qemu-devel@nongnu.org; Mon, 24 Nov 2008 11:49:50 -0500
Received: by yw-out-1718.google.com with SMTP id 6so804452ywa.82
	for <qemu-devel@nongnu.org>; Mon, 24 Nov 2008 08:49:49 -0800 (PST)
Message-ID: <492ADB2A.7030700@codemonkey.ws>
Date: Mon, 24 Nov 2008 10:49:46 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH 5/5] Change order of metadata update to
	prevent losing guest data because of unexpected exit.
References: <20081123145248.22178.36228.stgit@dhcp-1-237.tlv.redhat.com>
	<20081123145326.22178.36990.stgit@dhcp-1-237.tlv.redhat.com>
In-Reply-To: <20081123145326.22178.36990.stgit@dhcp-1-237.tlv.redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Gleb Natapov wrote:
> Currently the order is this (during cow since it's the interesting case):
> 1. Decrement refcount of old clusters
> 2. Increment refcount for newly allocated clusters
> 3. Copy content of old sectors that will not be rewritten
> 4. Update L2 table with pointers to new clusters
> 5. Write guest data into new clusters (asynchronously)
>
> There are several problems with this order. The first is that if qemu
> crashes (or is killed, or the host reboots) after the new clusters are linked
> into the L2 table but before the guest data is written there, then on the
> next boot the guest will find neither the old data nor the new data in those
> sectors. This is not what the guest expects, even when a journaling file
> system is in use. The other problem is that if qemu is killed between steps
> 1 and 4, the refcount of the old clusters will be incorrect, which may cause
> snapshot corruption.
>
> The patch changes the order to:
> 1. Increment refcount for newly allocated clusters
> 2. Write guest data into new clusters (asynchronously)
> 3. Copy content of old sectors that were not rewritten
> 4. Update L2 table with pointers to new clusters
> 5. Decrement refcount of old clusters
>
> An unexpected crash may leak clusters, but guest data should be safe.
>
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>
>  block-qcow2.c |  155 +++++++++++++++++++++++++++++++++------------------------
>  1 files changed, 91 insertions(+), 64 deletions(-)
>
> diff --git a/block-qcow2.c b/block-qcow2.c
> index 0771281..c600517 100644
> --- a/block-qcow2.c
> +++ b/block-qcow2.c
> @@ -852,6 +852,69 @@ static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
>      return cluster_offset;
>  }
>  
> +typedef struct QCowL2Meta
> +{
> +    uint64_t offset;
> +    int n_start;
> +    int nb_available;
> +    int nb_clusters;
> +} QCowL2Meta;
> +
> +static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
> +        QCowL2Meta *m)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i, j = 0, l2_index, ret;
> +    uint64_t *old_cluster, start_sect, l2_offset, *l2_table;
> +
> +    if (m->nb_clusters == 0)
> +        return 0;
> +
> +    if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t))))
> +        return -ENOMEM;
>   

The old_cluster buffer allocated here is never freed.
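
One common way to avoid a leak like this is to route every return after the
allocation through a single exit label that frees the buffer. The following is
only a minimal sketch of that pattern, not the patch's code: link_l2_sketch()
and its fail_midway parameter are hypothetical, and plain malloc()/free() stand
in for qemu_malloc() and its matching free routine so the example compiles on
its own.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for alloc_cluster_link_l2(); only the
 * allocate / use / free shape is meant to carry over.
 * fail_midway simulates an error after the buffer is allocated.
 */
static int link_l2_sketch(int nb_clusters, int fail_midway)
{
    uint64_t *old_cluster;
    int i, ret = 0;

    /* Nothing to link, so nothing is allocated (the sketch also
     * treats a negative count as "nothing to do"). */
    if (nb_clusters <= 0)
        return 0;

    old_cluster = malloc((size_t)nb_clusters * sizeof(uint64_t));
    if (!old_cluster)
        return -ENOMEM;

    /* An error after the allocation must not return directly. */
    if (fail_midway) {
        ret = -EIO;
        goto out;
    }

    /* Placeholder for recording the old cluster offsets. */
    for (i = 0; i < nb_clusters; i++)
        old_cluster[i] = 0;

out:
    /* Single exit path: the buffer is freed on success and on error. */
    free(old_cluster);
    return ret;
}
```

With this shape, both the success path and the simulated failure path release
the buffer before returning.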

Regards,

Anthony Liguori