From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4A01C0C6.7020902@redhat.com>
Date: Wed, 06 May 2009 19:54:30 +0300
From: Avi Kivity
References: <1241627950-22195-1-git-send-email-kwolf@redhat.com>
In-Reply-To: <1241627950-22195-1-git-send-email-kwolf@redhat.com>
Subject: [Qemu-devel] Re: [PATCH] qcow2/virtio corruption: Don't allocate the same cluster twice
To: Kevin Wolf
Cc: markmc@redhat.com, qemu-devel@nongnu.org

Kevin Wolf wrote:
> A write on a qcow2 image that results in the allocation of a new cluster can be
> divided into two parts: there is a part which happens before the actual data is
> written; this is about allocating clusters in the image file. The second part
> happens in the AIO callback handler and is about making L2 entries for the
> newly allocated clusters.
>
> When two AIO requests happen to touch the same free cluster, there is a chance
> that the second request still sees the cluster as free because the callback of
> the first request has not yet run. This means that it reserves another cluster
> for the same virtual hard disk offset and hooks it up in its own callback,
> overwriting what the first callback has done. Long story cut short: Bad Things
> happen.
>
> This patch maintains a list of in-flight requests that have allocated new
> clusters. A second request touching the same cluster doesn't find an entry yet
> in the L2 table, but can look it up in the list now. The second request is
> limited so that it either doesn't touch the allocation of the first request
> (so it can have a non-overlapping allocation) or doesn't exceed the
> end of the allocation of the first request (so it can reuse this allocation
> and doesn't need to do anything itself).
>
>
> +        } else {
> +            /* Reuse the previously allocated clusters */
> +            if (end_offset > old_end_offset) {
> +                nb_clusters = (old_end_offset - offset) >> s->cluster_bits;
> +            }
> +            cluster_offset = old_alloc->cluster_offset + (offset - old_offset);
> +            m->nb_clusters = 0;
> +            goto out;
> +        }
> +    }
> +
> +    LIST_INSERT_HEAD(&s->cluster_allocs, m, next);
>

What happens if the second request completes before the first? Then, when
the first request completes, alloc_cluster_link_l2() will call
copy_clusters() and overwrite the second request.

Also, the second request now depends on the first to update its metadata.
But if the first request fails, it will not update its metadata, and the
second request will complete without error but also without updating its
metadata.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
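
For reference, here is a rough, self-contained sketch of the overlap check the
patch introduces. The stand-in types and the helper name check_in_flight() are
illustrative only, not the actual qcow2 code; the arithmetic mirrors the quoted
hunk.

/* Illustrative sketch, not QEMU code: trim a cluster allocation request
 * against allocations that are still in flight, so two requests never
 * hook up L2 entries for the same guest offset independently. */
#include <stdint.h>
#include <sys/queue.h>

struct in_flight_alloc {                 /* stands in for QCowL2Meta */
    uint64_t offset;                     /* guest offset being allocated */
    uint64_t cluster_offset;             /* location in the image file */
    int nb_clusters;
    LIST_ENTRY(in_flight_alloc) next;
};
LIST_HEAD(alloc_list, in_flight_alloc);  /* stands in for s->cluster_allocs */

/* Returns a cluster_offset to reuse from an in-flight allocation, or 0 if
 * the (possibly trimmed) request should allocate fresh clusters itself. */
static uint64_t check_in_flight(struct alloc_list *allocs, uint64_t offset,
                                int *nb_clusters, unsigned cluster_bits)
{
    uint64_t cluster_size = 1ULL << cluster_bits;
    uint64_t end = offset + (uint64_t)*nb_clusters * cluster_size;
    struct in_flight_alloc *old;

    LIST_FOREACH(old, allocs, next) {
        uint64_t old_end = old->offset +
            (uint64_t)old->nb_clusters * cluster_size;

        if (end <= old->offset || offset >= old_end) {
            continue;                    /* no overlap with this allocation */
        }
        if (offset < old->offset) {
            /* Stop short of the in-flight allocation; allocate the rest */
            *nb_clusters = (old->offset - offset) >> cluster_bits;
            return 0;
        }
        /* Request starts inside the in-flight allocation: reuse it and
         * write no L2 entry of its own (m->nb_clusters = 0 in the patch) */
        if (end > old_end) {
            *nb_clusters = (old_end - offset) >> cluster_bits;
        }
        return old->cluster_offset + (offset - old->offset);
    }
    return 0;                            /* nothing in flight overlaps */
}

The reuse branch is what the question above is about: a request that takes this
path writes its data into clusters whose L2 entry will only be written by the
first request's callback, so its completion is tied to that earlier request.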