From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43531)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XCagX-0004re-FU
	for qemu-devel@nongnu.org; Wed, 30 Jul 2014 16:41:23 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XCagR-0006Tj-AV
	for qemu-devel@nongnu.org; Wed, 30 Jul 2014 16:41:17 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37314)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XCagR-0006TZ-2R
	for qemu-devel@nongnu.org; Wed, 30 Jul 2014 16:41:11 -0400
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
	(int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s6UKf9tD010263
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=OK)
	for <qemu-devel@nongnu.org>; Wed, 30 Jul 2014 16:41:10 -0400
Message-ID: <53D95862.8080506@redhat.com>
Date: Wed, 30 Jul 2014 22:41:06 +0200
From: Max Reitz <mreitz@redhat.com>
MIME-Version: 1.0
References: <1406311665-2814-1-git-send-email-mreitz@redhat.com>	<1406311665-2814-8-git-send-email-mreitz@redhat.com>
	<53D919CC.9050706@redhat.com> <53D9561D.5010505@redhat.com>
In-Reply-To: <53D9561D.5010505@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 7/8] block/qcow2: Speed up zero cluster
	expansion
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>

On 30.07.2014 22:31, Eric Blake wrote:
> On 07/30/2014 10:14 AM, Eric Blake wrote:
>> On 07/25/2014 12:07 PM, Max Reitz wrote:
>>> Actually, we do not need to allocate a new data cluster for every zero
>>> cluster to be expanded: It is completely sufficient to rely on qcow2's
>>> COW part and instead create a single zero cluster and reuse it as much
>>> as possible.
>> Also, I have to wonder - since the all-zero cluster is the most likely
>> cluster to have a large refcount, even during normal runtime, should we
>> special case the normal qcow2 write code to track the current all-zero
>> cluster (if any), and merely increase its refcount rather than allocate
>> a new cluster any time it is detected that an all-zero cluster is
>> needed?  [Of course, the tracking would be runtime only, since
>> compat=0.10 header doesn't provide any way to track the location of an
>> all-zero cluster across file reloads.  Each new runtime would probably
>> settle on a new location for the all-zero cluster used during that run,
>> rather than trying to find an existing one.  And there's really no point
>> to adding a header to track an all-zero cluster in compat=1.1 images,
>> since those images already have the ability to track zero clusters
>> without needing one allocated.]
>
>>> +                    ret = bdrv_write_zeroes(bs->file, offset / BDRV_SECTOR_SIZE,
>>> +                                            s->cluster_sectors, 0);
>> That is, if bdrv_write_zeroes knows how to take advantage of an already
>> existing all-zero cluster, it would be less special casing in this code,
>> but still get the same benefits of maximizing refcount during the amend
>> operation, if all expanded clusters go through bdrv_write_zeroes.
> Now that I've looked through both variants, I'm leaning towards the
> simplicity of your alternate series, rather than the complexity of this
> one, if we can (independently?) optimize bdrv_write_zeroes to reuse a
> known-all-zeroes cluster when possible.  Of course, you may want to get
> other opinions than just mine before posting your next round of these
> patches.

I'm pretty sure Kevin prefers a variant which is as simple as possible, 
so I'll use that (alternative) version for v2, then.

However, I still think we should not optimize bdrv_write_zeroes(). As 
far as I know, qemu is meant to work best with raw and with qcow2 in its 
current version (compat=1.1). raw will not support things like a common 
zero cluster anyway, and current qcow2 has zero clusters built in. I 
don't think we should optimize for qcow2 compat=0.10 just to make up for 
things it lacks by design in comparison to compat=1.1.
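
To illustrate what "built in" means here, a rough sketch, not qemu
code: in compat=1.1 images, bit 0 of a standard L2 entry marks the
cluster as reading all zeroes, while in compat=0.10 that bit is always
0. The TOY_/toy_ names below are made up for this mail; only the bit
position is taken from the format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a standard L2 entry: "cluster reads as all zeroes".
 * Only meaningful for compat=1.1 (qcow2 version 3). */
#define TOY_OFLAG_ZERO (UINT64_C(1) << 0)

/* Hypothetical helper: does this L2 entry describe a zero cluster? */
bool toy_l2_is_zero_cluster(uint64_t l2_entry, int qcow_version)
{
    /* compat=0.10 (version 2) has no zero flag at all. */
    if (qcow_version < 3) {
        return false;
    }
    return (l2_entry & TOY_OFLAG_ZERO) != 0;
}

/* Hypothetical helper: mark an L2 entry as a zero cluster.
 * Returns false if the image version cannot express that, in which
 * case the caller has to fall back to a real, zero-filled data cluster. */
bool toy_l2_set_zero_cluster(uint64_t *l2_entry, int qcow_version)
{
    if (qcow_version < 3) {
        return false;
    }
    *l2_entry |= TOY_OFLAG_ZERO;
    return true;
}
```

With version 3 the flag is all that is needed; with version 2 the
setter fails, and that fallback is the case this series is about.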

Also, with regard to this patch: bs->file is most probably a raw file, 
which won't support a common zero cluster. If we want to optimize the 
bdrv_write_zeroes() call alone, all we can do is allow it to discard 
the sectors (which I guess I'll just do in v2 because it doesn't cost 
anything).
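
A toy model of what "allow it to discard" buys us, under the
assumption that discarded sectors read back as zeroes. Everything
prefixed toy_/TOY_ is invented for illustration; in qemu itself I
would expect this to amount to passing BDRV_REQ_MAY_UNMAP instead of
the 0 in the call quoted above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTOR_SIZE 512
#define TOY_NB_SECTORS  16

/* Stand-in for a "may unmap" request flag; the value is arbitrary. */
#define TOY_REQ_MAY_UNMAP 0x1

/* Tiny in-memory "file" that tracks which sectors occupy space. */
typedef struct ToyFile {
    uint8_t data[TOY_NB_SECTORS][TOY_SECTOR_SIZE];
    bool allocated[TOY_NB_SECTORS];
} ToyFile;

/* Unallocated sectors read back as zeroes in this model. */
void toy_read(const ToyFile *f, int sector, uint8_t *buf)
{
    if (f->allocated[sector]) {
        memcpy(buf, f->data[sector], TOY_SECTOR_SIZE);
    } else {
        memset(buf, 0, TOY_SECTOR_SIZE);
    }
}

/* Zero a sector range. With TOY_REQ_MAY_UNMAP the sectors are simply
 * deallocated; without it, zeroes are written and space stays in use.
 * Returns 0 on success, -1 if the range is out of bounds. */
int toy_write_zeroes(ToyFile *f, int sector_num, int nb_sectors, int flags)
{
    if (sector_num < 0 || nb_sectors < 0 ||
        sector_num > TOY_NB_SECTORS ||
        nb_sectors > TOY_NB_SECTORS - sector_num) {
        return -1;
    }
    for (int i = sector_num; i < sector_num + nb_sectors; i++) {
        if (flags & TOY_REQ_MAY_UNMAP) {
            f->allocated[i] = false;        /* discard */
        } else {
            memset(f->data[i], 0, TOY_SECTOR_SIZE);
            f->allocated[i] = true;         /* explicit zero write */
        }
    }
    return 0;
}

/* Number of sectors currently occupying space. */
int toy_allocated_count(const ToyFile *f)
{
    int n = 0;
    for (int i = 0; i < TOY_NB_SECTORS; i++) {
        n += f->allocated[i] ? 1 : 0;
    }
    return n;
}
```

Either way the range reads back as zeroes; the only difference is
whether it still takes up space afterwards.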

In any case, if later on I or somebody else does decide to optimize 
bdrv_write_zeroes(), we can still implement that optimization 
independently of this series.

Max