From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43531)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1XCagX-0004re-FU for qemu-devel@nongnu.org;
 Wed, 30 Jul 2014 16:41:23 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1XCagR-0006Tj-AV for qemu-devel@nongnu.org;
 Wed, 30 Jul 2014 16:41:17 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37314)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1XCagR-0006TZ-2R for qemu-devel@nongnu.org;
 Wed, 30 Jul 2014 16:41:11 -0400
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
 (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s6UKf9tD010263
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK)
 for ; Wed, 30 Jul 2014 16:41:10 -0400
Message-ID: <53D95862.8080506@redhat.com>
Date: Wed, 30 Jul 2014 22:41:06 +0200
From: Max Reitz
MIME-Version: 1.0
References: <1406311665-2814-1-git-send-email-mreitz@redhat.com>
 <1406311665-2814-8-git-send-email-mreitz@redhat.com>
 <53D919CC.9050706@redhat.com> <53D9561D.5010505@redhat.com>
In-Reply-To: <53D9561D.5010505@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 7/8] block/qcow2: Speed up zero cluster expansion
To: Eric Blake , qemu-devel@nongnu.org
Cc: Kevin Wolf , Stefan Hajnoczi

On 30.07.2014 22:31, Eric Blake wrote:
> On 07/30/2014 10:14 AM, Eric Blake wrote:
>> On 07/25/2014 12:07 PM, Max Reitz wrote:
>>> Actually, we do not need to allocate a new data cluster for every zero
>>> cluster to be expanded: It is completely sufficient to rely on qcow2's
>>> COW part and instead create a single zero cluster and reuse it as much
>>> as possible.
>> Also, I have to wonder - since the all-zero cluster is the most likely
>> cluster to have a large refcount, even during normal runtime, should we
>> special case the normal qcow2 write code to track the current all-zero
>> cluster (if any), and merely increase its refcount rather than allocate
>> a new cluster any time it is detected that an all-zero cluster is
>> needed? [Of course, the tracking would be runtime only, since
>> compat=0.10 header doesn't provide any way to track the location of an
>> all-zero cluster across file reloads. Each new runtime would probably
>> settle on a new location for the all-zero cluster used during that run,
>> rather than trying to find an existing one. And there's really no point
>> to adding a header to track an all-zero cluster in compat=1.1 images,
>> since those images already have the ability to track zero clusters
>> without needing one allocated.]
>
>>> +        ret = bdrv_write_zeroes(bs->file, offset / BDRV_SECTOR_SIZE,
>>> +                                s->cluster_sectors, 0);
>> That is, if bdrv_write_zeroes knows how to take advantage of an already
>> existing all-zero cluster, it would be less special casing in this code,
>> but still get the same benefits of maximizing refcount during the amend
>> operation, if all expanded clusters go through bdrv_write_zeroes.
> Now that I've looked through both variants, I'm leaning towards the
> simplicity of your alternate series, rather than the complexity of this
> one, if we can (independently?) optimize bdrv_write_zeroes to reuse a
> known-all-zeroes cluster when possible. Of course, you may want to get
> other opinions than just mine before posting your next round of these
> patches.

I'm pretty sure Kevin prefers a variant which is as simple as possible,
so I'll use that (alternative) version for v2, then.

However, I still think we should not optimize bdrv_write_zeroes(). As
far as I know, qemu should work best with raw and qcow2 in its current
version.
raw will not support things like a common zero cluster anyway; and qcow2
in its current version has zero clusters built-in. I don't think we
should optimize for qcow2 compat=0.10 to make up for things it lacks in
comparison to compat=1.1 by design.

Also, in regard to this patch: bs->file is most probably a raw file
which won't support a common zero cluster. If we want to optimize the
bdrv_write_zeroes() call alone, all we can do is to allow it to discard
the sectors (which I guess I'll just do in v2 because it doesn't cost
anything).

In any case, if later on I or somebody else does decide to optimize
bdrv_write_zeroes() we can still implement this optimization
independently of this series.

Max
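
A rough sketch of the single-zero-cluster idea from the quoted patch
description, written as a toy in-memory model in C. This is not QEMU
code: ToyImage, toy_init(), toy_alloc(), toy_expand_zero() and
toy_write_byte() are hypothetical names invented for illustration, and
the real qcow2 refcount and L2 table handling is considerably more
involved. The sketch only shows that every expanded zero cluster can
point at one shared host cluster whose refcount grows, and that a later
write copies the cluster first (the COW part).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy in-memory model; none of these names exist in QEMU. */
enum { CLUSTER_SIZE = 16, HOST_CLUSTERS = 8, GUEST_CLUSTERS = 4 };
#define UNALLOCATED (-1)

typedef struct ToyImage {
    uint8_t data[HOST_CLUSTERS][CLUSTER_SIZE]; /* host cluster payloads */
    int refcount[HOST_CLUSTERS];               /* 0 means free */
    int l2[GUEST_CLUSTERS];                    /* guest -> host mapping */
    int shared_zero;                           /* host index or UNALLOCATED */
} ToyImage;

static void toy_init(ToyImage *img)
{
    memset(img, 0, sizeof(*img));
    for (int i = 0; i < GUEST_CLUSTERS; i++) {
        img->l2[i] = UNALLOCATED;
    }
    img->shared_zero = UNALLOCATED;
}

/* Take the first free host cluster, zero it, give it refcount 1.
 * Returns the host index, or -1 if the image is full. */
static int toy_alloc(ToyImage *img)
{
    for (int i = 0; i < HOST_CLUSTERS; i++) {
        if (img->refcount[i] == 0) {
            img->refcount[i] = 1;
            memset(img->data[i], 0, CLUSTER_SIZE);
            return i;
        }
    }
    return -1;
}

/* Expand one zero guest cluster by mapping it to the shared zero cluster
 * instead of allocating a fresh data cluster for it. */
static int toy_expand_zero(ToyImage *img, int guest)
{
    assert(img->l2[guest] == UNALLOCATED);
    if (img->shared_zero == UNALLOCATED) {
        int host = toy_alloc(img);
        if (host < 0) {
            return -1;
        }
        img->shared_zero = host; /* first user holds the initial reference */
    } else {
        img->refcount[img->shared_zero]++;
    }
    img->l2[guest] = img->shared_zero;
    return 0;
}

/* Write one byte; a cluster with refcount > 1 is copied first (COW). */
static int toy_write_byte(ToyImage *img, int guest, int off, uint8_t val)
{
    int host = img->l2[guest];
    assert(host != UNALLOCATED && off >= 0 && off < CLUSTER_SIZE);
    if (img->refcount[host] > 1) {
        int copy = toy_alloc(img);
        if (copy < 0) {
            return -1;
        }
        memcpy(img->data[copy], img->data[host], CLUSTER_SIZE);
        img->refcount[host]--;
        img->l2[guest] = copy;
        host = copy;
    } else if (host == img->shared_zero) {
        /* Sole owner writes in place: the cluster stops being all-zero. */
        img->shared_zero = UNALLOCATED;
    }
    img->data[host][off] = val;
    return 0;
}
```

In this model, expanding three zero clusters consumes one host cluster
with refcount 3; only a guest write to one of them triggers a second
allocation.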