From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45528) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XCWW4-00039t-J2 for qemu-devel@nongnu.org; Wed, 30 Jul 2014 12:14:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XCWVy-00050N-Kw for qemu-devel@nongnu.org; Wed, 30 Jul 2014 12:14:12 -0400 Received: from mx1.redhat.com ([209.132.183.28]:28683) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XCWVy-0004z5-DX for qemu-devel@nongnu.org; Wed, 30 Jul 2014 12:14:06 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s6UGE5Zv011855 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Wed, 30 Jul 2014 12:14:06 -0400 Message-ID: <53D919CC.9050706@redhat.com> Date: Wed, 30 Jul 2014 10:14:04 -0600 From: Eric Blake MIME-Version: 1.0 References: <1406311665-2814-1-git-send-email-mreitz@redhat.com> <1406311665-2814-8-git-send-email-mreitz@redhat.com> In-Reply-To: <1406311665-2814-8-git-send-email-mreitz@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="S9ejvkUl7Jbnb7rQXN9Vc5ePPpnp69m1q" Subject: Re: [Qemu-devel] [PATCH 7/8] block/qcow2: Speed up zero cluster expansion List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz , qemu-devel@nongnu.org Cc: Kevin Wolf , Stefan Hajnoczi This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --S9ejvkUl7Jbnb7rQXN9Vc5ePPpnp69m1q Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 07/25/2014 12:07 PM, Max Reitz wrote: > Actually, we do not need to allocate a new data cluster for every zero > cluster to be expanded: It is completely sufficient to rely on qcow2's > COW part and instead create a single zero cluster and reuse it as much > as possible. >=20 > Signed-off-by: Max Reitz > --- > block/qcow2-cluster.c | 119 ++++++++++++++++++++++++++++++++++++++----= -------- > 1 file changed, 92 insertions(+), 27 deletions(-) >=20 > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c > index 905beb6..867db03 100644 > --- a/block/qcow2-cluster.c > +++ b/block/qcow2-cluster.c > @@ -1558,6 +1558,9 @@ static int expand_zero_clusters_in_l1(BlockDriver= State *bs, uint64_t *l1_table, > BDRVQcowState *s =3D bs->opaque; > bool is_active_l1 =3D (l1_table =3D=3D s->l1_table); > uint64_t *l2_table =3D NULL; > + int64_t zeroed_cluster_offset =3D 0; > + int zeroed_cluster_refcount =3D 0; > + int last_zeroed_cluster_l1i =3D 0, last_zeroed_cluster_l2i =3D 0; > int ret; > int i, j; > =20 > @@ -1617,47 +1620,79 @@ static int expand_zero_clusters_in_l1(BlockDriv= erState *bs, uint64_t *l1_table, > continue; > } > =20 > - offset =3D qcow2_alloc_clusters(bs, s->cluster_size); > - if (offset < 0) { > - ret =3D offset; > - goto fail; > + if (zeroed_cluster_offset) { > + zeroed_cluster_refcount +=3D l2_refcount; > + if (zeroed_cluster_refcount > 0xffff) { Doesn't the qcow2 file format allow variable-sized maximum refcount (bytes 96-99 refcount_order in the header)? Therefore, you should be using the value computed from the header rather than hard-coding the assumption that the header used (the default of) 16-bit refcount. [Yeah, I know, we don't yet have code that supports non-default size, even though the file format documents it, but that doesn't mean we should make it harder to add support down the road...] > + zeroed_cluster_refcount =3D 0; > + zeroed_cluster_offset =3D 0; > + } > } This isn't a maximal packing. As long as we don't mind complexity to gain compactness, couldn't we also expand the existing zeroed_cluster_offset all the way up to full refcount, and decrement l2_refcount by the difference, before spilling over to allocating the next zero cluster? Also, I have to wonder - since the all-zero cluster is the most likely cluster to have a large refcount, even during normal runtime, should we special case the normal qcow2 write code to track the current all-zero cluster (if any), and merely increase its refcount rather than allocate a new cluster any time it is detected that an all-zero cluster is needed? [Of course, the tracking would be runtime only, since compat=3D0.10 header doesn't provide any way to track the location of an all-zero cluster across file reloads. Each new runtime would probably settle on a new location for the all-zero cluster used during that run, rather than trying to find an existing one. And there's really no point to adding a header to track an all-zero cluster in compat=3D1.1 images, since those images already have the ability to track zero clusters without needing one allocated.] > + if (!zeroed_cluster_offset) { > + offset =3D qcow2_alloc_clusters(bs, s->cluster_siz= e); > + if (offset < 0) { > + ret =3D offset; > + goto fail; > + } > =20 > - if (l2_refcount > 1) { > - /* For shared L2 tables, set the refcount accordin= gly (it is > - * already 1 and needs to be l2_refcount) */ > - ret =3D qcow2_update_cluster_refcount(bs, > - offset >> s->cluster_bits, l2_refcount - 1= , > - QCOW2_DISCARD_OTHER); > + ret =3D qcow2_pre_write_overlap_check(bs, 0, offse= t, > + s->cluster_siz= e); > + if (ret < 0) { > + qcow2_free_clusters(bs, offset, s->cluster_siz= e, > + QCOW2_DISCARD_OTHER); > + goto fail; > + } > + > + ret =3D bdrv_write_zeroes(bs->file, offset / BDRV_= SECTOR_SIZE, > + s->cluster_sectors, 0); That is, if bdrv_write_zeroes knows how to take advantage of an already existing all-zero cluster, it would be less special casing in this code, but still get the same benefits of maximizing refcount during the amend operation, if all expanded clusters go through bdrv_write_zeroes. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --S9ejvkUl7Jbnb7rQXN9Vc5ePPpnp69m1q Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJT2RnMAAoJEKeha0olJ0Nq+h8H/17iI0V/jp9D2PWbbmI+DY5c YQZfsYuVPqyk4EfK2FCiSHjIRK51B+icOA79C9P8GiUdKYYJuhaBNJ7+ai6fjfKX lFV7niWDJ+9B5slz3GyD4sYbnq3FuK7ruut+v+Wz6Uct4nfw4MubLCstddZbIimr 9f2HrndCfnBsChDkKuwSUFmj0xT/y6juipVvcKmZK6vV/0kWKpzVNVof7lmpuRD+ /LXRI/ncsEHCWodimxh9TbAo2bymBAyDHJr3SnRtxGVOqatBJflME8+evLccQxgD MpJRq87fEksZs9ZhYcep01HgC0odlZSVk0bQv4E90GcCj9vqwHg7xlqIG35uaUs= =rp3L -----END PGP SIGNATURE----- --S9ejvkUl7Jbnb7rQXN9Vc5ePPpnp69m1q--