From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41759) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eydTw-0006i4-8z for qemu-devel@nongnu.org; Wed, 21 Mar 2018 09:08:45 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eydTv-0004ty-6j for qemu-devel@nongnu.org; Wed, 21 Mar 2018 09:08:44 -0400
References: <20180320135515.16823-1-berto@igalia.com> <665e0ccd-4440-4e0e-30f9-e38537795325@redhat.com>
From: Eric Blake
Message-ID: <1809fec1-cdf6-0da4-14e8-04b01d4fc48b@redhat.com>
Date: Wed, 21 Mar 2018 08:08:23 -0500
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH for-2.12] qcow2: Reset free_cluster_index when allocating a new refcount block
To: Alberto Garcia, qemu-devel@nongnu.org
Cc: Kevin Wolf, qemu-block@nongnu.org, Max Reitz

On 03/21/2018 04:28 AM, Alberto Garcia wrote:
> On Tue 20 Mar 2018 06:54:15 PM CET, Eric Blake wrote:
>
>>> When we try to allocate new clusters we first look for available ones
>>> starting from s->free_cluster_index and once we find them we increase
>>> their reference counts. Before we get to call update_refcount() to do
>>> this last step s->free_cluster_index is already pointing to the next
>>> cluster after the ones we are trying to allocate.
>>>
>> > During update_refcount() it may happen however that we also need to
>> > allocate a new refcount block in order to store the refcounts of
>> > these new clusters
>>
>> Your changes to test 121 covers this...
>>
>> > (and to complicate things further that may also require us to grow
>> > the refcount table).
>>
>> ...but not this. Is it worth also trying to cover this case in the
>> testsuite as well?
>
> I checked and the patch doesn't really fix that scenario.
> There's a different problem that I haven't debugged completely yet,
> because of two reasons:
>
>   - One difference is that when we grow the refcount table we actually
>     allocate a new one, so s->free_cluster_index points to the beginning
>     of the image (where the previous table was) and any holes left during
>     the process are allocated after that (depending on how much data we
>     write though).

Yeah, that can make the test harder to reason about, and is slightly
more sensitive to our internal algorithm - but it also explains why you
checked for index > start rather than unconditionally assigning index =
start.

>
>   - This scenario is harder to reach: in order to fill a 1-cluster
>     refcount table the size of the image needs to be larger than
>     (cluster_size³ / refcount_bits) bytes, that's 16TB with the default
>     parameters. So although it can be reproduced easily if you reduce the
>     cluster size I think it's very infrequent under normal conditions.

Yes, 16TB for default size, but only 2M for the best-case (512-byte
cluster, 64-bit refcount), so still easy to write a test for.

>
> But yes, it's a task left for the future.

Fair enough.

>
>>> +            /* If the caller needs to restart the search for free clusters,
>>> +             * try the same ones first to see if they're still free. */
>>> +            if (ret == -EAGAIN) {
>>> +                if (s->free_cluster_index > (start >> s->cluster_bits)) {
>>> +                    s->free_cluster_index = (start >> s->cluster_bits);
>>> +                }
>>
>> Is there any harm in making this assignment unconditional, instead of
>> only doing it when free_cluster_index has grown larger than start?
>
> It can happen that it is smaller than 'start' if we were moving the
> refcount table to a new location, so we want to keep the lowest value.

Okay, then my R-b is sufficient on the patch as-is.
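
To make the "keep the lowest value" point concrete, here is a minimal standalone sketch of the conditional reset. The struct and function names are hypothetical stand-ins for illustration, not the actual QEMU definitions; only the comparison and the assignment mirror the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields of the qcow2 state used here. */
struct state {
    uint64_t free_cluster_index;  /* next cluster index to search from */
    int cluster_bits;             /* log2 of the cluster size in bytes */
};

/*
 * Hypothetical helper: move free_cluster_index back to the cluster that
 * contains byte offset 'start', but never move it forward. If the index
 * is already lower (for example because the refcount table was moved),
 * the lower value is kept.
 */
static void rewind_free_index(struct state *s, uint64_t start)
{
    assert(s != NULL);

    uint64_t start_index = start >> s->cluster_bits;

    if (s->free_cluster_index > start_index) {
        s->free_cluster_index = start_index;
    }
}
```

With cluster_bits = 16 and free_cluster_index = 10, calling the helper with a start offset in cluster 5 lowers the index to 5; a later call with a start offset in cluster 8 leaves it at 5, which is the behaviour an unconditional assignment would lose.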
>
>> [And unrelated, but it might be nice to do a followup cleanup to track
>> free_cluster_offset by bytes instead of having to shift
>> free_cluster_index everywhere]
>
> I've actually just seen that we already have free_byte_offset, we use
> that for compressed clusters, so it might be possible to use that one...

free_byte_offset really IS a byte offset, as it can point mid-cluster.
But having EVERYTHING be byte-based seems like it will make reasoning
about the code easier to do.

>
> I'll put that in my TODO list.
>
> Berto
>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org