Date: Wed, 21 Feb 2018 19:48:22 +0100
From: Kevin Wolf
Message-ID: <20180221184822.GD353@localhost.localdomain>
References: <20180220222459.8461-1-eblake@redhat.com> <20180220222459.8461-3-eblake@redhat.com> <20180221165116.GB4196@localhost.localdomain> <20180221173926.GB353@localhost.localdomain>
Subject: Re: [Qemu-devel] [PATCH 2/2] qcow2: Avoid memory over-allocation on compressed images
To: Eric Blake
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, mreitz@redhat.com, berto@igalia.com

On 21.02.2018 at 19:32, Eric Blake wrote:
> On 02/21/2018 11:39 AM, Kevin Wolf wrote:
> > > See my commit message comment - we have other spots in the code base that
> > > blindly g_malloc(2 * s->cluster_size).
> >
> > Though is that a reason to do the same in new code or to phase out such
> > allocations whenever you touch them?
>
> Touché.
>
> > > And I intended (but sent the email without amending my commit) to use
> > > g_malloc(). But as Berto has convinced me that an externally produced
> > > image can convince us to read up to 4M (even though we don't need that
> > > much to decompress), I suppose that the try_ variant plus checking is
> > > reasonable (and care in NULL'ing out if one but not both allocations
> > > succeed).
> >
> > Sounds good.
> >
> > Another thought I had is whether we should do per-request allocation for
> > compressed clusters, too, instead of having per-BDS buffers.
>
> The only benefit of a per-BDS buffer is that we cache things - multiple
> sub-cluster reads in a row all from the same compressed cluster benefit
> from decompressing only once.

Oh, you're right. I missed that this is actually used as a cache. I
guess we want to leave it for now then.

Maybe at some point we can actually implement the data cache that I
proposed a few years ago (using Qcow2Cache for data clusters under some
circumstances); then we could probably make that hold the data instead
of having a separate cache.

> The drawbacks of a per-BDS buffer: we can't do things in parallel
> (everything else in qcow2 drops the lock around bdrv_co_pread[v]), so
> the initial read prevents anything else in the qcow2 layer from
> progressing.

Yes, though there are probably other optimisations that could be made
for compression before this becomes relevant, like reading more than
one cluster at a time.

> I also wonder - since we ARE allowing multiple parallel readers in other
> parts of qcow2 (without a patch, decompression is not in this boat, but
> decryption and even bounce buffers due to lower-layer alignment
> constraints are), what sort of mechanisms do we have for using a pool of
> reusable buffers, rather than having each cluster access that requires a
> buffer malloc and free the buffer on a per-access basis? I don't know
> how much time the malloc/free per-transaction overhead adds, or if it is
> already much smaller than the actual I/O time.

I don't either. A while ago, we used g_slice_alloc() in some places (I
remember qemu_aio_get), but it was actually slower than just using
malloc/free each time. So if we do want to pool buffers, we probably
need to implement that manually. I don't think we have a generic memory
pool in qemu yet.
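If we ever decide we want one, I wouldn't expect it to need much more
than something like this (completely untested sketch; MAX_POOL_BUFS and
all the other names are made up just for illustration):

/* Untested sketch of a manual pool of equally sized buffers */
#include "qemu/osdep.h"
#include "qemu/thread.h"

#define MAX_POOL_BUFS 16

typedef struct BufferPool {
    QemuMutex lock;     /* initialised with qemu_mutex_init() */
    size_t buf_size;
    int num_free;
    void *free_bufs[MAX_POOL_BUFS];
} BufferPool;

static void *buffer_pool_get(BufferPool *pool)
{
    void *buf = NULL;

    qemu_mutex_lock(&pool->lock);
    if (pool->num_free > 0) {
        buf = pool->free_bufs[--pool->num_free];
    }
    qemu_mutex_unlock(&pool->lock);

    if (!buf) {
        /* Pool is empty, fall back to a plain allocation */
        buf = g_try_malloc(pool->buf_size);
    }
    return buf;
}

static void buffer_pool_put(BufferPool *pool, void *buf)
{
    qemu_mutex_lock(&pool->lock);
    if (pool->num_free < MAX_POOL_BUFS) {
        pool->free_bufs[pool->num_free++] = buf;
        buf = NULL;
    }
    qemu_mutex_unlock(&pool->lock);

    /* Pool is already full, just free the buffer (g_free(NULL) is a no-op) */
    g_free(buf);
}

Whether that would actually buy us anything over plain malloc/free is
the question, of course.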
> But note that while reusable buffers from a pool would cut down on the
> per-I/O malloc/free overhead if we switch decompression away from
> per-BDS buffer, it would still not solve the fact that we only get the
> caching ability where multiple sub-cluster requests from the same
> compressed cluster require only one decompression, since that's only
> possible on a per-BDS caching level.

Yes, as I said above, I didn't notice that it's a real cache. Without
the possibility to use Qcow2Cache instead, we'll want to keep it.

Kevin
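PS: Just to make sure we mean the same thing with the try_ variant plus
checking above, I assume it would end up looking roughly like this
(untested, and I'm guessing the buffers in question are still
s->cluster_cache and s->cluster_data):

if (!s->cluster_cache) {
    assert(!s->cluster_data);
    s->cluster_cache = g_try_malloc(s->cluster_size);
    s->cluster_data = g_try_malloc(s->cluster_size);
    if (!s->cluster_cache || !s->cluster_data) {
        /* g_free(NULL) is a no-op, so this is safe if only one of the
         * two allocations succeeded; reset both so that a later request
         * can retry from scratch. */
        g_free(s->cluster_cache);
        g_free(s->cluster_data);
        s->cluster_cache = NULL;
        s->cluster_data = NULL;
        return -ENOMEM;
    }
}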