From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.0 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10D8BC35247 for ; Wed, 5 Feb 2020 19:23:20 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D29BA2072B for ; Wed, 5 Feb 2020 19:23:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Y2ByfCKm" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D29BA2072B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:55404 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1izQGc-0004qW-4b for qemu-devel@archiver.kernel.org; Wed, 05 Feb 2020 14:23:18 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:51256) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1izQFH-0003hB-23 for qemu-devel@nongnu.org; Wed, 05 Feb 2020 14:21:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1izQFD-0001nv-VY for qemu-devel@nongnu.org; Wed, 05 Feb 2020 14:21:53 -0500 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:59265 helo=us-smtp-1.mimecast.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1izQFD-0001mi-RV for qemu-devel@nongnu.org; Wed, 05 Feb 2020 14:21:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1580930510; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kqWdqycrodNKYIeMbVyal/JqkP4USxPlM3i1E2/nWB0=; b=Y2ByfCKmzcLUS50M2gJysA4nj3lf5VAGilafl7/vRDnoXzwG15tFDfY+PHDSzCRkWmo+CL 77GCg57111f97ctgmTNXtJ5mxmgNue4QrvcdxPWPeKe/rSLxBkJ7R72tQwq8le8lCTCgdA k676C5oiS/YRbRGGEmOOqY9K448mIfg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-385-2SaAOdshM3m6VvqsxVLSew-1; Wed, 05 Feb 2020 14:21:43 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5BCBE18A5500; Wed, 5 Feb 2020 19:21:42 +0000 (UTC) Received: from [10.3.116.181] (ovpn-116-181.phx2.redhat.com [10.3.116.181]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 8D582790C1; Wed, 5 Feb 2020 19:21:41 +0000 (UTC) Subject: Re: [PATCH 00/17] Improve qcow2 all-zero detection To: Max Reitz , qemu-devel@nongnu.org References: <20200131174436.2961874-1-eblake@redhat.com> <8574b42d-479e-ef72-ecab-4546b364adb6@redhat.com> <5141ea4b-a7c2-e9a3-045e-91dc088785c7@redhat.com> From: Eric Blake Organization: Red Hat, Inc. Message-ID: <59ba95c5-4c8b-a059-2332-3bafdc90dd2e@redhat.com> Date: Wed, 5 Feb 2020 13:21:40 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 In-Reply-To: <5141ea4b-a7c2-e9a3-045e-91dc088785c7@redhat.com> Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-MC-Unique: 2SaAOdshM3m6VvqsxVLSew-1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 207.211.31.120 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: david.edmondson@oracle.com, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On 2/5/20 11:04 AM, Max Reitz wrote: > OK, I expected users to come in a separate patch. I can refactor that better in v2. >=20 >> That's the use case: when copying into a destination file, it's useful >> to know if the destination already reads as all zeroes, before >> attempting a fallback to bdrv_make_zero(BDRV_REQ_NO_FALLBACK) or calls >> to block status checking for holes. >=20 > But that was my point on IRC. Is it really more useful if > bdrv_make_zero() is just as quick? (And the fact that NBD doesn=E2=80=99= t have > an implementation looks more like a problem with NBD to me.) That is indeed a thought - why should qemu-img even TRY to call=20 bdrv_has_init_zero; it should instead call bdrv_make_zero(), which=20 should be as fast as possible (see my latest reply on 9/17 exploring=20 that idea more). Under the hood, we can then make bdrv_make_zero() use=20 whatever tricks it needs, whether keeping the driver's=20 .bdrv_has_zero_init/_truncate callbacks, adding another callback, making=20 write_zeroes faster, or whatever, but instead of making qemu-img sort=20 through multiple ideas, the burden would now be hidden in the block layer. >=20 > (Considering that at least the code we discussed on IRC didn=E2=80=99t wo= rk for > preallocated images, which was the one point where we actually have a > problem in practice.) And this series DOES improve the case for preallocated qcow2 images, by=20 virtue of a new qcow2 autoclear bit. But again, that may be something=20 we want to hide in the driver callback interfaces, while qemu-img just=20 blindly calls bdrv_make_zero() and gets success (the image now reads as=20 zeroes, either because it was already that way or we did something=20 quick) or failure (it is a waste of time to prezero, whether by=20 write_zeroes or by trim or by truncate, so just manually write zeroes as=20 part of your image copying). >=20 >>> (We have a use case with convert -n to freshly created image files, but >>> my position on this on IRC was that we want the --target-is-zero flag >>> for that anyway: Auto-detection may always break, our preferred default >>> behavior may always change, so if you want convert -n not to touch the >>> target image except to write non-zero data from the source, we need a >>> --target-is-zero flag and users need to use it.=C2=A0 Well, management >>> layers, because I don=E2=80=99t think users would use convert -n anyway= . >>> >>> And with --target-is-zero and users effectively having to use it, I >>> don=E2=80=99t think that=E2=80=99s a good example of a use case.) >> >> Yes, there will still be cases where you have to use --target-is-zero >> because the image itself couldn't report that it already reads as >> zeroes, but there are also enough cases where the destination is already >> known to read zeroes and it's a shame to tell the user that 'you have to >> add --target-is-zero to get faster copying even though we could have >> inferred it on your behalf'. >=20 > How is it a shame? I think only management tools would use convert -n. > Management tools want reliable behavior. If you want reliable > behavior, you have to use --target-is-zero anyway. So I don=E2=80=99t se= e the > actual benefit for qemu-img convert. qemu-img convert to an NBD destination cannot create the destination, so=20 it ALWAYS has to use -n. I don't know how often users are likely to=20 wire up a command line for qemu-img convert with NBD as the destination,=20 or whether you are correct that it will be a management app able to=20 supply -n (and thus able to supply --target-is-zero). But at the same=20 time, can a management app learn whether it is safe to supply=20 --target-is-zero? With my upcoming NBD patches, 'qemu-img --list' will=20 show whether the NBD target is known to start life all zero, and a=20 management app could use mechanism to decide when to pass=20 --target-is-zero (whether a management app would actually fork qemu-img=20 --list, or just be an NBD client itself to do the same thing qemu-img=20 would do, is beside the point). Similarly, this series includes enhancements to 'qemu-img info' on qcow2=20 images known to currently read as zero; again, that sort of information=20 is necessary somewhere in the chain, whether it be because qemu-img=20 consumes the information itself, or because the management app consumes=20 the information in order to pass the --target-is-zero option to=20 qemu-img, either way, the information needs to be available for consumption= . >=20 >>> I suppose there is the point of blockdev-create + blockdev-mirror: This >>> has exactly the same problem as convert -n.=C2=A0 But again, if you rea= lly >>> want blockdev-mirror not just to force-zero the image, you probably nee= d >>> to tell it so explicitly (i.e., with a --target-is-zero flag for >>> blockdev-mirror). >>> >>> (Well, I suppose we could save us a target-is-zero for mirror if we too= k >>> this series and had a filter driver that force-reports BDRV_ZERO_OPEN. >>> But, well, please no.) >>> >>> But maybe I=E2=80=99m just an idiot and there is no reason not to take = this >>> series and make blockdev-create + blockdev-mirror do the sensible thing >>> by default in most cases. *shrug* >> >> My argument for taking the series _is_ that the common case can be made >> more efficient without user effort. >=20 > The thing is, I don=E2=80=99t see the user effort. I don=E2=80=99t think= users use > convert -n or backup manually. And for management tools, it isn=E2=80=99= t > really effort to add another switch. Maybe, but it is just shifting the burden between who consumes the=20 information that an image is all zero, and how the consumption of that=20 information is passed to qemu-img. >=20 >> Yes, we still need the knob for >> when the common case isn't already smart enough, >=20 > But the user can=E2=80=99t know when qemu isn=E2=80=99t smart enough. So= users who care > have to always give the flag. >=20 >> but the difference in >> avoiding a pre-zeroing pass is noticeable when copying images around >=20 > I=E2=80=99m sure it is, but the question I ask is whether in practice we > wouldn=E2=80=99t get --target-is-zero in all of these cases anyway. >=20 >=20 > So I=E2=80=99m not sold on =E2=80=9Cit works most of the time=E2=80=9D, b= ecause if it=E2=80=99s just > most of the time, then we=E2=80=99ll likely see --target-is-zero all of t= he time. >=20 > OTOH, I suppose that with the new qcow2 extension, it would always work > for the following case: > (1) Create a qcow2 file, > (2) Immediately (with the next qemu-img/QMP invocation) use it as a > target of convert -n or mirror or anything similar. Yes, that is one of the immediately obvious fallouts from this series -=20 you can now create a preallocated qcow2 image in one process, and the=20 next process using that image can readily tell that it is still=20 just-created. >=20 > If so, that means it works reliably all of the time for a common case. > I guess that=E2=80=99d be enough for me. >=20 > Max >=20 >> (and more than just for qcow2 - my followup series to improve NBD is >> similarly useful given how much work has already been invested in >> mapping NBD into storage access over https in the upper layers like ovir= t). >> >=20 >=20 At any rate, I think you've convinced me to rethink how I present v2=20 (maybe not by refactoring bdrv_known_zeroes usage, but instead=20 refactoring bdrv_make_zero), but that the qcow2 autoclear bit is still a=20 useful feature to have. --=20 Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org