Re: [PATCH 09/17] block: Refactor bdrv_has_zero_init{,_truncate}

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	"open list:Sheepdog" <sheepdog@lists.wpkg.org>,
	qemu-block@nongnu.org, Jeff Cody <codyprime@gmail.com>,
	Stefan Weil <sw@weilnetz.de>, Peter Lieven <pl@kamp.de>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	david.edmondson@oracle.com, Stefan Hajnoczi <stefanha@redhat.com>,
	Liu Yuan <namei.unix@gmail.com>,
	"Denis V. Lunev" <den@openvz.org>,
	Jason Dillaman <dillaman@redhat.com>
Subject: Re: [PATCH 09/17] block: Refactor bdrv_has_zero_init{,_truncate}
Date: Wed, 5 Feb 2020 18:22:23 +0100	[thread overview]
Message-ID: <6b081138-5ee5-f5d6-352d-ec2deff0de73@redhat.com> (raw)
In-Reply-To: <ab03b053-5caa-7316-25ed-d6103889d06e@redhat.com>


[-- Attachment #1.1: Type: text/plain, Size: 6929 bytes --]

On 04.02.20 20:03, Eric Blake wrote:
> On 2/4/20 11:53 AM, Max Reitz wrote:
>> On 31.01.20 18:44, Eric Blake wrote:
>>> Having two slightly-different function names for related purposes is
>>> unwieldy, especially since I envision adding yet another notion of
>>> zero support in an upcoming patch.  It doesn't help that
>>> bdrv_has_zero_init() is a misleading name (I originally thought that a
>>> driver could only return 1 when opening an already-existing image
>>> known to be all zeroes; but in reality many drivers always return 1
>>> because it only applies to a just-created image).
>>
>> I don’t find it misleading, I just find it meaningless, which then makes
>> it open to interpretation (or maybe rather s/interpretation/wishful
>> thinking/).
>>
>>> Refactor all uses
>>> to instead have a single function that returns multiple bits of
>>> information, with better naming and documentation.
>>
>> It doesn’t make sense to me.  How exactly is it unwieldy?  In the sense
>> that we have to deal with multiple rather small implementation functions
>> rather than a big one per driver?  Actually, multiple small functions
>> sounds better to me – unless the three implementations share common code.
> 
> Common code for dealing with encryption, backing files, and so on.  It
> felt like I had a lot of code repetition when keeping functions separate.

Well, I suppose “dealing with” means “if (encrypted || has_backing)”, so
duplicating that doesn’t seem too bad.

>> As for the callers, they only want a single flag out of the three, don’t
>> they?  If so, it doesn’t really matter for them.
> 
> The qemu-img.c caller in patch 10 checks ZERO_CREATE | ZERO_OPEN, so we
> DO have situations of checking more than one bit, vs. needing two
> function calls.

Hm, OK.  Not sure if that place would look worse with two function
calls, but, well.

>> In fact, I can imagine that drivers can trivially return
>> BDRV_ZERO_TRUNCATE information (because the preallocation mode is
>> fixed), whereas BDRV_ZERO_CREATE can be a bit more involved, and
>> BDRV_ZERO_OPEN could take even more time because some (constant-time)
>> inquiries have to be done.
> 
> In looking at the rest of the series, drivers were either completely
> trivial (in which case, declaring:
> 
> .bdrv_has_zero_init = bdrv_has_zero_init_1,
> .bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
> 
> was a lot wordier than the new:
> 
> .bdrv_known_zeroes = bdrv_known_zeroes_truncate,

Not sure if that’s bad, though.

> ), or completely spelled out but where both creation and truncation were
> determined in the same amount of effort.

Well, usually, the effort is minimal, but OK.

>> And thus callers which just want the trivially obtainable
>> BDRV_ZERO_TRUNCATE info have to wait for the BDRV_ZERO_OPEN inquiry,
>> even though they don’t care about that flag.
> 
> True, but only to a minor extent; and the documentation mentions that
> the BDRV_ZERO_OPEN calculation MUST NOT be as expensive as a blind
> block_status loop.

So it must be less expensive than an arbitrarily complex loop.  I think
a single SEEK_DATA/HOLE call was something like O(n) on tmpfs?

What I’m trying to say is that this is not a good limit and can mean
anything.

I do think this limit definition makes sense for callers that want to
know about ZERO_OPEN.  But I don’t know why we would have to let other
callers wait, too.

> Meanwhile, callers tend to only care about
> bdrv_known_zeroes() right after opening an image or right before
> resizing (not repeatedly during runtime);

Hm, yes.  I was thinking of parallels, but that only checks once in
parallels_open(), so it’s OK.

> and you also argued elsewhere
> in this thread that it may be worth having the block layer cache
> BDRV_ZERO_OPEN and update the cache on any write,

I didn’t say the block layer, but it if makes sense.

> at which point, the
> expense in the driver callback really is a one-time call during
> bdrv_co_open().

It definitely doesn’t make sense to me to do that call unconditionally
in bdrv_co_open().

> And in that case, whether the one-time expense is done
> via a single function call or via three driver callbacks, the amount of
> work is the same; but the driver callback interface is easier if there
> is only one callback (similar to how bdrv_unallocated_blocks_are_zero()
> calls bdrv_get_info() only for bdi.unallocated_blocks_are_zero, even
> though BlockDriverInfo tracks much more than that boolean).
> 
> In fact, it may be worth consolidating known zeroes support into
> BlockDriverInfo.

I’m very skeptical of that.  BDI already has the problem that it doesn’t
know which of the information the caller actually wants and that it is
sometimes used in a quasi-hot path.

Maybe that means it is indeed time to incorporate it into BDI, but the
caller should have a way of specifying what parts of BDI it actually
needs and then drivers can skip anything that isn’t trivially obtainable
that the caller doesn’t need.

>> So I’d leave it as separate functions so drivers can feel free to have
>> implementations for BDRV_ZERO_OPEN that take more than mere microseconds
>> but that are more accurate.
>>
>> (Or maybe if you really want it to be a single functions, callers could
>> pass the mask of flags they care about.  If all flags are trivially
>> obtainable, the implementations would then simply create their result
>> mask and & it with the caller-given mask.  For implementations where
>> some branches could take a bit more time, those branches are only taken
>> when the caller cares about the given flag.  But again, I don’t
>> necessarily think having a single function is more easily handleable
>> than three smaller ones.)
> 
> Those are still viable options, but before I repaint the bikeshed along
> those lines, I'd at least like a review of whether the overall idea of
> having a notion of 'reads-all-zeroes' is indeed useful enough,
> regardless of how we implement it as one vs. three driver callbacks.

I’m as hesitant as ever to give a review that this notion is useful,
because I haven’t seen a practical example yet where the problem isn’t
the fact that NBD doesn’t have 64-bit write_zeroes support.

So far, it looks to me like this notion is only really useful for cases
where we expect a management layer on top of qemu anyway.  And then I’m
not sure that this new feature works reliably enough for such a
management layer.

(I’m not saying it isn’t useful.  Again, intuitively it does seem
useful.  Intuition can be enough to merge a sufficiently simple series
that doesn’t increase code complexity too much.  But I’m still asking
for actual practical examples, because that would make a better
argument, of course.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

next prev parent reply	other threads:[~2020-02-05 17:27 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-31 17:44 [PATCH 00/17] Improve qcow2 all-zero detection Eric Blake
2020-01-31 17:44 ` [PATCH 01/17] qcow2: Comment typo fixes Eric Blake
2020-02-04 14:12   ` Vladimir Sementsov-Ogievskiy
2020-02-09 19:34   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 02/17] qcow2: List autoclear bit names in header Eric Blake
2020-02-04 14:26   ` Vladimir Sementsov-Ogievskiy
2020-01-31 17:44 ` [PATCH 03/17] qcow2: Avoid feature name extension on small cluster size Eric Blake
2020-02-04 14:39   ` Vladimir Sementsov-Ogievskiy
2020-02-09 19:28   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 04/17] block: Improve documentation of .bdrv_has_zero_init Eric Blake
2020-02-04 15:03   ` Vladimir Sementsov-Ogievskiy
2020-02-04 15:16     ` Eric Blake
2020-01-31 17:44 ` [PATCH 05/17] block: Don't advertise zero_init_truncate with encryption Eric Blake
2020-02-10 18:12   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 06/17] block: Improve bdrv_has_zero_init_truncate with backing file Eric Blake
2020-02-10 18:13   ` Alberto Garcia
2020-01-31 17:44 ` [PATCH 07/17] gluster: Drop useless has_zero_init callback Eric Blake
2020-02-04 15:06   ` Vladimir Sementsov-Ogievskiy
2020-02-10 18:21   ` Alberto Garcia
2020-02-17  8:06   ` [GEDI] " Niels de Vos
2020-02-17 12:03     ` Eric Blake
2020-02-17 12:22       ` Eric Blake
2020-02-17 14:01       ` Niels de Vos
2020-01-31 17:44 ` [PATCH 08/17] sheepdog: Consistently set bdrv_has_zero_init_truncate Eric Blake
2020-02-04 15:09   ` Vladimir Sementsov-Ogievskiy
2020-01-31 17:44 ` [PATCH 09/17] block: Refactor bdrv_has_zero_init{,_truncate} Eric Blake
2020-02-04 15:35   ` Vladimir Sementsov-Ogievskiy
2020-02-04 15:49     ` Eric Blake
2020-02-04 16:07       ` Vladimir Sementsov-Ogievskiy
2020-02-04 17:42     ` Max Reitz
2020-02-04 17:51       ` Eric Blake
2020-02-05 16:43         ` Max Reitz
2020-02-05  7:51       ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:07         ` Eric Blake
2020-02-05 14:25           ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:36             ` Eric Blake
2020-02-05 17:55           ` Max Reitz
2020-02-04 17:53   ` Max Reitz
2020-02-04 19:03     ` Eric Blake
2020-02-05 17:22       ` Max Reitz [this message]
2020-02-05 18:39         ` Eric Blake
2020-02-06  9:18           ` Max Reitz
2020-01-31 17:44 ` [PATCH 10/17] block: Add new BDRV_ZERO_OPEN flag Eric Blake
2020-01-31 18:03   ` Eric Blake
2020-02-04 17:34   ` Max Reitz
2020-02-04 17:50     ` Eric Blake
2020-02-05  8:39       ` Vladimir Sementsov-Ogievskiy
2020-02-05 17:26       ` Max Reitz
2020-01-31 17:44 ` [PATCH 11/17] file-posix: Support BDRV_ZERO_OPEN Eric Blake
2020-01-31 17:44 ` [PATCH 12/17] gluster: " Eric Blake
2020-02-17  8:16   ` [GEDI] " Niels de Vos
2020-01-31 17:44 ` [PATCH 13/17] qcow2: Add new autoclear feature for all zero image Eric Blake
2020-02-03 17:45   ` Vladimir Sementsov-Ogievskiy
2020-02-04 13:12     ` Eric Blake
2020-02-04 13:29       ` Vladimir Sementsov-Ogievskiy
2020-01-31 17:44 ` [PATCH 14/17] qcow2: Expose all zero bit through .bdrv_known_zeroes Eric Blake
2020-01-31 17:44 ` [PATCH 15/17] qcow2: Implement all-zero autoclear bit Eric Blake
2020-01-31 17:44 ` [PATCH 16/17] iotests: Add new test for qcow2 all-zero bit Eric Blake
2020-01-31 17:44 ` [PATCH 17/17] qcow2: Let qemu-img check cover " Eric Blake
2020-02-04 17:32 ` [PATCH 00/17] Improve qcow2 all-zero detection Max Reitz
2020-02-04 18:53   ` Eric Blake
2020-02-05 17:04     ` Max Reitz
2020-02-05 19:21       ` Eric Blake
2020-02-06  9:12         ` Max Reitz
2020-02-05  9:04 ` Vladimir Sementsov-Ogievskiy
2020-02-05  9:25   ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:26     ` Eric Blake
2020-02-05 14:47       ` Vladimir Sementsov-Ogievskiy
2020-02-05 15:14         ` Vladimir Sementsov-Ogievskiy
2020-02-05 17:58           ` Max Reitz
2020-02-05 14:22   ` Eric Blake
2020-02-05 14:43     ` Vladimir Sementsov-Ogievskiy
2020-02-05 14:58       ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6b081138-5ee5-f5d6-352d-ec2deff0de73@redhat.com \
    --to=mreitz@redhat.com \
    --cc=armbru@redhat.com \
    --cc=codyprime@gmail.com \
    --cc=david.edmondson@oracle.com \
    --cc=den@openvz.org \
    --cc=dillaman@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=kwolf@redhat.com \
    --cc=namei.unix@gmail.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=sheepdog@lists.wpkg.org \
    --cc=stefanha@redhat.com \
    --cc=sw@weilnetz.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).