From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>, Kevin Wolf <kwolf@redhat.com>,
	Max Reitz <mreitz@redhat.com>
Cc: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	qemu block <qemu-block@nongnu.org>
Subject: Re: backing chain & block status & filters
Date: Wed, 29 Apr 2020 12:15:35 +0300
Message-ID: <7b1d4246-e59b-0fdb-3c44-6810eea6e5b8@virtuozzo.com>
In-Reply-To: <91b741ac-248c-2065-17b9-7fe31eafee40@virtuozzo.com>

[-- Attachment #1: Type: text/plain, Size: 3851 bytes --]

28.04.2020 22:44, Vladimir Sementsov-Ogievskiy wrote:
> 28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote:
>> 28.04.2020 19:18, Eric Blake wrote:
>>> On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>>>>> Hm.  I could imagine that there are formats that have non-zero holes
>>>>>> (e.g. 0xff or just garbage).  It would be a bit wrong for them to return
>>>>>> ZERO or DATA then.
>>>>>>
>>>>>> But OTOH we don’t care about such cases, do we?  We need to know whether
>>>>>> ranges are zero, data, or unallocated.  If they aren’t zero, we only
>>>>>> care about whether reading from it will return data from this layer or not.
>>>>>>
>>>>>> So I suppose that anything that doesn’t support backing files (or
>>>>>> filtered children) should always return ZERO and/or DATA.
>>>>>
>>>>> I'm not sure I agree with the notion that everything should be
>>>>> BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today
>>>>> at least. If we want to change this, we will have to check all callers
>>>>> of bdrv_is_allocated() and friends who might use this to find holes in
>>>>> the file.
>>>>
>>>> Yes. Because they are doing an incorrect (or at least undocumented and unreliable) thing.
>>>
>>> Here's some previous mails discussing the same question about what block_status should actually mean.  At the time, I was so scared of the prospect of something breaking if I changed things that I ended up keeping status quo, so here we are revisiting the topic several years later, still asking the same questions.
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html
>>>
>>>>
>>>>>
>>>>> Basically, the way bdrv_is_allocated() works today is that we assume an
>>>>> implicit zeroed backing layer even for block drivers that don't support
>>>>> backing files.
>>>>
>>>> But read doesn't work that way: it reads data from the bottom layer, not from
>>>> this implicit zeroed backing layer. That is inconsistent. On read, data
>>>> comes exactly from this layer, not from its implicit backing. So it should
>>>> return BDRV_BLOCK_ALLOCATED, according to its definition.
>>>>
>>>> Or, we should at least document current behavior:
>>>>
>>>>    BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>>>>    layer rather than any backing, set by block layer. Attention: it may not be set
>>>>    for drivers without backing support, even though data is of course read from
>>>>    this layer. Note that for such drivers BDRV_BLOCK_ALLOCATED may mean
>>>>    allocation at the fs level, which occupies real space on disk. So, for such drivers:
>>>>
>>>>    ZERO | ALLOCATED means: reads as zero; data may be allocated on the fs, or
>>>>    (most probably) not.
>>>>    Don't look at the ALLOCATED flag, as it is added by the generic layer for other logic,
>>>>    not related to fs allocation.
>>>>
>>>>    0 means that, most probably, the data doesn't occupy space on the fs; zero status is
>>>>    unknown (most probably non-zero).
>>>>
>>>
>>> That may be right in describing the current situation, but again, needs a GOOD audit of what we are actually using it for, and whether it is what we really WANT to be using it for.  If we're going to audit/refactor the code, we might as well get semantics that are actually useful, rather than painfully contorted to documentation that happens to match our current contorted code.
>>>
>>
>> Honest enough:) I'll try to make a table.
>>
>> I don't think that reporting fs-allocation status is a bad thing. But I'm sure that it should be separated from backing-chain-allocated concept.
>>
> 
> As a first step, I've done a brief analysis of .bdrv_co_block_status of drivers (attached)
> 

As a second step, here is a brief analysis of all block_status usage (attached).

-- 
Best regards,
Vladimir

[-- Attachment #2: block-status-usage-report --]
[-- Type: text/plain, Size: 4098 bytes --]

The public interface of block-status is:

    bdrv_block_status
    bdrv_block_status_above
    bdrv_is_allocated
    bdrv_is_allocated_above


= bdrv_block_status =

bdrv_make_zero: works on the current level of the backing chain, wants to skip zeroes, is not interested in @map and @file.

img-convert: convert_iteration_sectors: wants to distinguish ZERO, DATA and go-to-backing. It also tries not to write zeroes when the backing file is short, but does it slightly wrong. Treats unallocated as DATA if there is no backing.

img-map: get_block_status: distinguishes ZERO, DATA and go-to-backing. Counts the depth in the backing chain. Just reports the final ZERO and DATA. So, the fs-unallocated thing is reported to the user.
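
The walk described for img-map can be illustrated with a toy model. This is only a sketch under stated assumptions: struct layer, the ST_* flags and toy_map_status() are hypothetical names local to this example, each layer carries a single status for the whole image, and none of it is taken from the QEMU sources.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flags for this sketch only; not the QEMU definitions. */
enum {
    ST_DATA = 1 << 0,   /* reads return data stored in this layer */
    ST_ZERO = 1 << 1,   /* reads return zeroes */
};

/* Hypothetical toy layer: one status value covers the whole image. */
struct layer {
    int status;                   /* 0 means "go to backing" */
    const struct layer *backing;  /* NULL for the bottom layer */
};

/*
 * Walk from the top layer down until some layer reports ZERO or DATA.
 * Returns that status and stores the depth of the layer that gave it.
 * If nobody answers, returns the bottom layer's status (0) and its depth.
 */
int toy_map_status(const struct layer *top, int *depth)
{
    const struct layer *l;
    int d = 0;

    assert(top != NULL && depth != NULL);
    for (l = top; ; l = l->backing, d++) {
        if ((l->status & (ST_DATA | ST_ZERO)) || l->backing == NULL) {
            *depth = d;
            return l->status;
        }
    }
}
```

With a chain top -> mid -> base where only base reports ST_DATA, the walk stops at depth 2. With a single layer that reports 0, the caller gets 0 back at depth 0, which is the "fs-unallocated thing is reported to the user" case.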

= bdrv_block_status_above =

block-copy: block_copy_block_status: wants two things:

  1. skip go-to-backing holes in the top layer for top mode
  2. do write_zero for ZERO areas

mirror: called on the whole backing chain
   - for DATA (and for DATA|ZERO, which is bad) just copy
   - for ZERO just write zeroes
   - for 0 (which means that the bottom layer doesn't report that unallocated areas are zero) do DISCARD (which is most probably zeroing): an absolutely wrong thing
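
The three mirror cases above can be written down as a small decision function. This is a sketch of the behaviour as described in this report, not the actual mirror code; the ST_* flags, the ACT_* actions and toy_mirror_action() are hypothetical names used only here.

```c
#include <assert.h>

/* Stand-in flags for this sketch only; not the QEMU definitions. */
enum {
    ST_DATA = 1 << 0,
    ST_ZERO = 1 << 1,
};

enum mirror_action {
    ACT_COPY,     /* read from source, write to target */
    ACT_ZERO,     /* write zeroes to target */
    ACT_DISCARD,  /* discard on target */
};

/* Map a block status to the action described in the list above. */
enum mirror_action toy_mirror_action(int status)
{
    assert((status & ~(ST_DATA | ST_ZERO)) == 0);

    if (status & ST_DATA) {
        return ACT_COPY;      /* also taken for DATA|ZERO */
    }
    if (status & ST_ZERO) {
        return ACT_ZERO;
    }
    return ACT_DISCARD;       /* status 0: zero-ness is unknown */
}
```

The last branch is the problematic one: status 0 says nothing about the content, yet the range ends up discarded on the target.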

qcow2: is_zero: called on the whole backing chain, just wants to check whether the range reads as zero or not.

qcow2: qcow2_measure: called on the whole backing chain:
   - skip ZERO
   - count clusters with both DATA and ALLOCATED set. Hmm. ALLOCATED is always set for DATA. It seems the function actually tries to calculate disk occupation, assuming that BDRV_BLOCK_ALLOCATED helps with that, but it actually doesn't.

   I think the correct solution is to support offset and bytes in bdrv_measure, and split it from block_status. Then qcow2_measure will just recursively call bdrv_measure on its children. This would be clean.

nbd: nbd_co_send_sparse_read: called on the whole backing chain:
   - wants to distinguish zeroes

nbd: blockstatus_to_extents: called on the whole backing chain:
   !ALLOCATED -> NBD_HOLE
   ZERO -> NBD_ZERO

   So, we report HOLE only if the range is not BDRV_BLOCK_ALLOCATED on any layer. That's wrong. I think we should report HOLE in a lot more cases: actually, whenever the range does not occupy real space on disk.
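
The mapping above can be sketched as a pure function over flags. This is a minimal illustration, assuming stand-in constants: the ST_* and EXT_* values and toy_status_to_extent_flags() are hypothetical and local to this example, not the definitions from the QEMU or NBD headers.

```c
#include <assert.h>

/* Stand-in block-status flags for this sketch only. */
enum {
    ST_DATA      = 1 << 0,
    ST_ZERO      = 1 << 1,
    ST_ALLOCATED = 1 << 2,
};

/* Stand-in extent flags for this sketch only. */
enum {
    EXT_HOLE = 1 << 0,
    EXT_ZERO = 1 << 1,
};

/* !ALLOCATED -> HOLE, ZERO -> ZERO, as in the mapping described above. */
unsigned toy_status_to_extent_flags(int status)
{
    unsigned flags = 0;

    assert((status & ~(ST_DATA | ST_ZERO | ST_ALLOCATED)) == 0);

    if (!(status & ST_ALLOCATED)) {
        flags |= EXT_HOLE;
    }
    if (status & ST_ZERO) {
        flags |= EXT_ZERO;
    }
    return flags;
}
```

A range that is ZERO | ALLOCATED comes out as EXT_ZERO without EXT_HOLE, even if it occupies no space on disk. That is the case the paragraph above argues should be reported as a hole as well.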

img-compare: called on the whole backing chain:
  - do not compare zeroes
  - do not compare if both report unallocated. That is actually not correct for protocols which report fs-unallocated non-zeroes, as reads may actually differ. Still, a read from an fs-unallocated area is not guaranteed to return the same thing each time, yes? At least, null-co doesn't guarantee it :) So, it may be correct to skip these areas. Or maybe it's better to always report them as different?
  - consider data-zeroes equal to unallocated. That's definitely not correct for protocols which report fs-unallocated non-zeroes

  I think img-compare must only consider zero/non-zero, and not touch other block-status features. Otherwise it's a mess.
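
One way to read the "only zero/non-zero" suggestion is the decision function below. It is a sketch of the proposal, not of the existing img-compare code; the ST_* flags, enum cmp_plan and toy_compare_plan() are hypothetical names introduced only for this example.

```c
#include <assert.h>

/* Stand-in flags for this sketch only; not the QEMU definitions. */
enum {
    ST_DATA      = 1 << 0,
    ST_ZERO      = 1 << 1,
    ST_ALLOCATED = 1 << 2,
};

enum cmp_plan {
    CMP_SKIP,               /* both sides read as zero: equal */
    CMP_CHECK_FIRST_ZERO,   /* read image 1, check it is all zeroes */
    CMP_CHECK_SECOND_ZERO,  /* read image 2, check it is all zeroes */
    CMP_READ_BOTH,          /* read both and compare the buffers */
};

/* Decide how to compare a range, looking only at the ZERO bit. */
enum cmp_plan toy_compare_plan(int st1, int st2)
{
    int z1 = !!(st1 & ST_ZERO);
    int z2 = !!(st2 & ST_ZERO);

    assert((st1 & ~(ST_DATA | ST_ZERO | ST_ALLOCATED)) == 0);
    assert((st2 & ~(ST_DATA | ST_ZERO | ST_ALLOCATED)) == 0);

    if (z1 && z2) {
        return CMP_SKIP;
    }
    if (z1) {
        return CMP_CHECK_SECOND_ZERO;
    }
    if (z2) {
        return CMP_CHECK_FIRST_ZERO;
    }
    return CMP_READ_BOTH;   /* includes "both unallocated": still read */
}
```

Under this reading, two ranges that both report status 0 are read and compared rather than skipped, so drivers that report fs-unallocated non-zero areas are handled the same way as any other data.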

img-convert: convert_iteration_sectors: called on the whole backing chain: already described in the bdrv_block_status section

= bdrv_is_allocated =

The obvious use, for backing-chain related operations (it is still wrong that some protocol drivers may return fs-unallocated, which is then treated as go-to-backing areas):
    block-copy, commit, copy-on-read, stream, img-rebase

Others:
vvfat: o_O it has a qcow child, and operates as if it were itself a backing of this child. But yes, it just uses bdrv_is_allocated to find out whether a chunk has been rewritten in qcow.

migration/block: skips unallocated for top mode (shared_base, as it is called here)

io-alloc: just reports how much is allocated in the top layer

io-map: map_is_allocated: same thing as io-alloc, but reports chunks

test_sync_op_block_status: just checks what it returns

= bdrv_is_allocated_above =

The obvious use, for backing-chain related operations: commit, mirror, stream, img-rebase. Wrong for drivers that report fs-unallocated non-zero areas.

Others:
qcow2: is_unallocated: called on the whole backing chain. Used to check is-zero. Wrong for drivers that report fs-unallocated non-zero areas, and it could be more efficient if it also considered ZERO status, but in some smart and fast way.

replication: allocated or not in the backing chain: the common case

