From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Max Reitz <mreitz@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
Date: Thu, 23 May 2019 15:08:15 +0000	[thread overview]
Message-ID: <4314c0f7-cb04-9a23-d120-496ea6e9969c@virtuozzo.com> (raw)
In-Reply-To: <f0a29172-4647-b4d8-109e-5eaae5a6ac86@redhat.com>
23.05.2019 17:49, Max Reitz wrote:
> On 17.05.19 13:50, Vladimir Sementsov-Ogievskiy wrote:
>> 07.05.2019 18:13, Max Reitz wrote:
>>> On 07.05.19 15:30, Vladimir Sementsov-Ogievskiy wrote:
>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>> bs->file.  Usually.
>>>>>
>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>> our currently limited form of expression.  It is better to introduce
>>>>> some functions that actually guarantee a meaning:
>>>>>
>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>      filtered through COW.  That is, reads may or may not be forwarded
>>>>>      (depending on the overlay's allocation status), but writes never go to
>>>>>      this child.
>>>>>
>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>      filtered through some very plain process.  Reads and writes issued to
>>>>>      the parent will go to the child as well (although timing, etc. may be
>>>>>      modified).
>>>>>
>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>      block layer anyway) always only have one of these children: All read
>>>>>      requests must be served from the filtered_rw_child (if it exists), so
>>>>>      if there was a filtered_cow_child in addition, it would not receive
>>>>>      any requests at all.
>>>>>      (The closest here is mirror, where all requests are passed on to the
>>>>>      source, but with write-blocking, write requests are "COWed" to the
>>>>>      target.  But that just means that the target is a special child that
>>>>>      cannot be introspected by the generic block layer functions, and that
>>>>>      source is a filtered_rw_child.)
>>>>>      Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>      one child (or NULL, if there is no filtered child).
>>>>>
>>>>> Also, many places in the current block layer should be skipping filters
>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>> through a block node chain.  They do not do that currently, but this
>>>>> patch makes them.
>>>>>
>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>> patch, the allocation in the base image is reported correctly.
>>>>>
>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>> blockdev option, which is that it does not default to null like all
>>>>> other child references do.
>>>>>
>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>> are modified to return any filtered child under "backing", not just
>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>> the throttled node now appears as a backing child.
>>>>>
>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>     qapi/block-core.json           |   4 +
>>>>>     include/block/block.h          |   1 +
>>>>>     include/block/block_int.h      |  40 +++++--
>>>>>     block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>     block/backup.c                 |   8 +-
>>>>>     block/block-backend.c          |  16 ++-
>>>>>     block/commit.c                 |  33 +++---
>>>>>     block/io.c                     |  45 ++++---
>>>>>     block/mirror.c                 |  21 ++--
>>>>>     block/qapi.c                   |  30 +++--
>>>>>     block/stream.c                 |  13 +-
>>>>>     blockdev.c                     |  88 +++++++++++---
>>>>>     migration/block-dirty-bitmap.c |   4 +-
>>>>>     nbd/server.c                   |   6 +-
>>>>>     qemu-img.c                     |  29 ++---
>>>>>     tests/qemu-iotests/184.out     |   7 +-
>>>>>     tests/qemu-iotests/204.out     |   1 +
>>>>>     17 files changed, 411 insertions(+), 145 deletions(-)
>>>>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index 7ccbfff9d0..dbd9286e4a 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -2502,6 +2502,10 @@
>>>>>     # On successful completion the image file is updated to drop the backing file
>>>>>     # and the BLOCK_JOB_COMPLETED event is emitted.
>>>>>     #
>>>>> +# In case @device is a filter node, block-stream modifies the first non-filter
>>>>> +# overlay node below it to point to base's backing node (or NULL if @base was
>>>>> +# not specified) instead of modifying @device itself.
>>>>> +#
>>>>
>>>> Is it necessary, why we can't keep it as is, modifying exactly device node? May be,
>>>> user wants to use filter in stream process, throttling for example.
>>>
>>> That wouldn't make any sense.  Say you have this configuration:
>>>
>>> throttle -> top -> base
>>>
>>> Now you stream from base to throttle.  The data goes from base through
>>> throttle to top.  You propose to then make throttle point to base:
>>>
>>> throttle -> base
>>>
>>> This will discard all the data in top.
>>>
>>> Filters don’t store any data.  You need to keep the top data storing
>>> image, i.e. the first non-filter overlay.
>>
>> Ah, yes, good reason.
>>
>>>
>>>>>     # @job-id: identifier for the newly-created block job. If
>>>>>     #          omitted, the device name will be used. (Since 2.7)
>>>>>     #
>>>
>>> [...]
>>>
>>>>> @@ -2345,7 +2347,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>>>>         bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>>>>>             bdrv_inherits_from_recursive(backing_hd, bs);
>>>>>     
>>>>> -    if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
>>>>> +    if (bdrv_is_backing_chain_frozen(bs, child_bs(bs->backing), errp)) {
>>>>
>>>> If we support file-filters for frozen backing chain, could it go through file child here?
>>>> Hmm, only in case when we are going to set backing hd for file-filter.. Hmm, could filter have
>>>> both file and backing children?
>>>
>>> No.  A filter passes through data from its children, so it can only have
>>> a single child, or it is quorum.
>>>
>>> The file/backing combination is reserved for COW overlays.  file is
>>> where the current layer’s data is, backing is the filtered child.
>>
>> My backup-top has two children - backing and target.. So, I think, we can state that
>> filter should not have both file and backing children, but may have any other special
>> children he wants, invisible for backing-child/file-child generic logic.
> 
> Ah, yes, sorry, that’s what I meant.  A filter can have only a single
> filtered child, but other than that, they’re free to have whatever.
> 
> [...]
> 
>>>> Here we don't want to check the chain, we exactly want to check backing link, so it should be
>>>> something like
>>>>
>>>> if (bs->backing && bs->backing->frozen) {
>>>>       error_setg("backig exists and frozen!");
>>>>       return;
>>>> }
>>>>
>>>>
>>>> Hmm, on the other hand, if we have frozen backing chain, going through file child, we must not add
>>>> backing child to the node with file child, as it will change backing chain (which by default goes
>>>> through backing)..
>>>>
>>>> Anyway, we don't need to check the whole backing chain, as we may find other frozen backing subchain,
>>>> far away of bs.. So, we possibly want to check
>>>>
>>>> if (bdrv_filtered_child(bs) && bdrv_filtered_child(bs)->frozed) {
>>>>      ERROR
>>>> }
>>>>
>>>>
>>>> ....
>>>>
>>>> also, we'll need to check for frozen file child, when we want to replace it.
>>>
>>> I don’t quite understand.  It sounds to me like you’re saying we don’t
>>> need to check the whole chain here but just the immediate child.  But
>>> isn’t that true regardless of this series?
>>
>> If we restrict adding backing child to filter with file child, all becomes simpler and seems to be correct.
> 
> OK. :-)
> 
>> Should we add check for frozen file child to bdrv_replace_child() ?
> 
> Argh.  You mean move it from bdrv_set_backing_hd()?  That actually makes
> a lot of sense to me.  The problem is that bdrv_replace_child()
> currently cannot return an error, which may be a problem for
> bdrv_detach_child().  Hm.  But that’s effectively only called from
> functions where the child is unref’d, and you have to know that your own
> child is not frozen before you unref it.  So I guess we should be good
> to pass an &error_abort there.
> 
> [...]
> 
>>>>> @@ -2208,7 +2218,7 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>>>>         bool first = true;
>>>>>     
>>>>>         assert(bs != base);
>>>>> -    for (p = bs; p != base; p = backing_bs(p)) {
>>>>> +    for (p = bs; p != base; p = bdrv_filtered_bs(p)) {
>>>>>             ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>>>>                                        file);
>>>>
>>>> Interesting that for filters who use bdrv_co_block_status_from_backing and
>>>> bdrv_co_block_status_from_file we will finally call .bdrv_co_block_status of
>>>> underalying real node two or more times.. It's not wrong but obviously not optimal.
>>>
>>> Hm.  If @p is a filter, we could skip straight to *file.  Would that work?
>>
>> No, as file may be not in backing chain:
>>
>> filter  [A]
>>      |
>>      v
>> qcow2 -> file  [B]
>>      |
>>      v
>> qcow2
>>
>> So, we shouldn't redirect the whole loop to file..
> 
> But qcow2 is not a filter.  I meant skipping to *file only if the
> current node is a filter.  And I don’t mean bs->file, I mean *file --
> like, what bdrv_co_block_status() returns.
Me too. But as I understand, if we call bdrv_block_status on filter [A],
resulting *file returned by bdrv_co_block_status() will point to file [B]
due to recursion in bdrv_co_block_status.
> 
> You say in your other mail that filters can have an own implementation
> of .bdrv_co_block_status(), but I don’t think that makes sense,
> actually.  They should always pass the status of their filtered child.
> 
> blkdebug is the only filter I know that has an own implementation, and
> the only thing besides passing the data through is add an alignment
> assertion.  If it simplifies everything else, I’m very much willing to
> break that.
Agree that assertion is a bad reason to not implement some clean generic
logic.
> 
> Max
> 
>> May be the correct solution should be introducing additional handler
>> .bdrv_co_block_status_above with different logic..
> 
-- 
Best regards,
Vladimir
next prev parent reply	other threads:[~2019-05-23 15:12 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-10 20:20 [Qemu-devel] [PATCH v4 00/11] block: Deal with filters Max Reitz
2019-04-10 20:20 ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 01/11] block: Mark commit and mirror as filter drivers Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-16 10:02   ` Vladimir Sementsov-Ogievskiy
2019-04-16 10:02     ` Vladimir Sementsov-Ogievskiy
2019-04-17 16:22     ` Max Reitz
2019-04-17 16:22       ` Max Reitz
2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
2019-04-18  8:36         ` Vladimir Sementsov-Ogievskiy
2019-04-24 15:23         ` Max Reitz
2019-04-24 15:23           ` Max Reitz
2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
2019-04-19 10:23         ` Vladimir Sementsov-Ogievskiy
2019-04-24 16:36         ` Max Reitz
2019-04-24 16:36           ` Max Reitz
2019-05-07  9:32           ` Vladimir Sementsov-Ogievskiy
2019-05-07 13:15             ` Max Reitz
2019-05-07 13:33               ` Vladimir Sementsov-Ogievskiy
2019-05-31 16:26     ` Max Reitz
2019-05-31 17:02       ` Max Reitz
2019-05-07 13:30   ` Vladimir Sementsov-Ogievskiy
2019-05-07 15:13     ` Max Reitz
2019-05-17 11:50       ` Vladimir Sementsov-Ogievskiy
2019-05-23 14:49         ` Max Reitz
2019-05-23 15:08           ` Vladimir Sementsov-Ogievskiy [this message]
2019-05-23 15:56             ` Max Reitz
2019-05-17 14:50   ` Vladimir Sementsov-Ogievskiy
2019-05-23 17:27     ` Max Reitz
2019-05-24  8:12       ` Vladimir Sementsov-Ogievskiy
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 03/11] block: Storage child access function Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-05-20 10:41   ` Vladimir Sementsov-Ogievskiy
2019-05-28 18:09     ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*() Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-05-21  8:57   ` Vladimir Sementsov-Ogievskiy
2019-05-28 17:58     ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 05/11] block: Fix check_to_replace_node() Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 07/11] block: Leave BDS.backing_file constant Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 08/11] iotests: Add filter commit test cases Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 09/11] iotests: Add filter mirror " Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 10/11] iotests: Add test for commit in sub directory Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 11/11] iotests: Test committing to overridden backing Max Reitz
2019-04-10 20:20   ` Max Reitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=4314c0f7-cb04-9a23-d120-496ea6e9969c@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).