From: Hanna Reitz <hreitz@redhat.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 16:12:30 +0200 [thread overview]
Message-ID: <c7f6fe7e-c309-010a-eaba-549fbfcb45ce@redhat.com> (raw)
In-Reply-To: <a4d3fc47-0769-7d11-47aa-a1c4ac503406@redhat.com>
On 30.03.22 13:55, Emanuele Giuseppe Esposito wrote:
>
> Am 30/03/2022 um 12:53 schrieb Hanna Reitz:
>> On 17.03.22 17:23, Emanuele Giuseppe Esposito wrote:
>>> Am 09/03/2022 um 14:26 schrieb Emanuele Giuseppe Esposito:
>>>>>> * Drains allow the caller (either the main loop or the iothread
>>>>>> running the context) to wait for all in-flight requests and
>>>>>> operations of a BDS: normal drains target a given node and its
>>>>>> parents, while subtree ones also include the subgraph of the node.
>>>>>> Siblings are not affected by either of these two kinds of drains.
>>>>> Siblings are drained to the extent required for their parent node to
>>>>> reach in_flight == 0.
>>>>>
>>>>> I haven't checked the code but I guess the case you're alluding to is
>>>>> that siblings with multiple parents could have other I/O in flight that
>>>>> will not be drained and further I/O can be submitted after the parent
>>>>> has drained?
>>>> Yes, in theory this can happen. I don't really know whether it
>>>> happens in practice, or how likely it is.
>>>>
>>>> The alternative would be to make a drain that blocks the whole graph,
>>>> siblings included, but that would probably be overkill.
>>>>
>>> So I have thought about this, and I think maybe this is not a concrete
>>> problem.
>>> Suppose we have a graph where "parent" has 2 children: "child" and
>>> "sibling". "sibling" also has a blockjob.
>>>
>>> Now, main loop wants to modify parent-child relation and maybe detach
>>> child from parent.
>>>
>>> 1st wrong assumption: the sibling is not drained. Actually, my
>>> strategy takes draining both nodes into account, also because the
>>> parent could be in another graph. Therefore the sibling is drained.
>>>
>>> But let's assume "sibling" is the sibling of the parent.
>>>
>>> Therefore we have
>>> "child" -> "parent" -> "grandparent"
>>> and
>>> "blockjob" -> "sibling" -> "grandparent"
>>>
>>> The issue is the following: the main loop can't drain "sibling",
>>> because subtree_drained does not reach it. Therefore the blockjob can
>>> still run while the main loop modifies "child" -> "parent". The
>>> blockjob can either:
>>> 1) drain, but this won't affect "child" -> "parent"
>>> 2) read the graph in other ways different from drain, for example
>>> .set_aio_context recursively touches the whole graph.
>>> 3) write the graph.
>> I don’t really understand the problem here. If the block job only
>> operates on the sibling subgraph, why would it care what’s going on in
>> the other subgraph?
> We are talking about something that probably does not happen, but what
> if it calls a callback similar to .set_aio_context that goes through the
> whole graph?
Hm. Quite unfortunate if such a callback can operate on drained nodes,
I’d say. Ideally callbacks wouldn’t do that, but probably they will. :/
> Even though the first question is: is there such callback?
I mean, you could say any callback qualifies. Draining a node will only
drain its recursive parents, so siblings are not affected. If the
sibling issues the callback on its parent... (E.g. changes in the
backing chain requiring a qcow2 parent node to change the backing file
string in its image file)
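(To make this concrete, here is a toy, self-contained model of that
upward-only propagation – all names like ToyNode/toy_drain_begin() are
invented for illustration and are not QEMU APIs; the real
bdrv_drained_begin() also polls until in_flight reaches 0, which this
sketch omits:)

```c
/* Toy model of drain propagation: draining a node quiesces the node
 * itself and its recursive parents, but not its siblings.  Invented
 * names; not actual QEMU code. */
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 4

typedef struct ToyNode {
    int quiesce_counter;                  /* > 0 while drained */
    struct ToyNode *parents[MAX_PARENTS];
    int nparents;
} ToyNode;

static void toy_drain_begin(ToyNode *n)
{
    n->quiesce_counter++;
    for (int i = 0; i < n->nparents; i++) {
        toy_drain_begin(n->parents[i]);   /* recurse upwards only */
    }
}

static void toy_drain_end(ToyNode *n)
{
    n->quiesce_counter--;
    for (int i = 0; i < n->nparents; i++) {
        toy_drain_end(n->parents[i]);
    }
}
```

With the graph from the quoted mail ("child" -> "parent" ->
"grandparent", "sibling" -> "grandparent"), draining "child" quiesces
"parent" and "grandparent" but leaves "sibling" completely untouched –
which is exactly the gap being discussed.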
> A second, even more unrealistic case is when a job randomly looks for a
> BDS in another connected component and, for example, drains it.
> Again, probably impossible.
I hope so, but the block layer sure likes to surprise me.
>> Block jobs should own all nodes that are associated with them (e.g.
>> because they intend to drop or replace them when the job is done), so
>> when part of the graph is drained, all jobs that could modify that part
>> should be drained, too.
> What do you mean with "own"?
They’re added with block_job_add_bdrv(), and then are children of the
BlockJob object.
>>> 3) can only be performed in the main loop, because it's a graph
>>> operation. It means that the blockjob runs when the graph-modifying
>>> coroutine/BH is not running. They never run together.
>>> The safety of this operation relies on where the drains are and will be
>>> inserted. If you do it like in my patch "block.c:
>>> bdrv_replace_child_noperm: first call ->attach(), and then add child",
>>> then we would have a problem, because we drain between two writes, and
>>> the blockjob will find an inconsistent graph. If we do it as we seem to
>>> have done it so far, then we won't really have any problem.
>>>
>>> 2) is a read, and can theoretically be performed by another thread. But
>>> is there a function that does that? .set_aio_context, for example, is a
>>> GS function, so we fall back to case 3) and nothing bad happens.
>>>
>>> Is there a counter example for this?
>>>
>>> -----------
>>>
>>> Talking about something else, I discussed with Kevin what *seems* to be
>>> an alternative way to do this, instead of adding drains everywhere.
>>> His idea is to replicate what blk_wait_while_drained() currently does
>>> but on a larger scale. It is something in between this subtree_drains
>>> logic and a rwlock.
>>>
>>> Basically, if I understood correctly, we could implement
>>> bdrv_wait_while_drained() and put it in all places where we would put a
>>> read lock: all the reads of ->parents and ->children.
>>> This function detects whether the BDS is under drain, and if so it
>>> stops and waits until the drain (i.e. the graph modification) finishes.
>>> On the other side, each write would just need to drain (a simple drain
>>> of probably both nodes) to signal that we are modifying the graph. Once
>>> bdrv_drained_begin() finishes, we are sure all coroutines are stopped.
>>> Once bdrv_drained_end() finishes, we automatically let all coroutines
>>> restart and continue where they left off.
>>>
>>> Seems a good compromise between drains and rwlock. What do you think?
>> Well, sounds complicated. So I’m asking myself whether this would be
>> noticeably better than just an RwLock for graph modifications, like the
>> global lock Vladimir has proposed.
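(For reference, the core of the quoted bdrv_wait_while_drained() idea
fits in a few lines – here as a single-threaded toy with invented names;
a real implementation would yield the calling coroutine and resume it
from drained_end() instead of just reporting:)

```c
/* Toy model of the proposed bdrv_wait_while_drained(): every reader of
 * ->parents/->children first checks whether the node is drained; the
 * writer signals a graph change simply by draining.  Invented names;
 * not actual QEMU code. */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int quiesce_counter;   /* > 0: graph change in progress */
    int reads_blocked;     /* how many readers had to wait */
} ToyBds;

/* Returns true if the reader may proceed immediately; in the real,
 * coroutine-based version it would yield here and be woken up by
 * drained_end() rather than returning false. */
static bool toy_wait_while_drained(ToyBds *bs)
{
    if (bs->quiesce_counter > 0) {
        bs->reads_blocked++;
        return false;      /* would yield until the drain ends */
    }
    return true;
}
```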
> But the point is then: aren't we re-inventing an AioContext lock?
I don’t know how AioContext locks would even help with graph changes.
If I want to change a block subgraph that’s in a different I/O thread,
locking that thread isn’t enough (I would’ve thought); because I have no
idea what the thread is doing when I’m locking it. Perhaps it’s
iterating through ->children right now (with some yields in between),
and pausing it, changing the graph, and then resuming it would still
cause problems.
> the lock will protect not only ->parents and ->children, but also other
> BDS fields that are concurrently read/written.
I would’ve thought this lock should only protect ->parents and ->children.
> I don't know, it seems to me that there is a lot of uncertainty on which
> way to take...
Definitely. :)
I wouldn’t call that a bad thing, necessarily. Let’s look at the
positive side: There are many ideas!
Hanna