From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Hanna Reitz <hreitz@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org, qemu-devel@nongnu.org,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 13:55:41 +0200
Message-ID: <a4d3fc47-0769-7d11-47aa-a1c4ac503406@redhat.com>
In-Reply-To: <c8d45cd9-e7de-9acd-3fd6-13de58f5ce48@redhat.com>
On 30/03/2022 at 12:53, Hanna Reitz wrote:
> On 17.03.22 17:23, Emanuele Giuseppe Esposito wrote:
>>
>> On 09/03/2022 at 14:26, Emanuele Giuseppe Esposito wrote:
>>>>> * Drains allow the caller (either the main loop or the iothread running
>>>>> the context) to wait for all in-flight requests and operations
>>>>> of a BDS: normal drains target a given node and its parents, while
>>>>> subtree ones also include the subgraph of the node. Siblings are
>>>>> not affected by either of these two kinds of drain.
>>>> Siblings are drained to the extent required for their parent node to
>>>> reach in_flight == 0.
>>>>
>>>> I haven't checked the code but I guess the case you're alluding to is
>>>> that siblings with multiple parents could have other I/O in flight that
>>>> will not be drained and further I/O can be submitted after the parent
>>>> has drained?
>>> Yes, in theory this can happen. I don't really know whether it happens
>>> in practice, or how likely it is to happen.
>>>
>>> The alternative would be a drain that blocks the whole graph,
>>> siblings included, but that would probably be overkill.
>>>
>> So I have thought about this, and I think this may not be a concrete
>> problem.
>> Suppose we have a graph where "parent" has 2 children: "child" and
>> "sibling". "sibling" also has a blockjob.
>>
>> Now, the main loop wants to modify the parent-child relation and maybe
>> detach child from parent.
>>
>> First wrong assumption: that the sibling is not drained. My strategy
>> actually takes care of draining both nodes, also because parent could be
>> in another graph. Therefore sibling is drained.
>>
>> But let's assume "sibling" is the sibling of the parent.
>>
>> Therefore we have
>> "child" -> "parent" -> "grandparent"
>> and
>> "blockjob" -> "sibling" -> "grandparent"
>>
>> The issue is the following: the main loop can't drain "sibling", because
>> subtree_drained does not reach it. Therefore the blockjob can still run
>> while the main loop modifies "child" -> "parent". The blockjob can either:
>> 1) drain, but this won't affect "child" -> "parent";
>> 2) read the graph in ways other than draining; for example,
>> .set_aio_context recursively touches the whole graph;
>> 3) write the graph.
>
> I don’t really understand the problem here. If the block job only
> operates on the sibling subgraph, why would it care what’s going on in
> the other subgraph?
We are talking about something that probably does not happen, but what
if the job calls a callback similar to .set_aio_context that walks the
whole graph? Though the first question is: is there such a callback?
A second, even more unrealistic case is a job that looks up an arbitrary
bs in another connected component and, for example, drains it.
Again, probably impossible.
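To make the topology above concrete, here is a toy model of which nodes a
subtree drain of "parent" reaches. It is only a sketch under assumptions:
Graph, drain_up(), drain_subtree() and sibling_reached_by_subtree_drain()
are hypothetical names, not QEMU code, and the model assumes that a plain
drain propagates to all parents while a subtree drain additionally applies
a plain drain to every descendant (so a descendant's other parents are
drained too). "sibling", and therefore a job working on it, is not reached.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical toy graph, not QEMU's BlockDriverState/BdrvChild. */
typedef struct {
    const char *name[MAX_NODES];
    bool edge[MAX_NODES][MAX_NODES]; /* edge[p][c]: p is a parent of c */
    int n;
} Graph;

static int add_node(Graph *g, const char *name)
{
    assert(g->n < MAX_NODES);
    g->name[g->n] = name;
    return g->n++;
}

static void add_edge(Graph *g, int parent, int child)
{
    g->edge[parent][child] = true;
}

/* Plain drain: mark the node and, recursively, all of its parents. */
static void drain_up(const Graph *g, int node, bool *drained)
{
    drained[node] = true;
    for (int p = 0; p < g->n; p++) {
        if (g->edge[p][node] && !drained[p]) {
            drain_up(g, p, drained);
        }
    }
}

/* Subtree drain: plain drain of the node and of every descendant. Drains
 * only go down from the root node, so children of ancestors (siblings)
 * are never marked. */
static void drain_subtree(const Graph *g, int node, bool *drained)
{
    drain_up(g, node, drained);
    for (int c = 0; c < g->n; c++) {
        if (g->edge[node][c]) {
            drain_subtree(g, c, drained);
        }
    }
}

/* Builds the topology from the mail ("child" -> "parent" -> "grandparent",
 * "sibling" -> "grandparent"), subtree-drains "parent" and reports whether
 * "sibling" was reached. */
static bool sibling_reached_by_subtree_drain(void)
{
    Graph g = {0};
    int grandparent = add_node(&g, "grandparent");
    int parent = add_node(&g, "parent");
    int sibling = add_node(&g, "sibling");
    int child = add_node(&g, "child");
    bool drained[MAX_NODES] = {false};

    add_edge(&g, grandparent, parent);
    add_edge(&g, grandparent, sibling);
    add_edge(&g, parent, child);
    drain_subtree(&g, parent, drained);
    assert(drained[child] && drained[parent] && drained[grandparent]);
    return drained[sibling];
}
```

In this model sibling_reached_by_subtree_drain() returns false: the drain
climbs from "parent" to "grandparent" but never descends into
"grandparent"'s other children.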
> Block jobs should own all nodes that are associated with them (e.g.
> because they intend to drop or replace them when the job is done), so
> when part of the graph is drained, all jobs that could modify that part
> should be drained, too.
What do you mean by "own"?
>
>> 3) can only be performed in the main loop, because it's a graph
>> operation. This means that the blockjob runs only when the
>> graph-modifying coroutine/BH is not running; they never run together.
>> The safety of this operation relies on where the drains are and will be
>> inserted. If we do it as in my patch "block.c:
>> bdrv_replace_child_noperm: first call ->attach(), and then add child",
>> then we would have a problem, because we drain between two writes, and
>> the blockjob will find an inconsistent graph. If we keep doing it the way
>> we seem to do it so far, then we won't really have any problem.
>>
>> 2) is a read, and can theoretically be performed by another thread. But
>> is there a function that does that? .set_aio_context, for example, is a
>> GS function, so we fall back to case 3) and nothing bad would happen.
>>
>> Is there a counterexample to this?
>>
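The ordering problem in case 3) boils down to two writes that must be
observed together. A toy sketch, assuming hypothetical Edge/Reader types
and a poll_point() helper that stands in for any place where other
coroutines (such as a block job) may run, for example a drain issued from
inside ->attach(). None of these names are QEMU API, and the two flags are
only abstract halves of one link, not the real lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two halves of one parent/child link,
 * which must be updated together for the graph to look consistent. */
typedef struct {
    bool linked_from_parent;
    bool linked_from_child;
} Edge;

/* Counts how often a concurrent reader would have seen a half-made link. */
typedef struct {
    int inconsistent_reads;
} Reader;

/* Models a point where other coroutines may run (e.g. a drain inside
 * ->attach()): the "reader" inspects the edge at this moment. */
static void poll_point(const Edge *e, Reader *r)
{
    if (e->linked_from_parent != e->linked_from_child) {
        r->inconsistent_reads++;
    }
}

/* Problematic order: something polls between the two writes. */
static void link_with_poll_in_between(Edge *e, Reader *r)
{
    e->linked_from_child = true;
    poll_point(e, r);            /* reader sees only one half */
    e->linked_from_parent = true;
    assert(e->linked_from_child == e->linked_from_parent);
}

/* Safer order: drain before and after, but never between the writes. */
static void link_drain_first(Edge *e, Reader *r)
{
    poll_point(e, r);            /* drained_begin: reader runs before */
    e->linked_from_child = true;
    e->linked_from_parent = true;
    poll_point(e, r);            /* drained_end: reader runs after */
    assert(e->linked_from_child == e->linked_from_parent);
}
```

Both orders end in the same final state; only the first one lets a reader
run while the link is half made.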
>> -----------
>>
>> Talking about something else, I discussed with Kevin what *seems* to be
>> an alternative way to do this, instead of adding drains everywhere.
>> His idea is to replicate what blk_wait_while_drained() currently does
>> but on a larger scale. It is something in between this subtree_drains
>> logic and a rwlock.
>>
>> Basically, if I understood correctly, we could implement
>> bdrv_wait_while_drained(), and put it in all places where we would put a
>> read lock: all the reads of ->parents and ->children.
>> This function detects whether the bdrv is under drain, and if so it
>> stops and waits until the drain (i.e. the graph modification) finishes.
>> On the other side, each write would probably just need to drain both
>> nodes (simple drain), to signal that we are modifying the graph. Once
>> bdrv_drained_begin() finishes, we are sure all coroutines are stopped.
>> Once bdrv_drained_end() finishes, we automatically let all coroutines
>> restart and continue where they left off.
>>
>> Seems a good compromise between drains and rwlock. What do you think?
>
> Well, sounds complicated. So I’m asking myself whether this would be
> noticeably better than just an RwLock for graph modifications, like the
> global lock Vladimir has proposed.
But then the point is: aren't we re-inventing an AioContext lock?
The lock will protect not only ->parents and ->children, but also other
bdrv fields that are concurrently read/written.
I don't know, it seems to me that there is a lot of uncertainty about
which way to take...
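For comparison with Hanna's RwLock remark, a rough model of the
bdrv_wait_while_drained() idea quoted above. This is only a sketch under
assumptions: it uses pthreads instead of QEMU coroutines, assumes a single
writer (graph changes only happen in the main loop), and GraphDrain,
reader_begin(), writer_drained_begin() and friends are hypothetical names,
not QEMU API. The drain counter ends up playing the role of the write lock.

```c
#include <pthread.h>

/* Hypothetical model, not QEMU API. Assumes a single writer. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int quiesce_counter;   /* > 0 while the writer holds a drain */
    int readers_in_flight; /* readers currently walking the graph */
} GraphDrain;

static void graph_drain_init(GraphDrain *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->quiesce_counter = 0;
    g->readers_in_flight = 0;
}

/* Reader side (the "wait while drained" step): block while a drain is
 * active, then count ourselves as in flight. */
static void reader_begin(GraphDrain *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->quiesce_counter > 0) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    g->readers_in_flight++;
    pthread_mutex_unlock(&g->lock);
}

static void reader_end(GraphDrain *g)
{
    pthread_mutex_lock(&g->lock);
    g->readers_in_flight--;
    pthread_cond_broadcast(&g->cond); /* the writer may wait for 0 */
    pthread_mutex_unlock(&g->lock);
}

/* Writer side: drained_begin stops new readers and waits for in-flight
 * ones; drained_end lets the blocked readers continue. */
static void writer_drained_begin(GraphDrain *g)
{
    pthread_mutex_lock(&g->lock);
    g->quiesce_counter++;
    while (g->readers_in_flight > 0) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    pthread_mutex_unlock(&g->lock);
}

static void writer_drained_end(GraphDrain *g)
{
    pthread_mutex_lock(&g->lock);
    g->quiesce_counter--;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

typedef struct {
    GraphDrain drain;
    int a, b;         /* two "graph fields" the writer updates together */
    int iterations;
    int inconsistent; /* only touched by the reader thread */
} Demo;

static void *reader_thread(void *opaque)
{
    Demo *d = opaque;
    for (int i = 0; i < d->iterations; i++) {
        reader_begin(&d->drain);
        if (d->a != d->b) {
            d->inconsistent++;
        }
        reader_end(&d->drain);
    }
    return NULL;
}

/* One reader thread against the writer; returns how many inconsistent
 * snapshots the reader saw. */
static int run_demo(int iterations)
{
    Demo d = { .a = 0, .b = 0, .iterations = iterations, .inconsistent = 0 };
    pthread_t t;

    graph_drain_init(&d.drain);
    pthread_create(&t, NULL, reader_thread, &d);
    for (int i = 0; i < iterations; i++) {
        writer_drained_begin(&d.drain);
        d.a++;
        d.b++;
        writer_drained_end(&d.drain);
    }
    pthread_join(t, NULL);
    pthread_mutex_destroy(&d.drain.lock);
    pthread_cond_destroy(&d.drain.cond);
    return d.inconsistent;
}
```

In this model reader_begin()/reader_end() and writer_drained_begin()/
writer_drained_end() behave like a writer-preferring rwlock, which is why
the two approaches are hard to tell apart.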
Emanuele