qemu-devel.nongnu.org archive mirror
From: Hanna Reitz <hreitz@redhat.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 16:12:30 +0200	[thread overview]
Message-ID: <c7f6fe7e-c309-010a-eaba-549fbfcb45ce@redhat.com> (raw)
In-Reply-To: <a4d3fc47-0769-7d11-47aa-a1c4ac503406@redhat.com>

On 30.03.22 13:55, Emanuele Giuseppe Esposito wrote:
>
> Am 30/03/2022 um 12:53 schrieb Hanna Reitz:
>> On 17.03.22 17:23, Emanuele Giuseppe Esposito wrote:
>>> Am 09/03/2022 um 14:26 schrieb Emanuele Giuseppe Esposito:
>>>>>> * Drains allow the caller (either the main loop or the iothread
>>>>>> running the context) to wait for all in-flight requests and
>>>>>> operations of a BDS: normal drains target a given node and its
>>>>>> parents, while subtree ones also include the subgraph of the
>>>>>> node. Siblings are not affected by either kind of drain.
>>>>> Siblings are drained to the extent required for their parent node to
>>>>> reach in_flight == 0.
>>>>>
>>>>> I haven't checked the code but I guess the case you're alluding to is
>>>>> that siblings with multiple parents could have other I/O in flight that
>>>>> will not be drained and further I/O can be submitted after the parent
>>>>> has drained?
>>>> Yes, in theory this can happen. I don't really know whether it
>>>> happens in practice, or how likely it is.
>>>>
>>>> The alternative would be to make a drain that blocks the whole graph,
>>>> siblings included, but that would probably be overkill.
>>>>
>>> So I have thought about this, and I think maybe this is not a concrete
>>> problem.
>>> Suppose we have a graph where "parent" has 2 children: "child" and
>>> "sibling". "sibling" also has a blockjob.
>>>
>>> Now, main loop wants to modify parent-child relation and maybe detach
>>> child from parent.
>>>
>>> 1st wrong assumption: the sibling is not drained. Actually, my strategy
>>> takes draining both nodes into account, also because the parent could be
>>> in another graph. Therefore the sibling is drained.
>>>
>>> But let's assume "sibling" is the sibling of the parent.
>>>
>>> Therefore we have
>>> "child" -> "parent" -> "grandparent"
>>> and
>>> "blockjob" -> "sibling" -> "grandparent"
>>>
>>> The issue is the following: main loop can't drain "sibling", because
>>> subtree_drained does not reach it. Therefore blockjob can still run
>>> while main loop modifies "child" -> "parent". Blockjob can either:
>>> 1) drain, but this won't affect "child" -> "parent"
>>> 2) read the graph in other ways different from drain, for example
>>> .set_aio_context recursively touches the whole graph.
>>> 3) write the graph.
>> I don’t really understand the problem here.  If the block job only
>> operates on the sibling subgraph, why would it care what’s going on in
>> the other subgraph?
> We are talking about something that probably does not happen, but what
> if it calls a callback similar to .set_aio_context that goes through the
> whole graph?

Hm.  Quite unfortunate if such a callback can operate on drained nodes, 
I’d say.  Ideally callbacks wouldn’t do that, but probably they will. :/

> Even though the first question is: is there such callback?

I mean, you could say any callback qualifies.  Draining a node will only 
drain its recursive parents, so siblings are not affected.  If the 
sibling issues the callback on its parent...  (E.g. changes in the 
backing chain requiring a qcow2 parent node to change the backing file 
string in its image file)

> A second, even more unrealistic case is when a job randomly looks for a
> bs in another connected component and, for example, drains it.
> Again, probably impossible.

I hope so, but the block layer sure likes to surprise me.

>> Block jobs should own all nodes that are associated with them (e.g.
>> because they intend to drop or replace them when the job is done), so
>> when part of the graph is drained, all jobs that could modify that part
>> should be drained, too.
> What do you mean with "own"?

They’re added with block_job_add_bdrv(), and then are children of the 
BlockJob object.

>>> 3) can be only performed in the main loop, because it's a graph
>>> operation. It means that the blockjob runs when the graph modifying
>>> coroutine/bh is not running. They never run together.
>>> The safety of this operation relies on where the drains are and will be
>>> inserted. If you do it like in my patch "block.c:
>>> bdrv_replace_child_noperm: first call ->attach(), and then add child",
>>> then we would have a problem, because we drain between two writes, and
>>> the blockjob will find an inconsistent graph. If we do it as we seem to
>>> have done it so far, then we won't really have any problem.
>>>
>>> 2) is a read, and can theoretically be performed by another thread. But
>>> is there a function that does that? .set_aio_context for example is a GS
>>> function, so we will fall back to case 3) and nothing bad would happen.
>>>
>>> Is there a counter example for this?
>>>
>>> -----------
>>>
>>> Talking about something else, I discussed with Kevin what *seems* to be
>>> an alternative way to do this, instead of adding drains everywhere.
>>> His idea is to replicate what blk_wait_while_drained() currently does
>>> but on a larger scale. It is something in between this subtree_drains
>>> logic and a rwlock.
>>>
>>> Basically if I understood correctly, we could implement
>>> bdrv_wait_while_drained(), and put in all places where we would put a
>>> read lock: all the reads to ->parents and ->children.
>>> This function detects if the bdrv is under drain, and if so it will stop
>>> and wait until the drain (i.e. the graph modification) finishes.
>>> On the other side, each write would probably just need to drain both
>>> nodes (a simple drain), to signal that we are modifying the graph. Once
>>> bdrv_drained_begin() finishes, we are sure all coroutines are stopped.
>>> Once bdrv_drained_end() finishes, we automatically let all coroutines
>>> restart and continue where they left off.
>>>
>>> Seems a good compromise between drains and rwlock. What do you think?
>> Well, sounds complicated.  So I’m asking myself whether this would be
>> noticeably better than just an RwLock for graph modifications, like the
>> global lock Vladimir has proposed.
> But the point is then: aren't we re-inventing an AioContext lock?

I don’t know how AioContext locks would even help with graph changes.  
If I want to change a block subgraph that’s in a different I/O thread, 
locking that thread isn’t enough (I would’ve thought), because I have no 
idea what the thread is doing when I’m locking it.  Perhaps it’s 
iterating through ->children right now (with some yields in between), so 
pausing it, changing the graph, and then resuming it will still cause 
problems.

> the lock will protect not only ->parents and ->children, but also other
> bdrv fields that are concurrently read/written.

I would’ve thought this lock should only protect ->parents and ->children.

> I don't know, it seems to me that there is a lot of uncertainty on which
> way to take...

Definitely. :)

I wouldn’t call that a bad thing, necessarily.  Let’s look at the 
positive side: There are many ideas!

Hanna



