qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [PATCH 10/12] block.c: add subtree_drains where needed
Date: Wed, 2 Feb 2022 16:37:43 +0100	[thread overview]
Message-ID: <a2e77f99-3138-0a24-9ced-79f441d42ca4@redhat.com> (raw)
In-Reply-To: <52eff922-0ca4-fc12-0edb-8eb963ac306c@virtuozzo.com>



On 01/02/2022 15:47, Vladimir Sementsov-Ogievskiy wrote:
> 18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:
>> Protect bdrv_replace_child_noperm, as it modifies the
>> graph by adding/removing elements to .children and .parents
>> list of a bs. Use the newly introduced
>> bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
>> that and be free from the aiocontext lock.
>>
>> One important criteria to keep in mind is that if the caller of
>> bdrv_replace_child_noperm creates a transaction, we need to make sure
>> that the
>> whole transaction is under the same drain block. This is imperative,
>> as having
>> multiple drains also in the .abort() class of functions causes
>> discrepancies
>> in the drained counters (as nodes are put back into the original
>> positions),
>> making it really hard to retourn all to zero and leaving the code very
>> buggy.
>> See https://patchew.org/QEMU/20211213104014.69858-1-eesposit@redhat.com/
>> for more explanations.
>>
>> Unfortunately we still need to have bdrv_subtree_drained_begin/end
>> in bdrv_detach_child() releasing and then holding the AioContext
>> lock, since it later invokes bdrv_try_set_aio_context() that is
>> not safe yet. Once all is cleaned up, we can also remove the
>> acquire/release locks in job_unref, artificially added because of this.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> ---
>>   block.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------
>>   1 file changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index fcc44a49a0..6196c95aae 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
>>       BlockDriverState *old_bs = (*childp)->bs;
>>         assert(qemu_in_main_thread());
>> +    if (old_bs) {
>> +        /*
>> +         * TODO: this is called by job_unref with lock held, because
>> +         * afterwards it calls bdrv_try_set_aio_context.
>> +         * Once all of this is fixed, take care of removing
>> +         * the aiocontext lock and make this function _unlocked.
>> +         */
>> +        bdrv_subtree_drained_begin(old_bs);
>> +    }
>> +
>>       bdrv_replace_child_noperm(childp, NULL, true);
>>   +    if (old_bs) {
>> +        bdrv_subtree_drained_end(old_bs);
>> +    }
>> +
>>       if (old_bs) {
>>           /*
>>            * Update permissions for old node. We're just taking a
>> parent away, so
>> @@ -3154,6 +3168,7 @@ BdrvChild
>> *bdrv_root_attach_child(BlockDriverState *child_bs,
>>       Transaction *tran = tran_new();
>>         assert(qemu_in_main_thread());
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>>         ret = bdrv_attach_child_common(child_bs, child_name, child_class,
>>                                      child_role, perm, shared_perm,
>> opaque,
>> @@ -3168,6 +3183,7 @@ out:
>>       tran_finalize(tran, ret);
>>       /* child is unset on failure by bdrv_attach_child_common_abort() */
>>       assert((ret < 0) == !child);
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>>         bdrv_unref(child_bs);
>>       return child;
>> @@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>         assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(parent_bs);
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>> +
>>       ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
>> child_class,
>>                                      child_role, &child, tran, errp);
>>       if (ret < 0) {
>> @@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>   out:
>>       tran_finalize(tran, ret);
>>       /* child is unset on failure by bdrv_attach_child_common_abort() */
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>> +    bdrv_subtree_drained_end_unlocked(parent_bs);
>> +
>>       assert((ret < 0) == !child);
>>         bdrv_unref(child_bs);
>> @@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>         assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(bs);
>> +    if (backing_hd) {
>> +        bdrv_subtree_drained_begin_unlocked(backing_hd);
>> +    }
>> +
>>       ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
>>       if (ret < 0) {
>>           goto out;
>> @@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>       ret = bdrv_refresh_perms(bs, errp);
>>   out:
>>       tran_finalize(tran, ret);
>> +    if (backing_hd) {
>> +        bdrv_subtree_drained_end_unlocked(backing_hd);
>> +    }
>> +    bdrv_subtree_drained_end_unlocked(bs);
>>         return ret;
>>   }
>> @@ -5266,7 +5297,8 @@ static int
>> bdrv_replace_node_common(BlockDriverState *from,
>>         assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>       assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
>> -    bdrv_drained_begin(from);
>> +    bdrv_subtree_drained_begin_unlocked(from);
>> +    bdrv_subtree_drained_begin_unlocked(to);
>>         /*
>>        * Do the replacement without permission update.
>> @@ -5298,7 +5330,8 @@ static int
>> bdrv_replace_node_common(BlockDriverState *from,
>>   out:
>>       tran_finalize(tran, ret);
>>   -    bdrv_drained_end(from);
>> +    bdrv_subtree_drained_end_unlocked(to);
>> +    bdrv_subtree_drained_end_unlocked(from);
>>       bdrv_unref(from);
>>         return ret;
>> @@ -5345,6 +5378,9 @@ int bdrv_append(BlockDriverState *bs_new,
>> BlockDriverState *bs_top,
>>         assert(!bs_new->backing);
>>   +    bdrv_subtree_drained_begin_unlocked(bs_new);
>> +    bdrv_subtree_drained_begin_unlocked(bs_top);
>> +
>>       ret = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
>>                                      &child_of_bds,
>> bdrv_backing_role(bs_new),
>>                                      &bs_new->backing, tran, errp);
>> @@ -5360,6 +5396,8 @@ int bdrv_append(BlockDriverState *bs_new,
>> BlockDriverState *bs_top,
>>       ret = bdrv_refresh_perms(bs_new, errp);
>>   out:
>>       tran_finalize(tran, ret);
>> +    bdrv_subtree_drained_end_unlocked(bs_top);
>> +    bdrv_subtree_drained_end_unlocked(bs_new);
>>         bdrv_refresh_limits(bs_top, NULL, NULL);
>>   @@ -5379,8 +5417,8 @@ int bdrv_replace_child_bs(BdrvChild *child,
>> BlockDriverState *new_bs,
>>       assert(qemu_in_main_thread());
>>         bdrv_ref(old_bs);
>> -    bdrv_drained_begin(old_bs);
>> -    bdrv_drained_begin(new_bs);
>> +    bdrv_subtree_drained_begin_unlocked(old_bs);
>> +    bdrv_subtree_drained_begin_unlocked(new_bs);
>>         bdrv_replace_child_tran(&child, new_bs, tran, true);
>>       /* @new_bs must have been non-NULL, so @child must not have been
>> freed */
>> @@ -5394,8 +5432,8 @@ int bdrv_replace_child_bs(BdrvChild *child,
>> BlockDriverState *new_bs,
>>         tran_finalize(tran, ret);
>>   -    bdrv_drained_end(old_bs);
>> -    bdrv_drained_end(new_bs);
>> +    bdrv_subtree_drained_end_unlocked(new_bs);
>> +    bdrv_subtree_drained_end_unlocked(old_bs);
>>       bdrv_unref(old_bs);
>>         return ret;
>>

From the other thread:
> So you mean that backing_hd is modified as its parent is changed, do I understand correctly?

Yes

> 
> As we started to discuss in a thread started with my "[PATCH] block:
> bdrv_set_backing_hd(): use drained section", I think draining the whole
> subtree when we modify some parent-child relation is too much. In-flight
> requests in subtree don't touch these relations, so why to wait/stop
> them? Imagine two disks A and B with same backing file C. If we want to
> detach A from C, what is the reason to drain in-fligth read request of B
> in C?

If I am not mistaken, bdrv_set_backing_hd adds a new node (yes removes
the old backing_hd, but let's not think about this for a moment).
So we have disk B with backing file C, and new disk A that wants to have
backing file C.

I think I understand what you mean, so in theory the operation would be
- create new child
- add child to A->children list
- add child to C->parents list

So in theory we need to
* drain A (without subtree), because it can't happen that child nodes of
  A have in-flight requests that look at A status (children list), right?
  In other words, if A has another node X, can a request in X inspect
  A->children
* drain C, as parents can inspect C status (like B). Same assumption
  here, C->children[x]->bs cannot have requests inspecting C->parents
  list?

> Modifying children/parents lists should be protected somehow. Currently
> it's protected by aio-context lock.. If we drop it, we probably need
> some new mutex for it. Also, graph modification should be protected from
> each other, so that one graph modifiction is not started until another
> is in-flight, probably we need some global mutex for graph modification.
> But that's all not about drains.

The idea was to use BQL + drain, to obtain the same effect of aiocontext
lock. BQL should prevent concurrent graph modifications, drain
concurrent I/O reading the state while being modificated.

Emanuele
> 
> Drains are about in-flight requests. And when we switch a child of node
> X, we should do it in drained section for X, as in-flight requests in X
> can touch its children. But another in-flight requests in subtree are
> safe and should not be awaited.
> 
> 



  reply	other threads:[~2022-02-02 16:38 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-18 16:27 [PATCH 00/12] Removal of Aiocontext lock through drains: protect bdrv_replace_child_noperm Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 01/12] introduce BDRV_POLL_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-01-26 10:49   ` Stefan Hajnoczi
2022-02-03 13:57     ` Emanuele Giuseppe Esposito
2022-02-04 12:13       ` Paolo Bonzini
2022-01-18 16:27 ` [PATCH 02/12] block/io.c: make bdrv_do_drained_begin_quiesce static and introduce bdrv_drained_begin_no_poll Emanuele Giuseppe Esposito
2022-01-19  9:11   ` Paolo Bonzini
2022-01-18 16:27 ` [PATCH 03/12] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 04/12] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 05/12] test-bdrv-drain.c: adapt test to the coming subtree drains Emanuele Giuseppe Esposito
2022-01-19  9:18   ` Paolo Bonzini
2022-02-03 11:41     ` Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 06/12] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 07/12] block/io.c: introduce bdrv_subtree_drained_{begin/end}_unlocked Emanuele Giuseppe Esposito
2022-01-19  9:52   ` Paolo Bonzini
2022-01-26 11:04   ` Stefan Hajnoczi
2022-01-18 16:27 ` [PATCH 08/12] reopen: add a transaction to drain_end nodes picked in bdrv_reopen_parse_file_or_backing Emanuele Giuseppe Esposito
2022-01-19  9:33   ` Paolo Bonzini
2022-01-26 11:16   ` Stefan Hajnoczi
2022-01-18 16:27 ` [PATCH 09/12] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
2022-01-26 11:21   ` Stefan Hajnoczi
2022-02-03 14:21     ` Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 10/12] block.c: add subtree_drains where needed Emanuele Giuseppe Esposito
2022-01-19  9:47   ` Paolo Bonzini
2022-02-03 13:13     ` Emanuele Giuseppe Esposito
2022-02-01 14:47   ` Vladimir Sementsov-Ogievskiy
2022-02-02 15:37     ` Emanuele Giuseppe Esposito [this message]
2022-02-02 17:38       ` Paolo Bonzini
2022-02-03 10:09         ` Emanuele Giuseppe Esposito
2022-02-04  9:49       ` Vladimir Sementsov-Ogievskiy
2022-02-04 13:30         ` Emanuele Giuseppe Esposito
2022-02-04 14:03           ` Vladimir Sementsov-Ogievskiy
2022-01-18 16:27 ` [PATCH 11/12] block/io.c: fully enable assert_bdrv_graph_writable Emanuele Giuseppe Esposito
2022-01-18 16:27 ` [PATCH 12/12] block.c: additional assert qemu in main tread Emanuele Giuseppe Esposito
2022-01-19  9:51 ` [PATCH 00/12] Removal of Aiocontext lock through drains: protect bdrv_replace_child_noperm Paolo Bonzini
2022-01-26 11:29 ` Stefan Hajnoczi
2022-01-27 13:46   ` Paolo Bonzini
2022-01-28 12:20     ` Emanuele Giuseppe Esposito

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2e77f99-3138-0a24-9ced-79f441d42ca4@redhat.com \
    --to=eesposit@redhat.com \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).