From: Denis Plotnikov <dplotnikov@virtuozzo.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "keith.busch@intel.com" <keith.busch@intel.com>,
"jsnow@redhat.com" <jsnow@redhat.com>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mreitz@redhat.com" <mreitz@redhat.com>,
"stefanha@redhat.com" <stefanha@redhat.com>,
Denis Lunev <den@virtuozzo.com>,
"famz@redhat.com" <famz@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Subject: Re: [Qemu-devel] [PATCH] blk: postpone request execution on a context protected with "drained section"
Date: Wed, 9 Jan 2019 08:18:45 +0000
Message-ID: <9d97e822-7c9b-c095-6a27-a74facf21e47@virtuozzo.com>
In-Reply-To: <2ee882bf-00cc-addf-d866-c5c916a00774@virtuozzo.com>
ping ping!!!
On 18.12.2018 11:53, Denis Plotnikov wrote:
> ping ping
>
> On 14.12.2018 14:54, Denis Plotnikov wrote:
>>
>>
>> On 13.12.2018 15:20, Kevin Wolf wrote:
>>> Am 13.12.2018 um 12:07 hat Denis Plotnikov geschrieben:
>>>> On 12.12.2018 15:24, Kevin Wolf wrote:
>>>>> Am 11.12.2018 um 17:55 hat Denis Plotnikov geschrieben:
>>>>>>> Why involve the AioContext at all? This could all be kept at the
>>>>>>> BlockBackend level without extending the layering violation that
>>>>>>> aio_disable_external() is.
>>>>>>>
>>>>>>> BlockBackends get notified when their root node is drained, so
>>>>>>> hooking things up there should be as easy, if not even easier than
>>>>>>> in AioContext.
>>>>>>
>>>>>> Just want to make sure that I understood correctly what you meant by
>>>>>> "BlockBackends get notified". Did you mean that bdrv_drain_end calls
>>>>>> child's role callback blk_root_drained_end by calling
>>>>>> bdrv_parent_drained_end?
>>>>>
>>>>> Yes, blk_root_drained_begin/end calls are all you need. Specifically,
>>>>> their adjustments to blk->quiesce_counter that are already there, and
>>>>> in the 'if (--blk->quiesce_counter == 0)' block of blk_root_drained_end()
>>>>> we can resume the queued requests.
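>>>>>
>>>>> A rough sketch of what that resume could look like (the
>>>>> queued_requests CoQueue is an assumption here, not an existing
>>>>> field):
>>>>>
>>>>>     static void blk_root_drained_end(BdrvChild *child)
>>>>>     {
>>>>>         BlockBackend *blk = child->opaque;
>>>>>         assert(blk->quiesce_counter);
>>>>>
>>>>>         if (--blk->quiesce_counter == 0) {
>>>>>             /* restart the requests postponed while we were drained */
>>>>>             while (qemu_co_enter_next(&blk->queued_requests)) {
>>>>>                 /* each wakeup re-enters one postponed coroutine */
>>>>>             }
>>>>>         }
>>>>>     }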
>>>> It sounds like it should work that way, but it doesn't, and here is
>>>> why: when doing mirror we may resume the postponed coroutines too
>>>> early, while the underlying bs is still protected from writing, and
>>>> thus we hit the assertion in bdrv_co_write_req_prepare when the
>>>> resumed coroutines execute their write requests.
>>>>
>>>> The thing is that the bs is write-protected before bdrv_replace_node
>>>> is executed in mirror_exit_common, and bdrv_replace_node calls
>>>> bdrv_replace_child_noperm, which in turn calls
>>>> child->role->drained_end.
>>>> One of those callbacks is blk_root_drained_end, which checks
>>>> if (--blk->quiesce_counter == 0) and runs the postponed requests
>>>> (coroutines) if the condition is true.
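>>>>
>>>> Roughly this call chain, as I read the code:
>>>>
>>>>     bdrv_replace_node()
>>>>       bdrv_replace_child_noperm()
>>>>         child->role->drained_end()      /* == blk_root_drained_end() */
>>>>           if (--blk->quiesce_counter == 0)
>>>>               /* postponed requests resumed here -- too early */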
>>>
>>> Hm, so something is messed up with the drain sections in the mirror
>>> driver. We have:
>>>
>>> bdrv_drained_begin(target_bs);
>>> bdrv_replace_node(to_replace, target_bs, &local_err);
>>> bdrv_drained_end(target_bs);
>>>
>>> Obviously, the intention was to keep the BlockBackend drained during
>>> bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
>>> inside bdrv_replace_node() when target_bs is drained?
>>>
>>> Looking at bdrv_replace_child_noperm(), it seems that the function has
>>> a bug: Even if old_bs and new_bs are both drained, the quiesce_counter
>>> for the parent reaches 0 for a moment because we call .drained_end for
>>> the old child first and .drained_begin for the new one later.
>>>
>>> So it seems the fix would be to reverse the order and first call
>>> .drained_begin for the new child and then .drained_end for the old
>>> child. Sounds like a good new testcase for tests/test-bdrv-drain.c, too.
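>>>
>>> Untested, but the reordering I mean looks roughly like this (a sketch
>>> of bdrv_replace_child_noperm(), details elided):
>>>
>>>     /* quiesce the parent for the new child first... */
>>>     if (new_bs && new_bs->quiesce_counter && child->role->drained_begin) {
>>>         child->role->drained_begin(child);
>>>     }
>>>
>>>     /* ...switch child->bs from old_bs to new_bs here... */
>>>
>>>     /* ...and only then undrain the old child, so that the parent's
>>>      * quiesce_counter never reaches 0 in between */
>>>     if (old_bs && old_bs->quiesce_counter && child->role->drained_end) {
>>>         child->role->drained_end(child);
>>>     }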
>> Yes, that's true, but it's not enough...
>> In mirror_exit_common() we actively manipulate block driver states.
>> When we have replaced a node in the snippet you showed, we can't allow
>> the postponed coroutines to run, because the block tree isn't ready to
>> receive requests yet.
>> To be ready, we need to insert the proper block driver state into the
>> block backend, which is done here:
>>
>> blk_remove_bs(bjob->blk);
>> blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
>> blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
>>
>> bs_opaque->job = NULL;
>>
>> bdrv_drained_end(src);
>>
>> If the tree isn't ready and we resume the coroutines, we'll end up
>> with the requests landing in the wrong block driver state.
>>
>> So, we should explicitly stop all activity on all the driver states
>> and their parents, and allow it again only when everything is ready
>> to go.
>>
>> Why explicitly? Because the block driver states may belong to
>> different block backends at the moment the manipulation begins.
>>
>> So, it seems we need to disable all their contexts until the
>> manipulation ends.
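>>
>> Something along these lines (just a sketch, and assuming for
>> simplicity that all the involved nodes share one AioContext):
>>
>>     AioContext *ctx = bdrv_get_aio_context(src);
>>
>>     aio_disable_external(ctx);    /* no external requests from here on */
>>
>>     /* ...reconstruct the tree: blk_remove_bs(), bdrv_replace_node(),
>>      * blk_insert_bs(), ... */
>>
>>     aio_enable_external(ctx);     /* the tree is consistent again */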
>>
>> Please correct me if I'm wrong.
>>
>>>
>>>> It seems that if external requests are disabled on the context, we
>>>> can't rely on anything, or we would have to check whether the
>>>> underlying bs and its underlying nodes are ready to receive requests,
>>>> which sounds quite complicated.
>>>> Please correct me if I still don't understand something in that routine.
>>>
>>> I think the reason why relying on aio_disable_external() works is simply
>>> because src is also drained, which keeps external events in the
>>> AioContext disabled despite the bug in draining the target node.
>>>
>>> The bug would become apparent even with aio_disable_external() if we
>>> didn't drain src, or even if we just supported src and target being in
>>> different AioContexts.
>>
>> Why don't we disable all the contexts involved until the block device
>> tree reconstruction is finished?
>>
>> Thanks!
>>
>> Denis
>>>
>>> Kevin
>>>
>>
>
--
Best,
Denis