All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "keith.busch@intel.com" <keith.busch@intel.com>,
	"jsnow@redhat.com" <jsnow@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"famz@redhat.com" <famz@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Subject: Re: [Qemu-devel] PING: [PATCH] blk: postpone request execution on a context protected with "drained section"
Date: Thu, 17 Jan 2019 15:23:53 +0100	[thread overview]
Message-ID: <20190117142353.GA6761@localhost.localdomain> (raw)
In-Reply-To: <83610386-bc64-6691-9af8-e98eb0a68342@virtuozzo.com>

Am 17.01.2019 um 13:57 hat Denis Plotnikov geschrieben:
> Kevin,
> 
> could you please take a look at my last comments?

I read it, and what it told me is essentially that I need to work on it
myself to fully understand the problem and possible acceptable solutions
because you can't seem to find one yourself. I will, but I can't
guarantee when I can find the time for it.

Kevin

> On 15.01.2019 10:22, Denis Plotnikov wrote:
> > ping ping ping ping!!!!
> > 
> > On 09.01.2019 11:18, Denis Plotnikov wrote:
> >> ping ping!!!
> >>
> >> On 18.12.2018 11:53, Denis Plotnikov wrote:
> >>> ping ping
> >>>
> >>> On 14.12.2018 14:54, Denis Plotnikov wrote:
> >>>>
> >>>>
> >>>> On 13.12.2018 15:20, Kevin Wolf wrote:
> >>>>> Am 13.12.2018 um 12:07 hat Denis Plotnikov geschrieben:
> >>>>>> On 12.12.2018 15:24, Kevin Wolf wrote:
> >>>>>>> Am 11.12.2018 um 17:55 hat Denis Plotnikov geschrieben:
> >>>>>>>>> Why involve the AioContext at all? This could all be kept at the
> >>>>>>>>> BlockBackend level without extending the layering violation that
> >>>>>>>>> aio_disable_external() is.
> >>>>>>>>>
> >>>>>>>>> BlockBackends get notified when their root node is drained, so 
> >>>>>>>>> hooking
> >>>>>>>>> things up there should be as easy, if not even easier than in
> >>>>>>>>> AioContext.
> >>>>>>>>
> >>>>>>>> Just want to make sure that I understood correctly what you 
> >>>>>>>> meant by
> >>>>>>>> "BlockBackends get notified". Did you mean that bdrv_drain_end 
> >>>>>>>> calls
> >>>>>>>> child's role callback blk_root_drained_end by calling
> >>>>>>>> bdrv_parent_drained_end?
> >>>>>>>
> >>>>>>> Yes, blk_root_drained_begin/end calls are all you need. 
> >>>>>>> Specifically,
> >>>>>>> their adjustments to blk->quiesce_counter that are already there, 
> >>>>>>> and in
> >>>>>>> the 'if (--blk->quiesce_counter == 0)' block of 
> >>>>>>> blk_root_drained_end()
> >>>>>>> we can resume the queued requests.
> >>>>>> Sounds it should be so, but it doesn't work that way and that's why:
> >>>>>> when doing mirror we may resume postponed coroutines too early 
> >>>>>> when the
> >>>>>> underlying bs is protected from writing at and thus we encounter the
> >>>>>> assert on a write request execution at bdrv_co_write_req_prepare when
> >>>>>> resuming the postponed coroutines.
> >>>>>>
> >>>>>> The thing is that the bs is protected for writing before execution of
> >>>>>> bdrv_replace_node at mirror_exit_common and bdrv_replace_node calls
> >>>>>> bdrv_replace_child_noperm which, in turn, calls 
> >>>>>> child->role->drained_end
> >>>>>> where one of the callbacks is blk_root_drained_end which check
> >>>>>> if(--blk->quiesce_counter == 0) and runs the postponed requests
> >>>>>> (coroutines) if the coundition is true.
> >>>>>
> >>>>> Hm, so something is messed up with the drain sections in the mirror
> >>>>> driver. We have:
> >>>>>
> >>>>>      bdrv_drained_begin(target_bs);
> >>>>>      bdrv_replace_node(to_replace, target_bs, &local_err);
> >>>>>      bdrv_drained_end(target_bs);
> >>>>>
> >>>>> Obviously, the intention was to keep the BlockBackend drained during
> >>>>> bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
> >>>>> inside bdrv_replace_node() when target_bs is drained?
> >>>>>
> >>>>> Looking at bdrv_replace_child_noperm(), it seems that the function has
> >>>>> a bug: Even if old_bs and new_bs are both drained, the quiesce_counter
> >>>>> for the parent reaches 0 for a moment because we call .drained_end for
> >>>>> the old child first and .drained_begin for the new one later.
> >>>>>
> >>>>> So it seems the fix would be to reverse the order and first call
> >>>>> .drained_begin for the new child and then .drained_end for the old
> >>>>> child. Sounds like a good new testcase for tests/test-bdrv-drain.c, 
> >>>>> too.
> >>>> Yes, it's true, but it's not enough...
> >>>> In mirror_exit_common() we actively manipulate with block driver 
> >>>> states.
> >>>> When we replaced a node in the snippet you showed we can't allow the 
> >>>> postponed coroutines to run because the block tree isn't ready to 
> >>>> receive the requests yet.
> >>>> To be ready, we need to insert a proper block driver state to the 
> >>>> block backend which is done here
> >>>>
> >>>>      blk_remove_bs(bjob->blk);
> >>>>      blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
> >>>>      blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
> >>>>
> >>>>      bs_opaque->job = NULL;
> >>>>
> >>>>      bdrv_drained_end(src);
> >>>>
> >>>> If the tree isn't ready and we resume the coroutines, we'll end up 
> >>>> with the request landed in a wrong block driver state.
> >>>>
> >>>> So, we explicitly should stop all activities on all the driver states
> >>>> and its parents and allow the activities when everything is ready to 
> >>>> go.
> >>>>
> >>>> Why explicitly, because the block driver states may belong to 
> >>>> different block backends at the moment of the manipulation beginning.
> >>>>
> >>>> So, it seems we need to disable all their contexts until the 
> >>>> manipulation ends.
> >>>>
> >>>> Please, correct me if I'm wrong.
> >>>>
> >>>>>
> >>>>>> In seems that if the external requests disabled on the context we 
> >>>>>> can't
> >>>>>> rely on anything or should check where the underlying bs and its
> >>>>>> underlying nodes are ready to receive requests which sounds quite
> >>>>>> complicated.
> >>>>>> Please correct me if still don't understand something in that 
> >>>>>> routine.
> >>>>>
> >>>>> I think the reason why reyling on aio_disable_external() works is 
> >>>>> simply
> >>>>> because src is also drained, which keeps external events in the
> >>>>> AioContext disabled despite the bug in draining the target node.
> >>>>>
> >>>>> The bug would become apparent even with aio_disable_external() if we
> >>>>> didn't drain src, or even if we just supported src and target being in
> >>>>> different AioContexts.
> >>>>
> >>>> Why don't we disable all those contexts involved until the end of 
> >>>> the block device tree reconstruction?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Denis
> >>>>>
> >>>>> Kevin
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> -- 
> Best,
> Denis

  reply	other threads:[~2019-01-17 14:24 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-05 12:23 [Qemu-devel] [PATCH] blk: postpone request execution on a context protected with "drained section" Denis Plotnikov
2018-12-07 12:26 ` Kevin Wolf
2018-12-10 12:14   ` Denis Plotnikov
2018-12-10 12:25     ` Kevin Wolf
2018-12-11 16:55   ` Denis Plotnikov
2018-12-12 12:24     ` Kevin Wolf
2018-12-13 11:07       ` Denis Plotnikov
2018-12-13 12:20         ` Kevin Wolf
2018-12-14 11:54           ` Denis Plotnikov
2018-12-18  8:53             ` Denis Plotnikov
2019-01-09  8:18               ` Denis Plotnikov
2019-01-15  7:22                 ` Denis Plotnikov
2019-01-17 12:57                   ` [Qemu-devel] PING: " Denis Plotnikov
2019-01-17 14:23                     ` Kevin Wolf [this message]
2019-01-18  7:43                       ` Denis Plotnikov
     [not found]             ` <20190313160412.GF5167@linux.fritz.box>
2019-04-02  8:35               ` [Qemu-devel] " Denis Plotnikov
2019-04-09 10:01                 ` Kevin Wolf
2019-04-09 10:01                   ` Kevin Wolf
2019-06-21  9:16                   ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2019-06-21  9:59                     ` Vladimir Sementsov-Ogievskiy
2019-06-24  9:46                       ` Denis Plotnikov
2019-06-26  8:46                         ` Denis Plotnikov
2019-06-28 12:32                           ` Kevin Wolf
2019-07-02 14:41                             ` Denis Plotnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190117142353.GA6761@localhost.localdomain \
    --to=kwolf@redhat.com \
    --cc=den@virtuozzo.com \
    --cc=dplotnikov@virtuozzo.com \
    --cc=famz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=keith.busch@intel.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.