From: Kevin Wolf <kwolf@redhat.com>
To: Anton Kuchin <antonkuchin@yandex-team.ru>
Cc: "yc-core (рассылка)" <yc-core@yandex-team.ru>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	"Max Reitz" <mreitz@redhat.com>
Subject: Re: [Qemu-devel] aio context ownership during bdrv_close()
Date: Mon, 6 May 2019 15:57:03 +0200	[thread overview]
Message-ID: <20190506135703.GB6288@localhost.localdomain> (raw)
In-Reply-To: <acda1900-6d9f-e551-6310-4192514be35a@yandex-team.ru>

On 06.05.2019 at 14:47, Anton Kuchin wrote:
> On 29/04/2019 13:38, Kevin Wolf wrote:
> > On 26.04.2019 at 14:24, Anton Kuchin wrote:
> > > I can't figure out ownership of aio context during bdrv_close().
> > > 
> > > As far as I understand, bdrv_unref() should be called with the AioContext
> > > acquired to prevent concurrent operations (at least most usages in blockdev.c
> > > explicitly acquire and release the context, but not all).
> > I think the theory is like this:
> > 
> > 1. bdrv_unref() can only be called from the main thread
> > 
> > 2. A block node for which bdrv_close() is called has no references. With
> >     no parents left to keep it in a non-default iothread, it should be in
> >     the main AioContext. So no locks need to be taken during
> >     bdrv_close().
> > 
> > In practice, 2. is not fully true today, even though block devices do
> > stop their dataplane and move the block nodes back to the main
> > AioContext on shutdown. I am currently working on some fixes related to
> > this, afterwards the situation should be better.
> > 
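
For reference, the pattern most callers in blockdev.c use around
bdrv_unref() looks roughly like this (a minimal sketch of the idiom, not a
verbatim quote; error handling omitted):

    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);
    bdrv_unref(bs);
    aio_context_release(ctx);
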
> > > But if the refcount reaches zero and the bs is going to be deleted in
> > > bdrv_close(), we need to ensure that the drain has finished, data is
> > > flushed, and there are no more pending coroutines and bottom halves. The
> > > drain and flush functions can enter a coroutine and yield in several
> > > places. As a result, control returns to the coroutine caller, which
> > > releases the AioContext, and when the completion BH continues the
> > > cleanup process, it will be executed without ownership of the context.
> > > Is this a valid situation?
> > Do you have an example where this happens?
> > 
> > Normally, leaving the coroutine means that the AioContext lock is
> > released, but it is later reentered from the same AioContext, so the
> > lock will be taken again.
> I was wrong. Coroutines do acquire the AioContext when reentered.
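
Right. Abridged from util/async.c around this version (a sketch from memory,
not a verbatim quote), the non-coroutine reentry path takes the lock:

    void aio_co_enter(AioContext *ctx, struct Coroutine *co)
    {
        if (ctx != qemu_get_current_aio_context()) {
            aio_co_schedule(ctx, co);   /* reentered via a BH in its ctx */
            return;
        }
        /* ... in-coroutine case omitted ... */
        aio_context_acquire(ctx);       /* lock taken around the reentry */
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }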
> > 
> > > Moreover, if a yield happens, the bs that is being deleted has a zero
> > > refcount but is still present in the graph_bdrv_states and
> > > all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> > > remove it from these lists as soon as the deletion process starts, as we
> > > do for monitor_bdrv_states?
> > Hm, I think it should only disappear when the image file is actually
> > closed. But in practice, it probably makes little difference either way.
> 
> I'm still worried about the period of time from when the coroutine yields
> until it is reentered: it looks like the AioContext can be grabbed by other
> code and the bs can be treated as valid.
> 
> I have no example of such a situation either, but I see the bdrv_aio_flush
> and bdrv_co_flush_to_disk callbacks, which are called during flush and can
> take an unpredictable time to complete (e.g. raw_aio_flush in file-win32
> uses a thread pool internally to process the request, and RBD may also be
> affected, but I didn't dig deep enough to be sure).
> 
> If the main loop starts processing the next QMP command before the
> completion is called, the lists will be in an inconsistent state, and
> hard-to-debug use-after-free bugs and crashes can happen.

Hm, at the point of flush, bs is actually still valid, so e.g.
query-block would just work. But I think we would indeed have a problem
if a new reference to the node is created.
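
For illustration only (hypothetical node name, not an observed bug): while
the flush in bdrv_close() has yielded, a lookup by node name could still
return the dying node and even resurrect its refcount:

    /* bs->refcnt == 0 here, but bs is still on graph_bdrv_states */
    BlockDriverState *bs = bdrv_find_node("node0");  /* finds the dying node */
    if (bs) {
        bdrv_ref(bs);   /* refcnt 0 -> 1, but bdrv_delete() still
                         * g_free()s bs once the flush completes */
    }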

> The fix seems trivial and shouldn't break any existing code.
> 
> ---
> 
> diff --git a/block.c b/block.c
> index 16615bc876..25c3b72520 100644
> --- a/block.c
> +++ b/block.c
> @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
> 
> -    bdrv_close(bs);
> -
>      /* remove from list, if necessary */
>      if (bs->node_name[0] != '\0') {
>          QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
>      }
>      QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
> 
> +    bdrv_close(bs);
> +
>      g_free(bs);
>  }

This looks reasonable enough to me.

Kevin

