* [Qemu-devel] aio context ownership during bdrv_close()
From: Anton Kuchin @ 2019-04-26 12:24 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, Max Reitz, Kevin Wolf, yc-core (mailing list)
I can't figure out the ownership of the AioContext during bdrv_close().

As far as I understand, bdrv_unref() should be called with the AioContext
acquired to prevent concurrent operations (at least most callers in
blockdev.c explicitly acquire and release the context, but not all of
them).
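
(For reference, the pattern I mean is roughly the sketch below; the
variable names are mine and error handling is omitted, so this is not a
quote from blockdev.c:)

    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);
    bdrv_unref(bs);             /* may drop the last reference and close bs */
    aio_context_release(ctx);
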
But if the refcount reaches zero and the bs is going to be deleted in
bdrv_close(), we need to ensure that the drain is finished, data is
flushed, and there are no more pending coroutines and bottom halves, so
the drain and flush functions can enter a coroutine and yield in several
places. As a result, control returns to the coroutine caller, which
releases the AioContext, and when the completion BH continues the cleanup
process it is executed without ownership of the context. Is this a valid
situation?
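
(To illustrate what I mean by "control returns to the coroutine caller":
as far as I understand it, the non-coroutine flush path looks roughly
like the sketch below; this is a paraphrase of block/io.c from memory,
not an exact quote, and flush_co is the shared state that carries the
result:)

    Coroutine *co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
    bdrv_coroutine_enter(bs, co);            /* coroutine may yield inside */
    BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE);   /* the caller polls
                                                      * until a completion BH
                                                      * reenters the coroutine
                                                      * and it finishes */
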
Moreover, if a yield happens, the bs that is being deleted has a zero
refcount but is still present in the graph_bdrv_states and
all_bdrv_states lists and can be accidentally accessed. Shouldn't we
remove it from these lists as soon as the deletion process starts, as we
do for monitor_bdrv_states?
* Re: [Qemu-devel] aio context ownership during bdrv_close()
From: Kevin Wolf @ 2019-04-29 10:38 UTC (permalink / raw)
To: Anton Kuchin
Cc: qemu-block, qemu-devel, Max Reitz, yc-core (mailing list)
On 26.04.2019 14:24, Anton Kuchin wrote:
> I can't figure out the ownership of the AioContext during bdrv_close().
>
> As far as I understand, bdrv_unref() should be called with the AioContext
> acquired to prevent concurrent operations (at least most callers in
> blockdev.c explicitly acquire and release the context, but not all of
> them).
I think the theory is like this:
1. bdrv_unref() can only be called from the main thread
2. A block node for which bdrv_close() is called has no references. If
there are no more parents that keep it in a non-default iothread,
they should be in the main AioContext. So no locks need to be taken
during bdrv_close().
In practice, 2. is not fully true today, even though block devices do
stop their dataplane and move the block nodes back to the main
AioContext on shutdown. I am currently working on some fixes related to
this; afterwards the situation should be better.
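
Roughly, the intended ordering on the shutdown path is something like
this (a sketch, not actual code from any specific device):

    /* the device stops its dataplane: */
    blk_set_aio_context(blk, qemu_get_aio_context());

    /* later, from the main thread: */
    blk_unref(blk);    /* may drop the last reference, so bdrv_close()
                        * then runs in the main AioContext */
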
> But if the refcount reaches zero and the bs is going to be deleted in
> bdrv_close(), we need to ensure that the drain is finished, data is
> flushed, and there are no more pending coroutines and bottom halves, so
> the drain and flush functions can enter a coroutine and yield in several
> places. As a result, control returns to the coroutine caller, which
> releases the AioContext, and when the completion BH continues the
> cleanup process it is executed without ownership of the context. Is this
> a valid situation?
Do you have an example where this happens?
Normally, leaving the coroutine means that the AioContext lock is
released, but it is later reentered from the same AioContext, so the
lock will be taken again.
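
(Reentering goes through aio_co_wake()/aio_co_enter(); from memory, the
relevant part of aio_co_enter() in util/async.c is roughly the following
paraphrase, leaving out the case where the caller is itself a coroutine:)

    if (ctx != qemu_get_current_aio_context()) {
        aio_co_schedule(ctx, co);    /* reentered later in its own context */
    } else {
        aio_context_acquire(ctx);
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }
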
> Moreover, if a yield happens, the bs that is being deleted has a zero
> refcount but is still present in the graph_bdrv_states and
> all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> remove it from these lists as soon as the deletion process starts, as we
> do for monitor_bdrv_states?
Hm, I think it should only disappear when the image file is actually
closed. But in practice, it probably makes little difference either way.
Kevin
* Re: [Qemu-devel] aio context ownership during bdrv_close()
From: Anton Kuchin @ 2019-05-06 12:47 UTC (permalink / raw)
To: Kevin Wolf
Cc: yc-core (mailing list), qemu-devel, qemu-block, Max Reitz
On 29/04/2019 13:38, Kevin Wolf wrote:
> On 26.04.2019 14:24, Anton Kuchin wrote:
>> I can't figure out the ownership of the AioContext during bdrv_close().
>>
>> As far as I understand, bdrv_unref() should be called with the AioContext
>> acquired to prevent concurrent operations (at least most callers in
>> blockdev.c explicitly acquire and release the context, but not all of
>> them).
> I think the theory is like this:
>
> 1. bdrv_unref() can only be called from the main thread
>
> 2. A block node for which bdrv_close() is called has no references. If
> there are no more parents that keep it in a non-default iothread,
> they should be in the main AioContext. So no locks need to be taken
> during bdrv_close().
>
> In practice, 2. is not fully true today, even though block devices do
> stop their dataplane and move the block nodes back to the main
> AioContext on shutdown. I am currently working on some fixes related to
> this; afterwards the situation should be better.
>
>> But if the refcount reaches zero and the bs is going to be deleted in
>> bdrv_close(), we need to ensure that the drain is finished, data is
>> flushed, and there are no more pending coroutines and bottom halves, so
>> the drain and flush functions can enter a coroutine and yield in several
>> places. As a result, control returns to the coroutine caller, which
>> releases the AioContext, and when the completion BH continues the
>> cleanup process it is executed without ownership of the context. Is this
>> a valid situation?
> Do you have an example where this happens?
>
> Normally, leaving the coroutine means that the AioContext lock is
> released, but it is later reentered from the same AioContext, so the
> lock will be taken again.
I was wrong. Coroutines do acquire the AioContext when reentered.
>
>> Moreover, if a yield happens, the bs that is being deleted has a zero
>> refcount but is still present in the graph_bdrv_states and
>> all_bdrv_states lists and can be accidentally accessed. Shouldn't we
>> remove it from these lists as soon as the deletion process starts, as we
>> do for monitor_bdrv_states?
> Hm, I think it should only disappear when the image file is actually
> closed. But in practice, it probably makes little difference either way.
I'm still worried about the period of time between the coroutine
yielding and it being reentered: it looks like the AioContext can be
grabbed by other code and the bs can be treated as valid.
I don't have an example of such a situation either, but I see the
bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called
during flush and can take an unpredictable time to complete (e.g.
raw_aio_flush in file-win32 uses a thread pool internally to process the
request, and RBD may also be affected, but I didn't dig deep enough to
be sure).
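
(What I mean by "an unpredictable time": with a thread-pool based driver
my understanding is that the flush is handed off roughly as in the sketch
below; do_flush_fn and s are placeholder names, not the real ones:)

    ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
    /* yields here and is only reentered from a BH once the worker
     * thread is done, which can take arbitrarily long: */
    ret = thread_pool_submit_co(pool, do_flush_fn, s);
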
If the main loop starts processing the next QMP command before the
completion is called, the lists will be in an inconsistent state and
hard-to-debug use-after-free bugs and crashes can happen.
The fix seems trivial and shouldn't break any existing code.
---
diff --git a/block.c b/block.c
index 16615bc876..25c3b72520 100644
--- a/block.c
+++ b/block.c
@@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
     assert(bdrv_op_blocker_is_empty(bs));
     assert(!bs->refcnt);
 
-    bdrv_close(bs);
-
     /* remove from list, if necessary */
     if (bs->node_name[0] != '\0') {
         QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
     }
     QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
 
+    bdrv_close(bs);
+
     g_free(bs);
 }
--
>
> Kevin
* Re: [Qemu-devel] aio context ownership during bdrv_close()
From: Kevin Wolf @ 2019-05-06 13:57 UTC (permalink / raw)
To: Anton Kuchin
Cc: yc-core (mailing list), qemu-devel, qemu-block, Max Reitz
On 06.05.2019 14:47, Anton Kuchin wrote:
> On 29/04/2019 13:38, Kevin Wolf wrote:
> > On 26.04.2019 14:24, Anton Kuchin wrote:
> > > I can't figure out the ownership of the AioContext during bdrv_close().
> > >
> > > As far as I understand, bdrv_unref() should be called with the
> > > AioContext acquired to prevent concurrent operations (at least most
> > > callers in blockdev.c explicitly acquire and release the context, but
> > > not all of them).
> > I think the theory is like this:
> >
> > 1. bdrv_unref() can only be called from the main thread
> >
> > 2. A block node for which bdrv_close() is called has no references. If
> > there are no more parents that keep it in a non-default iothread,
> > they should be in the main AioContext. So no locks need to be taken
> > during bdrv_close().
> >
> > In practice, 2. is not fully true today, even though block devices do
> > stop their dataplane and move the block nodes back to the main
> > AioContext on shutdown. I am currently working on some fixes related to
> > this; afterwards the situation should be better.
> >
> > > But if the refcount reaches zero and the bs is going to be deleted in
> > > bdrv_close(), we need to ensure that the drain is finished, data is
> > > flushed, and there are no more pending coroutines and bottom halves,
> > > so the drain and flush functions can enter a coroutine and yield in
> > > several places. As a result, control returns to the coroutine caller,
> > > which releases the AioContext, and when the completion BH continues
> > > the cleanup process it is executed without ownership of the context.
> > > Is this a valid situation?
> > Do you have an example where this happens?
> >
> > Normally, leaving the coroutine means that the AioContext lock is
> > released, but it is later reentered from the same AioContext, so the
> > lock will be taken again.
> I was wrong. Coroutines do acquire the AioContext when reentered.
> >
> > > Moreover, if a yield happens, the bs that is being deleted has a zero
> > > refcount but is still present in the graph_bdrv_states and
> > > all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> > > remove it from these lists as soon as the deletion process starts, as
> > > we do for monitor_bdrv_states?
> > Hm, I think it should only disappear when the image file is actually
> > closed. But in practice, it probably makes little difference either way.
>
> I'm still worried about the period of time between the coroutine
> yielding and it being reentered: it looks like the AioContext can be
> grabbed by other code and the bs can be treated as valid.
>
> I don't have an example of such a situation either, but I see the
> bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called
> during flush and can take an unpredictable time to complete (e.g.
> raw_aio_flush in file-win32 uses a thread pool internally to process the
> request, and RBD may also be affected, but I didn't dig deep enough to
> be sure).
>
> If the main loop starts processing the next QMP command before the
> completion is called, the lists will be in an inconsistent state and
> hard-to-debug use-after-free bugs and crashes can happen.
Hm, at the point of flush, bs is actually still valid, so e.g.
query-block would just work. But I think we would indeed have a problem
if a new reference to the node is created.
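
For example (a hypothetical sequence, just to illustrate the kind of
problem, not a concrete reproducer):

    /* A QMP handler running while the deletion is stuck in the yielded
     * flush could still look the node up by name and take a new
     * reference, even though the refcount already dropped to zero and
     * bdrv_close() is in progress: */
    BlockDriverState *bs = bdrv_find_node(node_name);
    if (bs) {
        bdrv_ref(bs);
    }
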
> The fix seems trivial and shouldn't break any existing code.
>
> ---
>
> diff --git a/block.c b/block.c
> index 16615bc876..25c3b72520 100644
> --- a/block.c
> +++ b/block.c
> @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
> 
> -    bdrv_close(bs);
> -
>      /* remove from list, if necessary */
>      if (bs->node_name[0] != '\0') {
>          QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
>      }
>      QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
> 
> +    bdrv_close(bs);
> +
>      g_free(bs);
>  }
This looks reasonable enough to me.
Kevin