* [Qemu-devel] aio context ownership during bdrv_close()
From: Anton Kuchin @ 2019-04-26 12:24 UTC (permalink / raw)
To: qemu-block
Cc: qemu-devel, Max Reitz, Kevin Wolf, yc-core (mailing list)
I can't figure out the ownership of the AioContext during bdrv_close().

As far as I understand, bdrv_unref() should be called with the AioContext
acquired to prevent concurrent operations (at least most callers in
blockdev.c explicitly acquire and release the context, but not all of
them).
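
(For reference, the pattern I mean is roughly the sketch below; the
variable names are mine and error handling is omitted, so this is not a
quote from blockdev.c:)

    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);
    bdrv_unref(bs);             /* may drop the last reference and close bs */
    aio_context_release(ctx);
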
But if the refcount reaches zero and the bs is going to be deleted in
bdrv_close(), we need to ensure that the drain is finished, data is
flushed, and there are no more pending coroutines and bottom halves, so
the drain and flush functions can enter a coroutine and yield in several
places. As a result, control returns to the coroutine caller, which
releases the AioContext, and when the completion BH continues the cleanup
process it is executed without ownership of the context. Is this a valid
situation?
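
(To illustrate what I mean by "control returns to the coroutine caller":
as far as I understand it, the non-coroutine flush path looks roughly
like the sketch below; this is a paraphrase of block/io.c from memory,
not an exact quote, and flush_co is the shared state that carries the
result:)

    Coroutine *co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
    bdrv_coroutine_enter(bs, co);            /* coroutine may yield inside */
    BDRV_POLL_WHILE(bs, flush_co.ret == NOT_DONE);   /* the caller polls
                                                      * until a completion BH
                                                      * reenters the coroutine
                                                      * and it finishes */
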
Moreover, if a yield happens, the bs that is being deleted has a zero
refcount but is still present in the graph_bdrv_states and
all_bdrv_states lists and can be accidentally accessed. Shouldn't we
remove it from these lists as soon as the deletion process starts, as we
do for monitor_bdrv_states?
* Re: [Qemu-devel] aio context ownership during bdrv_close()
From: Kevin Wolf @ 2019-04-29 10:38 UTC (permalink / raw)
To: Anton Kuchin
Cc: qemu-block, qemu-devel, Max Reitz, yc-core (mailing list)
On 26.04.2019 14:24, Anton Kuchin wrote:
> I can't figure out the ownership of the AioContext during bdrv_close().
>
> As far as I understand, bdrv_unref() should be called with the AioContext
> acquired to prevent concurrent operations (at least most callers in
> blockdev.c explicitly acquire and release the context, but not all of
> them).
I think the theory is like this:
1. bdrv_unref() can only be called from the main thread
2. A block node for which bdrv_close() is called has no references. If
there are no more parents that keep it in a non-default iothread,
they should be in the main AioContext. So no locks need to be taken
during bdrv_close().
In practice, 2. is not fully true today, even though block devices do
stop their dataplane and move the block nodes back to the main
AioContext on shutdown. I am currently working on some fixes related to
this; afterwards the situation should be better.
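
Roughly, the intended ordering on the shutdown path is something like
this (a sketch, not actual code from any specific device):

    /* the device stops its dataplane: */
    blk_set_aio_context(blk, qemu_get_aio_context());

    /* later, from the main thread: */
    blk_unref(blk);    /* may drop the last reference, so bdrv_close()
                        * then runs in the main AioContext */
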
> But if the refcount reaches zero and the bs is going to be deleted in
> bdrv_close(), we need to ensure that the drain is finished, data is
> flushed, and there are no more pending coroutines and bottom halves, so
> the drain and flush functions can enter a coroutine and yield in several
> places. As a result, control returns to the coroutine caller, which
> releases the AioContext, and when the completion BH continues the
> cleanup process it is executed without ownership of the context. Is this
> a valid situation?
Do you have an example where this happens?
Normally, leaving the coroutine means that the AioContext lock is
released, but it is later reentered from the same AioContext, so the
lock will be taken again.
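
(Reentering goes through aio_co_wake()/aio_co_enter(); from memory, the
relevant part of aio_co_enter() in util/async.c is roughly the following
paraphrase, leaving out the case where the caller is itself a coroutine:)

    if (ctx != qemu_get_current_aio_context()) {
        aio_co_schedule(ctx, co);    /* reentered later in its own context */
    } else {
        aio_context_acquire(ctx);
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }
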
> Moreover, if a yield happens, the bs that is being deleted has a zero
> refcount but is still present in the graph_bdrv_states and
> all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> remove it from these lists as soon as the deletion process starts, as we
> do for monitor_bdrv_states?
Hm, I think it should only disappear when the image file is actually
closed. But in practice, it probably makes little difference either way.
Kevin
* Re: [Qemu-devel] aio context ownership during bdrv_close()
From: Anton Kuchin @ 2019-05-06 12:47 UTC (permalink / raw)
To: Kevin Wolf
Cc: yc-core (mailing list), qemu-devel, qemu-block, Max Reitz
On 29/04/2019 13:38, Kevin Wolf wrote:
> On 26.04.2019 14:24, Anton Kuchin wrote:
>> I can't figure out the ownership of the AioContext during bdrv_close().
>>
>> As far as I understand, bdrv_unref() should be called with the AioContext
>> acquired to prevent concurrent operations (at least most callers in
>> blockdev.c explicitly acquire and release the context, but not all of
>> them).
> I think the theory is like this:
>
> 1. bdrv_unref() can only be called from the main thread
>
> 2. A block node for which bdrv_close() is called has no references. If
> there are no more parents that keep it in a non-default iothread,
> they should be in the main AioContext. So no locks need to be taken
> during bdrv_close().
>
> In practice, 2. is not fully true today, even though block devices do
> stop their dataplane and move the block nodes back to the main
> AioContext on shutdown. I am currently working on some fixes related to
> this; afterwards the situation should be better.
>
>> But if the refcount reaches zero and the bs is going to be deleted in
>> bdrv_close(), we need to ensure that the drain is finished, data is
>> flushed, and there are no more pending coroutines and bottom halves, so
>> the drain and flush functions can enter a coroutine and yield in several
>> places. As a result, control returns to the coroutine caller, which
>> releases the AioContext, and when the completion BH continues the
>> cleanup process it is executed without ownership of the context. Is this
>> a valid situation?
> Do you have an example where this happens?
>
> Normally, leaving the coroutine means that the AioContext lock is
> released, but it is later reentered from the same AioContext, so the
> lock will be taken again.
I was wrong. Coroutines do acquire the AioContext when reentered.
>
>> Moreover, if a yield happens, the bs that is being deleted has a zero
>> refcount but is still present in the graph_bdrv_states and
>> all_bdrv_states lists and can be accidentally accessed. Shouldn't we
>> remove it from these lists as soon as the deletion process starts, as we
>> do for monitor_bdrv_states?
> Hm, I think it should only disappear when the image file is actually
> closed. But in practice, it probably makes little difference either way.
I'm still worried about the period of time between the coroutine
yielding and it being reentered: it looks like the AioContext can be
grabbed by other code and the bs can be treated as valid.
I don't have an example of such a situation either, but I see the
bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called
during flush and can take an unpredictable time to complete (e.g.
raw_aio_flush in file-win32 uses a thread pool internally to process the
request, and RBD may also be affected, but I didn't dig deep enough to
be sure).
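
(What I mean by "an unpredictable time": with a thread-pool based driver
my understanding is that the flush is handed off roughly as in the sketch
below; do_flush_fn and s are placeholder names, not the real ones:)

    ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
    /* yields here and is only reentered from a BH once the worker
     * thread is done, which can take arbitrarily long: */
    ret = thread_pool_submit_co(pool, do_flush_fn, s);
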
If the main loop starts processing the next QMP command before the
completion is called, the lists will be in an inconsistent state and
hard-to-debug use-after-free bugs and crashes can happen.
The fix seems trivial and shouldn't break any existing code.
---
diff --git a/block.c b/block.c
index 16615bc876..25c3b72520 100644
--- a/block.c
+++ b/block.c
@@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
     assert(bdrv_op_blocker_is_empty(bs));
     assert(!bs->refcnt);
 
-    bdrv_close(bs);
-
     /* remove from list, if necessary */
     if (bs->node_name[0] != '\0') {
         QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
     }
     QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
 
+    bdrv_close(bs);
+
     g_free(bs);
 }
--
>
> Kevin
* Re: [Qemu-devel] aio context ownership during bdrv_close()
From: Kevin Wolf @ 2019-05-06 13:57 UTC (permalink / raw)
To: Anton Kuchin
Cc: yc-core (mailing list), qemu-devel, qemu-block, Max Reitz
On 06.05.2019 14:47, Anton Kuchin wrote:
> On 29/04/2019 13:38, Kevin Wolf wrote:
> > On 26.04.2019 14:24, Anton Kuchin wrote:
> > > I can't figure out the ownership of the AioContext during bdrv_close().
> > >
> > > As far as I understand, bdrv_unref() should be called with the
> > > AioContext acquired to prevent concurrent operations (at least most
> > > callers in blockdev.c explicitly acquire and release the context, but
> > > not all of them).
> > I think the theory is like this:
> >
> > 1. bdrv_unref() can only be called from the main thread
> >
> > 2. A block node for which bdrv_close() is called has no references. If
> > there are no more parents that keep it in a non-default iothread,
> > they should be in the main AioContext. So no locks need to be taken
> > during bdrv_close().
> >
> > In practice, 2. is not fully true today, even though block devices do
> > stop their dataplane and move the block nodes back to the main
> > AioContext on shutdown. I am currently working on some fixes related to
> > this; afterwards the situation should be better.
> >
> > > But if the refcount reaches zero and the bs is going to be deleted in
> > > bdrv_close(), we need to ensure that the drain is finished, data is
> > > flushed, and there are no more pending coroutines and bottom halves,
> > > so the drain and flush functions can enter a coroutine and yield in
> > > several places. As a result, control returns to the coroutine caller,
> > > which releases the AioContext, and when the completion BH continues
> > > the cleanup process it is executed without ownership of the context.
> > > Is this a valid situation?
> > Do you have an example where this happens?
> >
> > Normally, leaving the coroutine means that the AioContext lock is
> > released, but it is later reentered from the same AioContext, so the
> > lock will be taken again.
> I was wrong. Coroutines do acquire the AioContext when reentered.
> >
> > > Moreover, if a yield happens, the bs that is being deleted has a zero
> > > refcount but is still present in the graph_bdrv_states and
> > > all_bdrv_states lists and can be accidentally accessed. Shouldn't we
> > > remove it from these lists as soon as the deletion process starts, as
> > > we do for monitor_bdrv_states?
> > Hm, I think it should only disappear when the image file is actually
> > closed. But in practice, it probably makes little difference either way.
>
> I'm still worried about the period of time between the coroutine
> yielding and it being reentered: it looks like the AioContext can be
> grabbed by other code and the bs can be treated as valid.
>
> I don't have an example of such a situation either, but I see the
> bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called
> during flush and can take an unpredictable time to complete (e.g.
> raw_aio_flush in file-win32 uses a thread pool internally to process the
> request, and RBD may also be affected, but I didn't dig deep enough to
> be sure).
>
> If the main loop starts processing the next QMP command before the
> completion is called, the lists will be in an inconsistent state and
> hard-to-debug use-after-free bugs and crashes can happen.
Hm, at the point of flush, bs is actually still valid, so e.g.
query-block would just work. But I think we would indeed have a problem
if a new reference to the node is created.
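
For example (a hypothetical sequence, just to illustrate the kind of
problem, not a concrete reproducer):

    /* A QMP handler running while the deletion is stuck in the yielded
     * flush could still look the node up by name and take a new
     * reference, even though the refcount already dropped to zero and
     * bdrv_close() is in progress: */
    BlockDriverState *bs = bdrv_find_node(node_name);
    if (bs) {
        bdrv_ref(bs);
    }
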
> The fix seems trivial and shouldn't break any existing code.
>
> ---
>
> diff --git a/block.c b/block.c
> index 16615bc876..25c3b72520 100644
> --- a/block.c
> +++ b/block.c
> @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
>      assert(bdrv_op_blocker_is_empty(bs));
>      assert(!bs->refcnt);
> 
> -    bdrv_close(bs);
> -
>      /* remove from list, if necessary */
>      if (bs->node_name[0] != '\0') {
>          QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
>      }
>      QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
> 
> +    bdrv_close(bs);
> +
>      g_free(bs);
>  }
This looks reasonable enough to me.
Kevin