* [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
@ 2013-06-25 17:38 Liu Ping Fan
  2013-06-25  6:24 ` Paolo Bonzini
  ` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread

From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

This series relies on the refcount of the object used by a bh's callback
to protect it against unplug.

Open issue:
Another choice may be RCU, but I think some issues are hard to resolve.
Using RCU, we have two choices:
  - while holding the object's refcount, call qemu_bh_delete(); then
    after the grace period, we can release the object;
  - or make qemu_bh_delete() synchronous in the path of the
    DeviceState's finalization.
But currently, the callers of qemu_bh_delete() cannot satisfy either of
the two conditions.

Liu Ping Fan (3):
  QEMUBH: introduce canceled member for bh
  QEMUBH: pin bh's referring object while scheduling
  virtio-net: set referred object for virtio net's bh

 async.c             | 37 ++++++++++++++++++++++++++++++++-----
 hw/net/virtio-net.c |  1 +
 include/block/aio.h |  6 ++++++
 stubs/Makefile.objs |  1 +
 4 files changed, 40 insertions(+), 5 deletions(-)

-- 
1.8.1.4

^ permalink raw reply [flat|nested] 15+ messages in thread
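For readers following the thread, the series' core idea can be sketched as a toy, single-threaded model (all names below are invented for illustration; this is not QEMU code): scheduling a bh takes a reference on the object its callback touches, and the poll loop drops that reference only after the callback has run, so a concurrent unplug cannot free the object underneath the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted object standing in for a DeviceState. */
typedef struct Obj {
    int refcnt;
    bool freed;
} Obj;

static void obj_ref(Obj *o)   { o->refcnt++; }
static void obj_unref(Obj *o) { if (--o->refcnt == 0) { o->freed = true; } }

/* Toy bottom half that knows which object its callback dereferences. */
typedef struct BH {
    void (*cb)(Obj *);
    Obj *obj;
    bool scheduled, deleted;
} BH;

static int cb_runs;
static void touch(Obj *o) { assert(!o->freed); cb_runs++; }

static void bh_schedule(BH *bh)
{
    if (bh->scheduled || bh->deleted) {
        return;
    }
    bh->scheduled = true;
    obj_ref(bh->obj);           /* pin the referred object */
}

static void bh_poll(BH *bh)
{
    if (bh->scheduled) {
        bh->scheduled = false;
        if (!bh->deleted) {
            bh->cb(bh->obj);    /* object is still alive here */
        }
        obj_unref(bh->obj);     /* drop the pin taken at schedule time */
    }
}
```

In the unplug race, the device's owner drops its reference while the bh is still scheduled; the pin taken in `bh_schedule()` keeps the object alive until `bh_poll()` has run the callback.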
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-25 6:24 UTC (permalink / raw)
To: Liu Ping Fan; +Cc: qemu-devel, Stefan Hajnoczi

On 25/06/2013 19:38, Liu Ping Fan wrote:
> This series relies on the refcount of the object used by a bh's callback
> to protect it against unplug.
>
> Open issue:
> Another choice may be RCU, but I think some issues are hard to resolve.
> Using RCU, we have two choices:
> while holding the object's refcount, call qemu_bh_delete(); then after
> the grace period, we can release the object.
> Or make qemu_bh_delete() synchronous in the path of the DeviceState's
> finalization.

What do you mean exactly?  qemu_bh_delete() is async, but it operates on
dynamically-allocated memory.

Paolo

> But currently, the callers of qemu_bh_delete() cannot satisfy either of
> the two conditions.
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-25 6:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Tue, Jun 25, 2013 at 2:24 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 25/06/2013 19:38, Liu Ping Fan wrote:
>> [...]
>
> What do you mean exactly?  qemu_bh_delete() is async, but it operates on
> dynamically-allocated memory.
>
Before the DeviceState is freed, we should make sure its bh is deleted
and no related bh is in flight.

> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-25 7:53 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 25/06/2013 08:32, liu ping fan wrote:
> [...]
> Before the DeviceState is freed, we should make sure its bh is deleted
> and no related bh is in flight.

The bh need not be deleted.  It just needs to be not-scheduled (easy,
qemu_bh_delete does that) and not running.

The latter part could be the hard one in a multi-threaded context, but I
think it's up to the device to ensure it.  It doesn't _have_ to be hard.
For example, joining the data-plane thread would do that as well.

Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-26 2:59 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Tue, Jun 25, 2013 at 3:53 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> The bh need not be deleted.  It just needs to be not-scheduled (easy,
> qemu_bh_delete does that) and not running.
>
Yes, not-scheduled and not running are the only things needed, but they
are not easy to know.

> The latter part could be the hard one in a multi-threaded context, but I
> think it's up to the device to ensure it.  It doesn't _have_ to be hard.
> For example, joining the data-plane thread would do that as well.
>
It seems not easy; consider the sequence:
  qemu_bh_delete();                      // ensure not scheduled, i.e. no
                                         // further access to the DeviceState
  wait for an RCU grace period to pass;  // ensure not running
  free the DeviceState in the RCU reclaimer.
But in the current code, qemu_bh_delete() can be placed in the
DeviceState's finalization path (in the reclaimer), which means that
while reclaiming, there still exists an access path to the DeviceState.
On the other hand, pushing qemu_bh_delete() out of the finalization path
is not easy, since we do not know when the DeviceState is done with its
bh.

Regards,
Pingfan

> Paolo
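The ordering described above can be sketched as a toy, single-threaded model (invented names, not QEMU code): removal deletes the bh first, the grace period guarantees no callback is still running, and only then does the reclaimer free the DeviceState.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the RCU-based ordering under discussion.  The grace
 * period is modeled as an explicit function call, not a real RCU. */
typedef struct Dev {
    bool bh_deleted;
    bool freed;
} Dev;

static void removal(Dev *d)
{
    d->bh_deleted = true;       /* qemu_bh_delete(): no further scheduling */
}

static bool bh_can_run(Dev *d)
{
    return !d->bh_deleted;      /* what a reader inside the grace period sees */
}

static void reclaim_after_grace_period(Dev *d)
{
    /* By now every reader that could still see the bh has finished. */
    assert(d->bh_deleted);      /* removal must have happened first */
    d->freed = true;
}
```

The point of contention in the thread is exactly the assertion in the reclaimer: if `qemu_bh_delete()` itself lives in the finalization path, removal and reclamation collapse into one step and the ordering above cannot be established.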
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-26 6:34 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 26/06/2013 04:59, liu ping fan wrote:
>> The latter part could be the hard one in a multi-threaded context, but I
>> think it's up to the device to ensure it.  It doesn't _have_ to be hard.
>> For example, joining the data-plane thread would do that as well.
>>
> It seems not easy; consider the sequence:
>   qemu_bh_delete();                      // ensure not scheduled
>   wait for an RCU grace period to pass;  // ensure not running
>   free the DeviceState in the RCU reclaimer.
> But in the current code, qemu_bh_delete() can be placed in the
> DeviceState's finalization path (in the reclaimer), which means that
> while reclaiming, there still exists an access path to the DeviceState.

It can be placed there, but it doesn't mean it is a good idea. :)

> On the other hand, pushing qemu_bh_delete() out of the finalization path
> is not easy, since we do not know when the DeviceState is done with its
> bh.

See above:

- if the BH will run in the iothread, the BH is definitely not running
  (because the BH runs under the BQL, and the reclaimer owns it)

- if the BH is running in another thread, waiting for that thread to
  terminate obviously makes the BH not running.

What we need is separation of removal and reclamation.  Without that,
any effort to make things unplug-safe is going to be way way more
complicated than necessary.

Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-26 8:20 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 26, 2013 at 2:34 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> It can be placed there, but it doesn't mean it is a good idea. :)
>
Unfortunately, this is what the current code does; otherwise, it would
have to call qemu_bh_new() each time it schedules a bh, and
qemu_bh_delete() in the bh's callback, i.e. the callback is the only
place that knows we are done with the bh.  (The other place is the
destruction of the DeviceState.)

>> On the other hand, pushing qemu_bh_delete() out of the finalization path
>> is not easy, since we do not know when the DeviceState is done with its
>> bh.
>
> See above:
>
> - if the BH will run in the iothread, the BH is definitely not running
>   (because the BH runs under the BQL, and the reclaimer owns it)
>
It works for the case in which the DeviceState's reclaimer calls
qemu_bh_delete().  But in the other case (which also exists in the
current code), where we call qemu_bh_delete() in the callback, the bh
can still be scheduled, and oops!

> - if the BH is running in another thread, waiting for that thread to
>   terminate obviously makes the BH not running.
>
This implies that, except for qemu_aio_context, an AioContext can only
be shared by one device, right?  Otherwise we cannot just terminate the
thread.  If so, why can't we have more than one context like
qemu_aio_context?

> What we need is separation of removal and reclamation.  Without that,
> any effort to make things unplug-safe is going to be way way more
> complicated than necessary.
>
Agree, but when I tried this for bh, I found the separation of removal
and reclamation is not easy.  We cannot call qemu_bh_delete() in
acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the
same time.  And as explained, only two places can hold qemu_bh_delete().
In short, with RCU we need to constrain the calling idiom of bh, i.e.
put the delete in the reclaimer.  On the other hand, my patch tries to
leave the calling idiom as it is, and handle this issue inside bh.

Regards,
Pingfan

> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-26 8:38 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 26/06/2013 10:20, liu ping fan wrote:
> [...]
> It works for the case in which the DeviceState's reclaimer calls
> qemu_bh_delete().  But in the other case (which also exists in the
> current code), where we call qemu_bh_delete() in the callback, the bh
> can still be scheduled, and oops!

But you may know that the BH is not scheduled after removal, too.

There are not that many devices that have bottom halves (apart from
those that use bottom halves in ptimer).  If they really need to, they
can do the object_ref/unref themselves.  But I expect this to be rare,
and generic code should not need an "owner" field in bottom halves.  For
example, block devices should stop sending requests after removal.

>> - if the BH is running in another thread, waiting for that thread to
>>   terminate obviously makes the BH not running.
>>
> This implies that, except for qemu_aio_context, an AioContext can only
> be shared by one device, right?  Otherwise we cannot just terminate the
> thread.  If so, why can't we have more than one context like
> qemu_aio_context?

Yes, if you do it that way you can have only one AioContext per device.
RCU is another way, and doesn't have the same limitation.

We should avoid introducing infrastructure that we are not sure is
needed.  For what it's worth, your patches to make the bottom half list
thread-safe are also slightly premature.  However, they do not change
the API and it makes some sense to add the infrastructure.  In this
case, I'm simply not sure that we're there yet.

If you look at the memory work, for example, the owner patches happen to
be useful for BQL-less dispatch too, but they are solving existing
hot-unplug bugs.

>> What we need is separation of removal and reclamation.  Without that,
>> any effort to make things unplug-safe is going to be way way more
>> complicated than necessary.
>>
> Agree, but when I tried this for bh, I found the separation of removal
> and reclamation is not easy.  We cannot call qemu_bh_delete() in
> acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the
> same time.

acpi_piix_eject_slot() is removal, not reclamation.  The plan there is
that qdev_free calls the exit callback immediately (which can do
qemu_bh_delete), and schedules an unref after the next RCU grace period.
At the next RCU grace period the BH will not be running, thus it is safe
to finalize the object.

Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-26 9:44 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 26, 2013 at 4:38 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> But you may know that the BH is not scheduled after removal, too.
>
No, the removal can run in parallel with other mmio-dispatch, which may
schedule a bh.  I.e. the removal cannot call qemu_bh_delete(), so it
does not know whether the bh will be scheduled or not.

> There are not that many devices that have bottom halves (apart from
> those that use bottom halves in ptimer).  If they really need to, they
> can do the object_ref/unref themselves.  But I expect this to be rare,
> and generic code should not need an "owner" field in bottom halves.

Agree :), so the aim is how to handle bh safely on unplug; if we can
resolve it with RCU, all the better!

> For example, block devices should stop sending requests after removal.
>
Yes, but do we need to take into account that "stopped sending requests"
!= "no mmio-dispatch in flight"?  I think these in-flight mmio-dispatches
are the culprit :), making it hard to synchronize with the bh's stop.

> Yes, if you do it that way you can have only one AioContext per device.
> RCU is another way, and doesn't have the same limitation.
>
> We should avoid introducing infrastructure that we are not sure is
> needed.  For what it's worth, your patches to make the bottom half list
> thread-safe are also slightly premature.  However, they do not change
> the API and it makes some sense to add the infrastructure.  In this
> case, I'm simply not sure that we're there yet.
>
See, thx.

> If you look at the memory work, for example, the owner patches happen to
> be useful for BQL-less dispatch too, but they are solving existing
> hot-unplug bugs.
>
Oh, I will read it again, since I had thought the owner was only used
for the purpose of refcounting.

> acpi_piix_eject_slot() is removal, not reclamation.  The plan there is
> that qdev_free calls the exit callback immediately (which can do
> qemu_bh_delete), and schedules an unref after the next RCU grace period.
> At the next RCU grace period the BH will not be running, thus it is safe
> to finalize the object.
>
I have two questions:
1st, who triggers qdev_free?  // Not having figured out the whole
design, but I feel it is quite different from the kernel's design, where
exit is called when the refcount drops to 0.
2nd, qemu_bh_delete() means that even if there is an un-served request,
the request will be canceled.  Is that legal, and will it not lose data?
(Not sure, since I do not know who will call qdev_free.)

Thx & regards,
Pingfan

> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-26 9:55 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 26/06/2013 11:44, liu ping fan wrote:
> [...]
> No, the removal can run in parallel with other mmio-dispatch, which may
> schedule a bh.

But then behavior is more or less undefined.  Of course the device must
ensure that something sane happens, but that's the responsibility of the
device, not of the generic layer.

> I.e. the removal cannot call qemu_bh_delete(), so it
> does not know whether the bh will be scheduled or not.

It can call qemu_bh_delete(), then all concurrent qemu_bh_schedule()
will be no-ops.

>> If you look at the memory work, for example, the owner patches happen to
>> be useful for BQL-less dispatch too, but they are solving existing
>> hot-unplug bugs.
>>
> Oh, I will read it again, since I had thought the owner was only used
> for the purpose of refcounting.

It is.  But it goes beyond BQL-less dispatch.  Anything that gives away
the BQL between a map and unmap (including asynchronous I/O) needs an
owner.  We have ignored that until now because we do not have memory
unplug.

> I have two questions:
> 1st, who triggers qdev_free?  // Not having figured out the whole
> design, but I feel it is quite different from the kernel's design, where
> exit is called when the refcount drops to 0.

qdev_free is triggered by the guest, but free is a misnomer.  It is
really "make it inaccessible from the guest and management" (the kernel
equivalent would be removal of /dev and /sys entries, for example).  The
actual "free" will happen later.

> 2nd, qemu_bh_delete() means that even if there is an un-served request,
> the request will be canceled.  Is that legal, and will it not lose data?

That's something that should be ensured by the device.  For example it
is not a problem to cancel virtio-net's tx_bh.  If it is a problem, the
device must do something else.  It could even be a bdrv_drain_all() in
the worst case.

Paolo
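The removal/reclamation split described here might be modeled as follows (a sketch with invented names, not the actual qdev code): "qdev_free" only hides the device and drops the guest-visible reference, while finalization is deferred to a later step standing in for the end of the RCU grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-phase teardown: removal hides the device; reclamation
 * (finalize) is deferred, modeling call-after-grace-period. */
typedef struct Device Device;
struct Device {
    int refcnt;
    bool visible;        /* reachable from guest and management */
    bool finalized;
    Device *rcu_next;    /* pending-reclaim list */
};

static Device *rcu_pending;  /* devices waiting for the grace period */

static void device_unref(Device *d)
{
    if (--d->refcnt == 0) {
        d->rcu_next = rcu_pending;  /* defer finalize, don't free now */
        rcu_pending = d;
    }
}

static void qdev_free_model(Device *d)
{
    d->visible = false;  /* removal: "free" is a misnomer, guest can't
                          * reach the device anymore; the exit callback
                          * (e.g. qemu_bh_delete) would run here */
    device_unref(d);     /* may queue the device for reclamation */
}

static void rcu_grace_period_end(void)
{
    for (Device *d = rcu_pending; d; d = d->rcu_next) {
        d->finalized = true;        /* instance_finalize runs here */
    }
    rcu_pending = NULL;
}
```

Between `qdev_free_model()` and `rcu_grace_period_end()` the device is invisible but not yet freed, which is exactly the window in which a still-running bh callback remains safe.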
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-27 2:08 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 26, 2013 at 5:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
>> I.e. the removal cannot call qemu_bh_delete(), so it
>> does not know whether the bh will be scheduled or not.
>
> It can call qemu_bh_delete(), then all concurrent qemu_bh_schedule()
> will be no-ops.
>
Great, with this ability, we can achieve it in a more elegant way --
RCU!  BTW, the responsibility is also assigned to the guest driver with
respect to its unplug behavior.

> qdev_free is triggered by the guest, but free is a misnomer.  It is
> really "make it inaccessible from the guest and management" (the kernel
> equivalent would be removal of /dev and /sys entries, for example).  The
> actual "free" will happen later.
>
Without having seen your detailed design, I suggest leaving "exit" as it
is, and picking the inaccessibility-related code out into removal.
Finally, when the refcount drops to 0, exit is called, and it acts as
the final sync point for the remaining accesses.

Regards,
Pingfan

> [...]
>
> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-27 6:59 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 27/06/2013 04:08, liu ping fan wrote:
>> qdev_free is triggered by the guest, but free is a misnomer.  It is
>> really "make it inaccessible from the guest and management" (the kernel
>> equivalent would be removal of /dev and /sys entries, for example).  The
>> actual "free" will happen later.
>
> Without having seen your detailed design, I suggest leaving "exit" as it
> is, and picking the inaccessibility-related code out into removal.

We already have a reclamation point, it is instance_finalize.  I posted
a series a few weeks ago ("[PATCH 00/39] Delay destruction of memory
regions to instance_finalize").

> Finally, when the refcount drops to 0, exit is called, and it acts as
> the final sync point for the remaining accesses.

It is the guest that determines when to start the removal phase.  That's
qdev_free.  refcnt = 0 means that the memory is inaccessible to the
guest, and that's when the reclamation phase is started (asynchronously:
the instance_finalize callback is actually called at the end of the RCU
grace period).

Paolo
* [Qemu-devel] [PATCH 1/3] QEMUBH: introduce canceled member for bh
From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

When dismissing a bh's scheduling (delete/cancel), we need to reset
bh->scheduled to zero and release the refcount of the object referred to
by the bh (introduced by the next patch).  Currently, bh->scheduled
would be reset to zero by many writers, so atomic ops would have to be
involved, which results in expensive memory barriers.  With this patch,
bh->scheduled is only reset by aio_bh_poll().

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 async.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/async.c b/async.c
index e73b93c..bea3d7e 100644
--- a/async.c
+++ b/async.c
@@ -38,6 +38,7 @@ struct QEMUBH {
     bool scheduled;
     bool idle;
     bool deleted;
+    bool canceled;
 };
 
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
@@ -69,8 +70,15 @@ int aio_bh_poll(AioContext *ctx)
         /* Make sure that fetching bh happens before accessing its members */
         smp_read_barrier_depends();
         next = bh->next;
-        if (!bh->deleted && bh->scheduled) {
+        if (bh->scheduled) {
             bh->scheduled = 0;
+            if (unlikely(bh->deleted)) {
+                continue;
+            }
+            if (unlikely(bh->canceled)) {
+                bh->canceled = 0;
+                continue;
+            }
             /* Paired with write barrier in bh schedule to ensure reading for
              * idle & callbacks coming after bh's scheduling.
              */
@@ -133,7 +141,7 @@ void qemu_bh_schedule(QEMUBH *bh)
  */
 void qemu_bh_cancel(QEMUBH *bh)
 {
-    bh->scheduled = 0;
+    bh->canceled = 1;
 }
 
 /* This func is async. The bottom half will do the delete action at the final
@@ -141,7 +149,6 @@ void qemu_bh_cancel(QEMUBH *bh)
  */
 void qemu_bh_delete(QEMUBH *bh)
 {
-    bh->scheduled = 0;
     bh->deleted = 1;
 }
 
@@ -152,7 +159,7 @@ aio_ctx_prepare(GSource *source, gint *timeout)
     QEMUBH *bh;
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
+        if (!bh->deleted && !bh->canceled && bh->scheduled) {
             if (bh->idle) {
                 /* idle bottom halves will be polled at least
                  * every 10ms */
@@ -176,7 +183,7 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
+        if (!bh->deleted && !bh->canceled && bh->scheduled) {
             return true;
         }
     }
-- 
1.8.1.4
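The single-writer rule in this patch can be illustrated with a toy model (the field names follow the patch, everything else is invented for illustration): qemu_bh_cancel() and qemu_bh_delete() only set their own flags, and bh->scheduled is cleared exclusively by the poll loop, so no other thread ever stores to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of patch 1's flag semantics: cancel and delete are
 * one-sided flag writes; only the poll loop clears "scheduled". */
typedef struct {
    bool scheduled, deleted, canceled;
    int runs;
} ToyBH;

static void toy_bh_cancel(ToyBH *bh) { bh->canceled = true; }
static void toy_bh_delete(ToyBH *bh) { bh->deleted = true; }

static void toy_aio_bh_poll(ToyBH *bh)
{
    if (bh->scheduled) {
        bh->scheduled = false;       /* the only writer that clears it */
        if (bh->deleted) {
            return;                  /* deleted: never run again */
        }
        if (bh->canceled) {
            bh->canceled = false;    /* cancel is one-shot, as in the patch */
            return;
        }
        bh->runs++;                  /* stands in for bh->cb(bh->opaque) */
    }
}
```

A canceled bh skips exactly one dispatch and can be scheduled again afterwards, while a deleted bh never runs again; that difference is why the patch needs two flags instead of clearing `scheduled` from the cancel/delete paths.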
* [Qemu-devel] [PATCH 2/3] QEMUBH: pin bh's referring object while scheduling
  2013-06-25 17:38 [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug Liu Ping Fan
  2013-06-25  6:24 ` Paolo Bonzini
  2013-06-25 17:38 ` [Qemu-devel] [PATCH 1/3] QEMUBH: introduce canceled member for bh Liu Ping Fan
@ 2013-06-25 17:38 ` Liu Ping Fan
  2013-06-25 17:38 ` [Qemu-devel] [PATCH 3/3] virtio-net: set referred object for virtio net's bh Liu Ping Fan
  3 siblings, 0 replies; 15+ messages in thread
From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

When outside the big lock, an object can be unplugged while its bh is
still scheduled; use a refcount on bh->obj to pin the object.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
Another choice may be RCU:

  rcu_read_lock protects aio_bh_poll()

  qemu_bh_delete();
  smp_wmb();
  object_unref(obj);

and the object is reclaimed after a grace period.  NOTE: obj is not
accessed from the readers' side, since even if the bh survives into
the next grace period, bh->deleted forbids the access to bh->cb.
---
 async.c             | 24 ++++++++++++++++++++++--
 include/block/aio.h |  6 ++++++
 stubs/Makefile.objs |  1 +
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index bea3d7e..0b977b3 100644
--- a/async.c
+++ b/async.c
@@ -34,6 +34,10 @@ struct QEMUBH {
     AioContext *ctx;
     QEMUBHFunc *cb;
     void *opaque;
+    /* Object which is referred by bh's callback. It can be dissappear and
+     * should be pinned when bh is scheduling
+     */
+    Object *obj;
     QEMUBH *next;
     bool scheduled;
     bool idle;
@@ -73,11 +77,11 @@ int aio_bh_poll(AioContext *ctx)
         if (bh->scheduled) {
             bh->scheduled = 0;
             if (unlikely(bh->deleted)) {
-                continue;
+                goto release_obj;
             }
             if (unlikely(bh->canceled)) {
                 bh->canceled = 0;
-                continue;
+                goto release_obj;
             }
             /* Paired with write barrier in bh schedule to ensure reading for
              * idle & callbacks coming after bh's scheduling.
@@ -87,6 +91,10 @@ int aio_bh_poll(AioContext *ctx)
             ret = 1;
             bh->idle = 0;
             bh->cb(bh->opaque);
+release_obj:
+            if (bh->obj) {
+                object_unref(bh->obj);
+            }
         }
     }
 
@@ -116,6 +124,9 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
     if (bh->scheduled)
         return;
     bh->idle = 1;
+    if (bh->obj) {
+        object_ref(bh->obj);
+    }
     /* Make sure that idle & any writes needed by the callback are done
      * before the locations are read in the aio_bh_poll.
      */
@@ -128,6 +139,9 @@ void qemu_bh_schedule(QEMUBH *bh)
     if (bh->scheduled)
         return;
     bh->idle = 0;
+    if (bh->obj) {
+        object_ref(bh->obj);
+    }
     /* Make sure that idle & any writes needed by the callback are done
      * before the locations are read in the aio_bh_poll.
      */
@@ -152,6 +166,12 @@ void qemu_bh_delete(QEMUBH *bh)
     bh->deleted = 1;
 }
 
+void qemu_bh_set_obj(QEMUBH *bh, Object *obj)
+{
+    assert(obj);
+    bh->obj = obj;
+}
+
 static gboolean
 aio_ctx_prepare(GSource *source, gint *timeout)
 {
diff --git a/include/block/aio.h b/include/block/aio.h
index cc77771..7da93d3 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -15,6 +15,7 @@
 #define QEMU_AIO_H
 
 #include "qemu-common.h"
+#include "qom/object.h"
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
@@ -175,6 +176,11 @@ void qemu_bh_cancel(QEMUBH *bh);
  */
 void qemu_bh_delete(QEMUBH *bh);
 
+/* @obj which is referred by bh's callback. It can be dissappear and
+ * should be pinned when bh is scheduling.
+ */
+void qemu_bh_set_obj(QEMUBH *bh, Object *obj);
+
 /* Return whether there are any pending callbacks from the GSource
  * attached to the AioContext.
  *
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 03dff20..fc80562 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -24,3 +24,4 @@ stub-obj-y += vm-stop.o
 stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
+stub-obj-y += object.o
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 3/3] virtio-net: set referred object for virtio net's bh
  2013-06-25 17:38 [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug Liu Ping Fan
                   ` (2 preceding siblings ...)
  2013-06-25 17:38 ` [Qemu-devel] [PATCH 2/3] QEMUBH: pin bh's referring object while scheduling Liu Ping Fan
@ 2013-06-25 17:38 ` Liu Ping Fan
  3 siblings, 0 replies; 15+ messages in thread
From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

Expose the object to the bh, so the bh will pin the virtio-net device
against being unplugged in parallel.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ea9556..7c0ded9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1124,6 +1124,7 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue)
             n->vqs[i].tx_vq = virtio_add_queue(vdev, 256,
                                                virtio_net_handle_tx_bh);
             n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[i]);
+            qemu_bh_set_obj(n->vqs[i].tx_bh, OBJECT(n));
         }
 
         n->vqs[i].tx_waiting = 0;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-06-27  6:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2013-06-25 17:38 [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug Liu Ping Fan
2013-06-25  6:24 ` Paolo Bonzini
2013-06-25  6:32   ` liu ping fan
2013-06-25  7:53     ` Paolo Bonzini
2013-06-26  2:59       ` liu ping fan
2013-06-26  6:34         ` Paolo Bonzini
2013-06-26  8:20           ` liu ping fan
2013-06-26  8:38             ` Paolo Bonzini
2013-06-26  9:44               ` liu ping fan
2013-06-26  9:55                 ` Paolo Bonzini
2013-06-27  2:08                   ` liu ping fan
2013-06-27  6:59                     ` Paolo Bonzini
2013-06-25 17:38 ` [Qemu-devel] [PATCH 1/3] QEMUBH: introduce canceled member for bh Liu Ping Fan
2013-06-25 17:38 ` [Qemu-devel] [PATCH 2/3] QEMUBH: pin bh's referring object while scheduling Liu Ping Fan
2013-06-25 17:38 ` [Qemu-devel] [PATCH 3/3] virtio-net: set referred object for virtio net's bh Liu Ping Fan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).