* [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
@ 2013-06-25 17:38 Liu Ping Fan
  2013-06-25  6:24 ` Paolo Bonzini
  ` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread

From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

This series relies on the refcount of the object used by a bh's callback
to protect it against unplug.

Open issue:
Another choice may be RCU, but I think some issues are hard to resolve.
Using RCU, we have two choices:
  - while holding the object's refcount, call qemu_bh_delete(); then
    after the grace period, we can release the object;
  - or make qemu_bh_delete() synchronous in the path of the
    DeviceState's finalization.
But currently, the callers of qemu_bh_delete() cannot satisfy either of
the two conditions.

Liu Ping Fan (3):
  QEMUBH: introduce canceled member for bh
  QEMUBH: pin bh's referring object while scheduling
  virtio-net: set referred object for virtio net's bh

 async.c             | 37 ++++++++++++++++++++++++++++++++-----
 hw/net/virtio-net.c |  1 +
 include/block/aio.h |  6 ++++++
 stubs/Makefile.objs |  1 +
 4 files changed, 40 insertions(+), 5 deletions(-)

-- 
1.8.1.4

^ permalink raw reply [flat|nested] 15+ messages in thread
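For readers following the thread, the series' core idea can be sketched as a toy, single-threaded model (all names below are invented for illustration; this is not QEMU code): scheduling a bh takes a reference on the object its callback touches, and the poll loop drops that reference only after the callback has run, so a concurrent unplug cannot free the object underneath the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted object standing in for a DeviceState. */
typedef struct Obj {
    int refcnt;
    bool freed;
} Obj;

static void obj_ref(Obj *o)   { o->refcnt++; }
static void obj_unref(Obj *o) { if (--o->refcnt == 0) { o->freed = true; } }

/* Toy bottom half that knows which object its callback dereferences. */
typedef struct BH {
    void (*cb)(Obj *);
    Obj *obj;
    bool scheduled, deleted;
} BH;

static int cb_runs;
static void touch(Obj *o) { assert(!o->freed); cb_runs++; }

static void bh_schedule(BH *bh)
{
    if (bh->scheduled || bh->deleted) {
        return;
    }
    bh->scheduled = true;
    obj_ref(bh->obj);           /* pin the referred object */
}

static void bh_poll(BH *bh)
{
    if (bh->scheduled) {
        bh->scheduled = false;
        if (!bh->deleted) {
            bh->cb(bh->obj);    /* object is still alive here */
        }
        obj_unref(bh->obj);     /* drop the pin taken at schedule time */
    }
}
```

In the unplug race, the device's owner drops its reference while the bh is still scheduled; the pin taken in `bh_schedule()` keeps the object alive until `bh_poll()` has run the callback.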
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-25 6:24 UTC (permalink / raw)
To: Liu Ping Fan; +Cc: qemu-devel, Stefan Hajnoczi

On 25/06/2013 19:38, Liu Ping Fan wrote:
> This series relies on the refcount of the object used by a bh's callback
> to protect it against unplug.
>
> Open issue:
> Another choice may be RCU, but I think some issues are hard to resolve.
> Using RCU, we have two choices:
> while holding the object's refcount, call qemu_bh_delete(); then after
> the grace period, we can release the object.
> Or make qemu_bh_delete() synchronous in the path of the DeviceState's
> finalization.

What do you mean exactly?  qemu_bh_delete() is async, but it operates on
dynamically-allocated memory.

Paolo

> But currently, the callers of qemu_bh_delete() cannot satisfy either of
> the two conditions.
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-25 6:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Tue, Jun 25, 2013 at 2:24 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 25/06/2013 19:38, Liu Ping Fan wrote:
>> [...]
>
> What do you mean exactly?  qemu_bh_delete() is async, but it operates on
> dynamically-allocated memory.
>
Before the DeviceState is freed, we should make sure its bh is deleted
and no related bh is in flight.

> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-25 7:53 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 25/06/2013 08:32, liu ping fan wrote:
> [...]
> Before the DeviceState is freed, we should make sure its bh is deleted
> and no related bh is in flight.

The bh need not be deleted.  It just needs to be not-scheduled (easy,
qemu_bh_delete does that) and not running.

The latter part could be the hard one in a multi-threaded context, but I
think it's up to the device to ensure it.  It doesn't _have_ to be hard.
For example, joining the data-plane thread would do that as well.

Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-26 2:59 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Tue, Jun 25, 2013 at 3:53 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> The bh need not be deleted.  It just needs to be not-scheduled (easy,
> qemu_bh_delete does that) and not running.
>
Yes, not-scheduled and not running are the only things needed, but they
are not easy to know.

> The latter part could be the hard one in a multi-threaded context, but I
> think it's up to the device to ensure it.  It doesn't _have_ to be hard.
> For example, joining the data-plane thread would do that as well.
>
It seems not easy; consider the sequence:
  qemu_bh_delete();                      // ensure not scheduled, i.e. no
                                         // further access to the DeviceState
  wait for an RCU grace period to pass;  // ensure not running
  free the DeviceState in the RCU reclaimer.
But in the current code, qemu_bh_delete() can be placed in the
DeviceState's finalization path (in the reclaimer), which means that
while reclaiming, there still exists an access path to the DeviceState.
On the other hand, pushing qemu_bh_delete() out of the finalization path
is not easy, since we do not know when the DeviceState is done with its
bh.

Regards,
Pingfan

> Paolo
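The ordering described above can be sketched as a toy, single-threaded model (invented names, not QEMU code): removal deletes the bh first, the grace period guarantees no callback is still running, and only then does the reclaimer free the DeviceState.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the RCU-based ordering under discussion.  The grace
 * period is modeled as an explicit function call, not a real RCU. */
typedef struct Dev {
    bool bh_deleted;
    bool freed;
} Dev;

static void removal(Dev *d)
{
    d->bh_deleted = true;       /* qemu_bh_delete(): no further scheduling */
}

static bool bh_can_run(Dev *d)
{
    return !d->bh_deleted;      /* what a reader inside the grace period sees */
}

static void reclaim_after_grace_period(Dev *d)
{
    /* By now every reader that could still see the bh has finished. */
    assert(d->bh_deleted);      /* removal must have happened first */
    d->freed = true;
}
```

The point of contention in the thread is exactly the assertion in the reclaimer: if `qemu_bh_delete()` itself lives in the finalization path, removal and reclamation collapse into one step and the ordering above cannot be established.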
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-26 6:34 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 26/06/2013 04:59, liu ping fan wrote:
>> The latter part could be the hard one in a multi-threaded context, but I
>> think it's up to the device to ensure it.  It doesn't _have_ to be hard.
>> For example, joining the data-plane thread would do that as well.
>>
> It seems not easy; consider the sequence:
>   qemu_bh_delete();                      // ensure not scheduled
>   wait for an RCU grace period to pass;  // ensure not running
>   free the DeviceState in the RCU reclaimer.
> But in the current code, qemu_bh_delete() can be placed in the
> DeviceState's finalization path (in the reclaimer), which means that
> while reclaiming, there still exists an access path to the DeviceState.

It can be placed there, but it doesn't mean it is a good idea. :)

> On the other hand, pushing qemu_bh_delete() out of the finalization path
> is not easy, since we do not know when the DeviceState is done with its
> bh.

See above:

- if the BH will run in the iothread, the BH is definitely not running
  (because the BH runs under the BQL, and the reclaimer owns it)

- if the BH is running in another thread, waiting for that thread to
  terminate obviously makes the BH not running.

What we need is separation of removal and reclamation.  Without that,
any effort to make things unplug-safe is going to be way way more
complicated than necessary.

Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-26 8:20 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 26, 2013 at 2:34 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> It can be placed there, but it doesn't mean it is a good idea. :)
>
Unfortunately, this is what the current code does; otherwise, it would
have to call qemu_bh_new() each time it schedules a bh, and
qemu_bh_delete() in the bh's callback, i.e. the callback is the only
place that knows we are done with the bh.  (The other place is the
destruction of the DeviceState.)

>> On the other hand, pushing qemu_bh_delete() out of the finalization path
>> is not easy, since we do not know when the DeviceState is done with its
>> bh.
>
> See above:
>
> - if the BH will run in the iothread, the BH is definitely not running
>   (because the BH runs under the BQL, and the reclaimer owns it)
>
It works for the case in which the DeviceState's reclaimer calls
qemu_bh_delete().  But in the other case (which also exists in the
current code), where we call qemu_bh_delete() in the callback, the bh
can still be scheduled, and oops!

> - if the BH is running in another thread, waiting for that thread to
>   terminate obviously makes the BH not running.
>
This implies that, except for qemu_aio_context, an AioContext can only
be shared by one device, right?  Otherwise we cannot just terminate the
thread.  If so, why can't we have more than one context like
qemu_aio_context?

> What we need is separation of removal and reclamation.  Without that,
> any effort to make things unplug-safe is going to be way way more
> complicated than necessary.
>
Agree, but when I tried this for bh, I found the separation of removal
and reclamation is not easy.  We cannot call qemu_bh_delete() in
acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the
same time.  And as explained, only two places can hold qemu_bh_delete().
In short, with RCU we need to constrain the calling idiom of bh, i.e.
put the delete in the reclaimer.  On the other hand, my patch tries to
leave the calling idiom as it is, and handle this issue inside bh.

Regards,
Pingfan

> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-26 8:38 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 26/06/2013 10:20, liu ping fan wrote:
> [...]
> It works for the case in which the DeviceState's reclaimer calls
> qemu_bh_delete().  But in the other case (which also exists in the
> current code), where we call qemu_bh_delete() in the callback, the bh
> can still be scheduled, and oops!

But you may know that the BH is not scheduled after removal, too.

There are not that many devices that have bottom halves (apart from
those that use bottom halves in ptimer).  If they really need to, they
can do the object_ref/unref themselves.  But I expect this to be rare,
and generic code should not need an "owner" field in bottom halves.  For
example, block devices should stop sending requests after removal.

>> - if the BH is running in another thread, waiting for that thread to
>>   terminate obviously makes the BH not running.
>>
> This implies that, except for qemu_aio_context, an AioContext can only
> be shared by one device, right?  Otherwise we cannot just terminate the
> thread.  If so, why can't we have more than one context like
> qemu_aio_context?

Yes, if you do it that way you can have only one AioContext per device.
RCU is another way, and doesn't have the same limitation.

We should avoid introducing infrastructure that we are not sure is
needed.  For what it's worth, your patches to make the bottom half list
thread-safe are also slightly premature.  However, they do not change
the API and it makes some sense to add the infrastructure.  In this
case, I'm simply not sure that we're there yet.

If you look at the memory work, for example, the owner patches happen to
be useful for BQL-less dispatch too, but they are solving existing
hot-unplug bugs.

>> What we need is separation of removal and reclamation.  Without that,
>> any effort to make things unplug-safe is going to be way way more
>> complicated than necessary.
>>
> Agree, but when I tried this for bh, I found the separation of removal
> and reclamation is not easy.  We cannot call qemu_bh_delete() in
> acpi_piix_eject_slot(), because mmio-dispatch can schedule a bh at the
> same time.

acpi_piix_eject_slot() is removal, not reclamation.  The plan there is
that qdev_free calls the exit callback immediately (which can do
qemu_bh_delete), and schedules an unref after the next RCU grace period.
At the next RCU grace period the BH will not be running, thus it is safe
to finalize the object.

Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-26 9:44 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 26, 2013 at 4:38 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> But you may know that the BH is not scheduled after removal, too.
>
No, the removal can run in parallel with other mmio-dispatch, which may
schedule a bh.  I.e. the removal cannot call qemu_bh_delete(), so it
does not know whether the bh will be scheduled or not.

> There are not that many devices that have bottom halves (apart from
> those that use bottom halves in ptimer).  If they really need to, they
> can do the object_ref/unref themselves.  But I expect this to be rare,
> and generic code should not need an "owner" field in bottom halves.

Agree :), so the aim is how to handle bh safely on unplug; if we can
resolve it with RCU, all the better!

> For example, block devices should stop sending requests after removal.
>
Yes, but do we need to take into account that "stopped sending requests"
!= "no mmio-dispatch in flight"?  I think these in-flight mmio-dispatches
are the culprit :), making it hard to synchronize with the bh's stop.

> Yes, if you do it that way you can have only one AioContext per device.
> RCU is another way, and doesn't have the same limitation.
>
> We should avoid introducing infrastructure that we are not sure is
> needed.  For what it's worth, your patches to make the bottom half list
> thread-safe are also slightly premature.  However, they do not change
> the API and it makes some sense to add the infrastructure.  In this
> case, I'm simply not sure that we're there yet.
>
See, thx.

> If you look at the memory work, for example, the owner patches happen to
> be useful for BQL-less dispatch too, but they are solving existing
> hot-unplug bugs.
>
Oh, I will read it again, since I had thought the owner was only used
for the purpose of refcounting.

> acpi_piix_eject_slot() is removal, not reclamation.  The plan there is
> that qdev_free calls the exit callback immediately (which can do
> qemu_bh_delete), and schedules an unref after the next RCU grace period.
> At the next RCU grace period the BH will not be running, thus it is safe
> to finalize the object.
>
I have two questions:
1st, who triggers qdev_free?  // Not having figured out the whole
design, but I feel it is quite different from the kernel's design, where
exit is called when the refcount drops to 0.
2nd, qemu_bh_delete() means that even if there is an un-served request,
the request will be canceled.  Is that legal, and will it not lose data?
(Not sure, since I do not know who will call qdev_free.)

Thx & regards,
Pingfan

> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-26 9:55 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 26/06/2013 11:44, liu ping fan wrote:
> [...]
> No, the removal can run in parallel with other mmio-dispatch, which may
> schedule a bh.

But then behavior is more or less undefined.  Of course the device must
ensure that something sane happens, but that's the responsibility of the
device, not of the generic layer.

> I.e. the removal cannot call qemu_bh_delete(), so it
> does not know whether the bh will be scheduled or not.

It can call qemu_bh_delete(), then all concurrent qemu_bh_schedule()
will be no-ops.

>> If you look at the memory work, for example, the owner patches happen to
>> be useful for BQL-less dispatch too, but they are solving existing
>> hot-unplug bugs.
>>
> Oh, I will read it again, since I had thought the owner was only used
> for the purpose of refcounting.

It is.  But it goes beyond BQL-less dispatch.  Anything that gives away
the BQL between a map and unmap (including asynchronous I/O) needs an
owner.  We have ignored that until now because we do not have memory
unplug.

> I have two questions:
> 1st, who triggers qdev_free?  // Not having figured out the whole
> design, but I feel it is quite different from the kernel's design, where
> exit is called when the refcount drops to 0.

qdev_free is triggered by the guest, but free is a misnomer.  It is
really "make it inaccessible from the guest and management" (the kernel
equivalent would be removal of /dev and /sys entries, for example).  The
actual "free" will happen later.

> 2nd, qemu_bh_delete() means that even if there is an un-served request,
> the request will be canceled.  Is that legal, and will it not lose data?

That's something that should be ensured by the device.  For example it
is not a problem to cancel virtio-net's tx_bh.  If it is a problem, the
device must do something else.  It could even be a bdrv_drain_all() in
the worst case.

Paolo
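The removal/reclamation split described here might be modeled as follows (a sketch with invented names, not the actual qdev code): "qdev_free" only hides the device and drops the guest-visible reference, while finalization is deferred to a later step standing in for the end of the RCU grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-phase teardown: removal hides the device; reclamation
 * (finalize) is deferred, modeling call-after-grace-period. */
typedef struct Device Device;
struct Device {
    int refcnt;
    bool visible;        /* reachable from guest and management */
    bool finalized;
    Device *rcu_next;    /* pending-reclaim list */
};

static Device *rcu_pending;  /* devices waiting for the grace period */

static void device_unref(Device *d)
{
    if (--d->refcnt == 0) {
        d->rcu_next = rcu_pending;  /* defer finalize, don't free now */
        rcu_pending = d;
    }
}

static void qdev_free_model(Device *d)
{
    d->visible = false;  /* removal: "free" is a misnomer, guest can't
                          * reach the device anymore; the exit callback
                          * (e.g. qemu_bh_delete) would run here */
    device_unref(d);     /* may queue the device for reclamation */
}

static void rcu_grace_period_end(void)
{
    for (Device *d = rcu_pending; d; d = d->rcu_next) {
        d->finalized = true;        /* instance_finalize runs here */
    }
    rcu_pending = NULL;
}
```

Between `qdev_free_model()` and `rcu_grace_period_end()` the device is invisible but not yet freed, which is exactly the window in which a still-running bh callback remains safe.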
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: liu ping fan @ 2013-06-27 2:08 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 26, 2013 at 5:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
>> I.e. the removal cannot call qemu_bh_delete(), so it
>> does not know whether the bh will be scheduled or not.
>
> It can call qemu_bh_delete(), then all concurrent qemu_bh_schedule()
> will be no-ops.
>
Great, with this ability, we can achieve it in a more elegant way --
RCU!  BTW, the responsibility is also assigned to the guest driver with
respect to its unplug behavior.

> qdev_free is triggered by the guest, but free is a misnomer.  It is
> really "make it inaccessible from the guest and management" (the kernel
> equivalent would be removal of /dev and /sys entries, for example).  The
> actual "free" will happen later.
>
Without having seen your detailed design, I suggest leaving "exit" as it
is, and picking the inaccessibility-related code out into removal.
Finally, when the refcount drops to 0, exit is called, and it acts as
the final sync point for the remaining accesses.

Regards,
Pingfan

> [...]
>
> Paolo
* Re: [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug
From: Paolo Bonzini @ 2013-06-27 6:59 UTC (permalink / raw)
To: liu ping fan; +Cc: qemu-devel, Stefan Hajnoczi

On 27/06/2013 04:08, liu ping fan wrote:
>> qdev_free is triggered by the guest, but free is a misnomer.  It is
>> really "make it inaccessible from the guest and management" (the kernel
>> equivalent would be removal of /dev and /sys entries, for example).  The
>> actual "free" will happen later.
>
> Without having seen your detailed design, I suggest leaving "exit" as it
> is, and picking the inaccessibility-related code out into removal.

We already have a reclamation point, it is instance_finalize.  I posted
a series a few weeks ago ("[PATCH 00/39] Delay destruction of memory
regions to instance_finalize").

> Finally, when the refcount drops to 0, exit is called, and it acts as
> the final sync point for the remaining accesses.

It is the guest that determines when to start the removal phase.  That's
qdev_free.  refcnt = 0 means that the memory is inaccessible to the
guest, and that's when the reclamation phase is started (asynchronously:
the instance_finalize callback is actually called at the end of the RCU
grace period).

Paolo
* [Qemu-devel] [PATCH 1/3] QEMUBH: introduce canceled member for bh
From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

When dismissing a bh's scheduling (delete/cancel), we need to reset
bh->scheduled to zero and release the refcount of the object referred to
by the bh (introduced by the next patch).  Currently, bh->scheduled
would be reset to zero by many writers, so atomic ops would have to be
involved, which results in expensive memory barriers.  With this patch,
bh->scheduled is only reset by aio_bh_poll().

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 async.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/async.c b/async.c
index e73b93c..bea3d7e 100644
--- a/async.c
+++ b/async.c
@@ -38,6 +38,7 @@ struct QEMUBH {
     bool scheduled;
     bool idle;
     bool deleted;
+    bool canceled;
 };
 
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
@@ -69,8 +70,15 @@ int aio_bh_poll(AioContext *ctx)
         /* Make sure that fetching bh happens before accessing its members */
         smp_read_barrier_depends();
         next = bh->next;
-        if (!bh->deleted && bh->scheduled) {
+        if (bh->scheduled) {
             bh->scheduled = 0;
+            if (unlikely(bh->deleted)) {
+                continue;
+            }
+            if (unlikely(bh->canceled)) {
+                bh->canceled = 0;
+                continue;
+            }
             /* Paired with write barrier in bh schedule to ensure reading for
              * idle & callbacks coming after bh's scheduling.
              */
@@ -133,7 +141,7 @@ void qemu_bh_schedule(QEMUBH *bh)
  */
 void qemu_bh_cancel(QEMUBH *bh)
 {
-    bh->scheduled = 0;
+    bh->canceled = 1;
 }
 
 /* This func is async. The bottom half will do the delete action at the final
@@ -141,7 +149,6 @@ void qemu_bh_cancel(QEMUBH *bh)
  */
 void qemu_bh_delete(QEMUBH *bh)
 {
-    bh->scheduled = 0;
     bh->deleted = 1;
 }
 
@@ -152,7 +159,7 @@ aio_ctx_prepare(GSource *source, gint *timeout)
     QEMUBH *bh;
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
+        if (!bh->deleted && !bh->canceled && bh->scheduled) {
             if (bh->idle) {
                 /* idle bottom halves will be polled at least
                  * every 10ms */
@@ -176,7 +183,7 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (!bh->deleted && bh->scheduled) {
+        if (!bh->deleted && !bh->canceled && bh->scheduled) {
             return true;
         }
     }
-- 
1.8.1.4
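The single-writer rule in this patch can be illustrated with a toy model (the field names follow the patch, everything else is invented for illustration): qemu_bh_cancel() and qemu_bh_delete() only set their own flags, and bh->scheduled is cleared exclusively by the poll loop, so no other thread ever stores to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of patch 1's flag semantics: cancel and delete are
 * one-sided flag writes; only the poll loop clears "scheduled". */
typedef struct {
    bool scheduled, deleted, canceled;
    int runs;
} ToyBH;

static void toy_bh_cancel(ToyBH *bh) { bh->canceled = true; }
static void toy_bh_delete(ToyBH *bh) { bh->deleted = true; }

static void toy_aio_bh_poll(ToyBH *bh)
{
    if (bh->scheduled) {
        bh->scheduled = false;       /* the only writer that clears it */
        if (bh->deleted) {
            return;                  /* deleted: never run again */
        }
        if (bh->canceled) {
            bh->canceled = false;    /* cancel is one-shot, as in the patch */
            return;
        }
        bh->runs++;                  /* stands in for bh->cb(bh->opaque) */
    }
}
```

A canceled bh skips exactly one dispatch and can be scheduled again afterwards, while a deleted bh never runs again; that difference is why the patch needs two flags instead of clearing `scheduled` from the cancel/delete paths.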
* [Qemu-devel] [PATCH 2/3] QEMUBH: pin bh's referring object while scheduling
  2013-06-25 17:38 [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug Liu Ping Fan
  2013-06-25  6:24 ` Paolo Bonzini
  2013-06-25 17:38 ` [Qemu-devel] [PATCH 1/3] QEMUBH: introduce canceled member for bh Liu Ping Fan
@ 2013-06-25 17:38 ` Liu Ping Fan
  2013-06-25 17:38 ` [Qemu-devel] [PATCH 3/3] virtio-net: set referred object for virtio net's bh Liu Ping Fan
  3 siblings, 0 replies; 15+ messages in thread
From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

When outside the big lock, an object can be unplugged while its bh is
still scheduled; use a refcount on bh->obj to pin the object.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
Another choice may be RCU:

  rcu_read_lock protects aio_bh_poll()

  qemu_bh_delete();
  smp_wmb();
  object_unref(obj);

and the object is reclaimed after a grace period.  NOTE: obj is not
accessed from the readers' side, since even if the bh survives into
the next grace period, bh->deleted forbids the access to bh->cb.
---
 async.c             | 24 ++++++++++++++++++++++--
 include/block/aio.h |  6 ++++++
 stubs/Makefile.objs |  1 +
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index bea3d7e..0b977b3 100644
--- a/async.c
+++ b/async.c
@@ -34,6 +34,10 @@ struct QEMUBH {
     AioContext *ctx;
     QEMUBHFunc *cb;
     void *opaque;
+    /* Object which is referred by bh's callback. It can be dissappear and
+     * should be pinned when bh is scheduling
+     */
+    Object *obj;
     QEMUBH *next;
     bool scheduled;
     bool idle;
@@ -73,11 +77,11 @@ int aio_bh_poll(AioContext *ctx)
         if (bh->scheduled) {
             bh->scheduled = 0;
             if (unlikely(bh->deleted)) {
-                continue;
+                goto release_obj;
             }
             if (unlikely(bh->canceled)) {
                 bh->canceled = 0;
-                continue;
+                goto release_obj;
             }
             /* Paired with write barrier in bh schedule to ensure reading for
              * idle & callbacks coming after bh's scheduling.
@@ -87,6 +91,10 @@ int aio_bh_poll(AioContext *ctx)
             ret = 1;
             bh->idle = 0;
             bh->cb(bh->opaque);
+release_obj:
+            if (bh->obj) {
+                object_unref(bh->obj);
+            }
         }
     }
 
@@ -116,6 +124,9 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
     if (bh->scheduled)
         return;
     bh->idle = 1;
+    if (bh->obj) {
+        object_ref(bh->obj);
+    }
     /* Make sure that idle & any writes needed by the callback are done
      * before the locations are read in the aio_bh_poll.
      */
@@ -128,6 +139,9 @@ void qemu_bh_schedule(QEMUBH *bh)
     if (bh->scheduled)
         return;
     bh->idle = 0;
+    if (bh->obj) {
+        object_ref(bh->obj);
+    }
     /* Make sure that idle & any writes needed by the callback are done
      * before the locations are read in the aio_bh_poll.
      */
@@ -152,6 +166,12 @@ void qemu_bh_delete(QEMUBH *bh)
     bh->deleted = 1;
 }
 
+void qemu_bh_set_obj(QEMUBH *bh, Object *obj)
+{
+    assert(obj);
+    bh->obj = obj;
+}
+
 static gboolean
 aio_ctx_prepare(GSource *source, gint *timeout)
 {
diff --git a/include/block/aio.h b/include/block/aio.h
index cc77771..7da93d3 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -15,6 +15,7 @@
 #define QEMU_AIO_H
 
 #include "qemu-common.h"
+#include "qom/object.h"
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
@@ -175,6 +176,11 @@ void qemu_bh_cancel(QEMUBH *bh);
  */
 void qemu_bh_delete(QEMUBH *bh);
 
+/* @obj which is referred by bh's callback. It can be dissappear and
+ * should be pinned when bh is scheduling.
+ */
+void qemu_bh_set_obj(QEMUBH *bh, Object *obj);
+
 /* Return whether there are any pending callbacks from the GSource
  * attached to the AioContext.
  *
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 03dff20..fc80562 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -24,3 +24,4 @@ stub-obj-y += vm-stop.o
 stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
+stub-obj-y += object.o
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 3/3] virtio-net: set referred object for virtio net's bh
  2013-06-25 17:38 [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug Liu Ping Fan
                   ` (2 preceding siblings ...)
  2013-06-25 17:38 ` [Qemu-devel] [PATCH 2/3] QEMUBH: pin bh's referring object while scheduling Liu Ping Fan
@ 2013-06-25 17:38 ` Liu Ping Fan
  3 siblings, 0 replies; 15+ messages in thread
From: Liu Ping Fan @ 2013-06-25 17:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Stefan Hajnoczi

Expose the object to the bh, so the bh will pin the virtio-net device
against being unplugged in parallel.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1ea9556..7c0ded9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1124,6 +1124,7 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue)
             n->vqs[i].tx_vq = virtio_add_queue(vdev, 256,
                                                virtio_net_handle_tx_bh);
             n->vqs[i].tx_bh = qemu_bh_new(virtio_net_tx_bh, &n->vqs[i]);
+            qemu_bh_set_obj(n->vqs[i].tx_bh, OBJECT(n));
         }
 
         n->vqs[i].tx_waiting = 0;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-06-27  6:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2013-06-25 17:38 [Qemu-devel] [PATCH 0/3] make bh safe with hot-unplug Liu Ping Fan
2013-06-25  6:24 ` Paolo Bonzini
2013-06-25  6:32   ` liu ping fan
2013-06-25  7:53     ` Paolo Bonzini
2013-06-26  2:59       ` liu ping fan
2013-06-26  6:34         ` Paolo Bonzini
2013-06-26  8:20           ` liu ping fan
2013-06-26  8:38             ` Paolo Bonzini
2013-06-26  9:44               ` liu ping fan
2013-06-26  9:55                 ` Paolo Bonzini
2013-06-27  2:08                   ` liu ping fan
2013-06-27  6:59                     ` Paolo Bonzini
2013-06-25 17:38 ` [Qemu-devel] [PATCH 1/3] QEMUBH: introduce canceled member for bh Liu Ping Fan
2013-06-25 17:38 ` [Qemu-devel] [PATCH 2/3] QEMUBH: pin bh's referring object while scheduling Liu Ping Fan
2013-06-25 17:38 ` [Qemu-devel] [PATCH 3/3] virtio-net: set referred object for virtio net's bh Liu Ping Fan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).