* [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-19 3:02 UTC
To: qemu-devel, Anthony Liguori, Avi Kivity, Jan Kiszka, Paolo Bonzini, Marcelo Tosatti

Currently, cpu_physical_memory_rw() can be used directly or indirectly by an
mmio-dispatcher to access another device's memory region.  This can cause
problems when adopting per-device private locks.

Background:
  http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01481.html
The short version:
  http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01878.html

--1st. Recursive acquisition of the big lock.

If we leave c_p_m_rw() as it is, i.e. with no lock taken inside, then we can
have the following (a section of the whole call chain, with private_lockA
held):

  lockA-mmio-dispatcher --> hold biglock --> c_p_m_rw() --> ...

Before c_p_m_rw(), we drop private_lockA to avoid the possibility of
deadlock.  But we cannot prevent this chain from nesting, or from calling
into another lockB-mmio-dispatcher, so we cannot avoid the possibility of
taking the big lock recursively.  Another important factor is that we would
break the lock ordering private_lock --> biglock.  All of this requires us
to push acquisition of the big lock into c_p_m_rw() itself; a wrapper cannot
help.

--2nd. c_p_m_rw(): sync or async?

If we convert all devices to be protected by a refcount, then we can have:

  /* no big lock */
  c_p_m_rw()
  {
      devB->ref++;
      {                                  /* pushed onto another thread */
          lock_privatelock
          mr->ops->write();
          unlock_privatelock
      }
      wait_for_completion();
      devB->ref--;
  }

This model lets c_p_m_rw() present itself as a SYNC API.  But currently we
mix the big lock and private locks together, and wait_for_completion() may
block the release of the big lock, which finally causes a deadlock.  So we
cannot simply rely on this model.  Instead, we need to classify the calling
scenarios into three cases:

  case1. lockA-dispatcher ---> lockB-dispatcher    // can use the
         async+completion model
  case2. lockA-dispatcher ---> biglock-dispatcher  // sync, but can cause
         recursive acquisition of the big lock
  case3. biglock-dispatcher ---> lockB-dispatcher  // async, to avoid the
         lock-ordering problem (as for the completion, it would need to be
         placed outside the top-level big lock, which is hard to do;
         suggest changing this to case 1, or, for now, just leaving it
         async)

This new model requires that the big lock can be nested.

Any comments?

Thanks and regards,
pingfan
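For concreteness, the async+completion model above could be sketched as
follows.  This is a minimal illustration using plain pthreads rather than
QEMU's real primitives, and every name in it (DevB, WriteReq, worker,
dispatch_write) is a hypothetical stand-in:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct DevB {
        pthread_mutex_t private_lock;   /* the device's fine-grained lock */
        int ref;                        /* keeps devB alive while a request
                                         * is in flight */
        uint64_t reg;
    } DevB;

    typedef struct WriteReq {
        DevB *dev;
        uint64_t val;
        bool done;
        pthread_mutex_t done_lock;
        pthread_cond_t done_cond;
    } WriteReq;

    /* Worker thread: only the device's private lock is held here. */
    static void *worker(void *opaque)
    {
        WriteReq *req = opaque;

        pthread_mutex_lock(&req->dev->private_lock);
        req->dev->reg = req->val;       /* stands in for mr->ops->write() */
        pthread_mutex_unlock(&req->dev->private_lock);

        pthread_mutex_lock(&req->done_lock);
        req->done = true;
        pthread_cond_signal(&req->done_cond);
        pthread_mutex_unlock(&req->done_lock);
        return NULL;
    }

    /* Caller side: a synchronous facade over an async dispatch; note that
     * no big lock is taken anywhere on this path. */
    static void dispatch_write(DevB *dev, uint64_t val)
    {
        WriteReq req = { .dev = dev, .val = val, .done = false };
        pthread_t tid;

        pthread_mutex_init(&req.done_lock, NULL);
        pthread_cond_init(&req.done_cond, NULL);

        __atomic_fetch_add(&dev->ref, 1, __ATOMIC_SEQ_CST);  /* devB->ref++ */
        pthread_create(&tid, NULL, worker, &req);

        pthread_mutex_lock(&req.done_lock);          /* wait_for_completion() */
        while (!req.done) {
            pthread_cond_wait(&req.done_cond, &req.done_lock);
        }
        pthread_mutex_unlock(&req.done_lock);
        pthread_join(tid, NULL);

        __atomic_fetch_add(&dev->ref, -1, __ATOMIC_SEQ_CST); /* devB->ref-- */
    }

    int main(void)
    {
        DevB dev = { .private_lock = PTHREAD_MUTEX_INITIALIZER };
        dispatch_write(&dev, 42);
        return 0;
    }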
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 8:06 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, Anthony Liguori, Paolo Bonzini

On 09/19/2012 06:02 AM, liu ping fan wrote:
> --1st. Recursive acquisition of the big lock.
> [...]
> All of this requires us to push acquisition of the big lock into
> c_p_m_rw() itself; a wrapper cannot help.

I agree that this is unavoidable.

> --2nd. c_p_m_rw(): sync or async?
> [...]
> This new model requires that the big lock can be nested.

I think changing to an async model is too complicated.  It's difficult
enough already.  Isn't dropping private locks + recursive big locks
sufficient?

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-19 9:00 UTC
To: Avi Kivity
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, Anthony Liguori, Paolo Bonzini

On Wed, Sep 19, 2012 at 4:06 PM, Avi Kivity <avi@redhat.com> wrote:
> On 09/19/2012 06:02 AM, liu ping fan wrote:
>> [...]
>
> I think changing to an async model is too complicated.  It's difficult
> enough already.  Isn't dropping private locks + recursive big locks
> sufficient?

I think that "dropping private locks + recursive big locks" only covers
case 2.  Most importantly, it does not describe case 3, which breaks the
lock-ordering rule "private-lock --> biglock".  Scenario:
devA_lock --> (devX_with_biglock --> devB_lock).
I just wanted to classify these cases and post them for discussion.
Maybe we can achieve this without async.

Thanks and regards,
pingfan
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 9:07 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, Anthony Liguori, Paolo Bonzini

On 09/19/2012 12:00 PM, liu ping fan wrote:
> [...]
> I think that "dropping private locks + recursive big locks" only covers
> case 2.  Most importantly, it does not describe case 3, which breaks
> the lock-ordering rule "private-lock --> biglock".  Scenario:
> devA_lock --> (devX_with_biglock --> devB_lock).

Why not?  devA will drop its local lock, devX will retake the big lock
recursively, devB will take its local lock.  In the end, we have
biglock -> devB.

> I just wanted to classify these cases and post them for discussion.
> Maybe we can achieve this without async.

-- 
error compiling committee.c: too many arguments to function
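The sequence Avi describes can be sketched like this; all names (big_lock,
devA, devB, devX_write) are hypothetical stand-ins, not QEMU code:

    #include <pthread.h>

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

    typedef struct { pthread_mutex_t lock; int reg; } Dev;
    static Dev devA = { PTHREAD_MUTEX_INITIALIZER, 0 };
    static Dev devB = { PTHREAD_MUTEX_INITIALIZER, 0 };

    /* devB is fine-grain locked: its dispatcher takes only devB.lock. */
    static void devB_write(int val)
    {
        pthread_mutex_lock(&devB.lock);
        devB.reg = val;
        pthread_mutex_unlock(&devB.lock);
    }

    /* devX is still under the big lock; DMA from it into devB only ever
     * produces the order big_lock -> devB.lock. */
    static void devX_write(int val)
    {
        pthread_mutex_lock(&big_lock);
        devB_write(val);                 /* big_lock -> devB.lock */
        pthread_mutex_unlock(&big_lock);
    }

    /* devA drops its local lock *before* the access that re-enters the
     * big lock, so devA.lock -> big_lock never appears in the lock graph. */
    static void devA_write(int val)
    {
        pthread_mutex_lock(&devA.lock);
        devA.reg = val;
        pthread_mutex_unlock(&devA.lock); /* drop the fine lock first */

        devX_write(val);                  /* retakes the big lock */
    }

    int main(void) { devA_write(1); return 0; }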
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-19 9:11 UTC
To: Avi Kivity
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, Anthony Liguori, Paolo Bonzini

On Wed, Sep 19, 2012 at 5:07 PM, Avi Kivity <avi@redhat.com> wrote:
> [...]
> Why not?  devA will drop its local lock, devX will retake the big lock
> recursively, devB will take its local lock.  In the end, we have
> biglock -> devB.

But when adopting a local lock, we assumed the order "take the local
lock, then the big lock".  Otherwise another thread may take the big
lock and then a local lock, which creates the possibility of deadlock.
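The deadlock pingfan worries about here is the classic ABBA inversion; a
minimal sketch with hypothetical lock names, where one thread takes
fine->coarse while another takes coarse->fine:

    #include <pthread.h>

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *fine_then_coarse(void *arg)  /* the order pingfan assumes */
    {
        pthread_mutex_lock(&dev_lock);
        pthread_mutex_lock(&big_lock);        /* may wait on the other thread */
        pthread_mutex_unlock(&big_lock);
        pthread_mutex_unlock(&dev_lock);
        return NULL;
    }

    static void *coarse_then_fine(void *arg)  /* the order Paolo mandates */
    {
        pthread_mutex_lock(&big_lock);
        pthread_mutex_lock(&dev_lock);        /* ... while holding big_lock */
        pthread_mutex_unlock(&dev_lock);
        pthread_mutex_unlock(&big_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, fine_then_coarse, NULL);
        pthread_create(&t2, NULL, coarse_then_fine, NULL);
        pthread_join(t1, NULL);   /* can hang: that hang is the deadlock */
        pthread_join(t2, NULL);
        return 0;
    }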
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Paolo Bonzini @ 2012-09-19 9:14 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel

Il 19/09/2012 11:11, liu ping fan ha scritto:
>> Why not?  devA will drop its local lock, devX will retake the big lock
>> recursively, devB will take its local lock.  In the end, we have
>> biglock -> devB.
>
> But when adopting a local lock, we assumed the order "take the local
> lock, then the big lock".

No, because the local lock will be dropped before taking the big lock.
The order must always be coarse->fine.

I don't know if the front-end (device) lock should come before or after
the back-end (e.g. netdev) lock in the hierarchy, but that's another
story.

Paolo

> Otherwise another thread may take the big lock and then a local lock,
> which creates the possibility of deadlock.
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-19 9:19 UTC
To: Paolo Bonzini
Cc: Jan Kiszka, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel

On Wed, Sep 19, 2012 at 5:14 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> No, because the local lock will be dropped before taking the big lock.
> The order must always be coarse->fine.

But if we take the coarse lock first, then the mmio-dispatchers will
still contend for the big lock against each other.

> I don't know if the front-end (device) lock should come before or after
> the back-end (e.g. netdev) lock in the hierarchy, but that's another
> story.

I think fine->coarse may be the rule, since on some code paths it is not
necessary to take the coarse lock at all.

Thanks and regards,
pingfan
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 9:23 UTC
To: liu ping fan
Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel, Anthony Liguori, Jan Kiszka

On 09/19/2012 12:19 PM, liu ping fan wrote:
> But if we take the coarse lock first, then the mmio-dispatchers will
> still contend for the big lock against each other.

Can you detail the sequence?

> I think fine->coarse may be the rule, since on some code paths it is
> not necessary to take the coarse lock at all.

coarse->fine doesn't mean you have to take the coarse lock.

Valid:
  lock(coarse)
  lock(fine)

Valid:
  lock(fine)

Valid:
  lock(coarse)

Invalid:
  lock(fine)
  lock(coarse)

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Jan Kiszka @ 2012-09-19 9:27 UTC
To: Avi Kivity
Cc: Paolo Bonzini, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org

On 2012-09-19 11:23, Avi Kivity wrote:
> [...]
> coarse->fine doesn't mean you have to take the coarse lock.
>
> Valid:
>   lock(coarse)
>   lock(fine)

This is invalid due to priority-inversion issues, independent of
potential deadlocks.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Jan Kiszka @ 2012-09-19 9:28 UTC
Cc: Marcelo Tosatti, qemu-devel@nongnu.org, liu ping fan, Avi Kivity, Anthony Liguori, Paolo Bonzini

On 2012-09-19 11:27, Jan Kiszka wrote:
> [...]
> This is invalid due to priority-inversion issues, independent of
> potential deadlocks.

Err, forget it - the other way around is the problem.

Sorry,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-20 7:51 UTC
To: Avi Kivity
Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel, Anthony Liguori, Jan Kiszka

On Wed, Sep 19, 2012 at 5:23 PM, Avi Kivity <avi@redhat.com> wrote:
> [...]
> Can you detail the sequence?

LOCK(local lock)
  .......................
  LOCK(big lock)
  access timer/block/network subsystem
  UNLOCK(big lock)
  .....................
UNLOCK(local lock)

> coarse->fine doesn't mean you have to take the coarse lock.

Sorry, I don't catch your meaning.  Doesn't "coarse->fine" mean
LOCK(coarse) --> LOCK(fine); ... UNLOCK(fine) --> UNLOCK(coarse)?

> Valid:
>   lock(coarse)
>   lock(fine)

But that conflicts with "localLock(fine) --> bigLock(coarse)", which is
the order taken during mmio dispatch.

Regards,
pingfan
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Paolo Bonzini @ 2012-09-20 7:54 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel

Il 20/09/2012 09:51, liu ping fan ha scritto:
> Sorry, I don't catch your meaning.  Doesn't "coarse->fine" mean
> LOCK(coarse) --> LOCK(fine); ... UNLOCK(fine) --> UNLOCK(coarse)?

Yes.

> But that conflicts with "localLock(fine) --> bigLock(coarse)", which is
> the order taken during mmio dispatch.

No, MMIO dispatch has to discard the fine-grained lock before acquiring
the big lock.

If you allow

  lock(fine)
  lock(coarse)

then the (presumably higher-priority) thread that is requesting the
fine-grained lock must wait for the lower-priority thread that holds the
coarse-grained lock.  Then you get priority inversion.

Paolo
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-20 8:09 UTC
To: Paolo Bonzini
Cc: Jan Kiszka, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel

On Thu, Sep 20, 2012 at 3:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
> No, MMIO dispatch has to discard the fine-grained lock before acquiring
> the big lock.

This will leave the device state broken, and expose the device to the
risk of concurrent modification.

> If you allow
>
>   lock(fine)
>   lock(coarse)
>
> then the (presumably higher-priority) thread that is requesting the
> fine-grained lock must wait for the lower-priority thread that holds
> the coarse-grained lock.  Then you get priority inversion.

Which thread has higher priority?  Why does the thread with the coarse
lock have lower priority?

Thanks and regards,
pingfan
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Paolo Bonzini @ 2012-09-20 8:27 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel

Il 20/09/2012 10:09, liu ping fan ha scritto:
>> No, MMIO dispatch has to discard the fine-grained lock before
>> acquiring the big lock.
>
> This will leave the device state broken, and expose the device to the
> risk of concurrent modification.

We do it all the time with asynchronous I/O.  It's just an invariant
that fine-grain-locked devices have to obey.  You cannot keep the same
semantics you enjoyed when a single lock covered everything.

>> If you allow
>>
>>   lock(fine)
>>   lock(coarse)
>>
>> then the (presumably higher-priority) thread that is requesting the
>> fine-grained lock must wait for the lower-priority thread that holds
>> the coarse-grained lock.  Then you get priority inversion.
>
> Which thread has higher priority?  Why does the thread with the coarse
> lock have lower priority?

Because there can be so many things happening under the coarse lock that
just _assuming_ it has lower priority is a good thing.  On the other
hand, if the device didn't need immediate servicing (which means higher
priority), you wouldn't bother taking it out of the big QEMU lock.

Paolo
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-20 9:07 UTC
To: liu ping fan
Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel, Anthony Liguori, Jan Kiszka

On 09/20/2012 10:51 AM, liu ping fan wrote:
> LOCK(local lock)
>   .......................
>   LOCK(big lock)
>   access timer/block/network subsystem
>   UNLOCK(big lock)
>   .....................
> UNLOCK(local lock)

This is an invalid sequence.  Either the subsystem has to be fine-grain
locked, or the lock order has to be reversed.

Before we finish subsystem conversion, an mmio dispatcher may look like:

  dev_write(...)
  {
      lock(s->lock)
      switch (addr) {
      case REGA:
          ...
      case REGB:
          ...
      case REGC:
          unlock(s->lock)
          lock(big lock)
          lock(s->lock)
          qemu_mod_timer()
          unlock(big lock)
          break;
      ...
      }
      unlock(s->lock)
  }

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-21 7:27 UTC
To: qemu-devel, Paolo Bonzini
Cc: Jan Kiszka, Marcelo Tosatti, Anthony Liguori

On Thu, Sep 20, 2012 at 5:07 PM, Avi Kivity <avi@redhat.com> wrote:
> [...]
> This is an invalid sequence.  Either the subsystem has to be fine-grain
> locked, or the lock order has to be reversed.

Oh!  From this thread, my understanding of the reason for the lock-order
rule coarse->fine is this: the big lock implies a higher possibility of
contention, so we try it first, and only then take the fine lock.  That
way we hold the fine lock for a smaller window, which means other
threads can acquire it more smoothly.  Right?

Not wanting to start an argument, just a question: is there any reason
for the order devlock->timerlock?

Regards,
pingfan
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Paolo Bonzini @ 2012-09-21 8:21 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, Anthony Liguori

> Oh!  From this thread, my understanding of the reason for the
> lock-order rule coarse->fine is this: the big lock implies a higher
> possibility of contention, so we try it first, and only then take the
> fine lock.  That way we hold the fine lock for a smaller window, which
> means other threads can acquire it more smoothly.  Right?

Yes.

> Not wanting to start an argument, just a question: is there any reason
> for the order devlock->timerlock?

Not any particular reason, just that it makes more sense. :)

Backend subsystems (timers, network, I/O handlers, ...) typically will
not call anything else.  So it makes sense that their locks are held
inside the device locks (and inside the big lock).

Also note that while the timer subsystem could call back into the
devices, it is perfectly plausible that the devices will do, for
example, a qemu_mod_timer from a timer callback.  Thus, in some sense
the timer subsystem already has to expect modifications of its state
during the callback.  Releasing the lock during callbacks is "just" an
extension of the current behavior.

Of course some changes are needed to the invariants, such as the one Avi
pointed out elsewhere in the thread, but devlock->timerlock is overall
more natural than the opposite.

Paolo
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 9:21 UTC
To: Paolo Bonzini
Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel

On 09/19/2012 12:14 PM, Paolo Bonzini wrote:
> I don't know if the front-end (device) lock should come before or after
> the back-end (e.g. netdev) lock in the hierarchy, but that's another
> story.

I would say device -> backend.  It's natural if the backend is the timer
subsystem, so extend it from there to the block and network layers.  Of
course callbacks want it to be the other way round.

We could solve the callback problem in the same way we're using for mmio
callbacks.  When providing the callback, give the subsystem ->ref() and
->unref() methods.  As long as a callback is pending, the subsystem
holds a reference.  Once the callback has been invoked or cancelled, the
reference can be released.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Paolo Bonzini @ 2012-09-19 9:51 UTC
To: Avi Kivity
Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel

Il 19/09/2012 11:21, Avi Kivity ha scritto:
>> I don't know if the front-end (device) lock should come before or
>> after the back-end (e.g. netdev) lock in the hierarchy, but that's
>> another story.
>
> I would say device -> backend.  It's natural if the backend is the
> timer subsystem, so extend it from there to the block and network
> layers.  Of course callbacks want it to be the other way round.

Yes, that's what I wasn't sure about.  In many cases I believe callbacks
can just release the backend lock.  It works for timers, for example:

     for(;;) {
         ts = clock->active_timers;
         if (!qemu_timer_expired_ns(ts, current_time)) {
             break;
         }
         /* remove timer from the list before calling the callback */
         clock->active_timers = ts->next;
         ts->next = NULL;

         /* run the callback (the timer list can be modified) */
-        ts->cb(ts->opaque);
+        cb = ts->cb;
+        opaque = ts->opaque;
+        unlock();
+        cb(opaque);
+        lock();
     }

(The hunch is that ts could be deleted exactly at the moment the
callback is unlocked.  This can be solved with ref/unref on the opaque
value, as you mention below.)

Paolo

> We could solve the callback problem in the same way we're using for
> mmio callbacks.  When providing the callback, give the subsystem
> ->ref() and ->unref() methods.  As long as a callback is pending, the
> subsystem holds a reference.  Once the callback has been invoked or
> cancelled, the reference can be released.
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 10:06 UTC
To: Paolo Bonzini
Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel

On 09/19/2012 12:51 PM, Paolo Bonzini wrote:
> Yes, that's what I wasn't sure about.  In many cases I believe
> callbacks can just release the backend lock.  It works for timers, for
> example:
> [...]
> (The hunch is that ts could be deleted exactly at the moment the
> callback is unlocked.  This can be solved with ref/unref on the opaque
> value, as you mention below.)

Are you saying that this works as is, or not?  It does seem broken wrt
deletion; after qemu_del_timer() completes, the caller expects the
callback not to be called.  This isn't trivial to guarantee; we need
something like

  dispatch_timer():
      pending += 1
      timer.ref()
      drop lock
      timer.cb()
      take lock
      timer.unref()
      pending -= 1
      notify

  del_timer():
      take lock
      timer.unlink()
      while pending:
          wait for notification
      drop lock

but, if del_timer is called with the device lock held, we deadlock.  Ugh.

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Paolo Bonzini @ 2012-09-19 10:19 UTC
To: Avi Kivity
Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel

Il 19/09/2012 12:06, Avi Kivity ha scritto:
>> (The hunch is that ts could be deleted exactly at the moment the
>> callback is unlocked.  This can be solved with ref/unref on the opaque
>> value, as you mention below.)
>
> Are you saying that this works as is, or not?  It does seem broken wrt
> deletion; after qemu_del_timer() completes, the caller expects the
> callback not to be called.

Ouch, then I think the only solution is to remove this invariant if you
add fine-grained locking, and re-check the enable conditions in the
timer callback.

If you allow del_timer to be called with the device lock held, you
cannot fix this without deadlocking.  If you don't, the caller of
del_timer cannot set any expectation, because the timer could run in the
window between dropping the device lock and calling del_timer.

> This isn't trivial to guarantee; we need something like
> [...]
> but, if del_timer is called with the device lock held, we deadlock.  Ugh.

"Wait for notification" should be a cond_wait that drops the lock.
Alternatively:

  dispatch_timer():
      drop lock
      if timer has expired:
          timer.cb()
      take lock
      notify

  del_timer():
      take lock
      timer.unlink()
      drop lock

Paolo
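Paolo's cond_wait variant could look roughly like the following, again with
hypothetical names and plain pthreads instead of QEMU's timer code:

    #include <pthread.h>
    #include <stdbool.h>

    typedef struct Timer {
        void (*cb)(void *opaque);
        void *opaque;
        bool linked;        /* still on the active list? */
        int pending;        /* callbacks currently running */
    } Timer;

    static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t timer_done = PTHREAD_COND_INITIALIZER;

    static void dispatch_timer(Timer *t)
    {
        pthread_mutex_lock(&timer_lock);
        if (t->linked) {
            t->linked = false;
            t->pending++;
            pthread_mutex_unlock(&timer_lock);  /* drop lock around cb */
            t->cb(t->opaque);
            pthread_mutex_lock(&timer_lock);
            t->pending--;
            pthread_cond_broadcast(&timer_done); /* notify */
        }
        pthread_mutex_unlock(&timer_lock);
    }

    /* Must NOT be called with a device lock held, or this recreates the
     * deadlock Avi points out above. */
    static void del_timer(Timer *t)
    {
        pthread_mutex_lock(&timer_lock);
        t->linked = false;                       /* timer.unlink() */
        while (t->pending) {
            /* cond_wait drops timer_lock while waiting */
            pthread_cond_wait(&timer_done, &timer_lock);
        }
        pthread_mutex_unlock(&timer_lock);
    }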
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 10:27 UTC
To: Paolo Bonzini
Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel

On 09/19/2012 01:19 PM, Paolo Bonzini wrote:
> Ouch, then I think the only solution is to remove this invariant if you
> add fine-grained locking, and re-check the enable conditions in the
> timer callback.

I think so too.  I know authors of GUI frameworks suffer from this
problem (the GUI calling into the framework vs. the framework asking the
GUI to repaint).  I don't know what the state-of-the-art solution is.

> If you allow del_timer to be called with the device lock held, you
> cannot fix this without deadlocking.  If you don't, the caller of
> del_timer cannot set any expectation, because the timer could run in
> the window between dropping the device lock and calling del_timer.

Right.  Callbacks should check qemu_timer_deleted() after taking their
little lock.  The reference held by the timer subsystem protects against
qemu_free_timer().

-- 
error compiling committee.c: too many arguments to function
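On the callback side, the discipline Avi describes might look like this;
qemu_timer_deleted() is the predicate he proposes above, not an existing
QEMU function, so it appears here only as a stub, and MyDev is hypothetical:

    #include <pthread.h>
    #include <stdbool.h>

    typedef struct QEMUTimer QEMUTimer;

    /* Stub standing in for the proposed predicate. */
    static bool qemu_timer_deleted(QEMUTimer *t)
    {
        (void)t;
        return false;
    }

    typedef struct MyDev {
        pthread_mutex_t lock;    /* the device's "little lock" */
        QEMUTimer *timer;
        int counter;
    } MyDev;

    static void dev_timer_cb(void *opaque)
    {
        MyDev *s = opaque;

        pthread_mutex_lock(&s->lock);
        if (qemu_timer_deleted(s->timer)) {
            /* deletion won the race in the unlocked window; the timer
             * subsystem's reference still keeps the timer memory alive,
             * so this check is safe, and we simply do nothing */
            pthread_mutex_unlock(&s->lock);
            return;
        }
        s->counter++;            /* normal timer work; state consistent */
        pthread_mutex_unlock(&s->lock);
    }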
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Jan Kiszka @ 2012-09-19 9:34 UTC
To: liu ping fan
Cc: Paolo Bonzini, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel@nongnu.org

On 2012-09-19 11:00, liu ping fan wrote:
> [...]
> I think that "dropping private locks + recursive big locks" only covers
> case 2.  Most importantly, it does not describe case 3, which breaks
> the lock-ordering rule "private-lock --> biglock".  Scenario:
> devA_lock --> (devX_with_biglock --> devB_lock).
> I just wanted to classify these cases and post them for discussion.
> Maybe we can achieve this without async.

What about the following:

What we really need to support in practice is an MMIO access triggering
a RAM access of the device model.  Scenarios where a device access
triggers another MMIO access could likely just be rejected without
causing trouble.

So, when we dispatch a request to a device, we mark that the current
thread is in an MMIO dispatch and reject any follow-up c_p_m_rw that
does _not_ target RAM, i.e. is another, nested MMIO request -
independent of its destination.  How much of the known issues would this
solve?  And what would remain open?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
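Jan's scheme could be sketched roughly as follows; the signature is
simplified and all helpers are toy stand-ins, not QEMU's real dispatch code:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static __thread bool in_mmio_dispatch;  /* per-thread dispatch marker */

    static uint8_t ram[0x1000];             /* toy RAM region */

    static bool is_ram(uint64_t addr)
    {
        return addr < sizeof(ram);
    }

    static void mmio_write_handler(uint64_t addr, void *buf, int len);

    /* Simplified stand-in for cpu_physical_memory_rw(): RAM is always
     * served, while a nested MMIO dispatch is rejected instead of
     * recursing. */
    static void c_p_m_rw(uint64_t addr, void *buf, int len, bool is_write)
    {
        if (is_ram(addr)) {
            if (is_write) {
                memcpy(ram + addr, buf, len);
            } else {
                memcpy(buf, ram + addr, len);
            }
            return;
        }
        if (in_mmio_dispatch) {
            /* follow-up request from inside a dispatcher that does not
             * target RAM: reject rather than nest MMIO dispatch */
            fprintf(stderr, "rejected nested MMIO at 0x%llx\n",
                    (unsigned long long)addr);
            return;
        }
        in_mmio_dispatch = true;
        mmio_write_handler(addr, buf, len);  /* device dispatch */
        in_mmio_dispatch = false;
    }

    /* Toy device handler: DMA into RAM is fine, nested MMIO is not. */
    static void mmio_write_handler(uint64_t addr, void *buf, int len)
    {
        c_p_m_rw(0x10, buf, len, true);           /* targets RAM: allowed */
        c_p_m_rw(0xfee00000ull, buf, len, true);  /* nested MMIO: rejected */
    }

    int main(void)
    {
        uint8_t v = 1;
        c_p_m_rw(0xfee00000ull, &v, 1, true);     /* top-level dispatch */
        return 0;
    }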
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-19 9:50 UTC
To: Jan Kiszka
Cc: Paolo Bonzini, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org

On 09/19/2012 12:34 PM, Jan Kiszka wrote:
> So, when we dispatch a request to a device, we mark that the current
> thread is in an MMIO dispatch and reject any follow-up c_p_m_rw that
> does _not_ target RAM, i.e. is another, nested MMIO request -
> independent of its destination.  How much of the known issues would
> this solve?  And what would remain open?

Various iommu-like devices re-dispatch I/O, for example to change
endianness, or for bitband.  I don't know whether that targets I/O
rather than RAM.

If they do, we can push the support into the memory API.  I think it's
acceptable as a short-term solution (short-term meaning as long as
needed).

-- 
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Jan Kiszka @ 2012-09-19 10:18 UTC
To: Avi Kivity
Cc: Paolo Bonzini, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org

On 2012-09-19 11:50, Avi Kivity wrote:
> Various iommu-like devices re-dispatch I/O, for example to change
> endianness, or for bitband.  I don't know whether that targets I/O
> rather than RAM.

If such dispatching is lock-less and we validate that it's not
recursive, then it should be fine even when not targeting RAM.  And we
could also factor out generic dispatchers like MMIO->PIO.

> If they do, we can push the support into the memory API.  I think it's
> acceptable as a short-term solution (short-term meaning as long as
> needed).

We will always have to avoid loops.  I think all we can improve is how
much we still allow, by better detecting what would be wrong.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: liu ping fan @ 2012-09-24 6:33 UTC
To: Avi Kivity
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On Wed, Sep 19, 2012 at 5:50 PM, Avi Kivity <avi@redhat.com> wrote:
> Various iommu-like devices re-dispatch I/O, for example to change
> endianness, or for bitband.  I don't know whether that targets I/O
> rather than RAM.

I have not found the exact code, but I think the call chain may look
like this:

  dev mmio-handler --> c_p_m_rw() --> iommu mmio-handler --> c_p_m_rw()

And I think you worry about the "c_p_m_rw() --> iommu mmio-handler"
case, right?  How about introducing a can_nest member in the
MemoryRegionOps of the iommu's memory region?

Regards,
pingfan

> If they do, we can push the support into the memory API.  I think it's
> acceptable as a short-term solution (short-term meaning as long as
> needed).
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
From: Avi Kivity @ 2012-09-24 7:44 UTC
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On 09/24/2012 08:33 AM, liu ping fan wrote:
> And I think you worry about the "c_p_m_rw() --> iommu mmio-handler"
> case, right?  How about introducing a can_nest member in the
> MemoryRegionOps of the iommu's memory region?

I would rather push the iommu logic into the memory API:

  memory_region_init_iommu(MemoryRegion *mr, const char *name,
                           MemoryRegion *target, MemoryRegionIOMMUOps *ops,
                           unsigned size)

  struct MemoryRegionIOMMUOps {
      target_physical_addr_t (*translate)(target_physical_addr_t addr,
                                          bool write);
      void (*fault)(target_physical_addr_t addr);
  };

I'll look at a proposal for this.  It's a generalized case of
memory_region_init_alias().

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
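As a sketch of how such an API might be used, keeping in mind that the API
itself is only a proposal in this thread, so every name below (the ops, the
device state, the window offset) is equally tentative:

    /* A toy IOMMU that redirects accesses into a fixed window, layered
     * over system memory; tentative usage of the proposed API. */
    static target_physical_addr_t my_iommu_translate(target_physical_addr_t addr,
                                                     bool write)
    {
        return addr + 0x40000000;     /* redirect into a fixed window */
    }

    static void my_iommu_fault(target_physical_addr_t addr)
    {
        /* raise a device-specific translation fault */
    }

    static MemoryRegionIOMMUOps my_iommu_ops = {
        .translate = my_iommu_translate,
        .fault     = my_iommu_fault,
    };

    /* in the hypothetical device's init code: */
    memory_region_init_iommu(&s->iommu_mr, "my-iommu", sysmem,
                             &my_iommu_ops, 0x1000);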
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-24 7:44 ` Avi Kivity @ 2012-09-24 8:32 ` liu ping fan 2012-09-24 9:42 ` Avi Kivity 0 siblings, 1 reply; 45+ messages in thread From: liu ping fan @ 2012-09-24 8:32 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On Mon, Sep 24, 2012 at 3:44 PM, Avi Kivity <avi@redhat.com> wrote: > On 09/24/2012 08:33 AM, liu ping fan wrote: >> On Wed, Sep 19, 2012 at 5:50 PM, Avi Kivity <avi@redhat.com> wrote: >> > On 09/19/2012 12:34 PM, Jan Kiszka wrote: >> >> >> >> What about the following: >> >> >> >> What we really need to support in practice is MMIO access triggers RAM >> >> access of device model. Scenarios where a device access triggers another >> >> MMIO access could likely just be rejected without causing troubles. >> >> >> >> So, when we dispatch a request to a device, we mark that the current >> >> thread is in a MMIO dispatch and reject any follow-up c_p_m_rw that does >> >> _not_ target RAM, ie. is another, nested MMIO request - independent of >> >> its destination. How much of the known issues would this solve? And what >> >> would remain open? >> > >> > Various iommu-like devices re-dispatch I/O, like changing endianness or >> > bitband. I don't know whether it targets I/O rather than RAM. >> > >> Have not found the exact code. But I think the call chain may look >> like this: dev mmio-handler --> c_p_m_rw() --> iommu mmio-handler --> >> c_p_m_rw() >> And I think you worry about the case for "c_p_m_rw() --> iommu >> mmio-handler". Right? How about introduce an member can_nest for >> MemoryRegionOps of iommu's mr? >> > > I would rather push the iommu logic into the memory API: > > memory_region_init_iommu(MemoryRegion *mr, const char *name, > MemoryRegion *target, MemoryRegionIOMMUOps *ops, > unsigned size) > > struct MemoryRegionIOMMUOps { > target_physical_addr_t (*translate)(target_physical_addr_t addr, > bool write); > void (*fault)(target_physical_addr_t addr); > }; > So I guess, after introduce this, the code logic in c_p_m_rw() will look like this c_p_m_rw(dev_virt_addr, ...) { mr = phys_page_lookup(); if (mr->iommu_ops) real_addr = translate(dev_virt_addr,..); ptr = qemu_get_ram_ptr(real_addr); memcpy(buf, ptr, sz); } > I'll look at a proposal for this. It's a generalized case of > memory_region_init_alias(). > > -- > I have a truly marvellous patch that fixes the bug which this > signature is too narrow to contain. > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-24 8:32 ` liu ping fan @ 2012-09-24 9:42 ` Avi Kivity 2012-09-27 3:13 ` liu ping fan 0 siblings, 1 reply; 45+ messages in thread From: Avi Kivity @ 2012-09-24 9:42 UTC (permalink / raw) To: liu ping fan Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On 09/24/2012 10:32 AM, liu ping fan wrote: > On Mon, Sep 24, 2012 at 3:44 PM, Avi Kivity <avi@redhat.com> wrote: >> On 09/24/2012 08:33 AM, liu ping fan wrote: >>> On Wed, Sep 19, 2012 at 5:50 PM, Avi Kivity <avi@redhat.com> wrote: >>> > On 09/19/2012 12:34 PM, Jan Kiszka wrote: >>> >> >>> >> What about the following: >>> >> >>> >> What we really need to support in practice is MMIO access triggers RAM >>> >> access of device model. Scenarios where a device access triggers another >>> >> MMIO access could likely just be rejected without causing troubles. >>> >> >>> >> So, when we dispatch a request to a device, we mark that the current >>> >> thread is in a MMIO dispatch and reject any follow-up c_p_m_rw that does >>> >> _not_ target RAM, ie. is another, nested MMIO request - independent of >>> >> its destination. How much of the known issues would this solve? And what >>> >> would remain open? >>> > >>> > Various iommu-like devices re-dispatch I/O, like changing endianness or >>> > bitband. I don't know whether it targets I/O rather than RAM. >>> > >>> Have not found the exact code. But I think the call chain may look >>> like this: dev mmio-handler --> c_p_m_rw() --> iommu mmio-handler --> >>> c_p_m_rw() >>> And I think you worry about the case for "c_p_m_rw() --> iommu >>> mmio-handler". Right? How about introduce an member can_nest for >>> MemoryRegionOps of iommu's mr? >>> >> >> I would rather push the iommu logic into the memory API: >> >> memory_region_init_iommu(MemoryRegion *mr, const char *name, >> MemoryRegion *target, MemoryRegionIOMMUOps *ops, >> unsigned size) >> >> struct MemoryRegionIOMMUOps { >> target_physical_addr_t (*translate)(target_physical_addr_t addr, >> bool write); >> void (*fault)(target_physical_addr_t addr); >> }; >> > So I guess, after introduce this, the code logic in c_p_m_rw() will > look like this > > c_p_m_rw(dev_virt_addr, ...) > { > mr = phys_page_lookup(); > if (mr->iommu_ops) > real_addr = translate(dev_virt_addr,..); > > ptr = qemu_get_ram_ptr(real_addr); > memcpy(buf, ptr, sz); > } > Something like that. It will be a while loop, to allow for iommus strung in series. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
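Spelled out, that loop might look as follows. phys_page_lookup() and the iommu_ops member are shorthand carried over from the snippets above, and the depth guard is an extra assumption to keep a mis-programmed chain of iommus from looping forever:

/* Sketch: translate through a series of iommus until a terminal region.
 * phys_page_lookup() and iommu_ops are the pseudocode names used above. */
static void *dma_translate_and_map(target_physical_addr_t addr, bool is_write)
{
    MemoryRegion *mr = phys_page_lookup(addr);
    int depth = 0;

    while (mr->iommu_ops) {
        addr = mr->iommu_ops->translate(addr, is_write);
        mr = phys_page_lookup(addr);      /* re-dispatch the translated address */
        if (++depth > MAX_IOMMU_DEPTH) {  /* assumed guard against cycles */
            return NULL;                  /* flag a bad access */
        }
    }
    return qemu_get_ram_ptr(addr);        /* RAM fast path only, as above */
}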
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-24 9:42 ` Avi Kivity @ 2012-09-27 3:13 ` liu ping fan 2012-09-27 9:16 ` Avi Kivity 0 siblings, 1 reply; 45+ messages in thread From: liu ping fan @ 2012-09-27 3:13 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On Mon, Sep 24, 2012 at 5:42 PM, Avi Kivity <avi@redhat.com> wrote: > On 09/24/2012 10:32 AM, liu ping fan wrote: >> On Mon, Sep 24, 2012 at 3:44 PM, Avi Kivity <avi@redhat.com> wrote: >>> On 09/24/2012 08:33 AM, liu ping fan wrote: >>>> On Wed, Sep 19, 2012 at 5:50 PM, Avi Kivity <avi@redhat.com> wrote: >>>> > On 09/19/2012 12:34 PM, Jan Kiszka wrote: >>>> >> >>>> >> What about the following: >>>> >> >>>> >> What we really need to support in practice is MMIO access triggers RAM >>>> >> access of device model. Scenarios where a device access triggers another >>>> >> MMIO access could likely just be rejected without causing troubles. >>>> >> >>>> >> So, when we dispatch a request to a device, we mark that the current >>>> >> thread is in a MMIO dispatch and reject any follow-up c_p_m_rw that does >>>> >> _not_ target RAM, ie. is another, nested MMIO request - independent of >>>> >> its destination. How much of the known issues would this solve? And what >>>> >> would remain open? >>>> > >>>> > Various iommu-like devices re-dispatch I/O, like changing endianness or >>>> > bitband. I don't know whether it targets I/O rather than RAM. >>>> > >>>> Have not found the exact code. But I think the call chain may look >>>> like this: dev mmio-handler --> c_p_m_rw() --> iommu mmio-handler --> >>>> c_p_m_rw() >>>> And I think you worry about the case for "c_p_m_rw() --> iommu >>>> mmio-handler". Right? How about introduce an member can_nest for >>>> MemoryRegionOps of iommu's mr? >>>> >>> >>> I would rather push the iommu logic into the memory API: >>> >>> memory_region_init_iommu(MemoryRegion *mr, const char *name, >>> MemoryRegion *target, MemoryRegionIOMMUOps *ops, >>> unsigned size) >>> >>> struct MemoryRegionIOMMUOps { >>> target_physical_addr_t (*translate)(target_physical_addr_t addr, >>> bool write); >>> void (*fault)(target_physical_addr_t addr); >>> }; >>> >> So I guess, after introduce this, the code logic in c_p_m_rw() will >> look like this >> >> c_p_m_rw(dev_virt_addr, ...) >> { >> mr = phys_page_lookup(); >> if (mr->iommu_ops) >> real_addr = translate(dev_virt_addr,..); >> >> ptr = qemu_get_ram_ptr(real_addr); >> memcpy(buf, ptr, sz); >> } >> > > Something like that. It will be a while loop, to allow for iommus > strung in series. > Will model the system like the following: --.Introduce iommu address space. It will be the container of the regions which are put under the management of iommu. --.In the system address space, using alias-iommu-mrX with priority=1 to expose iommu address space and obscure the overlapped regions. -- Device's access to address manged by alias-iommu-mrX c_p_m_rw(target_physical_addr_t addrA, ..) { while (len > 0) { mr = phys_page_lookup(); if (mr->iommu_ops) addrB = translate(addrA,..); ptr = qemu_get_ram_ptr(addrB); memcpy(buf, ptr, sz); } } Is it correct? Thanks and regards, pingfan > > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
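For reference, the container-plus-alias setup described in that message can be expressed with the existing memory API roughly like this. window_base, window_size and the region names are invented, and which subregions move under the iommu is board-specific:

/* Sketch: an iommu address space exposed into the system address space
 * through a higher-priority alias, per the model above. */
static MemoryRegion iommu_as_root;
static MemoryRegion alias_iommu_mr0;

memory_region_init(&iommu_as_root, "iommu-address-space", window_size);
/* regions put under the management of the iommu are added here ... */

memory_region_init_alias(&alias_iommu_mr0, "alias-iommu-mr0",
                         &iommu_as_root, 0, window_size);
memory_region_add_subregion_overlap(get_system_memory(), window_base,
                                    &alias_iommu_mr0, 1 /* priority */);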
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 3:13 ` liu ping fan @ 2012-09-27 9:16 ` Avi Kivity 2012-09-27 9:29 ` Paolo Bonzini 2012-09-29 9:20 ` liu ping fan 0 siblings, 2 replies; 45+ messages in thread From: Avi Kivity @ 2012-09-27 9:16 UTC (permalink / raw) To: liu ping fan Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On 09/27/2012 05:13 AM, liu ping fan wrote: > On Mon, Sep 24, 2012 at 5:42 PM, Avi Kivity <avi@redhat.com> wrote: >> On 09/24/2012 10:32 AM, liu ping fan wrote: >>> On Mon, Sep 24, 2012 at 3:44 PM, Avi Kivity <avi@redhat.com> wrote: >>>> On 09/24/2012 08:33 AM, liu ping fan wrote: >>>>> On Wed, Sep 19, 2012 at 5:50 PM, Avi Kivity <avi@redhat.com> wrote: >>>>> > On 09/19/2012 12:34 PM, Jan Kiszka wrote: >>>>> >> >>>>> >> What about the following: >>>>> >> >>>>> >> What we really need to support in practice is MMIO access triggers RAM >>>>> >> access of device model. Scenarios where a device access triggers another >>>>> >> MMIO access could likely just be rejected without causing troubles. >>>>> >> >>>>> >> So, when we dispatch a request to a device, we mark that the current >>>>> >> thread is in a MMIO dispatch and reject any follow-up c_p_m_rw that does >>>>> >> _not_ target RAM, ie. is another, nested MMIO request - independent of >>>>> >> its destination. How much of the known issues would this solve? And what >>>>> >> would remain open? >>>>> > >>>>> > Various iommu-like devices re-dispatch I/O, like changing endianness or >>>>> > bitband. I don't know whether it targets I/O rather than RAM. >>>>> > >>>>> Have not found the exact code. But I think the call chain may look >>>>> like this: dev mmio-handler --> c_p_m_rw() --> iommu mmio-handler --> >>>>> c_p_m_rw() >>>>> And I think you worry about the case for "c_p_m_rw() --> iommu >>>>> mmio-handler". Right? How about introduce an member can_nest for >>>>> MemoryRegionOps of iommu's mr? >>>>> >>>> >>>> I would rather push the iommu logic into the memory API: >>>> >>>> memory_region_init_iommu(MemoryRegion *mr, const char *name, >>>> MemoryRegion *target, MemoryRegionIOMMUOps *ops, >>>> unsigned size) >>>> >>>> struct MemoryRegionIOMMUOps { >>>> target_physical_addr_t (*translate)(target_physical_addr_t addr, >>>> bool write); >>>> void (*fault)(target_physical_addr_t addr); >>>> }; >>>> >>> So I guess, after introduce this, the code logic in c_p_m_rw() will >>> look like this >>> >>> c_p_m_rw(dev_virt_addr, ...) >>> { >>> mr = phys_page_lookup(); >>> if (mr->iommu_ops) >>> real_addr = translate(dev_virt_addr,..); >>> >>> ptr = qemu_get_ram_ptr(real_addr); >>> memcpy(buf, ptr, sz); >>> } >>> >> >> Something like that. It will be a while loop, to allow for iommus >> strung in series. >> > Will model the system like the following: > > --.Introduce iommu address space. It will be the container of the > regions which are put under the management of iommu. > --.In the system address space, using alias-iommu-mrX with priority=1 > to expose iommu address space and obscure the overlapped regions. > -- Device's access to address manged by alias-iommu-mrX > c_p_m_rw(target_physical_addr_t addrA, ..) > { > while (len > 0) { > mr = phys_page_lookup(); > if (mr->iommu_ops) > addrB = translate(addrA,..); > > ptr = qemu_get_ram_ptr(addrB); > memcpy(buf, ptr, sz); > } > } > > Is it correct? iommus only apply to device accesses, not cpu accesses (as in cpu_p_m_w()). 
So we will need a generic dma function: typedef struct MemoryAddressSpace { MemoryRegion *root; PhysPageEntry phys_map; ... // linked list entry for list of all MemoryAddressSpaces } void memory_address_space_rw(MemoryAddressSpace *mas, ...) { look up mas->phys_map dispatch } void cpu_physical_memory_rw(...) { memory_address_space_rw(&system_memory, ...); } The snippet if (mr->iommu_ops) addrB = translate(addrA,..); needs to be a little more complicated. After translation, we need to look up the address again in a different phys_map. So a MemoryRegion that is an iommu needs to hold its own phys_map pointer for the lookup. But let's ignore the problem for now, we have too much on our plate. With a recursive big lock, there is no problem with iommus, yes? So as long as there is no intersection between converted devices and platforms with iommus, we're safe. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
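Assuming such a structure existed, per-bus DMA could then be routed through it instead of through the global system mapping. A sketch, where the PCIBusAS wrapper and the pci_bus_get_as() accessor are invented for illustration:

/* Sketch: give each bus that can sit behind an iommu its own address
 * space, and let device DMA go through it rather than c_p_m_rw(). */
typedef struct PCIBusAS {
    MemoryAddressSpace dma_as;      /* root = what this bus's masters see */
} PCIBusAS;

static void pci_dma_rw(PCIDevice *dev, target_physical_addr_t addr,
                       uint8_t *buf, int len, bool is_write)
{
    PCIBusAS *bas = pci_bus_get_as(dev->bus);   /* assumed accessor */
    memory_address_space_rw(&bas->dma_as, addr, buf, len, is_write);
}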
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 9:16 ` Avi Kivity @ 2012-09-27 9:29 ` Paolo Bonzini 2012-09-27 9:34 ` Avi Kivity 2012-09-29 9:20 ` liu ping fan 1 sibling, 1 reply; 45+ messages in thread From: Paolo Bonzini @ 2012-09-27 9:29 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org On 27/09/2012 11:16, Avi Kivity wrote: > needs to be a little more complicated. After translation, we need to > look up the address again in a different phys_map. So a MemoryRegion > that is an iommu needs to hold its own phys_map pointer for the lookup. > > But let's ignore the problem for now, we have too much on our plate. > With a recursive big lock, there is no problem with iommus, yes? So as > long as there is no intersection between converted devices and platforms > with iommus, we're safe. pSeries has both PCI and iommu, but I think no one should be using e1000 on a pSeries. virtio-net is per-spec not going through the iommu. Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 9:29 ` Paolo Bonzini @ 2012-09-27 9:34 ` Avi Kivity 2012-09-27 9:36 ` Paolo Bonzini 0 siblings, 1 reply; 45+ messages in thread From: Avi Kivity @ 2012-09-27 9:34 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org On 09/27/2012 11:29 AM, Paolo Bonzini wrote: > virtio-net is per-spec not going through the iommu. What's that smell? A feature bit? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 9:34 ` Avi Kivity @ 2012-09-27 9:36 ` Paolo Bonzini 2012-09-27 10:08 ` Avi Kivity 0 siblings, 1 reply; 45+ messages in thread From: Paolo Bonzini @ 2012-09-27 9:36 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org On 27/09/2012 11:34, Avi Kivity wrote: >> > virtio-net is per-spec not going through the iommu. > What's that smell? A feature bit? Why is it a bad thing that it does not go through the iommu? Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 9:36 ` Paolo Bonzini @ 2012-09-27 10:08 ` Avi Kivity 2012-09-27 10:22 ` Paolo Bonzini 0 siblings, 1 reply; 45+ messages in thread From: Avi Kivity @ 2012-09-27 10:08 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org On 09/27/2012 11:36 AM, Paolo Bonzini wrote: > On 27/09/2012 11:34, Avi Kivity wrote: >>> > virtio-net is per-spec not going through the iommu. >> What's that smell? A feature bit? > > Why is it a bad thing that it does not go through the iommu? You can't assign the device to nested guests or to guest userspace via vfio. Some guests may want to use the iommu even in the guest kernel context. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 10:08 ` Avi Kivity @ 2012-09-27 10:22 ` Paolo Bonzini 2012-09-27 10:48 ` Avi Kivity 0 siblings, 1 reply; 45+ messages in thread From: Paolo Bonzini @ 2012-09-27 10:22 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org On 27/09/2012 12:08, Avi Kivity wrote: >>>>> virtio-net is per-spec not going through the iommu. >>> >> What's that smell? A feature bit? >> > >> > Why is it a bad thing that it does not go through the iommu? > You can't assign the device to nested guests or to guest userspace via > vfio. Some guests may want to use the iommu even in the guest kernel > context. All good points. However, using the iommu means you cannot use either vhost-net or (at least for now) per-device lock. So it closes some doors on performance improvements... I can imagine pSeries guys prefer to keep no iommu in virtio devices. Paolo ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 10:22 ` Paolo Bonzini @ 2012-09-27 10:48 ` Avi Kivity 0 siblings, 0 replies; 45+ messages in thread From: Avi Kivity @ 2012-09-27 10:48 UTC (permalink / raw) To: Paolo Bonzini Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, Anthony Liguori, qemu-devel@nongnu.org On 09/27/2012 12:22 PM, Paolo Bonzini wrote: > On 27/09/2012 12:08, Avi Kivity wrote: >>>>>> virtio-net is per-spec not going through the iommu. >>>> >> What's that smell? A feature bit? >>> > >>> > Why is it a bad thing that it does not go through the iommu? >> You can't assign the device to nested guests or to guest userspace via >> vfio. Some guests may want to use the iommu even in the guest kernel >> context. > > All good points. > > However, using the iommu means you cannot use either vhost-net or (at > least for now) per-device lock. So it closes some doors on performance > improvements... I can imagine pSeries guys prefer to keep no iommu in > virtio devices. Eventually we'll have a kernel-emulated iommu for these use cases. We can use the shadow code to generate the gvioaddr -> gpa -> hpa translation (like we use shadow code to fold the ngpa -> gpa -> hpa translation into a single ngpa -> hpa translation with nested npt). -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-27 9:16 ` Avi Kivity 2012-09-27 9:29 ` Paolo Bonzini @ 2012-09-29 9:20 ` liu ping fan 2012-09-30 8:13 ` Avi Kivity 1 sibling, 1 reply; 45+ messages in thread From: liu ping fan @ 2012-09-29 9:20 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On Thu, Sep 27, 2012 at 5:16 PM, Avi Kivity <avi@redhat.com> wrote: > On 09/27/2012 05:13 AM, liu ping fan wrote: >> On Mon, Sep 24, 2012 at 5:42 PM, Avi Kivity <avi@redhat.com> wrote: >>> On 09/24/2012 10:32 AM, liu ping fan wrote: >>>> On Mon, Sep 24, 2012 at 3:44 PM, Avi Kivity <avi@redhat.com> wrote: >>>>> On 09/24/2012 08:33 AM, liu ping fan wrote: >>>>>> On Wed, Sep 19, 2012 at 5:50 PM, Avi Kivity <avi@redhat.com> wrote: >>>>>> > On 09/19/2012 12:34 PM, Jan Kiszka wrote: >>>>>> >> >>>>>> >> What about the following: >>>>>> >> >>>>>> >> What we really need to support in practice is MMIO access triggers RAM >>>>>> >> access of device model. Scenarios where a device access triggers another >>>>>> >> MMIO access could likely just be rejected without causing troubles. >>>>>> >> >>>>>> >> So, when we dispatch a request to a device, we mark that the current >>>>>> >> thread is in a MMIO dispatch and reject any follow-up c_p_m_rw that does >>>>>> >> _not_ target RAM, ie. is another, nested MMIO request - independent of >>>>>> >> its destination. How much of the known issues would this solve? And what >>>>>> >> would remain open? >>>>>> > >>>>>> > Various iommu-like devices re-dispatch I/O, like changing endianness or >>>>>> > bitband. I don't know whether it targets I/O rather than RAM. >>>>>> > >>>>>> Have not found the exact code. But I think the call chain may look >>>>>> like this: dev mmio-handler --> c_p_m_rw() --> iommu mmio-handler --> >>>>>> c_p_m_rw() >>>>>> And I think you worry about the case for "c_p_m_rw() --> iommu >>>>>> mmio-handler". Right? How about introduce an member can_nest for >>>>>> MemoryRegionOps of iommu's mr? >>>>>> >>>>> >>>>> I would rather push the iommu logic into the memory API: >>>>> >>>>> memory_region_init_iommu(MemoryRegion *mr, const char *name, >>>>> MemoryRegion *target, MemoryRegionIOMMUOps *ops, >>>>> unsigned size) >>>>> >>>>> struct MemoryRegionIOMMUOps { >>>>> target_physical_addr_t (*translate)(target_physical_addr_t addr, >>>>> bool write); >>>>> void (*fault)(target_physical_addr_t addr); >>>>> }; >>>>> >>>> So I guess, after introduce this, the code logic in c_p_m_rw() will >>>> look like this >>>> >>>> c_p_m_rw(dev_virt_addr, ...) >>>> { >>>> mr = phys_page_lookup(); >>>> if (mr->iommu_ops) >>>> real_addr = translate(dev_virt_addr,..); >>>> >>>> ptr = qemu_get_ram_ptr(real_addr); >>>> memcpy(buf, ptr, sz); >>>> } >>>> >>> >>> Something like that. It will be a while loop, to allow for iommus >>> strung in series. >>> >> Will model the system like the following: >> >> --.Introduce iommu address space. It will be the container of the >> regions which are put under the management of iommu. >> --.In the system address space, using alias-iommu-mrX with priority=1 >> to expose iommu address space and obscure the overlapped regions. >> -- Device's access to address manged by alias-iommu-mrX >> c_p_m_rw(target_physical_addr_t addrA, ..) 
>> { >> while (len > 0) { >> mr = phys_page_lookup(); >> if (mr->iommu_ops) >> addrB = translate(addrA,..); >> >> ptr = qemu_get_ram_ptr(addrB); >> memcpy(buf, ptr, sz); >> } >> } >> >> Is it correct? > > iommus only apply to device accesses, not cpu accesses (as in > cpu_p_m_w()). So we will need a generic dma function: > Yes. While modelling it, I found that c_p_m_rw() operates on the MMU's output, which is like the translation result of an IOMMU. But the iommu here, I think, is just a very limited device -- it adjusts bitband and endianness, so it maps addresses back into itself. > typedef struct MemoryAddressSpace { > MemoryRegion *root; > PhysPageEntry phys_map; > ... > // linked list entry for list of all MemoryAddressSpaces > } > > void memory_address_space_rw(MemoryAddressSpace *mas, ...) > { > look up mas->phys_map > dispatch > } > > void cpu_physical_memory_rw(...) > { > memory_address_space_rw(&system_memory, ...); > } > > The snippet > > if (mr->iommu_ops) > addrB = translate(addrA,..); > > needs to be a little more complicated. After translation, we need to > look up the address again in a different phys_map. So a MemoryRegion > that is an iommu needs to hold its own phys_map pointer for the lookup. > > But let's ignore the problem for now, we have too much on our plate. > With a recursive big lock, there is no problem with iommus, yes? So as Do we have iommus in qemu now, since there are no separate phys_maps for real address and dev's virt address, and I think the iommu is only needed by host, not guest, so need not emulated by qemu. If no, we can just reject the nested DMA, and the c_p_m_rw() can only be nested once, so if introduce a wrapper for c_p_m_rw(), we can avoid recursive big lock, right? Regards, pingfan > long as there is no intersection between converted devices and platforms with iommus, we're safe. > > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-29 9:20 ` liu ping fan @ 2012-09-30 8:13 ` Avi Kivity 2012-09-30 8:48 ` liu ping fan 2012-09-30 11:04 ` Blue Swirl 0 siblings, 2 replies; 45+ messages in thread From: Avi Kivity @ 2012-09-30 8:13 UTC (permalink / raw) To: liu ping fan Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On 09/29/2012 11:20 AM, liu ping fan wrote: > > Do we have iommus in qemu now, We do, but they're hacked into the scsi layer, see hw/sun4m_iommu.c. I don't know if it's a standalone iommu on real hardware or whether it is part of the HBA. > since there are no separate phys_maps > for real address and dev's virt address, and I think the iommu is only > needed by host, not guest, so need not emulated by qemu. Eventually we will emulate iommus for x86 too, so we need to consider them. > If no, we > can just reject the nested DMA, and the c_p_m_rw() can only be nested > once, so if introduce a wrapper for c_p_m_rw(), we can avoid > recursive big lock, right? Don't we need that for other reasons? If not, we can drop it for now. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-30 8:13 ` Avi Kivity @ 2012-09-30 8:48 ` liu ping fan 2012-09-30 11:18 ` Avi Kivity 0 siblings, 1 reply; 45+ messages in thread From: liu ping fan @ 2012-09-30 8:48 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On Sun, Sep 30, 2012 at 4:13 PM, Avi Kivity <avi@redhat.com> wrote: > On 09/29/2012 11:20 AM, liu ping fan wrote: >> >> Do we have iommus in qemu now, > > We do, but they're hacked into the scsi layer, see hw/sun4m_iommu.c. I > don't know if it's a standalone iommu on real hardware or whether it is > part of the HBA. > >> since there are no separate phys_maps >> for real address and dev's virt address, and I think the iommu is only >> needed by host, not guest, so need not emulated by qemu. > > Eventually we will emulate iommus for x86 too, so we need to consider them. > For nested guest? The address translation is like nested mmu? >> If no, we >> can just reject the nested DMA, and the c_p_m_rw() can only be nested >> once, so if introduce a wrapper for c_p_m_rw(), we can avoid >> recursive big lock, right? > > Don't we need that for other reasons? If not, we can drop it for now. > Yes, there is another reason: the unplug process. As you suggested, DeviceState::uninit() can delete a timer with the biglock nested. Or, as an alternative, we delete the timer at unplug time and check the timer's validity at mmio dispatch in e1000. Regards, pingfan > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-30 8:48 ` liu ping fan @ 2012-09-30 11:18 ` Avi Kivity 0 siblings, 0 replies; 45+ messages in thread From: Avi Kivity @ 2012-09-30 11:18 UTC (permalink / raw) To: liu ping fan Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On 09/30/2012 10:48 AM, liu ping fan wrote: > On Sun, Sep 30, 2012 at 4:13 PM, Avi Kivity <avi@redhat.com> wrote: >> On 09/29/2012 11:20 AM, liu ping fan wrote: >>> >>> Do we have iommus in qemu now, >> >> We do, but they're hacked into the scsi layer, see hw/sun4m_iommu.c. I >> don't know if it's a standalone iommu on real hardware or whether it is >> part of the HBA. >> >>> since there are no separate phys_maps >>> for real address and dev's virt address, and I think the iommu is only >>> needed by host, not guest, so need not emulated by qemu. >> >> Eventually we will emulate iommus for x86 too, so we need to consider them. >> > For nested guest? The address translation is like nested mmu? Yes. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-30 8:13 ` Avi Kivity 2012-09-30 8:48 ` liu ping fan @ 2012-09-30 11:04 ` Blue Swirl 2012-09-30 11:17 ` Avi Kivity 1 sibling, 1 reply; 45+ messages in thread From: Blue Swirl @ 2012-09-30 11:04 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On Sun, Sep 30, 2012 at 8:13 AM, Avi Kivity <avi@redhat.com> wrote: > On 09/29/2012 11:20 AM, liu ping fan wrote: >> >> Do we have iommus in qemu now, > > We do, but they're hacked into the scsi layer, see hw/sun4m_iommu.c. I > don't know if it's a standalone iommu on real hardware or whether it is > part of the HBA. It's standalone or even part of CPU (some uniprocessor systems). IOMMU sits between memory and SBus, so all SBus devices (currently only ESP and Lance) use it. > >> since there are no separate phys_maps >> for real address and dev's virt address, and I think the iommu is only >> needed by host, not guest, so need not emulated by qemu. > > Eventually we will emulate iommus for x86 too, so we need to consider them. > >> If no, we >> can just reject the nested DMA, and the c_p_m_rw() can only be nested >> once, so if introduce a wrapper for c_p_m_rw(), we can avoid >> recursive big lock, right? > > Don't we need that for other reasons? If not, we can drop it for now. I don't think nested DMA is ever needed, it would be a pretty obscure feature and it would need a pretty heavy implementation (recursion) in a real HW IOMMU. In theory the translated address may map to MMIO but that's different. > > -- > error compiling committee.c: too many arguments to function > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-30 11:04 ` Blue Swirl @ 2012-09-30 11:17 ` Avi Kivity 2012-09-30 11:48 ` Blue Swirl 0 siblings, 1 reply; 45+ messages in thread From: Avi Kivity @ 2012-09-30 11:17 UTC (permalink / raw) To: Blue Swirl Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On 09/30/2012 01:04 PM, Blue Swirl wrote: > On Sun, Sep 30, 2012 at 8:13 AM, Avi Kivity <avi@redhat.com> wrote: >> On 09/29/2012 11:20 AM, liu ping fan wrote: >>> >>> Do we have iommus in qemu now, >> >> We do, but they're hacked into the scsi layer, see hw/sun4m_iommu.c. I >> don't know if it's a standalone iommu on real hardware or whether it is >> part of the HBA. > > It's standalone or even part of CPU (some uniprocessor systems). IOMMU > sits between memory and SBus, so all SBus devices (currently only ESP > and Lance) use it. So, the current modelling is incorrect? I'd like to fix it as a way of getting proper iommu modelling, but I don't know how it should be done, or how to test it. > >> >>> since there are no separate phys_maps >>> for real address and dev's virt address, and I think the iommu is only >>> needed by host, not guest, so need not emulated by qemu. >> >> Eventually we will emulate iommus for x86 too, so we need to consider them. >> >>> If no, we >>> can just reject the nested DMA, and the c_p_m_rw() can only be nested >>> once, so if introduce a wrapper for c_p_m_rw(), we can avoid >>> recursive big lock, right? >> >> Don't we need that for other reasons? If not, we can drop it for now. > > I don't think nested DMA is ever needed, it would be a pretty obscure > feature and it would need a pretty heavy implementation (recursion) in > a real HW IOMMU. In theory the translated address may map to MMIO but > that's different. Sure. But if we can get a model that works for everything at no extra cost, then why not? btw, the model of 'either we take the big lock recursively, or we drop the local lock before issuing dma' seems to cover nested iommus with any mix of big locks and little locks. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
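That rule, sketched from the point of view of a converted device. DevAState, its lock and devA_update_regs() are hypothetical, and cpu_physical_memory_rw() is assumed to take the big lock internally, recursively if the caller already holds it:

/* Sketch: an MMIO handler of a device with a private lock drops that lock
 * before any DMA, preserving the private-lock -> big-lock ordering. */
static void devA_mmio_write(void *opaque, target_phys_addr_t addr,
                            uint64_t val, unsigned size)
{
    DevAState *s = opaque;
    bool kick_dma;

    qemu_mutex_lock(&s->lock);
    kick_dma = devA_update_regs(s, addr, val);  /* assumed helper */
    qemu_mutex_unlock(&s->lock);    /* dropped before issuing DMA */

    if (kick_dma) {
        /* may reach a big-lock device or an iommu; the big lock is taken
         * (recursively, if already held) inside, never under s->lock */
        cpu_physical_memory_rw(s->dma_addr, s->dma_buf, s->dma_len, 1);
    }
}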
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-30 11:17 ` Avi Kivity @ 2012-09-30 11:48 ` Blue Swirl 0 siblings, 0 replies; 45+ messages in thread From: Blue Swirl @ 2012-09-30 11:48 UTC (permalink / raw) To: Avi Kivity Cc: Jan Kiszka, Marcelo Tosatti, liu ping fan, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini On Sun, Sep 30, 2012 at 11:17 AM, Avi Kivity <avi@redhat.com> wrote: > On 09/30/2012 01:04 PM, Blue Swirl wrote: >> On Sun, Sep 30, 2012 at 8:13 AM, Avi Kivity <avi@redhat.com> wrote: >>> On 09/29/2012 11:20 AM, liu ping fan wrote: >>>> >>>> Do we have iommus in qemu now, >>> >>> We do, but they're hacked into the scsi layer, see hw/sun4m_iommu.c. I >>> don't know if it's a standalone iommu on real hardware or whether it is >>> part of the HBA. >> >> It's standalone or even part of CPU (some uniprocessor systems). IOMMU >> sits between memory and SBus, so all SBus devices (currently only ESP >> and Lance) use it. > > So, the current modelling is incorrect? I'd like to fix it as a way of > getting proper iommu modelling, but I don't know how it should be done, > or how to test it. The model (opaque IOMMU handle + ESP/Lance specific accessors) could be improved a lot; it predates all IOMMU discussions. Just grab any ISO (preferably a sparc32 one) and try booting. If it fails, there's a problem, because the IOMMU is enabled by OpenBIOS. > >> >>> >>>> since there are no separate phys_maps >>>> for real address and dev's virt address, and I think the iommu is only >>>> needed by host, not guest, so need not emulated by qemu. >>> >>> Eventually we will emulate iommus for x86 too, so we need to consider them. >>> >>>> If no, we >>>> can just reject the nested DMA, and the c_p_m_rw() can only be nested >>>> once, so if introduce a wrapper for c_p_m_rw(), we can avoid >>>> recursive big lock, right? >>> >>> Don't we need that for other reasons? If not, we can drop it for now. >> >> I don't think nested DMA is ever needed, it would be a pretty obscure >> feature and it would need a pretty heavy implementation (recursion) in >> a real HW IOMMU. In theory the translated address may map to MMIO but >> that's different. > > Sure. But if we can get a model that works for everything at no extra > cost, then why not? That's OK, I'm not opposing it. > > btw, the model of 'either we take the big lock recursively, or we drop > the local lock before issuing dma' seems to cover nested iommus with any > mix of big locks and little locks. > > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock 2012-09-19 9:34 ` Jan Kiszka 2012-09-19 9:50 ` Avi Kivity @ 2012-09-20 8:11 ` liu ping fan 1 sibling, 0 replies; 45+ messages in thread From: liu ping fan @ 2012-09-20 8:11 UTC (permalink / raw) To: Jan Kiszka Cc: Paolo Bonzini, Marcelo Tosatti, Avi Kivity, Anthony Liguori, qemu-devel@nongnu.org On Wed, Sep 19, 2012 at 5:34 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote: > On 2012-09-19 11:00, liu ping fan wrote: >> On Wed, Sep 19, 2012 at 4:06 PM, Avi Kivity <avi@redhat.com> wrote: >>> On 09/19/2012 06:02 AM, liu ping fan wrote: >>>> Currently, cpu_physical_memory_rw() can be used directly or indirectly >>>> by mmio-dispatcher to access other devices' memory region. This can >>>> cause some problem when adopting device's private lock. >>>> >>>> Back ground refer to: >>>> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01481.html >>>> For lazy, just refer to: >>>> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01878.html >>>> >>>> >>>> --1st. the recursive lock of biglock. >>>> If we leave c_p_m_rw() as it is, ie, no lock inside. Then we can have >>>> the following (section of the whole call chain, and with >>>> private_lockA): >>>> lockA-mmio-dispatcher --> hold biglock -- >c_p_m_rw() --- > >>>> Before c_p_m_rw(), we drop private_lockA to anti the possibly of >>>> deadlock. But we can not anti the nested of this chain or calling to >>>> another lockB-mmio-dispatcher. So we can not avoid the possibility of >>>> nested lock of biglock. And another important factor is that we break >>>> the lock sequence: private_lock-->biglock. >>>> All of these require us to push biglock's holding into c_p_m_rw(), the >>>> wrapper can not give help. >>> >>> I agree that this is unavoidable. >>> >>>> >>>> --2nd. c_p_m_rw(), sync or async? >>>> >>>> IF we convert all of the device to be protected by refcount, then we can have >>>> //no big lock >>>> c_p_m_rw() >>>> { >>>> devB->ref++; >>>> { >>>> --------------------------------------->pushed onto another thread. >>>> lock_privatelock >>>> mr->ops->write(); >>>> unlock_privatelock >>>> } >>>> wait_for_completion(); >>>> devB->ref--; >>>> } >>>> This model can help c_p_m_rw() present as a SYNC API. But currently, >>>> we mix biglock and private lock together, and wait_for_completion() >>>> maybe block the release of big lock, which finally causes deadlock. So >>>> we can not simply rely on this model. >>>> Instead, we need to classify the calling scene into three cases: >>>> case1. lockA--dispatcher ---> lockB-dispatcher //can use >>>> async+completion model >>>> case2. lockA--dispatcher ---> biglock-dispatcher // sync, but can >>>> cause the nested lock of biglock >>>> case3. biglock-dispacher ---> lockB-dispatcher // async to avoid >>>> the lock sequence problem, (as to completion, it need to be placed >>>> outside the top level biglock, and it is hard to do so. Suggest to >>>> change to case 1. Or at present, just leave it async) >>>> >>>> This new model will require the biglock can be nested. >>> >>> I think changing to an async model is too complicated. It's difficult >>> enough already. Isn't dropping private locks + recursive big locks >>> sufficient? >>> >> I think that "dropping private locks + recursive big locks" just cover >> case 2. And most of the important, it dont describe case3 which break >> the rule of lock sequence "private-lock --> biglock". Scene: >> devA_lock-->(devX_with-biglock--->devB_lock). 
>> I just want to classify and post these cases to discuss. Maybe we can >> achieve without async. > > What about the following: > > What we really need to support in practice is MMIO access triggers RAM > access of device model. Scenarios where a device access triggers another > MMIO access could likely just be rejected without causing troubles. > > So, when we dispatch a request to a device, we mark that the current > thread is in a MMIO dispatch and reject any follow-up c_p_m_rw that does > _not_ target RAM, ie. is another, nested MMIO request - independent of > its destination. How much of the known issues would this solve? And what > would remain open? > This makes things easier and more realistic. I will dig into the code to see whether there are exceptions or not. Thanks and regards, pingfan > Jan > > -- > Siemens AG, Corporate Technology, CT RTC ITP SDP-DE > Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 45+ messages in thread
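Jan's marking scheme could be prototyped along these lines; the per-thread flag, the dispatch_one() split and the use of memory_region_is_ram() are assumptions about where the check would sit in the dispatch path:

/* Sketch: mark the current thread while an MMIO dispatch is in flight and
 * reject any follow-up access that does not target plain RAM. */
static __thread bool in_mmio_dispatch;

static void dispatch_one(MemoryRegion *mr, target_phys_addr_t addr,
                         uint64_t *val, unsigned size, bool is_write)
{
    if (memory_region_is_ram(mr)) {
        /* RAM target: ordinary memcpy path, always permitted */
        return;
    }
    if (in_mmio_dispatch) {
        /* nested MMIO request: reject it instead of re-entering a device */
        return;
    }
    in_mmio_dispatch = true;
    if (is_write) {
        mr->ops->write(mr->opaque, addr, *val, size);
    } else {
        *val = mr->ops->read(mr->opaque, addr, size);
    }
    in_mmio_dispatch = false;
}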