Message-ID: <50597D1F.3070607@redhat.com>
Date: Wed, 19 Sep 2012 11:06:55 +0300
From: Avi Kivity
Subject: Re: [Qemu-devel] [big lock] Discussion about the convention of device's DMA each other after breaking down biglock
To: liu ping fan
Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Paolo Bonzini

On 09/19/2012 06:02 AM, liu ping fan wrote:
> Currently, cpu_physical_memory_rw() can be used directly or indirectly
> by an mmio-dispatcher to access other devices' memory regions. This can
> cause problems once devices adopt private locks.
>
> For background, refer to:
> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01481.html
> For a shorter summary, just refer to:
> http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01878.html
>
>
> --1st. the recursive lock of biglock.
> If we leave c_p_m_rw() as it is, i.e. with no lock inside, then we can
> have the following (a section of the whole call chain, with
> private_lockA):
>   lockA-mmio-dispatcher --> hold biglock --> c_p_m_rw() --->
> Before c_p_m_rw(), we drop private_lockA to avoid the possibility of
> deadlock. But we cannot prevent this chain from nesting, or from
> calling into another lockB-mmio-dispatcher, so we cannot avoid the
> possibility of nested locking of the biglock. Another important factor
> is that we break the lock sequence: private_lock --> biglock.
> All of this requires us to push the taking of the biglock into
> c_p_m_rw(); a wrapper cannot help.

I agree that this is unavoidable.

>
> --2nd. c_p_m_rw(), sync or async?
>
> If we convert all of the devices to be protected by refcount, then we
> can have
> //no big lock
> c_p_m_rw()
> {
>     devB->ref++;
>     {
>     --------------------------------------->pushed onto another thread.
>         lock_privatelock
>         mr->ops->write();
>         unlock_privatelock
>     }
>     wait_for_completion();
>     devB->ref--;
> }
> This model lets c_p_m_rw() present itself as a SYNC API. But currently
> we mix the biglock and private locks together, and wait_for_completion()
> may block the release of the big lock, which eventually causes deadlock.
> So we cannot simply rely on this model.
> Instead, we need to classify the calling scenarios into three cases:
>   case1. lockA-dispatcher ---> lockB-dispatcher   // can use the
>          async+completion model
>   case2. lockA-dispatcher ---> biglock-dispatcher // sync, but can
>          cause nested locking of the biglock
>   case3. biglock-dispatcher ---> lockB-dispatcher // async, to avoid
>          the lock sequence problem (as for completion, it needs to be
>          placed outside the top-level biglock, and that is hard to do.
>          Suggest changing to case 1. Or for now, just leave it async)
>
> This new model requires that the biglock can be nested.

I think changing to an async model is too complicated. It's difficult
enough already. Isn't dropping private locks + recursive big locks
sufficient?

-- 
error compiling committee.c: too many arguments to function
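
For illustration only, the "drop private locks + recursive big lock" idea raised above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not QEMU code: `Device`, `mem_rw()` (a stand-in for c_p_m_rw()), `device_write()`, `big_lock_init()` and `device_init()` are all hypothetical names, and the big lock is modelled with a POSIX recursive mutex.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE on strict builds */
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins; none of these are QEMU's real types or functions. */
static pthread_mutex_t big_lock;
static int big_lock_depth;      /* current nesting depth, touched under big_lock */
static int max_depth_seen;      /* deepest nesting observed so far */

typedef struct Device Device;
struct Device {
    pthread_mutex_t lock;       /* the device's private lock */
    Device *dma_target;         /* device reached by this device's DMA, or NULL */
    int reg;                    /* some device state */
};

/* Create the big lock as a recursive mutex so nested acquisition is legal. */
static void big_lock_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&big_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

static void device_init(Device *d, Device *target)
{
    pthread_mutex_init(&d->lock, NULL);
    d->dma_target = target;
    d->reg = 0;
}

static void device_write(Device *d, int val);

/* Stand-in for c_p_m_rw(): it takes the big lock itself, so callers
 * do not wrap it, and a nested call from the same thread just recurses. */
static void mem_rw(Device *target, int val)
{
    pthread_mutex_lock(&big_lock);
    big_lock_depth++;
    if (big_lock_depth > max_depth_seen) {
        max_depth_seen = big_lock_depth;
    }
    device_write(target, val);
    big_lock_depth--;
    pthread_mutex_unlock(&big_lock);
}

/* MMIO handler: update state under the private lock, then drop the
 * private lock before issuing DMA to another device. */
static void device_write(Device *d, int val)
{
    Device *target;

    pthread_mutex_lock(&d->lock);
    d->reg = val;
    target = d->dma_target;
    pthread_mutex_unlock(&d->lock);   /* never hold it across mem_rw() */

    if (target) {
        mem_rw(target, val + 1);
    }
}
```

With three hypothetical devices chained A -> B -> C, a write to A reaches C with the big lock held twice by the same thread, and no private lock is ever held while the big lock is being acquired.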