From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:43578) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TBPWf-00014L-VC for qemu-devel@nongnu.org; Tue, 11 Sep 2012 08:25:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TBPWZ-0002Ih-Ag for qemu-devel@nongnu.org; Tue, 11 Sep 2012 08:25:09 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41502) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TBPWZ-0002Hs-0V for qemu-devel@nongnu.org; Tue, 11 Sep 2012 08:25:03 -0400 Message-ID: <504F2D97.3000705@redhat.com> Date: Tue, 11 Sep 2012 15:24:55 +0300 From: Avi Kivity MIME-Version: 1.0 References: <1347349912-15611-1-git-send-email-qemulist@gmail.com> <1347349912-15611-11-git-send-email-qemulist@gmail.com> <504EF7CC.6040900@redhat.com> <504F0B17.5090703@siemens.com> In-Reply-To: <504F0B17.5090703@siemens.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH V3 10/11] vcpu: introduce lockmap List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: Paolo Bonzini , Marcelo Tosatti , liu ping fan , Anthony Liguori , "qemu-devel@nongnu.org" On 09/11/2012 12:57 PM, Jan Kiszka wrote: > On 2012-09-11 11:44, liu ping fan wrote: >> On Tue, Sep 11, 2012 at 4:35 PM, Avi Kivity wrote: >>> On 09/11/2012 10:51 AM, Liu Ping Fan wrote: >>>> From: Liu Ping Fan >>>> >>>> The func call chain can suffer from recursively hold >>>> qemu_mutex_lock_iothread. We introduce lockmap to record the >>>> lock depth. >>> >>> What is the root cause? io handlers initiating I/O? >>> >> cpu_physical_memory_rw() can be called nested, and when called, it can >> be protected by no-lock/device lock/ big-lock. >> I think without big lock, io-dispatcher should face the same issue. >> As to main-loop, have not carefully consider, but at least, dma-helper >> will call cpu_physical_memory_rw() with big-lock. > > That is our core problem: inconsistent invocation of existing services > /wrt locking. For portio, I was lucky that there is no nesting and I was > able to drop the big lock around all (x86) call sites. But MMIO is way > more tricky due to DMA nesting. Maybe we need to switch to a continuation style. Instead of expecting cpu_physical_memory_rw() to complete synchronously, it becomes an asynchronous call and you provide it with a completion. That means devices which use it are forced to drop the lock in between. Block and network clients will be easy to convert since they already use APIs that drop the lock (except for accessing the descriptors). > We could try to introduce a different version of cpu_physical_memory_rw, > cpu_physical_memory_rw_unlocked. But the problem remains that an MMIO > request can trigger the very same access in a nested fashion, and we > will have to detect this to avoid locking up QEMU (locking up the guest > might be OK). An async version of c_p_m_rw() will just cause a completion to bounce around, consuming cpu but not deadlocking anything. If we can keep a count of the bounces, we might be able to stall it indefinitely or at least ratelimit it. -- error compiling committee.c: too many arguments to function