From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:46317)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1T7k2b-0000AI-W3 for qemu-devel@nongnu.org;
	Sat, 01 Sep 2012 05:30:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1T7k2a-0006Z3-Lz for qemu-devel@nongnu.org;
	Sat, 01 Sep 2012 05:30:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:6372)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1T7k2a-0006Yw-Cm for qemu-devel@nongnu.org;
	Sat, 01 Sep 2012 05:30:56 -0400
Message-ID: <5041D5CA.5010200@redhat.com>
Date: Sat, 01 Sep 2012 02:30:50 -0700
From: Avi Kivity
MIME-Version: 1.0
References: <1345801763-24227-1-git-send-email-qemulist@gmail.com>
	<1345801763-24227-11-git-send-email-qemulist@gmail.com>
	<503792F1.4090109@redhat.com> <503B1B4B.6050108@redhat.com>
	<503B260E.70607@web.de> <503BA9BC.5010207@redhat.com>
	<503BAAF0.2020103@siemens.com> <503BB7E7.4050709@redhat.com>
	<503BB9C5.3030605@siemens.com> <503BBA77.4090006@redhat.com>
	<503BBED4.9050705@siemens.com> <503BC1EE.4060608@redhat.com>
	<503BCCA1.10403@siemens.com> <503C9275.5040105@siemens.com>
	<503E4FF4.7060506@redhat.com> <503E51BB.9070308@siemens.com>
	<503E5422.5050805@redhat.com> <503E5637.4040701@siemens.com>
	<5041C7FE.5090306@redhat.com> <5041CDEC.4070105@web.de>
In-Reply-To: <5041CDEC.4070105@web.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Jan Kiszka
Cc: Paolo Bonzini , Liu Ping Fan , liu ping fan ,
	Anthony Liguori , "qemu-devel@nongnu.org"

On 09/01/2012 01:57 AM, Jan Kiszka wrote:
> On 2012-09-01 10:31, Avi Kivity wrote:
> > On 08/29/2012 10:49 AM, Jan Kiszka wrote:
> >>>
> >>> Let's experiment with refcounting MemoryRegion.
> >>> We can move the entire
> >>> contents of MemoryRegion to MemoryRegionImpl, add a reference count (to
> >>> MemoryRegionImpl), and change MemoryRegion to contain a pointer to the
> >>> refcounted MemoryRegionImpl:
> >>>
> >>>   struct MemoryRegion {
> >>>       struct MemoryRegionImpl *impl;
> >>>   };
> >>>
> >>>   struct MemoryRegionImpl {
> >>>       atomic int refs;
> >>>       ...
> >>>   };
> >>>
> >>> The memory core can then store MemoryRegion copies (with elevated
> >>> refcounts) instead of pointers.  Devices can destroy MemoryRegions at
> >>> any time; the implementation will not go away.  However, what of the
> >>> opaque stored in MemoryRegionImpl?  It becomes a dangling pointer.
> >>>
> >>> One way out is to add a lock to MemoryRegionImpl.  Dispatch takes the
> >>> lock, examines the ->enabled member, and bails out if it is cleared.
> >>> The (MemoryRegion, not MemoryRegionImpl) destructor also takes the
> >>> lock, clears ->enabled, releases the lock, and drops the reference.
> >>
> >> That means holding the MemoryRegionImpl lock across the handler call?
> >
> > Blech.  As you said on the other side of this thread, we must not take
> > a coarse grained lock within a fine grained one, and
> > MemoryRegionImpl::lock is as fine as they get.
>
> Not sure what you compare here. MemoryRegionImpl::lock would be per
> memory region, so with finer scope than the BQL but with scope similar
> to a per-device lock.

The dispatch handler may acquire the BQL (in fact most will, unless we
convert everything).  A design that doesn't allow this is broken.

Even the device lock is more coarse grained (since it covers all
regions), and taking it risks deadlock, since region manipulation will
be done under the device lock.  You say we can detect this, but I
dislike it.

> >
> >>>
> >>> The advantage to this scheme is that all changes are localized to the
> >>> memory core, no need for a full sweep.  It is a little complicated,
> >>> but we may be able to simplify it (or find another one).
> >>
> >> May work.
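[For readers following along, the split described above could look
roughly like the sketch below.  This is illustrative only, not QEMU's
actual API: the function names (impl_ref, dispatch_read, ...) are made
up, and a pthread mutex stands in for whatever locking the real
implementation would use.  Note that the handler deliberately runs
under impl->lock, which is exactly the point being debated.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct MemoryRegionImpl {
    int refs;                /* reference count, protected by lock */
    bool enabled;            /* cleared by the MemoryRegion destructor */
    void *opaque;            /* device state handed to handlers */
    pthread_mutex_t lock;
} MemoryRegionImpl;

typedef struct MemoryRegion {
    struct MemoryRegionImpl *impl;
} MemoryRegion;

/* The memory core takes a reference for every copy it stores. */
static MemoryRegionImpl *impl_ref(MemoryRegionImpl *impl)
{
    pthread_mutex_lock(&impl->lock);
    impl->refs++;
    pthread_mutex_unlock(&impl->lock);
    return impl;
}

static void impl_unref(MemoryRegionImpl *impl)
{
    pthread_mutex_lock(&impl->lock);
    int refs = --impl->refs;
    pthread_mutex_unlock(&impl->lock);
    if (refs == 0) {
        pthread_mutex_destroy(&impl->lock);
        free(impl);
    }
}

static void memory_region_init(MemoryRegion *mr, void *opaque)
{
    mr->impl = calloc(1, sizeof(*mr->impl));
    mr->impl->refs = 1;      /* the device's reference */
    mr->impl->enabled = true;
    mr->impl->opaque = opaque;
    pthread_mutex_init(&mr->impl->lock, NULL);
}

/* Destructor of the MemoryRegion (not the impl): disable the impl so
 * concurrent dispatch bails out, then drop the device's reference.
 * The impl survives while the memory core still holds references. */
static void memory_region_destroy(MemoryRegion *mr)
{
    pthread_mutex_lock(&mr->impl->lock);
    mr->impl->enabled = false;
    pthread_mutex_unlock(&mr->impl->lock);
    impl_unref(mr->impl);
    mr->impl = NULL;
}

/* Dispatch: returns false if the region was destroyed meanwhile.
 * The handler runs under impl->lock -- the contentious point, since a
 * handler that then takes a coarser lock (BQL, device lock) can
 * deadlock against a destructor running under that coarser lock. */
static bool dispatch_read(MemoryRegionImpl *impl,
                          void (*handler)(void *opaque))
{
    bool ok = false;
    pthread_mutex_lock(&impl->lock);
    if (impl->enabled) {
        handler(impl->opaque);
        ok = true;
    }
    pthread_mutex_unlock(&impl->lock);
    return ok;
}

/* Trivial handler used for illustration. */
static int reads_seen;
static void count_reads(void *opaque)
{
    (void)opaque;
    reads_seen++;
}
```

[The key property: after memory_region_destroy(), a dispatch through a
still-held impl reference returns false instead of touching a dangling
opaque.]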
> >> We just need to detect if a memory region tries to delete itself
> >> from its handler to prevent the deadlock.
> >
> > Those types of hacks are fragile.  IMO it just demonstrates what I said
> > earlier (then tried to disprove with this): if we call an opaque's
> > method, we must refcount or otherwise lock that opaque.  No way around
> > it.
>
> But that still doesn't solve the problem that we need to lock down the
> *state* of the opaque during dispatch wrt memory region changes.
> Just ensuring its existence is not enough unless we declare memory
> region transactions to be asynchronous - and adapt all users.

I expect no users will need to change.  Changing the region offset has
no effect on dispatch (in fact the region itself doesn't change).
Changing subregions is fine.  Disabling a region concurrently with
access is the only potential problem, but this is rare, and I expect it
to just work.  We will have to audit all users, yes.

> >>>> + wait for refcount(object) == 0 in deregister(object). That's what
> >>>> I'm proposing.
> >>>
> >>> Consider timer_del() called from a timer callback.  It's often not
> >>> doable.
> >>
> >> This is inherently synchronous already (when working against the same
> >> alarm timer backend). We can detect this in timer_del and skip
> >> waiting, as in the above scenario.
> >
> > It can always be broken.  The timer callback takes the device lock to
> > update the device.  The device mmio handler, holding the device lock,
> > takes the timer lock to timer_mod.  Deadlock.
>
> Well, how is this solved in Linux?  By waiting on the callback in
> hrtimer_cancel.  Not by wait_on_magic_opaque (well, there is even no
> opaque in the hrtimer API).

I don't consider that busy loop an elegant solution.  The original
proposal mirrors dcache: you can perform operations on a dentry even
after it is unlinked.

--
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
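[The dcache-like behaviour alluded to above can be sketched as a
refcounted timer whose deletion never waits: timer_del() only drops the
pending queue's reference, so calling it from the timer's own callback
cannot deadlock, and the object stays usable afterwards, like a dentry
after unlink.  This is a hypothetical single-threaded sketch; a real
version needs atomic refs or a lock, and none of these names are
QEMU's actual API.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct RefTimer RefTimer;

struct RefTimer {
    int refs;
    bool pending;
    void (*cb)(RefTimer *t, void *opaque);
    void *opaque;
};

static RefTimer *timer_new(void (*cb)(RefTimer *, void *), void *opaque)
{
    RefTimer *t = calloc(1, sizeof(*t));
    t->refs = 1;             /* the owner's reference */
    t->cb = cb;
    t->opaque = opaque;
    return t;
}

static void timer_unref(RefTimer *t)
{
    if (--t->refs == 0) {
        free(t);
    }
}

static void timer_mod(RefTimer *t)
{
    if (!t->pending) {
        t->pending = true;
        t->refs++;           /* the pending queue holds a reference */
    }
}

/* Like dput() after unlink: drop the queue's reference, never wait. */
static void timer_del(RefTimer *t)
{
    if (t->pending) {
        t->pending = false;
        timer_unref(t);
    }
}

/* What the alarm-timer backend does on expiry. */
static void timer_fire(RefTimer *t)
{
    if (!t->pending) {
        return;
    }
    t->pending = false;
    t->cb(t, t->opaque);     /* may freely call timer_del()/timer_mod() */
    timer_unref(t);          /* drop the queue's reference */
}

/* A callback that cancels its own timer -- the case that deadlocks
 * with a wait-for-refcount-zero deregister(). */
static int cancels_seen;
static void self_cancel_cb(RefTimer *t, void *opaque)
{
    (void)opaque;
    cancels_seen++;
    timer_del(t);            /* safe: nothing here waits */
}
```

[Contrast with hrtimer_cancel()-style waiting: there the canceller spins
until the callback finishes, which is exactly what turns the
device-lock/timer-lock interaction above into a deadlock.]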