From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:48282) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T6m37-0001xY-2r for qemu-devel@nongnu.org; Wed, 29 Aug 2012 13:27:30 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1T6m35-0003Bp-Qu for qemu-devel@nongnu.org; Wed, 29 Aug 2012 13:27:29 -0400
Received: from mx1.redhat.com ([209.132.183.28]:20013) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T6m35-0003Be-Ii for qemu-devel@nongnu.org; Wed, 29 Aug 2012 13:27:27 -0400
Message-ID: <503E50F8.5000405@redhat.com>
Date: Wed, 29 Aug 2012 10:27:20 -0700
From: Avi Kivity
MIME-Version: 1.0
References: <1345801763-24227-1-git-send-email-qemulist@gmail.com>
 <1345801763-24227-11-git-send-email-qemulist@gmail.com>
 <503792F1.4090109@redhat.com> <503B1B4B.6050108@redhat.com>
 <503B260E.70607@web.de> <503BA9BC.5010207@redhat.com>
 <503BAAF0.2020103@siemens.com> <503BB7E7.4050709@redhat.com>
 <503BB9C5.3030605@siemens.com> <503BBA77.4090006@redhat.com>
 <503BBED4.9050705@siemens.com> <503BC1EE.4060608@redhat.com>
 <503BCCA1.10403@siemens.com> <503BDE63.7090602@redhat.com>
 <503C185B.4030401@web.de> <503E4DBC.6080802@redhat.com>
 <503E4F8C.5070901@siemens.com>
In-Reply-To: <503E4F8C.5070901@siemens.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Jan Kiszka
Cc: Paolo Bonzini , Liu Ping Fan , liu ping fan , Anthony Liguori , "qemu-devel@nongnu.org"

On 08/29/2012 10:21 AM, Jan Kiszka wrote:
> On 2012-08-29 19:13, Avi Kivity wrote:
> > On 08/27/2012 06:01 PM, Jan Kiszka wrote:
> >> On 2012-08-27 22:53, Avi Kivity wrote:
> >>> On 08/27/2012 12:38 PM, Jan Kiszka wrote:
> >>>>>> Even worse, apply
> >>>>>> restrictions on how the dispatched objects, the regions, have to be
> >>>>>> treated because of this.
> >>>>>
> >>>>> Please elaborate.
> >>>>
> >>>> The fact that you can't manipulate a memory region object arbitrarily
> >>>> after removing it from the mapping because you track the references to
> >>>> the object that the region points to, not the region itself. The region
> >>>> remains in use by the dispatching layer and potentially the called
> >>>> device, even after deregistration.
> >>>
> >>> That object will be a container_of() the region, usually literally but
> >>> sometimes only in spirit. Reference counting the regions means they
> >>> cannot be embedded into other objects any more.
> >>
> >> I cannot follow the logic of the last sentence. Reference counting just
> >> means that we track if there are users left, not necessarily that we
> >> perform asynchronous releases. We can simply wait for those users to
> >> complete.
> >
> > I don't see how. Suppose you add a reference count to MemoryRegion.
> > How do you delay its containing object's destructor from running? Do
> > you iterate over all member MemoryRegion and examine their reference counts?
>
> memory_region_transaction_commit (or calls that trigger this) will have
> to wait. That is required as the caller may start fiddling with the
> region right afterward.

However, it may be running within an mmio dispatch, so it may wait forever.

Ignoring this, how does it wait? sleep()? or wait for a completion?

> >
> > Usually a reference count controls the lifetime of the reference counted
> > object, what are you suggesting here?
>
> To synchronize on reference count going to zero.

Quite unorthodox. Do you have real life examples?

> Or all readers leaving
> the read-side critical sections.

This is rcu. But again, we may be in an rcu read-side critical section
while needing to wait. In fact this was what I suggested in the first
place, until Marcelo pointed out the deadlock, so we came up with the
refcount.

>
> >
> >>>
> >>> We can probably figure out a way to flush out accesses. After switching
> >>> to rcu, for example, all we need is synchronize_rcu() in a
> >>> non-deadlocking place. But my bet is that it will not be needed.
> >>
> >> If you properly flush out accesses, you don't need to track at device
> >> level anymore - and mess with abstraction layers. That's my whole point.
> >
> > To flush out an access you need either rwlock_write_lock() or
> > synchronize_rcu() (depending on the implementation). But neither of
> > these can be run from an rcu read-side critical section or
> > rwlock_read_lock().
> >
> > You could defer the change to a bottom half, but if the hardware demands
> > that the change be complete before returning, that doesn't work.
>
> Right, therefore synchronous transactions.

I don't follow. Synchronous transactions mean you can't
synchronize_rcu() or upgrade the lock or wait for the refcount. That's
the source of the problem!

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
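
The objection above (a synchronous wait for users to drain cannot complete when the waiter is itself one of those users) can be illustrated with a minimal single-threaded sketch. Everything in it is hypothetical and for illustration only: struct region, region_wait_unused(), region_remove_sync(), dispatch() and handler_removes_self() are made-up names, not QEMU's MemoryRegion API, and the bounded poll stands in for whatever blocking primitive a real implementation might use (sleep, completion, synchronize_rcu()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dispatched object; not QEMU's MemoryRegion. */
struct region {
    int refs;     /* number of dispatches currently using the region */
    bool mapped;  /* still reachable by new dispatches? */
};

/*
 * Hypothetical synchronous wait: poll until no user is left.
 * A real implementation would block; here we give up after max_polls
 * so that the would-be deadlock shows up as a 'false' return value.
 */
bool region_wait_unused(struct region *r, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (r->refs == 0) {
            return true;
        }
    }
    return false;
}

/* Unmap the region, then wait synchronously for existing users to leave. */
bool region_remove_sync(struct region *r, int max_polls)
{
    r->mapped = false;
    return region_wait_unused(r, max_polls);
}

typedef bool (*handler_fn)(struct region *r);

/* The dispatcher holds a reference for the duration of the handler. */
bool dispatch(struct region *r, handler_fn h)
{
    r->refs++;
    bool ok = h(r);
    assert(r->refs > 0);
    r->refs--;
    return ok;
}

/*
 * A handler that removes its own region, as an mmio write handler might.
 * The reference taken by dispatch() is still held, so the wait can never
 * observe refs == 0.
 */
bool handler_removes_self(struct region *r)
{
    return region_remove_sync(r, 1000);
}
```

Called outside any dispatch, region_remove_sync() returns true at once. Called through dispatch(), it returns false for any poll budget, because the reference it waits on is released only after the handler returns; with a truly blocking wait this would be a hang rather than a failure code.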