From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:46317)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1T7k2b-0000AI-W3 for qemu-devel@nongnu.org;
	Sat, 01 Sep 2012 05:30:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1T7k2a-0006Z3-Lz for qemu-devel@nongnu.org;
	Sat, 01 Sep 2012 05:30:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:6372)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1T7k2a-0006Yw-Cm for qemu-devel@nongnu.org;
	Sat, 01 Sep 2012 05:30:56 -0400
Message-ID: <5041D5CA.5010200@redhat.com>
Date: Sat, 01 Sep 2012 02:30:50 -0700
From: Avi Kivity
MIME-Version: 1.0
References: <1345801763-24227-1-git-send-email-qemulist@gmail.com>
	<1345801763-24227-11-git-send-email-qemulist@gmail.com>
	<503792F1.4090109@redhat.com> <503B1B4B.6050108@redhat.com>
	<503B260E.70607@web.de> <503BA9BC.5010207@redhat.com>
	<503BAAF0.2020103@siemens.com> <503BB7E7.4050709@redhat.com>
	<503BB9C5.3030605@siemens.com> <503BBA77.4090006@redhat.com>
	<503BBED4.9050705@siemens.com> <503BC1EE.4060608@redhat.com>
	<503BCCA1.10403@siemens.com> <503C9275.5040105@siemens.com>
	<503E4FF4.7060506@redhat.com> <503E51BB.9070308@siemens.com>
	<503E5422.5050805@redhat.com> <503E5637.4040701@siemens.com>
	<5041C7FE.5090306@redhat.com> <5041CDEC.4070105@web.de>
In-Reply-To: <5041CDEC.4070105@web.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 10/10] qdev: fix create in place obj's life cycle problem
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Jan Kiszka
Cc: Paolo Bonzini , Liu Ping Fan , liu ping fan ,
	Anthony Liguori , "qemu-devel@nongnu.org"

On 09/01/2012 01:57 AM, Jan Kiszka wrote:
> On 2012-09-01 10:31, Avi Kivity wrote:
> > On 08/29/2012 10:49 AM, Jan Kiszka wrote:
> >>>
> >>> Let's experiment with refcounting MemoryRegion.
> >>> We can move the entire
> >>> contents of MemoryRegion to MemoryRegionImpl, add a reference count (to
> >>> MemoryRegionImpl), and change MemoryRegion to contain a pointer to the
> >>> refcounted MemoryRegionImpl:
> >>>
> >>>   struct MemoryRegion {
> >>>       struct MemoryRegionImpl *impl;
> >>>   };
> >>>
> >>>   struct MemoryRegionImpl {
> >>>       atomic int refs;
> >>>       ...
> >>>   };
> >>>
> >>> The memory core can then store MemoryRegion copies (with elevated
> >>> refcounts) instead of pointers.  Devices can destroy MemoryRegions at
> >>> any time; the implementation will not go away.  However, what of the
> >>> opaque stored in MemoryRegionImpl?  It becomes a dangling pointer.
> >>>
> >>> One way out is to add a lock to MemoryRegionImpl.  Dispatch takes the
> >>> lock, examines the ->enabled member, and bails out if it is cleared.
> >>> The (MemoryRegion, not MemoryRegionImpl) destructor also takes the
> >>> lock, clears ->enabled, releases the lock, and drops the reference.
> >>
> >> That means holding the MemoryRegionImpl lock across the handler call?
> >
> > Blech.  As you said on the other side of this thread, we must not take
> > a coarse grained lock within a fine grained one, and
> > MemoryRegionImpl::lock is as fine as they get.
>
> Not sure what you compare here. MemoryRegionImpl::lock would be per
> memory region, so with finer scope than the BQL but with scope similar
> to a per-device lock.

The dispatch handler may acquire the BQL (in fact most will, unless we
convert everything).  A design that doesn't allow this is broken.

Even the device lock is more coarse grained (since it covers all
regions), and taking it risks deadlock, since region manipulation will
be done under the device lock.  You say we can detect this, but I
dislike it.

> >
> >>>
> >>> The advantage to this scheme is that all changes are localized to the
> >>> memory core, no need for a full sweep.  It is a little complicated,
> >>> but we may be able to simplify it (or find another one).
> >>
> >> May work.
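[For readers following along, the split described above could look
roughly like the sketch below.  This is illustrative only, not QEMU's
actual API: the function names (impl_ref, dispatch_read, ...) are made
up, and a pthread mutex stands in for whatever locking the real
implementation would use.  Note that the handler deliberately runs
under impl->lock, which is exactly the point being debated.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct MemoryRegionImpl {
    int refs;                /* reference count, protected by lock */
    bool enabled;            /* cleared by the MemoryRegion destructor */
    void *opaque;            /* device state handed to handlers */
    pthread_mutex_t lock;
} MemoryRegionImpl;

typedef struct MemoryRegion {
    struct MemoryRegionImpl *impl;
} MemoryRegion;

/* The memory core takes a reference for every copy it stores. */
static MemoryRegionImpl *impl_ref(MemoryRegionImpl *impl)
{
    pthread_mutex_lock(&impl->lock);
    impl->refs++;
    pthread_mutex_unlock(&impl->lock);
    return impl;
}

static void impl_unref(MemoryRegionImpl *impl)
{
    pthread_mutex_lock(&impl->lock);
    int refs = --impl->refs;
    pthread_mutex_unlock(&impl->lock);
    if (refs == 0) {
        pthread_mutex_destroy(&impl->lock);
        free(impl);
    }
}

static void memory_region_init(MemoryRegion *mr, void *opaque)
{
    mr->impl = calloc(1, sizeof(*mr->impl));
    mr->impl->refs = 1;      /* the device's reference */
    mr->impl->enabled = true;
    mr->impl->opaque = opaque;
    pthread_mutex_init(&mr->impl->lock, NULL);
}

/* Destructor of the MemoryRegion (not the impl): disable the impl so
 * concurrent dispatch bails out, then drop the device's reference.
 * The impl survives while the memory core still holds references. */
static void memory_region_destroy(MemoryRegion *mr)
{
    pthread_mutex_lock(&mr->impl->lock);
    mr->impl->enabled = false;
    pthread_mutex_unlock(&mr->impl->lock);
    impl_unref(mr->impl);
    mr->impl = NULL;
}

/* Dispatch: returns false if the region was destroyed meanwhile.
 * The handler runs under impl->lock -- the contentious point, since a
 * handler that then takes a coarser lock (BQL, device lock) can
 * deadlock against a destructor running under that coarser lock. */
static bool dispatch_read(MemoryRegionImpl *impl,
                          void (*handler)(void *opaque))
{
    bool ok = false;
    pthread_mutex_lock(&impl->lock);
    if (impl->enabled) {
        handler(impl->opaque);
        ok = true;
    }
    pthread_mutex_unlock(&impl->lock);
    return ok;
}

/* Trivial handler used for illustration. */
static int reads_seen;
static void count_reads(void *opaque)
{
    (void)opaque;
    reads_seen++;
}
```

[The key property: after memory_region_destroy(), a dispatch through a
still-held impl reference returns false instead of touching a dangling
opaque.]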
> >> We just need to detect if a memory region tries to delete itself
> >> from its handler to prevent the deadlock.
> >
> > Those types of hacks are fragile.  IMO it just demonstrates what I said
> > earlier (then tried to disprove with this): if we call an opaque's
> > method, we must refcount or otherwise lock that opaque.  No way around
> > it.
>
> But that still doesn't solve the problem that we need to lock down the
> *state* of the opaque during dispatch wrt memory region changes.
> Just ensuring its existence is not enough unless we declare memory
> region transactions to be asynchronous - and adapt all users.

I expect no users will need to change.  Changing the region offset has
no effect on dispatch (in fact the region itself doesn't change).
Changing subregions is fine.  Disabling a region concurrently with
access is the only potential problem, but this is rare, and I expect it
to just work.  We will have to audit all users, yes.

> >>>> + wait for refcount(object) == 0 in deregister(object). That's what
> >>>> I'm proposing.
> >>>
> >>> Consider timer_del() called from a timer callback.  It's often not
> >>> doable.
> >>
> >> This is inherently synchronous already (when working against the same
> >> alarm timer backend). We can detect this in timer_del and skip
> >> waiting, as in the above scenario.
> >
> > It can always be broken.  The timer callback takes the device lock to
> > update the device.  The device mmio handler, holding the device lock,
> > takes the timer lock to timer_mod.  Deadlock.
>
> Well, how is this solved in Linux?  By waiting on the callback in
> hrtimer_cancel.  Not by wait_on_magic_opaque (well, there is even no
> opaque in the hrtimer API).

I don't consider that busy loop an elegant solution.  The original
proposal mirrors dcache: you can perform operations on a dentry even
after it is unlinked.

--
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
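[The dcache-like behaviour alluded to above can be sketched as a
refcounted timer whose deletion never waits: timer_del() only drops the
pending queue's reference, so calling it from the timer's own callback
cannot deadlock, and the object stays usable afterwards, like a dentry
after unlink.  This is a hypothetical single-threaded sketch; a real
version needs atomic refs or a lock, and none of these names are
QEMU's actual API.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct RefTimer RefTimer;

struct RefTimer {
    int refs;
    bool pending;
    void (*cb)(RefTimer *t, void *opaque);
    void *opaque;
};

static RefTimer *timer_new(void (*cb)(RefTimer *, void *), void *opaque)
{
    RefTimer *t = calloc(1, sizeof(*t));
    t->refs = 1;             /* the owner's reference */
    t->cb = cb;
    t->opaque = opaque;
    return t;
}

static void timer_unref(RefTimer *t)
{
    if (--t->refs == 0) {
        free(t);
    }
}

static void timer_mod(RefTimer *t)
{
    if (!t->pending) {
        t->pending = true;
        t->refs++;           /* the pending queue holds a reference */
    }
}

/* Like dput() after unlink: drop the queue's reference, never wait. */
static void timer_del(RefTimer *t)
{
    if (t->pending) {
        t->pending = false;
        timer_unref(t);
    }
}

/* What the alarm-timer backend does on expiry. */
static void timer_fire(RefTimer *t)
{
    if (!t->pending) {
        return;
    }
    t->pending = false;
    t->cb(t, t->opaque);     /* may freely call timer_del()/timer_mod() */
    timer_unref(t);          /* drop the queue's reference */
}

/* A callback that cancels its own timer -- the case that deadlocks
 * with a wait-for-refcount-zero deregister(). */
static int cancels_seen;
static void self_cancel_cb(RefTimer *t, void *opaque)
{
    (void)opaque;
    cancels_seen++;
    timer_del(t);            /* safe: nothing here waits */
}
```

[Contrast with hrtimer_cancel()-style waiting: there the canceller spins
until the callback finishes, which is exactly what turns the
device-lock/timer-lock interaction above into a deadlock.]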