Date: Tue, 17 Sep 2013 20:07:53 +0300
From: "Michael S. Tsirkin"
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org
Message-ID: <20130917170752.GA20986@redhat.com>
In-Reply-To: <52388A3D.4090909@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize

On Tue, Sep 17, 2013 at 06:58:37PM +0200, Paolo Bonzini wrote:
> On 17/09/2013 18:29, Michael S. Tsirkin wrote:
> > > BTW, qemu_del_nic is another one that I forgot to mention. You could
> > > have MMIO that triggers a transmit while the device is going down, for
> > > example.
> >
> > Wait a second. This API simply does not make sense.
> > If a region is not visible, its MMIO really mustn't trigger,
> > exit or no exit. Disabling a region and still getting op callbacks
> > afterwards is not what any caller of this API expects.
> >
> > I'm not sure what to do about the bounce buffer thing,
> > but it needs to be fixed some other way, without
> > breaking the API.
>
> I don't think it's breaking the API. The very same thing can happen
> with RAM. The only difference is that MMIO calls ops.

We can argue about RAM, but getting a callback after disable is not
really sane.
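To make the expected contract concrete, here is a toy model in C. This
is not QEMU's actual dispatch code; ToyRegion, toy_dispatch_write and
nic_mmio_write are names made up for illustration. The point is simply
that a disabled region is not there, so its op callback is never
reached, exit or no exit:

/* Toy model of the contract under discussion -- NOT QEMU code. */
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct ToyRegion {
    bool enabled;                  /* cleared when the region goes away */
    void (*write)(void *opaque, uint64_t addr, uint64_t val);
    void *opaque;
} ToyRegion;

static void toy_dispatch_write(ToyRegion *r, uint64_t addr, uint64_t val)
{
    if (!r->enabled) {
        return;       /* treated like an unassigned access: no callback */
    }
    r->write(r->opaque, addr, val);
}

static void nic_mmio_write(void *opaque, uint64_t addr, uint64_t val)
{
    printf("op callback: addr=0x%" PRIx64 " val=0x%" PRIx64 "\n",
           addr, val);
}

int main(void)
{
    ToyRegion r = { .enabled = true, .write = nic_mmio_write };

    toy_dispatch_write(&r, 0x10, 1);   /* callback fires                */
    r.enabled = false;                 /* region disabled/removed       */
    toy_dispatch_write(&r, 0x10, 2);   /* must be a no-op               */
    return 0;
}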
> Also, this problem is subject to race conditions from buggy or
> misbehaving guests. If you want to have any hope of breaking devices
> free of the BQL and do "simple" register I/O without taking a lock,
> there is simply no precise moment to stop MMIO at.

I don't see why disabling the MR can't flush whatever is outstanding.

> All these problems do not happen in real hardware because real hardware
> has buffers between the PHY and DMA circuitry, and because bus-master
> transactions transfer a few bytes at a time (for example, in PCI, even
> when a device does burst transactions, the other party can halt them at
> that small a granularity). A device can be quiesced in a matter of
> microseconds, and other times (the time for the OS to react to hotplug
> requests, the time for the driver to shut down, the time for the human
> to physically unplug the connector) can be several orders of magnitude
> larger.

They don't happen on real hardware because once you disable memory in a
PCI device, it does not accept memory transactions.

> Instead we have the opposite scenario, because we want to have as few
> buffers as possible and map large amounts of memory (even the 4K used
> by the bounce buffer is comparatively large) for the host OS's benefit.
> When we do so, and the host backend fails (e.g. a disk is on NFS and
> there is a networking problem), memory can remain mapped for a long
> time.

I don't see why this is a problem. So disabling memory will take a long
time. Who cares? It's not the data path.

> DMA-to-MMIO may be a theoretical problem only, but if we don't cover it
> we have a bogus solution to the problem, because exactly the same thing
> can and will happen for memory hot-unplug.
>
> Paolo

We need to cover it without breaking APIs.

After memory_region_del_subregion returns, it's a promise that there
will not be accesses to the region.

So I'm not even sure we really need to move destroy to finalize
anymore ...
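Concretely, the semantics I have in mind look something like the toy
sketch below. Again, this is not the real memory core: ToyRegion,
toy_access_begin/end and toy_del_subregion are invented names. Removal
first makes the region invisible to new accesses, then drains the
in-flight ones, so that by the time it returns the promise holds:

/* Toy sketch of "removal flushes outstanding accesses" -- NOT QEMU code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct ToyRegion {
    atomic_bool enabled;
    atomic_int in_flight;          /* accesses currently inside ops     */
} ToyRegion;

static bool toy_access_begin(ToyRegion *r)
{
    atomic_fetch_add(&r->in_flight, 1);
    if (!atomic_load(&r->enabled)) {
        atomic_fetch_sub(&r->in_flight, 1);  /* raced with removal      */
        return false;
    }
    return true;                   /* safe to invoke the region's ops   */
}

static void toy_access_end(ToyRegion *r)
{
    atomic_fetch_sub(&r->in_flight, 1);
}

static void toy_del_subregion(ToyRegion *r)
{
    atomic_store(&r->enabled, false);        /* no new accesses         */
    while (atomic_load(&r->in_flight) > 0) {
        /* flush: in real life, complete or cancel the outstanding
         * bounce-buffer map instead of spinning */
    }
}

int main(void)
{
    ToyRegion r = { true, 0 };

    if (toy_access_begin(&r)) {
        /* ... op callback runs here ... */
        toy_access_end(&r);
    }
    toy_del_subregion(&r);

    bool ok = toy_access_begin(&r);
    assert(!ok);                   /* no callbacks after removal        */
    return 0;
}

--
MST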