From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LBu8Q-0000PN-LL
	for qemu-devel@nongnu.org; Sun, 14 Dec 2008 11:48:02 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LBu8P-0000OE-8o
	for qemu-devel@nongnu.org; Sun, 14 Dec 2008 11:48:02 -0500
Received: from [199.232.76.173] (port=35122 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LBu8P-0000O9-09
	for qemu-devel@nongnu.org; Sun, 14 Dec 2008 11:48:01 -0500
Received: from mx2.redhat.com ([66.187.237.31]:40682)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <aarcange@redhat.com>) id 1LBu8O-0003iW-Fs
	for qemu-devel@nongnu.org; Sun, 14 Dec 2008 11:48:00 -0500
Date: Sun, 14 Dec 2008 17:47:52 +0100
From: Andrea Arcangeli <aarcange@redhat.com>
Message-ID: <20081214164751.GF30537@random.random>
References: <cc5d812eb9369a7ad2ef.1229105804@duo.random>
	<4942B841.6010900@codemonkey.ws>
	<20081213143944.GD30537@random.random>
	<4943E6F9.1050001@codemonkey.ws>
	<20081213165306.GE30537@random.random>
	<4944251D.8080109@codemonkey.ws>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4944251D.8080109@codemonkey.ws>
Subject: [Qemu-devel] Re: [PATCH 2 of 5] add can_dma/post_dma for direct IO
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: chrisw@redhat.com, avi@redhat.com, Gerd Hoffmann <kraxel@redhat.com>, kvm@vger.kernel.org, qemu-devel@nongnu.org

On Sat, Dec 13, 2008 at 03:11:57PM -0600, Anthony Liguori wrote:
> cause an overflow.  You will naturally validate this in the map() function 
> because you cannot map something that is greater than can fit in a void *.  

When you told me to pass ram_addr_t instead of size_t in my patch, I
didn't think it was just for validating that callers comply with the
clear dma interface. With my patch I was going to truly support dma
operations larger than 4G on a 32bit host with a 64bit guest, but only
with mmio regions as the destination, and with a maximum overhead of
the max-bounce-size of 1M.
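
To illustrate the bound on bounce overhead: a transfer with a 64bit
length can be walked in chunks of at most 1M, so it never needs more
than one bounce buffer, whatever the host word size. The sketch below
is only an assumption of how that could look. dma_bounced(),
dma_chunk_fn, struct dma_stats and count_chunk() are hypothetical
names for illustration, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BOUNCE_SIZE ((size_t)1 << 20)   /* 1M bounce limit */

/* Hypothetical per-chunk handler: moves 'len' bytes at guest address
 * 'addr' through the bounce buffer; returns 0 on success. */
typedef int (*dma_chunk_fn)(void *opaque, uint64_t addr,
                            uint8_t *bounce, size_t len);

static uint8_t bounce_buf[MAX_BOUNCE_SIZE];

/* Walk a transfer of any 64bit length in chunks of at most 1M.
 * Returns the number of bytes completed, so a caller can restart a
 * partial transfer from addr + return value. */
uint64_t dma_bounced(uint64_t addr, uint64_t len,
                     dma_chunk_fn fn, void *opaque)
{
    uint64_t done = 0;

    assert(fn != NULL);
    while (done < len) {
        uint64_t left = len - done;
        size_t chunk = left < MAX_BOUNCE_SIZE ? (size_t)left
                                              : MAX_BOUNCE_SIZE;

        if (fn(opaque, addr + done, bounce_buf, chunk) != 0)
            break;              /* partial I/O: stop and report progress */
        done += chunk;
    }
    return done;
}

/* Example handler that only records what it was asked to do. */
struct dma_stats {
    uint64_t chunks;
    uint64_t bytes;
};

int count_chunk(void *opaque, uint64_t addr, uint8_t *bounce, size_t len)
{
    struct dma_stats *st = opaque;

    (void)addr;
    (void)bounce;
    st->chunks++;
    st->bytes += len;
    return 0;
}
```

With count_chunk() as the handler, a 5G transfer is split into 5120
chunks of 1M each, and no single step needs a length wider than
size_t.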

To me map/unmap looks backwards. There's absolutely no point in
pretending that RAM isn't always mapped. Furthermore, bouncing inside
that layer (at least with the API you're proposing, which can't handle
partial I/O and restart) is an obviously broken design.

Once memory hotplug emerges, we just have to add a read-write lock
before invoking can_dma/post_dma and the like. There's no reason to
ever call anything after a read dma has completed.

Once my stuff works, my next step would be to get rid entirely of that
per-page array that translates a ram_addr_t to a virtual address and
replace it with an rbtree of linear ranges. The iovec would then need
to be passed down to exec.c so that it can be filled with direct dma
even if the whole range isn't linear. On average a single lookup of
the tree would give us the information immediately.
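
As a rough sketch of that idea, the code below fills an iovec from a
set of linear ranges. It is an illustration under assumptions, not
code from exec.c: struct ram_range, range_lookup() and
ram_fill_iovec() are hypothetical names, and a binary search over a
sorted array stands in for the rbtree lookup to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical: a guest ram range that is linear in host memory. */
struct ram_range {
    uint64_t start;     /* first guest ram address of the range */
    uint64_t len;       /* length of the range in bytes */
    uint8_t *host;      /* host virtual address backing 'start' */
};

/* Find the range containing 'addr'. 'r' must be sorted by start and
 * non-overlapping. Stand-in for a single rbtree lookup. */
const struct ram_range *range_lookup(const struct ram_range *r,
                                     size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < r[mid].start)
            hi = mid;
        else if (addr - r[mid].start >= r[mid].len)
            lo = mid + 1;
        else
            return &r[mid];
    }
    return NULL;
}

/* Fill 'iov' with host pointers covering [addr, addr + len).
 * Returns the number of entries used, or -1 if the request hits a
 * hole (e.g. mmio) or needs more than 'max_iov' entries. */
int ram_fill_iovec(const struct ram_range *r, size_t n,
                   uint64_t addr, uint64_t len,
                   struct iovec *iov, int max_iov)
{
    int cnt = 0;

    assert(iov != NULL);
    while (len > 0) {
        const struct ram_range *rr = range_lookup(r, n, addr);
        uint64_t off, chunk;

        if (rr == NULL || cnt == max_iov)
            return -1;
        off = addr - rr->start;
        chunk = rr->len - off;      /* bytes left in this linear range */
        if (chunk > len)
            chunk = len;
        iov[cnt].iov_base = rr->host + off;
        iov[cnt].iov_len = (size_t)chunk;
        cnt++;
        addr += chunk;
        len -= chunk;
    }
    return cnt;
}
```

A request that stays inside one linear range costs one lookup and one
iovec entry. A request that crosses into a second range, with a
different host address, simply gets a second entry.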

I'm ok with supporting a not entirely flat ram space, but pretending
to support an API that requires mangling host ptes (and sptes in the
kvm case) every time there's a dma is entirely overkill and backwards,
besides preventing you from bouncing sanely if you go over mmio
regions, and also preventing you from doing dma to >4G of mmio space
on a 32bit build with a 64bit guest.

The whole concept of having to map something is flawed: there's
nothing to map. At most you have to take a read lock to prevent a
future memory hotplug from changing the memory layout from under you,
but the concept of mapping has nothing to do with that. RAM is always
mapped, and mmio has to be emulated anyway, so it's worthless to map
it.