Date: Fri, 28 Nov 2008 19:50:01 +0100
From: Andrea Arcangeli
To: Blue Swirl
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1
Message-ID: <20081128185001.GD31011@random.random>
References: <20081127123538.GC10348@random.random> <20081128015602.GA31011@random.random>

On Fri, Nov 28, 2008 at 07:59:13PM +0200, Blue Swirl wrote:
> I don't know, here's a pointer:
> http://lists.gnu.org/archive/html/qemu-devel/2008-08/msg00092.html

I'm in total agreement with it. The missing "proper vectored AIO
operations" are bdrv_aio_readv/writev ;). I wonder how aio_readv/writev
can possibly be missing from posix aio? Unbelievable. It'd be totally
trivial to add those to glibc, much easier in fact than doing it by
hand with pthread_create (I've appended a quick sketch below of the
kind of trivial fallback I mean), but how can we add a dependency on a
specific glibc version? Ironically it'd be more user-friendly to add a
dependency on the linux kernel-aio implementation, which has been
available for ages and is guaranteed to run faster (or at least not
slower).

> Sorry, my description seems to have led you onto a totally wrong track.
> I meant this scenario: device (Lance Ethernet) -> DMA controller
> (MACIO) -> IOMMU -> physical memory. (In this case vectored DMA won't
> be useful since there is byte swapping involved, but it serves as an
> example of generic DMA.) At each step the DMA address is rewritten.
> It would be nice if the interface between Lance and DMA, DMA and
> IOMMU, and IOMMU and memory was the same.

No problem. So you think I should change it to qemu_dma_sg instead of
pci_dma_sg? We can decide that later, but surely we can think about it
in the meantime ;).

> Here's some history, please have a look.
>
> My first failed attempt:
> http://lists.gnu.org/archive/html/qemu-devel/2007-08/msg00179.html
>
> My second failed rough sketch:
> http://lists.gnu.org/archive/html/qemu-devel/2007-10/msg00626.html
>
> Anthony's version:
> http://lists.gnu.org/archive/html/qemu-devel/2008-03/msg00474.html
>
> Anthony's second version:
> http://lists.gnu.org/archive/html/qemu-devel/2008-04/msg00077.html

Thanks a lot for the pointers. BTW, a lot of the credit for the design
of my current implementation goes to Avi; I forgot to mention that in
previous emails.

The little cache layer I added at the last minute was very buggy, so
don't look at it too closely, just assume it works when reading the
patch ;). I think I've fixed it now in my tree, so the next version
will be much better.
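Here's the fallback sketch I mentioned above: a purely illustrative
bounce-buffer emulation of a vectored positional read on top of plain
pread (not code from the patch, and the helper name is made up):

#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Emulate a vectored positional read with one pread() into a linear
 * bounce buffer, then scatter the data into the caller's iovec. */
static ssize_t bounce_preadv(int fd, const struct iovec *iov,
                             int iovcnt, off_t offset)
{
    size_t total = 0, copied = 0;
    ssize_t ret;
    char *bounce;
    int i;

    for (i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    if (total == 0)
        return 0;

    bounce = malloc(total);
    if (!bounce)
        return -1;

    ret = pread(fd, bounce, total, offset);
    if (ret > 0) {
        /* only scatter as many bytes as were actually read */
        size_t left = ret;
        for (i = 0; i < iovcnt && left > 0; i++) {
            size_t n = left < iov[i].iov_len ? left : iov[i].iov_len;
            memcpy(iov[i].iov_base, bounce + copied, n);
            copied += n;
            left -= n;
        }
    }
    free(bounce);
    return ret;
}

Of course the whole point is to avoid the extra copy, which is exactly
why the kernel-aio route (where vectored submission already exists)
looks so attractive.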
I've also noticed some problems with windows (I didn't test windows
before posting). Those aren't related to the cache layer: I added a
#define to disable it and replace it with plain malloc/free, and the
problems remain. As soon as windows runs completely flawlessly I'll
post an update.

The iov cache layer is now also improved so that it caches at most N
elements, where N is the maximum number of simultaneous in-flight dmas
that ever happened during the runtime, so it's a bit smarter than a
generic slab cache.

Last but not least, there's still one malloc in the direct fast path,
but the plan is to eliminate that too by embedding the iov inside the
param (keeping it at the end of the struct, as if I were extending the
linear_iov), so the cache layer will handle it all and there will be
zero mallocs. The bounce path will be penalized, as it'll have to
allocate the direct-iov too, but we don't care.

Thanks!
Andrea
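P.S. To make the "iov embedded at the end of the param" idea a bit more
concrete, here's a rough sketch of the struct layout I have in mind
(untested, and the names are made up rather than taken from the patch):

#include <stdlib.h>
#include <sys/uio.h>

/* One allocation covers both the request state and the iovec array, so
 * the direct path needs no second malloc and the cache layer can
 * recycle the whole thing as a single object. */
typedef struct dma_request {
    void (*complete)(struct dma_request *req);  /* completion callback */
    int niov;                                   /* iovec entries in use */
    struct iovec iov[];                         /* flexible array at the end */
} dma_request;

static dma_request *dma_request_alloc(int max_iov)
{
    return malloc(sizeof(dma_request) + max_iov * sizeof(struct iovec));
}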