Date: Fri, 28 Nov 2008 19:59:13 +0200
From: "Blue Swirl"
Subject: Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1
In-Reply-To: <20081128015602.GA31011@random.random>
References: <20081127123538.GC10348@random.random> <20081128015602.GA31011@random.random>
To: Andrea Arcangeli
Cc: qemu-devel@nongnu.org

On 11/28/08, Andrea Arcangeli wrote:
> On Thu, Nov 27, 2008 at 09:14:45PM +0200, Blue Swirl wrote:
> > The previous similar attempt by Anthony for generic DMA using vectored
> > IO was abandoned because the malloc/free overhead was more than the
>
> Even if there were dynamic allocations in the fast path, the overhead
> of malloc/free is nothing compared to running and waiting for a host
> kernel syscall to return every 4k, not to mention with O_DIRECT
> enabled, which is the whole point of having a direct-dma API that
> truly doesn't pollute the cache. With O_DIRECT but without a real
> readv/writev, I/O performance would be destroyed, going down to
> something like 10M/sec even on the fastest storage/CPU/RAM
> combinations.
>
> So the question is how those benchmarks were run, with or without a
> real readv/writev and with or without O_DIRECT to truly eliminate all
> the CPU cache pollution from the memory copies?

I don't know, here's a pointer:
http://lists.gnu.org/archive/html/qemu-devel/2008-08/msg00092.html

> About malloc, all we care about is the direct-io fast path, and with
> my patch there is no allocation whatsoever in the fast path. As for
> the bounce layer, that is there for correctness only (purely to do DMA
> to non-RAM, or to non-linear RAM ranges with non-RAM holes in
> between), and we don't care about it in performance terms.
>
> > performance gain. Have you made any performance measurements? How does
> > this version compare to the previous ones?
>
> I ran some minor benchmarks, but it's basically futile to benchmark
> with bdrv_aio_readv/writev_em.
>
> > I think the pci_ prefix can be removed, there is little that is PCI
> > specific.
>
> Adding the pci_ prefix looked like a naming requirement from previous
> threads on the topic. Before I learnt about that, I didn't want a pci_
> prefix either, so I can certainly agree with you ;).
>
> There is nothing PCI-specific so far. Anyway, this is just a naming
> matter; it's up to you to decide what you like :).
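Going back to the syscall-per-4k point above, here is a purely
illustrative sketch (my own, not code from the patch under discussion)
that contrasts one read() per 4k page with a single readv() over the
same buffers on an O_DIRECT file descriptor; the file name "disk.img",
the 1 MB transfer size and the 4k alignment are arbitrary assumptions:

/*
 * Illustrative only: one read() per page vs. one readv() for the whole
 * transfer on an O_DIRECT fd.  File name, sizes and alignment are
 * arbitrary assumptions for the example.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SIZE 4096
#define NPAGES    256                      /* 256 * 4k = 1 MB */

int main(void)
{
    struct iovec iov[NPAGES];
    int i, fd = open("disk.img", O_RDONLY | O_DIRECT);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT needs block-aligned buffers; page alignment is enough here. */
    for (i = 0; i < NPAGES; i++) {
        if (posix_memalign(&iov[i].iov_base, PAGE_SIZE, PAGE_SIZE) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }
        iov[i].iov_len = PAGE_SIZE;
    }

    /* Slow path: one host syscall (and one O_DIRECT round trip) per page. */
    for (i = 0; i < NPAGES; i++) {
        if (read(fd, iov[i].iov_base, PAGE_SIZE) < 0) {
            perror("read");
            return 1;
        }
    }

    /* Fast path: the same 1 MB submitted as a single vectored syscall. */
    if (lseek(fd, 0, SEEK_SET) < 0 || readv(fd, iov, NPAGES) < 0) {
        perror("readv");
        return 1;
    }

    close(fd);
    return 0;
}

How big the difference is in practice of course depends on the storage
and on how the benchmarks linked above were actually run.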
>
> > For Sparc32 IOMMU (and probably other IOMMUs), it should be possible
> > to register a function used in place of cpu_physical_memory_rw,
> > c_p_m_can_dma etc. The goal is that it should be possible to stack the
> > DMA resolvers (think of devices behind a number of buses).
>
> The hardware thing being emulated in the real world wouldn't attach to
> both buses I think, hence you can specify in the driver what kind of
> IOMMU it has (then behind it you can emulate whatever hardware you
> want, but still the original device was PCI or not-PCI). I personally
> don't see much difference, as renaming later wouldn't be harder than a
> sed script...

Sorry, my description seems to have led you down a totally wrong track.
I meant this scenario:

device (Lance Ethernet) -> DMA controller (MACIO) -> IOMMU -> physical memory

(In this case vectored DMA won't be useful since there is byte swapping
involved, but it serves as an example of generic DMA.) At each step the
DMA address is rewritten. It would be nice if the interfaces between
Lance and DMA, DMA and IOMMU, and IOMMU and memory were the same.

Here's some history, please have a look.

My first failed attempt:
http://lists.gnu.org/archive/html/qemu-devel/2007-08/msg00179.html

My second failed rough sketch:
http://lists.gnu.org/archive/html/qemu-devel/2007-10/msg00626.html

Anthony's version:
http://lists.gnu.org/archive/html/qemu-devel/2008-03/msg00474.html

Anthony's second version:
http://lists.gnu.org/archive/html/qemu-devel/2008-04/msg00077.html
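To make the stacking idea above a bit more concrete, here is a rough
sketch of a chain of identical translate hooks, one per bus level;
every name in it (DMAResolver, dma_resolve, add_base, the base values)
is invented for the example and is not part of any proposed QEMU API:

/*
 * Rough sketch of stacked DMA resolvers: each level implements the
 * same translate() hook and points at its parent, so device -> DMA
 * controller -> IOMMU -> RAM all share one interface.  All names and
 * values are made up for illustration.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t dma_addr_t;

typedef struct DMAResolver DMAResolver;
struct DMAResolver {
    /* Rewrite addr for this level; the result is fed to the parent. */
    dma_addr_t (*translate)(DMAResolver *r, dma_addr_t addr);
    DMAResolver *parent;    /* next hop toward memory, NULL = physical RAM */
    dma_addr_t base;        /* per-level state; real code would keep more */
};

/* Walk the chain; what comes out of the last level is a physical address. */
static dma_addr_t dma_resolve(DMAResolver *r, dma_addr_t addr)
{
    for (; r != NULL; r = r->parent) {
        addr = r->translate(r, addr);
    }
    return addr;
}

/* Toy translation: just add a window base at this level. */
static dma_addr_t add_base(DMAResolver *r, dma_addr_t addr)
{
    return r->base + addr;
}

int main(void)
{
    DMAResolver iommu = { add_base, NULL,   0x10000000ULL };
    DMAResolver macio = { add_base, &iommu, 0x00200000ULL };

    /* The Lance model only talks to the MACIO level. */
    printf("0x%llx\n", (unsigned long long)dma_resolve(&macio, 0x1000));
    return 0;
}

A real IOMMU level would do a page-table lookup in its translate() hook
instead of adding a base, and byte swapping could hang off the same
per-level state, but the point is only that every level exposes the
same interface.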