From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1L6v7W-0004nW-N0 for qemu-devel@nongnu.org; Sun, 30 Nov 2008 17:50:30 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1L6v7V-0004mR-Nk for qemu-devel@nongnu.org; Sun, 30 Nov 2008 17:50:30 -0500 Received: from [199.232.76.173] (port=56069 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1L6v7V-0004mA-HX for qemu-devel@nongnu.org; Sun, 30 Nov 2008 17:50:29 -0500 Received: from yx-out-1718.google.com ([74.125.44.155]:7675) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1L6v7V-00071Q-25 for qemu-devel@nongnu.org; Sun, 30 Nov 2008 17:50:29 -0500 Received: by yx-out-1718.google.com with SMTP id 3so802634yxi.82 for ; Sun, 30 Nov 2008 14:50:28 -0800 (PST) Message-ID: <493318B0.7020506@codemonkey.ws> Date: Sun, 30 Nov 2008 16:50:24 -0600 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Qemu-devel] [RFC 1/1] pci-dma-api-v2 References: <20081127123538.GC10348@random.random> <20081128015602.GA31011@random.random> <20081128185001.GD31011@random.random> <20081130174133.GC32172@random.random> In-Reply-To: <20081130174133.GC32172@random.random> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Cc: Blue Swirl , Andrea Arcangeli Andrea Arcangeli wrote: > On Fri, Nov 28, 2008 at 07:50:01PM +0100, Andrea Arcangeli wrote: This patch is too large. It should be split up. > Signed-off-by: Andrea Arcangeli > > Index: block_int.h > =================================================================== > --- block_int.h (revision 5818) > +++ block_int.h (working copy) > @@ -55,6 +55,8 @@ > int64_t sector_num, const uint8_t *buf, int nb_sectors, > BlockDriverCompletionFunc *cb, void *opaque); > void (*bdrv_aio_cancel)(BlockDriverAIOCB *acb); > + BlockDriverAIOIOV *bdrv_aio_readv; > + BlockDriverAIOIOV *bdrv_aio_writev; > int aiocb_size; > Please maintain consistency with the rest of the struct by not introducing typedefs for these function pointers. > Index: exec.c > =================================================================== > --- exec.c (revision 5818) > +++ exec.c (working copy) > @@ -2807,7 +2807,7 @@ > /* physical memory access (slow version, mainly for debug) */ > #if defined(CONFIG_USER_ONLY) > void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf, > - int len, int is_write) > + size_t len, int is_write) > { > int l, flags; > target_ulong page; > @@ -2848,7 +2848,7 @@ > > #else > void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf, > - int len, int is_write) > + size_t len, int is_write) > { > int l, io_index; > uint8_t *ptr; > @@ -2938,6 +2938,111 @@ > } > } > This API change should be a separate patch. It's orthogonal to this one. > +uint8_t *cpu_physical_memory_can_dma(target_phys_addr_t addr, > + size_t len, int is_write, > + int alignment) > Your patch is tab-damaged. This API seems flawed. It's duplicating most of what is in cpu_physical_memory_rw. I think you need something more sophisticated that can split up a scatter/gather list into a set of partial bounce buffers and partial direct copies. Just checking the vector once is a bit hackish. > + /* qemu doesn't execute guest code directly, but kvm does > + therefore fluch instruction caches */ > + if (kvm_enabled()) > + flush_icache_range((unsigned long)ptr, > + ((unsigned long)ptr)+l); > This is incorrect for upstream QEMU and just as wrong for kvm-userspace. This is a nasty hack for ia64. flush_icache_range() is dyngen generated. ia64 support is not in upstream QEMU. > > @@ -135,6 +149,9 @@ > /* add synchronous IO emulation layer */ > bdrv->bdrv_read = bdrv_read_em; > bdrv->bdrv_write = bdrv_write_em; > + bdrv->bdrv_aio_readv = bdrv_aio_readv_em; /* FIXME */ > + bdrv->bdrv_aio_writev = bdrv_aio_writev_em; /* FIXME */ > + bdrv->bdrv_aio_cancel = bdrv_aio_cancel_em; /* FIXME */ > What should be fixed? Are these emulated functions wrong? There's a lack of symmetry here. We should have a bdrv_readv and bdrv_aio_readv. bdrv_read and bdrv_aio_read should disappear. We can maintain wrappers that create a compatible interface for older code but just adding a new API is wrong. > +BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, > + struct iovec *iov, int iovcnt, size_t len, > + BlockDriverCompletionFunc *cb, void *opaque) > +{ > struct iovec does not exist on Windows. I also don't think you've got the abstraction right. Reading and writing may require actions to happen. I don't think you can just introduce something as simple as an iovec here. I think we need something more active. > Index: hw/ide.c > =================================================================== > --- hw/ide.c (revision 5818) > +++ hw/ide.c (working copy) > @@ -31,6 +31,7 @@ > #include "qemu-timer.h" > #include "sysemu.h" > #include "ppc_mac.h" > +#include "pci_dma.h" All of the changes to IDE/SCSI should be separate patches. That will help isolate failures when bisecting. > diff --git a/qemu/hw/pci_dma.c b/qemu/hw/pci_dma.c > new file mode 100644 > index 0000000..48762a8 > --- /dev/null > +++ b/qemu/hw/pci_dma.c > @@ -0,0 +1,366 @@ > +/* > + * QEMU PCI DMA operations > + * > + * Copyright (c) 2008 Red Hat, Inc. > + * > + * Permission is hereby granted, free of charge, to any person obtaining a copy > + * of this software and associated documentation files (the "Software"), to deal > + * in the Software without restriction, including without limitation the rights > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > + * copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > + * THE SOFTWARE. > + */ > + > +#include > +#include "pci_dma.h" > + > +#define MAX_DMA_BOUNCE_BUFFER (1024*1024) > +//#define DEBUG_BOUNCE > +//#define MAX_DMA_BOUNCE_BUFFER (512) > +//#define DISABLE_IOVEC_CACHE > + > +typedef struct QEMUPciDmaSgParam { > + QEMUPciDmaSgSubmit *pci_dma_sg_submit; > + QEMUPciDmaSgComplete *pci_dma_sg_complete; > + void *pci_dma_sg_opaque; > + int dma_to_memory; > + int alignment; > + uint8_t *bounce; > + QEMUPciDmaSg *sg; > + int iovcnt; > + int restart_iovcnt; > + size_t restart_offset; > + int curr_restart_iovcnt; > + size_t curr_restart_offset; > + size_t curr_len; > +#ifndef DISABLE_IOVEC_CACHE > + struct QEMUPciDmaSgParam *next; > +#endif > + struct iovec iov; > +} QEMUPciDmaSgParam; > > What is the IOVEC_CACHE and why do we need it? Either way, it should be using sys-queue. I think you missed the mark here. This API needs to go through the PCIBus and be pluggable at that level. There can be a default implementation but it may differ for each PCI bus. Please read the previous threads about a DMA API. All of this has been discussed at length. Regards, Anthony Liguori