Date: Sun, 30 Nov 2008 19:04:08 +0100
From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1
Message-ID: <20081130180408.GD32172@random.random>
References: <20081127123538.GC10348@random.random> <20081128015602.GA31011@random.random> <20081128185001.GD31011@random.random>
To: Blue Swirl
Cc: qemu-devel@nongnu.org

On Fri, Nov 28, 2008 at 09:03:06PM +0200, Blue Swirl wrote:
> There's also lio_listio that provides for vectored AIO.

I discussed this in my answer to Jamie: basically there is still no
LIO_READV/WRITEV, so there is no way to submit a 'struct iovec' to the
kernel with it, and that is a must for performance with cache=off.

> > > Anthony's second version:
> > > http://lists.gnu.org/archive/html/qemu-devel/2008-04/msg00077.html

Actually, this version of the emulated bdrv_writev/readv should run
faster than mine: it uses malloc+memcpy and fewer syscalls, while mine
avoids any memcpy but runs more syscalls. I opted for an emulated
bdrv_aio_readv/writev that does true zerocopy.
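To make the trade-off between the two emulations concrete, here is a minimal sketch of both strategies. It assumes a plain file descriptor and synchronous pread() instead of the AIO path, and both function names (preadv_bounce, preadv_per_segment) are hypothetical, not taken from either patch: the bounce variant stands in for the malloc+memcpy approach, the per-segment variant for the zerocopy one.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical bounce-buffer emulation: one pread() into a temporary
 * buffer, then memcpy() the bytes out to each iovec segment
 * (one syscall, one extra copy of the whole request). */
static ssize_t preadv_bounce(int fd, const struct iovec *iov, int iovcnt,
                             off_t offset)
{
    size_t total = 0, done = 0;
    ssize_t ret;
    char *buf;
    int i;

    for (i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    buf = malloc(total ? total : 1);
    if (!buf)
        return -1;
    ret = pread(fd, buf, total, offset);
    /* scatter only the bytes that were actually read */
    for (i = 0; ret > 0 && i < iovcnt && done < (size_t)ret; i++) {
        size_t n = iov[i].iov_len;
        if (n > (size_t)ret - done)
            n = (size_t)ret - done;
        memcpy(iov[i].iov_base, buf + done, n);
        done += n;
    }
    free(buf);
    return ret;
}

/* Hypothetical zerocopy emulation: one pread() per segment directly
 * into the destination buffers (no copy, but iovcnt syscalls). */
static ssize_t preadv_per_segment(int fd, const struct iovec *iov,
                                  int iovcnt, off_t offset)
{
    ssize_t total = 0;
    int i;

    for (i = 0; i < iovcnt; i++) {
        ssize_t r = pread(fd, iov[i].iov_base, iov[i].iov_len,
                          offset + total);
        if (r < 0)
            return total ? total : -1;
        total += r;
        if ((size_t)r < iov[i].iov_len)
            break;  /* short read, e.g. end of file */
    }
    return total;
}
```

The bounce variant pays a copy proportional to the request size, while the per-segment variant pays one syscall per iovec entry, so which one wins depends on how many segments a request has and how large they are.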
But it doesn't make much difference, since neither one should ever run
on a host kernel that supports the readv/writev syscalls; the emulation
is only there so we can test the rest of the zerocopy DMA API. The
bdrv_aio_readv/writev support has to go in separately from the PCI DMA
API anyway, and I certainly intend to drop my version of
bdrv_aio_readv/writev: I think all qemu targets support at least the
POSIX pthread API and the readv/writev syscalls, which lets us avoid
hacks like my current _em.

> Perhaps you could point out why the previous attempts failed, but
> yours won't? ;-)

One can always hope to be luckier? ;) Seriously, just try applying my
last patch to your qemu tree (on kvm it rejects in the Makefile.target
ppc section, but it works on kvm too for x86* targets) and test it.

As an example, also look at the IDE code below. IMHO it is a big
improvement over the current code, and it keeps all aiocb knowledge
outside of the DMA API itself, as it should be.

static int build_dma_sg(BMDMAState *bm)
{
    struct {
        uint32_t addr;
        uint32_t size;
    } prd;
    int len;
    int idx;

    for (idx = 1; idx <= IDE_DMA_BUF_SECTORS; idx++) {
        /* fetch one 8-byte PRD entry from guest memory */
        cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
        bm->cur_addr += 8;
        bm->sg[idx-1].addr = le32_to_cpu(prd.addr);
        prd.size = le32_to_cpu(prd.size);
        /* byte count is in bits 15:1; a count of 0 means 64 KiB */
        len = prd.size & 0xfffe;
        if (len == 0)
            len = 0x10000;
        bm->sg[idx-1].len = len;
        /* end of table (with a fail safe of one page) */
        if ((prd.size & 0x80000000) || (bm->cur_addr - bm->addr) >= 4096)
            break;
    }
    if (idx > IDE_DMA_BUF_SECTORS) {
        printf("build_dma_sg: too many sg entries\n");
        /* loop ran to completion: only IDE_DMA_BUF_SECTORS entries filled */
        idx = IDE_DMA_BUF_SECTORS;
    }
    return idx;
}

static void ide_dma_complete(void *opaque, int ret)
{
    BMDMAState *bm = opaque;
    IDEState *s = bm->ide_if;

    bm->bdrv_aio_iov = NULL;
    bm->ide_if = NULL;
    bm->aiocb = NULL;
    /* end of transfer? */
    if (s->nsector == 0 && !ret) {
        s->status = READY_STAT | SEEK_STAT;
        ide_set_irq(s);
        bm->status &= ~BM_STATUS_DMAING;
        bm->status |= BM_STATUS_INT;
    } else {
        ide_dma_error(s);
        printf("ide_dma_complete error: nsector %d err %d\n",
               s->nsector, ret);
    }
}

static int ide_dma_submit(void *opaque, struct iovec *dma_iov,
                          int iovcnt, size_t len,
                          BlockDriverCompletionFunc dma_cb,
                          void *dma_cb_param)
{
    BMDMAState *bm = opaque;
    IDEState *s = bm->ide_if;
    size_t sectors;
    int64_t sector_num;

    sectors = len >> 9;     /* 512-byte sectors */
    if (s->nsector < sectors)
        return -3000;       /* request exceeds the remaining sector count */
    sector_num = ide_get_sector(s);
    ide_set_sector(s, sector_num + sectors);
    s->nsector -= sectors;
#ifdef DEBUG_AIO
    printf("ide_dma_submit: sector_num=%lld n=%lu\n",
           (long long)sector_num, (unsigned long)sectors);
#endif
    bm->aiocb = bm->bdrv_aio_iov(s->bs, sector_num, dma_iov, iovcnt, len,
                                 dma_cb, dma_cb_param);
    if (!bm->aiocb)
        return -3001;       /* AIO submission failed */
    return 0;
}
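For anyone not familiar with the PRD table that build_dma_sg() walks, here is a self-contained sketch of the same decoding over a plain byte buffer instead of guest memory, so it can be tested outside qemu. The names (struct sg_entry, rd_le32, parse_prd_table) are hypothetical; the entry layout follows what build_dma_sg() does: a 32-bit address, a byte count in bits 15:1 where 0 means 64 KiB, and bit 31 marking the last entry, plus the same one-page fail safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sg entry, mirroring bm->sg[] in the IDE code. */
struct sg_entry {
    uint32_t addr;
    uint32_t len;
};

/* Decode a little-endian 32-bit value from a byte buffer. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk a PRD table held in 'table' (standing in for guest memory read
 * with cpu_physical_memory_read) and fill at most 'max' sg entries.
 * Returns the number of entries actually filled.
 */
static int parse_prd_table(const uint8_t *table, size_t table_len,
                           struct sg_entry *sg, int max)
{
    size_t off = 0;
    int n = 0;

    assert(table != NULL && sg != NULL);
    while (n < max && off + 8 <= table_len) {
        uint32_t addr = rd_le32(table + off);
        uint32_t size = rd_le32(table + off + 4);
        uint32_t len = size & 0xfffe;   /* byte count in bits 15:1 */

        off += 8;
        if (len == 0)
            len = 0x10000;              /* a count of 0 means 64 KiB */
        sg[n].addr = addr;
        sg[n].len = len;
        n++;
        /* bit 31 marks the last entry; also stop after one page of PRDs */
        if ((size & 0x80000000u) || off >= 4096)
            break;
    }
    return n;
}
```

Unlike the loop in build_dma_sg(), this version counts filled entries directly, so hitting the 'max' limit returns exactly 'max' rather than one past it.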