From: Paolo Bonzini
Date: Thu, 25 Feb 2016 11:22:05 +0100
Message-ID: <56CED5CD.1070400@redhat.com>
In-Reply-To: <56CED32C.6020001@arrikto.com>
References: <1450284917-10508-1-git-send-email-apyrgio@arrikto.com>
 <1450284917-10508-2-git-send-email-apyrgio@arrikto.com>
 <56BC6DC0.4070204@redhat.com> <56C70168.2050408@arrikto.com>
 <56CAE64B.4070608@redhat.com> <56CED32C.6020001@arrikto.com>
Subject: Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
To: Alex Pyrgiotis
Cc: kwolf@redhat.com, famz@redhat.com, qemu-devel@nongnu.org

On 25/02/2016 11:10, Alex Pyrgiotis wrote:
> All normal regions in a QEMUSGList point to an address range in the
> guest's RAM. The MMIO regions of QEMU's virtual devices, however, do
> not correspond to such an address range, so QEMU must create a bounce
> buffer to represent them. This bounce buffer is added to the I/O
> vector that contains the rest of the mapped addresses and is later
> passed to a readv()/writev() call.

Correct.

>>> 9. This leads to a partial read/write and the mapping loop will
>>> resume once the partial read/write() has finished.
>
> The MMIO region is the trigger for a partial read/write, but it's not
> the actual reason. The actual reason is that there is only *one*
> *global* bounce buffer. This means that if it's in use, or if we need
> to use it twice, we have to wait.

Yes.

>>>> However, it is not possible to do the same for ioctls. This is
>>>> actually the reason why no one has ever tried to make scsi-generic
>>>> do anything but bounce-buffering. I think that your code breaks
>>>> horribly in this case, and I don't see a way to fix it, except for
>>>> reverting to bounce buffering.
>
> Sure, you're right, there's no sensible way to break an ioctl()
> operation into many. However, I'd argue that we shouldn't need to, as
> it would be much better if the DMA operation were never restarted in
> the first place. Instead, if we dealt with the bigger issue of the
> global bounce buffer, we could kill two birds with one stone.
>
> I see that there was an attempt [1] to replace the global bounce
> buffer with something more dynamic. In short, the objections [2] were
> the following:
>
> 1. It introduced locking/unlocking a global mutex in the hot path, as
>    well as a hash table lookup.
> 2. It allowed for unbounded memory allocations.
>
> An improvement that would address (1) is to get rid of any global
> state:
>
> Since the mapping operation takes place in the context of a DMA
> operation, we could provide a ctx-type struct to the
> dma_memory_(un)map --> address_space_(un)map functions that would
> contain a hash table.
> If any memory allocations were needed, we would track them using that
> hash table, which would require no locks. Moreover, if initializing
> the hash table hurt performance in the general case, we could use a
> skip list instead, provided the number of memory allocations is small
> (e.g. < 100).

You don't need a hash table either, if you manage the bounce buffer
list per DMA transfer; the simplest way to achieve that is to move the
bounce buffer from exec.c to dma-helpers.c entirely.

The patch could first introduce an address_space_map_direct function
that never uses the bounce buffer. dma-helpers.c can call
address_space_map_direct and, if it fails, allocate (and, when writing
to the device, fill) a bounce buffer itself.

Since the QEMUSGList is mapped and unmapped beginning-to-end, you can
just use a FIFO queue. The FIFO queue stores (QEMUSGList, buffer)
tuples. When unmapping a QEMUSGList entry, you check whether it matches
the head of the queue; if it does, you write back the contents of the
bounce buffer (for reads from the device) and free it. If it doesn't
match, you call address_space_unmap.

Then, once the bounce buffer is implemented within dma-helpers.c, you
remove address_space_map and rename address_space_map_direct to
address_space_map. cpu_register_map_client goes away.

The unbounded memory allocation can be avoided by bounding the number
of entries in the queue. In the case of scsi-generic you could just as
well allow INT_MAX entries, because scsi-generic would do unbounded
memory allocation anyway for its bounce buffer.

Modulo the "& BDRV_SECTOR_MASK" issue, this actually seems simpler than
what this series was doing.

Paolo
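
P.S. To make the FIFO idea concrete, here is a rough standalone sketch.
Everything in it is a simplified stand-in rather than real QEMU code:
address_space_map_direct() is the function proposed above (it does not
exist yet), dma_map_one()/dma_unmap_one() are hypothetical helpers for
a single QEMUSGList entry, and guest_memory_read()/guest_memory_write()
merely mark where a bounce buffer would be filled before the I/O and
written back after it.

/*
 * Sketch of a per-transfer bounce-buffer FIFO, with simplified
 * stand-in types and functions (not QEMU's real API).
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;

/* Proposed: map guest RAM directly, return NULL for MMIO regions. */
extern void *address_space_map_direct(dma_addr_t addr, dma_addr_t *len,
                                      bool to_dev);
extern void address_space_unmap(void *buf, dma_addr_t len, bool to_dev);

/* Stand-ins for copying between guest RAM and a bounce buffer. */
extern void guest_memory_read(dma_addr_t addr, void *buf, dma_addr_t len);
extern void guest_memory_write(dma_addr_t addr, const void *buf,
                               dma_addr_t len);

/* One bounce buffer; entries are queued in mapping order. */
typedef struct BounceEntry {
    void *buffer;
    dma_addr_t addr;            /* guest address the buffer stands in for */
    struct BounceEntry *next;
} BounceEntry;

/* Per-DMA-transfer state: no global lock, no hash table. */
typedef struct DMAMapState {
    BounceEntry *head, *tail;   /* FIFO of live bounce buffers */
    unsigned n_entries;
    unsigned max_entries;       /* bounds allocations; huge for scsi-generic */
} DMAMapState;

/* Map one entry; to_dev means the transfer writes to the device, so a
 * bounce buffer must be pre-filled from guest RAM. */
static void *dma_map_one(DMAMapState *s, dma_addr_t addr, dma_addr_t *len,
                         bool to_dev)
{
    void *p = address_space_map_direct(addr, len, to_dev);
    BounceEntry *e;

    if (p) {
        return p;               /* normal guest RAM, mapped directly */
    }
    if (s->n_entries >= s->max_entries) {
        return NULL;            /* caller submits what it mapped so far */
    }
    e = malloc(sizeof(*e));
    e->buffer = malloc(*len);
    e->addr = addr;
    e->next = NULL;
    if (to_dev) {
        guest_memory_read(addr, e->buffer, *len);
    }
    if (s->tail) {
        s->tail->next = e;
    } else {
        s->head = e;
    }
    s->tail = e;
    s->n_entries++;
    return e->buffer;
}

/* Entries are unmapped in mapping order, so comparing against the FIFO
 * head is enough to tell a bounce buffer from a direct mapping. */
static void dma_unmap_one(DMAMapState *s, void *p, dma_addr_t len,
                          bool to_dev)
{
    BounceEntry *e = s->head;

    if (e && e->buffer == p) {
        if (!to_dev) {
            /* read from the device: copy the result back to guest RAM */
            guest_memory_write(e->addr, e->buffer, len);
        }
        s->head = e->next;
        if (!s->head) {
            s->tail = NULL;
        }
        s->n_entries--;
        free(e->buffer);
        free(e);
    } else {
        address_space_unmap(p, len, to_dev);
    }
}

Bounding max_entries gives the bounded-allocation behaviour described
above; scsi-generic would simply pass a very large bound, since it does
unbounded allocation for its bounce buffer today anyway.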