From: Paolo Bonzini
Date: Thu, 25 Feb 2016 11:22:05 +0100
Message-ID: <56CED5CD.1070400@redhat.com>
In-Reply-To: <56CED32C.6020001@arrikto.com>
References: <1450284917-10508-1-git-send-email-apyrgio@arrikto.com>
 <1450284917-10508-2-git-send-email-apyrgio@arrikto.com>
 <56BC6DC0.4070204@redhat.com> <56C70168.2050408@arrikto.com>
 <56CAE64B.4070608@redhat.com> <56CED32C.6020001@arrikto.com>
Subject: Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
To: Alex Pyrgiotis
Cc: kwolf@redhat.com, famz@redhat.com, qemu-devel@nongnu.org

On 25/02/2016 11:10, Alex Pyrgiotis wrote:
> All normal regions in a QEMUSGList point to an address range in the
> guest's RAM. The MMIO regions of QEMU's virtual devices, however, do
> not correspond to such an address range, so QEMU must create a bounce
> buffer to represent them. This bounce buffer is added to the I/O
> vector that contains the rest of the mapped addresses and is later
> passed to a readv()/writev() call.

Correct.

>>> 9. This leads to a partial read/write and the mapping loop will
>>> resume once the partial read/write() has finished.
>
> The MMIO region is the trigger for a partial read/write, but it's not
> the actual reason. The actual reason is that there is only *one*
> *global* bounce buffer. This means that if it's in use, or if we need
> to use it twice, we have to wait.

Yes.

>>>> However, it is not possible to do the same for ioctls. This is
>>>> actually the reason why no one has ever tried to make scsi-generic
>>>> do anything but bounce-buffering. I think that your code breaks
>>>> horribly in this case, and I don't see a way to fix it, except for
>>>> reverting to bounce buffering.
>
> Sure, you're right, there's no sensible way to break an ioctl()
> operation into many. However, I'd argue that we shouldn't need to, as
> it would be much better if the DMA operation were never restarted in
> the first place. Instead, if we dealt with the bigger issue of the
> global bounce buffer, we could kill two birds with one stone.
>
> I see that there was an attempt [1] to replace the global bounce
> buffer with something more dynamic. In short, the objections [2] were
> the following:
>
> 1. It introduced locking/unlocking a global mutex in the hot path, as
>    well as a hash table lookup.
> 2. It allowed for unbounded memory allocations.
>
> An improvement that would address (1) is to get rid of any global
> state:
>
> Since the mapping operation takes place in the context of a DMA
> operation, we could provide a ctx-type struct to the
> dma_memory_(un)map --> address_space_(un)map functions that would
> contain a hash table.
> If any memory allocations were needed, we would track them using that
> hash table, which would require no locks. Moreover, if initializing
> the hash table hurt performance in the general case, we could use a
> skip list instead, provided the number of memory allocations is small
> (e.g. < 100).

You don't need a hash table either, if you manage the bounce buffer
list per DMA transfer; the simplest way to achieve that is to move the
bounce buffer from exec.c to dma-helpers.c entirely.

The patch could first introduce an address_space_map_direct function
that never uses the bounce buffer. dma-helpers.c can call
address_space_map_direct and, if it fails, allocate (and, when writing
to the device, fill) a bounce buffer itself.

Since the QEMUSGList is mapped and unmapped beginning-to-end, you can
just use a FIFO queue. The FIFO queue stores (QEMUSGList, buffer)
tuples. When unmapping a QEMUSGList entry, you check whether it matches
the head of the queue; if it does, you write back the contents of the
bounce buffer (for reads from the device) and free it. If it doesn't
match, you call address_space_unmap.

Then, once the bounce buffer is implemented within dma-helpers.c, you
remove address_space_map and rename address_space_map_direct to
address_space_map. cpu_register_map_client goes away.

The unbounded memory allocation can be avoided by bounding the number
of entries in the queue. In the case of scsi-generic you could just as
well allow INT_MAX entries, because scsi-generic would do unbounded
memory allocation anyway for its bounce buffer.

Modulo the "& BDRV_SECTOR_MASK" issue, this actually seems simpler than
what this series was doing.

Paolo
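
P.S. To make the FIFO idea concrete, here is a rough standalone sketch.
Everything in it is a simplified stand-in rather than real QEMU code:
address_space_map_direct() is the function proposed above (it does not
exist yet), dma_map_one()/dma_unmap_one() are hypothetical helpers for
a single QEMUSGList entry, and guest_memory_read()/guest_memory_write()
merely mark where a bounce buffer would be filled before the I/O and
written back after it.

/*
 * Sketch of a per-transfer bounce-buffer FIFO, with simplified
 * stand-in types and functions (not QEMU's real API).
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;

/* Proposed: map guest RAM directly, return NULL for MMIO regions. */
extern void *address_space_map_direct(dma_addr_t addr, dma_addr_t *len,
                                      bool to_dev);
extern void address_space_unmap(void *buf, dma_addr_t len, bool to_dev);

/* Stand-ins for copying between guest RAM and a bounce buffer. */
extern void guest_memory_read(dma_addr_t addr, void *buf, dma_addr_t len);
extern void guest_memory_write(dma_addr_t addr, const void *buf,
                               dma_addr_t len);

/* One bounce buffer; entries are queued in mapping order. */
typedef struct BounceEntry {
    void *buffer;
    dma_addr_t addr;            /* guest address the buffer stands in for */
    struct BounceEntry *next;
} BounceEntry;

/* Per-DMA-transfer state: no global lock, no hash table. */
typedef struct DMAMapState {
    BounceEntry *head, *tail;   /* FIFO of live bounce buffers */
    unsigned n_entries;
    unsigned max_entries;       /* bounds allocations; huge for scsi-generic */
} DMAMapState;

/* Map one entry; to_dev means the transfer writes to the device, so a
 * bounce buffer must be pre-filled from guest RAM. */
static void *dma_map_one(DMAMapState *s, dma_addr_t addr, dma_addr_t *len,
                         bool to_dev)
{
    void *p = address_space_map_direct(addr, len, to_dev);
    BounceEntry *e;

    if (p) {
        return p;               /* normal guest RAM, mapped directly */
    }
    if (s->n_entries >= s->max_entries) {
        return NULL;            /* caller submits what it mapped so far */
    }
    e = malloc(sizeof(*e));
    e->buffer = malloc(*len);
    e->addr = addr;
    e->next = NULL;
    if (to_dev) {
        guest_memory_read(addr, e->buffer, *len);
    }
    if (s->tail) {
        s->tail->next = e;
    } else {
        s->head = e;
    }
    s->tail = e;
    s->n_entries++;
    return e->buffer;
}

/* Entries are unmapped in mapping order, so comparing against the FIFO
 * head is enough to tell a bounce buffer from a direct mapping. */
static void dma_unmap_one(DMAMapState *s, void *p, dma_addr_t len,
                          bool to_dev)
{
    BounceEntry *e = s->head;

    if (e && e->buffer == p) {
        if (!to_dev) {
            /* read from the device: copy the result back to guest RAM */
            guest_memory_write(e->addr, e->buffer, len);
        }
        s->head = e->next;
        if (!s->head) {
            s->tail = NULL;
        }
        s->n_entries--;
        free(e->buffer);
        free(e);
    } else {
        address_space_unmap(p, len, to_dev);
    }
}

Bounding max_entries gives the bounded-allocation behaviour described
above; scsi-generic would simply pass a very large bound, since it does
unbounded allocation for its bounce buffer today anyway.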