Date: Mon, 19 Jan 2009 19:28:15 +0200
From: Avi Kivity
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/5] Add target memory mapping API
Message-ID: <4974B82F.9020805@redhat.com>
In-Reply-To: <18804.44271.868488.32192@mariner.uk.xensource.com>
References: <1232308399-21679-1-git-send-email-avi@redhat.com>
 <1232308399-21679-2-git-send-email-avi@redhat.com>
 <18804.34053.211615.181730@mariner.uk.xensource.com>
 <4974943B.4020507@redhat.com>
 <18804.44271.868488.32192@mariner.uk.xensource.com>

Ian Jackson wrote:
>> Correct. If you need to perform read-modify-write, you need to use
>> cpu_physical_memory_rw(), twice. If we ever want to support RMW, we'll
>> need to add another value for is_write. I don't think we have
>> interesting devices at this point which require efficient RMW.
>>
>
> Efficient read-modify-write may be very hard for some setups to
> achieve. It can't be done with the bounce buffer implementation.
> I think one good rule of thumb would be to make sure that the interface
> as specified can be implemented in terms of cpu_physical_memory_rw.
>

What is the motivation for efficient rmw?
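
To spell out what "twice" means above: the fallback is just a bounce
through a temporary buffer. A rough, untested sketch (the helper name and
the modify() callback are invented for illustration; only
cpu_physical_memory_rw() and qemu_malloc()/qemu_free() are existing qemu
functions):

    /* Untested illustration, not part of the patch: emulate a
     * read-modify-write by bouncing through a temporary buffer with the
     * existing API.  modify() stands in for whatever the device wants to
     * do with the data. */
    static void rmw_via_rw(target_phys_addr_t addr, int len,
                           void (*modify)(uint8_t *buf, int len, void *opaque),
                           void *opaque)
    {
        uint8_t *buf = qemu_malloc(len);

        cpu_physical_memory_rw(addr, buf, len, 0);   /* read */
        modify(buf, len, opaque);                    /* modify */
        cpu_physical_memory_rw(addr, buf, len, 1);   /* write back */

        qemu_free(buf);
    }

Not fast, but as noted above I don't think we have devices that need
efficient RMW.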
>> Alternatively, only use this interface with devices where this doesn't
>> matter. Given that bouncing happens for mmio only, this would be all
>> devices which you'd want to use this interface with anyway.
>>
>
> That would be one alternative but isn't it the case that (for example)
> with a partial DMA completion, the guest can assume that the
> supposedly-untouched parts of the DMA target memory actually remain
> untouched rather than (say) zeroed?
>

For block devices, I don't think it can. In any case, this will only
occur with mmio. I don't think the guest can assume much in such cases.

In fact, we could even say that the virtual hardware doesn't support
dma-to-mmio at all and MCE the guest. I'm sure no x86 guest would even
notice. Don't know about non-x86.

> In a system where we're trying to do zero copy, we may issue the map
> request for a large transfer, before we know how much the host kernel
> will actually provide.
>

Won't it be at least 1GB? Partition your requests to that size.

>> (I'm assuming that you'll implement the fastpath by directly mapping
>> guest memory, not bouncing).
>>
>
> Yes. We can do that in Xen too but it's less of a priority for us
> given that we expect people who really care about performance to
> install PV drivers in the guest.
>

I'm all in favor of accommodating Xen, but as long as you're out-of-tree
you need to conform to qemu, not the other way around.

>> A variant of this API (posted by Andrea) hid all of the scheduling away
>> within the implementation.
>>
>
> I remember seeing this before but I don't think your previous one
> provided a callback for map completion? I thought it just blocked
> the caller until the map could complete. That's obviously not ideal.
>

It didn't block, it scheduled.

>>> This function should return a separate handle as well as the physical
>>> memory pointer. That will make it much easier to provide an
>>> implementation which permits multiple bounce buffers or multiple
>>> mappings simultaneously.
>>>
>> The downside to a separate handle is that device emulation code will now
>> need to maintain the handle in addition to the virtual address.
>> Since the addresses will typically be maintained in an iovec, this means
>> another array to be allocated and resized.
>>
>
> Err, no, I don't really see that. In my proposal the `handle' is
> actually allocated by the caller. The implementation provides the
> private data and that can be empty. There is no additional memory
> allocation.
>

You need to store multiple handles (one per sg element), so you need to
allocate a variable-size vector for it. Preallocation may be possible but
perhaps wasteful.

>> The design goals here were to keep things as simple as possible for the
>> fast path. Since the API fits all high-bandwidth devices that I know
>> of, I don't think it's a good tradeoff to make the API more complex in
>> order to be applicable to some corner cases.
>>
>
> I think my question about partial DMA writes is very relevant. If we
> don't care about that, nor about the corresponding notification for
> reads, then the API can be a lot simpler.

I don't see a concrete reason to care about it.

--
error compiling committee.c: too many arguments to function