From: Avi Kivity
Date: Tue, 20 Jan 2009 19:23:14 +0200
Subject: Re: [Qemu-devel] [PATCH 1/5] Add target memory mapping API
To: qemu-devel@nongnu.org
Message-ID: <49760882.6010309@redhat.com>
In-Reply-To: <18805.57449.348449.492647@mariner.uk.xensource.com>

Ian Jackson wrote:
> I think the key points in Avi's message are this:
>
> Avi Kivity writes:
>> You don't know afterwards either. Maybe read() is specced as you
>> say, but practical implementations will return the minimum bytes
>> read, not exact.
>
> And this:
>
>> I really doubt that any guest will be affected by this. It's a
>> tradeoff between decent performance and needlessly accurate
>> emulation. I don't see how we can choose the latter.
>
> I don't think this is the right way to analyse this situation. We
> are trying to define a general-purpose DMA API for _all_ emulated
> devices, not just the IDE emulation and block devices that you seem
> to be considering.

No. There already exists a general API: cpu_physical_memory_rw(). We
are trying to define an API that will allow the high-throughput
devices (IDE, SCSI, virtio-blk, virtio-net) to be implemented
efficiently.

If device X does not work well with the API then, unless it is
important for some reason, it shouldn't use it. If it is important,
we can adapt the API at that point.
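To make that concrete, here is a minimal sketch of how a block device
might use such an API. The cpu_physical_memory_map/unmap signatures are
assumed to have roughly the shape proposed in 1/5 (map returns a host
pointer and a possibly shortened length; unmap takes the length that
was actually accessed); the helper name is made up, and error handling
and scatter-gather are elided:

/* Sketch only: assumes QEMU's memory declarations are in scope and
 * that the map/unmap API looks roughly like the one proposed in this
 * series.  A real device would handle scatter-gather lists and retry
 * shortened mappings instead of dropping straight to the bounce path. */

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

static void dma_read_into_guest(int fd, off_t offset,
                                target_phys_addr_t guest_addr,
                                target_phys_addr_t len)
{
    target_phys_addr_t maplen = len;
    void *host = cpu_physical_memory_map(guest_addr, &maplen, 1 /* is_write */);

    if (host && maplen == len) {
        /* Fast path: read from the image straight into guest RAM. */
        ssize_t done = pread(fd, host, len, offset);
        /* Only the first 'done' bytes are known to be valid; report
         * that as the access length so only that much is marked dirty. */
        cpu_physical_memory_unmap(host, maplen, 1, done > 0 ? done : 0);
        return;
    }
    if (host) {
        cpu_physical_memory_unmap(host, maplen, 1, 0);
    }

    /* Slow path: bounce through a temporary buffer, which is what
     * cpu_physical_memory_rw() already amounts to today. */
    uint8_t *bounce = malloc(len);
    ssize_t done = pread(fd, bounce, len, offset);
    if (done > 0) {
        cpu_physical_memory_rw(guest_addr, bounce, done, 1 /* write to guest */);
    }
    free(bounce);
}

The part relevant to this thread is the last argument to the unmap:
the device can report how many bytes it actually transferred, but it
cannot promise anything stronger about what the host kernel did beyond
that count.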
> If there is ever any hardware which behaves `properly' with partial
> DMA, and any host kernel device which can tell us what succeeded and
> what failed, then it is necessary for the DMA API we are now
> inventing to allow that device to be properly emulated.
>
> Even if we can't come up with an example of such a device right now,
> I would suggest that it's very likely that we will encounter one
> eventually. But actually I can think of one straight away: a SCSI
> tapestreamer. Tapestreamers often give partial transfers at the end
> of tapefiles; hosts (i.e., qemu guests) talking to the SCSI
> controller do not expect the controller to DMA beyond the successful
> SCSI transfer length; and the (qemu host's) kernel's read() call will
> not overwrite beyond the successful transfer length either.

That will work out fine, as the DMA will be to kernel memory and
read() will copy just the interesting parts.

> If it is difficult for a block device to provide the faithful
> behaviour then it might be acceptable for the block device to always
> indicate to the DMA API that the entire transfer had taken place,
> even though actually some of it had failed.
>
> But personally I think you're mistaken about the behaviour of the
> (qemu host's) kernel's {aio_,p,}read(2).

I'm pretty sure reads to software RAIDs will be submitted in parallel.
If those reads are O_DIRECT, then it's impossible to maintain DMA
ordering.

>>> In the initial implementation in Xen, we will almost certainly
>>> simply emulate everything with cpu_physical_memory_rw. So it will
>>> happen all the time.
>>
>> Try it out. I'm sure it will work just fine (if incredibly slowly,
>> unless you provide multiple bounce buffers).
>
> It will certainly work except when (a) there are partial
> (interrupted) transfers and (b) the host relies on the partial DMA
> not overwriting more data than it successfully transferred. So what
> that means is that if this introduces bugs, they will be very
> difficult to find in testing. I don't think testing is the answer
> here.

The only workaround I can think of is not to DMA. But that will be
horribly slow.

--
error compiling committee.c: too many arguments to function
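For comparison, the cpu_physical_memory_rw()-only emulation that Xen
would start with amounts to something like the following sketch (the
helper name and chunk size are invented here, not anything in the
tree). It also shows why a transfer that fails part-way through has
already copied the earlier chunks into guest memory, which is exactly
the partial-DMA behaviour being debated above:

/* Illustrative only: emulate a DMA read purely with the existing
 * bounce API.  The helper name and chunk size are made up. */

#include <stdint.h>
#include <unistd.h>

#define BOUNCE_LEN 4096

static size_t dma_read_bounced(int fd, off_t offset,
                               target_phys_addr_t guest_addr, size_t len)
{
    uint8_t buf[BOUNCE_LEN];
    size_t done = 0;

    while (done < len) {
        size_t chunk = len - done > BOUNCE_LEN ? BOUNCE_LEN : len - done;
        ssize_t r = pread(fd, buf, chunk, offset + done);

        if (r <= 0) {
            /* Chunks before 'done' have already been copied into guest
             * RAM; bytes from 'done' onwards were never touched.  That
             * is the strongest guarantee a bounced emulation can give
             * about a partial transfer. */
            break;
        }
        cpu_physical_memory_rw(guest_addr + done, buf, r, 1 /* write to guest */);
        done += r;
    }
    return done;
}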