From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43262) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzPYy-000152-A4 for qemu-devel@nongnu.org; Tue, 24 Jun 2014 08:11:08 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WzPYp-0004IP-PT for qemu-devel@nongnu.org; Tue, 24 Jun 2014 08:11:00 -0400
Message-ID: <53A96ACA.8010807@suse.de>
Date: Tue, 24 Jun 2014 14:10:50 +0200
From: Alexander Graf
MIME-Version: 1.0
References: <53A44548.7020605@ilande.co.uk> <53A48B28.8070808@ilande.co.uk> <53A85668.10505@suse.de> <53A8AD26.2050305@ilande.co.uk> <53A95AA9.9000703@suse.de> <20140624112230.GE3458@noname.redhat.com> <53A960B9.8070900@suse.de> <20140624120703.GG3458@noname.redhat.com>
In-Reply-To: <20140624120703.GG3458@noname.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PULL 075/118] macio: handle non-block ATAPI DMA transfers
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Kevin Wolf
Cc: "qemu-ppc@nongnu.org" , Mark Cave-Ayland , qemu-devel

On 24.06.14 14:07, Kevin Wolf wrote:
> Am 24.06.2014 um 13:27 hat Alexander Graf geschrieben:
>> On 24.06.14 13:22, Kevin Wolf wrote:
>>> Am 24.06.2014 um 13:02 hat Alexander Graf geschrieben:
>>>> The way DBDMA works is that you put in something similar to a
>>>> scatter-gather list: A list of chunks to read / write and where in
>>>> memory those chunks live. DBDMA then goes over its list and does the
>>>> pokes. So for example if the list is
>>>>
>>>> [ memaddr = 0x12000 | len = 500 ]
>>>> [ memaddr = 0x13098 | len = 12 ]
>>>>
>>>> then it reads 500 bytes from IDE, writes them at memory offset
>>>> 0x12000 and after that reads another 12 bytes from IDE and puts them
>>>> at memory offset 0x13098.
>>>>
>>>> The reason we have such complicated code for real DMA is that we
>>>> can't model this easily with our direct block-to-memory API. That
>>>> one can only work on a 512 byte granularity. So when we see
>>>> unaligned accesses like above, we have to split them out and handle
>>>> them lazily.
>>> Wait... What kind of granularity are you talking about?
>>>
>>> We do need disk accesses with a 512 byte granularity, because the API
>>> takes a sector number. This is also what real IDE disks do, they don't
>>> provide byte access.
>>>
>>> However, for the memory, I can't see why you couldn't pass a s/g list
>>> like what you wrote above to the DMA functions. This is not unusual at
>>> all and is the same as ide/pci.c does. There is no 512-byte alignment
>>> needed for the individual s/g list entries, only the total size should
>>> obviously be a multiple of 512 in the general case (otherwise the list
>>> would be too short or too long for the request).
>>>
>>> If this is really what we're talking about, then I think your problem is
>>> just that you try to handle the 500 byte and the 12 byte as individual
>>> requests instead of building up the s/g list and then sending a single
>>> request.
>> The 500 and 12 byte requests can come in as separate requests that
>> require previous requests to have finished. What Mac OS X does for
>> example is
>>
>> [ memaddr = 0x2000 | len = 1024 ]
>> [ memaddr = 0x1000 | len = 510 ]
>>
>>
>>
>> [ memaddr = 0x10fe | len = 2 ]
>> [ memaddr = 0x3000 | len = 2048 ]
>>
>> If it was as simple as creating a working sglist, I would've
>> certainly done so long ago :).
> Thanks, that's the explanation that was missing for me (I'm sure you
> explained it more than once to me in the past few years, but I keep
> forgetting).
>
> This means, however, that exposing the byte access in the block layer is
> probably not what you want.
> Otherwise you would read the same sector
> twice from the image (assuming cache=none, so the backend must have
> 512-byte alignment). If you do the handling in the device emulation you
> can read the full request once and then only do the DMA part with a byte
> granularity. I suppose this is the complicated code that you have today?

Yes and no. We are trying to be slightly smarter than that. For all the
aligned pieces in between we just have a fast path doing straight DMA
to/from memory. For the tiny unaligned chunks, we read into a temporary
buffer and do DMA byte-wise from that one manually. For writes it's the
reverse - we only issue the full sector IDE transfer after our temporary
buffer is fully filled.

I think it's perfectly reasonable to read the sector twice from the
image though if that makes the DBDMA emulation code easier to maintain
;). Right now there are just way too many tiny corner cases lingering.

Alex
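
For anyone trying to follow the descriptor walk discussed above, here is a
minimal Python model of the behaviour being described: a backend that can
only serve whole 512-byte sectors (like the sector-based block API), and a
DBDMA-style loop that stages leftover bytes in a temporary sector buffer so
unaligned chunks still land at the right guest addresses. All names
(`SectorDisk`, `dbdma_read`) are made up for illustration; this is not
QEMU's actual code, and it processes the whole descriptor list in one call,
whereas real DBDMA delivers the lists as separate programs with completion
interrupts in between - which is exactly the complication the thread is
about.

```python
SECTOR = 512

class SectorDisk:
    """Backend that only supports whole-sector reads (sector-based API)."""
    def __init__(self, data):
        assert len(data) % SECTOR == 0
        self.data = data

    def read_sector(self, n):
        return self.data[n * SECTOR:(n + 1) * SECTOR]

def dbdma_read(disk, descriptors, memory):
    """Walk a DBDMA-style descriptor list [(memaddr, len), ...], copying
    bytes from the disk's data stream into guest memory. Whole sectors go
    straight through; the tail of a partially consumed sector is staged in
    a temporary buffer, as the thread describes. Illustrative only."""
    stream_pos = 0   # byte offset into the IDE data stream
    buf = b""        # staged bytes left over from the last sector read
    for memaddr, length in descriptors:
        dst, remaining = memaddr, length
        while remaining:
            if not buf:
                # Refill from the backend at sector granularity.
                sector = disk.read_sector(stream_pos // SECTOR)
                buf = sector[stream_pos % SECTOR:]
            chunk = buf[:remaining]
            memory[dst:dst + len(chunk)] = chunk
            buf = buf[len(chunk):]
            stream_pos += len(chunk)
            dst += len(chunk)
            remaining -= len(chunk)
```

With the list from the first example, `[(0x12000, 500), (0x13098, 12)]`,
one sector is read from the backend, 500 bytes are written at 0x12000, and
the 12 staged leftover bytes then go to 0x13098.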