From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 17 Jul 2013 15:35:38 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20130717133538.GJ2458@dhcp-200-207.str.redhat.com>
References: <51E64692.1010407@ilande.co.uk>
	<20130717081627.GB2458@dhcp-200-207.str.redhat.com>
	<51E69398.6080709@ilande.co.uk>
	<47766825-6E23-4404-B06C-2F27A70091DF@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <47766825-6E23-4404-B06C-2F27A70091DF@suse.de>
Subject: Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU
	DMA API?
To: Alexander Graf <agraf@suse.de>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, qemu-devel <qemu-devel@nongnu.org>

On 17.07.2013 at 14:59, Alexander Graf wrote:
> 
> On 17.07.2013, at 14:52, Mark Cave-Ayland wrote:
> 
> > On 17/07/13 09:16, Kevin Wolf wrote:
> > 
> > Hi Kevin,
> > 
> > Thanks for the reply - CC to qemu-devel as requested.
> > 
> >>> I've been testing some of Alex Graf's patches for running Darwin
> >>> under QEMU PPC and have been experiencing some timeout problems on
> >>> block devices. My attention is drawn to this commit in particular:
> >>> https://github.com/qemu/qemu/commit/80fc95d8bdaf3392106b131a97ca701fd374489a.
> >>> 
> >>> The reason for this commit is that Darwin programs the DBDMA
> >>> controller to transfer data from the ATA FIFO in chunks that aren't
> >>> sector aligned, e.g. the ATA command requests 0x20000 bytes (256 sectors)
> >>> but programs the DMA engine to transfer the data to memory as 3
> >>> chunks of 0xfffe, 0xfffe and 0x4 bytes.
> >> 
> >> I'm not familiar with how DMA works for the macio IDE device. Do you
> >> have any pointers to specs or something?
> > 
> > It works by setting up a DMA descriptor table (a list of commands) whose commands are then "executed", once the RUN status bit is set, until a STOP command is reached. Things are slightly more complicated in that commands can have conditional branches set on them.
> > 
> >> The one important point I'm wondering about is why you call
> >> dma_bdrv_read() with a single 0xfffe QEMUSGList. Shouldn't it really be
> >> called with a QEMUSGList { 0xfffe, 0xfffe, 0x4 }, which should enable
> >> dma-helpers.c to do the right thing?
> > 
> > Hmmm, I guess you could perhaps scan down the command list from the current position looking for all INPUT/OUTPUT commands until the next STOP command, and maybe build up a single QEMUSGList from that? I'm not sure exactly how robust that would be with the conditional branching, though. Alex?
> 
> It'd at least be vastly different from how real hardware works, yes. We'd basically have to throw away the current interpretation code and instead emulate the device based on assumptions.

Okay, so I've had a quick look at that DMA controller, and it seems that
for a complete emulation, there's no way around using a bounce buffer
(and calling directly into the block layer instead of using
dma-helpers.c) for the general case.

You can have a fast path that is triggered if one or more directly
following INPUT/OUTPUT commands cover the whole IDE command, and that
creates a QEMUSGList as described above and uses dma-helpers.c to
implement zero-copy requests. I suspect that your Darwin requests would
actually fall into this category.

Essentially, I think Alex's patches are doing something similar, just not
implementing the complete DMA controller feature set and with the
regular slow path hacked as additional code into the fast path. So the
code could be cleaner, it could use asynchronous block layer functions
and handle errors, and it could be more complete, but at the end of
the day you'd still have some fast-path zero-copy I/O and some calls
into the block layer using bounce buffers.

> > The main culprit for these transfers is Darwin, which limits large transfers to 0xfffe bytes (see http://searchcode.com/codesearch/view/23337208 line 382). Hence most large disk transactions get broken down into irregularly sized chunks, which highlights this issue.
> 
> The main issue is that we're dealing with 3 separate pieces of hardware here. There is the IDE controller, which works at sector level. And then there's the DMA controller, which fetches data from the IDE controller byte-wise (from what I understand). Both work independently, but we try to shoehorn both into the same callback.

But this is not really visible to software. At some point the bytes are
gathered until they fill up at least a full sector, and only then are they
written to disk. The emulation must do the same.

Kevin