qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: Alexander Graf <agraf@suse.de>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API?
Date: Thu, 18 Jul 2013 15:55:02 +0200	[thread overview]
Message-ID: <20130718135501.GM3582@dhcp-200-207.str.redhat.com> (raw)
In-Reply-To: <9931658C-2020-42F3-9A0F-7AC3811E8AA5@suse.de>

On 18.07.2013 at 15:44, Alexander Graf wrote:
> 
> On 18.07.2013, at 09:41, Kevin Wolf wrote:
> 
> >> On 17.07.2013 at 22:12, Mark Cave-Ayland wrote:
> >> On 17/07/13 14:35, Kevin Wolf wrote:
> >> 
> >>> Okay, so I've had a quick look at that DMA controller, and it seems that
> >>> for a complete emulation, there's no way around using a bounce buffer
> >>> (and calling directly into the block layer instead of using
> >>> dma-helpers.c) for the general case.
> >>> 
> >>> You can have a fast path that is triggered if one or more directly
> >>> following INPUT/OUTPUT commands cover the whole IDE command, and that
> >>> creates a QEMUSGList as described above and uses dma-helpers.c to
> >>> implement zero-copy requests. I suspect that your Darwin requests would
> >>> actually fall into this category.
> >>> 
> >>> Essentially I think Alex's patches are doing something similar, just not
> >>> implementing the complete DMA controller feature set and with the
> >>> regular slow path hacked as additional code into the fast path. So the
> >>> code could be cleaner, it could use asynchronous block layer functions
> >>> and handle errors, and it could be more complete, but at the end of
> >>> the day you'd still have some fast-path zero-copy I/O and some calls
> >>> into the block layer using bounce buffers.
> >> 
> >> I think the key question to answer here is: at what point does the
> >> data hit the disk? From the comments in various parts of
> >> Darwin/Linux, it could be understood that the DMA transfers are
> >> between memory and the ATA drive *buffer*, so for writes especially
> >> there is no guarantee that they even hit the disk until some point
> >> in the future, unless of course the FLUSH flag is set in the control
> >> register.
> >> 
> >> So part of me thinks that maybe we are over-thinking this
> >> and we should just go with Kevin's original suggestion: what about
> >> if we start a new QEMUSGList for each IDE transfer, and just keep
> >> appending QEMUSGList entries until we find an OUTPUT_LAST/INPUT_LAST
> >> command?
> >> 
> >> Why is this valid? We can respond with a complete status for the
> >> intermediate INPUT_MORE/OUTPUT_MORE commands without touching the
> >> disk because all that guarantees is that data has been passed
> >> between memory and the drive *buffer* - not that it has actually hit
> >> the disk. And what is the point of having explicit _LAST commands if
> >> they aren't used to signify completion of the whole transfer between
> >> drive and memory?
> > 
> > I don't think there is even a clear relation between the DMA controller
> > status and whether the data is on disk or not. It's the IDE register's
> > job to tell the driver when a request has completed. The DMA controller
> > is only responsible for getting the data from the RAM to the device,
> > which might start doing a write only after it has received all data and
> > completed the DMA operation. (cf. PIO operation in core.c, where the
> > bytes are gathered in a bounce buffer and the whole sector is written
> > out only when the last byte arrives)
> > 
> > What I would do, however, is to complete even the INPUT/OUTPUT_MORE
> > commands only at the end of the whole request. This is definitely
> > allowed behaviour, and it ensures that a memory region isn't already
> > reused by the OS while e.g. a write request is still running and taking
> > data from this memory. We should complete the DMA command only once
> > we no longer touch the memory.
> 
> Yes, that's the version that I described as "throw away almost all of
> today's code and rewrite it" :).

Well, Mark didn't ask me what's easy to implement, but what's the Right
Thing to do. :-)

> Keep in mind that the same DMA controller can be used for Ethernet, so
> coupling it very tightly with IDE doesn't sound overly appealing to me
> either.

Taking this into consideration will make it even harder, but of course,
once designed right, it's also the most useful approach.

Kevin

Thread overview: 8+ messages
     [not found] <51E64692.1010407@ilande.co.uk>
     [not found] ` <20130717081627.GB2458@dhcp-200-207.str.redhat.com>
2013-07-17 12:52   ` [Qemu-devel] Possibility of unaligned DMA accesses via the QEMU DMA API? Mark Cave-Ayland
2013-07-17 12:59     ` Alexander Graf
2013-07-17 13:35       ` Kevin Wolf
2013-07-17 20:12         ` Mark Cave-Ayland
2013-07-18  7:41           ` Kevin Wolf
2013-07-18 13:44             ` Alexander Graf
2013-07-18 13:55               ` Kevin Wolf [this message]
2013-07-22  8:04               ` Mark Cave-Ayland
