From: Avi Kivity <avi@redhat.com>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/5] Add target memory mapping API
Date: Mon, 19 Jan 2009 20:29:40 +0200
Message-ID: <4974C694.8070004@redhat.com>
In-Reply-To: <18804.48642.929024.908906@mariner.uk.xensource.com>
Ian Jackson wrote:
>>> Efficient read-modify-write may be very hard for some setups to
>>> achieve. It can't be done with the bounce buffer implementation.
>>> I think one good rule of thumb would be to make sure that the interface
>>> as specified can be implemented in terms of cpu_physical_memory_rw.
>>>
>> What is the motivation for efficient rmw?
>>
>
> I think you've misunderstood me. I don't think there is such a
> motivation. I was saying it was so difficult to implement that we
> might as well exclude it.
>
Then we agree. The map API is for read OR write operations, not both at
the same time.
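
For concreteness, the shape I have in mind is something like this (a
sketch; the names and exact types are illustrative, not the final patch):

/* Map guest memory for a read OR a write, never both.  The caller
 * declares the direction up front; the implementation may hand back a
 * direct pointer into guest RAM or a bounce buffer.  *plen may be
 * reduced if only part of the range could be mapped. */
void *cpu_physical_memory_map(target_phys_addr_t addr,
                              target_phys_addr_t *plen,
                              int is_write);

/* Unmap; for is_write mappings, access_len says how many bytes the
 * device actually touched, so a bounce-buffer implementation knows how
 * much to copy back into guest memory. */
void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
                               int is_write,
                               target_phys_addr_t access_len);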
>
>>> That would be one alternative but isn't it the case that (for example)
>>> with a partial DMA completion, the guest can assume that the
>>> supposedly-untouched parts of the DMA target memory actually remain
>>> untouched rather than (say) zeroed ?
>>>
>> For block devices, I don't think it can.
>>
>
> `Block devices' ? We're talking about (say) IDE controllers here. I
> would be very surprised if an IDE controller used DMA to overwrite RAM
> beyond the amount of successful transfer.
>
> If a Unix variant does zero copy IO using DMA direct into process
> memory space, then it must even rely on the IDE controller not doing
> DMA beyond the end of the successful transfer, as the read(2) API
> promises to the calling process that data beyond the successful read
> is left untouched.
>
> And even if the IDE spec happily says that the (IDE) host (ie our
> guest) is not allowed to assume that that memory (ie the memory beyond
> the extent of the successful part of a partially successful transfer)
> is unchanged, there will almost certainly be some other IO device on
> some platform that will make that promise.
>
> So we need a call into the DMA API from the device model to say which
> regions have actually been touched.
>
>
It's not possible to implement this efficiently. The qemu block layer
will submit the results of the map operation to the kernel in an async
zero copy operation. The kernel may break up this operation into several
parts (if the underlying backing store is fragmented) and submit them in
parallel to the underlying device(s). Those requests will complete
out-of-order, so you can't guarantee that, if an error occurs, all memory
before the failure point has been written and none after it.
I really doubt that any guest will be affected by this. It's a tradeoff
between decent performance and needlessly accurate emulation. I don't
see how we can choose the latter.
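
If it helps, here is a stripped-down sketch (no real AIO backend, names
hypothetical) of why the completion path cannot report which bytes were
touched:

/* Hypothetical sketch: one guest request split into N fragments that
 * are submitted in parallel and complete in arbitrary order.  When a
 * fragment fails, sibling fragments covering "later" memory may
 * already have completed, so all we can report is that the request as
 * a whole failed -- not which bytes were left untouched. */
struct dma_request {
    int fragments_pending;    /* fragments still in flight */
    int error;                /* first error seen, if any */
    void (*complete)(struct dma_request *req);
};

static void fragment_done(struct dma_request *req, int err)
{
    if (err && !req->error) {
        req->error = err;
    }
    if (--req->fragments_pending == 0) {
        req->complete(req);   /* whole-request success or failure */
    }
}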
>>> In a system where we're trying to do zero copy, we may issue the map
>>> request for a large transfer, before we know how much the host kernel
>>> will actually provide.
>>>
>> Won't it be at least 1GB? Partition your requests to that size.
>>
>
> No, I mean, before we know how much data qemu's read(2) will transfer.
>
You don't know afterwards either. Maybe read() is specced as you say,
but practical implementations will only guarantee a minimum number of
bytes read, not the exact extent. Think software RAID.
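
The partitioning I suggested is just a loop; a sketch, assuming the
illustrative map API above and a hypothetical MAX_MAP_LEN cap:

#include <unistd.h>               /* read() */

#define MAX_MAP_LEN (1ULL << 30)  /* illustrative 1GB cap */

/* Hypothetical helper: fill guest memory from fd in map-sized chunks,
 * stopping at a short read, since read() only guarantees a minimum
 * number of bytes, not the exact extent. */
static size_t transfer_to_guest(int fd, target_phys_addr_t addr,
                                size_t total)
{
    size_t done = 0;
    while (done < total) {
        target_phys_addr_t len = total - done;
        if (len > MAX_MAP_LEN) {
            len = MAX_MAP_LEN;
        }
        void *p = cpu_physical_memory_map(addr + done, &len, 1);
        ssize_t n = read(fd, p, len);   /* map may have shortened len */
        cpu_physical_memory_unmap(p, len, 1, n > 0 ? n : 0);
        if (n <= 0 || (target_phys_addr_t)n < len) {
            return done + (n > 0 ? n : 0);   /* short read: stop here */
        }
        done += n;
    }
    return done;
}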
>> In any case, this will only occur with mmio. I don't think the
>> guest can assume much in such cases.
>>
>
> No, it won't only occur with mmio.
>
> In the initial implementation in Xen, we will almost certainly simply
> emulate everything with cpu_physical_memory_rw. So it will happen all
> the time.
>
Try it out. I'm sure it will work just fine (if incredibly slowly,
unless you provide multiple bounce buffers).
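
For reference, the fallback really is that simple; a minimal sketch in
terms of cpu_physical_memory_rw (one static buffer, helper names
hypothetical):

/* Minimal bounce-buffer fallback: map() degrades to
 * cpu_physical_memory_rw() through one static buffer, so only one
 * mapping can be outstanding at a time -- correct, but slow. */
static struct {
    uint8_t *buffer;
    target_phys_addr_t addr;
    target_phys_addr_t len;
    int in_use;
} bounce;

void *bounce_map(target_phys_addr_t addr, target_phys_addr_t *plen,
                 int is_write)
{
    if (bounce.in_use) {
        return NULL;              /* caller must retry later */
    }
    bounce.buffer = qemu_malloc(*plen);
    bounce.addr = addr;
    bounce.len = *plen;
    bounce.in_use = 1;
    if (!is_write) {
        /* the device will read: fill the buffer from guest memory */
        cpu_physical_memory_rw(addr, bounce.buffer, *plen, 0);
    }
    return bounce.buffer;
}

void bounce_unmap(void *buffer, target_phys_addr_t len, int is_write,
                  target_phys_addr_t access_len)
{
    if (is_write) {
        /* the device wrote: copy back only what was actually touched */
        cpu_physical_memory_rw(bounce.addr, buffer, access_len, 1);
    }
    qemu_free(buffer);
    bounce.in_use = 0;
}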
>>> Err, no, I don't really see that. In my proposal the `handle' is
>>> actually allocated by the caller. The implementation provides the
>>> private data and that can be empty. There is no additional memory
>>> allocation.
>>>
>> You need to store multiple handles (one per sg element), so you need to
>> allocate a variable-size vector for them. Preallocation may be possible
>> but perhaps wasteful.
>>
>
> See my reply to Anthony Liguori, which shows how this can be avoided.
> Since you hope for a single call to map everything, you can do an sg
> list with a single handle.
>
That's a very different API.
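Roughly, the two shapes being contrasted (both signatures hypothetical):

typedef void *map_handle_t;       /* opaque, hypothetical */

typedef struct {
    target_phys_addr_t addr;
    target_phys_addr_t len;
} sg_entry_t;

/* (a) per-element: the device must store one handle per sg entry,
 *     i.e. a variable-size vector alongside the sg list itself */
map_handle_t map_one(target_phys_addr_t addr, target_phys_addr_t len,
                     int is_write);

/* (b) whole-list: one call, one handle for the entire vector --
 *     nothing per-element to store, but a very different contract */
map_handle_t map_sg(const sg_entry_t *sg, int nents, int is_write);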
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.