From: Anthony Liguori <anthony@codemonkey.ws>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
"Michal Simek" <monstr@monstr.eu>,
"qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>,
"Peter Crosthwaite" <peter.crosthwaite@petalogix.com>,
"Paul Brook" <paul@codesourcery.com>,
"Edgar E. Iglesias" <edgar.iglesias@gmail.com>,
"Andreas Färber" <afaerber@suse.de>,
"John Williams" <john.williams@petalogix.com>,
"Avi Kivity" <avi@redhat.com>
Subject: Re: [Qemu-devel] [RFC] QOMification of AXI streams
Date: Mon, 11 Jun 2012 20:33:52 -0500 [thread overview]
Message-ID: <4FD69C80.3080807@codemonkey.ws> (raw)
In-Reply-To: <1339458400.9220.49.camel@pasglop>
On 06/11/2012 06:46 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2012-06-11 at 17:29 -0500, Anthony Liguori wrote:
>
>> When we add a subregion, it always removes the offset from the address when it
>> dispatches. This more often than not works out well but for what you're
>> describing above, it sounds like you'd really want to get an adjusted size (that
>> could be transformed).
>>
>> Today we generate a linear dispatch table. This prevents us from applying
>> device-level transforms.
>
> subregions being relative to the parent would probably work for me.
>
> Typically I have something like:
>
> 0x1234_0000_0000 .. 0x1234_3fff_ffff : window to PCI memory
>
> Which generates PCI cycles into the range:
>
> 0x0000_c000_0000 .. 0x0000_ffff_ffff
>
> Which is a combination of bit masking& offset, ie, the HW algorithm is
> to forward some bits (0x3fff_ffff in this case) and replace the other
> ones (with 0xc000_0000) in this case.
>
> Is that properly represented in the current scheme by a subregion ?
Does this mean that both 0x1234..0000_0000is a valid address to issue PCI
requests along with 0x0000_c000_000?
If they both are valid, that sounds pretty much like what aliases were made for.
>> So if you stick with the notion of subregions, you would still have a single
>> MemoryRegion at the PCI bus layer that has all of it's children as sub regions.
>> Presumably that "scan for all siblings" is a binary search which shouldn't
>> really be that expensive considering that we're likely to have a shallow depth
>> in the memory hierarchy.
>
> Possibly but on every DMA access ...
We can always as a TLB :-)
>>> Additionally to handle iommu's etc... you need the option for a given
>>> memory region to have functions to perform the transformation in the
>>> upstream direction.
>>
>> I think that transformation function lives in the bus layer MemoryRegion. It's
>> a bit tricky though because you need some sort of notion of "who is asking". So
>> you need:
>>
>> dma_memory_write(MemoryRegion *parent, DeviceState *caller,
>> const void *data, size_t size);
>
> Why do you need the parent argument ? Can't it be implied from the
> DeviceState ? Or do you envision devices sitting on multiple busses ? Or
> it's just a matter that each device "class" will store that separately ?
Yeah, you may have multiple MemoryRegions you want to write to. I don't think
we should assume all devices have exactly one address space it writes to.
> Also you want the parent to be the direct parent P2P no ? Not the host
> bridge ? Ie. to be completely correct, you want to look at every sibling
> device under a given bridge, then go up if there is no match, etc...
> until you reach the PHB at which point you hit the iommu and eventually
> the system fabric.
Yes.
>> Not quite... subtractive decoding only happens for very specific devices IIUC.
>> For instance, an PCI-ISA bridge. Normally, it's positive decoding and a
>> bridge has to describe the full region of MMIO/PIO that it handles.
>
> That's for downstream. Upstream is essentially substractive as far as I
> can tell. IE. If no sibling device decodes the cycle, then it goes up
> no ?
Why would that be necessary? Are upstream requests p2p? I would have assumed
that they went to the host bus and then were dispatched by the host bus (but I'm
just guess).
>> So it's only necessary to transverse down the tree again for the very special
>> case of PCI-ISA bridges. Normally you can tell just by looking at siblings.
>>
>>> That will be a significant overhead for your DMA ops I believe, though
>>> doable.
>>
>> Worst case scenario, 256 devices with what, a 3 level deep hierarchy? we're
>> still talking about 24 simple address compares. That shouldn't be so bad.
>
> Per DMA access...
>
>>> Then we'd have to add map/unmap to MemoryRegion as well, with the
>>> understanding that they may not be supported at every level...
>>
>> map/unmap can always fall back to bounce buffers.
>
> Right.
>
>>> So yeah, it sounds doable and it would handle what DMAContext doesn't
>>> handle which is access to peer devices without going all the way back to
>>> the "top level", but it's complex and ... I need something in qemu
>>> 1.2 :-)
>>
>> I think we need a longer term vision here. We can find incremental solutions
>> for the short term but I'm pretty nervous about having two parallel APIs only to
>> discover that we need to converge in 2 years.
>
> Can we agree that what we need to avoid having to change every second
> day is the driver API ?
>
> In which case, what about I come up with some pci_* DMA APIs such as the
> ones you suggested above and fixup a handful of devices we care about to
> them.
Yes!
If I recall, that was what I suggested originally and you talked me out of it :-)
I think that's the best future proof design.
Regards,
Anthony Liguori
>
> Under the hood, we can make it use the DMAContext for now and have that
> working for us, but we can easily "merge" DMAContext and MemoryRegion
> later since it's not directly exposed to devices.
>
> IE. The devices would only use:
>
> pci_device_read/write/map/unmap(PCIDevice,...)
>
> And DMAContext remains burried in the implementation.
>
> Would that be ok with you ?
>
> Cheers,
> Ben.
>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>> In addition there's the memory barrier business so we probably want to
>>> keep the idea of having DMA specific accessors ...
>>>
>>> Could we keep the DMAContext for now and just rename it to MemoryRegion
>>> (keeping the accessors) when we go for a more in depth transformation ?
>>>
>>> Cheers,
>>> Ben.
>>>
>>>
>>>
>
>
next prev parent reply other threads:[~2012-06-12 1:34 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-08 4:23 [Qemu-devel] [RFC] QOMification of AXI stream Peter Crosthwaite
2012-06-08 9:13 ` Paul Brook
2012-06-08 9:34 ` Peter Maydell
2012-06-08 13:13 ` Paul Brook
2012-06-08 13:39 ` Anthony Liguori
2012-06-08 13:59 ` Paul Brook
2012-06-08 14:17 ` Anthony Liguori
2012-06-08 13:41 ` Anthony Liguori
2012-06-08 13:53 ` Paul Brook
2012-06-08 13:55 ` Peter Maydell
2012-06-08 9:45 ` Andreas Färber
2012-06-09 1:53 ` Peter Crosthwaite
2012-06-09 2:12 ` Andreas Färber
2012-06-09 3:28 ` Peter Crosthwaite
2012-06-11 5:54 ` Paolo Bonzini
2012-06-11 13:05 ` Peter Maydell
2012-06-11 13:17 ` Anthony Liguori
2012-06-11 13:41 ` Paolo Bonzini
2012-06-08 14:15 ` Anthony Liguori
2012-06-09 1:24 ` Peter Crosthwaite
2012-06-11 13:15 ` Anthony Liguori
2012-06-11 13:39 ` Peter Maydell
2012-06-11 14:38 ` Edgar E. Iglesias
2012-06-11 14:53 ` Peter Maydell
2012-06-11 14:58 ` Edgar E. Iglesias
2012-06-11 15:03 ` Anthony Liguori
2012-06-11 15:34 ` Peter Maydell
2012-06-11 15:56 ` Edgar E. Iglesias
2012-06-12 0:33 ` Peter Crosthwaite
2012-06-12 7:58 ` Edgar E. Iglesias
2012-06-14 1:01 ` Peter Crosthwaite
2012-06-11 15:01 ` Anthony Liguori
2012-06-11 17:31 ` Avi Kivity
2012-06-11 18:35 ` Anthony Liguori
2012-06-11 22:00 ` [Qemu-devel] [RFC] QOMification of AXI streams Benjamin Herrenschmidt
2012-06-11 22:29 ` Anthony Liguori
2012-06-11 23:46 ` Benjamin Herrenschmidt
2012-06-12 1:33 ` Anthony Liguori [this message]
2012-06-12 2:06 ` Benjamin Herrenschmidt
2012-06-12 9:46 ` Avi Kivity
2012-06-13 0:37 ` Benjamin Herrenschmidt
2012-06-13 20:57 ` Anthony Liguori
2012-06-13 21:25 ` Benjamin Herrenschmidt
2012-06-14 0:00 ` Edgar E. Iglesias
2012-06-14 1:34 ` Benjamin Herrenschmidt
2012-06-14 2:03 ` Edgar E. Iglesias
2012-06-14 2:16 ` Benjamin Herrenschmidt
2012-06-14 2:31 ` Edgar E. Iglesias
2012-06-14 2:41 ` Benjamin Herrenschmidt
2012-06-14 3:17 ` Edgar E. Iglesias
2012-06-14 3:43 ` Benjamin Herrenschmidt
2012-06-14 5:16 ` Benjamin Herrenschmidt
2012-06-12 1:04 ` Andreas Färber
2012-06-12 2:42 ` Benjamin Herrenschmidt
2012-06-12 9:31 ` [Qemu-devel] [RFC] QOMification of AXI stream Avi Kivity
2012-06-12 9:42 ` Edgar E. Iglesias
2012-06-11 18:36 ` Anthony Liguori
2012-06-12 9:51 ` Avi Kivity
2012-06-12 12:58 ` Peter Maydell
2012-06-12 13:18 ` Avi Kivity
2012-06-12 13:32 ` Peter Maydell
2012-06-12 13:48 ` Avi Kivity
2012-06-12 13:55 ` Andreas Färber
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FD69C80.3080807@codemonkey.ws \
--to=anthony@codemonkey.ws \
--cc=afaerber@suse.de \
--cc=avi@redhat.com \
--cc=benh@kernel.crashing.org \
--cc=edgar.iglesias@gmail.com \
--cc=john.williams@petalogix.com \
--cc=monstr@monstr.eu \
--cc=paul@codesourcery.com \
--cc=peter.crosthwaite@petalogix.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).