From: Anthony Liguori <anthony@codemonkey.ws>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
"Michal Simek" <monstr@monstr.eu>,
"qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>,
"Peter Crosthwaite" <peter.crosthwaite@petalogix.com>,
"Paul Brook" <paul@codesourcery.com>,
"Edgar E. Iglesias" <edgar.iglesias@gmail.com>,
"Andreas Färber" <afaerber@suse.de>,
"John Williams" <john.williams@petalogix.com>,
"Avi Kivity" <avi@redhat.com>
Subject: Re: [Qemu-devel] [RFC] QOMification of AXI streams
Date: Mon, 11 Jun 2012 20:33:52 -0500 [thread overview]
Message-ID: <4FD69C80.3080807@codemonkey.ws> (raw)
In-Reply-To: <1339458400.9220.49.camel@pasglop>
On 06/11/2012 06:46 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2012-06-11 at 17:29 -0500, Anthony Liguori wrote:
>
>> When we add a subregion, it always removes the offset from the address when it
>> dispatches. This more often than not works out well but for what you're
>> describing above, it sounds like you'd really want to get an adjusted size (that
>> could be transformed).
>>
>> Today we generate a linear dispatch table. This prevents us from applying
>> device-level transforms.
>
> subregions being relative to the parent would probably work for me.
>
> Typically I have something like:
>
> 0x1234_0000_0000 .. 0x1234_3fff_ffff : window to PCI memory
>
> Which generates PCI cycles into the range:
>
> 0x0000_c000_0000 .. 0x0000_ffff_ffff
>
> Which is a combination of bit masking& offset, ie, the HW algorithm is
> to forward some bits (0x3fff_ffff in this case) and replace the other
> ones (with 0xc000_0000) in this case.
>
> Is that properly represented in the current scheme by a subregion ?
Does this mean that both 0x1234..0000_0000is a valid address to issue PCI
requests along with 0x0000_c000_000?
If they both are valid, that sounds pretty much like what aliases were made for.
>> So if you stick with the notion of subregions, you would still have a single
>> MemoryRegion at the PCI bus layer that has all of it's children as sub regions.
>> Presumably that "scan for all siblings" is a binary search which shouldn't
>> really be that expensive considering that we're likely to have a shallow depth
>> in the memory hierarchy.
>
> Possibly but on every DMA access ...
We can always as a TLB :-)
>>> Additionally to handle iommu's etc... you need the option for a given
>>> memory region to have functions to perform the transformation in the
>>> upstream direction.
>>
>> I think that transformation function lives in the bus layer MemoryRegion. It's
>> a bit tricky though because you need some sort of notion of "who is asking". So
>> you need:
>>
>> dma_memory_write(MemoryRegion *parent, DeviceState *caller,
>> const void *data, size_t size);
>
> Why do you need the parent argument ? Can't it be implied from the
> DeviceState ? Or do you envision devices sitting on multiple busses ? Or
> it's just a matter that each device "class" will store that separately ?
Yeah, you may have multiple MemoryRegions you want to write to. I don't think
we should assume all devices have exactly one address space it writes to.
> Also you want the parent to be the direct parent P2P no ? Not the host
> bridge ? Ie. to be completely correct, you want to look at every sibling
> device under a given bridge, then go up if there is no match, etc...
> until you reach the PHB at which point you hit the iommu and eventually
> the system fabric.
Yes.
>> Not quite... subtractive decoding only happens for very specific devices IIUC.
>> For instance, an PCI-ISA bridge. Normally, it's positive decoding and a
>> bridge has to describe the full region of MMIO/PIO that it handles.
>
> That's for downstream. Upstream is essentially substractive as far as I
> can tell. IE. If no sibling device decodes the cycle, then it goes up
> no ?
Why would that be necessary? Are upstream requests p2p? I would have assumed
that they went to the host bus and then were dispatched by the host bus (but I'm
just guess).
>> So it's only necessary to transverse down the tree again for the very special
>> case of PCI-ISA bridges. Normally you can tell just by looking at siblings.
>>
>>> That will be a significant overhead for your DMA ops I believe, though
>>> doable.
>>
>> Worst case scenario, 256 devices with what, a 3 level deep hierarchy? we're
>> still talking about 24 simple address compares. That shouldn't be so bad.
>
> Per DMA access...
>
>>> Then we'd have to add map/unmap to MemoryRegion as well, with the
>>> understanding that they may not be supported at every level...
>>
>> map/unmap can always fall back to bounce buffers.
>
> Right.
>
>>> So yeah, it sounds doable and it would handle what DMAContext doesn't
>>> handle which is access to peer devices without going all the way back to
>>> the "top level", but it's complex and ... I need something in qemu
>>> 1.2 :-)
>>
>> I think we need a longer term vision here. We can find incremental solutions
>> for the short term but I'm pretty nervous about having two parallel APIs only to
>> discover that we need to converge in 2 years.
>
> Can we agree that what we need to avoid having to change every second
> day is the driver API ?
>
> In which case, what about I come up with some pci_* DMA APIs such as the
> ones you suggested above and fixup a handful of devices we care about to
> them.
Yes!
If I recall, that was what I suggested originally and you talked me out of it :-)
I think that's the best future proof design.
Regards,
Anthony Liguori
>
> Under the hood, we can make it use the DMAContext for now and have that
> working for us, but we can easily "merge" DMAContext and MemoryRegion
> later since it's not directly exposed to devices.
>
> IE. The devices would only use:
>
> pci_device_read/write/map/unmap(PCIDevice,...)
>
> And DMAContext remains burried in the implementation.
>
> Would that be ok with you ?
>
> Cheers,
> Ben.
>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>> In addition there's the memory barrier business so we probably want to
>>> keep the idea of having DMA specific accessors ...
>>>
>>> Could we keep the DMAContext for now and just rename it to MemoryRegion
>>> (keeping the accessors) when we go for a more in depth transformation ?
>>>
>>> Cheers,
>>> Ben.
>>>
>>>
>>>
>
>
next prev parent reply other threads:[~2012-06-12 1:34 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-08 4:23 [Qemu-devel] [RFC] QOMification of AXI stream Peter Crosthwaite
2012-06-08 9:13 ` Paul Brook
2012-06-08 9:34 ` Peter Maydell
2012-06-08 13:13 ` Paul Brook
2012-06-08 13:39 ` Anthony Liguori
2012-06-08 13:59 ` Paul Brook
2012-06-08 14:17 ` Anthony Liguori
2012-06-08 13:41 ` Anthony Liguori
2012-06-08 13:53 ` Paul Brook
2012-06-08 13:55 ` Peter Maydell
2012-06-08 9:45 ` Andreas Färber
2012-06-09 1:53 ` Peter Crosthwaite
2012-06-09 2:12 ` Andreas Färber
2012-06-09 3:28 ` Peter Crosthwaite
2012-06-11 5:54 ` Paolo Bonzini
2012-06-11 13:05 ` Peter Maydell
2012-06-11 13:17 ` Anthony Liguori
2012-06-11 13:41 ` Paolo Bonzini
2012-06-08 14:15 ` Anthony Liguori
2012-06-09 1:24 ` Peter Crosthwaite
2012-06-11 13:15 ` Anthony Liguori
2012-06-11 13:39 ` Peter Maydell
2012-06-11 14:38 ` Edgar E. Iglesias
2012-06-11 14:53 ` Peter Maydell
2012-06-11 14:58 ` Edgar E. Iglesias
2012-06-11 15:03 ` Anthony Liguori
2012-06-11 15:34 ` Peter Maydell
2012-06-11 15:56 ` Edgar E. Iglesias
2012-06-12 0:33 ` Peter Crosthwaite
2012-06-12 7:58 ` Edgar E. Iglesias
2012-06-14 1:01 ` Peter Crosthwaite
2012-06-11 15:01 ` Anthony Liguori
2012-06-11 17:31 ` Avi Kivity
2012-06-11 18:35 ` Anthony Liguori
2012-06-11 22:00 ` [Qemu-devel] [RFC] QOMification of AXI streams Benjamin Herrenschmidt
2012-06-11 22:29 ` Anthony Liguori
2012-06-11 23:46 ` Benjamin Herrenschmidt
2012-06-12 1:33 ` Anthony Liguori [this message]
2012-06-12 2:06 ` Benjamin Herrenschmidt
2012-06-12 9:46 ` Avi Kivity
2012-06-13 0:37 ` Benjamin Herrenschmidt
2012-06-13 20:57 ` Anthony Liguori
2012-06-13 21:25 ` Benjamin Herrenschmidt
2012-06-14 0:00 ` Edgar E. Iglesias
2012-06-14 1:34 ` Benjamin Herrenschmidt
2012-06-14 2:03 ` Edgar E. Iglesias
2012-06-14 2:16 ` Benjamin Herrenschmidt
2012-06-14 2:31 ` Edgar E. Iglesias
2012-06-14 2:41 ` Benjamin Herrenschmidt
2012-06-14 3:17 ` Edgar E. Iglesias
2012-06-14 3:43 ` Benjamin Herrenschmidt
2012-06-14 5:16 ` Benjamin Herrenschmidt
2012-06-12 1:04 ` Andreas Färber
2012-06-12 2:42 ` Benjamin Herrenschmidt
2012-06-12 9:31 ` [Qemu-devel] [RFC] QOMification of AXI stream Avi Kivity
2012-06-12 9:42 ` Edgar E. Iglesias
2012-06-11 18:36 ` Anthony Liguori
2012-06-12 9:51 ` Avi Kivity
2012-06-12 12:58 ` Peter Maydell
2012-06-12 13:18 ` Avi Kivity
2012-06-12 13:32 ` Peter Maydell
2012-06-12 13:48 ` Avi Kivity
2012-06-12 13:55 ` Andreas Färber
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FD69C80.3080807@codemonkey.ws \
--to=anthony@codemonkey.ws \
--cc=afaerber@suse.de \
--cc=avi@redhat.com \
--cc=benh@kernel.crashing.org \
--cc=edgar.iglesias@gmail.com \
--cc=john.williams@petalogix.com \
--cc=monstr@monstr.eu \
--cc=paul@codesourcery.com \
--cc=peter.crosthwaite@petalogix.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.