qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Anthony Liguori <anthony@codemonkey.ws>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Michal Simek" <monstr@monstr.eu>,
	"qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>,
	"Peter Crosthwaite" <peter.crosthwaite@petalogix.com>,
	"Paul Brook" <paul@codesourcery.com>,
	"Edgar E. Iglesias" <edgar.iglesias@gmail.com>,
	"Andreas Färber" <afaerber@suse.de>,
	"John Williams" <john.williams@petalogix.com>,
	"Avi Kivity" <avi@redhat.com>
Subject: Re: [Qemu-devel] [RFC] QOMification of AXI streams
Date: Mon, 11 Jun 2012 20:33:52 -0500	[thread overview]
Message-ID: <4FD69C80.3080807@codemonkey.ws> (raw)
In-Reply-To: <1339458400.9220.49.camel@pasglop>

On 06/11/2012 06:46 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2012-06-11 at 17:29 -0500, Anthony Liguori wrote:
>
>> When we add a subregion, it always removes the offset from the address when it
>> dispatches.  This more often than not works out well but for what you're
>> describing above, it sounds like you'd really want to get an adjusted size (that
>> could be transformed).
>>
>> Today we generate a linear dispatch table.  This prevents us from applying
>> device-level transforms.
>
> subregions being relative to the parent would probably work for me.
>
> Typically I have something like:
>
>    0x1234_0000_0000 .. 0x1234_3fff_ffff : window to PCI memory
>
> Which generates PCI cycles into the range:
>
>    0x0000_c000_0000 .. 0x0000_ffff_ffff
>
> Which is a combination of bit masking&  offset, ie, the HW algorithm is
> to forward some bits (0x3fff_ffff in this case) and replace the other
> ones (with 0xc000_0000) in this case.
>
> Is that properly represented in the current scheme by a subregion ?

Does this mean that both 0x1234..0000_0000is a valid address to issue PCI 
requests along with 0x0000_c000_000?

If they both are valid, that sounds pretty much like what aliases were made for.

>> So if you stick with the notion of subregions, you would still have a single
>> MemoryRegion at the PCI bus layer that has all of it's children as sub regions.
>>    Presumably that "scan for all siblings" is a binary search which shouldn't
>> really be that expensive considering that we're likely to have a shallow depth
>> in the memory hierarchy.
>
> Possibly but on every DMA access ...

We can always as a TLB :-)

>>> Additionally to handle iommu's etc... you need the option for a given
>>> memory region to have functions to perform the transformation in the
>>> upstream direction.
>>
>> I think that transformation function lives in the bus layer MemoryRegion.  It's
>> a bit tricky though because you need some sort of notion of "who is asking".  So
>> you need:
>>
>> dma_memory_write(MemoryRegion *parent, DeviceState *caller,
>>                    const void *data, size_t size);
>
> Why do you need the parent argument ? Can't it be implied from the
> DeviceState ? Or do you envision devices sitting on multiple busses ? Or
> it's just a matter that each device "class" will store that separately ?

Yeah, you may have multiple MemoryRegions you want to write to.  I don't think 
we should assume all devices have exactly one address space it writes to.

> Also you want the parent to be the direct parent P2P no ? Not the host
> bridge ? Ie. to be completely correct, you want to look at every sibling
> device under a given bridge, then go up if there is no match, etc...
> until you reach the PHB at which point you hit the iommu and eventually
> the system fabric.

Yes.

>> Not quite...  subtractive decoding only happens for very specific devices IIUC.
>>    For instance, an PCI-ISA bridge.  Normally, it's positive decoding and a
>> bridge has to describe the full region of MMIO/PIO that it handles.
>
> That's for downstream. Upstream is essentially substractive as far as I
> can tell. IE. If no sibling device decodes the cycle, then it goes up
> no ?

Why would that be necessary?  Are upstream requests p2p?  I would have assumed 
that they went to the host bus and then were dispatched by the host bus (but I'm 
just guess).

>> So it's only necessary to transverse down the tree again for the very special
>> case of PCI-ISA bridges.  Normally you can tell just by looking at siblings.
>>
>>> That will be a significant overhead for your DMA ops I believe, though
>>> doable.
>>
>> Worst case scenario, 256 devices with what, a 3 level deep hierarchy?  we're
>> still talking about 24 simple address compares.  That shouldn't be so bad.
>
> Per DMA access...
>
>>> Then we'd have to add map/unmap to MemoryRegion as well, with the
>>> understanding that they may not be supported at every level...
>>
>> map/unmap can always fall back to bounce buffers.
>
> Right.
>
>>> So yeah, it sounds doable and it would handle what DMAContext doesn't
>>> handle which is access to peer devices without going all the way back to
>>> the "top level", but it's complex and ... I need something in qemu
>>> 1.2 :-)
>>
>> I think we need a longer term vision here.  We can find incremental solutions
>> for the short term but I'm pretty nervous about having two parallel APIs only to
>> discover that we need to converge in 2 years.
>
> Can we agree that what we need to avoid having to change every second
> day is the driver API ?
>
> In which case, what about I come up with some pci_* DMA APIs such as the
> ones you suggested above and fixup a handful of devices we care about to
> them.

Yes!

If I recall, that was what I suggested originally and you talked me out of it :-)

I think that's the best future proof design.

Regards,

Anthony Liguori

>
> Under the hood, we can make it use the DMAContext for now and have that
> working for us, but we can easily "merge" DMAContext and MemoryRegion
> later since it's not directly exposed to devices.
>
> IE. The devices would only use:
>
>    pci_device_read/write/map/unmap(PCIDevice,...)
>
> And DMAContext remains burried in the implementation.
>
> Would that be ok with you ?
>
> Cheers,
> Ben.
>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>> In addition there's the memory barrier business so we probably want to
>>> keep the idea of having DMA specific accessors ...
>>>
>>> Could we keep the DMAContext for now and just rename it to MemoryRegion
>>> (keeping the accessors) when we go for a more in depth transformation ?
>>>
>>> Cheers,
>>> Ben.
>>>
>>>
>>>
>
>

  reply	other threads:[~2012-06-12  1:34 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-08  4:23 [Qemu-devel] [RFC] QOMification of AXI stream Peter Crosthwaite
2012-06-08  9:13 ` Paul Brook
2012-06-08  9:34   ` Peter Maydell
2012-06-08 13:13     ` Paul Brook
2012-06-08 13:39       ` Anthony Liguori
2012-06-08 13:59         ` Paul Brook
2012-06-08 14:17           ` Anthony Liguori
2012-06-08 13:41   ` Anthony Liguori
2012-06-08 13:53     ` Paul Brook
2012-06-08 13:55     ` Peter Maydell
2012-06-08  9:45 ` Andreas Färber
2012-06-09  1:53   ` Peter Crosthwaite
2012-06-09  2:12     ` Andreas Färber
2012-06-09  3:28       ` Peter Crosthwaite
2012-06-11  5:54       ` Paolo Bonzini
2012-06-11 13:05         ` Peter Maydell
2012-06-11 13:17         ` Anthony Liguori
2012-06-11 13:41           ` Paolo Bonzini
2012-06-08 14:15 ` Anthony Liguori
2012-06-09  1:24   ` Peter Crosthwaite
2012-06-11 13:15     ` Anthony Liguori
2012-06-11 13:39       ` Peter Maydell
2012-06-11 14:38         ` Edgar E. Iglesias
2012-06-11 14:53           ` Peter Maydell
2012-06-11 14:58             ` Edgar E. Iglesias
2012-06-11 15:03             ` Anthony Liguori
2012-06-11 15:34               ` Peter Maydell
2012-06-11 15:56               ` Edgar E. Iglesias
2012-06-12  0:33                 ` Peter Crosthwaite
2012-06-12  7:58                   ` Edgar E. Iglesias
2012-06-14  1:01                     ` Peter Crosthwaite
2012-06-11 15:01         ` Anthony Liguori
2012-06-11 17:31           ` Avi Kivity
2012-06-11 18:35             ` Anthony Liguori
2012-06-11 22:00               ` [Qemu-devel] [RFC] QOMification of AXI streams Benjamin Herrenschmidt
2012-06-11 22:29                 ` Anthony Liguori
2012-06-11 23:46                   ` Benjamin Herrenschmidt
2012-06-12  1:33                     ` Anthony Liguori [this message]
2012-06-12  2:06                       ` Benjamin Herrenschmidt
2012-06-12  9:46                   ` Avi Kivity
2012-06-13  0:37                     ` Benjamin Herrenschmidt
2012-06-13 20:57                       ` Anthony Liguori
2012-06-13 21:25                         ` Benjamin Herrenschmidt
2012-06-14  0:00                       ` Edgar E. Iglesias
2012-06-14  1:34                         ` Benjamin Herrenschmidt
2012-06-14  2:03                           ` Edgar E. Iglesias
2012-06-14  2:16                             ` Benjamin Herrenschmidt
2012-06-14  2:31                               ` Edgar E. Iglesias
2012-06-14  2:41                                 ` Benjamin Herrenschmidt
2012-06-14  3:17                                   ` Edgar E. Iglesias
2012-06-14  3:43                                     ` Benjamin Herrenschmidt
2012-06-14  5:16                                 ` Benjamin Herrenschmidt
2012-06-12  1:04                 ` Andreas Färber
2012-06-12  2:42                   ` Benjamin Herrenschmidt
2012-06-12  9:31               ` [Qemu-devel] [RFC] QOMification of AXI stream Avi Kivity
2012-06-12  9:42                 ` Edgar E. Iglesias
2012-06-11 18:36             ` Anthony Liguori
2012-06-12  9:51               ` Avi Kivity
2012-06-12 12:58             ` Peter Maydell
2012-06-12 13:18               ` Avi Kivity
2012-06-12 13:32                 ` Peter Maydell
2012-06-12 13:48                   ` Avi Kivity
2012-06-12 13:55                   ` Andreas Färber

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FD69C80.3080807@codemonkey.ws \
    --to=anthony@codemonkey.ws \
    --cc=afaerber@suse.de \
    --cc=avi@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=edgar.iglesias@gmail.com \
    --cc=john.williams@petalogix.com \
    --cc=monstr@monstr.eu \
    --cc=paul@codesourcery.com \
    --cc=peter.crosthwaite@petalogix.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).