Re: [Qemu-devel] [RFC] QOMification of AXI streams


From: Anthony Liguori <anthony@codemonkey.ws>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Anthony Liguori" <aliguori@us.ibm.com>,
	"qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>,
	"Peter Crosthwaite" <peter.crosthwaite@petalogix.com>,
	"Michal Simek" <monstr@monstr.eu>, "Avi Kivity" <avi@redhat.com>,
	"Edgar E. Iglesias" <edgar.iglesias@gmail.com>,
	"Andreas Färber" <afaerber@suse.de>,
	"John Williams" <john.williams@petalogix.com>,
	"Paul Brook" <paul@codesourcery.com>
Subject: Re: [Qemu-devel] [RFC] QOMification of AXI streams
Date: Mon, 11 Jun 2012 17:29:06 -0500
Message-ID: <4FD67132.1060001@codemonkey.ws>
In-Reply-To: <1339452058.9220.32.camel@pasglop>

On 06/11/2012 05:00 PM, Benjamin Herrenschmidt wrote:
>>>      system_memory
>>>         alias ->   pci
>>>         alias ->   ram
>>>      pci
>>>         bar1
>>>         bar2
>>>      pcibm
>>>         alias ->   pci  (prio 1)
>>>         alias ->   system_memory (prio 0)
>>>
>>> cpu_physical_memory_rw() would be implemented as
>>> memory_region_rw(system_memory, ...) while pci_dma_rw() would be
>>> implemented as memory_region_rw(pcibm, ...).  This would allow
>>> different address transformations for the two accesses.
>>
>> Yeah, this is what I'm basically thinking although I don't quite
>> understand what  'pcibm' stands for.
>>
>> My biggest worry is that we'll end up with parallel memory API
>> implementations split between memory.c and dma.c.
>
> So it makes some amount of sense to use the same structure. For example,
> if a device issues accesses, those could be caught by a sibling device
> memory region... or go upstream.
>
> Let's just look at downstream transformation for a minute...
>
> We do need to be a bit careful about transformation here: I need to
> double check but I don't think we do transformation downstream today in
> a clean way and we'd have to do that. IE. On pseries for example, the
> PCI host bridge has a window in the CPU address space of [A...A+S], but
> accesses to that window generate PCI cycles with different addresses
> [B...B+S] (with typically A and B both being naturally aligned on S so
> it's just a bit masking in HW).

I don't know that we really have bit masking done right in the memory API.

When we add a subregion, it always removes the offset from the address when it 
dispatches.  This more often than not works out well but for what you're 
describing above, it sounds like you'd really want to get an adjusted size (that 
could be transformed).

Today we generate a linear dispatch table.  This prevents us from applying 
device-level transforms.
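
To make the [A...A+S] to [B...B+S] case concrete, here is a minimal sketch in
plain C.  BridgeWindow and the three helpers are hypothetical names invented
for illustration, not part of the QEMU memory API.  It contrasts the offset
that subregion dispatch produces today with the re-based bus address a
downstream transform would have to produce, and shows that the bit-masking
form only holds when S is a power of two and A and B are aligned on S.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a host-bridge window; not a QEMU type. */
typedef struct BridgeWindow {
    uint64_t cpu_base; /* A: start of the window in CPU address space */
    uint64_t bus_base; /* B: start of the window in PCI address space */
    uint64_t size;     /* S: assumed power of two, A and B aligned on S */
} BridgeWindow;

/* Subregion-style dispatch: strip the window base, keep only the offset. */
static uint64_t window_offset(const BridgeWindow *w, uint64_t cpu_addr)
{
    assert(cpu_addr >= w->cpu_base && cpu_addr - w->cpu_base < w->size);
    return cpu_addr - w->cpu_base;
}

/* Downstream transform: the same offset, re-based on B. */
static uint64_t window_to_bus(const BridgeWindow *w, uint64_t cpu_addr)
{
    return w->bus_base + window_offset(w, cpu_addr);
}

/* Bit-masking form; only valid under the alignment assumptions above. */
static uint64_t window_to_bus_masked(const BridgeWindow *w, uint64_t cpu_addr)
{
    assert(w->size != 0 && (w->size & (w->size - 1)) == 0);
    assert((w->cpu_base & (w->size - 1)) == 0);
    assert((w->bus_base & (w->size - 1)) == 0);
    return w->bus_base | (cpu_addr & (w->size - 1));
}
```

With A = 0xC0000000, B = 0x80000000 and S = 0x10000000, both transform
helpers map 0xC0001234 to 0x80001234, while plain dispatch yields only the
offset 0x1234.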

> We somewhat implement that in spapr_pci today since it works but I
> don't quite understand how :-) Or rather, the terminology "alias" seems
> to be fairly bogus, we aren't talking about aliases here...
>
> So today we create a memory region with an "alias" (whatever that means)
> that is [B...B+S] and add a subregion which is [A...A+S]. That seems to
> work but it's obscure.
>
> If I was to implement that, I would make it so that the struct
> MemoryRegion used in that hierarchy contains the address in the local
> domain -and- the transformed address in the CPU domain, so you can still
> sort them by CPU addresses for quick access and make this offsetting a
> standard property of any memory region since it's very common that
> busses drop address bits along the way.
>
> Now, if you want to use that structure for DMA, what you need to do
> first is when an access happens, walk up the region tree and scan for
> all siblings at every level, which can be costly.

So if you stick with the notion of subregions, you would still have a single 
MemoryRegion at the PCI bus layer that has all of its children as subregions. 
Presumably that "scan for all siblings" is a binary search, which shouldn't 
really be that expensive considering that we're likely to have a shallow depth 
in the memory hierarchy.
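
As a rough sketch of that sibling scan, assuming the subregions at one level
are kept sorted by start address and do not overlap (Range and find_sibling
are made-up names, not QEMU code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sibling range at one level of the hierarchy. */
typedef struct Range {
    uint64_t start;
    uint64_t size;
} Range;

/*
 * Binary search over ranges sorted by start and assumed non-overlapping.
 * Returns the index of the range containing addr, or -1 if no sibling
 * decodes it (the access would then move one level up).
 */
static int find_sibling(const Range *r, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    assert(r != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < r[mid].start) {
            hi = mid;                 /* addr lies below this range */
        } else if (addr - r[mid].start >= r[mid].size) {
            lo = mid + 1;             /* addr lies above this range */
        } else {
            return (int)mid;          /* addr falls inside this range */
        }
    }
    return -1;
}
```

The cost per level grows with the logarithm of the number of siblings, which
is why a shallow hierarchy keeps the total small.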

>
> Additionally to handle iommu's etc... you need the option for a given
> memory region to have functions to perform the transformation in the
> upstream direction.

I think that transformation function lives in the bus layer MemoryRegion.  It's 
a bit tricky though because you need some sort of notion of "who is asking".  So 
you need:

dma_memory_write(MemoryRegion *parent, DeviceState *caller,
                  const void *data, size_t size);

This could be simplified at each layer via:

void pci_device_write(PCIDevice *dev, const void *data, size_t size) {
     dma_memory_write(dev->bus->mr, DEVICE(dev), data, size);
}
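
To illustrate why "who is asking" matters, here is a self-contained sketch in
which the bus-level translation depends on the caller.  Everything in it is
an assumption made for illustration: Device and BusRegion are simplified
stand-ins rather than DeviceState and MemoryRegion, the per-caller window
table plays the role of a very crude IOMMU, and the explicit addr argument is
an addition that the prototype above does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEVICES 4

/* Hypothetical stand-in for DeviceState: just an identity. */
typedef struct Device {
    unsigned id;
} Device;

/* Hypothetical stand-in for the bus-layer region. */
typedef struct BusRegion {
    uint64_t window_base[MAX_DEVICES]; /* per-caller offset into ram */
    uint64_t window_size;              /* size of each caller's window */
    uint8_t *ram;                      /* backing storage */
    size_t ram_size;
} BusRegion;

/* Upstream transform: the result depends on which device is asking. */
static int bus_translate(const BusRegion *bus, const Device *caller,
                         uint64_t addr, size_t size, uint64_t *out)
{
    assert(bus != NULL && caller != NULL && out != NULL);
    if (caller->id >= MAX_DEVICES) {
        return -1;                     /* unknown caller: no mapping */
    }
    if (size > bus->window_size || addr > bus->window_size - size) {
        return -1;                     /* access leaves the caller's window */
    }
    *out = bus->window_base[caller->id] + addr;
    return 0;
}

/* DMA write that goes through the caller-aware transform first. */
static int bus_dma_write(BusRegion *bus, const Device *caller,
                         uint64_t addr, const void *data, size_t size)
{
    uint64_t target;

    if (bus_translate(bus, caller, addr, size, &target) != 0) {
        return -1;
    }
    if (size > bus->ram_size || target > bus->ram_size - size) {
        return -1;                     /* translated address is out of range */
    }
    memcpy(bus->ram + target, data, size);
    return 0;
}
```

Two devices writing to the same device-visible address end up in different
places in the backing storage, which is the behaviour a caller-blind
signature cannot express.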

> To be true to the HW, each bridge should have its memory region, so a
> setup with
>
>        /pci-host
>            |
>            |--/p2p
>                 |
>                 |--/device
>
> Any DMA done by the device would walk through the p2p region to the host
> which would contain a region with transform ops.
>
> However, at each level, you'd have to search for sibling regions that
> may decode the address at that level before moving up, ie implement
> essentially the equivalent of the PCI subtractive decoding scheme.

Not quite...  subtractive decoding only happens for very specific devices IIUC. 
For instance, a PCI-ISA bridge.  Normally, it's positive decoding and a 
bridge has to describe the full region of MMIO/PIO that it handles.

So it's only necessary to traverse down the tree again for the very special 
case of PCI-ISA bridges.  Normally you can tell just by looking at siblings.
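
A small sketch of the distinction, under the assumption that each target on a
bus segment either positively decodes a fixed range or is marked as the
subtractive decoder.  Target and decode_at_level are hypothetical names, and
the linear scan is kept only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target on one bus segment. */
typedef struct Target {
    uint64_t start;
    uint64_t size;
    int subtractive;  /* nonzero: claims whatever nobody else decodes */
} Target;

/*
 * Returns the index of the target that claims addr at this level.
 * Positive decoders win; the first subtractive decoder (for example a
 * PCI-ISA bridge) is only used when no positive decoder matches.
 * Returns -1 when nothing claims the address, so the access moves upstream.
 */
static int decode_at_level(const Target *t, size_t n, uint64_t addr)
{
    int fallback = -1;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].subtractive) {
            if (fallback < 0) {
                fallback = (int)i;    /* remember it, keep looking */
            }
            continue;
        }
        if (addr >= t[i].start && addr - t[i].start < t[i].size) {
            return (int)i;            /* positive decode */
        }
    }
    return fallback;
}
```

Only when the returned target is the subtractive one does the lookup need to
descend into that bridge's own children.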

> That will be a significant overhead for your DMA ops I believe, though
> doable.

Worst case scenario, 256 devices with what, a 3 level deep hierarchy?  We're 
still talking about 24 simple address compares.  That shouldn't be so bad.
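
The 24 comes from a back-of-envelope estimate: a binary search over 256
siblings takes about log2(256) = 8 compares, and 3 levels gives 3 * 8 = 24.
The helpers below are hypothetical and only reproduce that estimate; an
actual binary search loop may need one more compare per level.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest k such that 2^k >= n, for n >= 1. */
static unsigned ceil_log2(unsigned n)
{
    unsigned bits = 0;
    uint64_t v = 1;   /* 64-bit so the shift cannot overflow for 32-bit n */

    assert(n > 0);
    while (v < n) {
        v <<= 1;
        bits++;
    }
    return bits;
}

/* Estimated compares for one binary search per level of the hierarchy. */
static unsigned worst_case_compares(unsigned devices_per_level, unsigned depth)
{
    return depth * ceil_log2(devices_per_level);
}
```

Calling worst_case_compares(256, 3) gives 24, matching the figure above.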

> Then we'd have to add map/unmap to MemoryRegion as well, with the
> understanding that they may not be supported at every level...

map/unmap can always fall back to bounce buffers.
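
A minimal sketch of that fallback for the write direction, assuming a region
is either directly addressable or only reachable through a write callback.
Region, Mapping, region_map, region_unmap and demo_write are hypothetical
names for illustration, not the QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Region Region;
struct Region {
    uint8_t *direct;  /* non-NULL when the region is directly addressable */
    size_t size;
    void (*write)(Region *r, size_t off, const void *buf, size_t len);
    void *opaque;     /* state for the write callback */
};

typedef struct Mapping {
    void *ptr;        /* what the caller reads and writes */
    size_t off;
    size_t len;
    int bounced;      /* nonzero when ptr is a temporary bounce buffer */
} Mapping;

/* Map for writing: direct pointer if possible, bounce buffer otherwise. */
static int region_map(Region *r, size_t off, size_t len, Mapping *m)
{
    assert(r != NULL && m != NULL);
    if (len == 0 || len > r->size || off > r->size - len) {
        return -1;
    }
    m->off = off;
    m->len = len;
    if (r->direct != NULL) {
        m->ptr = r->direct + off;
        m->bounced = 0;
        return 0;
    }
    if (r->write == NULL) {
        return -1;                    /* nothing to flush the buffer into */
    }
    m->ptr = calloc(1, len);
    if (m->ptr == NULL) {
        return -1;
    }
    m->bounced = 1;
    return 0;
}

/* Unmap: flush a bounce buffer through the callback, then release it. */
static void region_unmap(Region *r, Mapping *m)
{
    if (m->bounced) {
        r->write(r, m->off, m->ptr, m->len);
        free(m->ptr);
    }
    m->ptr = NULL;
}

/* Example callback: copies into a byte array hung off r->opaque. */
static void demo_write(Region *r, size_t off, const void *buf, size_t len)
{
    memcpy((uint8_t *)r->opaque + off, buf, len);
}
```

The caller uses the same map, fill, unmap sequence in both cases; only the
bounced path defers the real write until unmap.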

> So yeah, it sounds doable and it would handle what DMAContext doesn't
> handle which is access to peer devices without going all the way back to
> the "top level", but it's complex and ... I need something in qemu
> 1.2 :-)

I think we need a longer term vision here.  We can find incremental solutions 
for the short term but I'm pretty nervous about having two parallel APIs only to 
discover that we need to converge in 2 years.

Regards,

Anthony Liguori


> In addition there's the memory barrier business so we probably want to
> keep the idea of having DMA specific accessors ...
>
> Could we keep the DMAContext for now and just rename it to MemoryRegion
> (keeping the accessors) when we go for a more in depth transformation ?
>
> Cheers,
> Ben.
>
>
>
