From: Anthony Liguori
Date: Mon, 11 Jun 2012 20:33:52 -0500
Message-ID: <4FD69C80.3080807@codemonkey.ws>
In-Reply-To: <1339458400.9220.49.camel@pasglop>
Subject: Re: [Qemu-devel] [RFC] QOMification of AXI streams
To: Benjamin Herrenschmidt
Cc: Peter Maydell, Michal Simek, "qemu-devel@nongnu.org Developers",
 Peter Crosthwaite, Paul Brook, "Edgar E. Iglesias", Andreas Färber,
 John Williams, Avi Kivity

On 06/11/2012 06:46 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2012-06-11 at 17:29 -0500, Anthony Liguori wrote:
>
>> When we add a subregion, it always removes the offset from the address
>> when it dispatches.  More often than not that works out well, but for
>> what you're describing above, it sounds like you'd really want to get an
>> adjusted size (that could be transformed).
>>
>> Today we generate a linear dispatch table.  This prevents us from
>> applying device-level transforms.
>
> Subregions being relative to the parent would probably work for me.
>
> Typically I have something like:
>
>   0x1234_0000_0000 .. 0x1234_3fff_ffff : window to PCI memory
>
> Which generates PCI cycles into the range:
>
>   0x0000_c000_0000 .. 0x0000_ffff_ffff
>
> Which is a combination of bit masking & offset, ie, the HW algorithm is
> to forward some bits (0x3fff_ffff in this case) and replace the other
> ones (with 0xc000_0000 in this case).
>
> Is that properly represented in the current scheme by a subregion?

Does this mean that both 0x1234_0000_0000 and 0x0000_c000_0000 are valid
addresses for issuing PCI requests?

If they both are valid, that sounds pretty much like what aliases were
made for.
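For illustration, such a window maps fairly naturally onto the existing
alias machinery.  A rough sketch, with made-up names, assuming the current
memory_region_init_alias()/memory_region_add_subregion() signatures; note
this only models the downstream CPU-to-PCI direction, not the upstream DMA
path being discussed:

    static MemoryRegion pci_window;

    /* Expose a 1GB window at CPU address 0x1234_0000_0000 that aliases
     * PCI memory at 0xc000_0000.  The alias offset substitutes the top
     * bits (0xc000_0000) while the low bits (0x3fff_ffff) pass through
     * unchanged, matching the masking + offset behaviour above. */
    static void map_pci_window(MemoryRegion *pci_mem)
    {
        memory_region_init_alias(&pci_window, "pci-window", pci_mem,
                                 0xc0000000, 0x40000000);
        memory_region_add_subregion(get_system_memory(),
                                    0x123400000000ULL, &pci_window);
    }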
>> So if you stick with the notion of subregions, you would still have a
>> single MemoryRegion at the PCI bus layer that has all of its children as
>> subregions.  Presumably that "scan for all siblings" is a binary search,
>> which shouldn't really be that expensive considering that we're likely
>> to have a shallow depth in the memory hierarchy.
>
> Possibly, but on every DMA access ...

We can always add a TLB :-)

>>> Additionally, to handle iommus etc. you need the option for a given
>>> memory region to have functions to perform the transformation in the
>>> upstream direction.
>>
>> I think that transformation function lives in the bus-layer
>> MemoryRegion.  It's a bit tricky though, because you need some sort of
>> notion of "who is asking".  So you need:
>>
>>   dma_memory_write(MemoryRegion *parent, DeviceState *caller,
>>                    const void *data, size_t size);
>
> Why do you need the parent argument?  Can't it be implied from the
> DeviceState?  Or do you envision devices sitting on multiple busses?  Or
> is it just a matter that each device "class" will store that separately?

Yeah, you may have multiple MemoryRegions you want to write to.  I don't
think we should assume all devices have exactly one address space they
write to.
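As a purely hypothetical illustration of that point, a caller of such an
API might look as follows.  The addr parameter and the MyDevState fields
are my additions (a DMA write presumably also needs a target address,
which the prototype above elides):

    typedef struct MyDevState {
        DeviceState qdev;      /* illustrative qdev-style device      */
        MemoryRegion *bus_mr;  /* bus-level region the device sits on */
    } MyDevState;

    static void mydev_dma_write(MyDevState *s, target_phys_addr_t addr,
                                const void *buf, size_t len)
    {
        /* Passing the caller along with the region lets the bus layer
         * answer "who is asking" and apply that device's upstream
         * transform, e.g. an iommu translation. */
        dma_memory_write(s->bus_mr, &s->qdev, addr, buf, len);
    }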
> Also you want the parent to be the direct parent P2P, no?  Not the host
> bridge?  Ie. to be completely correct, you want to look at every sibling
> device under a given bridge, then go up if there is no match, etc...
> until you reach the PHB, at which point you hit the iommu and eventually
> the system fabric.

Yes.

>> Not quite... subtractive decoding only happens for very specific devices
>> IIUC.  For instance, a PCI-ISA bridge.  Normally it's positive decoding,
>> and a bridge has to describe the full region of MMIO/PIO that it
>> handles.
>
> That's for downstream.  Upstream is essentially subtractive as far as I
> can tell.  IE. if no sibling device decodes the cycle, then it goes up,
> no?

Why would that be necessary?  Are upstream requests p2p?  I would have
assumed that they went to the host bus and then were dispatched by the
host bus (but I'm just guessing).

>> So it's only necessary to traverse down the tree again for the very
>> special case of PCI-ISA bridges.  Normally you can tell just by looking
>> at siblings.
>>
>>> That will be a significant overhead for your DMA ops I believe, though
>>> doable.
>>
>> Worst case scenario: 256 devices with what, a 3-level-deep hierarchy?
>> We're still talking about 24 simple address compares.  That shouldn't be
>> so bad.
>
> Per DMA access...
>
>>> Then we'd have to add map/unmap to MemoryRegion as well, with the
>>> understanding that they may not be supported at every level...
>>
>> map/unmap can always fall back to bounce buffers.
>
> Right.
>
>>> So yeah, it sounds doable, and it would handle what DMAContext doesn't
>>> handle, which is access to peer devices without going all the way back
>>> to the "top level", but it's complex and ... I need something in qemu
>>> 1.2 :-)
>>
>> I think we need a longer-term vision here.  We can find incremental
>> solutions for the short term, but I'm pretty nervous about having two
>> parallel APIs only to discover that we need to converge in 2 years.
>
> Can we agree that what we need to avoid having to change every second
> day is the driver API?
>
> In which case, what about I come up with some pci_* DMA APIs such as the
> ones you suggested above and fix up a handful of devices we care about to
> use them?

Yes!  If I recall, that was what I suggested originally and you talked me
out of it :-)

I think that's the best future-proof design.

Regards,

Anthony Liguori

> Under the hood, we can make it use the DMAContext for now and have that
> working for us, but we can easily "merge" DMAContext and MemoryRegion
> later since it's not directly exposed to devices.
>
> IE. The devices would only use:
>
>   pci_device_read/write/map/unmap(PCIDevice, ...)
>
> And DMAContext remains buried in the implementation.
>
> Would that be ok with you?
>
> Cheers,
> Ben.
>
>> Regards,
>>
>> Anthony Liguori
>>
>>> In addition there's the memory barrier business, so we probably want to
>>> keep the idea of having DMA-specific accessors ...
>>>
>>> Could we keep the DMAContext for now and just rename it to MemoryRegion
>>> (keeping the accessors) when we go for a more in-depth transformation?
>>>
>>> Cheers,
>>> Ben.
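For concreteness, the thin device-facing wrappers being proposed could
plausibly be inline helpers over the in-flight DMAContext work, so that
DMAContext can later be merged with MemoryRegion without touching any
driver.  A sketch, assuming the dma_memory_read/write and pci_dma_context()
helpers from the DMAContext series (note these take a DMAContext, unlike
the MemoryRegion-based signature sketched earlier in the thread), with
names following the discussion rather than a committed API:

    /* Sketch: device-visible PCI DMA accessors that keep DMAContext as
     * an implementation detail of the PCI layer. */
    static inline int pci_device_read(PCIDevice *dev, dma_addr_t addr,
                                      void *buf, dma_addr_t len)
    {
        return dma_memory_read(pci_dma_context(dev), addr, buf, len);
    }

    static inline int pci_device_write(PCIDevice *dev, dma_addr_t addr,
                                       const void *buf, dma_addr_t len)
    {
        return dma_memory_write(pci_dma_context(dev), addr, buf, len);
    }

If DMAContext and MemoryRegion converge later, only pci_dma_context() and
these two helpers need to change; drivers keep calling pci_device_read()
and pci_device_write().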