From mboxrd@z Thu Jan 1 00:00:00 1970
From: Scott Wood
Subject: Re: RFC: vfio / device assignment -- layout of device fd files
Date: Mon, 29 Aug 2011 18:14:29 -0500
Message-ID: <4E5C1D55.4040306@freescale.com>
References: <9F6FE96B71CF29479FF1CDC8046E15031B3313@039-SN1MPN1-002.039d.mgd.msft.net> <1314647500.2859.354.camel@bling.home> <4E5C0B8D.3040703@freescale.com> <1314658013.2859.399.camel@bling.home>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Yoder Stuart-B08248, Benjamin Herrenschmidt, "kvm@vger.kernel.org", "qemu-devel@nongnu.org", Alexander Graf, Wood Scott-B07421, "Joerg.Roedel@amd.com", "avi@redhat.com", David Gibson
To: Alex Williamson
Return-path:
Received: from ch1ehsobe005.messaging.microsoft.com ([216.32.181.185]:30176 "EHLO ch1outboundpool.messaging.microsoft.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755259Ab1H2XOf (ORCPT ); Mon, 29 Aug 2011 19:14:35 -0400
In-Reply-To: <1314658013.2859.399.camel@bling.home>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 08/29/2011 05:46 PM, Alex Williamson wrote:
> On Mon, 2011-08-29 at 16:58 -0500, Scott Wood wrote:
>> On 08/29/2011 02:51 PM, Alex Williamson wrote:
>>> On Mon, 2011-08-29 at 16:51 +0000, Yoder Stuart-B08248 wrote:
>>>> The device info records following the file header have the following
>>>> record types each with content encoded in a record specific way:
>>>>
>>>> REGION - describes an addressable address range for the device
>>>> DTPATH - describes the device tree path for the device
>>>> DTINDEX - describes the index into the related device tree
>>>>           property (reg,ranges,interrupts,interrupt-map)
>>>
>>> I don't quite understand if these are physical or virtual.
>>
>> If what are physical or virtual?
>
> Can you give an example of a path vs an index? I don't understand
> enough about these to ask a useful question about what they're
> describing.

You'd have both path and index. Example, for this tree:

/ {
	...
	foo {
		...
		bar {
			reg = <0x1000 64 0x1800 64>;
			ranges = <0 0x20000 0x10000>;
			...
			child {
				reg = <0x100 0x100>;
				...
			};
		};
	};
};

There would be 4 regions if you bind to /foo/bar:

// this is 64 bytes at 0x1000
DTPATH "/foo/bar"
DTINDEX prop_type=REG prop_index=0

// this is 64 bytes at 0x1800
DTPATH "/foo/bar"
DTINDEX prop_type=REG prop_index=1

// this is 64K at 0x20000
DTPATH "/foo/bar"
DTINDEX prop_type=RANGES prop_index=0

// this is 256 bytes at 0x20100
DTPATH "/foo/bar/child"
DTINDEX prop_type=REG prop_index=0

Both ranges and the child reg are needed, since ranges could be a simple
"ranges;" that passes everything with no translation, and child nodes
could be absent-but-implied in some other cases (such as when they
represent PCI devices which can be probed -- we still need to map the
ranges that correspond to PCI controller windows).

>>>> INTERRUPT - describes an interrupt for the device
>>>> PCI_CONFIG_SPACE - describes config space for the device
>>>
>>> I would have expected this to be a REGION with a property of
>>> PCI_CONFIG_SPACE.
>>
>> Could be, if physical address is made optional.
>
> Or physical address is also a property, aka sub-region.

A subrecord of REGION is fine with me.

>>> Would we only need to expose phys addr for 1:1 mapping requirements?
>>> I'm not sure why we'd care to expose this otherwise.
>>
>> It's more important for non-PCI, where it avoids the need for userspace
>> to parse the device tree to find the guest address (we'll usually want
>> 1:1), or to consolidate pages shared by multiple regions. It could be
>> nice for debugging, as well.
>
> So the device tree path is ripped straight from the system, so it's the
> actual 1:1, matching physical hardware, path.

Yes.

>>> Even for non-PCI we need to
>>> know if the region is pio/mmio32/mmio64/prefetchable/etc.
>>
>> Outside of PCI, what standardized form would you put such information
>> in? Where would the kernel get this information? What does
>> mmio32/mmio64 mean in this context?
>
> I could imagine a platform device described by ACPI that might want to
> differentiate. The physical device doesn't get moved of course, but
> guest drivers might care how the device is described if we need to
> rebuild those ACPI tables. ACPI might even be a good place to leverage
> these data structures... /me ducks.

ACPI info could be another subrecord type, but in the device tree
system-bus case we generally don't have this information at the generic
infrastructure level. Drivers are expected to know how their devices'
regions should be mapped.

>>> BAR index could really just translate to a REGION instance number.
>>
>> How would that work if you make non-BAR things (such as config space)
>> into regions?
>
> Put their instance numbers outside of the BAR region? We have a fixed
> REGION space on PCI, so we could just define BAR0 == instance 0, BAR1 ==
> instance 1... ROM == instance 6, CONFIG == instance 0xF (or 7).

Seems more awkward than just having each region say what it is. What do
you do to fill in the gaps?

-Scott
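The /foo/bar example in this thread can be sketched in C to show how each REGION record, with its DTPATH and DTINDEX subrecords, might be filled in, including translating the child's reg through bar's ranges to arrive at 0x20100. This is a minimal sketch, not the vfio interface: every struct, enum, and function name here (region_info, dt_range, dt_translate, set_region, build_foo_bar_regions) is hypothetical. It assumes #address-cells = #size-cells = 1 and that /foo maps its children 1:1 into the CPU physical address space, as the "64 bytes at 0x1000" comments imply.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical DTINDEX property selector; values are illustrative only. */
enum dt_prop_type { DT_PROP_REG, DT_PROP_RANGES };

/* Hypothetical REGION record with its DTPATH/DTINDEX subrecords folded in. */
struct region_info {
    uint64_t phys_addr;            /* CPU physical address of the region */
    uint64_t size;                 /* region length in bytes */
    char dtpath[64];               /* DTPATH subrecord */
    enum dt_prop_type prop_type;   /* DTINDEX: which property (reg/ranges) */
    uint32_t prop_index;           /* DTINDEX: which entry in that property */
};

/* One "ranges" entry, assuming #address-cells = #size-cells = 1. */
struct dt_range {
    uint64_t child_addr;   /* address on the child bus */
    uint64_t parent_addr;  /* corresponding address on the parent bus */
    uint64_t size;         /* window length in bytes */
};

/* Translate a child-bus address through one ranges entry.
 * Returns 0 and stores the parent address in *out, or -1 if addr
 * falls outside the window. */
int dt_translate(const struct dt_range *r, uint64_t addr, uint64_t *out)
{
    if (addr < r->child_addr || addr - r->child_addr >= r->size)
        return -1;
    *out = r->parent_addr + (addr - r->child_addr);
    return 0;
}

/* Fill in one (hypothetical) region record. */
void set_region(struct region_info *r, uint64_t pa, uint64_t size,
                const char *path, enum dt_prop_type type, uint32_t index)
{
    r->phys_addr = pa;
    r->size = size;
    snprintf(r->dtpath, sizeof r->dtpath, "%s", path);
    r->prop_type = type;
    r->prop_index = index;
}

/* Build the records for binding to /foo/bar in the example tree.
 * Assumes /foo maps its children 1:1, so bar's reg values are already
 * CPU physical addresses. Returns the number of records written. */
size_t build_foo_bar_regions(struct region_info out[4])
{
    /* bar: ranges = <0 0x20000 0x10000>; */
    const struct dt_range bar_ranges = { 0x0, 0x20000, 0x10000 };
    uint64_t child_pa;

    /* bar: reg = <0x1000 64 0x1800 64>; two entries */
    set_region(&out[0], 0x1000, 64, "/foo/bar", DT_PROP_REG, 0);
    set_region(&out[1], 0x1800, 64, "/foo/bar", DT_PROP_REG, 1);

    /* The whole ranges window: 0x10000 bytes (64K) at 0x20000. */
    set_region(&out[2], bar_ranges.parent_addr, bar_ranges.size,
               "/foo/bar", DT_PROP_RANGES, 0);

    /* child: reg = <0x100 0x100>; is a bar-bus address, so translate it. */
    if (dt_translate(&bar_ranges, 0x100, &child_pa) != 0)
        return 3;
    set_region(&out[3], child_pa, 0x100, "/foo/bar/child", DT_PROP_REG, 0);

    /* Sanity check: 0x20000 + 0x100 */
    assert(out[3].phys_addr == 0x20100);
    return 4;
}
```

Calling build_foo_bar_regions() on a four-element array yields the same four records listed in the example, with the RANGES region covering 0x10000 bytes (64K) at 0x20000. Because each record names its own DT property and index, a consumer does not need a fixed instance-number scheme to tell the regions apart, which is the "each region says what it is" approach argued for at the end of the message.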