From: Manish Jaggi
Subject: Re: PCI Pass-through in Xen ARM - Draft 2
Date: Sun, 5 Jul 2015 11:37:49 +0530
Message-ID: <5598C9B5.8090000@caviumnetworks.com>
To: Ian Campbell, "xen-devel@lists.xen.org", Prasun Kapoor, Julien Grall, "Kumar, Vijaya"
List-Id: xen-devel@lists.xenproject.org

>Ian Campbell wrote:
>>On Mon, 2015-06-29 at 00:08 +0530, Manish Jaggi wrote:
>> PCI Pass-through in Xen ARM
>> --------------------------
>>
>> Draft 2
>>
>> Index
>>
>> 1. Background
>>
>> 2. Basic PCI Support in Xen ARM
>> 2.1 pci_hostbridge and pci_hostbridge_ops
>> 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
>>
>> 3. Dom0 Access PCI devices
>>
>> 4. DomU assignment of PCI device
>> 4.1 Holes in guest memory space
>> 4.2 New entries in xenstore for device BARs
>> 4.3 Hypercall for bdf mapping notification to xen
>> 4.4 Change in Linux PCI FrontEnd - backend driver
>>     for MSI/X programming
>>
>> 5. NUMA and PCI passthrough
>>
>> 6. DomU pci device attach flow
>>
>>
>> Revision History
>> ----------------
>> Changes from Draft 1
>> a) map_mmio hypercall removed from earlier draft
>> b) device bar mapping into guest not 1:1
>> c) holes in guest address space 32bit / 64bit for MMIO virtual BARs
>> d) xenstore device's BAR info addition.
>>
>>
>> 1. Background of PCI passthrough
>> --------------------------------
>> Passthrough refers to assigning a pci device to a guest domain (domU) such
>> that the guest has full control over the device. The MMIO space and
>> interrupts are managed by the guest itself, close to how a bare kernel
>> manages a device.
>>
>> A device's access to guest address space needs to be isolated and protected.
>> SMMU (System MMU - IOMMU in ARM) is programmed by the xen hypervisor to allow
>> the device to access guest memory for data transfer and for sending MSI/X
>> interrupts. In the case of MSI/X the device writes to the GITS (ITS address
>> space) Interrupt Translation Register.
>>
>> 2. Basic PCI Support for ARM
>> ----------------------------
>> The apis to read/write the pci configuration space are based on segment:bdf.
>> How the sbdf is mapped to a physical address is within the realm of the pci
>> host controller.
>>
>> ARM PCI support in Xen introduces pci host controller drivers, similar to
>> those in Linux. Each driver registers callbacks, which are invoked on
>> matching the compatible property in the pci device tree node.
>>
>> 2.1 pci_hostbridge and pci_hostbridge_ops
>> The init function in the pci host driver calls the following to register
>> hostbridge callbacks:
>>
>> struct pci_hostbridge;
>> typedef struct pci_hostbridge pci_hostbridge_t;
>> int pci_hostbridge_register(pci_hostbridge_t *pcihb);
>>
>> struct pci_hostbridge_ops {
>>     u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
>>                          u32 reg, u32 bytes);
>>     void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
>>                            u32 reg, u32 bytes, u32 val);
>> };
>>
>> struct pci_hostbridge {
>>     u32 segno;
>>     paddr_t cfg_base;
>>     paddr_t cfg_size;
>>     struct dt_device_node *dt_node;
>>     struct pci_hostbridge_ops ops;
>>     struct list_head list;
>> };
>>
>> A pci conf read function would internally be as follows:
>> u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn, u32 reg, u32 bytes)
>> {
>>     pci_hostbridge_t *pcihb;
>>     list_for_each_entry(pcihb, &pci_hostbridge_list, list)
>>     {
>>         if (pcihb->segno == seg)
>>             return pcihb->ops.pci_conf_read(pcihb, bus, devfn, reg, bytes);
>>     }
>>     return -1; /* no hostbridge for this segment: all-ones value */
>> }
>>
>> 2.2 PHYSDEVOP_pci_host_bridge_add hypercall
>>
>> Xen code accesses PCI configuration space based on the sbdf received from the
>> guest. The order in which the pci device tree nodes appear may not be the
>> same as the order of device enumeration in dom0.
>> Thus there needs to be a mechanism to bind the segment number assigned by
>> dom0 to the pci host controller. The following hypercall is introduced:
>>
>> #define PHYSDEVOP_pci_host_bridge_add 44
>> struct physdev_pci_host_bridge_add {
>>     /* IN */
>>     uint16_t seg;
>>     uint64_t cfg_base;
>>     uint64_t cfg_size;
>> };
>>
>> This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
>> hypercall. The handler code invokes the following to update the segment
>> number in pci_hostbridge:
>>
>> int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base,
>>                          uint64_t cfg_size);
>>
>> Subsequent calls to pci_conf_read/write are completed by the
>> pci_hostbridge_ops of the respective pci_hostbridge.
>>
>> 3. Dom0 access PCI device
>> ---------------------------------
>> As per the design of xen hypervisor, dom0 enumerates the PCI devices. For each
>> device the MMIO space has to be mapped in the Stage2 translation for dom0.
>
>Here "device" is really host bridge, isn't it? i.e. this is done by
>mapping the entire MMIO window of each host bridge, not the individual
>BAR registers of each device one at a time.

No, the device means the PCIe EP device, not the RC.

>
>IOW this is functionality of the pci host driver's initial setup, not
>something which is driven from the dom0 enumeration of the bus.
>
>> For dom0, xen maps the ranges in pci nodes in stage 2 translation.
>>
>> GITS_ITRANSLATER space (4k) must be programmed in Stage2 translation so that
>> MSI/X works. This is done in vits initialization in dom0/domU.
>
>This also happens at start of day, but what isn't mentioned is that
>(AIUI) the SMMU will need to be programmed to map each SBDF to the dom0
>p2m as the devices are discovered and reported. Right?
>

Yes, I will add an SMMU section in Draft 3.

>>
>> 4. DomU access / assignment PCI device
>> --------------------------------------
>> In the flow of pci-attach device, the toolkit
>
>I assume you mean "toolstack" throughout?
>If so then please run s/toolkit/toolstack/g so as to use the usual
>terminology.
>

Yes.

>> will read the pci configuration space BAR registers. The toolkit has the
>> guest memory map and the information about the MMIO holes.
>>
>> When the first pci device is assigned to domU, the toolkit allocates a
>> virtual BAR region from the MMIO hole area. The toolkit then calls
>> xc_domain_memory_mapping (memory mapping domctl) to map it in the stage2
>> translation.
>>
>> 4.1 Holes in guest memory space
>> ----------------------------
>> Holes are added in the guest memory space for mapping pci devices' BAR
>> regions. These are defined in arch-arm.h:
>>
>> /* For 32bit */
>> GUEST_MMIO_HOLE0_BASE, GUEST_MMIO_HOLE0_SIZE
>>
>> /* For 64bit */
>> GUEST_MMIO_HOLE1_BASE, GUEST_MMIO_HOLE1_SIZE
>>
>> 4.2 New entries in xenstore for device BARs
>> --------------------------------------------
>> The toolkit also updates the xenstore information for the device
>> (virtual bar : physical bar). This information is read by xenpciback and
>> returned to the pcifront driver on configuration space accesses.
>>
>> 4.3 Hypercall for bdf mapping notification to xen
> ^v (I think) or maybe vs?
>
>> -----------------------------------------------
>> #define PHYSDEVOP_map_sbdf 43
>> typedef struct {
>>     u32 s;
>>     u8 b;
>>     u8 df;
>>     u16 res;
>> } sbdf_t;
>> struct physdev_map_sbdf {
>>     int domain_id;
>>     sbdf_t sbdf;
>>     sbdf_t gsbdf;
>> };
>>
>> Each domain has a pdev list, which contains the list of all pci devices. The
>> pdev structure already has the sbdf information. The arch_pci_dev is updated to
>> -------------------------------------------------------------
>> On the Pci frontend bus a msi-parent as gicv3-its is added.
>
>Are you talking about a device tree property or something else?
>

Device tree property. xl creates a device tree for domU. It is assumed that the
its node is present in the domU device tree.

>Note that pcifront is not described in the DT, only in the xenstore
>structure.
>So a dt property is unlikely to be the right way to describe this.
>
>We need to think of some way of specifying this such that we don't tie
>ourselves into a single vits ABI.
>

Please suggest.

>> There is a single virtual its for a domU, as there is only a single virtual
>> pci bus in domU. This ensures that the config_msi calls are handled by the
>> gicv3 its driver in the domU kernel rather than through frontend-backend
>> communication between dom0 and domU.
>>
>> 5. NUMA domU and vITS
>> -----------------------------
>> a) On NUMA systems, domU still has a single its node.
>> b) How can xen identify the ITS to which a device is connected?
>>    - Query by segment number, using an api which returns the pci host
>>      controller's device node:
>>
>>      struct dt_device_node* pci_hostbridge_dt_node(uint32_t segno)
>>
>> c) Query the interrupt parent of the pci device node to find out the its.
>>
>> 6. DomU Bootup flow
>> ---------------------
>> a. DomU boots up without any pci devices assigned.
>
>I don't think we can/should rule out cold plug at this stage. IOW it
>must be possible to boot a domU with PCI devices already assigned.
>

As per my understanding, the pci front driver receives a notification from
xenwatch, upon which it starts enumeration.
See: pcifront_backend_changed()

>> A daemon listens to events from the xenstore. When a device is attached to
>> domU, the frontend pci bus driver starts enumerating the devices. The
>> frontend driver communicates with the backend driver in dom0 to read the pci
>> config space.
>
>I'm afraid I don't follow any of this. What "daemon"? Is it in the front
>or backend? What does it do with the events it is listening for?
>

xenwatch, i.e. the xenbus watch kernel thread in the domU (frontend) kernel,
not a separate userspace daemon.

>>
>> b. The backend driver returns the virtual BAR ranges, which are already
>>    mapped in the domU stage 2 translation.
>>
>> c. The device driver of the specific pci device invokes methods to configure
>>    the msi/x interrupt, which are handled by the its driver in the domU kernel.
>>    The read/writes by the its driver are trapped in xen. Xen's ITS driver
>>    finds out the actual sbdf based on the map_sbdf hypercall information.