From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gordan Bobic
Subject: Re: Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))
Date: Tue, 07 Jan 2014 12:42:17 +0000
Message-ID: <5dcec6d652a27688050262f949e9dc9e@mail.shatteredsilicon.net>
References: <6748185fb950f1aca45678675dc87b0f@mail.shatteredsilicon.net> <52CBFDDD020000780011112C@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <52CBFDDD020000780011112C@nat28.tlf.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Andrew Cooper, Feng Wu, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 2014-01-07 12:15, Jan Beulich wrote:
>>>> On 07.01.14 at 12:35, Gordan Bobic wrote:
>> On 2014-01-07 11:26, Wu, Feng wrote:
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@lists.xen.org
>>>> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Gordan Bobic
>>>> Sent: Tuesday, January 07, 2014 6:44 PM
>>>> To: Andrew Cooper
>>>> Cc: xen-devel@lists.xen.org
>>>> Subject: Re: [Xen-devel] Multi-bridged PCIe devices (Was: Re:
>>>> iommuu/vt-d issues with LSI MegaSAS (PERC5i))
>>>>
>>>> On 2014-01-07 10:38, Andrew Cooper wrote:
>>>> > On 07/01/14 10:35, Gordan Bobic wrote:
>>>> >> On 2014-01-07 03:17, Zhang, Yang Z wrote:
>>>> >>> Konrad Rzeszutek Wilk wrote on 2014-01-07:
>>>> >>>>> Which would look like this:
>>>> >>>>>
>>>> >>>>> C220 ---> Tundra Bridge -----> (HB6 PCI bridge -> Brooktree BDFs)
>>>> >>>>> on the card
>>>> >>>>> \--------------> IEEE-1394a
>>>> >>>>>
>>>> >>>>> I am actually wondering if this 07:00.0 device is the one that
>>>> >>>>> reports itself as 08:00.0 (which I think is what you alluding to
>>>> >>>>> Jan)
>>>> >>>>>
>>>> >>>>
>>>> >>>> And to double check that theory I decided to pass in the IEEE-1394a
>>>> >>>> to a guest:
>>>> >>>>
>>>> >>>> +-1c.5-[07-08]----00.0-[08]----03.0 Texas Instruments
>>>> >>>> TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
>>>> >>>>
>>>> >>>>
>>>> >>>> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
>>>> >>>> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
>>>> >>>> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 370f1000, iommu reg = ffff82c3ffd53000
>>>> >>>> (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
>>>> >>>> (XEN) print_vtd_entries: iommu ffff83083d4939b0 dev 0000:08:00.0 gmfn 370f1
>>>> >>>> (XEN) root_entry = ffff83083d47f000
>>>> >>>> (XEN) root_entry[8] = 72569b001
>>>> >>>> (XEN) context = ffff83072569b000
>>>> >>>> (XEN) context[0] = 0_0
>>>> >>>> (XEN) ctxt_entry[0] not present
>>>> >>>>
>>>> >>>> So, capture card OK - Likely the Tundra bridge has an issue:
>>>> >>>>
>>>> >>>> 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01) (prog-if 01 [Subtractive decode])
>>>> >>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>> >>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- >SERR-
>>>> >>>> secondary=08, subordinate=08, sec-latency=32
>>>> >>>> Memory behind bridge: f0600000-f06fffff
>>>> >>>> Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium TAbort-
>>>> >>>> BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>>>> >>>> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>>>> >>>> Capabilities: [60] Subsystem: Super Micro Computer Inc Device 0805
>>>> >>>> Capabilities: [a0] Power Management version 3
>>>> >>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>> >>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>> >>>>
>>>> >>>> or there is some unknown bridge in the motherboard.
>>>> >>>
>>>> >>> According your description above, the upstream Linux should also have
>>>> >>> the same problem. Did you see it with upstream Linux?
>>>> >>
>>>> >> The problem I was seeing with LSI cards (phantom device doing DMA)
>>>> >> does, indeed, also occur in upstream Linux. If I enable intel-iommu on
>>>> >> bare metal Linux, the same problem occurs as with Xen.
>>>> >>
>>>> >>> There may be some buggy device that generate DMA request with
>>>> >>> internal BDF but it didn't expose it(not like Phantom device). For
>>>> >>> those devices, I think we need to setup the VT-d page table manually.
>>>> >>
>>>> >> I think what is needed is a pci-phantom style override that tells the
>>>> >> hypervisor to tell the IOMMU to allow DMA traffic from a specific
>>>> >> invisible device ID.
>>>> >>
>>>> >> Gordan
>>>> >
>>>> > There is. See "pci-phantom" in
>>>> > http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
>>>>
>>>> I thought this was only applicable to phantom _functions_ (number
>>>> after the dot) rather than whole phantom _devices_. Is that not the
>>>> case?
>>>
>>> I think that's right. I go through the related code for the pci
>>> phantom device just now, I find that the information of command line
>>> 'pci-phantom' is stored in variable 'phantom_devs[8]' with type of
>>> struct phantom_dev{}. This variable is used in function alloc_pdev()
>>> as follow:
>>>
>>>
>>>     for ( i = 0; i < nr_phantom_devs; ++i )
>>>         if ( phantom_devs[i].seg == pseg->nr &&
>>>              phantom_devs[i].bus == bus &&
>>>              phantom_devs[i].slot == PCI_SLOT(devfn) &&
>>>              phantom_devs[i].stride > PCI_FUNC(devfn) )
>>>         {
>>>             pdev->phantom_stride = phantom_devs[i].stride;
>>>             break;
>>>         }
>>>
>>> So from the code, we can see this command line only works for phantom
>>> _function_, not for whole phantom _devices_.
>>
>> What would it take to make it work for a whole phantom device?
>
> First and foremost a definition of what a phantom device is and
> how one would behave. Once again - phantom functions are part
> of the PCIe specification, so those don't require a definition.

Konrad's patch from a while back seemed to do the required thing
to allow an otherwise invisible/undetected device to do DMA
transfers without freaking out the IOMMU that doesn't know about
it.
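
For anyone following the function-versus-device distinction, the lookup Feng quoted from alloc_pdev() can be sketched as a standalone function. This is only an illustration under assumptions: the struct layout, the PCI_SLOT/PCI_FUNC/PCI_DEVFN definitions (the conventional 5-bit slot, 3-bit function split of devfn) and the helper name phantom_stride_for() are mine and are not copied from the Xen tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional devfn encoding: upper 5 bits = slot, lower 3 bits = function. */
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Assumed layout of one pci-phantom entry; field names follow the quoted loop. */
struct phantom_dev {
    uint16_t seg;
    uint8_t bus, slot, stride;
};

/*
 * Hypothetical helper mirroring the quoted loop: return the phantom stride
 * configured for the device at (seg, bus, devfn), or 0 if no entry matches.
 */
static uint8_t phantom_stride_for(const struct phantom_dev *table, size_t n,
                                  uint16_t seg, uint8_t bus, uint8_t devfn)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; ++i)
        if (table[i].seg == seg &&
            table[i].bus == bus &&
            table[i].slot == PCI_SLOT(devfn) &&   /* must be the same slot */
            table[i].stride > PCI_FUNC(devfn))    /* function below stride */
            return table[i].stride;

    return 0;
}
```

With a single entry for slot 03 on bus 08, a device at 08:03.0 picks up the stride, but nothing is ever returned for 08:00.0. Because the match is keyed on the slot of a device that has already been enumerated, an entry can only describe extra functions of that slot, which is consistent with the reading that pci-phantom covers phantom functions rather than a whole unseen device.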