From: Gordan Bobic
Subject: Re: Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))
Date: Wed, 11 Dec 2013 21:15:17 +0000
Message-ID: <52A8D5E5.2030902@bobich.net>
References: <523075C502000078000F2617@nat28.tlf.novell.com>
 <9bd7c39084e6b264dda5b6ac256ea97b@mail.shatteredsilicon.net>
 <52307EAA02000078000F26B1@nat28.tlf.novell.com>
 <3f1e678224a3f94125b5050b794882a8@mail.shatteredsilicon.net>
 <5230863202000078000F2712@nat28.tlf.novell.com>
 <52308ACB02000078000F272A@nat28.tlf.novell.com>
 <387b2f80a3866e53ec471421558cf4de@mail.shatteredsilicon.net>
 <52308E1402000078000F2748@nat28.tlf.novell.com>
 <20131211183233.GA2760@phenom.dumpdata.com>
In-Reply-To: <20131211183233.GA2760@phenom.dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: "Zhang, Yang Z", "xen-devel@lists.xenproject.org", Jan Beulich
List-Id: xen-devel@lists.xenproject.org

On 12/11/2013 06:32 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 12, 2013 at 06:20:18AM +0000, Zhang, Yang Z wrote:
>> Jan Beulich wrote on 2013-09-11:
>>>>>> On 11.09.13 at 15:26, Gordan Bobic wrote:
>>>> On Wed, 11 Sep 2013 14:22:51 +0100, "Jan Beulich" wrote:
>>>>>>>> On 11.09.13 at 15:10, Gordan Bobic wrote:
>>>>>> On Wed, 11 Sep 2013 14:03:14 +0100, "Jan Beulich" wrote:
>>>>>>>>>> On 11.09.13 at 14:45, Gordan Bobic wrote:
>>>>>>>> dmesg, xl dmesg, lspci -vvvnn and lspci -tvnn output is attached.
>>>>>>>>
>>>>>>>> I'll try adding one of my LSI cards and see the comparative
>>>>>>>> behaviour. Right now I don't even know if the phantom device is
>>>>>>>> on the SAS card or the motherboard.
>>>>>>>
>>>>>>> The Adaptec card being the only thing on bus 0f makes it pretty
>>>>>>> likely that this other device also is on that card.
>>>>>>>
>>>>>>> I guess the issue is mainly because the device itself is a PCI
>>>>>>> one, while the immediately upstream bridge (where I mean only the
>>>>>>> visible one) is PCIe. There _must_ be a PCIe-PCI bridge between
>>>>>>> them. And as long as firmware doesn't know about that bridge and
>>>>>>> the bridge doesn't properly handle config space accesses to it,
>>>>>>> such a device just can't be used with an IOMMU (without some yet
>>>>>>> to be invented workaround).
>>>>>>>
>>>>>> I'm actually thinking about Konrad's proposed hack in that
>>>>>> thread from 3 years ago. If the device IDs are parameterized out
>>>>>> rather than hard-coded, then this could work in nearly the same
>>>>>> way as xen-pciback in terms of usage. Pass the phantom device IDs
>>>>>> as parameters to the module. Done that way it might even be
>>>>>> considered clean enough to be fit for public consumption.
>>>>>
>>>>> Except that, short of being able to determine it via config space
>>>>> reads, we also need the resulting command line option to tell us
>>>>> what kind of device that is.
>>>>>
>>>> Not sure I follow. Why do we need to know the device type?
>>>
>>> Just look at set_msi_source_id() as well as
>>> domain_context_{mapping,unmap}() (just the most prominent
>>> examples): Behavior here heavily depends on the type of the device
>>> itself _and_ that of the upstream bridge(s).
>> Looks like there are many devices that fail to work. I wonder
>> whether the PCI/PCIe specification describes how to detect a hidden
>> device behind such devices (like the detection of phantom devices).
>> If not, I think those devices are buggy.
>> Or we can say those devices are not really PCI/PCIe compatible.
>> Since VT-d only covers PCI/PCIe devices, it's reasonable that
>> non-PCI/PCIe devices fail to work under VT-d.
>>
>> As Jan suggested, we need the user to tell us whether there is a
>> hidden device or BDF behind another device that the OS is unaware
>> of. We need to pass that info to Xen before passing through the
>> device.
>>
>
> Interestingly enough I just hit this with my brand-new Haswell CPU and
> new motherboard when passing in a capture card. It shows:
>
> +-1c.5-[07-09]----00.0-[08-09]--+-01.0-[09]--+-08.0 Brooktree Corporation Bt878 Video Capture
> |                               |            +-08.1 Brooktree Corporation Bt878 Audio Capture
> |                               |            +-09.0 Brooktree Corporation Bt878 Video Capture
> |                               |            +-09.1 Brooktree Corporation Bt878 Audio Capture
> |                               |            +-0a.0 Brooktree Corporation Bt878 Video Capture
> |                               |            +-0a.1 Brooktree Corporation Bt878 Audio Capture
> |                               |            +-0b.0 Brooktree Corporation Bt878 Video Capture
> |                               |            \-0b.1 Brooktree Corporation Bt878 Audio Capture
> |                               \-03.0 Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
>
> And Xen says:
> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 36aa3000, iommu reg = ffff82c3ffd53000
> (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
> (XEN) print_vtd_entries: iommu ffff83083d4939b0 dev 0000:08:00.0 gmfn 36aa3
> (XEN) root_entry = ffff83083d47e000
> (XEN) root_entry[8] = 72569a001
> (XEN) context = ffff83072569a000
> (XEN) context[0] = 0_0
> (XEN) ctxt_entry[0] not present
> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 36aa3000, iommu reg = ffff82c3ffd53000
>
>
> Oddly enough it was working fine in a box with an AMD IOMMU.
> But to be fair - that machine was running with Xen 4.1.
>
> The hack I developed: http://lists.xen.org/archives/html/xen-devel/2010-06/msg00093.html
> ends up with this:
>
> (XEN) alloc_pdev: unknown type: 0000:08:00.0
> (XEN) [VT-D]iommu.c:1484: d0:unknown(0): 0000:08:00.0
> (XEN) [VT-D]iommu.c:1888: d0: context mapping failed
>
> (FYI, this is Xen 4.3.1)
>
> Let me retry on the AMD box with the same version of Xen.

I may be wrong, but this doesn't look like the same problem (phantom
PCI device on the bus). Or am I missing something?

As far as I can tell, the original problem was arising on cards that
are PCIe, but based on a PCI-X chipset, i.e. with a PCIe-PCI-X bridge.
Xen wasn't the only thing affected in my case - the bare metal Linux
kernel was also having problems with intel-iommu=1 in the kernel boot
parameters. It might be worth trying that with your card to see what
happens. If bare metal Linux with intel-iommu=1 works for your card,
it's probably not the same problem (though it could be similar or
related).

Out of interest, I noticed recently there is a Xen parameter
"pci-phantom", but I haven't been able to find documentation for it.
Can you point me in the right direction? Does it, perchance, allow
specifying the PCI slot ID of a phantom device so that the IOMMU
doesn't freak out when a seemingly non-existent device starts trying
to do DMA?

Gordan
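The VT-d lines in Konrad's log can be decoded by hand. Below is a minimal sketch in C. It assumes the legacy-mode root/context entry layout from the Intel VT-d specification: the Present bit is bit 0 of the low quadword, and a root entry holds the context-table pointer in bits 63:12. The helper names (`parse_bdf`, `bdf_devfn`, `entry_present`, `root_context_table`) are hypothetical and are not Xen's own functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A PCI address as printed in the log: segment:bus:device.function. */
struct bdf {
    unsigned int seg, bus, dev, fn;
};

/* Parse "SSSS:BB:DD.F" (hex fields); returns 0 on success, -1 otherwise. */
static int parse_bdf(const char *s, struct bdf *out)
{
    if (sscanf(s, "%x:%x:%x.%x", &out->seg, &out->bus, &out->dev, &out->fn) != 4)
        return -1;
    if (out->bus > 0xff || out->dev > 0x1f || out->fn > 0x7)
        return -1;
    return 0;
}

/* devfn packs the 5-bit device and 3-bit function into one byte; it indexes
 * the context table, while the bus number indexes the root table. */
static unsigned int bdf_devfn(const struct bdf *b)
{
    assert(b->dev <= 0x1f && b->fn <= 0x7);
    return (b->dev << 3) | b->fn;
}

/* Bit 0 of the low quadword is the Present bit in both entry types. */
static int entry_present(uint64_t lo)
{
    return (int)(lo & 1);
}

/* Bits 63:12 of a root entry's low quadword hold the context-table address. */
static uint64_t root_context_table(uint64_t root_lo)
{
    return root_lo & ~(uint64_t)0xfff;
}
```

With the logged values, 0000:08:00.0 selects root_entry[8] and context[0]. Root entry 72569a001 decodes as present, with its context table at 0x72569a000. The `context = ffff83072569a000` line appears to be that same page seen through Xen's direct map. The all-zero `context[0] = 0_0` has its Present bit clear, which is the condition that fault reason 02h reports.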
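Jan's point, that behaviour depends on the type of the device and of its upstream bridge(s), comes down to which requester ID the IOMMU actually sees. The sketch below models that aliasing rule on Linux's `pci_for_each_dma_alias()`:

- A PCIe-to-PCI bridge forwards conventional PCI requests as its secondary bus number with devfn 0.
- A conventional PCI-to-PCI bridge substitutes its own ID.
- PCIe root and switch ports pass the ID through unchanged.

PCI-X sources, which may keep their own ID, are ignored. The types and names are illustrative assumptions, not Xen's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified bridge kinds that matter for requester-ID aliasing. */
enum bridge_type {
    BRIDGE_PCIE_PORT,   /* root or switch port: forwards the ID unchanged */
    BRIDGE_PCIE_TO_PCI, /* PCIe-to-PCI bridge */
    BRIDGE_PCI_TO_PCI   /* conventional PCI-to-PCI bridge */
};

struct bridge {
    enum bridge_type type;
    uint8_t bus;       /* bus the bridge itself sits on */
    uint8_t devfn;     /* the bridge's own device/function */
    uint8_t secondary; /* bus number directly below the bridge */
};

#define PCI_DEVFN(d, f) ((uint8_t)((((d) & 0x1f) << 3) | ((f) & 0x07)))
#define REQ_ID(bus, devfn) ((uint16_t)(((bus) << 8) | (devfn)))

/* Return the requester ID the IOMMU would see for a DMA from bus:devfn.
 * path[0] is the nearest upstream bridge, path[n-1] the one nearest the root. */
static uint16_t visible_requester_id(uint8_t bus, uint8_t devfn,
                                     const struct bridge *path, size_t n)
{
    uint16_t rid = REQ_ID(bus, devfn);

    assert(path != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (path[i].type) {
        case BRIDGE_PCIE_PORT:
            break;
        case BRIDGE_PCIE_TO_PCI:
            /* Conventional PCI requests get re-tagged as secondary:00.0. */
            rid = REQ_ID(path[i].secondary, 0);
            break;
        case BRIDGE_PCI_TO_PCI:
            /* The bridge becomes the visible requester. */
            rid = REQ_ID(path[i].bus, path[i].devfn);
            break;
        }
    }
    return rid;
}
```

Apply this to Konrad's tree, assuming 07:00.0 is a PCIe-to-PCI bridge and 08:01.0 is a conventional PCI-to-PCI bridge (the lspci tree alone does not confirm either). Under those assumptions the sketch maps every Bt878 function, and the 1394 controller at 08:03.0, to 08:00.0. That is the same BDF that appears in the DMAR fault.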
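On the pci-phantom question, some background. PCIe phantom functions let a function issue requests under otherwise unclaimed function numbers in its own slot, so an IOMMU needs context entries for those extra requester IDs as well. When a device advertises n phantom-function bits in its Device Capabilities register, the extra functions sit at a stride of 8 >> n.

The sketch below enumerates them, assuming a slot-plus-stride description of the kind asked about above. How Xen's option actually encodes this is not established here, and `phantom_devfns` and `phantom_stride` are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>

/* Slot (device) number of a devfn byte. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Fill out[] with the extra devfns a function at `devfn` may use as phantom
 * functions: devfn + stride, devfn + 2*stride, ... while still in the same
 * slot. Returns how many were written (at most max). */
static size_t phantom_devfns(unsigned int devfn, unsigned int stride,
                             unsigned int out[], size_t max)
{
    size_t n = 0;

    assert(devfn <= 0xff && stride >= 1 && stride <= 4);
    for (unsigned int f = devfn + stride;
         PCI_SLOT(f) == PCI_SLOT(devfn) && n < max; f += stride)
        out[n++] = f;
    return n;
}

/* Stride implied by n phantom-function bits (n = 1, 2, 3 -> 4, 2, 1). */
static unsigned int phantom_stride(unsigned int bits)
{
    assert(bits >= 1 && bits <= 3);
    return 8u >> bits;
}
```

For example, function 0 of slot 1 (devfn 0x08) with two phantom bits (stride 2) would also issue requests as functions 2, 4 and 6 of that slot. Each of those requester IDs would need a context entry.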