xen-devel.lists.xenproject.org archive mirror
From: Gordan Bobic <gordan@bobich.net>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Zhang, Yang Z" <yang.z.zhang@intel.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))
Date: Wed, 11 Dec 2013 21:15:17 +0000
Message-ID: <52A8D5E5.2030902@bobich.net>
In-Reply-To: <20131211183233.GA2760@phenom.dumpdata.com>

On 12/11/2013 06:32 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 12, 2013 at 06:20:18AM +0000, Zhang, Yang Z wrote:
>> Jan Beulich wrote on 2013-09-11:
>>>>>> On 11.09.13 at 15:26, Gordan Bobic <gordan@bobich.net> wrote:
>>>> On Wed, 11 Sep 2013 14:22:51 +0100, "Jan Beulich"
>>>> <JBeulich@suse.com>
>>>>   wrote:
>>>>>>>> On 11.09.13 at 15:10, Gordan Bobic <gordan@bobich.net> wrote:
>>>>>> On Wed, 11 Sep 2013 14:03:14 +0100, "Jan Beulich"
>>>>>> <JBeulich@suse.com>
>>>>>>   wrote:
>>>>>>>>>> On 11.09.13 at 14:45, Gordan Bobic <gordan@bobich.net> wrote:
>>>>>>>>   dmesg, xl dmesg, lspci -vvvnn and lspci -tvnn output is attached.
>>>>>>>>
>>>>>>>>   I'll try adding one of my LSI cards and see the comparative
>>>>>>>> behaviour. Right now I don't even know if the phantom device is
>>>>>>>> on the SAS card or the motherboard.
>>>>>>>
>>>>>>> The Adaptec card being the only thing on bus 0f makes it pretty
>>>>>>> likely that this other device also is on that card.
>>>>>>>
>>>>>>> I guess the issue is mainly because the device itself is a PCI
>>>>>>> one, while the immediately upstream bridge (where I mean only the
>>>>>>> visible one) is PCIe. There _must_ be a PCIe-PCI bridge between
>>>>>>> them. And as long as firmware doesn't know about that bridge and
>>>>>>> the bridge doesn't properly handle config space accesses to it,
>>>>>>> such a device just can't be used with an IOMMU (without some yet
>>>>>>> to be invented workaround).
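The sketch below is a rough illustration of the bridge problem described above. A PCIe-to-PCI bridge forwards DMA from conventional PCI devices upstream under its own secondary bus number, with device 0 and function 0 as the requester ID. The IOMMU therefore sees that alias rather than the device's real BDF. The C code models this with a hypothetical `struct hop` path description. It is a simplified model, not Xen's code, and it ignores PCI-X and vendor quirks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack bus/device/function into a 16-bit requester ID:
 * bits 15:8 = bus, 7:3 = device, 2:0 = function. */
static uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Hypothetical, simplified description of one bridge on the upstream path. */
struct hop {
    int is_pcie_to_pci_bridge; /* 1 if this bridge converts PCIe to conventional PCI */
    uint8_t secondary_bus;     /* bus number on the bridge's downstream side */
};

/* Walk the bridges from the device toward the root complex (path[0] is the
 * bridge nearest the device). Each PCIe-to-PCI bridge re-tags the request
 * with <secondary bus>:00.0, so the one closest to the root wins. */
static uint16_t dma_requester_id(uint16_t dev_id, const struct hop *path, size_t n)
{
    uint16_t rid = dev_id;
    for (size_t i = 0; i < n; i++)
        if (path[i].is_pcie_to_pci_bridge)
            rid = pci_bdf(path[i].secondary_bus, 0, 0);
    return rid;
}
```

Take a device at 09:08.0 whose path first crosses a plain PCI-to-PCI bridge and then a PCIe-to-PCI bridge with secondary bus 08. The function returns 08:00.0, so that is the ID which needs a context entry.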
>>>>>>>
>>>>>>   I'm actually thinking about Konrad's proposed hack in that
>>>>>> thread from 3 years ago. If the device IDs are parameterized out
>>>>>> rather than hard-coded, then this could work in nearly the same
>>>>>> way as xen-pciback in terms of usage. Pass the phantom device IDs
>>>>>> as parameters to the module. Done that way it might even be
>>>>>> considered clean enough to be fit for public consumption.
>>>>>
>>>>> Except that, short of being able to determine it via config space
>>>>> reads, we also need the resulting command line option to tell us
>>>>> what kind of device it is.
>>>>>
>>>>   Not sure I follow. Why do we need to know the device type?
>>>
>>> Just look at set_msi_source_id() as well as
>>> domain_context_{mapping,unmap}() (just the most prominent
>>> examples): Behavior here heavily depends on the type of the device
>>> itself _and_ that of the upstream bridge(s).
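The sketch below shows how a device can be classified from config space, which makes the type dependence concrete. The Device/Port Type field (bits 7:4 of the PCI Express Capabilities register) distinguishes endpoints, ports and the two bridge directions. Devices without a PCI Express capability fall back to the header type. The enum names are hypothetical and only loosely mirror the kind of distinction the VT-d code draws; they are not Xen's definitions.

```c
#include <assert.h>
#include <stdint.h>

enum dev_kind {                 /* hypothetical names, for illustration */
    KIND_PCIE_ENDPOINT,         /* PCIe, legacy PCIe or root-complex integrated endpoint */
    KIND_PCIE_PORT,             /* root port or switch upstream/downstream port */
    KIND_PCIE_TO_PCI_BRIDGE,
    KIND_PCI_TO_PCIE_BRIDGE,
    KIND_PCI_DEVICE,            /* conventional PCI function */
    KIND_PCI_BRIDGE,            /* conventional PCI-to-PCI bridge */
    KIND_UNKNOWN
};

/* has_pcie_cap: whether a PCI Express capability was found.
 * pcie_flags:   that capability's 16-bit PCI Express Capabilities register.
 * header_type:  raw config byte at offset 0x0E; bit 7 (multi-function) is ignored. */
static enum dev_kind classify(int has_pcie_cap, uint16_t pcie_flags, uint8_t header_type)
{
    if (!has_pcie_cap) {
        switch (header_type & 0x7f) {
        case 0:  return KIND_PCI_DEVICE;
        case 1:  return KIND_PCI_BRIDGE;
        default: return KIND_UNKNOWN;       /* e.g. CardBus bridge */
        }
    }
    switch ((pcie_flags >> 4) & 0xf) {      /* Device/Port Type field */
    case 0x0: case 0x1: case 0x9: return KIND_PCIE_ENDPOINT;
    case 0x4: case 0x5: case 0x6: return KIND_PCIE_PORT;
    case 0x7: return KIND_PCIE_TO_PCI_BRIDGE;
    case 0x8: return KIND_PCI_TO_PCIE_BRIDGE;
    default:  return KIND_UNKNOWN;          /* e.g. root complex event collector */
    }
}
```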
>> It looks like many devices fail to work. I wonder whether the PCI/PCIe specification describes how to detect a hidden device behind such devices (as with phantom device detection). If not, I think those devices are buggy, or we could say they are not really PCI/PCIe compliant. Since VT-d only covers PCI/PCIe devices, it's reasonable that a non-PCI/PCIe device fails to work under VT-d.
>>
>> As Jan suggested, we need the user to tell us whether there is a hidden device, or a BDF behind another device, that the OS is unaware of. We need to pass that info to Xen before passing through the device.
>>
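This is one way the user-supplied information could be consumed. The sketch assumes a hypothetical table of (owner, alias) BDF pairs. When a device is assigned, it collects the device's own requester ID plus every alias declared for it, so that each one gets a context entry. Nothing here is an existing Xen interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-declared association: DMA tagged with `alias`
 * actually originates from the device `owner`. */
struct user_alias {
    uint16_t owner;
    uint16_t alias;
};

/* Collect the requester IDs that need an IOMMU context entry when `owner`
 * is assigned: the device itself first, then each declared alias, skipping
 * duplicates. Returns the number of IDs written to out (at most max). */
static size_t ids_to_map(uint16_t owner, const struct user_alias *tab, size_t n,
                         uint16_t *out, size_t max)
{
    assert(out != NULL);
    size_t cnt = 0;
    if (max == 0)
        return 0;
    out[cnt++] = owner;
    for (size_t i = 0; i < n && cnt < max; i++) {
        if (tab[i].owner != owner)
            continue;
        int seen = 0;
        for (size_t j = 0; j < cnt; j++)
            if (out[j] == tab[i].alias)
                seen = 1;
        if (!seen)
            out[cnt++] = tab[i].alias;
    }
    return cnt;
}
```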
>
> Interestingly enough I just hit this with my brand-new Haswell CPU and
> new motherboard when passing in a capture card. It shows:
>
>      +-1c.5-[07-09]----00.0-[08-09]--+-01.0-[09]--+-08.0  Brooktree Corporation Bt878 Video Capture
>             |                               |            +-08.1  Brooktree Corporation Bt878 Audio Capture
>             |                               |            +-09.0  Brooktree Corporation Bt878 Video Capture
>             |                               |            +-09.1  Brooktree Corporation Bt878 Audio Capture
>             |                               |            +-0a.0  Brooktree Corporation Bt878 Video Capture
>             |                               |            +-0a.1  Brooktree Corporation Bt878 Audio Capture
>             |                               |            +-0b.0  Brooktree Corporation Bt878 Video Capture
>             |                               |            \-0b.1  Brooktree Corporation Bt878 Audio Capture
>             |                               \-03.0  Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
>
> And Xen says:
> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 36aa3000, iommu reg = ffff82c3ffd53000
> (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
> (XEN) print_vtd_entries: iommu ffff83083d4939b0 dev 0000:08:00.0 gmfn 36aa3
> (XEN)     root_entry = ffff83083d47e000
> (XEN)     root_entry[8] = 72569a001
> (XEN)     context = ffff83072569a000
> (XEN)     context[0] = 0_0
> (XEN)     ctxt_entry[0] not present
> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 36aa3000, iommu reg = ffff82c3ffd53000
>
>
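The helpers below help with reading the fault dump above. In the legacy VT-d root/context table layout, the bus number indexes one of 256 root entries. The devfn, `(device << 3) | function`, indexes one of 256 entries in that bus's context table. The helper names are hypothetical. Applied to the log, 0000:08:00.0 gives root_entry[8] and context[0]. root_entry[8] = 72569a001 has its present bit set and points at a context table at physical address 0x72569a000. The fault address 36aa3000 is page frame 36aa3, matching the printed gmfn. The all-zero context[0] is what "Present bit in context entry is clear" refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Where a VT-d IOMMU (legacy root/context format) looks up a requester. */
struct vtd_lookup {
    unsigned root_index;    /* = bus */
    unsigned context_index; /* = (device << 3) | function */
};

static struct vtd_lookup vtd_indices(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    struct vtd_lookup l = { bus, (dev << 3) | fn };
    return l;
}

/* Low 64 bits of a root entry: bit 0 = present, bits 63:12 = context-table pointer. */
static int root_present(uint64_t lo) { return (int)(lo & 1); }
static uint64_t root_ctp(uint64_t lo) { return lo & ~0xfffULL; }

/* 4 KiB page frame number of a faulting DMA address. */
static uint64_t fault_pfn(uint64_t addr) { return addr >> 12; }
```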
> Oddly enough it was working fine in a box with an AMD IOMMU. But
> to be fair - that machine was running with Xen 4.1.
>
> The hack I developed: http://lists.xen.org/archives/html/xen-devel/2010-06/msg00093.html
> ends up with this:
>
> (XEN) alloc_pdev: unknown type: 0000:08:00.0
> (XEN) [VT-D]iommu.c:1484: d0:unknown(0): 0000:08:00.0
> (XEN) [VT-D]iommu.c:1888: d0: context mapping failed
>
> (FYI, this is Xen 4.3.1)
>
> Let me retry on the AMD box with the same version of Xen.

I may be wrong, but this doesn't look like the same problem (phantom PCI 
device on the bus). Or am I missing something?

As far as I can tell, the original problem was arising on cards that are
PCIe, but based on a PCI-X chipset, i.e. with a PCIe-to-PCI-X bridge. Xen
wasn't the only thing affected in my case: the bare metal Linux kernel was
also having problems with intel-iommu=1 in the kernel boot parameters.
It might be worth trying that with your card to see what happens. If
bare metal Linux with intel-iommu=1 works for your card, it's probably
not the same problem (though of course it could be similar or related).

Out of interest, I noticed recently that there is a Xen parameter
"pci-phantom", but I haven't been able to find documentation for it. Can
you point me in the right direction? Does it, perchance, allow
specifying the PCI slot ID of a phantom device so that the IOMMU doesn't
freak out when a seemingly non-existent device starts trying to do DMA?
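Some background on the question. PCIe phantom functions let a device issue requests under otherwise unused function numbers in its own slot, so an IOMMU would also need context entries for those IDs. This sketch assumes such an option describes a device by its devfn plus a stride of 1, 2 or 4. That is an assumption for illustration, not checked against Xen's documentation. Under it, the extra devfns could be enumerated like this:

```c
#include <assert.h>
#include <stdint.h>

/* List the extra devfns a device may use as phantom functions, given its
 * own devfn and a stride of 1, 2 or 4: devfn + k*stride for k = 1, 2, ...
 * while the result stays in the same slot (device number).
 * Returns how many devfns were written to out (at most 7). */
static int phantom_devfns(uint8_t devfn, unsigned stride, uint8_t out[7])
{
    assert(stride == 1 || stride == 2 || stride == 4);
    int n = 0;
    unsigned slot = (unsigned)devfn >> 3;
    for (unsigned f = devfn + stride; (f >> 3) == slot; f += stride)
        out[n++] = (uint8_t)f;
    return n;
}
```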

Gordan
