xen-devel.lists.xenproject.org archive mirror
From: Gordan Bobic <gordan@bobich.net>
To: Jan Beulich <JBeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Feng Wu <feng.wu@intel.com>,
	xen-devel@lists.xen.org
Subject: Re: Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))
Date: Tue, 07 Jan 2014 12:42:17 +0000
Message-ID: <5dcec6d652a27688050262f949e9dc9e@mail.shatteredsilicon.net>
In-Reply-To: <52CBFDDD020000780011112C@nat28.tlf.novell.com>

On 2014-01-07 12:15, Jan Beulich wrote:
>>>> On 07.01.14 at 12:35, Gordan Bobic <gordan@bobich.net> wrote:
>> On 2014-01-07 11:26, Wu, Feng wrote:
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@lists.xen.org
>>>> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Gordan Bobic
>>>> Sent: Tuesday, January 07, 2014 6:44 PM
>>>> To: Andrew Cooper
>>>> Cc: xen-devel@lists.xen.org
>>>> Subject: Re: [Xen-devel] Multi-bridged PCIe devices (Was: Re: iommuu/vt-d issues with LSI MegaSAS (PERC5i))
>>>> 
>>>> On 2014-01-07 10:38, Andrew Cooper wrote:
>>>> > On 07/01/14 10:35, Gordan Bobic wrote:
>>>> >> On 2014-01-07 03:17, Zhang, Yang Z wrote:
>>>> >>> Konrad Rzeszutek Wilk wrote on 2014-01-07:
>>>> >>>>> Which would look like this:
>>>> >>>>>
>>>> >>>>> C220 ---> Tundra Bridge -----> (HB6 PCI bridge -> Brooktree BDFs) on the card
>>>> >>>>>           \--------------> IEEE-1394a
>>>> >>>>>
>>>> >>>>> I am actually wondering if this 07:00.0 device is the one that
>>>> >>>>> reports itself as 08:00.0 (which I think is what you are
>>>> >>>>> alluding to, Jan)
>>>> >>>>>
>>>> >>>>
>>>> >>>> And to double check that theory I decided to pass in the IEEE-1394a
>>>> >>>> to a guest:
>>>> >>>>
>>>> >>>>            +-1c.5-[07-08]----00.0-[08]----03.0  Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
>>>> >>>>
>>>> >>>>
>>>> >>>> (XEN) [VT-D]iommu.c:885: iommu_fault_status: Fault Overflow
>>>> >>>> (XEN) [VT-D]iommu.c:887: iommu_fault_status: Primary Pending Fault
>>>> >>>> (XEN) [VT-D]iommu.c:865: DMAR:[DMA Read] Request device [0000:08:00.0] fault addr 370f1000, iommu reg = ffff82c3ffd53000
>>>> >>>> (XEN) DMAR:[fault reason 02h] Present bit in context entry is clear
>>>> >>>> (XEN) print_vtd_entries: iommu ffff83083d4939b0 dev 0000:08:00.0 gmfn 370f1
>>>> >>>> (XEN)     root_entry = ffff83083d47f000
>>>> >>>> (XEN)     root_entry[8] = 72569b001
>>>> >>>> (XEN)     context = ffff83072569b000
>>>> >>>> (XEN)     context[0] = 0_0
>>>> >>>> (XEN)     ctxt_entry[0] not present
>>>> >>>>
>>>> >>>> So, capture card OK - Likely the Tundra bridge has an issue:
>>>> >>>>
>>>> >>>> 07:00.0 PCI bridge: Tundra Semiconductor Corp. Device 8113 (rev 01) (prog-if 01 [Subtractive decode])
>>>> >>>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>> >>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
>>>> >>>>         Latency: 0
>>>> >>>>         Bus: primary=07, secondary=08, subordinate=08, sec-latency=32
>>>> >>>>         Memory behind bridge: f0600000-f06fffff
>>>> >>>>         Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium TAbort- <TAbort- <MAbort+ <SERR- <PERR-
>>>> >>>>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>>>> >>>>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>>>> >>>>         Capabilities: [60] Subsystem: Super Micro Computer Inc Device 0805
>>>> >>>>         Capabilities: [a0] Power Management version 3
>>>> >>>>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>> >>>>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>> >>>>
>>>> >>>> or there is some unknown bridge in the motherboard.
>>>> >>>
>>>> >>> According to your description above, upstream Linux should also
>>>> >>> have the same problem. Did you see it with upstream Linux?
>>>> >>
>>>> >> The problem I was seeing with LSI cards (phantom device doing DMA)
>>>> >> does, indeed, also occur in upstream Linux. If I enable intel-iommu on
>>>> >> bare metal Linux, the same problem occurs as with Xen.
>>>> >>
>>>> >>> There may be some buggy device that generates DMA requests with an
>>>> >>> internal BDF but doesn't expose it (unlike a phantom device). For
>>>> >>> those devices, I think we need to set up the VT-d page table manually.
>>>> >>
>>>> >> I think what is needed is a pci-phantom style override that tells the
>>>> >> hypervisor to tell the IOMMU to allow DMA traffic from a specific
>>>> >> invisible device ID.
>>>> >>
>>>> >> Gordan
>>>> >
>>>> > There is.  See "pci-phantom" in
>>>> > http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html
>>>> 
>>>> I thought this was only applicable to phantom _functions_ (number
>>>> after the dot) rather than whole phantom _devices_. Is that not
>>>> the case?
>>> 
>>> I think that's right. I went through the related code for the pci
>>> phantom device just now, and found that the information from the
>>> 'pci-phantom' command line option is stored in the variable
>>> 'phantom_devs[8]', of type struct phantom_dev{}. This variable is
>>> used in function alloc_pdev() as follows:
>>> 
>>> 
>>>                 for ( i = 0; i < nr_phantom_devs; ++i )
>>>                     if ( phantom_devs[i].seg == pseg->nr &&
>>>                          phantom_devs[i].bus == bus &&
>>>                          phantom_devs[i].slot == PCI_SLOT(devfn) &&
>>>                          phantom_devs[i].stride > PCI_FUNC(devfn) )
>>>                     {
>>>                         pdev->phantom_stride = phantom_devs[i].stride;
>>>                         break;
>>>                     }
>>> 
>>> So from the code, we can see this command line option only works for
>>> phantom _functions_, not for whole phantom _devices_.
>> 
>> What would it take to make it work for a whole phantom device?
> 
> First and foremost a definition of what a phantom device is and
> how one would behave. Once again - phantom functions are part
> of the PCIe specification, so those don't require a definition.

Konrad's patch from a while back seemed to do the required thing to
allow an otherwise invisible/undetected device to do DMA transfers
without freaking out the IOMMU that doesn't know about it.
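
To make the function-vs-device distinction concrete, here is a rough
standalone sketch of the matching logic quoted above. It is not the Xen
code: the struct layout, the helper names phantom_stride_for() and
covered_by_phantom(), and the rule that phantom functions sit at multiples
of the stride within the same slot are my assumptions for illustration.
The PCI_SLOT/PCI_FUNC macros use the usual devfn encoding (slot in bits
7:3, function in bits 2:0).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Usual devfn encoding: slot in bits 7:3, function in bits 2:0. */
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical stand-in for one parsed pci-phantom entry. */
struct phantom_dev {
    uint16_t seg;
    uint8_t bus, slot, stride;
};

/*
 * Mirrors the quoted loop: an entry applies only when segment, bus and
 * slot all match the real device. Returns the stride, or 0 if no entry
 * applies.
 */
static unsigned int phantom_stride_for(const struct phantom_dev *tbl,
                                       size_t n, uint16_t seg,
                                       uint8_t bus, uint8_t devfn)
{
    for (size_t i = 0; i < n; ++i)
        if (tbl[i].seg == seg && tbl[i].bus == bus &&
            tbl[i].slot == PCI_SLOT(devfn) &&
            tbl[i].stride > PCI_FUNC(devfn))
            return tbl[i].stride;
    return 0;
}

/*
 * Assumed rule: a requester is covered only if it is in the SAME slot as
 * the real device and its function number is a whole number of strides
 * above the real function.
 */
static bool covered_by_phantom(uint8_t real_devfn, unsigned int stride,
                               uint8_t req_devfn)
{
    if (stride == 0 || PCI_SLOT(req_devfn) != PCI_SLOT(real_devfn))
        return false;

    unsigned int rf = PCI_FUNC(real_devfn);
    unsigned int qf = PCI_FUNC(req_devfn);

    return qf > rf && (qf - rf) % stride == 0;
}
```

With the numbers from Konrad's log, the real device is 08:03.0 and the
faulting requester is 08:00.0. The slot differs (3 vs 0), so under these
assumptions no stride value can make a phantom-function entry cover it,
which is why a separate whole-device override would be needed.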
