* 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Arnd Hannemann @ 2011-11-14 22:33 UTC
To: linux-kernel

Hi,

when trying to boot kernel 3.2rc1 on my thinkpad t510 I get an endless loop of errors:

DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000
DMAR: [fault reason 02] Present bit in context entry is clear

screenshot can be found here:
http://arndnet.de/lkml/screenshot3.2rc1.jpg

kernel 3.1.1 is booting up flawlessly.

Any idea?

Best regards
Arnd
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Arnd Hannemann @ 2011-11-15 0:53 UTC
To: linux-kernel
Cc: dwmw2, iommu

Hi,

On 14.11.2011 23:33, Arnd Hannemann wrote:

> when trying to boot kernel 3.2rc1 on my thinkpad t510 I get an endless loop of errors:
>
> DRHD: handling fault status reg 2
> DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000
> DMAR: [fault reason 02] Present bit in context entry is clear
>
> screenshot can be found here:
> http://arndnet.de/lkml/screenshot3.2rc1.jpg
>
> kernel 3.1.1 is booting up flawlessly.

I must have inadvertently enabled CONFIG_INTEL_IOMMU_DEFAULT_ON in my config
for 3.2-rc1.

With CONFIG_INTEL_IOMMU_DEFAULT_ON disabled, my thinkpad boots up again.
Not sure if this is expected?

Best regards
Arnd
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Robert Hancock @ 2011-11-15 4:31 UTC
To: Arnd Hannemann
Cc: linux-kernel, dwmw2, iommu

On 11/14/2011 06:53 PM, Arnd Hannemann wrote:
> Hi,
>
> On 14.11.2011 23:33, Arnd Hannemann wrote:
>
>> when trying to boot kernel 3.2rc1 on my thinkpad t510 I get an endless loop of errors:
>>
>> DRHD: handling fault status reg 2
>> DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000
>> DMAR: [fault reason 02] Present bit in context entry is clear
>>
>> screenshot can be found here:
>> http://arndnet.de/lkml/screenshot3.2rc1.jpg
>>
>> kernel 3.1.1 is booting up flawlessly.
>
> I must have inadvertently enabled CONFIG_INTEL_IOMMU_DEFAULT_ON in my config
> for 3.2-rc1.
>
> With disabled CONFIG_INTEL_IOMMU_DEFAULT_ON my thinkpad boots up again.
> Not sure if this is expected?

No, that's not supposed to happen. Can you post the output of "lspci -vv"? Apparently device 0d:00.0 is generating unexpected DMA accesses for some reason.
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Arnd Hannemann @ 2011-11-15 6:29 UTC
To: Robert Hancock
Cc: linux-kernel, dwmw2, iommu

On 15.11.2011 05:31, Robert Hancock wrote:
> On 11/14/2011 06:53 PM, Arnd Hannemann wrote:
>> Hi,
>>
>> On 14.11.2011 23:33, Arnd Hannemann wrote:
>>
>>> when trying to boot kernel 3.2rc1 on my thinkpad t510 I get an endless loop of errors:
>>>
>>> DRHD: handling fault status reg 2
>>> DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000
>>> DMAR: [fault reason 02] Present bit in context entry is clear
>>>
>>> screenshot can be found here:
>>> http://arndnet.de/lkml/screenshot3.2rc1.jpg
>>>
>>> kernel 3.1.1 is booting up flawlessly.
>>
>> I must have inadvertently enabled CONFIG_INTEL_IOMMU_DEFAULT_ON in my config
>> for 3.2-rc1.
>>
>> With disabled CONFIG_INTEL_IOMMU_DEFAULT_ON my thinkpad boots up again.
>> Not sure if this is expected?
>
> No, that's not supposed to happen. Can you post the output of "lspci -vv"? Apparently that device 0d:00.0 is generating unexpected DMA accesses for some reason.
Looks like a "Ricoh Co Ltd MMC/SD Host Controller" is the culprit:

0d:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
        Subsystem: Lenovo Device 2133
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f2500000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
        Capabilities: [80] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn+ AttnInd+ PwrInd+ RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [800 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci-pci

Best regards,
Arnd
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Chris Wright @ 2011-11-15 4:36 UTC
To: Arnd Hannemann
Cc: linux-kernel, iommu, dwmw2

* Arnd Hannemann (arnd@arndnet.de) wrote:
> On 14.11.2011 23:33, Arnd Hannemann wrote:
> > when trying to boot kernel 3.2rc1 on my thinkpad t510 I get an endless loop of errors:
> >
> > DRHD: handling fault status reg 2
> > DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000
> > DMAR: [fault reason 02] Present bit in context entry is clear
> >
> > screenshot can be found here:
> > http://arndnet.de/lkml/screenshot3.2rc1.jpg
> >
> > kernel 3.1.1 is booting up flawlessly.
>
> I must have inadvertently enabled CONFIG_INTEL_IOMMU_DEFAULT_ON in my config
> for 3.2-rc1.
>
> With disabled CONFIG_INTEL_IOMMU_DEFAULT_ON my thinkpad boots up again.
> Not sure if this is expected?

With CONFIG_INTEL_IOMMU_DEFAULT_ON=n, you have to manually enable the
IOMMU on the kernel command line. So, yes, disabling that and having
your laptop boot is not surprising. The Kconfig item changed names,
and the default is yes, so you may have had CONFIG_DMAR_DEFAULT_ON=n,
but this would not have propagated forward.

As for the endless loop of DMAR faults...sounds like the Ricoh
cardbus/firewire issue where the firewire function does DMA from
function 0. I thought this was quirked and fixed though.

thanks,
-chris
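The Kconfig rename described in the message above can be checked mechanically against a saved .config. The following is a minimal Python sketch, not a kernel tool: the helper names `read_symbol` and `rename_report` are hypothetical, and only the two symbol names are taken from the thread. It reads the text of a .config and reports whether a choice made under the old name is missing under the new one.

```python
OLD_SYMBOL = "CONFIG_DMAR_DEFAULT_ON"         # pre-rename name, as given in the thread
NEW_SYMBOL = "CONFIG_INTEL_IOMMU_DEFAULT_ON"  # post-rename name, as given in the thread


def read_symbol(config_text, symbol):
    """Return the symbol's value ('y', 'n', ...) or None if it does not appear."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # A disabled boolean is written as a comment in .config files.
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None


def rename_report(config_text):
    """Describe whether an old-name choice is absent under the new name."""
    old = read_symbol(config_text, OLD_SYMBOL)
    new = read_symbol(config_text, NEW_SYMBOL)
    if new is not None:
        return f"{NEW_SYMBOL}={new}"
    if old is not None:
        return (f"only {OLD_SYMBOL}={old} found; "
                f"{NEW_SYMBOL} will take its Kconfig default")
    return "neither symbol present"


if __name__ == "__main__":
    # An old-style config that explicitly disabled the option.
    print(rename_report("# CONFIG_DMAR_DEFAULT_ON is not set\n"))
```

Independently of the build-time default, the Intel IOMMU can be switched at boot with the `intel_iommu=on` or `intel_iommu=off` kernel parameter, which is what enabling it "on the kernel command line" refers to.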
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Arnd Hannemann @ 2011-11-15 6:34 UTC
To: Chris Wright
Cc: linux-kernel, iommu, dwmw2

On 15.11.2011 05:36, Chris Wright wrote:
> * Arnd Hannemann (arnd@arndnet.de) wrote:
>> On 14.11.2011 23:33, Arnd Hannemann wrote:
>>> when trying to boot kernel 3.2rc1 on my thinkpad t510 I get an endless loop of errors:
>>>
>>> DRHD: handling fault status reg 2
>>> DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000
>>> DMAR: [fault reason 02] Present bit in context entry is clear
>>>
>>> screenshot can be found here:
>>> http://arndnet.de/lkml/screenshot3.2rc1.jpg
>>>
>>> kernel 3.1.1 is booting up flawlessly.
>>
>> I must have inadvertently enabled CONFIG_INTEL_IOMMU_DEFAULT_ON in my config
>> for 3.2-rc1.
>>
>> With disabled CONFIG_INTEL_IOMMU_DEFAULT_ON my thinkpad boots up again.
>> Not sure if this is expected?
>
> With CONFIG_INTEL_IOMMU_DEFAULT_ON=n, you have to manually enable the
> IOMMU on the kernel command line. So, yes, disabling that and having
> your laptop boot is not surprising. The Kconfig item changed names,
> and the default is yes, so you may have had CONFIG_DMAR_DEFAULT_ON=n,
> but this would not have propagated forward.
>
> As for the endless loop of DMAR faults...sounds like the Ricoh
> cardbus/firewire issue where the firewire function does DMA from
> function 0. I thought this was quirked and fixed though.

In this case it seems to be
0d:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
(using sdhci-pci)

The FireWire function seems to be at a different address:
0d:00.3 FireWire (IEEE 1394): Ricoh Co Ltd FireWire Host Controller (rev 01) (prog-if 10 [OHCI])

Maybe another quirk is needed?
Where does this need to be fixed? I suppose sdhci-pci.ko is loaded much later on bootup.

Best regards
Arnd
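Matching the faulting requester ID against the lspci listing, as done by hand in the message above, can be sketched in a few lines of Python. This is an illustration only: the function names `parse_fault` and `lspci_index` are hypothetical, and the regular expression assumes the exact log wording quoted in this thread, which may differ in other kernel versions.

```python
import re

# Matches the DMAR fault line format quoted in this thread (an assumption;
# other kernel versions may word the message differently).
FAULT_RE = re.compile(
    r"DMAR: \[(?P<kind>DMA (?:Read|Write))\] Request device "
    r"\[(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr (?P<addr>[0-9a-f]+)"
)

# Matches the first line of each lspci device entry: "bus:dev.fn description".
LSPCI_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.+)$")


def parse_fault(line):
    """Extract request kind, requester bus:dev.fn and fault address, or None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {"kind": m["kind"], "bdf": m["bdf"], "addr": int(m["addr"], 16)}


def lspci_index(text):
    """Map bus:dev.fn to the device description from lspci header lines."""
    index = {}
    for line in text.splitlines():
        m = LSPCI_RE.match(line.strip())
        if m:
            index[m.group(1)] = m.group(2)
    return index


if __name__ == "__main__":
    fault = parse_fault(
        "DMAR: [DMA Read] Request device [0d:00.0] fault addr fffff000"
    )
    devices = lspci_index(
        "0d:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)\n"
        "0d:00.3 FireWire (IEEE 1394): Ricoh Co Ltd FireWire Host Controller "
        "(rev 01) (prog-if 10 [OHCI])\n"
    )
    print(fault["bdf"], "->", devices.get(fault["bdf"], "unknown device"))
```

The lookup only tells which function the IOMMU saw as the requester. It cannot tell whether another function on the same device issued the DMA under that ID, which is the question raised in the replies.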
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: David Woodhouse @ 2011-11-15 8:55 UTC
To: Chris Wright, alex.williamson
Cc: Arnd Hannemann, linux-kernel, iommu

On Mon, 2011-11-14 at 20:36 -0800, Chris Wright wrote:
> As for the endless loop of DMAR faults...sounds like the Ricoh
> cardbus/firewire issue where the firewire function does DMA from
> function 0. I thought this was quirked and fixed though.

Alex? You were handling this as part of the IOMMU 'group' functionality,
weren't you? Did you have a test machine... you do now :)

--
dwmw2
* Re: 3.2rc1: bootup fails: DRHD: handling fault status reg 2
From: Alex Williamson @ 2011-11-16 4:01 UTC
To: David Woodhouse
Cc: Chris Wright, Arnd Hannemann, linux-kernel, iommu

On Tue, 2011-11-15 at 08:55 +0000, David Woodhouse wrote:
> On Mon, 2011-11-14 at 20:36 -0800, Chris Wright wrote:
> > As for the endless loop of DMAR faults...sounds like the Ricoh
> > cardbus/firewire issue where the firewire function does DMA from
> > function 0. I thought this was quirked and fixed though.
>
> Alex? You were handling this as part of the IOMMU 'group' functionality,
> weren't you? Did you have a test machine... you do now :)

The group functionality is primarily for iommu_ops so we can expose to
users which devices need to be grouped because of iommu restrictions.
We can add a quirk to the grouping for this device, but then we have to
figure out how dma_ops makes use of that info too. Maybe we could have
a 'struct pci_dev *pci_get_iommu_alias_quirk(struct pci_dev *pdev)'
function that, given device 0d:00.3, would return 0d:00.0 for this case,
and both get_domain_for_dev() and intel_iommu_device_group() could make
use of it. Thanks,

Alex
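The alias lookup proposed in the message above can be modelled outside the kernel to show the intended behaviour. The sketch below is a Python model under stated assumptions, not kernel code: `Bdf`, `ALIAS_QUIRKS` and `get_iommu_alias` are hypothetical names, and the single table entry is the 0d:00.3 to 0d:00.0 mapping given as the example in the thread. Devices without a quirk map to themselves.

```python
from typing import NamedTuple


class Bdf(NamedTuple):
    """A PCI bus:device.function address."""
    bus: int
    dev: int
    fn: int

    @classmethod
    def parse(cls, text):
        # "0d:00.3" -> Bdf(bus=0x0d, dev=0x00, fn=3)
        bus, rest = text.split(":")
        dev, fn = rest.split(".")
        return cls(int(bus, 16), int(dev, 16), int(fn, 16))

    def __str__(self):
        return f"{self.bus:02x}:{self.dev:02x}.{self.fn:x}"


# Hypothetical quirk table: device -> the requester ID its DMA appears under.
# The single entry is the example from the thread, not a verified hardware fact.
ALIAS_QUIRKS = {Bdf.parse("0d:00.3"): Bdf.parse("0d:00.0")}


def get_iommu_alias(device):
    """Return the address the IOMMU should treat this device as (itself if no quirk)."""
    return ALIAS_QUIRKS.get(device, device)


if __name__ == "__main__":
    for name in ("0d:00.3", "0d:00.0", "00:1f.2"):
        print(name, "->", get_iommu_alias(Bdf.parse(name)))
```

In this model both callers named in the message would resolve a device through the same lookup before choosing a domain or a group, so the aliased function and function 0 end up sharing one context entry.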