linux-pci.vger.kernel.org archive mirror
From: ZhenHua <zhen-hual@hp.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Li, Zhen-Hua" <zhen-hual@hp.com>
Subject: Re: [PATCH 1/1] ia64/pci: set mmio decoding on for some host bridge
Date: Wed, 10 Jul 2013 15:10:25 +0800
Message-ID: <51DD08E1.8040307@hp.com>
In-Reply-To: <CAErSpo6pCrzCOuthrgD_+oRyw7ZhqztVSi=ti336s2h5xeH_uA@mail.gmail.com>

Hi Bjorn,
On the system where this bug happens, an MCA event is generated when the
kernel crashes:
     Transaction Address: memory write to address 0x00000ae041428 (LMMIO - SBL Blade 1 SFW DDR Memory)

I guess some module is trying to access the address 0x00000ae041428 right
after this line runs:
      pci_write_config_word(dev, PCI_COMMAND,
                         orig_cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO));
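
For reference, here is a user-space sketch of what that write does to the
command register. It is not the kernel code itself: the two helper names
are made up for illustration, and only the bit values are taken from
linux/pci_regs.h.

```c
#include <stdint.h>

/* Bit values as defined in linux/pci_regs.h */
#define PCI_COMMAND_IO      0x1  /* Enable response in I/O space */
#define PCI_COMMAND_MEMORY  0x2  /* Enable response in memory space */
#define PCI_COMMAND_MASTER  0x4  /* Enable bus mastering */

/*
 * Hypothetical helper: the value written to PCI_COMMAND by the line
 * quoted above. I/O and memory decoding are cleared, every other bit
 * of the original command word is preserved.
 */
static uint16_t cmd_with_decode_off(uint16_t orig_cmd)
{
	return orig_cmd & (uint16_t)~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
}

/* Hypothetical helper: does the device still claim memory cycles? */
static int decodes_memory(uint16_t cmd)
{
	return (cmd & PCI_COMMAND_MEMORY) != 0;
}
```

With orig_cmd = PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, the helper returns
PCI_COMMAND_MASTER only, so any memory write that reaches the device while
this value is in effect is not claimed.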


The output of lspci -vvv follows.
40:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22) (prog-if 00 [Normal decode])
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Bus: primary=40, secondary=41, subordinate=41, sec-latency=0
         I/O behind bridge: 0000f000-00000fff
         Memory behind bridge: ae000000-af8fffff
         Prefetchable memory behind bridge: fffffffffff00000-00000000000fffff
         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
         BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
         Capabilities: [40] Subsystem: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1
         Capabilities: [60] Message Signalled Interrupts: Mask+ 64bit- Count=1/2 Enable+
                 Address: fee00000  Data: 4046
                 Masking: 00000002  Pending: 00000000
         Capabilities: [90] Express (v2) Root Port (Slot-), MSI 00
                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                         ExtTag+ RBE+ FLReset-
                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                         MaxPayload 128 bytes, MaxReadReq 128 bytes
                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                 LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                         ClockPM- Suprise+ LLActRep+ BwNot+
                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                 RootCap: CRSVisible-
                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                 DevCap2: Completion Timeout: Range BCD, TimeoutDis+ ARIFwd+
                 DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- ARIFwd-
                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -3.5dB
                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                          Compliance De-emphasis: -6dB
                 LnkSta2: Current De-emphasis Level: -3.5dB
         Capabilities: [e0] Power Management version 3
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [100] Advanced Error Reporting
                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                 UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
         Capabilities: [150] Access Control Services
                 ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
         Capabilities: [160] Vendor Specific Information <?>
         Kernel driver in use: pcieport
         Kernel modules: shpchp
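
As a quick sanity check, the address in the MCA record can be compared
with the "Memory behind bridge" window reported above. The sketch below
is only an illustration: the helper name is made up, and the constants
are copied from the MCA record and the lspci output for 40:01.0.

```c
#include <stdint.h>

/* Hypothetical helper: is addr inside the inclusive window [base, limit]? */
static int in_bridge_window(uint64_t addr, uint64_t base, uint64_t limit)
{
	return addr >= base && addr <= limit;
}

/* Address from the MCA record (leading zeros dropped). */
static const uint64_t mca_addr  = 0xae041428ULL;

/* "Memory behind bridge: ae000000-af8fffff" from lspci for 40:01.0. */
static const uint64_t mem_base  = 0xae000000ULL;
static const uint64_t mem_limit = 0xaf8fffffULL;
```

in_bridge_window(mca_addr, mem_base, mem_limit) is true, so the faulting
write falls inside this root port's memory window. That seems consistent
with a write to a device behind this port, but it does not show which
code issued the write.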

Thanks
ZhenHua
On 07/10/2013 12:49 AM, Bjorn Helgaas wrote:
> On Mon, Jul 8, 2013 at 11:42 PM, Li, Zhen-Hua <zhen-hual@hp.com> wrote:
>> On some IA64 platforms with an Intel PCI bridge, for example the HP BL890c i2
>> with the Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port,
>> the kernel may crash when it tries to disable MMIO decoding on the PCI
>> bridge devices.
>>
>> The comment on quirk_mmio_always_on also says:
>> "But doing so (disable the mmio decoding) may cause problems on host bridge
>>   and perhaps other key system devices"
>>
>> So, for this PCI bridge, the dev->mmio_always_on bit should be set to 1.
>>
>> To avoid affecting existing users of quirk_mmio_always_on, a new function is added.
>>
>> Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
>> ---
>>   drivers/pci/quirks.c    |   17 +++++++++++++++++
>>   include/linux/pci_ids.h |    1 +
>>   2 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index e85d230..665af3e 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -44,6 +44,23 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
>>   DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
>>                                  PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
>>
>> +#ifdef CONFIG_IA64
>> +/*
>> + * On some IA64 platforms, for some Intel PCI bridge devices, for example
>> + * the Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port,
>> + * disabling MMIO decoding on the device may crash the system.
>> + * So the dev->mmio_always_on bit should be set to 1.
>> + */
>> +static void quirk_mmio_on_intel_pcibridge(struct pci_dev *dev)
>> +{
>> +       dev->mmio_always_on = 1;
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL,
>> +                       PCI_DEVICE_ID_INTEL_5520_5550_X58,
>> +                       PCI_CLASS_BRIDGE_PCI,
>> +                       8, quirk_mmio_on_intel_pcibridge);
>> +#endif
>> +
>>   /* The Mellanox Tavor device gives false positive parity errors
>>    * Mark this device with a broken_parity_status, to allow
>>    * PCI scanning code to "skip" this now blacklisted device.
>> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
>> index 3bed2e8..d8c60b7 100644
>> --- a/include/linux/pci_ids.h
>> +++ b/include/linux/pci_ids.h
>> @@ -2742,6 +2742,7 @@
>>   #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2  0x2db2
>>   #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV2    0x2db3
>>   #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340
>> +#define PCI_DEVICE_ID_INTEL_5520_5550_X58       0x3408
>>   #define PCI_DEVICE_ID_INTEL_IOAT_TBG4  0x3429
>>   #define PCI_DEVICE_ID_INTEL_IOAT_TBG5  0x342a
>>   #define PCI_DEVICE_ID_INTEL_IOAT_TBG6  0x342b
>> --
>> 1.7.10.4
>>
> You need to figure out what the problem is, not just avoid it.  It's
> very unlikely that the problem is something unique to ia64.  In fact,
> I think it's very doubtful that the problem is even something unique
> to the 5520 root ports.  My guess is there's something special about
> the system you're testing.
>
> Evidently you have traffic going to a device behind the root port at
> the same time as we're trying to read the root port's BARs.  Linux
> should not generate traffic like that while we're enumerating the root
> port.  Does the problem happen on a root port with an iLO behind it?
> Can you collect "lspci -vvv" output and identify the root port where
> the problem occurs?
>
> Bjorn


