From: Gavin Shan <shangw@linux.vnet.ibm.com>
To: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH v5 00/21] EEH reorganization
Date: Tue, 17 Apr 2012 09:29:15 +0800
Message-ID: <20120417012915.GA3806@shangw>
In-Reply-To: <20120413120346.42e01402@kryten>
>> I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08).
>> Haven't had a chance to narrow it down yet.
Thanks for the information. I'll try to reproduce the issue on
Firebird-L today. By the way, it seems that "mstmread" is a
user-level application that was accessing the config space when
the problem happened?
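The quoted backtrace (SyS_pread64 -> pci_read_config ->
pci_user_read_config_dword -> rtas_read_config) looks like a user-space
pread() on the device's sysfs config file. A minimal sketch of such a read,
assuming the standard /sys/bus/pci/devices/<domain:bus:dev.fn>/config path;
read_config_dword() is a hypothetical helper, not code taken from mstmread:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 32-bit dword of PCI config space through
 * sysfs, e.g. /sys/bus/pci/devices/0006:01:00.0/config. An aligned 4-byte
 * pread() on that file is served by pci_read_config() via
 * pci_user_read_config_dword(), the path seen in the quoted backtrace.
 *
 * Returns 0 on success, -1 on error.
 */
int read_config_dword(const char *path, off_t offset, uint32_t *out)
{
	unsigned char buf[4];
	ssize_t n;
	int fd;

	assert(path != NULL && out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, buf, sizeof(buf), offset);
	close(fd);
	if (n != (ssize_t)sizeof(buf))
		return -1;

	/* sysfs hands back config space in little-endian byte order, so
	 * assemble the bytes explicitly; this stays correct on big-endian
	 * hosts as well. */
	*out = (uint32_t)buf[0] |
	       ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) |
	       ((uint32_t)buf[3] << 24);
	return 0;
}
```

For example, read_config_dword("/sys/bus/pci/devices/0006:01:00.0/config",
0, &v) would fetch the same device/vendor dword that the EEH log prints.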
>
>Looking closer, it was caused by an EEH error at boot. It looks like
>the Mellanox infiniband card gets an error when probed by their
>firmware tool (mstmread), but only if the kernel driver is not loaded.
>I see this EEH error back on 3.0, so it's not new.
>
>The question now is why we oops in the EEH code on mainline.
>
It seems the trace was triggered by something like WARN_ON(). I checked
the function pointed to by the backtrace (eeh_dn_check_failure) and
didn't find any place that calls WARN_ON() or similar. Maybe I missed
something here.
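One data point on the WARN_ON() question: the faulting word in the
instruction dump, <0fe00000>, decodes as a D-form instruction with primary
opcode 3 (twi), TO=31, RA=0, SI=0, i.e. "twi 31,0,0". That is the
unconditional trap powerpc emits for BUG() and __WARN(), which fits
TRAP: 0700 (program check) and the "WARNING: at
arch/powerpc/platforms/pseries/eeh.c:492" banner. So the trace more likely
comes from an unconditional WARN-style macro, possibly reached through
another macro or an inlined helper, than from a real oops. A small decoder
(hypothetical names) that checks the field split:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC D-form instruction such as "twi TO,RA,SI". */
struct dform {
	unsigned opcode; /* bits 0-5 (IBM numbering): primary opcode */
	unsigned to;     /* bits 6-10: trap condition (TO) field */
	unsigned ra;     /* bits 11-15: source register RA */
	int16_t si;      /* bits 16-31: signed immediate */
};

struct dform decode_dform(uint32_t insn)
{
	struct dform d;

	d.opcode = (insn >> 26) & 0x3fu;
	d.to = (insn >> 21) & 0x1fu;
	d.ra = (insn >> 16) & 0x1fu;
	d.si = (int16_t)(insn & 0xffffu);
	return d;
}

/* Primary opcode 3 is twi; TO = 31 sets all five condition bits
 * (lt, gt, eq, ltu, gtu), so the trap is taken unconditionally. */
int is_unconditional_twi(uint32_t insn)
{
	struct dform d = decode_dform(insn);

	return d.opcode == 3 && d.to == 31;
}
```

By contrast, the word right after the trap in the dump, 3b200001, has
primary opcode 14 (addi) and is not a trap.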
Anyway, I'll first try to reproduce it on a Firebird-L machine and
then narrow it down.
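For narrowing it down, it helps to recall when the check is reached. As I
understand the pSeries config accessors, rtas_read_config() only calls
eeh_dn_check_failure() when the value read back is all ones for the access
size (the EEH_IO_ERROR_VALUE(size) pattern), since that is what a frozen
slot returns. A minimal user-space sketch of that test, with hypothetical
helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the EEH_IO_ERROR_VALUE(size) idea: the all-ones
 * pattern a frozen slot returns for a 1-, 2- or 4-byte config read.
 */
uint32_t io_error_value(int size)
{
	assert(size == 1 || size == 2 || size == 4);
	return UINT32_MAX >> ((4 - size) * 8);
}

/*
 * Hypothetical predicate: only an all-ones result is suspicious enough to
 * ask firmware whether the PE is frozen (eeh_dn_check_failure() in the
 * quoted backtrace). Any other value goes back to the caller as data.
 */
bool needs_eeh_check(uint32_t val, int size)
{
	return val == io_error_value(size);
}
```

An all-ones value can also be legitimate register content, which is why
the predicate only decides whether to ask firmware, not whether the slot
is actually frozen.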
>Anton
>
Thanks,
Gavin
>------------[ cut here ]------------
>WARNING: at arch/powerpc/platforms/pseries/eeh.c:492
>Modules linked in:
>NIP: c000000000056cc4 LR: c000000000056cc0 CTR: c00000000051dd60
>REGS: c000001f3953f6a0 TRAP: 0700 Not tainted (3.4.0-rc2-00065-gf549e08-dirty)
>MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28004482 XER: 0000000f
>SOFTE: 0
>CFAR: c00000000074ea30
>TASK = c000001f39685040[19058] 'mstmread' THREAD: c000001f3953c000 CPU: 38
>GPR00: c000000000056cc0 c000001f3953f920 c000000000bd3a28 0000000000000021
>GPR04: 0000000000000000 ffffffffffffffff 00000000000323f7 0000000000000000
>GPR08: 000000006365203c c000000000b10a20 0000000000020000 c000000000a74cc0
>GPR12: 0000000024004422 c00000000eda8500 000000003a58582e 00000000583a5858
>GPR16: 000000002f585858 0000000069636573 000000002f646576 0000000010003b48
>GPR20: 00000fffc7a3d17c 0000000000000058 0000000000000004 c000001f3953fb90
>GPR24: 0000000000000000 0000000000000000 c000000000c77088 c000003e6fffeee8
>GPR28: c000000000d82680 0000000000000000 c000000000c770d0 0000000000000000
>NIP [c000000000056cc4] .eeh_dn_check_failure+0x304/0x320
>LR [c000000000056cc0] .eeh_dn_check_failure+0x300/0x320
>Call Trace:
>[c000001f3953f920] [c000000000056cc0] .eeh_dn_check_failure+0x300/0x320 (unreliable)
>[c000001f3953f9d0] [c00000000002717c] .rtas_read_config+0x13c/0x1b0
>[c000001f3953fa70] [c0000000003d543c] .pci_user_read_config_dword+0xcc/0x150
>[c000001f3953fb20] [c0000000003e19d8] .pci_read_config+0xe8/0x2a0
>[c000001f3953fc00] [c00000000022d330] .read+0x130/0x210
>[c000001f3953fce0] [c0000000001a723c] .vfs_read+0xec/0x1e0
>[c000001f3953fd80] [c0000000001a73ec] .SyS_pread64+0xbc/0xd0
>[c000001f3953fe30] [c000000000009780] syscall_exit+0x0/0x7c
>Instruction dump:
>7f83e378 48001909 60000000 2fbf0000 419e002c e89f00d8 2fa40000 409e0008
>e89f0098 e8629fb8 486f7d39 60000000 <0fe00000> 3b200001 4bfffdb4 e8829fa8
>---[ end trace a6e6d788c9869e00 ]---
>EEH: Detected PCI bus error on device 0006:01:00.0
>EEH: This PCI device has failed 1 times in the last hour:
>EEH: Bus location=U78AB.001.WZSGRFL-P1-C4-T1 driver= pci addr=0006:01:00.0
>EEH: Device location=U78AB.001.WZSGRFL-P1-C4-T1 driver= pci addr=0006:01:00.0
>EEH: of node=/pci@800000020000203/pci1014,415@0
>EEH: PCI device/vendor: 673c15b3
>EEH: PCI cmd/status register: 00100140
>
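For reading the quoted EEH log: as far as I know, the device/vendor and
cmd/status lines are 32-bit config reads at offsets 0x00 and 0x04, so each
packs two 16-bit registers. A sketch (hypothetical names) that splits them
and tests the standard PCI command/status bits:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Standard PCI command register bits (low half of the dword at 0x04). */
#define CFG_CMD_IO        0x0001u
#define CFG_CMD_MEMORY    0x0002u
#define CFG_CMD_MASTER    0x0004u
#define CFG_CMD_PARITY    0x0040u
#define CFG_CMD_SERR      0x0100u

/* Standard PCI status register bit (high half of the dword at 0x04). */
#define CFG_STS_CAP_LIST  0x0010u

struct cfg_header {
	uint16_t vendor;  /* offset 0x00 */
	uint16_t device;  /* offset 0x02 */
	uint16_t command; /* offset 0x04 */
	uint16_t status;  /* offset 0x06 */
};

/* Split the two logged dwords into 16-bit registers. Config space is
 * little-endian, so the lower-offset register is the low half. */
struct cfg_header decode_header(uint32_t dev_vendor, uint32_t cmd_status)
{
	struct cfg_header h;

	h.vendor = (uint16_t)(dev_vendor & 0xffffu);
	h.device = (uint16_t)(dev_vendor >> 16);
	h.command = (uint16_t)(cmd_status & 0xffffu);
	h.status = (uint16_t)(cmd_status >> 16);
	return h;
}

/* Print the registers and the command/status bits of interest. */
void print_header(const struct cfg_header *h)
{
	assert(h != NULL);
	printf("vendor 0x%04x device 0x%04x\n",
	       (unsigned)h->vendor, (unsigned)h->device);
	printf("command 0x%04x: io=%d mem=%d master=%d parity=%d serr=%d\n",
	       (unsigned)h->command,
	       (h->command & CFG_CMD_IO) != 0,
	       (h->command & CFG_CMD_MEMORY) != 0,
	       (h->command & CFG_CMD_MASTER) != 0,
	       (h->command & CFG_CMD_PARITY) != 0,
	       (h->command & CFG_CMD_SERR) != 0);
	printf("status 0x%04x: cap_list=%d\n",
	       (unsigned)h->status, (h->status & CFG_STS_CAP_LIST) != 0);
}
```

With the logged values, decode_header(0x673c15b3, 0x00100140) gives vendor
0x15b3 (Mellanox) and device 0x673c, command 0x0140 (parity error response
and SERR# enable set; I/O, memory space and bus mastering clear) and status
0x0010 (capabilities list). Memory space and bus mastering being off would
be consistent with no driver having claimed the function, matching the
empty "driver=" field.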