linuxppc-dev.lists.ozlabs.org archive mirror
From: Gavin Shan <shangw@linux.vnet.ibm.com>
To: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@ozlabs.org
Subject: Re: [PATCH v5 00/21] EEH reorganization
Date: Tue, 17 Apr 2012 09:29:15 +0800
Message-ID: <20120417012915.GA3806@shangw>
In-Reply-To: <20120413120346.42e01402@kryten>

>> I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08).
>> Haven't had a chance to narrow it down yet.

Thanks for the information. I'll try to reproduce the issue on
Firebird-L today. By the way, is "mstmread" a user-level application
that was accessing the config space when the problem happened?


>
>Looking closer, it was caused by an EEH error at boot. It looks like
>the Mellanox infiniband card gets an error when probed by their
>firmware tool (mstmread), but only if the kernel driver is not loaded.
>I see this EEH error back on 3.0, so it's not new.
>
>The question now is why we oops in the EEH code on mainline.
>

It seems the crash was caused by something like WARN_ON(). I checked
the function the backtrace points to (eeh_dn_check_failure) and
didn't find any place that calls WARN_ON() or similar. Maybe I missed
something here.
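
For reference, the WARN_ON() guess can be cross-checked against the
oops itself: the word in angle brackets in the "Instruction dump"
(<0fe00000>) is the instruction at NIP, and "TRAP: 0700" is the program
check vector. The sketch below is a minimal, hand-written decoder for
the PowerPC trap-immediate (D-form) encoding, not kernel code; the
function names are made up for illustration. It shows that 0x0fe00000
decodes to "twi 31,0,0", an unconditional trap, which as far as I know
is what the powerpc BUG()/WARN() macros emit.

```python
def decode_trap_immediate(word):
    """Split a 32-bit PowerPC instruction word into trap-immediate fields.

    Returns a dict with the mnemonic (None if the word is not tdi/twi),
    the TO condition mask, the RA register number and the raw 16-bit SI.
    """
    opcode = (word >> 26) & 0x3F          # primary opcode: top 6 bits
    to = (word >> 21) & 0x1F              # TO: which comparisons trap
    ra = (word >> 16) & 0x1F              # RA: register being compared
    si = word & 0xFFFF                    # SI: immediate, left unsigned here
    mnemonic = {2: "tdi", 3: "twi"}.get(opcode)
    return {"mnemonic": mnemonic, "to": to, "ra": ra, "si": si}


def is_unconditional_trap(word):
    """True if the word is a trap-immediate with every TO bit set."""
    fields = decode_trap_immediate(word)
    return fields["mnemonic"] is not None and fields["to"] == 0x1F
```

Calling decode_trap_immediate(0x0fe00000) gives mnemonic "twi" with
TO=31, RA=0 and SI=0. This only confirms that a trap instruction fired;
which source line planted it is what the "WARNING: at ...eeh.c:492"
line reports.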

Anyway, I'll first try to reproduce it on a Firebird-L machine
and then narrow it down.

>Anton
>

Thanks,
Gavin

>------------[ cut here ]------------
>WARNING: at arch/powerpc/platforms/pseries/eeh.c:492
>Modules linked in:
>NIP: c000000000056cc4 LR: c000000000056cc0 CTR: c00000000051dd60
>REGS: c000001f3953f6a0 TRAP: 0700   Not tainted  (3.4.0-rc2-00065-gf549e08-dirty)
>MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28004482  XER: 0000000f
>SOFTE: 0
>CFAR: c00000000074ea30
>TASK = c000001f39685040[19058] 'mstmread' THREAD: c000001f3953c000 CPU: 38
>GPR00: c000000000056cc0 c000001f3953f920 c000000000bd3a28 0000000000000021 
>GPR04: 0000000000000000 ffffffffffffffff 00000000000323f7 0000000000000000 
>GPR08: 000000006365203c c000000000b10a20 0000000000020000 c000000000a74cc0 
>GPR12: 0000000024004422 c00000000eda8500 000000003a58582e 00000000583a5858 
>GPR16: 000000002f585858 0000000069636573 000000002f646576 0000000010003b48 
>GPR20: 00000fffc7a3d17c 0000000000000058 0000000000000004 c000001f3953fb90 
>GPR24: 0000000000000000 0000000000000000 c000000000c77088 c000003e6fffeee8 
>GPR28: c000000000d82680 0000000000000000 c000000000c770d0 0000000000000000 
>NIP [c000000000056cc4] .eeh_dn_check_failure+0x304/0x320
>LR [c000000000056cc0] .eeh_dn_check_failure+0x300/0x320
>Call Trace:
>[c000001f3953f920] [c000000000056cc0] .eeh_dn_check_failure+0x300/0x320 (unreliable)
>[c000001f3953f9d0] [c00000000002717c] .rtas_read_config+0x13c/0x1b0
>[c000001f3953fa70] [c0000000003d543c] .pci_user_read_config_dword+0xcc/0x150
>[c000001f3953fb20] [c0000000003e19d8] .pci_read_config+0xe8/0x2a0
>[c000001f3953fc00] [c00000000022d330] .read+0x130/0x210
>[c000001f3953fce0] [c0000000001a723c] .vfs_read+0xec/0x1e0
>[c000001f3953fd80] [c0000000001a73ec] .SyS_pread64+0xbc/0xd0
>[c000001f3953fe30] [c000000000009780] syscall_exit+0x0/0x7c
>Instruction dump:
>7f83e378 48001909 60000000 2fbf0000 419e002c e89f00d8 2fa40000 409e0008 
>e89f0098 e8629fb8 486f7d39 60000000 <0fe00000> 3b200001 4bfffdb4 e8829fa8 
>---[ end trace a6e6d788c9869e00 ]---
>EEH: Detected PCI bus error on device 0006:01:00.0
>EEH: This PCI device has failed 1 times in the last hour:
>EEH: Bus location=U78AB.001.WZSGRFL-P1-C4-T1 driver= pci addr=0006:01:00.0
>EEH: Device location=U78AB.001.WZSGRFL-P1-C4-T1 driver= pci addr=0006:01:00.0
>EEH: of node=/pci@800000020000203/pci1014,415@0
>EEH: PCI device/vendor: 673c15b3
>EEH: PCI cmd/status register: 00100140
>
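
The call trace above (SyS_pread64 -> vfs_read -> pci_read_config ->
pci_user_read_config_dword -> rtas_read_config) is consistent with a
user-space pread() on a device's sysfs "config" file. The sketch below
is a hedged illustration of that kind of access, not a description of
what mstmread actually does; the helper names and the sysfs path in the
usage note are assumptions. It reads one 32-bit register and splits the
dword at offset 0 into vendor and device IDs, the same value the log
prints as "PCI device/vendor: 673c15b3".

```python
import os
import struct


def read_config_dword(config_path, offset):
    """Read one little-endian 32-bit value from a PCI config-space file."""
    fd = os.open(config_path, os.O_RDONLY)
    try:
        data = os.pread(fd, 4, offset)    # positional read, like pread64()
    finally:
        os.close(fd)
    if len(data) != 4:
        raise OSError("short read from %s at offset %d" % (config_path, offset))
    return struct.unpack("<I", data)[0]   # config space is little-endian


def split_vendor_device(dword):
    """Split the dword at config offset 0 into (vendor_id, device_id)."""
    vendor_id = dword & 0xFFFF            # offset 0x00: vendor ID
    device_id = (dword >> 16) & 0xFFFF    # offset 0x02: device ID
    return vendor_id, device_id
```

For example, split_vendor_device(0x673c15b3) yields vendor 0x15b3
(Mellanox) and device 0x673c. On a live system the first helper would
typically be pointed at something like
/sys/bus/pci/devices/0006:01:00.0/config with offset 0; on this machine
that read is what reached the EEH check.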

