From: Benjamin Herrenschmidt <benh@au1.ibm.com>
To: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org, Anton Blanchard <anton@samba.org>
Subject: Re: [PATCH v5 00/21] EEH reorganization
Date: Tue, 17 Apr 2012 11:57:51 +1000 [thread overview]
Message-ID: <1334627871.25353.26.camel@pasglop> (raw)
In-Reply-To: <20120417113738.0f091da4@kryten>
On Tue, 2012-04-17 at 11:37 +1000, Anton Blanchard wrote:
>
> No. I replaced that backtrace in eeh_dn_check_failure with a WARN_ON()
> because the backtrace doesn't give us enough info. I'm submitting a
> patch for that today.
>
> Bottom line is mstmread has been causing an EEH error since at least
> 3.0, but in 3.4 we now oops instead of recovering. The signs all point
> to the EEH rework in 3.4.
More precisely, the original oops reported by Anton decodes as such:
>Oops: Kernel access of bad area, sig: 11 [#1]
This is typically a bad memory access..
>SMP NR_CPUS=1024 NUMA pSeries
>Modules linked in:
>NIP: c000000000055af8 LR: c000000000033204 CTR: 0000000000000000
>REGS: c000001f42fb7990 TRAP: 0300 Tainted: G W (3.4.0-rc2-00065-gf549e08-dirty)
TRAP: 300 means that it's the result of a data access interrupts, ie,
load or store to a bad address
>MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24008084 XER: 00000000
>SOFTE: 1
>CFAR: 00000000000049b8
>DAR: 0000000000000070, DSISR: 40000000
Here the DAR tells us what address was accessed. 0x70 is a strong indication
that this was an access to a NULL pointer (at offset 0x70 from that pointer).
It -might- be something else (such as a NULL passed to a list head or such)
but the idea that there's a NULL floating around is a good hint.
>TASK = c000001f6c7dfc40[19010] 'eehd' THREAD: c000001f42fb4000 CPU: 6
>GPR00: 0000000000000001 c000001f42fb7c10 c000000000bd3a28 c000001f80ab0800
>GPR04: c000001f7c57d418 0000000000000380 c000001f7c57e070 c000000000ed5360
>GPR08: 0000000000000000 c000000000c77088 0000000000000000 0000000000000001
>GPR12: 0000000044008088 c00000000eda1500 00000000019ffa78 0000000000a70000
>GPR16: 00000000000000bb c000000000a9f754 c000000000963230 000000000000005e
>GPR20: 0000000001b37e80 00000000000000bb 0000000000000000 c000000000b0ad90
>GPR24: 0000000000000000 c000000000b10588 0000000000000001 c000001f80ab0800
>GPR28: 0000000000000000 c000001f80ab0828 0000000000000000 c000001f7ee10000
>NIP [c000000000055af8] .eeh_add_device_tree_late+0x58/0xf0
This is the function where it happened (eeh_add_device_tree_late)
>LR [c000000000033204] .pcibios_finish_adding_to_bus+0x34/0x50
>Call Trace:
>[c000001f42fb7c10] [00000000fdffffff] 0xfdffffff (unreliable)
>[c000001f42fb7ca0] [c000000000033204] .pcibios_finish_adding_to_bus+0x34/0x50
>[c000001f42fb7d20] [c000000000059a5c] .pcibios_add_pci_devices+0x7c/0x190
>[c000001f42fb7db0] [c000000000057a6c] .eeh_reset_device+0xfc/0x1a0
>[c000001f42fb7e50] [c000000000057e18] .handle_eeh_events+0x308/0x480
>[c000001f42fb7f00] [c0000000000584dc] .eeh_event_handler+0x13c/0x1d0
>[c000001f42fb7f90] [c00000000002099c] .kernel_thread+0x54/0x70
And your backtrace. You can see that you got an eeh event, which triggered an
eeh reset, which triggered a pcibios_add_pci_devices() etc...
>Instruction dump:
>480000a8 60000000 ebff0000 7fbfe800 419e0098 2fbf0000 419e005c e9229eb0
>80090008 2f800000 419e004c ebdf01d0 <e81e0070> 7fbf0000 3160ffff
>7d2b0110
Cheers,
Ben.
next prev parent reply other threads:[~2012-04-17 1:58 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-28 6:03 [PATCH v5 00/21] EEH reorganization Gavin Shan
2012-02-28 6:03 ` [PATCH 01/21] Cleanup on comments of EEH core Gavin Shan
2012-02-28 6:03 ` [PATCH 02/21] Cleanup on function names " Gavin Shan
2012-02-28 6:03 ` [PATCH 03/21] Platform dependent EEH operations Gavin Shan
2012-02-28 6:03 ` [PATCH 04/21] pSeries platform EEH initialization Gavin Shan
2012-02-28 6:03 ` [PATCH 05/21] pSeries platform EEH operation Gavin Shan
2012-02-28 6:03 ` [PATCH 06/21] pSeries platform EEH PE address retrieval Gavin Shan
2012-02-28 6:03 ` [PATCH 07/21] pSeries platform PE state retrieval Gavin Shan
2012-02-28 6:03 ` [PATCH 08/21] pSeries platform EEH wait PE state Gavin Shan
2012-02-28 6:03 ` [PATCH 09/21] pSeries platform EEH reset PE Gavin Shan
2012-02-28 6:04 ` [PATCH 10/21] pSeries platform EEH error log retrieval Gavin Shan
2012-02-28 6:04 ` [PATCH 11/21] pSeries platform EEH configure bridge Gavin Shan
2012-02-28 6:04 ` [PATCH 12/21] Cleanup on comments of EEH aux components Gavin Shan
2012-02-28 6:04 ` [PATCH 13/21] Cleanup on function names " Gavin Shan
2012-02-28 6:04 ` [PATCH 14/21] Introduce EEH device Gavin Shan
2012-02-28 6:04 ` [PATCH 15/21] Replace pci_dn with eeh_dev for EEH sysfs Gavin Shan
2012-02-28 6:04 ` [PATCH 16/21] Replace pci_dn with eeh_dev for EEH address cache Gavin Shan
2012-02-28 6:04 ` [PATCH 17/21] Replace pci_dn with eeh_dev for EEH core Gavin Shan
2012-02-28 6:04 ` [PATCH 18/21] Replace pci_dn with eeh_dev for EEH aux components Gavin Shan
2012-02-28 6:04 ` [PATCH 19/21] Replace pci_dn with eeh_dev for EEH on pSeries Gavin Shan
2012-02-28 6:04 ` [PATCH 20/21] Introduce struct eeh_stats for EEH Gavin Shan
2012-02-28 10:04 ` David Laight
2012-02-29 1:08 ` Gavin Shan
2012-02-29 2:25 ` Gavin Shan
2012-02-29 12:56 ` Michael Ellerman
2012-03-01 1:14 ` Gavin Shan
2012-03-01 1:47 ` [PATCH 20/21] Introduce struct eeh_stats for EEH - Reworked Gavin Shan
2012-02-28 6:04 ` [PATCH 21/21] pSeries platform config space access in EEH Gavin Shan
2012-02-29 3:04 ` [PATCH v5 00/21] EEH reorganization Gavin Shan
2012-04-12 21:39 ` Anton Blanchard
2012-04-13 2:03 ` Anton Blanchard
2012-04-17 1:29 ` Gavin Shan
2012-04-17 1:37 ` Anton Blanchard
2012-04-17 1:57 ` Benjamin Herrenschmidt [this message]
2012-04-17 5:30 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1334627871.25353.26.camel@pasglop \
--to=benh@au1.ibm.com \
--cc=anton@samba.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=shangw@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).