* [parisc-linux] PIM after c3k crash w/ VisEG PCI card
@ 2002-02-02 19:06 Helge Deller
2002-02-03 6:57 ` Grant Grundler
0 siblings, 1 reply; 2+ messages in thread
From: Helge Deller @ 2002-02-02 19:06 UTC
To: parisc-linux
[-- Attachment #1: Type: text/plain, Size: 317 bytes --]
Hi all,
the attached file shows the PIM of a 64bit kernel after my
machine crashed while trying to initialize the STI with a
VisEG PCI card in PCI slot 2. So it's the same problem
as with a 32bit kernel.
Hopefully this log is useful to someone who can help me
debug this problem.
TIA,
Helge
[-- Attachment #2: hpmc.txt --]
[-- Type: text/plain, Size: 8301 bytes --]
pim
PROCESSOR PIM INFORMATION
----------------- Processor 0 HPMC Information ------------------
Timestamp =
Sat Feb 2 18:48:51 GMT 2002 (20:02:02:02:18:48:51)
HPMC Chassis Codes = 2cbf0 2500b 27821 2cbf4 2cbfc
General Registers 0 - 31
00-03 0000000000000000 ffffffffffffffff 00000000001072a0 00000000004c5240
04-07 000000007fb70da8 00000000fa100000 00000000fa380000 00000000003376c8
08-11 0000000010494740 00000000000000c1 0000000010494740 000000001040084c
12-15 0000000010494740 00000000fb000000 0000000010494740 00000000f0400004
16-19 0000000010494740 00000000f000017c 0000000010494740 000000000804000e
20-23 00000000f0000174 000000008fb70ef0 0000000010494740 00000000004c5240
24-27 000000007fb70da8 00000000fa380000 00000000000000f0 00000000103ebd20
28-31 0000000000496788 0000000000000038 0000000000494958 0000000000000000
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 00000000000000c0 000000000000003f
12-15 0000000000000000 0000000000000000 0000000000104000 ffffffffffffffff
16-19 0000001c8128124b 0000000000000000 000000007fb31878 0000000048c20008
20-23 00000000a627ffe8 c0000000e0380004 000000ff0000ff08 8000000000000000
24-27 0000000000380000 0000000000380000 00000000ffffffff 00000000ffffffff
28-31 00000000ffffffff 00000000ffffffff 000000008fb70000 0000000010480000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
<Press any key to continue (q to quit)>
IIA Space = 0x0000000000000000
IIA Offset = 0x000000007fb3187c
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0030103b
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x000000fffa380004
System Requestor Address = 0xfffffffffffa0000
Floating-Point Registers 0 - 31
00-03 0000001f00000000 0000000000000000 0000000000000000 0000000000000000
04-07 00003d091037f038 000000788fb70000 00000000001c9c38 000000000800000f
08-11 00000000103ebd20 00000000ffffffff 00000000103ebd20 0000000010474e88
12-15 0000000010374cf8 0000000000000001 000000001016c2c8 000000000804000e
16-19 000000001037f038 0000000000000002 0000000000000002 000000001040084c
20-23 000000000804000e 00000000000000c1 0000000000000000 0000000000000000
24-27 834e0b5f0800000f 00000000103ebd20 00000000103ebd20 0000000000000000
28-31 0000000000000802 0000000000000000 0000000010158288 0000000000000002
<Press any key to continue (q to quit)>
'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:
Check Summary = 0xcb81045028000000
Available Memory = 0x0000000080000000
CPU Diagnose Register 2 = 0x0203000000802004
CPU Status Register 0 = 0x2420c20000000000
CPU Status Register 1 = 0x8002000000000000
SADD LOG = 0xaf115ebd36f73fff
Read Short LOG = 0xc1a0f0fffa380004
ERROR_STATUS = 0x0000000000500050
MEM_ADDR = 0x000001ff3fffffff
MEM_SYND = 0x0000000000000000
MEM_ADDR_CORR = 0x000000100000442f
MEM_SYND_CORR = 0x8c008c0000008c00
RUN_DATA_HIGH = 0xc1bff0fffed08040
RUN_DATA_LOW = 0xc1bff0fffed08040
RUN_CTRL = 0x0000021c00001418
RUN_ADDR = 0xc1bff0fffed08040
System Responder Path = 0x00ffffff0a060200
HPMC PIM Analysis Information:
Timestamp =
Sat Feb 2 18:48:51 GMT 2002 (20:02:02:02:18:48:51)
'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:
A Data I/O Fetch Timeout occurred while CPU 0 was
requesting information from a device at the path 10/6/2/0 (PCI slot 2).
Memory/IO Controller Error Analysis Information:
There were multiple correctable memory errors. See 'Memory Error Log Info'.
<Press any key to continue (q to quit)>
----------------- Processor 0 LPMC Information ------------------
Check Type = 0x00000000
I/D Cache Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000000000000
CPU State = 0x00000000
<Press any key to continue (q to quit)>
Memory Error Log Information:
Timestamp =
Sat Feb 2 18:48:51 GMT 2002 (20:02:02:02:18:48:51)
'9000/785 B,C,J Workstation Memory Error Log', rev 0, 64 bytes:
This log displays the contents of memory specific registers when the
HPMC occurred. If there are multiple memory errors, the order they are
listed is not indicative of the order they occurred.
Trans Addr
Memory Error Type(s) OV MID ID par CP DIMM Runway Address
-------------------- -- --- ----- ---- -- ------- -------------------
1) Correctable Mem 1 0x0 0x10 na na 01 0x 0000110bc0
Syndrome
------------------
1) 0x8c008c0000008c00
<Press any key to continue (q to quit)>
I/O Module Error Log Information:
Timestamp =
Sat Feb 2 18:48:51 GMT 2002 (20:02:02:02:18:48:51)
'9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:
Rope Word1 Word2 Word3
------ ------------ ------------
0 0x00000000 0x0e0cc009 0x00000000fed30048
1 0x00000000 0x1e0cc009 0x00000000fed32048
2 ---------- 0x2e0cc009 ------------------
3 ---------- 0x3e0cc009 ------------------
4 0x00000000 0x4e0cc009 0x00000000fed38048
5 ---------- 0x5e0cc009 ------------------
6 0x0000e000 0x6e0cc009 0x00000000fa38003c
7 ---------- 0x7e0cc009 ------------------
Service Menu: Enter command >
* Re: [parisc-linux] PIM after c3k crash w/ VisEG PCI card
2002-02-02 19:06 [parisc-linux] PIM after c3k crash w/ VisEG PCI card Helge Deller
@ 2002-02-03 6:57 ` Grant Grundler
0 siblings, 0 replies; 2+ messages in thread
From: Grant Grundler @ 2002-02-03 6:57 UTC
To: Helge Deller; +Cc: parisc-linux
Helge Deller wrote:
> Hi all,
>
> the attached file shows the PIM of a 64bit kernel after my
> machine crashed while trying to initialize the STI with a
> VisEG PCI card in PCI slot 2. So it's the same problem
> as with a 32bit kernel.
>
> Hopefully this log is useful to someone who can help me
> debug this problem.
I can try.
notes/thoughts mixed in.
Much of the original text deleted.
> Timestamp =
> Sat Feb 2 18:48:51 GMT 2002 (20:02:02:02:18:48:51)
You should verify the timestamp actually matches the incident.
(This looks ok)
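The timestamp check can be done mechanically. This is a minimal sketch, assuming the parenthesized form is century:year:month:day:hour:minute:second. That field order is a guess from this one sample, where month and day are both 02. The helper names are made up for illustration.

```python
from datetime import datetime


def parse_pim_stamp(stamp):
    # Assumed field order: century:year:month:day:hour:minute:second
    cc, yy, mo, dd, hh, mi, ss = (int(field) for field in stamp.split(":"))
    return datetime(cc * 100 + yy, mo, dd, hh, mi, ss)


def parse_pim_text(text):
    # Example input: "Sat Feb 2 18:48:51 GMT 2002". The zone name is dropped
    # before parsing, so the result is a naive datetime.
    parts = [p for p in text.split() if p != "GMT"]
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


stamp = parse_pim_stamp("20:02:02:02:18:48:51")
text = parse_pim_text("Sat Feb 2 18:48:51 GMT 2002")
print(stamp.isoformat(), "match" if stamp == text else "MISMATCH")
```

If the two forms ever disagree, or neither is close to the time of the crash, the PIM is probably left over from an earlier incident.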
> HPMC Chassis Codes = 2cbf0 2500b 27821 2cbf4 2cbfc
Normally these are useful - if you have the magic decoder
for them. I don't know what the first digit "2" means.
The cbf0/500b/7821/cbf4/cbfc look familiar.
Here's what I *think* these mean based on some *really old* notes:
cbf0: HPMC
500b: Bus Timeout
7821: 782x == Mem Correctable Err, 1 == DIMM 1
This seems to match the "clear" text that was printed later.
Sounds like an orthogonal problem. Perhaps swap DIMM 1 with
one of the other DIMMS?
cbf4: invalid OS HPMC checksum - page zero OS entry ptr was invalid
cbfc: couldn't call OS HPMC handler
So fixing this would probably help get more info to console
when it dies. Perhaps we try to setup the console before
enabling the OS HPMC handler?
But no console means no output unless EARLY_BOOTUP_DEBUG
is defined in pdc_cons.c.
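As a rough illustration of the decoding above, here is a small sketch that applies the same table to the chassis code line. The meanings are copied from the notes in this mail and are not verified against any firmware documentation. The leading digit is simply masked off because its meaning is unknown.

```python
# Meanings taken from the notes above; treat them as unverified.
KNOWN_CODES = {
    0xCBF0: "HPMC",
    0x500B: "Bus Timeout",
    0xCBF4: "invalid OS HPMC checksum",
    0xCBFC: "couldn't call OS HPMC handler",
}


def decode_chassis(code):
    low = int(code, 16) & 0xFFFF  # drop the leading digit (meaning unknown)
    if low & 0xFFF0 == 0x7820:
        # 782x == Mem Correctable Err, x == DIMM number
        return "Mem Correctable Err, DIMM %d" % (low & 0xF)
    return KNOWN_CODES.get(low, "unknown")


codes = "2cbf0 2500b 27821 2cbf4 2cbfc".split()
decoded = [decode_chassis(code) for code in codes]
for code, meaning in zip(codes, decoded):
    print(code, meaning)
```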
> General Registers 0 - 31
> 00-03 0000000000000000 ffffffffffffffff 00000000001072a0 00000000004c5240
I'm going to guess GR02 is a realmode address (matching
virtual addr would be 101072a0).
Or perhaps a "double" HPMC occurred?
First one happened in STI code and then the OS HPMC handler
tripped again when it tried to output?
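The realmode guess for GR02 above is just an offset calculation. This sketch assumes the kernel's virtual mapping starts at 0x10000000, which is inferred from the 1072a0 -> 101072a0 pair mentioned above. The constant name is made up for illustration.

```python
# Assumption: kernel virtual addresses are physical addresses plus this base.
KERNEL_VIRT_BASE = 0x10000000


def real_to_virt(addr):
    return addr + KERNEL_VIRT_BASE


gr02 = 0x00000000001072A0
virt = real_to_virt(gr02)
print("GR02 real %x -> virtual %x" % (gr02, virt))
```

The resulting virtual address can then be looked up in System.map to see which function it falls in.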
> IIA Space = 0x0000000000000000
> IIA Offset = 0x000000007fb3187c
Is this where STI gets loaded?
Looks like an awfully high address.
Artifact of no OS HPMC handler?
I'd hope STI would work the same on all boxes.
> Check Type = 0x20000000
> CPU State = 0x9e000004
> Cache Check = 0x00000000
> TLB Check = 0x00000000
> Bus Check = 0x0030103b
> Assists Check = 0x00000000
> Assist State = 0x00000000
> Path Info = 0x00000000
> System Responder Address = 0x000000fffa380004
Address CPU was trying to read.
> System Requestor Address = 0xfffffffffffa0000
HPA of CPU that timed out.
...
> '9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:
>
> Check Summary = 0xcb81045028000000
> Available Memory = 0x0000000080000000
> CPU Diagnose Register 2 = 0x0203000000802004
> CPU Status Register 0 = 0x2420c20000000000
> CPU Status Register 1 = 0x8002000000000000
> SADD LOG = 0xaf115ebd36f73fff
> Read Short LOG = 0xc1a0f0fffa380004
> ERROR_STATUS = 0x0000000000500050
> MEM_ADDR = 0x000001ff3fffffff
> MEM_SYND = 0x0000000000000000
> MEM_ADDR_CORR = 0x000000100000442f
> MEM_SYND_CORR = 0x8c008c0000008c00
> RUN_DATA_HIGH = 0xc1bff0fffed08040
> RUN_DATA_LOW = 0xc1bff0fffed08040
> RUN_CTRL = 0x0000021c00001418
> RUN_ADDR = 0xc1bff0fffed08040
> System Responder Path = 0x00ffffff0a060200
Much is actually interesting - but I think Read Short LOG
was the address of the most recent sub-cacheline read.
Not sure if this is only IO.
...
> A Data I/O Fetch Timeout occurred while CPU 0 was
> requesting information from a device at the path 10/6/2/0 (PCI slot 2).
Typical of two scenarios:
o device wasn't initialized/enabled
(ie PCI CMD Bus Master and/or MMIO Enable bits not set)
o Some Bridge chip between CPU and PCI Device was already Fatal
(eg DMA to invalid address will cause Astro/U2 to go fatal
because of unresolved IO TLB fault)
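For the first scenario, the check comes down to two bits in the PCI command register (config space offset 0x04): bit 1 enables memory space decoding and bit 2 enables bus mastering. A minimal sketch follows. The function name is made up and the sample register values are hypothetical, not read from this card.

```python
PCI_COMMAND_MEMORY = 0x0002  # bit 1: respond to memory space (MMIO) accesses
PCI_COMMAND_MASTER = 0x0004  # bit 2: device may act as bus master


def missing_enables(command):
    """Return the names of the enable bits that are clear in `command`."""
    missing = []
    if not command & PCI_COMMAND_MEMORY:
        missing.append("MMIO Enable")
    if not command & PCI_COMMAND_MASTER:
        missing.append("Bus Master")
    return missing


# Hypothetical values: an untouched device and a fully enabled one.
print(missing_enables(0x0000))
print(missing_enables(0x0006))
```

If the MMIO enable bit is clear, a CPU read of the card's memory BAR gets no response, which would be consistent with the fetch timeout reported here.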
> Memory/IO Controller Error Analysis Information:
>
> There were multiple correctable memory errors. See 'Memory Error Log Info'.
I'm wondering if this is related. Do these happen without the VisEG
card enabled too?
You can "ser clearpim", boot, build a kernel or something, reboot
and check PIM info again.
> ----------------- Processor 0 LPMC Information ------------------
FWIW, typically LPMC is for correctable memory errors.
I believe the OS gets notified of these since it may choose
to evacuate the memory page that's getting those.
> This log displays the contents of memory specific registers when the
> HPMC occurred. If there are multiple memory errors, the order they are
> listed is not indicative of the order they occurred.
>
> Trans Addr
> Memory Error Type(s) OV MID ID par CP DIMM Runway Address
> -------------------- -- --- ----- ---- -- ------- -------------------
> 1) Correctable Mem 1 0x0 0x10 na na 01 0x 0000110bc0
hmmm...is 00110bc0 a kernel address?
that's not far off from GR02.
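To put a number on "not far off": this is a quick calculation using the two values from the dump, reading the Runway address as 0x00110bc0 as above.

```python
gr02 = 0x001072A0      # GR02 from the HPMC register dump
mem_err = 0x00110BC0   # Runway address from the memory error log

delta = mem_err - gr02
print("distance: 0x%x (%d bytes)" % (delta, delta))
```

Both values sit in the low megabytes of physical memory, so if GR02 really is a realmode kernel text address, the correctable error may well be in kernel text or data too.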
> '9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:
>
> Rope Word1 Word2 Word3
> ------ ------------ ------------
> 0 0x00000000 0x0e0cc009 0x00000000fed30048
> 1 0x00000000 0x1e0cc009 0x00000000fed32048
> 2 ---------- 0x2e0cc009 ------------------
> 3 ---------- 0x3e0cc009 ------------------
> 4 0x00000000 0x4e0cc009 0x00000000fed38048
> 5 ---------- 0x5e0cc009 ------------------
> 6 0x0000e000 0x6e0cc009 0x00000000fa38003c
> 7 ---------- 0x7e0cc009 ------------------
Rope 6 went fatal (0xe). Forgot what word3 is - offending address?
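One way to poke at the Word3 question is to parse the table and compare against the System Responder Address from the HPMC section. This sketch assumes a nonzero Word1 marks the rope that logged an error, and that Word3 is comparable to the low 32 bits of the responder address. Both are guesses, and the names are made up.

```python
IO_LOG = """\
0 0x00000000 0x0e0cc009 0x00000000fed30048
1 0x00000000 0x1e0cc009 0x00000000fed32048
2 ---------- 0x2e0cc009 ------------------
3 ---------- 0x3e0cc009 ------------------
4 0x00000000 0x4e0cc009 0x00000000fed38048
5 ---------- 0x5e0cc009 ------------------
6 0x0000e000 0x6e0cc009 0x00000000fa38003c
7 ---------- 0x7e0cc009 ------------------
"""


def as_int(field):
    # Fields printed as dashes carry no value.
    return None if set(field) == {"-"} else int(field, 16)


def parse_io_log(log):
    rows = []
    for line in log.splitlines():
        rope, w1, w2, w3 = line.split()
        rows.append((int(rope), as_int(w1), as_int(w2), as_int(w3)))
    return rows


# Assumption: a nonzero Word1 marks a rope that logged an error.
suspects = [(rope, w3) for rope, w1, _w2, w3 in parse_io_log(IO_LOG) if w1]

responder = 0x000000FFFA380004  # System Responder Address from the HPMC dump
same_window = all((responder & 0xFFFFFFFF) >> 16 == w3 >> 16
                  for _rope, w3 in suspects)
print(suspects, same_window)
```

Rope 6's Word3 falls in the same 64 KB window as the responder address, which hints that Word3 is an address in the failing device's range, though that is not proof.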
hth,
grant