* [parisc-linux] 2.4.18 SMP instability
@ 2002-05-26 0:48 Robert Stanford
2002-05-26 6:09 ` Grant Grundler
0 siblings, 1 reply; 23+ messages in thread
From: Robert Stanford @ 2002-05-26 0:48 UTC (permalink / raw)
To: HP900 PARISC mailing list
Regarding the post below, have the SMP issues been worked out on 2.4.18
yet? I'm running 2.4.18-25 and the machine seems to lock up whenever I try
to use apt with an SMP kernel.
apt-get(3766): unaligned access to 0x403ce094 at ip=0x4005e47f
That said, while doing some benchmarking I was able to run make -j 3 vmlinux
on a 2.4.18-25 SMP kernel with no problems.
Robert Stanford
*cut*
--------------------------------------------------------------------
From: Matthew Wilcox (willy@debian.org)
Date: Thu Apr 11 2002 - 14:40:12 MDT
On Thu, Apr 11, 2002 at 03:16:25PM -0400, D'Ausilio, John wrote:
> Is the 2.4.18 which comes down from the archive as recent as the ones in the
> FTP server? I'm going to boot back into the original kernel and try getting
> the latest from the FTP server .. if that doesn't work I guess I'll get the
> sources and build from CVS. Any other hints/clues/suggestions? Should I just
> run single proc for now?
Yes, we've also found 2.4.18 to be unstable SMP. I believe Grant has a
handle on this problem now, so expect it to be fixed quite soon.
-------------------------------------------------------------------------
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-26 0:48 Robert Stanford
@ 2002-05-26 6:09 ` Grant Grundler
2002-05-26 7:29 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-26 6:09 UTC (permalink / raw)
To: Robert Stanford; +Cc: parisc-linux
Robert Stanford wrote:
> Regarding the below post, have the SMP issues been worked out on 2.4.18
> yet? I'm running 2.4.18-25 and the machine seems to lock whenever I try
> to use apt with an smp kernel.
uhm...I see that I'm using UP kernels on my boxes right now.
I'll rebuild SMP and retest.
I did just find an SMP problem in the current EIEM handling.
Can't say if this is really causing any problems right now though.
Stop reading now if you don't know (or don't want to know) about EIEM.
If enable_irq or disable_irq gets called from a CPU other than
the one the device driver is supposed to interrupt, it will set the
EIEM bit in only *that* (the wrong) CPU. The result is the interrupt
will remain masked on the target CPU. I think the solution
is to use a global "eiem_val" (set/clear bits here) to match
the global EIRR switch table. I've thought about moving to a
per-CPU EIEM/EIRR switch table. But that's more work than I
have time for right now and would have a similar problem.
For now, we just need to update EIEM on all CPUs whenever the
eiem_val global changes.
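For readers following the EIEM discussion, here is a minimal userspace sketch of the proposed fix: one global eiem_val, pushed to every CPU whenever it changes. Only the name eiem_val comes from the message above. Everything else is hypothetical: the cpu_eiem array stands in for each CPU's real EIEM register, sync_eiem_all_cpus() stands in for whatever cross-CPU mechanism (an IPI, for example) the kernel would use, and locking is left out entirely.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 2

/* Hypothetical stand-in for the per-CPU EIEM registers. */
unsigned long cpu_eiem[SKETCH_NR_CPUS];

/* Single global mask, kept alongside the global EIRR switch table. */
unsigned long eiem_val;

/* Stand-in for "make every CPU reload its EIEM from eiem_val".
 * A real kernel would need an IPI or similar, plus locking. */
void sync_eiem_all_cpus(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        cpu_eiem[cpu] = eiem_val;
}

/* Unmask an interrupt: set its bit globally, then update all CPUs. */
void sketch_enable_irq(unsigned long mask)
{
    assert(mask != 0);
    eiem_val |= mask;
    sync_eiem_all_cpus();
}

/* Mask an interrupt: clear its bit globally, then update all CPUs. */
void sketch_disable_irq(unsigned long mask)
{
    assert(mask != 0);
    eiem_val &= ~mask;
    sync_eiem_all_cpus();
}
```

With this arrangement it no longer matters which CPU calls enable or disable: a call such as sketch_enable_irq(0x20UL) leaves the bit set in both simulated registers, so the target CPU can never be left masked by accident.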
We do NOT currently distribute interrupts.
I did write a patch to distribute IO interrupts:
ftp://ftp.parisc-linux.org/patches/irq_distr.diff
This diff can't be applied until the EIEM issue is fixed.
I suspect we don't (usually) have a problem with EIEM since all
interrupts are going to CPU 0 (aka Monarch) and nearly all driver
initialization takes place before the system is multithreaded.
The only other possibility is processes are only running on CPU 0.
ie when loading a device driver later, it always gets initialized on the
monarch. This scenario would also match the "top" output where
a 2-way system is always 50% idle and a 4-way is 75% idle.
I'd like to learn some way of seeing which CPU is running which
processes. top doesn't seem to indicate that. I'll look at sysstat
package later.
grant
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-26 6:09 ` Grant Grundler
@ 2002-05-26 7:29 ` Jeremy Drake
2002-05-26 20:23 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-26 7:29 UTC (permalink / raw)
To: parisc-linux
On Sun, 26 May 2002, Grant Grundler wrote:
> Robert Stanford wrote:
> > Regarding the below post, have the SMP issues been worked out on 2.4.18
> > yet? I'm running 2.4.18-25 and the machine seems to lock whenever I try
> > to use apt with an smp kernel.
In my playing w/ a J5000, the SMP kernel locks up when loading samba.
With samba disabled, the box boots, but eventually crashes for some
mysterious reason (I never tracked it down, just said "oh well" and went
back to UP).
...
> The only other possibility is processes are only running on CPU 0.
> ie when loading a device driver later, it always gets initialized on the
> monarch. This scenario would also match the "top" output where
> a 2-way system is always 50% idle and a 4-way is 75% idle.
>
> I'd like to learn some way of seeing which CPU is running which
> processes. top doesn't seem to indicate that. I'll look at sysstat
> package later.
I tried all sorts of things to find out which CPU stuff is running on. Top
is no help, /proc/stat shows all but a tiny amount of time on CPU1 (?),
and /proc/(pid)/cpu tends to agree. It's been a little while since I
tried SMP and looked at this stuff. I forget exactly what
/proc/(pid)/cpu said. I haven't booted an SMP kernel for about a week.
Probably should do a cvs update and rebuild, see what happens. If samba
crashes that thing again, I think I'll scream :) I just went back to the
logs to see if anything useful was there. There wasn't. Just standard
boot stuff, then it stops for about a day (I tend to screw with the box
by remote on off-hours), then starts again. I'll try again w/ latest
kernel and report what happens.
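One more place to look for "which CPU is this process on" is the processor field of /proc/<pid>/stat. The sketch below assumes the mainline layout, in which field 39 holds the CPU the task last ran on (present since Linux 2.2.8). Whether the parisc 2.4.18 tree fills it in sensibly on SMP is not confirmed anywhere in this thread, and stat_last_cpu() is a made-up helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the "processor" field (field 39) from one /proc/<pid>/stat line,
 * or -1 if the line is too short or malformed. */
int stat_last_cpu(const char *stat_line)
{
    assert(stat_line != NULL);

    /* The command name (field 2) is wrapped in parentheses and may itself
     * contain spaces or ')', so resume parsing after the LAST ')'. */
    const char *p = strrchr(stat_line, ')');
    if (p == NULL)
        return -1;
    p++;

    /* The first token after ')' is field 3 (state). */
    for (int field = 3; ; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;              /* ran out of fields */
        if (field == 39) {
            char *end;
            long cpu = strtol(p, &end, 10);
            return (end == p) ? -1 : (int)cpu;
        }
        while (*p != ' ' && *p != '\0' && *p != '\n')
            p++;                    /* skip over this field */
    }
}
```

Reading /proc/<pid>/stat for each process of interest with fgets() and passing the line to this function would give a rough per-process CPU map, which is the information top was not showing.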
--
Kaufman's First Law of Party Physics:
Population density is inversely proportional
to the square of the distance from the keg.
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-26 7:29 ` Jeremy Drake
@ 2002-05-26 20:23 ` Jeremy Drake
2002-05-27 2:04 ` Grant Grundler
0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-26 20:23 UTC (permalink / raw)
To: parisc-linux
On Sun, 26 May 2002, Jeremy Drake wrote:
> On Sun, 26 May 2002, Grant Grundler wrote:
>
> > Robert Stanford wrote:
> > > Regarding the below post, have the SMP issues been worked out on 2.4.18
> > > > yet? I'm running 2.4.18-25 and the machine seems to lock whenever I try
> > > to use apt with an smp kernel.
> In my playing w/ a J5000, the SMP kernel locks up when loading samba.
2.4.18-pa26 does it too. Here's the bootup sequence. I put in the whole
thing in case anyone is interested. If not, just skip to the bottom :)
Next time I'll just include relevant pieces.
What is causing that error, and why does it only happen on SMP?
Now I have to find some time to go and power-cycle that box before I can
do any more testing. :(
Firmware Version 5.0
Duplex Console IO Dependent Code (IODC) revision 1
------------------------------------------------------------------------------
(c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------
Processor Speed State Coprocessor State I/D Cache
--------- -------- --------------------- ----------------- -------------
0 440 MHz Active Functional 512 kB/1 MB
1 440 MHz Idle Functional 512 kB/1 MB
Central Bus Speed: 120 MHz
Available memory: 536870912 bytes
Good memory required: 46678016 bytes
Primary boot path: FWSCSI.6.0
Alternate boot path: SCSI.6.0
Console path: GRAPHICS(7)
Keyboard path: USB
Processor is booting from first available device.
To discontinue, press any key within 10 seconds.
Boot terminated.
----- Main Menu -------------------------------------------------------------
Command Description
------- -----------
BOot [PRI|ALT|<path>] Boot from specified path
PAth [PRI|ALT|CON|KEY [<path>]] Display or modify a path
SEArch [DIsplay|[[IPL] [<path>]]] Search for boot devices
COnfiguration [<command>] Access Configuration menu/commands
INformation [<command>] Access Information menu/commands
SERvice [<command>] Access Service menu/commands
DIsplay Redisplay the current menu
HElp [<menu>|<command>] Display help for menu or command
RESET Restart the system
-----
Main Menu: Enter command > bo pri
Interact with IPL (Y, N, Q)?> y
Booting...
Boot IO Dependent Code (IODC) revision 0
HARD Booted.
palo ipl 1.0 root@palinux Mon Apr 1 10:02:53 MST 2002
Bad DOS magic in extended partition
Partition Start(MB) End(MB) Id Type
1 1 15 f0 Palo
2 16 78 83 ext2
4 79 34514 83 ext2
PALO(F0) partition contains:
0/vmlinux32 3366227 bytes @ 0x48000
Information: No console specified on kernel command line. This is normal.
PALO will choose the console currently used by firmware (serial).Current command line:
2/vmlinux root=/dev/sda4 HOME=/ console=ttyS0 TERM=vt102
0: 2/vmlinux
1: root=/dev/sda4
2: HOME=/
3: console=ttyS0
4: TERM=vt102
Edit which field?
(or 'b' to boot with this command line)? b
Command line for kernel: 'root=/dev/sda4 HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux'
Selected kernel: /vmlinux from partition 2
ELF32 executable
Entry 00100298 first 00100000 n 5
Segment 0 load 00100000 size 2322260 mediaptr 0x1000
Segment 1 load 00338000 size 840924 mediaptr 0x238000
Segment 2 load 00408000 size 8192 mediaptr 0x306000
Segment 3 load 00410000 size 32768 mediaptr 0x308000
Segment 4 load 00446258 size 102480 mediaptr 0x310258
Branching to kernel entry point 0x00100298. If this is the last
message you see, you may need to switch your console. This is
a common symptom -- search the FAQ and mailing list at parisc-linux.org
Linux version 2.4.18-pa26 (root@krakatoa) (gcc version 3.0.4) #1 SMP Sun May 26 00:35:59 PDT 2002
FP[0] enabled: Rev 1 Model 16
The 32-bit Kernel has started...
Determining PDC firmware type: System Map.
model 00005bd0 00000491 00000000 00000002 776c6453 100000f0 00000008 000000b2 000000b2
vers 00000201
CPUID vers 17 rev 5 (0x00000225)
capabilities 0x3
model 9000/785/J5000
Total Memory: 512 Mb
pagetable_init
On node 0 totalpages: 131072
zone(0): 131072 pages.
zone(1): 0 pages.
zone(2): 0 pages.
LCD display at f05d0008,f05d0000 registered
Kernel command line: root=/dev/sda4 HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux
Console: colour dummy device 160x64
Calibrating delay loop... 878.18 BogoMIPS
Memory: 507900k available
Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
Searching for devices...
Found devices:
1. Astro BC Runway Port (12) at 0xfed00000 [10], versions 0x582, 0x0, 0xb
2. Elroy PCI Bridge (13) at 0xfed30000 [10/0], versions 0x782, 0x0, 0xa
3. Elroy PCI Bridge (13) at 0xfed32000 [10/1], versions 0x782, 0x0, 0xa
4. Elroy PCI Bridge (13) at 0xfed38000 [10/4], versions 0x782, 0x0, 0xa
5. Elroy PCI Bridge (13) at 0xfed3c000 [10/6], versions 0x782, 0x0, 0xa
6. Forte W 2-way (0) at 0xfffa0000 [32], versions 0x5bd, 0x0, 0x4
7. Forte W 2-way (0) at 0xfffa2000 [34], versions 0x5bd, 0x0, 0x4
8. Memory (1) at 0xfed10200 [49], versions 0x88, 0x0, 0x9
CPU(s): 2 x PA8500 (PCX-W) at 440.000000 MHz
SBA found Astro 2.1 at 0xfed00000
lba version TR2.1 (0x2) found at 0xfed30000
lba version TR2.1 (0x2) found at 0xfed32000
lba version TR2.1 (0x2) found at 0xfed38000
lba version TR2.1 (0x2) found at 0xfed3c000
POSIX conformance testing by UNIFIX
FP[1] enabled: Rev 1 Model 16
SMP: Total 2 of 2 processors activated (1756.36 BogoMIPS noticed).
Waiting on wait_init_idle (map = 0x2)
All processors have done init_idle
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Soft power switch enabled, polling @ 0xf0400804.
SuperIO: Found NS87560 Legacy I/O device at 00:0e.1 (IRQ 64)
SuperIO: Serial port 1 at 0x3f8
SuperIO: Serial port 2 at 0x2f8
SuperIO: Parallel port at 0x378
SuperIO: Floppy controller at 0x3f0
SuperIO: ACPI at 0x7e0
SuperIO: USB regulator enabled
parport0: PC-style at 0x378, irq 101 [PCSPP(,...)]
Starting kswapd
Journalled Block Device driver loaded
STI GSC/PCI graphics driver version 0.9
STI PCI graphic ROM found at f7000000 (128 kB), fb at fb000000 (16 MB)
STI word mode ROM at f7000044, hpa at fb000000
STI id 35acda16-9a02587, conforms to spec rev. 8.0c
STI device: HPA4982A
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at port 0x03f8 (irq = 99) is a 16550A
ttyS01 at port 0x02f8 (irq = 100) is a 16550A
lp0: using parport0 (interrupt-driven).
Generic RTC Driver v1.02 05/27/1999 Sam Creasey (sammy@oh.verio.com)
block: 128 slots per queue, batch=32
RAMDISK driver initialized: 16 RAM disks of 6144K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NS87415: IDE controller on PCI bus 00 dev 70
NS87415: chipset revision 3
NS87415: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x0a00-0x0a07, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x0a08-0x0a0f, BIOS settings: hdc:pio, hdd:pio
hda: SONY CD-ROM CDU4821, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 103
hda: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12
loop: loaded (max 8 devices)
Linux Tulip driver version 0.9.15-pre9 (Nov 6, 2001)
tulip0: no phy info, aborting mtable build
tulip0: MII transceiver #1 config 1000 status 782d advertising 01e1.
eth0: Digital DS21143 Tulip rev 48 at 0x1000, 00:10:83:35:0D:63, IRQ 66.
SCSI subsystem driver Revision: 1.00
sym53c8xx: at PCI bus 0, device 15, function 0
sym53c8xx: 53c896 detected
sym53c8xx: at PCI bus 0, device 15, function 1
sym53c8xx: 53c896 detected
sym53c896-0: rev 0x4 on pci bus 0 device 15 function 0 irq 65
sym53c896-0: ID 7, Fast-20, Parity Checking
sym53c896-0: handling phase mismatch from SCRIPTS.
sym53c896-1: rev 0x4 on pci bus 0 device 15 function 1 irq 65
sym53c896-1: ID 7, Fast-40, Parity Checking
sym53c896-1: handling phase mismatch from SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
scsi1 : sym53c8xx-1.7.3c-20010512
Vendor: SEAGATE Model: ST336752LC Rev: 0002
Type: Direct-Access ANSI SCSI revision: 03
Attached scsi disk sda at scsi1, channel 0, id 6, lun 0
sym53c896-1-<6,*>: FAST-20 WIDE SCSI 40.0 MB/s (50.0 ns, offset 31)
SCSI device sda: 71687369 512-byte hdwr sectors (36704 MB)
Partition check:
sda: sda1 sda2 sda3 < sda5 > sda4
sticonsole_init: searching for STI ROMs
Console: switching to colour STI console 160x64
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
8regs : 1060.000 MB/sec
8regs_prefetch: 1060.000 MB/sec
32regs : 752.800 MB/sec
32regs_prefetch: 752.800 MB/sec
raid5: using function: 8regs_prefetch (1060.000 MB/sec)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 24Kbytes
TCP: Hash tables configured (established 4096 bind 8192)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 389k freed
INIT: version 2.84 booting
Activating swap.
Adding Swap: 497972k swap-space (priority -1)
Checking root file system...
fsck 1.27 (8-Mar-2002)
/dev/sda4: clean, 68142/4415040 files, 1437754/8815668 blocks
System time was Sun May 26 20:04:48 UTC 2002.
Setting the System Clock using the Hardware Clock as reference...
System Clock set. System local time is now Sun May 26 20:04:50 UTC 2002.
Calculating module dependencies... done.
Loading modules:
Checking all file systems...
fsck 1.27 (8-Mar-2002)
/dev/sda2: clean, 26/16064 files, 19944/64260 blocks
Setting kernel variables.
Loading the saved-state of the serial devices...
/dev/ttyS0 at 0x03f8 (irq = 99) is a 16550A
/dev/ttyS1 at 0x02f8 (irq = 100) is a 16550A
Mounting local filesystems...
/dev/sda2 on /boot type ext2 (rw)
Cleaning: /etc/network/ifstate.
Setting up IP spoofing protection: rp_filter.
Configuring network interfaces: done.
Starting portmap daemon: portmap.
Starting portmapper... Mounting remote filesystems...
Setting the System Clock using the Hardware Clock as reference...
eth0: Setting full-duplex based on MII#1 link partner capability of 41e1.
System Clock set. Local time: Sun May 26 13:04:57 PDT 2002
Running ntpdate to synchronize clock.
Cleaning: /tmp /var/lock /var/run.
Initializing random number generator... done.
INIT: Entering runlevel: 2
Starting system log daemon: syslogd.
Starting kernel log daemon: klogd.
Starting NFS common utilities: statd.
Starting mouse interface server: gpm.
Starting internet superserver: inetd.
Starting printer spooler: lpd.
Not starting NFS kernel daemon: No exports.
Starting mail transport agent: Postfix.
Starting Samba daemons: nmbd smbdsmbd(276): unaligned access to 0x4001a2b8 at ip=0x4012ea1f
--
He who is known as an early riser need not get up until noon.
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-26 20:23 ` Jeremy Drake
@ 2002-05-27 2:04 ` Grant Grundler
2002-05-27 6:17 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-27 2:04 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> 2.4.18-pa26 does it too. Here's the bootup sequence. I put in the whole
> thing in case anyone is interested. If not, just skip to the bottom :)
> Next time I'll just include relevant pieces.
It was ok to post the whole thing.
Did the machine "hang"? Can you provide "TOC" output?
(push TOC button on the back and then at PDC prompt "ser pim toc")
...
> Starting Samba daemons: nmbd smbdsmbd(276): unaligned access to 0x4001a2b8 at
> ip=0x4012ea1f
The "unaligned access" just tells us the app is touching data that
isn't aligned. That shouldn't cause a crash. Or at least if it does,
then it should crash the same way on a UP machine.
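To make "isn't aligned" concrete, here is a small sketch that reports how well an address is aligned. The helper names are invented for illustration. The kernel message does not say how wide the trapping access was, so the code only describes the addresses themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr, i.e. the widest access
 * for which this address is naturally aligned. */
uint32_t natural_alignment(uint32_t addr)
{
    assert(addr != 0);
    return addr & (~addr + 1u);     /* isolate the lowest set bit */
}

/* Non-zero if addr is a multiple of width (width must be a power of two). */
int is_aligned(uint32_t addr, uint32_t width)
{
    assert(width != 0 && (width & (width - 1u)) == 0);
    return (addr & (width - 1u)) == 0;
}
```

Applied to the addresses reported in this thread, natural_alignment(0x4001a2b8) is 8 (the smbd case) and natural_alignment(0x403ce094) is 4 (the apt-get case). If the trap really is an alignment trap, that suggests the offending accesses needed 16-byte and 8-byte alignment respectively, though that is an inference and not something the log states.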
I don't know a damn thing about samba. Is it multi-threaded or
anything special? Send out broadcast packets maybe?
thanks,
grant
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-27 2:04 ` Grant Grundler
@ 2002-05-27 6:17 ` Jeremy Drake
2002-05-27 12:04 ` Matthew Wilcox
2002-05-27 18:44 ` Jeremy Drake
0 siblings, 2 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-27 6:17 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Sun, 26 May 2002, Grant Grundler wrote:
> Jeremy Drake wrote:
> > 2.4.18-pa26 does it too. Here's the bootup sequence. I put in the whole
> > thing in case anyone is interested. If not, just skip to the bottom :)
> > Next time I'll just include relevant pieces.
>
> It was ok to post the whole thing.
> Did the machine "hang"? Can you provide "TOC" output?
> (push TOC button on the back and then at PDC prompt "ser pim toc")
It hung.
Could you tell me where exactly I can find this button on a J5000? Then I
can get it for you. I'll have physical access to the box all day
tomorrow.
>
> ...
> > Starting Samba daemons: nmbd smbdsmbd(276): unaligned access to 0x4001a2b8 at
> > ip=0x4012ea1f
>
> The "unaligned access" just tells us the app is touching data that
> isn't aligned. That shouldn't cause a crash. Or at least if it does,
> then it should crash the same way on a UP machine.
>
> I don't know a damn thing about samba. Is it multi-threaded or
> anything special? Send out broadcast packets maybe?
Probably multi-threaded, definitely broadcasts. It works w/o issues on
UP, but on SMP, samba stops it cold. With samba disabled, it seems to
work fine. I built a kernel on smp (make -j 2) with no issue, which is an
improvement over the last time I tried this...
>
> thanks,
> grant
>
--
Save the whales. Collect the whole set.
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-27 6:17 ` Jeremy Drake
@ 2002-05-27 12:04 ` Matthew Wilcox
2002-05-27 18:44 ` Jeremy Drake
1 sibling, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2002-05-27 12:04 UTC (permalink / raw)
To: Jeremy Drake; +Cc: Grant Grundler, parisc-linux
On Sun, May 26, 2002 at 11:17:33PM -0700, Jeremy Drake wrote:
> Could you tell me where exactly I can find this button on a J5000? Then I
> can get it for you. I'll have physical access to the box all day
> tomorrow.
Little blue button, on the back near the serial ports. It's recessed
a bit so you probably need to use a pen to push it.
--
Revolutions do not require corporate support.
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-27 6:17 ` Jeremy Drake
2002-05-27 12:04 ` Matthew Wilcox
@ 2002-05-27 18:44 ` Jeremy Drake
1 sibling, 0 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-27 18:44 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Sun, 26 May 2002, Jeremy Drake wrote:
> On Sun, 26 May 2002, Grant Grundler wrote:
>
> > Jeremy Drake wrote:
> > > 2.4.18-pa26 does it too. Here's the bootup sequence. I put in the whole
> > > thing in case anyone is interested. If not, just skip to the bottom :)
> > > Next time I'll just include relevant pieces.
> >
> > It was ok to post the whole thing.
> > Did the machine "hang"? Can you provide "TOC" output?
> > (push TOC button on the back and then at PDC prompt "ser pim toc")
The first time I tried today, it gave the unaligned error and said the
following in a loop on the LCD
HPMC initiated
multiple HPMCs
HPMC initiated
Runway broad err
bad OS HPMC cksm
OS HPMC br err
When I pressed the button on the back, it said
Runway broad err
and stopped. Had to pull the power cable -- the button wouldn't work
Here's the second time
/etc/init.d/samba start
Starting Samba daemons: nmbd smbd
[hung here]
Firmware Version 5.0
Duplex Console IO Dependent Code (IODC) revision 1
------------------------------------------------------------------------------
(c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------
Processor Speed State Coprocessor State I/D Cache
--------- -------- --------------------- ----------------- -------------
0 440 MHz Active Functional 512 kB/1 MB
1 440 MHz Idle Functional 512 kB/1 MB
Central Bus Speed: 120 MHz
Available memory: 536870912 bytes
Good memory required: 46678016 bytes
Primary boot path: FWSCSI.6.0
Alternate boot path: SCSI.6.0
Console path: SERIAL_1.9600.8.none
Keyboard path: PCI8.0.0
Processor is booting from first available device.
To discontinue, press any key within 10 seconds.
Boot terminated.
----- Main Menu -------------------------------------------------------------
Command Description
------- -----------
BOot [PRI|ALT|<path>] Boot from specified path
PAth [PRI|ALT|CON|KEY [<path>]] Display or modify a path
SEArch [DIsplay|[[IPL] [<path>]]] Search for boot devices
COnfiguration [<command>] Access Configuration menu/commands
INformation [<command>] Access Information menu/commands
SERvice [<command>] Access Service menu/commands
DIsplay Redisplay the current menu
HElp [<menu>|<command>] Display help for menu or command
RESET Restart the system
-----
Main Menu: Enter command > ser pim toc
PROCESSOR PIM INFORMATION
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000358cf0 000000004012e987 00000000faf010d0
04-07 00000000400190b0 0000000000000018 00000000401f4ab0 00000000400190b0
08-11 40000000400190b0 0000000000000000 00000000faf00350 00000000000ba144
12-15 000000000006f800 000000000006f800 0000000000000000 0000000000000000
16-19 0000000000000000 00000000000b2248 0000000000029494 00000000401f4ab0
20-23 0000000000000000 0000000000001f38 00000000faf010e8 0000000000000018
24-27 000000000000012c 000000000ca6b064 00000000400190a4 00000000000a0944
28-31 ffffffffffffffff 00000000000000ac 00000000faf01240 000000000006b937
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000258 0000000000000000 00000000000000c0 0000000000000018
12-15 0000000000000000 0000000000000000 0000000000106000 00000000ffffffff
16-19 0000001d9616a712 000000000000012c 000000004012ea1f 000000000f541298
20-23 000000000000012c 40000000400190b0 0000000000000000 00000000a8000000
24-27 0000000000366000 000000000ca41000 0000000000044021 00000000f0412000
28-31 0000000055555555 0000000055555555 000000001ca6c000 0000000010410000
Space Registers 0 - 7
00-03 00000000 0000012c 00000000 0000012c
04-07 0000012c 0000012c 0000012c 0000012c
IIA Space = 0x000000000000012c
IIA Offset = 0x000000004012ea23
CPU State = 0x9e000001
Main Menu: Enter command >
And the third:
/etc/init.d/samba start
Starting Samba daemons: nmbd smbd
Firmware Version 5.0
Duplex Console IO Dependent Code (IODC) revision 1
------------------------------------------------------------------------------
(c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------
Processor Speed State Coprocessor State I/D Cache
--------- -------- --------------------- ----------------- -------------
0 440 MHz Active Functional 512 kB/1 MB
1 440 MHz Idle Functional 512 kB/1 MB
Central Bus Speed: 120 MHz
Available memory: 536870912 bytes
Good memory required: 46678016 bytes
Primary boot path: FWSCSI.6.0
Alternate boot path: SCSI.6.0
Console path: SERIAL_1.9600.8.none
Keyboard path: PCI8.0.0
Processor is booting from first available device.
To discontinue, press any key within 10 seconds.
Boot terminated.
----- Main Menu -------------------------------------------------------------
Command Description
------- -----------
BOot [PRI|ALT|<path>] Boot from specified path
PAth [PRI|ALT|CON|KEY [<path>]] Display or modify a path
SEArch [DIsplay|[[IPL] [<path>]]] Search for boot devices
COnfiguration [<command>] Access Configuration menu/commands
INformation [<command>] Access Information menu/commands
SERvice [<command>] Access Service menu/commands
DIsplay Redisplay the current menu
HElp [<menu>|<command>] Display help for menu or command
RESET Restart the system
-----
Main Menu: Enter command > ser pim toc
PROCESSOR PIM INFORMATION
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000358cf0 000000004012e987 00000000faf010d0
04-07 00000000400190b0 0000000000000018 00000000401f4ab0 00000000400190b0
08-11 40000000400190b0 0000000000000000 00000000faf00350 00000000000ba144
12-15 000000000006f800 000000000006f800 0000000000000000 0000000000000000
16-19 0000000000000000 00000000000b2248 0000000000029494 00000000401f4ab0
20-23 0000000000000000 0000000000001f38 00000000faf010e8 0000000000000018
24-27 0000000000000130 000000000ca1c064 00000000400190a4 00000000000a0944
28-31 ffffffffffffffff 00000000000000ac 00000000faf01240 000000000006b937
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000260 0000000000000000 00000000000000c0 0000000000000018
12-15 0000000000000000 0000000000000000 0000000000106000 00000000ffffffff
16-19 000000e968d9578d 0000000000000130 000000004012ea1f 000000000f541298
20-23 0000000000000130 40000000400190b0 0000000000000000 00000000a8000000
24-27 0000000000366000 000000000ca1e000 0000000000044021 00000000f0412000
28-31 0000000055555555 0000000055555555 000000001ca18000 0000000010410000
Space Registers 0 - 7
00-03 00000000 00000130 00000000 00000130
04-07 00000130 00000130 00000130 00000130
IIA Space = 0x0000000000000130
IIA Offset = 0x000000004012ea23
CPU State = 0x9e000001
Main Menu: Enter command >
--
I am a man: nothing human is alien to me.
-- Publius Terentius Afer (Terence)
* Re: [parisc-linux] 2.4.18 SMP instability
[not found] <Pine.LNX.4.44.0205271438590.11012-200000@garibaldi.apptechsys.com>
@ 2002-05-28 17:07 ` Grant Grundler
2002-05-28 19:35 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-28 17:07 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> I'll try. BTW, the HPMC only happens sometimes. Most of the time it just
> hangs. But HPMC starts if I hit the button on the back and let it boot.
ok. This is an interesting symptom.
...
> General Registers 0 - 31
> 00-03 0000000000000000 0000000a44b3921e 0000000000019bf0 00000000f4004000
GR02 is the return pointer - but it's not a kernel address.
Possibly PDC or something else.
...
> IIA Space = 0x0000000000000000
> IIA Offset = 0x0000000000019bf8
IIA is the instruction pointer. Also not a valid kernel address.
It's possible we are getting a "double fault" and the first
one is overwriting the original HPMC.
> Check Type = 0x20000000
> CPU State = 0x9e000004
> Cache Check = 0x00000000
> TLB Check = 0x00000000
> Bus Check = 0x0030103b
> Assists Check = 0x00000000
> Assist State = 0x00000000
> Path Info = 0x00000000
> System Responder Address = 0x000000fff4004014
> System Requestor Address = 0xfffffffffffa0000
This is useful. The system *probably* died trying to access 0xf4004014.
I could try to look up CPU State but I'm out of time.
Here are the next steps:
1) figure out who is touching 0xf4004014.
I didn't see anything in the console output.
(http://lists.parisc-linux.org/pipermail/parisc-linux/2002-May/016342.html)
Can you look in /proc/iomem?
My C3000 has:
f4000000-f4ffffff : LBA PCI LMMIO
f4007000-f4007fff : usb-ohci
f4008000-f40083ff : tulip
2) figure out if the access is because of bad DMA killing the IOMMU
or just the chip not responding.
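Step 1 amounts to a range lookup over /proc/iomem. A minimal sketch, assuming the usual "start-end : name" line format shown in the C3000 excerpt above (iomem_find() is a hypothetical helper, not an existing tool):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan iomem-style lines ("start-end : name") and return the index of the
 * narrowest range containing addr, copying its name into owner.
 * Returns -1 (owner untouched) if no line matches. */
int iomem_find(const char *const *lines, size_t n, unsigned long addr,
               char *owner, size_t owner_len)
{
    int best = -1;
    unsigned long best_size = 0;

    assert(owner != NULL && owner_len > 0);

    for (size_t i = 0; i < n; i++) {
        unsigned long start, end;
        int name_off = 0;

        /* Leading blanks cover the indentation of nested entries. */
        if (sscanf(lines[i], " %lx-%lx : %n", &start, &end, &name_off) != 2
            || name_off == 0)
            continue;
        if (addr < start || addr > end)
            continue;

        /* Prefer the most specific (smallest) enclosing range. */
        if (best < 0 || end - start < best_size) {
            best = (int)i;
            best_size = end - start;
            snprintf(owner, owner_len, "%s", lines[i] + name_off);
            owner[strcspn(owner, "\n")] = '\0';  /* drop fgets() newline */
        }
    }
    return best;
}
```

Fed the three C3000 lines quoted above, 0xf4004014 resolves only to "LBA PCI LMMIO": it sits inside the PCI window but in neither the usb-ohci nor the tulip range. The J5000's own /proc/iomem would be needed to say which device, if any, owns that address there.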
It's remotely possible the latest commit I made will affect this problem.
Can you retry with -pa28 (or -pa29)?
grant
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-28 17:07 ` [parisc-linux] 2.4.18 SMP instability Grant Grundler
@ 2002-05-28 19:35 ` Jeremy Drake
2002-05-28 19:45 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-28 19:35 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Tue, 28 May 2002, Grant Grundler wrote:
> It's remotely possible the latest commit I made will affect this problem.
> Can you retry with -pa28 (or -pa29)?
Sure. No problem. I've been trying to keep the kernel as up-to-date as
possible...
>
> grant
>
> _______________________________________________
> parisc-linux mailing list
> parisc-linux@lists.parisc-linux.org
> http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
>
--
Adult, n.:
One old enough to know better.
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-28 19:35 ` Jeremy Drake
@ 2002-05-28 19:45 ` Jeremy Drake
2002-05-28 21:56 ` Jeremy Drake
2002-05-29 4:39 ` Grant Grundler
0 siblings, 2 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-28 19:45 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Tue, 28 May 2002, Jeremy Drake wrote:
> On Tue, 28 May 2002, Grant Grundler wrote:
>
> > It's remotely possible the latest commit I made will affect this problem.
> > Can you retry with -pa28 (or -pa29)?
> Sure. No problem. I've been trying to keep the kernel as up-to-date as
> possible...
OK, I was doing an apt-get update, and the damn thing died at Reading
Package Lists... 0%. I'll see what's up with it when I can, do you want
ser pim, ser pim toc, or just wait for a new kernel? (this sort of thing
happens a lot on smp, but this box is surprisingly stable on UP)
Some things that occurred to me about the hardware that may be influencing
this. The on-board USB on this box is broken (as in physically damaged).
I have a pci usb card in there for typing on the graphics console, and
when I installed it into the first slot recommended by the manual (2 I
think) the box did some HPMC stuff when it tried to do selftests. I moved
it to slot 8 and everything seems happy with it. Maybe something in the
smp code is aggravating these problems...
--
Tact in audacity is knowing how far you can go without going too far.
-- Jean Cocteau
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-28 19:45 ` Jeremy Drake
@ 2002-05-28 21:56 ` Jeremy Drake
2002-05-29 4:56 ` Grant Grundler
2002-05-29 4:39 ` Grant Grundler
1 sibling, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-28 21:56 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Tue, 28 May 2002, Jeremy Drake wrote:
> On Tue, 28 May 2002, Jeremy Drake wrote:
>
> > On Tue, 28 May 2002, Grant Grundler wrote:
> >
> > > It's remotely possible the latest commit I made will affect this problem.
> > > Can you retry with -pa28 (or -pa29)?
> > Sure. No problem. I've been trying to keep the kernel as up-to-date as
> > possible...
> OK, I was doing an apt-get update, and the damn thing died at Reading
> Package Lists... 0%. I'll see what's up with it when I can, do you want
> ser pim, ser pim toc, or just wait for a new kernel? (this sort of thing
> happens a lot on smp, but this box is surprisingly stable on UP)
>
The LCD has a network, the HDD and an unfilled heart on the screen -- not
changing.
The console says apt-get (668): unaligned access to 0x403ce08c at
ip=0x4005e4f7
The TOC button had no effect. Here's a ser pim from after I pulled the
power and restarted it. It doesn't look particularly helpful.
ser pim
PROCESSOR PIM INFORMATION
----------------- Processor 0 HPMC Information ------------------
No valid timestamp
HPMC Chassis Codes = 2cbf0
General Registers 0 - 31
00-03 0000000000000000 000000001035eee0 00000000101009dc 0000000000800327
04-07 000000000001efff 000000000006cd00 0000000010410000 00000000f0002f68
08-11 0000000000000000 0000000000000003 000000000004000e 00000000103a5178
12-15 0000000000000000 00000000ffffffff 0000000000000001 00000000f0400004
16-19 00000000f00008c4 00000000f000017c 00000000f0000174 0000000010408000
20-23 0000000000000000 00000000103382a0 00000000103597c4 0000000000000000
24-27 00000000103598a0 0000000000000032 0000000000000019 0000000010338010
28-31 0000000000000000 0000000000000010 0000000010408700 00000000103598a0
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000106 0000000000000000 00000000000000c0 000000000000001f
12-15 0000000000000000 0000000000000000 0000000000106000 00000000ffffffff
16-19 00001d631d9a90dc 0000000000000000 00000000101009e0 000000004a740028
20-23 0000000000000000 0000000000000000 000000000004ff0f 0000000000000000
24-27 0000000000366000 000000001f571000 0000000000044021 00000000f0412000
28-31 0000000055555555 0000000055555555 0000000010408000 0000000010410000
Space Registers 0 - 7
00-03 00000000 00000083 00000000 00000083
04-07 00000000 00000000 00000000 00000000
<Press any key to continue (q to quit)>
IIA Space = 0x0000000000000000
IIA Offset = 0x00000000101009e4
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0030000d
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0xfffffffffffa0000
System Requestor Address = 0xfffffffffffa2000
Floating-Point Registers 0 - 31
00-03 0000001f00000000 0000000000000000 0000000000000000 0000000000000000
04-07 2ff8e00000000001 000000011015fa8c 1036505000000000 00000001f0400004
08-11 1036505000000002 ffffffff0000000a 0000000100000000 1041fdd31035d020
12-15 ffffffff000000ff 103a4000101482f4 103a4000ffff99ef 1115070010110264
16-19 2ff8e00011150000 0000000000000002 000000001035d010 1035981010358810
20-23 1035901010359810 103598102ff8e000 cccccccd51eb874f 0000000333333334
24-27 b38cf9b100000450 5555555555555555 5555555555555555 5555555555555555
28-31 3031323334353637 383961621014859c 6768696a6b6c6d6e 6f70717273747576
<Press any key to continue (q to quit)>
'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:
Check Summary = 0xcb81841000000000
Available Memory = 0x0000000020000000
CPU Diagnose Register 2 = 0x0201000000000004
CPU Status Register 0 = 0x3440c24000000000
CPU Status Register 1 = 0x8000000000000000
SADD LOG = 0x4820000000000000
Read Short LOG = 0xc1a0f0f0f0400804
ERROR_STATUS = 0x0000000000100010
MEM_ADDR = 0x000001ff3fffffff
MEM_SYND = 0x0000000000000000
MEM_ADDR_CORR = 0x000001ff3fffffff
MEM_SYND_CORR = 0x0000000000000000
RUN_DATA_HIGH = 0xc1bff0fffed08040
RUN_DATA_LOW = 0xc1bff0fffed08040
RUN_CTRL = 0x0000021c00001418
RUN_ADDR = 0xc1bff0fffed08040
System Responder Path = 0x00ffffffffffffff
HPMC PIM Analysis Information:
No valid timestamp
Memory/IO Controller Error Analysis Information:
<Press any key to continue (q to quit)>
----------------- Processor 0 LPMC Information ------------------
Check Type = 0x00000000
I/D Cache Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000000000000
CPU State = 0x00000000
<Press any key to continue (q to quit)>
----------------- Processor 1 HPMC Information ------------------
No valid timestamp
HPMC Chassis Codes = No chassis codes logged
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
<Press any key to continue (q to quit)>
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000000000000
Check Type = 0x00000000
CPU State = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
Floating-Point Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<Press any key to continue (q to quit)>
Check Summary = 0x0000000000000000
Available Memory = 0x0000000000000000
CPU Diagnose Register 2 = 0x0000000000000000
CPU Status Register 0 = 0x0000000000000000
CPU Status Register 1 = 0x0000000000000000
SADD LOG = 0x0000000000000000
Read Short LOG = 0x0000000000000000
ERROR_STATUS = 0x0000000000000000
MEM_ADDR = 0x0000000000000000
MEM_SYND = 0x0000000000000000
MEM_ADDR_CORR = 0x0000000000000000
MEM_SYND_CORR = 0x0000000000000000
RUN_DATA_HIGH = 0x0000000000000000
RUN_DATA_LOW = 0x0000000000000000
RUN_CTRL = 0x0000000000000000
RUN_ADDR = 0x0000000000000000
System Responder Path = 0x0000000000000000
HPMC PIM Analysis Information:
No valid timestamp
Memory/IO Controller Error Analysis Information:
<Press any key to continue (q to quit)>
----------------- Processor 1 LPMC Information ------------------
Check Type = 0x00000000
I/D Cache Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
----------------- Processor 1 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000000
12-15 0000000000000000 0000000000000000 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000000000000 0000000000000000
20-23 0000000000000000 0000000000000000 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000000 0000000000000000 0000000000000000
28-31 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000000000000
CPU State = 0x00000000
<Press any key to continue (q to quit)>
Memory Error Log Information:
No valid timestamp
No memory errors logged
I/O Module Error Log Information:
No valid timestamp
No I/O module errors logged
Main Menu: Enter command >
Main Menu: Enter command >
--
I called my parents the other night, but I forgot about the time difference.
They're still living in the fifties.
-- Strange de Jim
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-28 19:45 ` Jeremy Drake
2002-05-28 21:56 ` Jeremy Drake
@ 2002-05-29 4:39 ` Grant Grundler
2002-05-29 6:26 ` Jeremy Drake
1 sibling, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-29 4:39 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> OK, I was doing an apt-get update, and the damn thing died at Reading
> Package Lists... 0%. I'll see what's up with it when I can, do you want
> ser pim, ser pim toc, or just wait for a new kernel?
If the box HPMC'd, I'd like the "ser pim".
> Some things that occurred to me about the hardware that may be influencing
> this. The on-board USB on this box is broken (as in physically damaged).
That's a good observation. Can you characterize how extensive the
physical damage is?
I don't recall anything on the previous console output that suggests
the USB interface driver isn't happy.
> I have a pci usb card in there for typing on the graphics console, and
> when I installed it into the first slot recommended by the manual (2 I
> think) the box did some HPMC stuff when it tried to do selftests. I moved
> it to slot 8 and everything seems happy with it. Maybe something in the
> smp code is aggravating these problems...
Possible. Which manual are you referring to?
The one that came with the USB card or some HP PARISC manual?
grant
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-28 21:56 ` Jeremy Drake
@ 2002-05-29 4:56 ` Grant Grundler
0 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2002-05-29 4:56 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> The LCD has a network, the HDD and an unfilled heart on the screen -- not
> changing.
Not an HPMC then.
> The console says apt-get (668): unaligned access to 0x403ce08c at
> ip=0x4005e4f7
That's odd - I've never seen that from apt-get.
> The TOC button had no effect. Here's a ser pim from after I pulled the
> power and restarted it. It doesn't look particularly helpful.
Again I didn't look up the arcane stuff.
GR02 and IAOQ were both pointing at cpu_idle()
CR23 was zero; no external interrupts pending
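For readers following along, here is a minimal sketch (Python, not part of
the original exchange) of pulling those registers out of the dump quoted
earlier in this thread. It assumes the usual PA-RISC numbering, where GR02 is
the return pointer, CR18 is the interruption instruction address offset queue
(IAOQ) and CR23 is the external interrupt request register.
parse_register_rows is a made-up helper name. Mapping the addresses to
cpu_idle() still needs the System.map of the running kernel, which is not in
this thread.

```python
import re

# Matches rows such as "00-03 <16 hex> <16 hex> <16 hex> <16 hex>".
# Space-register rows use 8-digit words and are deliberately skipped.
ROW = re.compile(r"^(\d{2})-(\d{2})((?:\s+[0-9a-fA-F]{16}){4})\s*$")


def parse_register_rows(lines):
    """Turn 'NN-NN w0 w1 w2 w3' rows into a {register_index: value} dict."""
    regs = {}
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue
        first = int(m.group(1))
        for offset, word in enumerate(m.group(3).split()):
            regs[first + offset] = int(word, 16)
    return regs


# Rows copied from the Processor 0 HPMC section of the first "ser pim".
general = parse_register_rows([
    "00-03 0000000000000000 000000001035eee0 00000000101009dc 0000000000800327",
])
control = parse_register_rows([
    "16-19 00001d631d9a90dc 0000000000000000 00000000101009e0 000000004a740028",
    "20-23 0000000000000000 0000000000000000 000000000004ff0f 0000000000000000",
])

print(f"GR02 (return pointer) = {general[2]:#x}")
print(f"CR18 (IAOQ)           = {control[18]:#x}")
print(f"CR23 (EIRR)           = {control[23]:#x}")
```

GR02 and CR18 land within a few bytes of each other, which fits both pointing
into the same function, and CR23 is zero.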
grant
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-29 4:39 ` Grant Grundler
@ 2002-05-29 6:26 ` Jeremy Drake
2002-05-29 6:35 ` Grant Grundler
0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-29 6:26 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Tue, 28 May 2002, Grant Grundler wrote:
> Jeremy Drake wrote:
> > Some things that occurred to me about the hardware that may be influencing
> > this. The on-board USB on this box is broken (as in physically damaged).
>
> That's a good observation. Can you characterize how extensive the
> physical damage is?
The plastic thing that the pins sit on was missing from one of the 2 usb
ports when the box arrived. I, not having noticed this, plugged a
keyboard into it. It worked, but after playing with serial consoles and
such it wouldn't go back. The pins were bent because the reinforcing
plastic was missing. I tried to bend the pins so that they didn't short
anything, and one broke off. Both onboard USB ports haven't worked since.
When booting, it said "initializing keyboard" and then "IODC error". I
put it on serial console and left it alone after that.
> I don't recall anything on the previous console output that suggests
> the USB interface driver isn't happy.
>
> > I have a pci usb card in there for typing on the graphics console, and
> > when I installed it into the first slot recommended by the manual (2 I
> > think) the box did some HPMC stuff when it tried to do selftests. I moved
> > it to slot 8 and everything seems happy with it. Maybe something in the
> > smp code is aggravating these problems...
>
> Possible. Which manual are you referring to?
> The one that came with the USB card or some HP PARISC manual?
The J5000 owners manual,
http://www.hp.com/workstations/support/documentation/manuals/user_guides/j_class/A5991-90000.pdf
near the top of page 54 it says "For non-graphics cards, insert them in
this order: Slot 2, then 8, 3, 5, and finally 6."
--
If a man has a strong faith he can indulge in the luxury of skepticism.
-- Friedrich Nietzsche
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-29 6:26 ` Jeremy Drake
@ 2002-05-29 6:35 ` Grant Grundler
2002-06-01 6:34 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-29 6:35 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> tried to bend the pins so that they didn't short
> anything, and one broke off. Both onboard USB ports haven't worked since.
Ok. So broken physically but not electrically.
Not sure that should cause any problems.
You run the risk now of being the only person using an add-on
USB card regularly for parisc.
> The J5000 owners manual,
> http://www.hp.com/workstations/support/documentation/manuals/user_guides/j_class/A5991-90000.pdf
> near the top of page 54 it says "For non-graphics cards, insert them in
> this order: Slot 2, then 8, 3, 5, and finally 6."
ok. Not sure why they offer that advice...but whatever.
grant
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
[not found] <20020527223132.661F54843@dsl2.external.hp.com>
@ 2002-05-29 18:56 ` Jeremy Drake
0 siblings, 0 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-29 18:56 UTC (permalink / raw)
To: parisc-linux
OK, I just tried pa30. It boots successfully, but just died doing apt-get
update:
Fetched 554B in 10s (53B/s)
apt-get(320): unaligned access to 0x403ce094 at ip=0x4005e47f
Reading Package
And there it stopped. I don't know what it's doing but I'll see what kind
of info I can get from it.
These "unaligned access" messages only show up when running smp. apt-get
works perfectly with a UP kernel...
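As an aside for later readers, the two numbers in these messages can be
picked apart a little. The sketch below (Python, added for illustration)
assumes that the reported ip is the raw PA-RISC instruction address offset,
whose low two bits carry the privilege level, and describe_unaligned is a
hypothetical helper name. It also reports the largest power-of-two alignment,
up to 8 bytes, that the data address satisfies.

```python
def describe_unaligned(addr, ip):
    """Split an 'unaligned access to ADDR at ip=IP' pair into its parts."""
    insn = ip & ~0x3   # instruction address with the privilege bits cleared
    priv = ip & 0x3    # assumed: low two bits hold the privilege level
    align = 8
    while addr % align:
        align //= 2    # fall back to 4, 2, then 1
    return insn, priv, align


# The two address/ip pairs reported in this thread.
for addr, ip in [(0x403ce094, 0x4005e47f), (0x403ce08c, 0x4005e4f7)]:
    insn, priv, align = describe_unaligned(addr, ip)
    print(f"addr={addr:#x} is {align}-byte aligned; "
          f"insn={insn:#x}, privilege level {priv}")
```

Both addresses come out 4-byte aligned but not 8-byte aligned, and both ips
carry privilege level 3 (user mode). That would be consistent with an 8-byte
access from userspace, though the thread itself does not confirm which
instruction is involved.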
I should set up a webcam pointing at the LCD screen of that box, so I can
look at it remotely, to know if it HPMC'd or just locked up...
--
He missed an invaluable opportunity to hold his tongue.
-- Andrew Lang
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-05-29 6:35 ` Grant Grundler
@ 2002-06-01 6:34 ` Jeremy Drake
2002-06-02 16:32 ` Grant Grundler
0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-06-01 6:34 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Wed, 29 May 2002, Grant Grundler wrote:
> Jeremy Drake wrote:
> > tried to bend the pins so that they didn't short
> > anything, and one broke off. Both onboard USB ports haven't worked since.
>
> Ok. So broken physically but not electrically.
> Not sure that should cause any problems.
I pulled the drive and plugged it into an identical (but not broken)
J5000. The box HPMC'd when doing an apt-get update. Since this box was
on graphics console, I got a large amount of hex numbers spewed to the
screen. So, at least we can rule out any damage to the box as the cause
of this.
--
Here I am, fifty-eight, and I still don't know what I want to be when
I grow up.
-- Peter Drucker
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-06-01 6:34 ` Jeremy Drake
@ 2002-06-02 16:32 ` Grant Grundler
2002-06-02 19:48 ` Jeremy Drake
0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-06-02 16:32 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> So, at least we can rule out any damage to the box as the cause of this.
Yup - thanks for trying that.
Offhand, here are the differences I'm aware of between J5k and c3k:
o 2-CPU vs 1
o cache is 4-way associative vs 1-way (Same PA8500 CPU though!)
o J5K requires newer rev CPU (some SMP-related bugs fixed)
o Same PDC, but probably initializes a few things differently
o Though IO subsystem is identical chip set, J5k has more PCI busses
and more slots.
Since dirty cache writeback can be sensitive to how busy the system is,
it's possible the HPMC is caused by a similar problem to what we saw
on PA8700 systems. You might try building a kernel with
"ioc_needs_fdc" forced true in arch/parisc/kernel/sba_iommu.c.
If it avoids the HPMC (but we still see other hangs), then it's
a clue we don't have caching working right for that CPU setup.
hth,
grant
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-06-02 16:32 ` Grant Grundler
@ 2002-06-02 19:48 ` Jeremy Drake
2002-06-03 3:28 ` Grant Grundler
2002-06-03 21:58 ` Jeremy Drake
0 siblings, 2 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-06-02 19:48 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Sun, 2 Jun 2002, Grant Grundler wrote:
> If it avoids the HPMC (but we still see other hangs), then it's
> a clue we don't have caching working right for that CPU setup.
What "other hangs" would I expect to see? This thing hangs periodically
anyway on SMP, and doesn't consistently give the HPMC. Sometimes it just
hangs. I'm building with the change you mentioned now -- we'll see what
happens...
OK. No HPMC, but a new and interesting message. The serial console
hangs, as always.
Fetched 2696kB in 26s (103kB/s)
apt-get(263): unaligned access to 0x403ce08c at ip=0x4005e4f7
Reading Package
But, the LCD screen has a new message for me:
INI 3001: SYS BD
PDH control init
If you think it would help, I could pay the box a visit today and get
whatever "ser pim" or "ser pim toc" I can...
--
Novinson's Revolutionary Discovery:
When comes the revolution, things will be different --
not better, just different.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-06-02 19:48 ` Jeremy Drake
@ 2002-06-03 3:28 ` Grant Grundler
2002-06-03 21:58 ` Jeremy Drake
1 sibling, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2002-06-03 3:28 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> What "other hangs" would I expect to see? This thing hangs periodically
> anyway on SMP, and doesn't consistently give the HPMC.
> Sometimes it just hangs.
Well, we don't really know anything about the hangs you've been seeing or if
it's the same problem each time. I'm comfortable the SMP kernel works
with apt-get since I'm not able to reproduce the problem with either of
the two SMP machines I have (PA8500 and PA8700).
> I'm building with the change you mentioned now -- we'll see what
> happens...
>
> OK. No HPMC, but a new and interesting message. The serial console
> hangs, as always.
...
> But, the LCD screen has a new message for me:
>
> INI 3001: SYS BD
> PDH control init
hmmm...that might just be an intermediate state for HPMC.
This seems to be part of the reset sequence.
> If you think it would help, I could pay the box a visit today and get
> whatever "ser pim" or "ser pim toc" I can...
nah...get "ser pim" tomorrow. But I may not be able to look at it until
the end of the week. Try repeating with the kluged kernel a few times and
see if it now always gets the same symptom (ie no HPMC). If it doesn't HPMC,
most likely you need to run UP kernels until someone who understands
cache handling and J5000 a bit can look at it.
grant
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [parisc-linux] 2.4.18 SMP instability
2002-06-02 19:48 ` Jeremy Drake
2002-06-03 3:28 ` Grant Grundler
@ 2002-06-03 21:58 ` Jeremy Drake
2002-06-05 21:24 ` Grant Grundler
1 sibling, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-06-03 21:58 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
On Sun, 2 Jun 2002, Jeremy Drake wrote:
> On Sun, 2 Jun 2002, Grant Grundler wrote:
>
> > If it avoids the HPMC (but we still see other hangs), then it's
> > a clue we don't have caching working right for that CPU setup.
> OK. No HPMC, but a new and interesting message. The serial console
> hangs, as always.
>
> Fetched 2696kB in 26s (103kB/s)
> apt-get(263): unaligned access to 0x403ce08c at ip=0x4005e4f7
> Reading Package
>
> But, the LCD screen has a new message for me:
>
> INI 3001: SYS BD
> PDH control init
>
> If you think it would help, I could pay the box a visit today and get
> whatever "ser pim" or "ser pim toc" I can...
Here it is... BTW, maybe you could explain how to interpret these, so I
don't have to send you all of this...
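One rough way to shrink these dumps before posting is to keep only the
"Name = 0x..." fields that are nonzero, grouped by processor section. The
sketch below (Python, added for illustration) assumes the layout seen in the
dumps in this thread. nonzero_fields is a made-up helper name, and it ignores
the register tables entirely.

```python
import re

SECTION = re.compile(r"^-+\s*(Processor \d+ \w+ Information)\s*-+$")
FIELD = re.compile(r"^\s*([A-Za-z0-9/_ ]+?)\s*=\s*0x([0-9a-fA-F]+)\s*$")


def nonzero_fields(text):
    """Return {section: {field: value}} for every nonzero 'Name = 0x..' line."""
    summary = {}
    section = None
    for line in text.splitlines():
        s = SECTION.match(line.strip())
        if s:
            section = s.group(1)
            summary[section] = {}
            continue
        f = FIELD.match(line)
        if f and section is not None:
            value = int(f.group(2), 16)
            if value:
                summary[section][f.group(1)] = value
    return summary


# A few lines in the shape of the dump that follows.
sample = """\
----------------- Processor 0 HPMC Information ------------------
Check Type = 0x20000000
Cache Check = 0x00000000
Bus Check = 0x0030103b
System Responder Address = 0x000000fff4004014
----------------- Processor 0 LPMC Information ------------------
Check Type = 0x00000000
"""

for name, fields in nonzero_fields(sample).items():
    for field, value in fields.items():
        print(f"{name}: {field} = {value:#x}")
```

A section that ends up with no fields, such as the LPMC one here, logged
nothing of interest in these summary lines.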
ser pim
PROCESSOR PIM INFORMATION
----------------- Processor 0 HPMC Information ------------------
Timestamp =
Tue May 28 23:38:36 GMT 2002 (20:02:05:28:23:38:36)
HPMC Chassis Codes = 2cbf0 2500b 2cbf1 2cbfc
General Registers 0 - 31
00-03 0000000000000000 000000095bf6dde5 0000000000019bf0 00000000f4004000
04-07 0000000000001d58 0000000000002710 ffffffffffffffce 0000000000002000
08-11 0000000044657266 fffffffff4004000 000000000000000a fffffff0f0000834
12-15 0000000000000000 ffffffffffffffff 0000000000000001 fffffff0f0400004
16-19 fffffff0f00008c4 fffffff0f000017c fffffff0f0000174 00000000000019fc
20-23 00000000f4004014 00000000000001f4 0000000000019bf0 ffffffffffffffff
24-27 ffffffffffffffff 0000000000000000 000000fa00000000 fffffff0f0412000
28-31 0000000000035b60 ffffffffffffffff 0000000000001e90 0000000000002710
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000004 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 0000000000000000 0000000000000006
12-15 0000000000000000 0000000000000000 000000f0f0003800 0000000000000000
16-19 000000095d2ccf91 0000000000000000 0000000000019bf4 000000000e80103d
20-23 00000000a607ffd0 c000000001004014 000000ff0000ff08 8800000000000000
24-27 0000000055555555 0000000055555555 0000000000041020 00000000f0412000
28-31 0000000055555555 0000000055555555 00000000f04088d8 0000000000000020
Space Registers 0 - 7
00-03 00000000 c9af9dd0 00000000 00000000
04-07 00000000 00000000 00000000 00000000
<Press any key to continue (q to quit)>
IIA Space = 0x0000000000000000
IIA Offset = 0x0000000000019bf8
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0030103b
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x000000fff4004014
System Requestor Address = 0xfffffffffffa0000
Floating-Point Registers 0 - 31
00-03 0000001f00000000 0000000000000000 0000000000000000 0000000000000000
04-07 2ffa200000000001 000000011015fa8c 1036505000000000 00000001f0400004
08-11 1036505000000002 ffffffff0000000a 0000000100000000 1041fdd31035d020
12-15 ffffffff000000ff 103a4000101482f4 103a4000ffff99ef 1115070010110264
16-19 2ffa200011150000 0000000000000002 000000001035d010 1035981010358810
20-23 1035901010359810 103598102ffa2000 1115000000000000 0000000200000000
24-27 5555555555555555 5555555555555555 5555555555555555 5555555555555555
28-31 3031323334353637 383961621014859c 6768696a6b6c6d6e 6f70717273747576
<Press any key to continue (q to quit)>
'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:
Check Summary = 0xc381141008000000
Available Memory = 0x0000000020000000
CPU Diagnose Register 2 = 0x02010000ac802000
CPU Status Register 0 = 0x2040000000000000
CPU Status Register 1 = 0x8002000000000000
SADD LOG = 0x0221fd0050210df0
Read Short LOG = 0xc18080fff4004014
ERROR_STATUS = 0x0000000000100010
MEM_ADDR = 0x000001ff3fffffff
MEM_SYND = 0x0000000000000000
MEM_ADDR_CORR = 0x000001ff3fffffff
MEM_SYND_CORR = 0x0000000000000000
RUN_DATA_HIGH = 0x37dd3fa153c23ee1
RUN_DATA_LOW = 0xe840d00037de3f01
RUN_CTRL = 0x0000021c00001418
RUN_ADDR = 0xc13ff0f0f003ce50
System Responder Path = 0x00ffffff0a000f01
HPMC PIM Analysis Information:
Timestamp =
Tue May 28 23:38:36 GMT 2002 (20:02:05:28:23:38:36)
'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:
A Data I/O Fetch Timeout occurred while CPU 0 was
requesting information from a device at the path 10/0/15/1 (built-in PCI device).
Memory/IO Controller Error Analysis Information:
The Memory/IO Controller only observed the Broadcast Error. It did not log
any additional information about the HPMC.
<Press any key to continue (q to quit)>
----------------- Processor 0 LPMC Information ------------------
Check Type = 0x00000000
I/D Cache Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
----------------- Processor 0 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 0000000040000000 000000004000dc93 00000000faf00800
04-07 0000000040000000 0000000000000008 0000000040026758 0000000000000000
08-11 0000000000000000 00000000faf00798 0000000000000000 0000000000000000
12-15 00000000faf00890 0000000040026612 00000000faf00300 0000000000000000
16-19 0000000040000000 0000000010408000 0000000000000000 0000000040000000
20-23 00000000faf0089f 00000000faf006a0 000000001031a8b0 00000000faf00798
24-27 0000000000000008 0000000011150408 000000000000000f 000000001015fbb4
28-31 0000000000028000 0000000011150380 0000000011150640 000000004000f923
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000002 0000000000000000 00000000000000c0 0000000000000010
12-15 0000000000000000 0000000000000000 0000000000106000 00000000ff800000
16-19 000000110d4ad99d 0000000000000000 00000000101076c0 000000002f301221
20-23 0000000010340004 0000000054150408 000000000004000e 0000000000000000
24-27 0000000000366000 00000000003bb000 0000000000044021 00000000f0412000
28-31 0000000055555555 0000000055555555 0000000011150000 0000000010410000
Space Registers 0 - 7
00-03 00000001 00000001 00000000 00000001
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x00000000101076c4
CPU State = 0x9e000001
<Press any key to continue (q to quit)>
----------------- Processor 1 HPMC Information ------------------
Timestamp =
Sun Jun 2 19:40:32 GMT 2002 (20:02:06:02:19:40:32)
HPMC Chassis Codes = 2cbf0 2510b 2cbf4 2cbfc
General Registers 0 - 31
00-03 0000000000000000 fffffff0f009d000 fffffff0f0068d78 0000000000000000
04-07 7f00000000000000 feffffffffffffff 000000000031b6f8 0000000000000008
08-11 fffffffffed30300 fffffffffed22200 0100000000000000 000000000002cb90
12-15 00000000000f4000 000000000000c800 fffffffffed40000 fffffffffed22210
16-19 4000000000000000 0000000000000002 00000000f000016c fffffffffee003f9
20-23 fffffffffee003fb 0000000000000087 fffffffffee003f8 5871000000000000
24-27 7f00000000000000 fffffff0f0071eb8 fffffffffee003fa fffffff0f0412000
28-31 0000000000000000 fffffffffee003fb 000000000031b7d8 fffffffffee00000
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 000000000000010c 0000000000000000 00000000000000c0 0000000000000039
12-15 0000000000000000 0000000000000000 0000000000106000 00000000ff000000
16-19 000000124c7c7456 000000003ffffff0 fffffff0f0037354 000000000e80103a
20-23 00000000ae07fffb c0000000802003fb 0000000008000108 0000000080000000
24-27 0000000000336000 000000001f7e3000 0000000000044021 00000000f0412000
28-31 0000000055555555 0000000055555555 00000000100dc000 0000000011111111
Space Registers 0 - 7
00-03 00000000 00000086 00000000 00000086
04-07 00000000 00000000 00000000 00000000
<Press any key to continue (q to quit)>
IIA Space = 0x000000003ffffff0
IIA Offset = 0xfffffff0f0037358
Check Type = 0x20000000
CPU State = 0x9e000004
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x0030103b
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x000000fffee003fb
System Requestor Address = 0xfffffffffffa2000
Floating-Point Registers 0 - 31
00-03 0000001f00000000 0000000000000000 0000000000000000 0000000000000000
04-07 2ffe200000000001 000000011015fa58 1033505000000000 00000001f0400004
08-11 1033505000000002 ffffffff0000000a 000000010000003f 103dfdd300000040
12-15 00000000103caf14 103caf4010148768 00000000ffff9b5f 100d470000000000
16-19 2ffe2000100d4000 0000000000000002 000000001032d010 1032981010328810
20-23 1032901010329810 103298102ffe2000 cccccccd51eb874f 0000000333333334
24-27 b38cf9b100000450 5555555555555555 5555555555555555 5555555555555555
28-31 3031323334353637 3839616210148a10 6768696a6b6c6d6e 6f70717273747576
<Press any key to continue (q to quit)>
'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:
Check Summary = 0xcb81041008000000
Available Memory = 0x0000000020000000
CPU Diagnose Register 2 = 0x0201010000000004
CPU Status Register 0 = 0x2440c24000000000
CPU Status Register 1 = 0x800a000000000000
SADD LOG = 0xc11ff0f0f0002b50
Read Short LOG = 0xc18100fffee003fb
ERROR_STATUS = 0x0000000000100010
MEM_ADDR = 0x000001ff3fffffff
MEM_SYND = 0x0000000000000000
MEM_ADDR_CORR = 0x000001ff3fffffff
MEM_SYND_CORR = 0x0000000000000000
RUN_DATA_HIGH = 0xe840c002000014bc
RUN_DATA_LOW = 0x379c00680f9a20dc
RUN_CTRL = 0x0000005c00001658
RUN_ADDR = 0xc13ff0f0f0002b50
System Responder Path = 0x00ffff0a000e0101
HPMC PIM Analysis Information:
Timestamp =
Sun Jun 2 19:40:32 GMT 2002 (20:02:06:02:19:40:32)
'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:
An Instruction I/O Fetch and Data I/O Fetch Timeout occurred while CPU 1 was
requesting information from a device at the path 10/0/14/1/1 (built-in PCI device).
Memory/IO Controller Error Analysis Information:
The Memory/IO Controller only observed the Broadcast Error. It did not log
any additional information about the HPMC.
<Press any key to continue (q to quit)>
----------------- Processor 1 LPMC Information ------------------
Check Type = 0x00000000
I/D Cache Parity Info = 0x00000000
Cache Check = 0x00000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
----------------- Processor 1 TOC Information -------------------
General Registers 0 - 31
00-03 0000000000000000 000000001035eee0 00000000101009dc 0000000000000000
04-07 0000000000366000 00000000f0400008 00000000000000fa 00000000f0002f68
08-11 0000000000000000 0000000000000000 000000000004000e 00000000103a7464
12-15 00000000000000f2 0000000000000001 0000000000000001 00000000000000f3
16-19 0000000002020202 0000000000000002 00000000f000016c 0000000011158000
20-23 0000000000000000 00000000103382b0 00000000103597c4 0000000000000000
24-27 00000000103598a0 0000000000000032 0000000000000019 0000000010338010
28-31 0000000000000000 0000000000000010 00000000111586c0 00000000103598a0
<Press any key to continue (q to quit)>
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000000000 0000000000000000 00000000000000c0 000000000000001e
12-15 0000000000000000 0000000000000000 0000000000106000 00000000ff800000
16-19 0000001107f8d7df 0000000000000000 00000000101009dc 0000000003c008b3
20-23 0000000000000000 0000000000000000 000000000004ff0f 0000000000000000
24-27 0000000000366000 0000000000366000 0000000000044021 00000000f0412000
28-31 0000000055555555 0000000055555555 0000000011158000 0000000011111111
Space Registers 0 - 7
00-03 00000000 00000000 00000000 00000000
04-07 00000000 00000000 00000000 00000000
IIA Space = 0x0000000000000000
IIA Offset = 0x00000000101009e0
CPU State = 0x9e000001
<Press any key to continue (q to quit)>
Memory Error Log Information:
Timestamp =
Sun Jun 2 19:40:32 GMT 2002 (20:02:06:02:19:40:32)
'9000/785 B,C,J Workstation Memory Error Log', rev 0, 64 bytes:
No memory errors logged
I/O Module Error Log Information:
Timestamp =
Sun Jun 2 19:40:32 GMT 2002 (20:02:06:02:19:40:32)
'9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:
Rope  Word1       Word2       Word3
----  ----------  ----------  ------------------
0     0x0002e000  0x0e0cc009  0x00000000000007fc
1     0x00000000  0x1e0cc009  0x00000000fed32048
2     0x04000000  0x2e0cc009  0xffffffffffffffff
3     ----------  0x3e0cc009  ------------------
4     0x00000000  0x4e0cc009  0x00000000fed38048
5     ----------  0x5e0cc009  ------------------
6     0x00000000  0x6e0cc009  0x00000000fed3c048
7     ----------  0x7e0cc009  ------------------
Main Menu: Enter command >
Main Menu: Enter command >
Main Menu: Enter command >
>
>
> > > hth,
> > grant
> >
--
On ability:
A dwarf is small, even if he stands on a mountain top;
a colossus keeps his height, even if he stands in a well.
-- Lucius Annaeus Seneca, 4BC - 65AD
* Re: [parisc-linux] 2.4.18 SMP instability
2002-06-03 21:58 ` Jeremy Drake
@ 2002-06-05 21:24 ` Grant Grundler
0 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2002-06-05 21:24 UTC (permalink / raw)
To: Jeremy Drake; +Cc: parisc-linux
Jeremy Drake wrote:
> Here it is... BTW, maybe you could explain how to interpret these, so I
> don't have to send you all of this...
Generally, look at the IIA Offset and GR02 to see where it died.
If it's not a kernel address, start trying to figure out what it is.
Lots more magic in the PIM dump that I don't understand either.
In this HPMC dump, I don't know where 0x19bf0 is...
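As a rough illustration of the check described above, the sketch below pulls GR02 and the IIA Offset out of one processor's section of a PIM dump and tests whether they fall inside the kernel text. The helper names (`parse_pim_section`, `in_kernel_text`) and the text bounds are hypothetical; the real bounds would have to come from the `System.map` of the kernel that was running. It assumes the dump layout shown in this thread and masks the low two bits of each address, which on PA-RISC carry the privilege level rather than address bits.

```python
import re

def parse_pim_section(text):
    """Return (gr02, iia_offset) as ints from one processor's PIM section.

    Either value is None if the corresponding line is not found.
    """
    gr02 = None
    iia = None
    in_general = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("General Registers"):
            in_general = True
            continue
        if line.startswith(("Control Registers", "Space Registers",
                            "Floating-Point Registers")):
            in_general = False
        if in_general and line.startswith("00-03"):
            # Fields: "00-03", GR00, GR01, GR02, GR03
            gr02 = int(line.split()[3], 16)
        m = re.match(r"IIA Offset\s*=\s*0x([0-9a-fA-F]+)", line)
        if m:
            iia = int(m.group(1), 16)
    return gr02, iia

def in_kernel_text(addr, stext, etext):
    """True if addr (privilege bits masked off) lies in [stext, etext)."""
    return stext <= (addr & ~0x3) < etext

if __name__ == "__main__":
    # Values copied from the "Processor 1 TOC Information" section above.
    section = """General Registers 0 - 31
00-03 0000000000000000 000000001035eee0 00000000101009dc 0000000000000000
Control Registers 0 - 31
00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000
IIA Offset = 0x00000000101009e0
"""
    # Placeholder bounds: substitute _stext/_etext from your own System.map.
    STEXT, ETEXT = 0x10100000, 0x10300000
    gr02, iia = parse_pim_section(section)
    for name, value in (("GR02", gr02), ("IIA Offset", iia)):
        where = "kernel text" if in_kernel_text(value, STEXT, ETEXT) else "elsewhere"
        print(f"{name} = {value:#x} ({where})")
```

If both values land in kernel text, the next step would be to look them up in `System.map` to find the enclosing function; if not, the address is more likely firmware or user space.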
The firmware on the workstations tries to give a high level decoding
of the error:
> A Data I/O Fetch Timeout occurred while CPU 0 was
> requesting information from a device at the path 10/0/15/1 (built-in PCI device).
>
>
> Memory/IO Controller Error Analysis Information:
>
> The Memory/IO Controller only observed the Broadcast Error. It did not log
> any additional information about the HPMC.
This typically means something in the IO path didn't respond
to a CPU read.
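When comparing several dumps, it can help to extract just the CPU number and hardware path from the firmware's analysis text. The following is a minimal sketch, assuming the message wording shown in this thread; `parse_hpmc_analysis` is a hypothetical helper, not part of any existing tool, and it tolerates mail quote markers and line wrapping.

```python
import re

# Matches the wording the firmware analysis uses in the dumps in this thread.
_ANALYSIS = re.compile(
    r"while CPU (\d+) was\s+requesting information from a device "
    r"at the path\s+([0-9/]+)"
)

def parse_hpmc_analysis(text):
    """Return (cpu, path) from a firmware HPMC analysis message.

    cpu is an int and path is a list of ints, e.g. [10, 0, 15, 1].
    Returns None if the expected wording is not present.
    """
    # Drop leading "> " quote markers, then rejoin wrapped lines.
    lines = (re.sub(r"^\s*(>\s*)+", "", line) for line in text.splitlines())
    cleaned = " ".join(lines)
    m = _ANALYSIS.search(cleaned)
    if m is None:
        return None
    return int(m.group(1)), [int(part) for part in m.group(2).split("/")]

if __name__ == "__main__":
    quoted = (
        "> A Data I/O Fetch Timeout occurred while CPU 0 was\n"
        "> requesting information from a device at the path 10/0/15/1 "
        "(built-in PCI device).\n"
    )
    print(parse_hpmc_analysis(quoted))
```

The path that comes back identifies which device did not answer the read, which is the place to start looking in the driver or the hardware.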
grant
end of thread, other threads:[~2002-06-05 21:24 UTC | newest]
Thread overview: 23+ messages
[not found] <Pine.LNX.4.44.0205271438590.11012-200000@garibaldi.apptechsys.com>
2002-05-28 17:07 ` [parisc-linux] 2.4.18 SMP instability Grant Grundler
2002-05-28 19:35 ` Jeremy Drake
2002-05-28 19:45 ` Jeremy Drake
2002-05-28 21:56 ` Jeremy Drake
2002-05-29 4:56 ` Grant Grundler
2002-05-29 4:39 ` Grant Grundler
2002-05-29 6:26 ` Jeremy Drake
2002-05-29 6:35 ` Grant Grundler
2002-06-01 6:34 ` Jeremy Drake
2002-06-02 16:32 ` Grant Grundler
2002-06-02 19:48 ` Jeremy Drake
2002-06-03 3:28 ` Grant Grundler
2002-06-03 21:58 ` Jeremy Drake
2002-06-05 21:24 ` Grant Grundler
[not found] <20020527223132.661F54843@dsl2.external.hp.com>
2002-05-29 18:56 ` Jeremy Drake
2002-05-26 0:48 Robert Stanford
2002-05-26 6:09 ` Grant Grundler
2002-05-26 7:29 ` Jeremy Drake
2002-05-26 20:23 ` Jeremy Drake
2002-05-27 2:04 ` Grant Grundler
2002-05-27 6:17 ` Jeremy Drake
2002-05-27 12:04 ` Matthew Wilcox
2002-05-27 18:44 ` Jeremy Drake