* [parisc-linux] 2.4.18 SMP instability
@ 2002-05-26  0:48 Robert Stanford
  2002-05-26  6:09 ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Robert Stanford @ 2002-05-26  0:48 UTC
  To: HP900 PARISC mailing list

Regarding the post below, have the SMP issues been worked out on 2.4.18
yet? I'm running 2.4.18-25 and the machine seems to lock up whenever I try
to use apt with an SMP kernel.

apt-get(3766): unaligned access to 0x403ce094 at ip=0x4005e47f     

That said, while doing some benchmarking I was able to run make -j 3 vmlinux
on a 2.4.18-25 SMP kernel with no problems.


Robert Stanford


*cut*
--------------------------------------------------------------------
From: Matthew Wilcox (willy@debian.org)
Date: Thu Apr 11 2002 - 14:40:12 MDT

On Thu, Apr 11, 2002 at 03:16:25PM -0400, D'Ausilio, John wrote:
> Is the 2.4.18 which comes down from the archive as recent as the ones in the
> FTP server? I'm going to boot back into the original kernel and try getting
> the latest from the FTP server .. if that doesn't work I guess I'll get the
> sources and build from CVS. Any other hints/clues/suggestions? Should I just
> run single proc for now?

Yes, we've also found 2.4.18 to be unstable on SMP. I believe Grant has a
handle on this problem now, so expect it to be fixed quite soon.
-------------------------------------------------------------------------


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-26  0:48 [parisc-linux] 2.4.18 SMP instability Robert Stanford
@ 2002-05-26  6:09 ` Grant Grundler
  2002-05-26  7:29   ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-26  6:09 UTC
  To: Robert Stanford; +Cc: parisc-linux

Robert Stanford wrote:
> Regarding the post below, have the SMP issues been worked out on 2.4.18
> yet? I'm running 2.4.18-25 and the machine seems to lock up whenever I try
> to use apt with an SMP kernel.

uhm...I see that I'm using UP kernels on my boxes right now.
I'll rebuild SMP and retest.

I did just find an SMP problem in the current EIEM handling.
Can't say if this is really causing any problems right now though.
Stop reading now if you don't know (or don't want to know) about EIEM.

If enable_irq or disable_irq gets called from a CPU other than
the one the device driver is supposed to interrupt, it will set the
EIEM bit in only *that* (the wrong) CPU. The result is the interrupt
will remain masked on the target CPU. I think the solution
is to use a global "eiem_val" (set/clear bits here) to match
the global EIRR switch table.  I've thought about moving to a
per-CPU EIEM/EIRR switch table. But that's more work than I
have time for right now and would have a similar problem.
For now, we just need to update EIEM on all CPUs whenever the
eiem_val global changes.
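
To make the failure mode above concrete, here is a small user-space model in Python. It is not kernel code: it only assumes, as described above, that each CPU has its own EIEM register and that the proposed fix keeps one global "eiem_val" and copies it to every CPU. The names Cpu, GlobalEiem, enable_irq_local and demo are made up for this illustration, and the bit number is arbitrary.

```python
# User-space model of the per-CPU EIEM problem. Hypothetical names throughout;
# this sketches the idea only and is not the parisc-linux implementation.

class Cpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.eiem = 0  # per-CPU enable mask; 0 means every interrupt is masked


def enable_irq_local(calling_cpu, bit):
    """Buggy variant: only the CPU that happens to run the call is updated."""
    calling_cpu.eiem |= 1 << bit


class GlobalEiem:
    """Proposed variant: one global eiem_val, pushed to all CPUs on change."""

    def __init__(self, cpus):
        self.cpus = cpus
        self.eiem_val = 0

    def _broadcast(self):
        # A real kernel would have to make each CPU reload its own register;
        # the model simply writes every mask directly.
        for cpu in self.cpus:
            cpu.eiem = self.eiem_val

    def enable_irq(self, bit):
        self.eiem_val |= 1 << bit
        self._broadcast()

    def disable_irq(self, bit):
        self.eiem_val &= ~(1 << bit)
        self._broadcast()


def is_unmasked(cpu, bit):
    return bool(cpu.eiem & (1 << bit))


def demo(irq_bit=5):
    """Return (unmasked on CPU 0 with the bug, unmasked on CPU 0 with the fix)."""
    # The interrupt targets CPU 0, but the enable call runs on CPU 1.
    cpus = [Cpu(0), Cpu(1)]
    enable_irq_local(cpus[1], irq_bit)
    buggy = is_unmasked(cpus[0], irq_bit)

    cpus = [Cpu(0), Cpu(1)]
    GlobalEiem(cpus).enable_irq(irq_bit)
    fixed = is_unmasked(cpus[0], irq_bit)
    return buggy, fixed
```

In the buggy case the target CPU never sees the bit, which matches the "interrupt will remain masked on the target CPU" symptom; with the global value every CPU ends up with the same mask.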

We do NOT currently distribute interrupts.
I did write a patch to distribute IO interrupts:
	ftp://ftp.parisc-linux.org/patches/irq_distr.diff

This diff can't be applied until the EIEM issue is fixed.

I suspect we don't (usually) have a problem with EIEM since all
interrupts are going to CPU 0 (aka Monarch) and nearly all driver
initialization takes place before the system is multithreaded.
The only other possibility is processes are only running on CPU 0.
ie when loading a device driver later, it always gets initialized on the
monarch. This scenario would also match the "top" output where
a 2-way system is always 50% idle and a 4-way is 75% idle.
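
The idle figures above follow from simple arithmetic if only one CPU ever does work. A minimal Python sketch (the function name is made up for illustration):

```python
def expected_idle_percent(num_cpus, busy_cpus=1):
    """Idle percentage when busy_cpus of num_cpus are fully loaded."""
    if not 0 <= busy_cpus <= num_cpus:
        raise ValueError("busy_cpus must be between 0 and num_cpus")
    return 100.0 * (num_cpus - busy_cpus) / num_cpus


for n in (2, 4):
    print(f"{n}-way, one busy CPU: {expected_idle_percent(n):.0f}% idle")
```

This only shows that the numbers are consistent with the "everything runs on CPU 0" hypothesis; it does not rule out other explanations.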

I'd like to learn some way of seeing which CPU is running which
processes. top doesn't seem to indicate that. I'll look at sysstat
package later.
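
One possible way to see where a process last ran, sketched in Python: the proc(5) man page documents field 39 of /proc/<pid>/stat as "processor", the CPU number the task last executed on (present since Linux 2.2.8). Whether the 2.4.18-pa kernels fill this field sensibly on PA-RISC is an assumption that nothing in this thread confirms, and the function names are made up for illustration.

```python
def last_cpu_from_stat(stat_line):
    """Extract field 39 (processor) from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    _, sep, rest = stat_line.rpartition(")")
    if not sep:
        raise ValueError("no comm field found")
    fields = rest.split()
    # fields[0] is field 3 (state), so field 39 sits at index 36.
    if len(fields) <= 36:
        raise ValueError("stat line has no processor field")
    return int(fields[36])


def last_cpu_of_pid(pid):
    """Read /proc/<pid>/stat on a live Linux system."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())
```

Calling last_cpu_of_pid() for every numeric directory under /proc would give a rough per-CPU process map, with the caveat that the value is "last ran on", not "is running on right now".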

grant


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-26  6:09 ` Grant Grundler
@ 2002-05-26  7:29   ` Jeremy Drake
  2002-05-26 20:23     ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-26  7:29 UTC
  To: parisc-linux

On Sun, 26 May 2002, Grant Grundler wrote:

> Robert Stanford wrote:
> > Regarding the post below, have the SMP issues been worked out on 2.4.18
> > yet? I'm running 2.4.18-25 and the machine seems to lock up whenever I try
> > to use apt with an SMP kernel.
In my playing w/ a J5000, the SMP kernel locks up when loading samba.  
With samba disabled, the box boots, but eventually crashes for some 
mysterious reason (I never tracked it down, just said "oh well" and went 
back to UP).  

...
> The only other possibility is processes are only running on CPU 0.
> ie when loading a device driver later, it always gets initialized on the
> monarch. This scenario would also match the "top" output where
> a 2-way system is always 50% idle and a 4-way is 75% idle.
> 
> I'd like to learn some way of seeing which CPU is running which
> processes. top doesn't seem to indicate that. I'll look at sysstat
> package later.
I tried all sorts of things to try to find out what CPU stuff's on.  Top 
is no help, /proc/stat shows all but a tiny amount of time on CPU1 (?), 
and /proc/(pid)/cpu tends to agree.  It's been a little while since I 
tried SMP, and looked at this stuff.  I forget exactly what 
/proc/(pid)/cpu said.  I haven't booted an SMP kernel for about a week.  
Probably should do a cvs update and rebuild, see what happens.  If samba 
crashes that thing again, I think I'll scream :)  I just went back to the 
logs to see if anything useful was there.  There wasn't.  Just standard 
boot stuff, then it stops for about a day (I tend to screw with the box 
by remote on off-hours), then starts again.  I'll try again w/ latest 
kernel and report what happens.

-- 
Kaufman's First Law of Party Physics:
	Population density is inversely proportional
	to the square of the distance from the keg.


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-26  7:29   ` Jeremy Drake
@ 2002-05-26 20:23     ` Jeremy Drake
  2002-05-27  2:04       ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-26 20:23 UTC
  To: parisc-linux

On Sun, 26 May 2002, Jeremy Drake wrote:
> On Sun, 26 May 2002, Grant Grundler wrote:
> 
> > Robert Stanford wrote:
> > > Regarding the post below, have the SMP issues been worked out on 2.4.18
> > > yet? I'm running 2.4.18-25 and the machine seems to lock up whenever I try
> > > to use apt with an SMP kernel.
> In my playing w/ a J5000, the SMP kernel locks up when loading samba.  

2.4.18-pa26 does it too. Here's the bootup sequence.  I put in the whole
thing in case anyone is interested.  If not, just skip to the bottom :)  
Next time I'll just include relevant pieces.  

What is causing that error, and why does it only happen on SMP?

Now I have to find some time to go and power-cycle that box before I can
do any more testing. :(




Firmware Version 5.0

Duplex Console IO Dependent Code (IODC) revision 1

------------------------------------------------------------------------------
   (c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------

  Processor   Speed            State           Coprocessor State  I/D Cache 
  ---------  --------   ---------------------  -----------------  -------------
      0      440 MHz    Active                 Functional         512 kB/1 MB
      1      440 MHz    Idle                   Functional         512 kB/1 MB

  Central Bus Speed:                   120 MHz

  Available memory:              536870912 bytes
  Good memory required:           46678016 bytes

  Primary boot path:    FWSCSI.6.0
  Alternate boot path:  SCSI.6.0
  Console path:         GRAPHICS(7)
  Keyboard path:        USB

Processor is booting from first available device.

To discontinue, press any key within 10 seconds.

Boot terminated.


----- Main Menu -------------------------------------------------------------

      Command                           Description
      -------                           -----------
      BOot [PRI|ALT|<path>]             Boot from specified path
      PAth [PRI|ALT|CON|KEY [<path>]]   Display or modify a path
      SEArch [DIsplay|[[IPL] [<path>]]] Search for boot devices

      COnfiguration [<command>]         Access Configuration menu/commands
      INformation [<command>]           Access Information menu/commands
      SERvice [<command>]               Access Service menu/commands

      DIsplay                           Redisplay the current menu
      HElp [<menu>|<command>]           Display help for menu or command
      RESET                             Restart the system
-----
Main Menu: Enter command > bo pri
Interact with IPL (Y, N, Q)?> y

Booting... 
Boot IO Dependent Code (IODC) revision 0


HARD Booted.
palo ipl 1.0 root@palinux Mon Apr  1 10:02:53 MST 2002
Bad DOS magic in extended partition

Partition Start(MB) End(MB) Id Type
1               1      15   f0 Palo
2              16      78   83 ext2
4              79   34514   83 ext2

PALO(F0) partition contains:
    0/vmlinux32 3366227 bytes @ 0x48000

Information: No console specified on kernel command line. This is normal.
PALO will choose the console currently used by firmware (serial).Current command line:
2/vmlinux root=/dev/sda4 HOME=/ console=ttyS0 TERM=vt102
 0: 2/vmlinux
 1: root=/dev/sda4
 2: HOME=/
 3: console=ttyS0
 4: TERM=vt102

Edit which field?
(or 'b' to boot with this command line)? b

Command line for kernel: 'root=/dev/sda4 HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux'
Selected kernel: /vmlinux from partition 2
ELF32 executable
Entry 00100298 first 00100000 n 5
Segment 0 load 00100000 size 2322260 mediaptr 0x1000
Segment 1 load 00338000 size 840924 mediaptr 0x238000
Segment 2 load 00408000 size 8192 mediaptr 0x306000
Segment 3 load 00410000 size 32768 mediaptr 0x308000
Segment 4 load 00446258 size 102480 mediaptr 0x310258
Branching to kernel entry point 0x00100298.  If this is the last
message you see, you may need to switch your console.  This is
a common symptom -- search the FAQ and mailing list at parisc-linux.org

Linux version 2.4.18-pa26 (root@krakatoa) (gcc version 3.0.4) #1 SMP Sun May 26 00:35:59 PDT 2002
FP[0] enabled: Rev 1 Model 16
The 32-bit Kernel has started...
Determining PDC firmware type: System Map.
model 00005bd0 00000491 00000000 00000002 776c6453 100000f0 00000008 000000b2 000000b2
vers  00000201
CPUID vers 17 rev 5 (0x00000225)
capabilities 0x3
model 9000/785/J5000
Total Memory: 512 Mb
pagetable_init
On node 0 totalpages: 131072
zone(0): 131072 pages.
zone(1): 0 pages.
zone(2): 0 pages.
LCD display at f05d0008,f05d0000 registered
Kernel command line: root=/dev/sda4 HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux
Console: colour dummy device 160x64
Calibrating delay loop... 878.18 BogoMIPS
Memory: 507900k available
Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
Searching for devices...
Found devices:
1. Astro BC Runway Port (12) at 0xfed00000 [10], versions 0x582, 0x0, 0xb
2. Elroy PCI Bridge (13) at 0xfed30000 [10/0], versions 0x782, 0x0, 0xa
3. Elroy PCI Bridge (13) at 0xfed32000 [10/1], versions 0x782, 0x0, 0xa
4. Elroy PCI Bridge (13) at 0xfed38000 [10/4], versions 0x782, 0x0, 0xa
5. Elroy PCI Bridge (13) at 0xfed3c000 [10/6], versions 0x782, 0x0, 0xa
6. Forte W 2-way (0) at 0xfffa0000 [32], versions 0x5bd, 0x0, 0x4
7. Forte W 2-way (0) at 0xfffa2000 [34], versions 0x5bd, 0x0, 0x4
8. Memory (1) at 0xfed10200 [49], versions 0x88, 0x0, 0x9
CPU(s): 2 x PA8500 (PCX-W) at 440.000000 MHz
SBA found Astro 2.1 at 0xfed00000
lba version TR2.1 (0x2) found at 0xfed30000
lba version TR2.1 (0x2) found at 0xfed32000
lba version TR2.1 (0x2) found at 0xfed38000
lba version TR2.1 (0x2) found at 0xfed3c000
POSIX conformance testing by UNIFIX
FP[1] enabled: Rev 1 Model 16
SMP: Total 2 of 2 processors activated (1756.36 BogoMIPS noticed).
Waiting on wait_init_idle (map = 0x2)
All processors have done init_idle
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Soft power switch enabled, polling @ 0xf0400804.
SuperIO: Found NS87560 Legacy I/O device at 00:0e.1 (IRQ 64) 
SuperIO: Serial port 1 at 0x3f8
SuperIO: Serial port 2 at 0x2f8
SuperIO: Parallel port at 0x378
SuperIO: Floppy controller at 0x3f0
SuperIO: ACPI at 0x7e0
SuperIO: USB regulator enabled
parport0: PC-style at 0x378, irq 101 [PCSPP(,...)]
Starting kswapd
Journalled Block Device driver loaded
STI GSC/PCI graphics driver version 0.9
STI PCI graphic ROM found at f7000000 (128 kB), fb at fb000000 (16 MB)
STI word mode ROM at f7000044, hpa at fb000000
STI id 35acda16-9a02587, conforms to spec rev. 8.0c
STI device: HPA4982A
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at port 0x03f8 (irq = 99) is a 16550A
ttyS01 at port 0x02f8 (irq = 100) is a 16550A
lp0: using parport0 (interrupt-driven).
Generic RTC Driver v1.02 05/27/1999 Sam Creasey (sammy@oh.verio.com)
block: 128 slots per queue, batch=32
RAMDISK driver initialized: 16 RAM disks of 6144K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NS87415: IDE controller on PCI bus 00 dev 70
NS87415: chipset revision 3
NS87415: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x0a00-0x0a07, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0x0a08-0x0a0f, BIOS settings: hdc:pio, hdd:pio
hda: SONY CD-ROM CDU4821, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 103
hda: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12
loop: loaded (max 8 devices)
Linux Tulip driver version 0.9.15-pre9 (Nov 6, 2001)
tulip0: no phy info, aborting mtable build
tulip0:  MII transceiver #1 config 1000 status 782d advertising 01e1.
eth0: Digital DS21143 Tulip rev 48 at 0x1000, 00:10:83:35:0D:63, IRQ 66.
SCSI subsystem driver Revision: 1.00
sym53c8xx: at PCI bus 0, device 15, function 0
sym53c8xx: 53c896 detected 
sym53c8xx: at PCI bus 0, device 15, function 1
sym53c8xx: 53c896 detected 
sym53c896-0: rev 0x4 on pci bus 0 device 15 function 0 irq 65
sym53c896-0: ID 7, Fast-20, Parity Checking
sym53c896-0: handling phase mismatch from SCRIPTS.
sym53c896-1: rev 0x4 on pci bus 0 device 15 function 1 irq 65
sym53c896-1: ID 7, Fast-40, Parity Checking
sym53c896-1: handling phase mismatch from SCRIPTS.
scsi0 : sym53c8xx-1.7.3c-20010512
scsi1 : sym53c8xx-1.7.3c-20010512
  Vendor: SEAGATE   Model: ST336752LC        Rev: 0002
  Type:   Direct-Access                      ANSI SCSI revision: 03
Attached scsi disk sda at scsi1, channel 0, id 6, lun 0
sym53c896-1-<6,*>: FAST-20 WIDE SCSI 40.0 MB/s (50.0 ns, offset 31)
SCSI device sda: 71687369 512-byte hdwr sectors (36704 MB)
Partition check:
 sda: sda1 sda2 sda3 < sda5 > sda4
sticonsole_init: searching for STI ROMs
Console: switching to colour STI console 160x64
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   8regs     :  1060.000 MB/sec
   8regs_prefetch:  1060.000 MB/sec
   32regs    :   752.800 MB/sec
   32regs_prefetch:   752.800 MB/sec
raid5: using function: 8regs_prefetch (1060.000 MB/sec)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 24Kbytes
TCP: Hash tables configured (established 4096 bind 8192)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 389k freed
INIT: version 2.84 booting
Activating swap.
Adding Swap: 497972k swap-space (priority -1)
Checking root file system...
fsck 1.27 (8-Mar-2002)
/dev/sda4: clean, 68142/4415040 files, 1437754/8815668 blocks
System time was Sun May 26 20:04:48 UTC 2002.
Setting the System Clock using the Hardware Clock as reference...
System Clock set. System local time is now Sun May 26 20:04:50 UTC 2002.
Calculating module dependencies... done.
Loading modules: 
Checking all file systems...
fsck 1.27 (8-Mar-2002)
/dev/sda2: clean, 26/16064 files, 19944/64260 blocks
Setting kernel variables.
Loading the saved-state of the serial devices... 
/dev/ttyS0 at 0x03f8 (irq = 99) is a 16550A
/dev/ttyS1 at 0x02f8 (irq = 100) is a 16550A
Mounting local filesystems...
/dev/sda2 on /boot type ext2 (rw)
Cleaning: /etc/network/ifstate.
Setting up IP spoofing protection: rp_filter.
Configuring network interfaces: done.
Starting portmap daemon: portmap.
Starting portmapper... Mounting remote filesystems...

Setting the System Clock using the Hardware Clock as reference...
eth0: Setting full-duplex based on MII#1 link partner capability of 41e1.
System Clock set. Local time: Sun May 26 13:04:57 PDT 2002

Running ntpdate to synchronize clock.
Cleaning: /tmp /var/lock /var/run.
Initializing random number generator... done.
INIT: Entering runlevel: 2
Starting system log daemon: syslogd.
Starting kernel log daemon: klogd.
Starting NFS common utilities: statd.
Starting mouse interface server: gpm.
Starting internet superserver: inetd.
Starting printer spooler: lpd.
Not starting NFS kernel daemon: No exports.
Starting mail transport agent: Postfix.
Starting Samba daemons: nmbd smbdsmbd(276): unaligned access to 0x4001a2b8 at ip=0x4012ea1f


-- 
He who is known as an early riser need not get up until noon.


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-26 20:23     ` Jeremy Drake
@ 2002-05-27  2:04       ` Grant Grundler
  2002-05-27  6:17         ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-27  2:04 UTC
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> 2.4.18-pa26 does it too. Here's the bootup sequence.  I put in the whole
> thing in case anyone is interested.  If not, just skip to the bottom :)  
> Next time I'll just include relevant pieces.  

It was ok to post the whole thing.
Did the machine "hang"?  Can you provide "TOC" output?
(push TOC button on the back and then at PDC prompt "ser pim toc")

...
> Starting Samba daemons: nmbd smbdsmbd(276): unaligned access to 0x4001a2b8 at
>    ip=0x4012ea1f

The "unaligned access" just tells us the app is touching data that
isn't aligned. That shouldn't cause a crash. Or at least if it does,
then it should crash the same way on a UP machine.
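
As a quick sanity check on the two addresses reported in this thread, the Python sketch below lists which power-of-two alignments each one satisfies. The kernel message does not say how wide the access was, so this narrows things down without identifying the instruction; the helper name is made up for illustration.

```python
def satisfied_alignments(address, sizes=(2, 4, 8, 16)):
    """Return the sizes (in bytes) that address is a multiple of."""
    return [size for size in sizes if address % size == 0]


# Addresses taken from the "unaligned access" messages quoted in this thread.
for label, addr in (("apt-get", 0x403CE094), ("smbd", 0x4001A2B8)):
    print(f"{label}: {addr:#010x} aligned to {satisfied_alignments(addr)} bytes")
```

Both addresses are at least 4-byte aligned, which suggests (but does not prove) that the access in question needed an alignment stricter than a word.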

I don't know a damn thing about samba. Is it multi-threaded or
anything special? Send out broadcast packets maybe?

thanks,
grant


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-27  2:04       ` Grant Grundler
@ 2002-05-27  6:17         ` Jeremy Drake
  2002-05-27 12:04           ` Matthew Wilcox
  2002-05-27 18:44           ` Jeremy Drake
  0 siblings, 2 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-27  6:17 UTC
  To: Grant Grundler; +Cc: parisc-linux

On Sun, 26 May 2002, Grant Grundler wrote:

> Jeremy Drake wrote:
> > 2.4.18-pa26 does it too. Here's the bootup sequence.  I put in the whole
> > thing in case anyone is interested.  If not, just skip to the bottom :)  
> > Next time I'll just include relevant pieces.  
> 
> It was ok to post the whole thing.
> Did the machine "hang"?  Can you provide "TOC" output?
> (push TOC button on the back and then at PDC prompt "ser pim toc")
It hung.
Could you tell me where exactly I can find this button on a J5000?  Then I 
can get it for you.  I'll have physical access to the box all day 
tomorrow.
> 
> ...
> > Starting Samba daemons: nmbd smbdsmbd(276): unaligned access to 0x4001a2b8 at
> >    ip=0x4012ea1f
> 
> The "unaligned access" just tells us the app is touching data that
> isn't aligned. That shouldn't cause a crash. Or at least if it does,
> then it should crash the same way on a UP machine.
> 
> I don't know a damn thing about samba. Is it multi-threaded or
> anything special? Send out broadcast packets maybe?
Probably multi-threaded, definitely broadcasts.  It works w/o issues on 
UP, but on SMP, samba stops it cold.  With samba disabled, it seems to 
work fine.  I built a kernel on smp (make -j 2) with no issue, which is an 
improvement over the last time I tried this...

> 
> thanks,
> grant
> 

-- 
Save the whales.  Collect the whole set.


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-27  6:17         ` Jeremy Drake
@ 2002-05-27 12:04           ` Matthew Wilcox
  2002-05-27 18:44           ` Jeremy Drake
  1 sibling, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2002-05-27 12:04 UTC
  To: Jeremy Drake; +Cc: Grant Grundler, parisc-linux

On Sun, May 26, 2002 at 11:17:33PM -0700, Jeremy Drake wrote:
> Could you tell me where exactly I can find this button on a J5000?  Then I 
> can get it for you.  I'll have physical access to the box all day 
> tomorrow.

Little blue button, on the back near the serial ports.  It's recessed
a bit so you probably need to use a pen to push it.

-- 
Revolutions do not require corporate support.


* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-27  6:17         ` Jeremy Drake
  2002-05-27 12:04           ` Matthew Wilcox
@ 2002-05-27 18:44           ` Jeremy Drake
  1 sibling, 0 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-27 18:44 UTC
  To: Grant Grundler; +Cc: parisc-linux

On Sun, 26 May 2002, Jeremy Drake wrote:

> On Sun, 26 May 2002, Grant Grundler wrote:
> 
> > Jeremy Drake wrote:
> > > 2.4.18-pa26 does it too. Here's the bootup sequence.  I put in the whole
> > > thing in case anyone is interested.  If not, just skip to the bottom :)  
> > > Next time I'll just include relevant pieces.  
> > 
> > It was ok to post the whole thing.
> > Did the machine "hang"?  Can you provide "TOC" output?
> > (push TOC button on the back and then at PDC prompt "ser pim toc")

The first time I tried today, it gave the unaligned error and said the 
following in a loop on the LCD
HPMC initiated
multiple HPMCs
HPMC initiated
Runway broad err
bad OS HPMC cksm
OS HPMC br err

When I pressed the button on the back, it said
Runway broad err
and stopped.  Had to pull the power cable -- the button wouldn't work

Here's the second time

/etc/init.d/samba start
Starting Samba daemons: nmbd smbd
[hung here]

Firmware Version 5.0

Duplex Console IO Dependent Code (IODC) revision 1

------------------------------------------------------------------------------
   (c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------

  Processor   Speed            State           Coprocessor State  I/D Cache 
  ---------  --------   ---------------------  -----------------  -------------
      0      440 MHz    Active                 Functional         512 kB/1 MB
      1      440 MHz    Idle                   Functional         512 kB/1 MB

  Central Bus Speed:                   120 MHz

  Available memory:              536870912 bytes
  Good memory required:           46678016 bytes

  Primary boot path:    FWSCSI.6.0
  Alternate boot path:  SCSI.6.0
  Console path:         SERIAL_1.9600.8.none
  Keyboard path:        PCI8.0.0

Processor is booting from first available device.

To discontinue, press any key within 10 seconds.

Boot terminated.


----- Main Menu -------------------------------------------------------------

      Command                           Description
      -------                           -----------
      BOot [PRI|ALT|<path>]             Boot from specified path
      PAth [PRI|ALT|CON|KEY [<path>]]   Display or modify a path
      SEArch [DIsplay|[[IPL] [<path>]]] Search for boot devices

      COnfiguration [<command>]         Access Configuration menu/commands
      INformation [<command>]           Access Information menu/commands
      SERvice [<command>]               Access Service menu/commands

      DIsplay                           Redisplay the current menu
      HElp [<menu>|<command>]           Display help for menu or command
      RESET                             Restart the system
-----
Main Menu: Enter command > ser pim toc

PROCESSOR PIM INFORMATION

-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000000358cf0  000000004012e987  00000000faf010d0
04-07   00000000400190b0  0000000000000018  00000000401f4ab0  00000000400190b0
08-11   40000000400190b0  0000000000000000  00000000faf00350  00000000000ba144
12-15   000000000006f800  000000000006f800  0000000000000000  0000000000000000
16-19   0000000000000000  00000000000b2248  0000000000029494  00000000401f4ab0
20-23   0000000000000000  0000000000001f38  00000000faf010e8  0000000000000018
24-27   000000000000012c  000000000ca6b064  00000000400190a4  00000000000a0944
28-31   ffffffffffffffff  00000000000000ac  00000000faf01240  000000000006b937

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000258  0000000000000000  00000000000000c0  0000000000000018
12-15   0000000000000000  0000000000000000  0000000000106000  00000000ffffffff
16-19   0000001d9616a712  000000000000012c  000000004012ea1f  000000000f541298
20-23   000000000000012c  40000000400190b0  0000000000000000  00000000a8000000
24-27   0000000000366000  000000000ca41000  0000000000044021  00000000f0412000
28-31   0000000055555555  0000000055555555  000000001ca6c000  0000000010410000
Space Registers 0 - 7

00-03   00000000          0000012c          00000000          0000012c
04-07   0000012c          0000012c          0000012c          0000012c

IIA Space                    = 0x000000000000012c
IIA Offset                   = 0x000000004012ea23
CPU State                    = 0x9e000001

Main Menu: Enter command >

And the third:
/etc/init.d/samba start
Starting Samba daemons: nmbd smbd





Firmware Version 5.0

Duplex Console IO Dependent Code (IODC) revision 1

------------------------------------------------------------------------------
   (c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------

  Processor   Speed            State           Coprocessor State  I/D Cache 
  ---------  --------   ---------------------  -----------------  -------------
      0      440 MHz    Active                 Functional         512 kB/1 MB
      1      440 MHz    Idle                   Functional         512 kB/1 MB

  Central Bus Speed:                   120 MHz

  Available memory:              536870912 bytes
  Good memory required:           46678016 bytes

  Primary boot path:    FWSCSI.6.0
  Alternate boot path:  SCSI.6.0
  Console path:         SERIAL_1.9600.8.none
  Keyboard path:        PCI8.0.0

Processor is booting from first available device.

To discontinue, press any key within 10 seconds.

Boot terminated.


----- Main Menu -------------------------------------------------------------

      Command                           Description
      -------                           -----------
      BOot [PRI|ALT|<path>]             Boot from specified path
      PAth [PRI|ALT|CON|KEY [<path>]]   Display or modify a path
      SEArch [DIsplay|[[IPL] [<path>]]] Search for boot devices

      COnfiguration [<command>]         Access Configuration menu/commands
      INformation [<command>]           Access Information menu/commands
      SERvice [<command>]               Access Service menu/commands

      DIsplay                           Redisplay the current menu
      HElp [<menu>|<command>]           Display help for menu or command
      RESET                             Restart the system
-----
Main Menu: Enter command > ser pim toc

PROCESSOR PIM INFORMATION

-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000000358cf0  000000004012e987  00000000faf010d0
04-07   00000000400190b0  0000000000000018  00000000401f4ab0  00000000400190b0
08-11   40000000400190b0  0000000000000000  00000000faf00350  00000000000ba144
12-15   000000000006f800  000000000006f800  0000000000000000  0000000000000000
16-19   0000000000000000  00000000000b2248  0000000000029494  00000000401f4ab0
20-23   0000000000000000  0000000000001f38  00000000faf010e8  0000000000000018
24-27   0000000000000130  000000000ca1c064  00000000400190a4  00000000000a0944
28-31   ffffffffffffffff  00000000000000ac  00000000faf01240  000000000006b937

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000260  0000000000000000  00000000000000c0  0000000000000018
12-15   0000000000000000  0000000000000000  0000000000106000  00000000ffffffff
16-19   000000e968d9578d  0000000000000130  000000004012ea1f  000000000f541298
20-23   0000000000000130  40000000400190b0  0000000000000000  00000000a8000000
24-27   0000000000366000  000000000ca1e000  0000000000044021  00000000f0412000
28-31   0000000055555555  0000000055555555  000000001ca18000  0000000010410000
Space Registers 0 - 7

00-03   00000000          00000130          00000000          00000130
04-07   00000130          00000130          00000130          00000130

IIA Space                    = 0x0000000000000130
IIA Offset                   = 0x000000004012ea23
CPU State                    = 0x9e000001

Main Menu: Enter command > 
 

-- 
I am a man: nothing human is alien to me.
		-- Publius Terentius Afer (Terence)


* Re: [parisc-linux] 2.4.18 SMP instability
       [not found] <Pine.LNX.4.44.0205271438590.11012-200000@garibaldi.apptechsys.com>
@ 2002-05-28 17:07 ` Grant Grundler
  2002-05-28 19:35   ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-28 17:07 UTC
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> I'll try.  BTW, the HPMC only happens sometimes.  Most of the time it just 
> hangs.  But HPMC starts if I hit the button on the back and let it boot.

ok. This is an interesting symptom.

...
> General Registers 0 - 31
> 00-03   0000000000000000  0000000a44b3921e  0000000000019bf0  00000000f4004000

GR02 is the return pointer - but it's not a kernel address.
Possibly PDC or something else.

...
> IIA Space                    = 0x0000000000000000
> IIA Offset                   = 0x0000000000019bf8

IIA is the instruction pointer. Also not a valid kernel address.
It's possible we are getting a "double fault" and the first
one is overwriting the original HPMC.
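
When comparing the instruction addresses in this thread, it helps to remember that on PA-RISC the two low bits of an instruction address offset carry the privilege level (0 is most privileged, 3 is least), so they need to be masked off before looking at the address itself. A minimal Python sketch, with a made-up helper name, applied to values quoted in this thread:

```python
def split_instruction_offset(value):
    """Split a PA-RISC instruction offset into (address, privilege level)."""
    return value & ~0x3, value & 0x3


samples = {
    "smbd unaligned ip": 0x4012EA1F,
    "TOC dump IIA Offset": 0x4012EA23,
    "HPMC dump IIA Offset": 0x00019BF8,
}
for label, raw in samples.items():
    addr, priv = split_instruction_offset(raw)
    print(f"{label}: address {addr:#010x}, privilege level {priv}")
```

Read this way, the IIA Offset in the two TOC dumps is user-level code exactly one 4-byte instruction past the address in the smbd "unaligned access" message, while the offset in the HPMC dump is at privilege level 0. What that means for the hang is left open here.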

> Check Type                   = 0x20000000
> CPU State                    = 0x9e000004
> Cache Check                  = 0x00000000
> TLB Check                    = 0x00000000
> Bus Check                    = 0x0030103b
> Assists Check                = 0x00000000
> Assist State                 = 0x00000000
> Path Info                    = 0x00000000
> System Responder Address     = 0x000000fff4004014
> System Requestor Address     = 0xfffffffffffa0000

This is useful. The system *probably* died trying to access 0xf4004014.
I could try to look up CPU State but I'm out of time.


Here are the next steps:
1) figure out who is touching 0xf4004014.
   I didn't see anything in the console output.
   (http://lists.parisc-linux.org/pipermail/parisc-linux/2002-May/016342.html)
   Can you look in /proc/iomem?
   My C3000 has:
   f4000000-f4ffffff : LBA PCI LMMIO
     f4007000-f4007fff : usb-ohci
     f4008000-f40083ff : tulip

2) figure out if the access is because of bad DMA killing the IOMMU
   or just the chip not responding.
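
Step 1 can be sketched in a few lines of Python. This is only an illustration: it takes the low 32 bits of the System Responder Address and checks them against the C3000 /proc/iomem excerpt quoted above. The helper names parse_iomem and owners are made up for this example, and a J5000 will list different devices and ranges.

```python
def parse_iomem(text):
    """Parse /proc/iomem-style lines into (start, end, depth, name) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # /proc/iomem indents nested regions by two spaces per level.
        depth = (len(line) - len(line.lstrip())) // 2
        span, name = line.strip().split(" : ", 1)
        start, end = (int(x, 16) for x in span.split("-"))
        entries.append((start, end, depth, name))
    return entries


def owners(entries, addr):
    """Return the names of every region containing addr, outermost first."""
    hits = [(depth, name) for start, end, depth, name in entries
            if start <= addr <= end]
    return [name for depth, name in sorted(hits)]


# Excerpt from the C3000 quoted above; not the J5000 under test.
C3000_IOMEM = """\
f4000000-f4ffffff : LBA PCI LMMIO
  f4007000-f4007fff : usb-ohci
  f4008000-f40083ff : tulip
"""

responder = 0x000000fff4004014      # System Responder Address from the PIM dump
addr = responder & 0xFFFFFFFF       # keep the low 32 bits
print(hex(addr), owners(parse_iomem(C3000_IOMEM), addr))
```

On the C3000 map the address falls inside the LBA PCI LMMIO window but not inside either listed device, which is why the J5000's own /proc/iomem is needed to name the owner.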

It's remotely possible the latest commit I made will affect this problem.
Can you retry with -pa28 (or -pa29)?

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-28 17:07 ` Grant Grundler
@ 2002-05-28 19:35   ` Jeremy Drake
  2002-05-28 19:45     ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-28 19:35 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Tue, 28 May 2002, Grant Grundler wrote:

> It's remotely possible the latest commit I made will affect this problem.
> Can you retry with -pa28 (or -pa29)?
Sure.  No problem.  I've been trying to keep the kernel as up-to-date as 
possible...

> 
> grant
> 
> _______________________________________________
> parisc-linux mailing list
> parisc-linux@lists.parisc-linux.org
> http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
> 

-- 
Adult, n.:
	One old enough to know better.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-28 19:35   ` Jeremy Drake
@ 2002-05-28 19:45     ` Jeremy Drake
  2002-05-28 21:56       ` Jeremy Drake
  2002-05-29  4:39       ` Grant Grundler
  0 siblings, 2 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-28 19:45 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Tue, 28 May 2002, Jeremy Drake wrote:

> On Tue, 28 May 2002, Grant Grundler wrote:
> 
> > It's remotely possible the latest commit I made will affect this problem.
> > Can you retry with -pa28 (or -pa29)?
> Sure.  No problem.  I've been trying to keep the kernel as up-to-date as 
> possible...
OK, I was doing an apt-get update, and the damn thing died at Reading 
Package Lists... 0%.  I'll see what's up with it when I can, do you want 
ser pim, ser pim toc, or just wait for a new kernel?  (this sort of thing 
happens a lot on smp, but this box is surprisingly stable on UP)

Some things that occurred to me about the hardware that may be influencing 
this.  The on-board USB on this box is broken (as in physically damaged).  
I have a pci usb card in there for typing on the graphics console, and 
when I installed it into the first slot recommended by the manual (2 I 
think) the box did some HPMC stuff when it tried to do selftests.  I moved 
it to slot 8 and everything seems happy with it.  Maybe something in the 
smp code is aggravating these problems...

-- 
Tact in audacity is knowing how far you can go without going too far.
		-- Jean Cocteau

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-28 19:45     ` Jeremy Drake
@ 2002-05-28 21:56       ` Jeremy Drake
  2002-05-29  4:56         ` Grant Grundler
  2002-05-29  4:39       ` Grant Grundler
  1 sibling, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-28 21:56 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Tue, 28 May 2002, Jeremy Drake wrote:

> On Tue, 28 May 2002, Jeremy Drake wrote:
> 
> > On Tue, 28 May 2002, Grant Grundler wrote:
> > 
> > > It's remotely possible the latest commit I made will affect this problem.
> > > Can you retry with -pa28 (or -pa29)?
> > Sure.  No problem.  I've been trying to keep the kernel as up-to-date as 
> > possible...
> OK, I was doing an apt-get update, and the damn thing died at Reading 
> Package Lists... 0%.  I'll see what's up with it when I can, do you want 
> ser pim, ser pim toc, or just wait for a new kernel?  (this sort of thing 
> happens a lot on smp, but this box is surprisingly stable on UP)
> 
The LCD has a network, the HDD and an unfilled heart on the screen -- not 
changing.

The console says apt-get (668): unaligned access to 0x403ce08c at 
ip=0x4005e4f7

The TOC button had no effect.  Here's a ser pim from after I pulled the 
power and restarted it.  It doesn't look particularly helpful.

ser pim

PROCESSOR PIM INFORMATION

-----------------  Processor 0 HPMC Information ------------------

   No valid timestamp

HPMC Chassis Codes = 2cbf0  

General Registers 0 - 31
00-03   0000000000000000  000000001035eee0  00000000101009dc  0000000000800327
04-07   000000000001efff  000000000006cd00  0000000010410000  00000000f0002f68
08-11   0000000000000000  0000000000000003  000000000004000e  00000000103a5178
12-15   0000000000000000  00000000ffffffff  0000000000000001  00000000f0400004
16-19   00000000f00008c4  00000000f000017c  00000000f0000174  0000000010408000
20-23   0000000000000000  00000000103382a0  00000000103597c4  0000000000000000
24-27   00000000103598a0  0000000000000032  0000000000000019  0000000010338010
28-31   0000000000000000  0000000000000010  0000000010408700  00000000103598a0

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000106  0000000000000000  00000000000000c0  000000000000001f
12-15   0000000000000000  0000000000000000  0000000000106000  00000000ffffffff
16-19   00001d631d9a90dc  0000000000000000  00000000101009e0  000000004a740028
20-23   0000000000000000  0000000000000000  000000000004ff0f  0000000000000000
24-27   0000000000366000  000000001f571000  0000000000044021  00000000f0412000
28-31   0000000055555555  0000000055555555  0000000010408000  0000000010410000
Space Registers 0 - 7

00-03   00000000          00000083          00000000          00000083
04-07   00000000          00000000          00000000          00000000

<Press any key to continue (q to quit)> 

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x00000000101009e4
Check Type                   = 0x20000000
CPU State                    = 0x9e000004
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x0030000d
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0xfffffffffffa0000
System Requestor Address     = 0xfffffffffffa2000

Floating-Point Registers 0 - 31
00-03   0000001f00000000  0000000000000000  0000000000000000  0000000000000000
04-07   2ff8e00000000001  000000011015fa8c  1036505000000000  00000001f0400004
08-11   1036505000000002  ffffffff0000000a  0000000100000000  1041fdd31035d020
12-15   ffffffff000000ff  103a4000101482f4  103a4000ffff99ef  1115070010110264
16-19   2ff8e00011150000  0000000000000002  000000001035d010  1035981010358810
20-23   1035901010359810  103598102ff8e000  cccccccd51eb874f  0000000333333334
24-27   b38cf9b100000450  5555555555555555  5555555555555555  5555555555555555
28-31   3031323334353637  383961621014859c  6768696a6b6c6d6e  6f70717273747576

<Press any key to continue (q to quit)> 


'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:

Check Summary                = 0xcb81841000000000
Available Memory             = 0x0000000020000000
CPU Diagnose Register 2      = 0x0201000000000004
CPU Status Register 0        = 0x3440c24000000000
CPU Status Register 1        = 0x8000000000000000
SADD LOG                     = 0x4820000000000000
Read Short LOG               = 0xc1a0f0f0f0400804
ERROR_STATUS                 = 0x0000000000100010
MEM_ADDR                     = 0x000001ff3fffffff
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x000001ff3fffffff
MEM_SYND_CORR                = 0x0000000000000000
RUN_DATA_HIGH                = 0xc1bff0fffed08040
RUN_DATA_LOW                 = 0xc1bff0fffed08040
RUN_CTRL                     = 0x0000021c00001418
RUN_ADDR                     = 0xc1bff0fffed08040
System Responder Path        = 0x00ffffffffffffff


HPMC PIM Analysis Information:

   No valid timestamp



Memory/IO Controller Error Analysis Information:


<Press any key to continue (q to quit)> 

-----------------  Processor 0 LPMC Information ------------------

Check Type                   = 0x00000000
I/D Cache Parity Info        = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000


-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000
Space Registers 0 - 7

00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000000000000
CPU State                    = 0x00000000


<Press any key to continue (q to quit)> 

-----------------  Processor 1 HPMC Information ------------------

   No valid timestamp

HPMC Chassis Codes = No chassis codes logged

General Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000
Space Registers 0 - 7

00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

<Press any key to continue (q to quit)> 

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000000000000
Check Type                   = 0x00000000
CPU State                    = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000

Floating-Point Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

<Press any key to continue (q to quit)> 

Check Summary                = 0x0000000000000000
Available Memory             = 0x0000000000000000
CPU Diagnose Register 2      = 0x0000000000000000
CPU Status Register 0        = 0x0000000000000000
CPU Status Register 1        = 0x0000000000000000
SADD LOG                     = 0x0000000000000000
Read Short LOG               = 0x0000000000000000
ERROR_STATUS                 = 0x0000000000000000
MEM_ADDR                     = 0x0000000000000000
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x0000000000000000
MEM_SYND_CORR                = 0x0000000000000000
RUN_DATA_HIGH                = 0x0000000000000000
RUN_DATA_LOW                 = 0x0000000000000000
RUN_CTRL                     = 0x0000000000000000
RUN_ADDR                     = 0x0000000000000000
System Responder Path        = 0x0000000000000000


HPMC PIM Analysis Information:

   No valid timestamp



Memory/IO Controller Error Analysis Information:


<Press any key to continue (q to quit)> 

-----------------  Processor 1 LPMC Information ------------------

Check Type                   = 0x00000000
I/D Cache Parity Info        = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000


-----------------  Processor 1 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
16-19   0000000000000000  0000000000000000  0000000000000000  0000000000000000
20-23   0000000000000000  0000000000000000  0000000000000000  0000000000000000
24-27   0000000000000000  0000000000000000  0000000000000000  0000000000000000
28-31   0000000000000000  0000000000000000  0000000000000000  0000000000000000
Space Registers 0 - 7

00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000000000000
CPU State                    = 0x00000000


<Press any key to continue (q to quit)> 

Memory Error Log Information:

   No valid timestamp

   No memory errors logged


I/O Module Error Log Information:

   No valid timestamp

   No I/O module errors logged

Main Menu: Enter command > 
Main Menu: Enter command > 

-- 
I called my parents the other night, but I forgot about the time difference.
They're still living in the fifties.
		-- Strange de Jim

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-28 19:45     ` Jeremy Drake
  2002-05-28 21:56       ` Jeremy Drake
@ 2002-05-29  4:39       ` Grant Grundler
  2002-05-29  6:26         ` Jeremy Drake
  1 sibling, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-29  4:39 UTC (permalink / raw)
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> OK, I was doing an apt-get update, and the damn thing died at Reading 
> Package Lists... 0%.  I'll see what's up with it when I can, do you want 
> ser pim, ser pim toc, or just wait for a new kernel?

If the box HPMC'd, I'd like the "ser pim".

> Some things that occurred to me about the hardware that may be influencing 
> this.  The on-board USB on this box is broken (as in physically damaged).  

That's a good observation. Can you characterize how extensive the
physical damage is?

I don't recall anything on the previous console output that suggests
the USB interface driver isn't happy.

> I have a pci usb card in there for typing on the graphics console, and 
> when I installed it into the first slot recommended by the manual (2 I 
> think) the box did some HPMC stuff when it tried to do selftests.  I moved 
> it to slot 8 and everything seems happy with it.  Maybe something in the 
> smp code is aggravating these problems...

Possible. Which manual are you referring to?
The one that came with the USB card or some HP PARISC manual?

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-28 21:56       ` Jeremy Drake
@ 2002-05-29  4:56         ` Grant Grundler
  0 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2002-05-29  4:56 UTC (permalink / raw)
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> The LCD has a network, the HDD and an unfilled heart on the screen -- not 
> changing.

Not an HPMC then.

> The console says apt-get (668): unaligned access to 0x403ce08c at 
> ip=0x4005e4f7

That's odd - I've never seen that from apt-get.

> The TOC button had no effect.  Here's a ser pim from after I pulled the 
> power and restarted it.  It doesn't look particularly helpful.

Again I didn't look up the arcane stuff.
GR02 and IAOQ were both pointing at cpu_idle()
CR23 was zero; no external interrupts pending
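
For anyone wanting to pull the same values out of a "ser pim" dump, here is a minimal Python sketch. It parses the register rows quoted in the message above (GR 00-03 and CR 20-23 from the Processor 0 HPMC section) and picks out GR02 and CR23. The helper name parse_bank is made up for this example. Confirming that an address lies in cpu_idle() still needs the System.map of the exact kernel build, which is not reproduced here.

```python
def parse_bank(text):
    """Map register number -> value for rows like '00-03  <hex> <hex> <hex> <hex>'."""
    regs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "-" not in fields[0]:
            continue
        first, _, last = fields[0].partition("-")
        if not (first.isdigit() and last.isdigit()):
            continue  # not a register row (e.g. a heading or pager prompt)
        for i, word in enumerate(fields[1:]):
            regs[int(first) + i] = int(word, 16)
    return regs


# Rows copied from the Processor 0 HPMC section of the dump.
GENERAL = "00-03   0000000000000000  000000001035eee0  00000000101009dc  0000000000800327"
CONTROL = "20-23   0000000000000000  0000000000000000  000000000004ff0f  0000000000000000"
IIA_OFFSET = 0x00000000101009e4     # "IIA Offset" line of the same dump

gr = parse_bank(GENERAL)
cr = parse_bank(CONTROL)
print("GR02 =", hex(gr[2]))
print("IIA Offset =", hex(IIA_OFFSET))
print("distance =", IIA_OFFSET - gr[2], "bytes")
print("CR23 =", hex(cr[23]))
```

GR02 and the IIA Offset come out 8 bytes apart, so they plausibly fall in the same function, and CR23 is zero as noted above.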

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-29  4:39       ` Grant Grundler
@ 2002-05-29  6:26         ` Jeremy Drake
  2002-05-29  6:35           ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-05-29  6:26 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Tue, 28 May 2002, Grant Grundler wrote:

> Jeremy Drake wrote:
> > Some things that occurred to me about the hardware that may be influencing 
> > this.  The on-board USB on this box is broken (as in physically damaged).  
> 
> That's a good observation. Can you characterize how extensive the
> physical damage is?

The plastic thing that the pins sit on was missing from one of the 2 usb 
ports when the box arrived.  I, not having noticed this, plugged a 
keyboard into it.  It worked, but after playing with serial consoles and 
such it wouldn't go back.  The pins were bent because the reinforcing 
plastic was missing.  I tried to bend the pins so that they didn't short 
anything, and one broke off.  Both onboard USB ports haven't worked since.  
When booting, it said "initializing keyboard" and then "IODC error".  I 
put it on serial console and left it alone after that.
 
> I don't recall anything on the previous console output that suggests
> the USB interface driver isn't happy.
> 
> > I have a pci usb card in there for typing on the graphics console, and 
> > when I installed it into the first slot recommended by the manual (2 I 
> > think) the box did some HPMC stuff when it tried to do selftests.  I moved 
> > it to slot 8 and everything seems happy with it.  Maybe something in the 
> > smp code is aggravating these problems...
> 
> Possible. Which manual are you referring to?
> The one that came with the USB card or some HP PARISC manual?

The J5000 owners manual, 
http://www.hp.com/workstations/support/documentation/manuals/user_guides/j_class/A5991-90000.pdf 
near the top of page 54 it says "For non-graphics cards, insert them in 
this order: Slot 2, then 8, 3, 5, and finally 6."

> 
> grant
> 

-- 
If a man has a strong faith he can indulge in the luxury of skepticism.
		-- Friedrich Nietzsche

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-29  6:26         ` Jeremy Drake
@ 2002-05-29  6:35           ` Grant Grundler
  2002-06-01  6:34             ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-05-29  6:35 UTC (permalink / raw)
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> tried to bend the pins so that they didn't short 
> anything, and one broke off.  Both onboard USB ports haven't worked since.  

Ok. So broken physically but not electrically.
Not sure that should cause any problems.
You run the risk now of being the only person using an add-on
USB card regularly for parisc.

> The J5000 owners manual, 
> http://www.hp.com/workstations/support/documentation/manuals/user_guides/j_class/A5991-90000.pdf 
> near the top of page 54 it says "For non-graphics cards, insert them in 
> this order: Slot 2, then 8, 3, 5, and finally 6."

ok. Not sure why they offer that advice...but whatever.

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
       [not found] <20020527223132.661F54843@dsl2.external.hp.com>
@ 2002-05-29 18:56 ` Jeremy Drake
  0 siblings, 0 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-05-29 18:56 UTC (permalink / raw)
  To: parisc-linux

OK, I just tried pa30.  It boots successfully, but just died doing apt-get 
update:
Fetched 554B in 10s (53B/s)
apt-get(320): unaligned access to 0x403ce094 at ip=0x4005e47f
Reading Package

And there it stopped.  I don't know what it's doing but I'll see what kind 
of info I can get from it.

These "unaligned access" messages only show up when running smp.  apt-get 
works perfectly with a UP kernel...
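
The two addresses in these messages can be decoded a little further. The Python sketch below is only an illustration, and it relies on one assumption not stated in the log: that the "ip" value is a raw PA-RISC instruction address, whose low two bits hold the privilege level (3 being the least privileged, i.e. user space). The helper names are made up for this example.

```python
def largest_alignment(addr, limit=16):
    """Largest power of two (up to limit) that divides addr."""
    size = 1
    while size < limit and addr % (size * 2) == 0:
        size *= 2
    return size


def split_ip(ip):
    """Split an instruction address into (word-aligned address, privilege bits)."""
    return ip & ~0x3, ip & 0x3


# (data address, ip) pairs from the two console messages seen in this thread.
REPORTS = [(0x403ce094, 0x4005e47f), (0x403ce08c, 0x4005e4f7)]

for data, ip in REPORTS:
    insn, priv = split_ip(ip)
    print(hex(data), "is aligned to", largest_alignment(data), "bytes;",
          "instruction at", hex(insn), "privilege", priv)
```

Both data addresses are 4-byte aligned but not 8-byte aligned, which would be consistent with an 8-byte load or store; the log itself does not say how wide the access was.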

I should set up a webcam pointing at the LCD screen of that box, so I can 
look at it remotely, to know if it HPMC'd or just locked up...

-- 
He missed an invaluable opportunity to hold his tongue.
		-- Andrew Lang

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-05-29  6:35           ` Grant Grundler
@ 2002-06-01  6:34             ` Jeremy Drake
  2002-06-02 16:32               ` Grant Grundler
  0 siblings, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-06-01  6:34 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Wed, 29 May 2002, Grant Grundler wrote:

> Jeremy Drake wrote:
> > tried to bend the pins so that they didn't short 
> > anything, and one broke off.  Both onboard USB ports haven't worked since.  
> 
> Ok. So broken physically but not electrically.
> Not sure that should cause any problems.
I pulled the drive and plugged it into an identical (but not broken) 
J5000.  The box HPMC'd when doing an apt-get update.  Since this box was 
on graphics console, I got a large amount of hex numbers spewed to the 
screen.  So, at least we can rule out any damage to the box as the cause 
of this. 

-- 
Here I am, fifty-eight, and I still don't know what I want to be when
I grow up.
		-- Peter Drucker

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-06-01  6:34             ` Jeremy Drake
@ 2002-06-02 16:32               ` Grant Grundler
  2002-06-02 19:48                 ` Jeremy Drake
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2002-06-02 16:32 UTC (permalink / raw)
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> So, at least we can rule out any damage to the box as the cause of this. 

Yup - thanks for trying that.

Offhand, here are the differences I'm aware of between J5k and c3k:
o 2-CPU vs 1
o cache is 4-way associative vs 1-way (Same PA8500 CPU though!)
o J5K requires newer rev CPU (some SMP-related bugs fixed)
o Same PDC, but probably initializes a few things differently
o Though IO subsystem is identical chip set, J5k has more PCI busses
  and more slots.

Since dirty cache writeback can be sensitive to how busy the system is,
it's possible the HPMC is caused by a similar problem to what we saw
on PA8700 systems. You might try building a kernel with
"ioc_needs_fdc" forced true in arch/parisc/kernel/sba_iommu.c.
If it avoids the HPMC (but we still see other hangs), then it's
a clue we don't have caching working right for that CPU setup.

hth,
grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-06-02 16:32               ` Grant Grundler
@ 2002-06-02 19:48                 ` Jeremy Drake
  2002-06-03  3:28                   ` Grant Grundler
  2002-06-03 21:58                   ` Jeremy Drake
  0 siblings, 2 replies; 23+ messages in thread
From: Jeremy Drake @ 2002-06-02 19:48 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Sun, 2 Jun 2002, Grant Grundler wrote:

> If it avoids the HPMC (but we still see other hangs), then it's
> a clue we don't have caching working right for that CPU setup.
What "other hangs" would I expect to see?  This thing hangs periodically 
anyway on SMP, and doesn't consistently give the HPMC.  Sometimes it just 
hangs.  I'm building with the change you mentioned now -- we'll see what 
happens...

OK.  No HPMC, but a new and interesting message.  The serial console 
hangs, as always.

Fetched 2696kB in 26s (103kB/s)
apt-get(263): unaligned access to 0x403ce08c at ip=0x4005e4f7
Reading Package

But, the LCD screen has a new message for me: 

INI 3001: SYS BD
PDH control init

If you think it would help, I could pay the box a visit today and get 
whatever "ser pim" or "ser pim toc" I can...


> hth,
> grant

-- 
Novinson's Revolutionary Discovery:
	When comes the revolution, things will be different --
	not better, just different.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-06-02 19:48                 ` Jeremy Drake
@ 2002-06-03  3:28                   ` Grant Grundler
  2002-06-03 21:58                   ` Jeremy Drake
  1 sibling, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2002-06-03  3:28 UTC (permalink / raw)
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> What "other hangs" would I expect to see?  This thing hangs periodically 
> anyway on SMP, and doesn't consistently give the HPMC.
> Sometimes it just hangs.

Well, we don't really know anything about the hangs you've been seeing or if
it's the same problem each time. I'm comfortable the SMP kernel works
with apt-get since I'm not able to reproduce the problem with either of
the two SMP machines I have (PA8500 and PA8700).

> I'm building with the change you mentioned now -- we'll see what 
> happens...
> 
> OK.  No HPMC, but a new and interesting message.  The serial console 
> hangs, as always.
...
> But, the LCD screen has a new message for me: 
> 
> INI 3001: SYS BD
> PDH control init

hmmm...that might just be an intermediate state for HPMC.
This seems to be part of the reset sequence.

> If you think it would help, I could pay the box a visit today and get 
> whatever "ser pim" or "ser pim toc" I can...

nah...get "ser pim" tomorrow. But I may not be able to look at it until
the end of the week. Try repeating with the kluged kernel a few times and
see if it now always gets the same symptom (ie no HPMC). If it doesn't HPMC,
most likely you need to run UP kernels until someone who understands
cache handling and J5000 a bit can look at it.

grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-06-02 19:48                 ` Jeremy Drake
  2002-06-03  3:28                   ` Grant Grundler
@ 2002-06-03 21:58                   ` Jeremy Drake
  2002-06-05 21:24                     ` Grant Grundler
  1 sibling, 1 reply; 23+ messages in thread
From: Jeremy Drake @ 2002-06-03 21:58 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

On Sun, 2 Jun 2002, Jeremy Drake wrote:

> On Sun, 2 Jun 2002, Grant Grundler wrote:
> 
> > If it avoids the HPMC (but we still see other hangs), then it's
> > a clue we don't have caching working right for that CPU setup.

> OK.  No HPMC, but a new and interesting message.  The serial console 
> hangs, as always.
> 
> Fetched 2696kB in 26s (103kB/s)
> apt-get(263): unaligned access to 0x403ce08c at ip=0x4005e4f7
> Reading Package
> 
> But, the LCD screen has a new message for me: 
> 
> INI 3001: SYS BD
> PDH control init
> 
> If you think it would help, I could pay the box a visit today and get 
> whatever "ser pim" or "ser pim toc" I can...

Here it is...  BTW, maybe you could explain how to interpret these, so I 
don't have to send you all of this...

ser pim

PROCESSOR PIM INFORMATION

-----------------  Processor 0 HPMC Information ------------------

Timestamp = 
  Tue May  28 23:38:36 GMT 2002    (20:02:05:28:23:38:36)

HPMC Chassis Codes = 2cbf0  2500b  2cbf1  2cbfc  

General Registers 0 - 31
00-03   0000000000000000  000000095bf6dde5  0000000000019bf0  00000000f4004000
04-07   0000000000001d58  0000000000002710  ffffffffffffffce  0000000000002000
08-11   0000000044657266  fffffffff4004000  000000000000000a  fffffff0f0000834
12-15   0000000000000000  ffffffffffffffff  0000000000000001  fffffff0f0400004
16-19   fffffff0f00008c4  fffffff0f000017c  fffffff0f0000174  00000000000019fc
20-23   00000000f4004014  00000000000001f4  0000000000019bf0  ffffffffffffffff
24-27   ffffffffffffffff  0000000000000000  000000fa00000000  fffffff0f0412000
28-31   0000000000035b60  ffffffffffffffff  0000000000001e90  0000000000002710

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000004  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000006
12-15   0000000000000000  0000000000000000  000000f0f0003800  0000000000000000
16-19   000000095d2ccf91  0000000000000000  0000000000019bf4  000000000e80103d
20-23   00000000a607ffd0  c000000001004014  000000ff0000ff08  8800000000000000
24-27   0000000055555555  0000000055555555  0000000000041020  00000000f0412000
28-31   0000000055555555  0000000055555555  00000000f04088d8  0000000000000020
Space Registers 0 - 7

00-03   00000000          c9af9dd0          00000000          00000000
04-07   00000000          00000000          00000000          00000000

<Press any key to continue (q to quit)> 

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x0000000000019bf8
Check Type                   = 0x20000000
CPU State                    = 0x9e000004
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x0030103b
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x000000fff4004014
System Requestor Address     = 0xfffffffffffa0000

Floating-Point Registers 0 - 31
00-03   0000001f00000000  0000000000000000  0000000000000000  0000000000000000
04-07   2ffa200000000001  000000011015fa8c  1036505000000000  00000001f0400004
08-11   1036505000000002  ffffffff0000000a  0000000100000000  1041fdd31035d020
12-15   ffffffff000000ff  103a4000101482f4  103a4000ffff99ef  1115070010110264
16-19   2ffa200011150000  0000000000000002  000000001035d010  1035981010358810
20-23   1035901010359810  103598102ffa2000  1115000000000000  0000000200000000
24-27   5555555555555555  5555555555555555  5555555555555555  5555555555555555
28-31   3031323334353637  383961621014859c  6768696a6b6c6d6e  6f70717273747576

<Press any key to continue (q to quit)> 


'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:

Check Summary                = 0xc381141008000000
Available Memory             = 0x0000000020000000
CPU Diagnose Register 2      = 0x02010000ac802000
CPU Status Register 0        = 0x2040000000000000
CPU Status Register 1        = 0x8002000000000000
SADD LOG                     = 0x0221fd0050210df0
Read Short LOG               = 0xc18080fff4004014
ERROR_STATUS                 = 0x0000000000100010
MEM_ADDR                     = 0x000001ff3fffffff
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x000001ff3fffffff
MEM_SYND_CORR                = 0x0000000000000000
RUN_DATA_HIGH                = 0x37dd3fa153c23ee1
RUN_DATA_LOW                 = 0xe840d00037de3f01
RUN_CTRL                     = 0x0000021c00001418
RUN_ADDR                     = 0xc13ff0f0f003ce50
System Responder Path        = 0x00ffffff0a000f01


HPMC PIM Analysis Information:

Timestamp = 
  Tue May  28 23:38:36 GMT 2002    (20:02:05:28:23:38:36)


'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:

A Data I/O Fetch Timeout occurred while CPU 0 was
requesting information from a device at the path 10/0/15/1 (built-in PCI device).


Memory/IO Controller Error Analysis Information:

The Memory/IO Controller only observed the Broadcast Error.  It did not log
any additional information about the HPMC.

<Press any key to continue (q to quit)> 

-----------------  Processor 0 LPMC Information ------------------

Check Type                   = 0x00000000
I/D Cache Parity Info        = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000


-----------------  Processor 0 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  0000000040000000  000000004000dc93  00000000faf00800
04-07   0000000040000000  0000000000000008  0000000040026758  0000000000000000
08-11   0000000000000000  00000000faf00798  0000000000000000  0000000000000000
12-15   00000000faf00890  0000000040026612  00000000faf00300  0000000000000000
16-19   0000000040000000  0000000010408000  0000000000000000  0000000040000000
20-23   00000000faf0089f  00000000faf006a0  000000001031a8b0  00000000faf00798
24-27   0000000000000008  0000000011150408  000000000000000f  000000001015fbb4
28-31   0000000000028000  0000000011150380  0000000011150640  000000004000f923

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000002  0000000000000000  00000000000000c0  0000000000000010
12-15   0000000000000000  0000000000000000  0000000000106000  00000000ff800000
16-19   000000110d4ad99d  0000000000000000  00000000101076c0  000000002f301221
20-23   0000000010340004  0000000054150408  000000000004000e  0000000000000000
24-27   0000000000366000  00000000003bb000  0000000000044021  00000000f0412000
28-31   0000000055555555  0000000055555555  0000000011150000  0000000010410000
Space Registers 0 - 7

00-03   00000001          00000001          00000000          00000001
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x00000000101076c4
CPU State                    = 0x9e000001


<Press any key to continue (q to quit)> 

-----------------  Processor 1 HPMC Information ------------------

Timestamp = 
  Sun Jun  2 19:40:32 GMT 2002    (20:02:06:02:19:40:32)

HPMC Chassis Codes = 2cbf0  2510b  2cbf4  2cbfc  

General Registers 0 - 31
00-03   0000000000000000  fffffff0f009d000  fffffff0f0068d78  0000000000000000
04-07   7f00000000000000  feffffffffffffff  000000000031b6f8  0000000000000008
08-11   fffffffffed30300  fffffffffed22200  0100000000000000  000000000002cb90
12-15   00000000000f4000  000000000000c800  fffffffffed40000  fffffffffed22210
16-19   4000000000000000  0000000000000002  00000000f000016c  fffffffffee003f9
20-23   fffffffffee003fb  0000000000000087  fffffffffee003f8  5871000000000000
24-27   7f00000000000000  fffffff0f0071eb8  fffffffffee003fa  fffffff0f0412000
28-31   0000000000000000  fffffffffee003fb  000000000031b7d8  fffffffffee00000

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   000000000000010c  0000000000000000  00000000000000c0  0000000000000039
12-15   0000000000000000  0000000000000000  0000000000106000  00000000ff000000
16-19   000000124c7c7456  000000003ffffff0  fffffff0f0037354  000000000e80103a
20-23   00000000ae07fffb  c0000000802003fb  0000000008000108  0000000080000000
24-27   0000000000336000  000000001f7e3000  0000000000044021  00000000f0412000
28-31   0000000055555555  0000000055555555  00000000100dc000  0000000011111111
Space Registers 0 - 7

00-03   00000000          00000086          00000000          00000086
04-07   00000000          00000000          00000000          00000000

<Press any key to continue (q to quit)> 

IIA Space                    = 0x000000003ffffff0
IIA Offset                   = 0xfffffff0f0037358
Check Type                   = 0x20000000
CPU State                    = 0x9e000004
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x0030103b
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x000000fffee003fb
System Requestor Address     = 0xfffffffffffa2000

Floating-Point Registers 0 - 31
00-03   0000001f00000000  0000000000000000  0000000000000000  0000000000000000
04-07   2ffe200000000001  000000011015fa58  1033505000000000  00000001f0400004
08-11   1033505000000002  ffffffff0000000a  000000010000003f  103dfdd300000040
12-15   00000000103caf14  103caf4010148768  00000000ffff9b5f  100d470000000000
16-19   2ffe2000100d4000  0000000000000002  000000001032d010  1032981010328810
20-23   1032901010329810  103298102ffe2000  cccccccd51eb874f  0000000333333334
24-27   b38cf9b100000450  5555555555555555  5555555555555555  5555555555555555
28-31   3031323334353637  3839616210148a10  6768696a6b6c6d6e  6f70717273747576

<Press any key to continue (q to quit)> 


'9000/785 B,C,J Workstation Unarchitected (per-CPU)', rev 1, 140 bytes:

Check Summary                = 0xcb81041008000000
Available Memory             = 0x0000000020000000
CPU Diagnose Register 2      = 0x0201010000000004
CPU Status Register 0        = 0x2440c24000000000
CPU Status Register 1        = 0x800a000000000000
SADD LOG                     = 0xc11ff0f0f0002b50
Read Short LOG               = 0xc18100fffee003fb
ERROR_STATUS                 = 0x0000000000100010
MEM_ADDR                     = 0x000001ff3fffffff
MEM_SYND                     = 0x0000000000000000
MEM_ADDR_CORR                = 0x000001ff3fffffff
MEM_SYND_CORR                = 0x0000000000000000
RUN_DATA_HIGH                = 0xe840c002000014bc
RUN_DATA_LOW                 = 0x379c00680f9a20dc
RUN_CTRL                     = 0x0000005c00001658
RUN_ADDR                     = 0xc13ff0f0f0002b50
System Responder Path        = 0x00ffff0a000e0101


HPMC PIM Analysis Information:

Timestamp = 
  Sun Jun  2 19:40:32 GMT 2002    (20:02:06:02:19:40:32)


'9000/785 B,C,J Workstation HPMC PIM Analysis (per-CPU)', rev 0, 1304 bytes:

An Instruction I/O Fetch and Data I/O Fetch Timeout occurred while CPU 1 was
requesting information from a device at the path 10/0/14/1/1 (built-in PCI device).


Memory/IO Controller Error Analysis Information:

The Memory/IO Controller only observed the Broadcast Error.  It did not log
any additional information about the HPMC.

<Press any key to continue (q to quit)> 

-----------------  Processor 1 LPMC Information ------------------

Check Type                   = 0x00000000
I/D Cache Parity Info        = 0x00000000
Cache Check                  = 0x00000000
TLB Check                    = 0x00000000
Bus Check                    = 0x00000000
Assists Check                = 0x00000000
Assist State                 = 0x00000000
Path Info                    = 0x00000000
System Responder Address     = 0x0000000000000000
System Requestor Address     = 0x0000000000000000


-----------------  Processor 1 TOC Information -------------------

General Registers 0 - 31
00-03   0000000000000000  000000001035eee0  00000000101009dc  0000000000000000
04-07   0000000000366000  00000000f0400008  00000000000000fa  00000000f0002f68
08-11   0000000000000000  0000000000000000  000000000004000e  00000000103a7464
12-15   00000000000000f2  0000000000000001  0000000000000001  00000000000000f3
16-19   0000000002020202  0000000000000002  00000000f000016c  0000000011158000
20-23   0000000000000000  00000000103382b0  00000000103597c4  0000000000000000
24-27   00000000103598a0  0000000000000032  0000000000000019  0000000010338010
28-31   0000000000000000  0000000000000010  00000000111586c0  00000000103598a0

<Press any key to continue (q to quit)> 

Control Registers 0 - 31
00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11   0000000000000000  0000000000000000  00000000000000c0  000000000000001e
12-15   0000000000000000  0000000000000000  0000000000106000  00000000ff800000
16-19   0000001107f8d7df  0000000000000000  00000000101009dc  0000000003c008b3
20-23   0000000000000000  0000000000000000  000000000004ff0f  0000000000000000
24-27   0000000000366000  0000000000366000  0000000000044021  00000000f0412000
28-31   0000000055555555  0000000055555555  0000000011158000  0000000011111111
Space Registers 0 - 7

00-03   00000000          00000000          00000000          00000000
04-07   00000000          00000000          00000000          00000000

IIA Space                    = 0x0000000000000000
IIA Offset                   = 0x00000000101009e0
CPU State                    = 0x9e000001


<Press any key to continue (q to quit)> 

Memory Error Log Information:

Timestamp = 
  Sun Jun  2 19:40:32 GMT 2002    (20:02:06:02:19:40:32)


'9000/785 B,C,J Workstation Memory Error Log', rev 0, 64 bytes:

   No memory errors logged


I/O Module Error Log Information:

Timestamp = 
  Sun Jun  2 19:40:32 GMT 2002    (20:02:06:02:19:40:32)


'9000/785 B,C,J Workstation IO Error Log', rev 0, 228 bytes:

 Rope     Word1        Word2            Word3
------ ------------ ------------ --------------------
   0    0x0002e000   0x0e0cc009   0x00000000000007fc
   1    0x00000000   0x1e0cc009   0x00000000fed32048
   2    0x04000000   0x2e0cc009   0xffffffffffffffff
   3    ----------   0x3e0cc009   ------------------
   4    0x00000000   0x4e0cc009   0x00000000fed38048
   5    ----------   0x5e0cc009   ------------------
   6    0x00000000   0x6e0cc009   0x00000000fed3c048
   7    ----------   0x7e0cc009   ------------------
Main Menu: Enter command > 
Main Menu: Enter command > 
Main Menu: Enter command > 


-- 
On ability:
	A dwarf is small, even if he stands on a mountain top;
	a colossus keeps his height, even if he stands in a well.
		-- Lucius Annaeus Seneca, 4BC - 65AD

* Re: [parisc-linux] 2.4.18 SMP instability
  2002-06-03 21:58                   ` Jeremy Drake
@ 2002-06-05 21:24                     ` Grant Grundler
  0 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2002-06-05 21:24 UTC (permalink / raw)
  To: Jeremy Drake; +Cc: parisc-linux

Jeremy Drake wrote:
> Here it is...  BTW, maybe you could explain how to interpret these, so I 
> don't have to send you all of this...

Generally, look at the IIA Offset and GR02 to see where it died.
If it's not a kernel address, start trying to figure out what it is.
There's lots more magic in the PIM dump that I don't understand either.

In this HPMC dump, I don't know where 0x19bf0 is...
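
The check described above can be sketched in a few lines of Python. This is
only a rough illustration, not an existing tool: the function names are made
up here, and the kernel text bounds are placeholder values that would have to
come from the System.map of the kernel that was actually running.

```python
import re

HEX = r"([0-9a-fA-F]+)"

def parse_pim_section(text):
    """Return (GR02, IIA offset) from one HPMC or TOC section of 'ser pim'."""
    # General registers print four per row; the first "00-03" row in a
    # section holds GR00..GR03, so GR02 is the third value on that row.
    gr_row = re.search(r"^00-03\s+" + r"\s+".join([HEX] * 4), text, re.M)
    iia = re.search(r"^IIA Offset\s*=\s*0x" + HEX, text, re.M)
    if not gr_row or not iia:
        raise ValueError("text does not look like a PIM section")
    return int(gr_row.group(3), 16), int(iia.group(1), 16)

def in_kernel_text(addr, stext, etext):
    """True if addr lies inside [stext, etext)."""
    return stext <= addr < etext

# Two lines copied from the Processor 0 HPMC section of the dump.
sample = """\
00-03   0000000000000000  000000095bf6dde5  0000000000019bf0  00000000f4004000
IIA Offset                   = 0x0000000000019bf8
"""

# ASSUMPTION: placeholder bounds; use _stext/_etext from your own System.map.
STEXT, ETEXT = 0x10100000, 0x10400000

gr02, iia_offset = parse_pim_section(sample)
for name, addr in (("GR02", gr02), ("IIA Offset", iia_offset)):
    where = "kernel text" if in_kernel_text(addr, STEXT, ETEXT) else "not kernel text"
    print(f"{name} = {addr:#x} ({where})")
```

With those placeholder bounds, both values from the Processor 0 HPMC section
fall outside kernel text, which matches the remark about 0x19bf0.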

The firmware on the workstations tries to give a high level decoding
of the error:
> A Data I/O Fetch Timeout occurred while CPU 0 was
> requesting information from a device at the path 10/0/15/1 (built-in PCI device).
> 
> 
> Memory/IO Controller Error Analysis Information:
> 
> The Memory/IO Controller only observed the Broadcast Error.  It did not log
> any additional information about the HPMC.


This typically means something in the IO path didn't respond
to a CPU read.
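
If you only want to report the interesting part instead of the whole dump,
something like the following sketch could pull the CPU number and hardware
path out of the firmware's analysis sentence. The function name is invented
for illustration, and the pattern is based solely on the two sentences in
this dump; other firmware revisions may word it differently.

```python
import re

def responder_path(analysis):
    """Return (cpu, path) from the HPMC analysis text, or None if absent."""
    # Collapse line wrapping so the sentence can be matched in one piece.
    flat = " ".join(analysis.split())
    m = re.search(
        r"while CPU (\d+) was requesting information "
        r"from a device at the path ([\d/]+)",
        flat,
    )
    return (int(m.group(1)), m.group(2)) if m else None

# Sentence copied from the Processor 0 HPMC analysis in the dump.
cpu0 = """A Data I/O Fetch Timeout occurred while CPU 0 was
requesting information from a device at the path 10/0/15/1 (built-in PCI device)."""

print(responder_path(cpu0))
```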

grant

Thread overview: 23+ messages
2002-05-26  0:48 [parisc-linux] 2.4.18 SMP instability Robert Stanford
2002-05-26  6:09 ` Grant Grundler
2002-05-26  7:29   ` Jeremy Drake
2002-05-26 20:23     ` Jeremy Drake
2002-05-27  2:04       ` Grant Grundler
2002-05-27  6:17         ` Jeremy Drake
2002-05-27 12:04           ` Matthew Wilcox
2002-05-27 18:44           ` Jeremy Drake
     [not found] <Pine.LNX.4.44.0205271438590.11012-200000@garibaldi.apptechsys.com>
2002-05-28 17:07 ` Grant Grundler
2002-05-28 19:35   ` Jeremy Drake
2002-05-28 19:45     ` Jeremy Drake
2002-05-28 21:56       ` Jeremy Drake
2002-05-29  4:56         ` Grant Grundler
2002-05-29  4:39       ` Grant Grundler
2002-05-29  6:26         ` Jeremy Drake
2002-05-29  6:35           ` Grant Grundler
2002-06-01  6:34             ` Jeremy Drake
2002-06-02 16:32               ` Grant Grundler
2002-06-02 19:48                 ` Jeremy Drake
2002-06-03  3:28                   ` Grant Grundler
2002-06-03 21:58                   ` Jeremy Drake
2002-06-05 21:24                     ` Grant Grundler
     [not found] <20020527223132.661F54843@dsl2.external.hp.com>
2002-05-29 18:56 ` Jeremy Drake
