* Sun4d SMP - retests - patch was not required
From: C.Newport @ 2004-08-19 22:22 UTC
  To: sparclinux


I have done some more tests and found a surprise.
Booting with -p causes the lockup at
ioremap: done with statics switching to malloc.

This happens reliably, every time. Is this a clue?

Booting without -p boots reliably, which led me to suspect
that my earlier tests may have been misleading.
Sure enough, reverting to the SMP kernel without the Pasi
spin_lock_bh patch behaves the same, so it seems we probably
do not need the patch.

More test results:
Run top from serial console - OK

telnet in and tar -tf a kernel tarball - this sometimes completes
but always screws the keyboard input of the serial console.
If it does not complete it hangs, and the load average goes to 1.0.

Another telnet - can kill the top session but serial kbd input is
still broken - kill the serial console root shell and all is well again.
Cannot kill the hung tar process.

Could this mean that interrupts might be getting
misrouted to the wrong CPU or process?



* Re: Sun4d SMP - retests - patch was not required
From: William Lee Irwin III @ 2004-08-19 22:28 UTC
  To: sparclinux

On Thu, Aug 19, 2004 at 11:22:43PM +0100, C.Newport wrote:
> I have done some more tests and found a surprise.
> booting with -p causes the lockup at 
> ioremap: done with statics switching to malloc.
> This happens reliably, every time. Is this a clue ?.
> Booting without -p boots reliably, which led me to suspect 
> that my earlier tests may have been misleading.
> Sure enough - reverting to the SMP kernel without the Pasi
> spin_lock_bh patch does the same so it seems we probably 
> do not need this.

*VERY* interesting. Now that we know it's -p ruining it, we need to
know why it's -p doing it. Perhaps the PROM doesn't expect something
we're doing to the cpu.


On Thu, Aug 19, 2004 at 11:22:43PM +0100, C.Newport wrote:
> More test results :-
> Run top from serial console - OK
> telnet in and tar -tf a kernel tarball - this sometimes completes
> but always screws the keyboard input of the serial console.
> If it does not complete it hangs, Load average goes to 1.0.
> Another telnet - can kill the top session but serial kbd input is
> still broken - kill the serial console root shell and all is well again.
> Cannot kill the hung tar process.
> Could this mean that interrupts might be getting
> misrouted to the wrong CPU or process ?.

Yes, though we're really going to have to do more intense diagnostics
(targeted printk'ing out of various bits of state etc.) to discover
what's really happening with interrupts. Checking detailed dumps of
interrupt routing state etc. vs. 2.2.x sounds like a good first shot.


-- wli


* Re: Sun4d SMP - retests - patch was not required
From: C.Newport @ 2004-08-19 23:04 UTC
  To: sparclinux

On Thursday 19 August 2004 11:28 pm, William Lee Irwin III wrote:

> > Could this mean that interrupts might be getting
> > misrouted to the wrong CPU or process ?.
>
> Yes, though we're really going to have to do more intense diagnostics
> (targeted printk'ing out of various bits of state etc.) to discover
> what's really happening with interrupts. Checking detailed dumps of
> interrupt routing state etc. vs. 2.2.x sound like a good first shot.

Just send me the tests and I will run them.
Comparisons with 2.2 will not help; there was no SMP-capable version.

I will give 2.4.27 UP a good thrashing tomorrow - any instability there 
might give us some more clues.



* Re: Sun4d SMP - retests - patch was not required
From: William Lee Irwin III @ 2004-08-19 23:09 UTC
  To: sparclinux

On Thursday 19 August 2004 11:28 pm, William Lee Irwin III wrote:
>> Yes, though we're really going to have to do more intense diagnostics
>> (targeted printk'ing out of various bits of state etc.) to discover
>> what's really happening with interrupts. Checking detailed dumps of
>> interrupt routing state etc. vs. 2.2.x sound like a good first shot.

On Fri, Aug 20, 2004 at 12:04:52AM +0100, C.Newport wrote:
> Just send me the tests and I will run them.
> Comparisons with 2.2 will not help, there was no SMP capable version.
> I will give 2.4.27 UP a good thrashing tomorrow - any instability there 
> might give us some more clues.

These aren't tests per se, they're C code to print out extra
information. This generally needs to be done by hand and with some
insight into what you're looking at so as to find the relevant things
to print. With no extant working SMP sun4d kernel there's nothing to
really compare against, so I think we're going to have to do something
else here anyway, though some similar information may be useful for it.


-- wli
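
A lighter-weight check that needs no kernel changes is to snapshot
/proc/interrupts before and after the tar test and compare the per-CPU
counts for each IRQ. The following is a minimal sketch, not something
posted in this thread: the function names are hypothetical, and it
assumes the usual SMP layout of /proc/interrupts (a header row of CPUn
columns, then one row per IRQ). Whether the sun4d port reports per-CPU
columns in exactly this form is an assumption to verify on the machine.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [count per CPU]}.

    Assumes the first non-blank line is a header of CPUn column names.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        nums = []
        # Take at most one numeric field per CPU; stop at the first
        # non-numeric field (controller or device name).
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            nums.append(int(field))
        counts[label.strip()] = nums
    return counts


def diff_counts(before, after):
    """Return the per-CPU increase for each IRQ whose counts changed."""
    changed = {}
    for irq, new in after.items():
        old = before.get(irq)
        if old is None:
            continue  # IRQ not present in the first snapshot
        delta = [n - o for o, n in zip(old, new)]
        if any(delta):
            changed[irq] = delta
    return changed
```

To use it, read /proc/interrupts into a string before starting the tar,
read it again after the serial console input breaks, and pass both
parsed results to diff_counts. A serial IRQ whose count rises on an
unexpected CPU, or does not rise at all while keys are pressed, would
be a hint worth following up with targeted printk output.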


* Re: Sun4d SMP - retests - patch was not required
From: C.Newport @ 2004-08-19 23:46 UTC
  To: sparclinux

On Friday 20 August 2004 12:09 am, William Lee Irwin III wrote:

> On Fri, Aug 20, 2004 at 12:04:52AM +0100, C.Newport wrote:
> > Just send me the tests and I will run them.
> > Comparisons with 2.2 will not help, there was no SMP capable version.
> > I will give 2.4.27 UP a good thrashing tomorrow - any instability there
> > might give us some more clues.
>
> These aren't tests per se, they're C code to print out extra
> information. This generally needs to be done by hand and with some
> insight into what you're looking at so as to find the relevant things
> to print. With no extant working SMP sun4d kernel there's nothing to
> really compare against, so I think we're going to have to do something
> else here anyway, though some similar information may be useful for it.

I am perfectly happy adding and removing bits of C code and testing the
result - what I lack is the kernel insights. This is a clumsy way to work but
probably the best we have at this stage unless Thomas has the time to
poke around in the interrupt stuff. (Or maybe someone else will volunteer).

BTW - the -p breakage also happens in 2.4.27 UP so it is probably
a red herring. We should get it fixed anyway before it causes more
confusion. I will try the same kernel on a Sun4m machine tomorrow
and see if it also happens there. I have an SS20 with 2 x SM81 as well
as SS5 and SS10. Is there any point in checking this on Sun4c, or
is this still too broken to worry about?


