* kernel fault when booting with dual link network card
@ 2008-06-29 15:17 Sjoerd Simons
  2008-06-29 16:00 ` James Bottomley
  2008-06-29 21:05 ` Grant Grundler
  0 siblings, 2 replies; 9+ messages in thread
From: Sjoerd Simons @ 2008-06-29 15:17 UTC (permalink / raw)
  To: linux-parisc

[-- Attachment #1: Type: text/plain, Size: 245 bytes --]

Hi,

  With a dual-link network card installed, kernel 2.6.24 causes the machine
  to fault right after loading the tulip module. Output of SER PIM attached.

    Sjoerd
-- 
What the world *really* needs is a good Automatic Bicycle Sharpener.

[-- Attachment #2: tulip.cap --]
[-- Type: application/cap, Size: 4753 bytes --]


* Re: kernel fault when booting with dual link network card
  2008-06-29 15:17 kernel fault when booting with dual link network card Sjoerd Simons
@ 2008-06-29 16:00 ` James Bottomley
  2008-06-29 16:22   ` Matthew Wilcox
  2008-06-29 21:05 ` Grant Grundler
  1 sibling, 1 reply; 9+ messages in thread
From: James Bottomley @ 2008-06-29 16:00 UTC (permalink / raw)
  To: Sjoerd Simons; +Cc: linux-parisc

On Sun, 2008-06-29 at 16:17 +0100, Sjoerd Simons wrote:
> Hi,
> 
>   Kernel 2.6.24 causes the machine to fault when a dual-link network card is
>   installed right after loading the tulip module. Output of SER PIM attached.

I'm afraid just the hex dump isn't really any use.  To be possibly
useful, we need at least the symbolic addresses of IAOQ and %r2.
That's:

%r2: 0x10245748
IAOQ[0]: 0x10252bdc
IAOQ[1]: 0x10252be0

the I/O module error seems to indicate an incorrect GSC DMA read.
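
As an illustration of what "symbolic addresses" means here, a minimal
sketch of resolving the three values above against a System.map-style
symbol table. The map entries and symbol names below are made up for the
example; real names would have to come from the System.map of the kernel
that faulted, and addresses inside a loaded module (such as tulip) would
not appear in that file at all.

```python
import bisect

def load_system_map(text):
    """Parse System.map-style lines ("<hex address> <type> <name>")."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, address):
    """Return "name+0xoffset" for the nearest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return f"{name}+{address - base:#x}"

# Hypothetical map entries, for illustration only.
EXAMPLE_MAP = """\
10245000 T example_caller
10252b00 T example_faulting_function
10253000 T example_next_function
"""

symbols = load_system_map(EXAMPLE_MAP)
for label, addr in [("%r2", 0x10245748),
                    ("IAOQ[0]", 0x10252bdc),
                    ("IAOQ[1]", 0x10252be0)]:
    print(f"{label}: {addr:#010x} -> {resolve(symbols, addr)}")
```

With a real System.map in place of EXAMPLE_MAP, the same lookup would turn
the raw IAOQ and %r2 values into function names plus offsets.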

James




* Re: kernel fault when booting with dual link network card
  2008-06-29 16:00 ` James Bottomley
@ 2008-06-29 16:22   ` Matthew Wilcox
  2008-06-29 17:54     ` Joel Soete
  2008-06-29 21:11     ` Grant Grundler
  0 siblings, 2 replies; 9+ messages in thread
From: Matthew Wilcox @ 2008-06-29 16:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: Sjoerd Simons, linux-parisc

On Sun, Jun 29, 2008 at 11:00:34AM -0500, James Bottomley wrote:
> On Sun, 2008-06-29 at 16:17 +0100, Sjoerd Simons wrote:
> > Hi,
> > 
> >   Kernel 2.6.24 causes the machine to fault when a dual-link network card is
> >   installed right after loading the tulip module. Output of SER PIM attached.
> 
> I'm afraid just the hex dump isn't really any use.  To be possibly
> useful, we need at least the symbolic addresses of IAOQ and %r2.
> That's:
> 
> %r2: 0x10245748
> IAOQ[0]: 0x10252bdc
> IAOQ[1]: 0x10252be0
> 
> the I/O module error seems to indicate an incorrect GSC DMA read.

I don't think it's going to tell us anything useful.  I believe that
we've not set up the cardmode Dino correctly to respond to iomem space
and as a result the first access to iomem space will fault.

Of course, this is a machine with CCIO, so it could be something going
wrong with the CCIO programming too.  But I think it's Dino.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: kernel fault when booting with dual link network card
  2008-06-29 16:22   ` Matthew Wilcox
@ 2008-06-29 17:54     ` Joel Soete
  2008-06-29 17:57       ` Sjoerd Simons
  2008-06-29 21:11     ` Grant Grundler
  1 sibling, 1 reply; 9+ messages in thread
From: Joel Soete @ 2008-06-29 17:54 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: James Bottomley, Sjoerd Simons, linux-parisc



Matthew Wilcox wrote:
> On Sun, Jun 29, 2008 at 11:00:34AM -0500, James Bottomley wrote:
>> On Sun, 2008-06-29 at 16:17 +0100, Sjoerd Simons wrote:
>>> Hi,
>>>
>>>   Kernel 2.6.24 causes the machine to fault when a dual-link network card is
>>>   installed right after loading the tulip module. Output of SER PIM attached.
>> I'm afraid just the hex dump isn't really any use.  To be possibly
>> useful, we need at least the symbolic addresses of IAOQ and %r2.
>> That's:
>>
>> %r2: 0x10245748
>> IAOQ[0]: 0x10252bdc
>> IAOQ[1]: 0x10252be0
>>
>> the I/O module error seems to indicate an incorrect GSC DMA read.
> 
> I don't think it's going to tell us anything useful.  I believe that
> we've not set up the cardmode Dino correctly to respond to iomem space
> and as a result the first access to iomem space will fault.
> 
> Of course, this is a machine with CCIO, so it could be something going
> wrong with the CCIO programming too.

Yes, especially if the system is low on RAM (e.g. when I had to reduce the
RAM of my c110 from 512M to 64M, it became impossible to boot it ;-( ).
After thinking for a long time that it was a coherency problem, I am now
convinced that this is the key problem: my d380 boots fine with 256M, but I
resurrected 'ccio_mem_ratio', which artificially reduces iova_space_size,
and although I can still boot it, disk issues occur more quickly (a simple
tar -xvf of a big file is enough now).
I have now managed to put many trace_mark() calls in this driver, and thanks
to Mathieu Desnoyers's patch and his help, I can now collect a lot of info
without degrading system performance too much. I just need more time to
collect relevant info and analyze it ;-)

> But I think it's Dino.
>
That hypothesis is easy to verify: just remove this card.

hth,
	J.


* Re: kernel fault when booting with dual link network card
  2008-06-29 17:54     ` Joel Soete
@ 2008-06-29 17:57       ` Sjoerd Simons
  2008-06-29 18:11         ` Joel Soete
  0 siblings, 1 reply; 9+ messages in thread
From: Sjoerd Simons @ 2008-06-29 17:57 UTC (permalink / raw)
  To: Joel Soete; +Cc: Matthew Wilcox, James Bottomley, linux-parisc

On Sun, Jun 29, 2008 at 05:54:15PM +0000, Joel Soete wrote:
>   But I think it's Dino.
>>
> hypothesis easy to verify: just remove this card.

Without the card the machine seems to work fine indeed.

  Sjoerd
-- 
"There are three principal ways to lose money: wine, women, and engineers.
While the first two are more pleasant, the third is by far the more certain."
		-- Baron Rothschild, ca. 1800


* Re: kernel fault when booting with dual link network card
  2008-06-29 17:57       ` Sjoerd Simons
@ 2008-06-29 18:11         ` Joel Soete
  2008-06-29 18:38           ` Sjoerd Simons
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Soete @ 2008-06-29 18:11 UTC (permalink / raw)
  To: Sjoerd Simons; +Cc: Matthew Wilcox, James Bottomley, linux-parisc



Sjoerd Simons wrote:
> On Sun, Jun 29, 2008 at 05:54:15PM +0000, Joel Soete wrote:
>>   But I think it's Dino.
>> hypothesis easy to verify: just remove this card.
> 
> Without the card the machine seems to work fine indeed.
> 
Thanks for the feedback.

BTW, sorry if I missed it, but what kind of system is it: a C, D or K model?

Tx,
	J.

>   Sjoerd


* Re: kernel fault when booting with dual link network card
  2008-06-29 18:11         ` Joel Soete
@ 2008-06-29 18:38           ` Sjoerd Simons
  0 siblings, 0 replies; 9+ messages in thread
From: Sjoerd Simons @ 2008-06-29 18:38 UTC (permalink / raw)
  To: Joel Soete; +Cc: Matthew Wilcox, James Bottomley, linux-parisc

On Sun, Jun 29, 2008 at 06:11:49PM +0000, Joel Soete wrote:
>
>
> Sjoerd Simons wrote:
>> On Sun, Jun 29, 2008 at 05:54:15PM +0000, Joel Soete wrote:
>>>   But I think it's Dino.
>>> hypothesis easy to verify: just remove this card.
>>
>> Without the card the machine seems to work fine indeed.
>>
> tx for feedback.
>
> btw, sorry if I miss it but what kind system is: a C, D or a K model?

This is a D9000 machine with a D220 processor

  Sjoerd
-- 
All science is either physics or stamp collecting.
		-- Ernest Rutherford


* Re: kernel fault when booting with dual link network card
  2008-06-29 15:17 kernel fault when booting with dual link network card Sjoerd Simons
  2008-06-29 16:00 ` James Bottomley
@ 2008-06-29 21:05 ` Grant Grundler
  1 sibling, 0 replies; 9+ messages in thread
From: Grant Grundler @ 2008-06-29 21:05 UTC (permalink / raw)
  To: Sjoerd Simons; +Cc: linux-parisc

On Sun, Jun 29, 2008 at 04:17:59PM +0100, Sjoerd Simons wrote:
> Hi,
> 
>   Kernel 2.6.24 causes the machine to fault when a dual-link network card is
>   installed right after loading the tulip module. Output of SER PIM attached.

BTW, once you've captured a PIM, it needs to be cleared with "ser clearpim"
before it will record the next one.

And thanks - this confirmed what several had suspected:

Timestamp =   Sun Jun  29 14:45:18 GMT 2008    (20:08:06:29:14:45:18)

Bus      HPA       Module Type      Path  Slt Md Sev Estat Requestor Responder
----- ---------- ---------------- -------- -- -- --- ---- ---------- ----------
GSC   0x0000006d A DMA I/O        8/8       2  0 fe  0x03 0x00000000 0x00000000

Typing "io info" will dump all the IO devices and you can match the "8/8"
path to the offending device. I suspect it's the card-mode Dino as well.
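
For anyone wanting to pull the path out of such a dump programmatically,
a minimal sketch that splits the I/O module error row shown above into
named fields. The column layout is inferred from that single row and its
header, so treat the field names and the parsing rule as assumptions and
not as a description of the PIM format in general.

```python
from typing import NamedTuple

class IoModuleError(NamedTuple):
    bus: str
    hpa: int
    module_type: str
    path: str
    slot: str
    md: str
    sev: str
    estat: int
    requestor: int
    responder: int

def parse_io_error_row(row):
    """Split one whitespace-separated row; the module type may contain spaces."""
    tokens = row.split()
    if len(tokens) < 10:
        raise ValueError("row has too few columns")
    bus, hpa = tokens[0], tokens[1]
    # The last seven columns are single tokens; whatever sits between the
    # HPA and those columns is taken to be the module type.
    path, slot, md, sev, estat, requestor, responder = tokens[-7:]
    module_type = " ".join(tokens[2:-7])
    return IoModuleError(bus, int(hpa, 16), module_type, path, slot, md, sev,
                         int(estat, 16), int(requestor, 16), int(responder, 16))

ROW = ("GSC   0x0000006d A DMA I/O        8/8       2  0 fe  "
       "0x03 0x00000000 0x00000000")
entry = parse_io_error_row(ROW)
print(entry.path, "|", entry.module_type)
```

The extracted path ("8/8" here) is what would be matched against the
"io info" listing.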

We probably need to enable "PCI_DEBUG" which depends on "DEBUG_KERNEL".
Enable both of those, then capture the console output when booting
that kernel. We can then walk through the Dino and PCI code to see
what didn't get set up correctly.
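
Before rebuilding, it may help to confirm both options really are enabled
in the kernel's .config. A minimal sketch, assuming the usual CONFIG_
prefix for the two Kconfig symbols named above; the sample text is made up
and a real check would read the .config of the build tree instead.

```python
REQUIRED = ("CONFIG_DEBUG_KERNEL", "CONFIG_PCI_DEBUG")

def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines count as disabled
        key, _, value = line.partition("=")
        if value == "y":
            enabled.add(key)
    return [opt for opt in required if opt not in enabled]

# Hypothetical .config excerpt, for illustration only.
sample = """\
CONFIG_DEBUG_KERNEL=y
# CONFIG_PCI_DEBUG is not set
"""
print(missing_options(sample))
```

An empty list would mean both options are on and the debug output should
show up on the console at boot.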

thanks,
grant


* Re: kernel fault when booting with dual link network card
  2008-06-29 16:22   ` Matthew Wilcox
  2008-06-29 17:54     ` Joel Soete
@ 2008-06-29 21:11     ` Grant Grundler
  1 sibling, 0 replies; 9+ messages in thread
From: Grant Grundler @ 2008-06-29 21:11 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: James Bottomley, Sjoerd Simons, linux-parisc

On Sun, Jun 29, 2008 at 10:22:21AM -0600, Matthew Wilcox wrote:
...
> > the I/O module error seems to indicate an incorrect GSC DMA read.
> 
> I don't think it's going to tell us anything useful.  I believe that
> we've not set up the cardmode Dino correctly to respond to iomem space
> and as a result the first access to iomem space will fault.

It depends on whether the first access is a read or a write. If it's a read
(and odds are good that's true), we will get the exact IP that faulted 
and get something about "CPU timeout" in the PIM dump.

> Of course, this is a machine with CCIO, so it could be something going
> wrong with the CCIO programming too.  But I think it's Dino.

I doubt it's CCIO too. I agree it's much more likely something with Dino.

thanks,
grant

