SCSI crashes with vger

linuxppc-dev.lists.ozlabs.org archive mirror

* SCSI crashes with vger
@ 1999-03-18 17:31 Benjamin Herrenschmidt
  1999-03-18 22:55 ` Tom Rini
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 1999-03-18 17:31 UTC (permalink / raw)
  To: linuxppc-dev

Both the latest vger (2.2.4) and Paul's latest kernel (2.2.2) crash in the
NCR53C8XX driver, in ncr_wakeup_done, just after having probed the SCSI
controllers (the NCR and MESH) and before (or when) probing devices.

This configuration used to work well until (and including) 2.2.1.

I tried tracking this down, but I'm a bit short on time and I don't know
the SCSI subsystem very well. The exception reported by xmon is:

vector: 700 at pc = c012b010, msr = 81032, sp = c019bbe8 [c019baf8]
current = c0199f58, pid = 0, comm = swapper

and it crashes somewhere in the swapper task (usual place for crashing ;-)

If anyone has a clue, some help would be welcome; I'll continue tracing on
my side. I don't have the 2.2.1 source handy (to see what changed).

-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>

[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]

* Re: SCSI crashes with vger
  1999-03-18 17:31 SCSI crashes with vger Benjamin Herrenschmidt
@ 1999-03-18 22:55 ` Tom Rini
  1999-03-18 23:07   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Rini @ 1999-03-18 22:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Thu, 18 Mar 1999, Benjamin Herrenschmidt wrote:

> Both the latest vger (2.2.4) and Paul's latest kernel (2.2.2) crash in the
> NCR53C8XX driver, in ncr_wakeup_done, just after having probed the SCSI
> controllers (the NCR and MESH) and before (or when) probing devices.

Erg <paranoid worry>. In 2.2.2 there was a "scsi fix" which seems to have
fixed the problem that some people (myself, Dan J) had when trying to do
things like burn a CD on the ext chain or use both chains at once.  Now
it's working well enough for me to do a silly thing like a raid0 /usr :)
Hope this isn't related.. </paranoid worry>

---
Tom Rini (TR1265)
http://dobbstown.yeti.edu/

* Re: SCSI crashes with vger
  1999-03-18 22:55 ` Tom Rini
@ 1999-03-18 23:07   ` Benjamin Herrenschmidt
  1999-03-18 23:18     ` Tom Rini
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 1999-03-18 23:07 UTC (permalink / raw)
  To: Tom Rini, linuxppc-dev

On Thu, Mar 18, 1999, Tom Rini <tmrini@ntplx.net> wrote:

>Erg <paranoid worry>. In 2.2.2 there was a "scsi fix" which seems to have
>fixed the problem that some people (myself, Dan J) had when trying to do
>things like burn a CD on the ext chain or use both chains at once.  Now
>it's working well enough for me to do a silly thing like a raid0 /usr :)
>Hope this isn't related.. </paranoid worry>

Do you still have the patch around so that I can revert it and see what
happens? I'm really not familiar enough with Linux to track this one down.
I tried, but got deep inside the SCSI subsystem; it looks like it's crashing
when completing the request, with a program check exception (which can
mean a lot of things; I still have to find out how to display SRR1 from xmon).
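
A note on reading the exception: vector 700 is the program check exception,
and on classic 32-bit PowerPC the reason is encoded in SRR1. The sketch below
decodes those reason bits in plain C. It assumes that the msr value printed
by xmon in the first message (81032) is the saved SRR1; the macro and
function names are made up for illustration and are not kernel or xmon
identifiers.

```c
#include <stdint.h>

/* Program-check reason bits in SRR1 (classic 32-bit PowerPC).
 * The macro names are illustrative, not taken from any header. */
#define SRR1_PROG_FPE   0x00100000u  /* floating-point enabled exception */
#define SRR1_PROG_ILL   0x00080000u  /* illegal instruction */
#define SRR1_PROG_PRIV  0x00040000u  /* privileged instruction */
#define SRR1_PROG_TRAP  0x00020000u  /* trap instruction */

/* Hypothetical helper: turn a saved SRR1 value into a readable reason. */
const char *prog_check_reason(uint32_t srr1)
{
    if (srr1 & SRR1_PROG_FPE)
        return "floating-point enabled exception";
    if (srr1 & SRR1_PROG_ILL)
        return "illegal instruction";
    if (srr1 & SRR1_PROG_PRIV)
        return "privileged instruction";
    if (srr1 & SRR1_PROG_TRAP)
        return "trap";
    return "unknown";
}
```

Under that assumption, prog_check_reason(0x81032) reports an illegal
instruction, since bit 0x00080000 is set in that value. That would be
consistent with the CPU executing from memory that had been overwritten,
but it does not prove it.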

The first 2.2.2 I got from Paul's rsync crashed when probing my disk on
the MESH bus (the NCR bus passed fine) and worked with this external disk
powered off. I re-rsynced from Paul some time later and now the NCR crashes
too, as does vger.

-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>

* Re: SCSI crashes with vger
  1999-03-18 23:07   ` Benjamin Herrenschmidt
@ 1999-03-18 23:18     ` Tom Rini
  1999-03-19 17:12       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Rini @ 1999-03-18 23:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Fri, 19 Mar 1999, Benjamin Herrenschmidt wrote:

> On Thu, Mar 18, 1999, Tom Rini <tmrini@ntplx.net> wrote:
> 
> >Erg <paranoid worry>. In 2.2.2 there was a "scsi fix" which seems to have
> >fixed the problem that some people (myself, Dan J) had when trying to do
> >things like burn a CD on the ext chain or use both chains at once.  Now
> >it's working well enough for me to do a silly thing like a raid0 /usr :)
> >Hope this isn't related.. </paranoid worry>
> 
> Do you still have the patch around so that I can revert it and see what
> happens? I'm really not familiar enough with Linux to track this one down.
> I tried, but got deep inside the SCSI subsystem; it looks like it's crashing
> when completing the request, with a program check exception (which can
> mean a lot of things; I still have to find out how to display SRR1 from xmon).

The patch that "fixed" things before was incorrect and added some
cli()s, I think.  The actual fix shouldn't be too hard to yank out of
patch-2.2.2, on any of the kernel.org mirrors.

---
Tom Rini (TR1265)
http://dobbstown.yeti.edu/

* Re: SCSI crashes with vger
  1999-03-18 23:18     ` Tom Rini
@ 1999-03-19 17:12       ` Benjamin Herrenschmidt
  1999-03-19 20:32         ` Tom Rini
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 1999-03-19 17:12 UTC (permalink / raw)
  To: Tom Rini, linuxppc-dev

On Thu, Mar 18, 1999, Tom Rini <tmrini@ntplx.net> wrote:

>The patch that "fixed" things before was incorrect and added some
>cli()s, I think.  The actual fix shouldn't be too hard to yank out of
>patch-2.2.2, on any of the kernel.org mirrors.

I didn't find anything in the SCSI code that looks like what you
describe in the 2.2.2 and 2.2.3 patches on kernel.org, but the fix may have
been elsewhere...

As far as I can tell, the problem is a page fault in the interrupt handler of
the NCR driver, when completing a request. I'm not familiar enough with
either the SCSI subsystem or the Linux VM to conclude anything from that, but
I'm still trying to figure out what's happening. Some help (directions to
look into) would be welcome. I'll post a more detailed report of what I've
found so far later today.

-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>

* Re: SCSI crashes with vger
  1999-03-19 17:12       ` Benjamin Herrenschmidt
@ 1999-03-19 20:32         ` Tom Rini
  1999-03-19 20:41           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Rini @ 1999-03-19 20:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

On Fri, 19 Mar 1999, Benjamin Herrenschmidt wrote:

> On Thu, Mar 18, 1999, Tom Rini <tmrini@ntplx.net> wrote:
> 
> >The patch that "fixed" things before was incorrect and added some
> >cli()s, I think.  The actual fix shouldn't be too hard to yank out of
> >patch-2.2.2, on any of the kernel.org mirrors.
> 
> I didn't find anything in the SCSI code that looks like what you
> describe in the 2.2.2 and 2.2.3 patches on kernel.org, but the fix may have
> been elsewhere...

Er, I guess I wasn't too clear.  The wrong patch looked nothing like the
right patch. :)  The "right" patch was the generic SCSI fix listed in the
2.2.2 release notes; not sure which files, though.  However, in skimming the
2.2.2 patch I saw some changes to linux/drivers/scsi/ncr53c8xx.c.  Reversing
these had no effect, I take it? (I didn't look at the full context of where
the diff went; it might not affect us at all.)

---
Tom Rini (TR1265)
http://dobbstown.yeti.edu/

* Re: SCSI crashes with vger
  1999-03-19 20:32         ` Tom Rini
@ 1999-03-19 20:41           ` Benjamin Herrenschmidt
  1999-03-21  0:42             ` Douglas Godfrey
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 1999-03-19 20:41 UTC (permalink / raw)
  To: Tom Rini, linuxppc-dev

On Fri, Mar 19, 1999, Tom Rini <tmrini@ntplx.net> wrote:

>Er, I guess I wasn't too clear.  The wrong patch looked nothing like the
>right patch. :)  The "right" patch was the generic SCSI fix listed in the
>2.2.2 release notes; not sure which files, though.  However, in skimming the
>2.2.2 patch I saw some changes to linux/drivers/scsi/ncr53c8xx.c.  Reversing
>these had no effect, I take it? (I didn't look at the full context of where
>the diff went; it might not affect us at all.)

I saw them but I didn't think they were related to the problem. I may have
been wrong in my judgment, however, so I'll give this a closer look. I haven't
seen any related fix to the generic SCSI code (only in some drivers), or
anything that looks related to this bug. I may have missed something, and
I'll look more closely.

Apparently, adding a save_flags()/cli()/restore_flags() in the
ncr_complete() function makes the code go a little bit further (to just
after the restore_flags() in my first test). I'm still moving the
restore_flags() around to find the exact critical region, but my
first impression is that part of the request structure itself (the
structure or some associated stuff) is being deallocated by another
interrupt. I still have to determine whether another NCR interrupt happens at
this point or whether it's something coming from the MESH driver.
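
For readers unfamiliar with the idiom, the save_flags()/cli()/restore_flags()
bracket described above can be modelled in user space as follows. This is
only a sketch: the three primitives are stand-in macros that toggle a boolean
rather than the real kernel ones, and struct request and complete_request()
are hypothetical names, not the actual ncr_complete() code.

```c
#include <stdbool.h>

/* User-space stand-ins for the 2.2-era kernel primitives. Assumption:
 * one boolean models the CPU's interrupt-enable state. */
static bool irqs_enabled = true;
#define save_flags(f)    ((f) = irqs_enabled)
#define cli()            (irqs_enabled = false)
#define restore_flags(f) (irqs_enabled = ((f) != 0))

struct request {            /* hypothetical request structure */
    int done;
};

/* Hypothetical completion routine, not the real ncr_complete(). */
void complete_request(struct request *req)
{
    unsigned long flags;

    save_flags(flags);      /* remember whether interrupts were enabled */
    cli();                  /* from here on no other handler can run */
    req->done = 1;          /* touch the request inside the critical region */
    restore_flags(flags);   /* put the interrupt state back as it was */
}
```

The point of saving and restoring, rather than pairing cli() with sti(), is
that the routine leaves interrupts disabled if the caller had already
disabled them. Moving restore_flags() earlier or later in such a routine
narrows or widens the protected region, which is the bisection described in
the message.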

-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>

* Re: SCSI crashes with vger
  1999-03-19 20:41           ` Benjamin Herrenschmidt
@ 1999-03-21  0:42             ` Douglas Godfrey
  1999-03-21  5:28               ` Matroxfb, PReP, and ioremap Troy Benjegerdes
  0 siblings, 1 reply; 9+ messages in thread
From: Douglas Godfrey @ 1999-03-21  0:42 UTC (permalink / raw)
  To: linuxppc-dev


Reply to Benjamin Herrenschmidt, 3/19/99 9:41 PM +0100: Re: SCSI crashes
with vger
>On Fri, Mar 19, 1999, Tom Rini <tmrini@ntplx.net> wrote:
>
>>Er, I guess I wasn't too clear.  The wrong patch looked nothing like the
>>right patch. :)  The "right" patch was the generic SCSI fix listed in the
>>2.2.2 release notes; not sure which files, though.  However, in skimming the
>>2.2.2 patch I saw some changes to linux/drivers/scsi/ncr53c8xx.c.  Reversing
>>these had no effect, I take it? (I didn't look at the full context of where
>>the diff went; it might not affect us at all.)
>
>I saw them but I didn't think they were related to the problem. I may have
>been wrong in my judgment, however, so I'll give this a closer look. I haven't
>seen any related fix to the generic SCSI code (only in some drivers), or
>anything that looks related to this bug. I may have missed something, and
>I'll look more closely.
>
>Apparently, adding a save_flags()/cli()/restore_flags() in the
>ncr_complete() function makes the code go a little bit further (to just
>after the restore_flags() in my first test). I'm still moving the
>restore_flags() around to find the exact critical region, but my
>first impression is that part of the request structure itself (the
>structure or some associated stuff) is being deallocated by another
>interrupt. I still have to determine whether another NCR interrupt happens at
>this point or whether it's something coming from the MESH driver.
>
>
The interrupt code that completes an I/O request must verify that the
interrupt is actually an I/O completion before deallocating any structures
relating to a request. The DMA engine in the NCR chip could still be in
progress and the page containing the SCSI command list could be re-used,
corrupting the SCSI commands and causing the NCR chip to stomp on any
random piece of memory.

This may be caused by misinterpreting an interrupt in code that (wrongly)
assumes that there is no other device active that can cause an interrupt.

If this error only happens on G3 or 604e CPUs, then code should be
added to the interrupt handler to synchronize the CPU with the NCR chip,
i.e. sync(), then read, write and read again the NCR chip's status register.
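
As a rough illustration of the rule above (check that the interrupt really is
a completion before releasing anything), here is a user-space sketch. The
status bit STAT_DONE, struct io_request and handle_irq() are all hypothetical
and do not correspond to the real NCR53C8XX register layout or driver code;
on real PowerPC hardware a sync barrier would also be needed around the
register accesses.

```c
#include <stdint.h>
#include <stdlib.h>

#define STAT_DONE 0x01u          /* hypothetical "command complete" bit */

struct io_request {              /* hypothetical request structure */
    void *cmd_list;              /* memory the chip may still be using */
};

/* Returns 1 if the request was completed and released, 0 if the
 * interrupt was not a completion and nothing was touched. */
int handle_irq(volatile uint32_t *status_reg, struct io_request *req)
{
    uint32_t status = *status_reg;   /* read the chip's status */

    if (!(status & STAT_DONE))
        return 0;                    /* not ours, or DMA still in progress */

    *status_reg = status;            /* write the status back */
    (void)*status_reg;               /* read again so the write has landed */

    free(req->cmd_list);             /* only now release the command list */
    req->cmd_list = NULL;
    return 1;
}
```

With this shape, an interrupt raised by some other device leaves the request
untouched, so the command list cannot be reused while the chip may still be
reading it.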

Thanx...
  Doug



* Matroxfb, PReP, and ioremap
  1999-03-21  0:42             ` Douglas Godfrey
@ 1999-03-21  5:28               ` Troy Benjegerdes
  0 siblings, 0 replies; 9+ messages in thread
From: Troy Benjegerdes @ 1999-03-21  5:28 UTC (permalink / raw)
  To: linuxppc-dev


It seems as though ioremap has been partially or completely broken on PReP
for a while, and I'm thinking now would be a good time to fix it nicely,
with the rework of the machine-dependent code. Any suggestions on how to do
this without breaking other drivers?

With the following patches to Matroxfb, I now get the dual penguins on my
MTX board on bootup ;)

This only works because the PCI code reports the PCI memory space at
0xC0000000, and this space is mapped to 0xD0000000 on PReP (thus the
phys + 0x10000000).
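
The arithmetic behind that offset, written out as a tiny C sketch. The two
base addresses come from the paragraph above; the macro and function names
are invented for illustration and are not part of matroxfb or the PReP code.

```c
#include <stdint.h>

/* Difference between where PReP maps PCI memory (0xD0000000) and where
 * the PCI code reports it (0xC0000000), per the message above. */
#define PREP_PCI_MEM_OFFSET 0x10000000u

/* Hypothetical helper: translate a reported PCI memory address to the
 * address at which it is actually reachable on this setup. */
uint32_t prep_pci_to_cpu(uint32_t pci_addr)
{
    return pci_addr + PREP_PCI_MEM_OFFSET;
}
```

For example, prep_pci_to_cpu(0xC0000000) gives 0xD0000000. A hard-coded
offset like this only holds for this particular mapping, which is why the
patch below labels it a cheap hack.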

[hozer@kalmia video]$ cvs diff -u matroxfb.c
Index: matroxfb.c
===================================================================
RCS file: /cvsroot/linux/drivers/video/matroxfb.c,v
retrieving revision 1.1.1.6
diff -u -r1.1.1.6 matroxfb.c
--- matroxfb.c  1999/03/10 19:38:29     1.1.1.6
+++ matroxfb.c  1999/03/20 22:50:50
@@ -187,7 +187,7 @@
 #if defined(CONFIG_PPC) && defined(CONFIG_PREP) && defined(_ISA_MEM_BASE)
 /* do not tell me that PPC is not broken... if ioremap() oops with
    invalid value written to msr... */
-#define MAP_ISAMEMBASE
+#define MAP_BOGUS
 #else
 #define MAP_IOREMAP
 #endif
@@ -351,11 +351,9 @@
 #ifdef MAP_BUSTOVIRT
        virt->vaddr = bus_to_virt(phys);
 #else
-#ifdef MAP_ISAMEMBASE
-       virt->vaddr = (void*)(phys + _ISA_MEM_BASE);
-#else
-#error "Your architecture does not have neither ioremap nor bus_to_virt... Giving up"
-#endif
+       virt->vaddr = phys+0x10000000; /* cheap hack */
+#warning "Cheap hack for my MTX and Millenium I"
+/*#error "Your architecture does not have neither ioremap nor bus_to_virt... Giving up"*/
 #endif
 #endif
        return (virt->vaddr == 0); /* 0, !0... 0, error_code in future */

--------------------------------------------------------------------------
| Troy Benjegerdes    |       troy@microux.com     |    hozer@drgw.net   |
|    Unix is user friendly... You just have to be friendly to it first.  |
| This message composed with 100% free software.    http://www.gnu.org   |
--------------------------------------------------------------------------

