[parisc-linux] 53c700 (LASI SCSI 53c700) hang


* [parisc-linux] 53c700 (LASI SCSI 53c700) hang
From: Helge Deller @ 2002-02-03 17:29 UTC (permalink / raw)
  To: James.Bottomley; +Cc: parisc-linux

Hi James,

today my 715/64 completely stopped after some "dpkg -i"-ing.
According to the following lines from the serial console the new
SCSI driver stopped talking to my harddisk.
I'm sending this bug report now because this has happened
~3 times in the last 2 months.
Any ideas ?

TIA,
Helge

This is what I got on my serial console. Sadly some things before it
are missing:

scsi0 (6:0) New error handler wants to abort command
        0x00 00 00 00 00 00
scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 6 lun 0
scsi0 (6:0) New error handler wants to abort command
        0x00 00 00 00 00 00
scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 6 lun 0
scsi0 (6:0) New error handler wants to abort command
        0x00 00 00 00 00 00
scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 6 lun 0


from the dmesg:
SCSI subsystem driver Revision: 1.00
53c700: Version 2.6 By James.Bottomley@HansenPartnership.com
scsi0: 53c710 rev 2                                         
scsi0 : LASI SCSI 53c700
scsi0: (6:0) Synchronous at offset 8, period 100ns
  Vendor: QUANTUM   Model: FIREBALL_TM3200S  Rev: 300X
  Type:   Direct-Access                      ANSI SCSI revision: 02
Attached scsi disk sda at scsi0, channel 0, id 6, lun 0            
scsi0: (6:0) Enabling Tag Command Queuing              
SCSI device sda: 6281856 512-byte hdwr sectors (3216 MB)
Partition check:                                        
 sda: sda1 sda2 sda3 sda4


* [parisc-linux] Re: 53c700 (LASI SCSI 53c700) hang
From: Helge Deller @ 2002-02-03 18:29 UTC (permalink / raw)
  To: James.Bottomley; +Cc: parisc-linux

Hi James,

it just happened again while shutting down the system - thus
now I have a little more verbose output:

Unmounting remote filesystems... scsi0 (6:0) New error handler wants to abort command
        0x2a 00 00 14 0e 5c 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 14 0e cc 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 18 0e 1c 00 00 10 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 18 0e a4 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 2c 0e 54 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 30 0e 2c 00 00 10 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 30 0e 4c 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 30 0f ac 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 30 10 2c 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 30 10 5c 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 30 10 84 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 08 0d fc 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 10 0e 1c 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 10 0e 44 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants to abort command                                                         
        0x2a 00 00 10 0e 54 00 00 10 00                                                                      
scsi0 (6:0) New error handler wants device reset                                                             
        0x2a 00 00 14 0e 5c 00 00 08 00                                                                      
scsi0 (6:0) New error handler wants BUS reset, cmd 10040c00                                                  
        0x2a 00 00 14 0e 5c 00 00 08 00                                                                      
scsi0: Bus Reset detected, executing command 00000000, slot 00000000, dsp 000504a8[04a8]
 failing command because of reset, slot 00010520, cmnd 10040400                                              
 failing command because of reset, slot 00010654, cmnd 10040200                                              
 failing command because of reset, slot 00010788, cmnd 10040600                                              
 failing command because of reset, slot 000108bc, cmnd 10041e00                                              
 failing command because of reset, slot 000109f0, cmnd 10041200                                              
 failing command because of reset, slot 00010b24, cmnd 10041c00                                              
 failing command because of reset, slot 00010c58, cmnd 10040800                                              
 failing command because of reset, slot 00010ec0, cmnd 10040a00                                              
 failing command because of reset, slot 00010ff4, cmnd 10041400                                              
 failing command because of reset, slot 00011128, cmnd 10040c00                                              
 failing command because of reset, slot 0001125c, cmnd 10041000                                              
 failing command because of reset, slot 00011390, cmnd 10040000                                              
 failing command because of reset, slot 000114c4, cmnd 10041600                                              
 failing command because of reset, slot 000115f8, cmnd 10041800                                              
 failing command because of reset, slot 0001172c, cmnd 10041a00                                              
scsi0: (6:0) Synchronous at offset 8, period 100ns                                                           
scsi0 (6:0) New error handler wants to abort command                                                         
        0x03 00 00 00 40 00                                                                                  
scsi: device set offline - not ready or command retry failed after bus reset: host 0 channel 0 id 6 lun 0
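
For reading the log above: the hex line under each "wants to abort command" message is the SCSI command descriptor block (CDB) of the command being aborted. Opcode 0x2a is WRITE(10), 0x03 is REQUEST SENSE and 0x00 is TEST UNIT READY. The following is a minimal sketch of a decoder for those lines, not code from the driver; the function names are made up for illustration, and only the standard 10-byte CDB layout (LBA in bytes 2-5, transfer length in bytes 7-8, both big-endian) is assumed.

```c
#include <stdint.h>

/* Opcodes that appear in the log above (standard SCSI values). */
#define SCSI_TEST_UNIT_READY 0x00
#define SCSI_REQUEST_SENSE   0x03
#define SCSI_WRITE_10        0x2a

/* Hypothetical helper: short name for the opcodes seen in this thread. */
const char *cdb_opcode_name(uint8_t opcode)
{
    switch (opcode) {
    case SCSI_TEST_UNIT_READY: return "TEST UNIT READY";
    case SCSI_REQUEST_SENSE:   return "REQUEST SENSE";
    case SCSI_WRITE_10:        return "WRITE(10)";
    default:                   return "unknown";
    }
}

/* Logical block address of a 10-byte CDB: bytes 2..5, big-endian. */
uint32_t cdb10_lba(const uint8_t cdb[10])
{
    return ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
           ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
}

/* Transfer length in blocks of a 10-byte CDB: bytes 7..8, big-endian. */
uint16_t cdb10_blocks(const uint8_t cdb[10])
{
    return (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
}
```

Fed the first aborted command, 2a 00 00 14 0e 5c 00 00 08 00, these helpers report a WRITE(10) of 8 blocks (4 KB with 512-byte sectors) at LBA 0x140e5c. So the aborted commands are ordinary small writes that were queued to the disk at the same time.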


Helge





* [parisc-linux] Re: 53c700 (LASI SCSI 53c700) hang
From: James Bottomley @ 2002-02-04 19:50 UTC (permalink / raw)
  To: Helge Deller; +Cc: James.Bottomley, parisc-linux

What these errors tell me is that your HD accepted more tags than it could 
cope with and then choked.  The Linux error handler isn't very good at handling 
this situation.  Also, your disc:

deller@gmx.de said:
>   Vendor: QUANTUM   Model: FIREBALL_TM3200S  Rev: 300X 

is a known troublemaker with tagged command queueing.  Initially, try taking 
the #define NCR_700_MAX_TAGS in drivers/scsi/53c700.h down to 4 or 2 and 
recompiling the driver.  Alternatively, turn off tagged command queueing 
altogether by commenting out this block of code:

	if(SCp->device->tagged_supported && !SCp->device->tagged_queue
	   && (hostdata->tag_negotiated &(1<<SCp->target)) == 0
	   && NCR_700_is_flag_clear(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING)) {
		/* upper layer has indicated tags are supported.  We don't
		 * necessarily believe it yet.
		 *
		 * NOTE: There is a danger here: the mid layer supports
		 * tag queuing per LUN.  We only support it per PUN because
		 * of potential reselection issues */
		printk(KERN_INFO "scsi%d: (%d:%d) Enabling Tag Command Queuing\n", SCp->device->host->host_no, SCp->target, SCp->lun);
		hostdata->tag_negotiated |= (1<<SCp->target);
		NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING);
		SCp->device->tagged_queue = 1;
	}

in drivers/scsi/53c700.c at about line 1891.

I am getting around to adding the code changes so that this can be set through 
module/kernel command line options.
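
As an illustration only, here is a small standalone model of the two workarounds described above: capping the tag depth, or switching tagged queueing off. It is not driver code. The struct, the function names and the "max_tags == 0 means disabled" convention are assumptions made for this sketch; only the per-target bitmask idea (tag_negotiated |= 1<<target) is taken from the quoted block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not driver code: one bit per SCSI target ID (0..7). */
struct tag_state {
    uint8_t  negotiated;  /* bit n set: tags enabled for target n */
    unsigned max_tags;    /* queue depth cap; 0 means tags are disabled */
};

/*
 * Mirrors the shape of the quoted condition, with an extra max_tags gate.
 * Returns true only when this call switched tag queueing on for the target.
 */
bool tag_try_enable(struct tag_state *st, unsigned target, bool tagged_supported)
{
    if (target > 7 || st->max_tags == 0 || !tagged_supported)
        return false;
    if (st->negotiated & (1u << target))
        return false;                 /* already negotiated for this target */
    st->negotiated |= (uint8_t)(1u << target);
    return true;
}

/* Commands the model allows in flight for a target (1 when untagged). */
unsigned tag_queue_depth(const struct tag_state *st, unsigned target)
{
    if (target > 7)
        return 0;
    return (st->negotiated & (1u << target)) ? st->max_tags : 1;
}
```

With max_tags set to 2, target 6 is switched to tagged queueing once and then limited to two outstanding commands; with max_tags set to 0 the enable step never fires, which models commenting out the block.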

James


* Re: [parisc-linux] Re: 53c700 (LASI SCSI 53c700) hang
From: Carlos O'Donell Jr. @ 2002-02-05  1:27 UTC (permalink / raw)
  To: James Bottomley, parisc-linux

> What these errors tell me is that your HD accepted more tags than it could 
> cope with and then choked.  Linux error handler isn't very good at handling 
> this situation.  Also, your disc:
> 
> deller@gmx.de said:
> >   Vendor: QUANTUM   Model: FIREBALL_TM3200S  Rev: 300X 
> 
> Is a known trouble causer with tag command queueing.  Initially, try taking 
> the #define NCR_700_MAX_TAGS in drivers/scsi/53c700.h down to 4 or 2 and 
> recompiling the driver.  Alternatively, turn off tagged command queueing 
> altogether by commenting out this block of code:
> 
> I am getting around to adding the code changes to make this able to be done as 
> module/kernel command line options.
> 
> James
>

I've been having problems with the driver for quite some time now.

SCSI subsystem driver Revision: 1.00
53c700: consistent memory allocation failed
53c700: Version 2.6 By James.Bottomley@HansenPartnership.com
scsi0: 53c700 rev 0 
scsi0 : LASI SCSI 53c700
  Vendor: FUJITSU   Model: M2694ES-512       Rev: 8134
  Type:   Direct-Access                      ANSI SCSI revision: 02
Attached scsi disk sda at scsi0, channel 0, id 6, lun 0
SCSI device sda: 2117025 512-byte hdwr sectors (1084 MB)
Partition check:
 sda: sda1 sda2

Compiled kernel with tag queue code _always_ disabled (2.4.17-pa18 from CVS).

#ifdef NEVERCOMIPLE
        if(SCp->device->tagged_supported && !SCp->device->tagged_queue
           && (hostdata->tag_negotiated &(1<<SCp->target)) == 0
           && NCR_700_is_flag_clear(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING)) {
                /* upper layer has indicated tags are supported.  We don't
                 * necessarily believe it yet.
                 *
                 * NOTE: There is a danger here: the mid layer supports
                 * tag queuing per LUN.  We only support it per PUN because
                 * of potential reselection issues */
                printk(KERN_INFO "scsi%d: (%d:%d) Enabling Tag Command Queuing\n", SCp->device->host->host_no, SCp->target, SCp->lun);
                hostdata->tag_negotiated |= (1<<SCp->target);
                NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING);
                SCp->device->tagged_queue = 1;
        }
#endif

in drivers/scsi/53c700.c at about line 1891.

Start up one of those real-world scripts :}

#!/bin/tcsh
while ( 1 )
find /bin | xargs cat > /dev/null
find /boot | xargs cat > /dev/null
find /etc | xargs cat > /dev/null
find /root | xargs cat > /dev/null
find /sbin | xargs cat > /dev/null
find /tmp | xargs cat > /dev/null
find /usr | xargs cat > /dev/null
find /var | xargs cat > /dev/null
end

root@node44:/proc/scsi/lasi700# cat 0
Total commands outstanding: 1
Target  Depth  Active  Next Tag
======  =====  ======  ========
  6: 0     16       1         0


10 minutes into the run, both find _and_ cat are in D state on the process list.
The drive is officially unresponsive around this point... maybe it was
just cat and find, you say?

Soon after, kupdated goes into D as well. From there on in the box is
locking up left, right and center. I wish I had kdb and could see what's
going on.

I've repeated this lockup 3 times.

Most interesting is that when I re-enable the tag queueing code but change
the tag depth to 2 (instead of 16), the machine doesn't seem to hang.
I have a box currently running well over the 10 minute mark that I will
leave running until tomorrow.

The sim700 driver runs poorly, but happily for days... generating heat :)
Sadly, the sim700 driver is currently only functional with the older kernels.
I'm using 2.4.9-pa25 to run the 715/50's in our cluster (diskless boxes run
the latest kernel with no problems).

Any thoughts? 

Is the issue as simple as: 

Leave Tag queuing in, but set depth to something low (2 or 4).

Good: 	Tag Queue, Depth = 2

Bad: 	No Tag Queue. 
	Tag Queue, Depth = 16.

c.
 
