* [parisc-linux] 53c700 (LASI SCSI 53c700) hang
From: Helge Deller @ 2002-02-03 17:29 UTC
To: James.Bottomley; +Cc: parisc-linux
Hi James,
Today my 715/64 stopped completely after some "dpkg -i"-ing.
According to the following lines from the serial console, the new
SCSI driver stopped talking to my hard disk.
I'm sending this bug report now because this has happened
about 3 times in the last 2 months.
Any ideas?
TIA,
Helge
This is what I got on my serial console. Sadly, some of the output before it
is missing:
scsi0 (6:0) New error handler wants to abort command
0x00 00 00 00 00 00
scsi: device set offline - not ready or command retry failed after bus reset:
host 0 channel 0 id 6 lun 0
scsi0 (6:0) New error handler wants to abort command
0x00 00 00 00 00 00
scsi: device set offline - not ready or command retry failed after bus reset:
host 0 channel 0 id 6 lun 0
scsi0 (6:0) New error handler wants to abort command
0x00 00 00 00 00 00
scsi: device set offline - not ready or command retry failed after bus reset:
host 0 channel 0 id 6 lun 0
From dmesg:
SCSI subsystem driver Revision: 1.00
53c700: Version 2.6 By James.Bottomley@HansenPartnership.com
scsi0: 53c710 rev 2
scsi0 : LASI SCSI 53c700
scsi0: (6:0) Synchronous at offset 8, period 100ns
Vendor: QUANTUM Model: FIREBALL_TM3200S Rev: 300X
Type: Direct-Access ANSI SCSI revision: 02
Attached scsi disk sda at scsi0, channel 0, id 6, lun 0
scsi0: (6:0) Enabling Tag Command Queuing
SCSI device sda: 6281856 512-byte hdwr sectors (3216 MB)
Partition check:
sda: sda1 sda2 sda3 sda4
* [parisc-linux] Re: 53c700 (LASI SCSI 53c700) hang
From: Helge Deller @ 2002-02-03 18:29 UTC
To: James.Bottomley; +Cc: parisc-linux
Hi James,
It just happened again while shutting down the system, so
now I have somewhat more verbose output:
Unmounting remote filesystems... scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 14 0e 5c 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 14 0e cc 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 18 0e 1c 00 00 10 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 18 0e a4 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 2c 0e 54 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 30 0e 2c 00 00 10 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 30 0e 4c 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 30 0f ac 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 30 10 2c 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 30 10 5c 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 30 10 84 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 08 0d fc 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 10 0e 1c 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 10 0e 44 00 00 08 00
scsi0 (6:0) New error handler wants to abort command
0x2a 00 00 10 0e 54 00 00 10 00
scsi0 (6:0) New error handler wants device reset
0x2a 00 00 14 0e 5c 00 00 08 00
scsi0 (6:0) New error handler wants BUS reset, cmd 10040c00
0x2a 00 00 14 0e 5c 00 00 08 00
scsi0: Bus Reset detected, executing command 00000000, slot 00000000, dsp 000504a8[04a8]
failing command because of reset, slot 00010520, cmnd 10040400
failing command because of reset, slot 00010654, cmnd 10040200
failing command because of reset, slot 00010788, cmnd 10040600
failing command because of reset, slot 000108bc, cmnd 10041e00
failing command because of reset, slot 000109f0, cmnd 10041200
failing command because of reset, slot 00010b24, cmnd 10041c00
failing command because of reset, slot 00010c58, cmnd 10040800
failing command because of reset, slot 00010ec0, cmnd 10040a00
failing command because of reset, slot 00010ff4, cmnd 10041400
failing command because of reset, slot 00011128, cmnd 10040c00
failing command because of reset, slot 0001125c, cmnd 10041000
failing command because of reset, slot 00011390, cmnd 10040000
failing command because of reset, slot 000114c4, cmnd 10041600
failing command because of reset, slot 000115f8, cmnd 10041800
failing command because of reset, slot 0001172c, cmnd 10041a00
scsi0: (6:0) Synchronous at offset 8, period 100ns
scsi0 (6:0) New error handler wants to abort command
0x03 00 00 00 40 00
scsi: device set offline - not ready or command retry failed after bus reset:
host 0 channel 0 id 6 lun 0
Helge
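For reference when reading the command bytes in the log above, here is a minimal sketch of how a 10-byte command block can be decoded. The opcode values (0x00 TEST UNIT READY, 0x03 REQUEST SENSE, 0x2a WRITE(10)) and the big-endian field layout come from the SCSI command set; `struct cdb10`, `decode_cdb10()` and `opcode_name()` are hypothetical names used only for illustration and are not part of the 53c700 driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcodes that appear in the console log (values from the SCSI command set). */
enum {
    SCSI_TEST_UNIT_READY = 0x00,
    SCSI_REQUEST_SENSE   = 0x03,
    SCSI_WRITE_10        = 0x2a
};

/* Hypothetical container for the fields of a 10-byte command block. */
struct cdb10 {
    uint8_t  opcode;  /* byte 0 */
    uint32_t lba;     /* bytes 2..5, big-endian logical block address */
    uint16_t blocks;  /* bytes 7..8, big-endian transfer length in blocks */
};

/* Decode a 10-byte command block. Returns 0 on success, -1 on bad input. */
int decode_cdb10(const uint8_t *cdb, size_t len, struct cdb10 *out)
{
    if (cdb == NULL || out == NULL || len < 10)
        return -1;

    out->opcode = cdb[0];
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->blocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}

/* Map the opcodes seen in this thread to their names. */
const char *opcode_name(uint8_t opcode)
{
    switch (opcode) {
    case SCSI_TEST_UNIT_READY: return "TEST UNIT READY";
    case SCSI_REQUEST_SENSE:   return "REQUEST SENSE";
    case SCSI_WRITE_10:        return "WRITE(10)";
    default:                   return "unknown";
    }
}
```

Fed the first aborted command, 2a 00 00 14 0e 5c 00 00 08 00, this yields a WRITE(10) of 8 blocks at LBA 0x00140e5c, so the commands being aborted during shutdown are ordinary small writes.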
* [parisc-linux] Re: 53c700 (LASI SCSI 53c700) hang
From: James Bottomley @ 2002-02-04 19:50 UTC
To: Helge Deller; +Cc: James.Bottomley, parisc-linux
What these errors tell me is that your HD accepted more tags than it could
cope with and then choked. The Linux error handler isn't very good at handling
this situation. Also, your disc:
deller@gmx.de said:
> Vendor: QUANTUM Model: FIREBALL_TM3200S Rev: 300X
is a known troublemaker with tagged command queueing. Initially, try lowering
the #define NCR_700_MAX_TAGS in drivers/scsi/53c700.h to 4 or 2 and
recompiling the driver. Alternatively, turn off tagged command queueing
altogether by commenting out this block of code:
    if(SCp->device->tagged_supported && !SCp->device->tagged_queue
       && (hostdata->tag_negotiated &(1<<SCp->target)) == 0
       && NCR_700_is_flag_clear(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING)) {
        /* upper layer has indicated tags are supported. We don't
         * necessarily believe it yet.
         *
         * NOTE: There is a danger here: the mid layer supports
         * tag queuing per LUN. We only support it per PUN because
         * of potential reselection issues */
        printk(KERN_INFO "scsi%d: (%d:%d) Enabling Tag Command Queuing\n",
               SCp->device->host->host_no, SCp->target, SCp->lun);
        hostdata->tag_negotiated |= (1<<SCp->target);
        NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING);
        SCp->device->tagged_queue = 1;
    }
in drivers/scsi/53c700.c at about line 1891.
I am getting around to adding code so that this can be set through
module/kernel command-line options.
James
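To illustrate why a lower NCR_700_MAX_TAGS can help, here is a minimal user-space sketch, assuming a simple bitmask allocator, of how a depth limit caps the number of commands outstanding on one target. This is not the driver's actual tag handling; `struct tag_pool`, `tag_pool_init()`, `tag_alloc()`, `tag_free()` and `SKETCH_MAX_TAGS` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for a lowered NCR_700_MAX_TAGS (assumed value: 2). */
#define SKETCH_MAX_TAGS 2

/* Tracks which tags are outstanding on one target. */
struct tag_pool {
    uint32_t in_use;  /* bit n set means tag n is outstanding */
    unsigned depth;   /* maximum simultaneous tags, at most 32 */
};

void tag_pool_init(struct tag_pool *p, unsigned depth)
{
    p->in_use = 0;
    p->depth = depth > 32 ? 32 : depth;
}

/* Hand out the lowest free tag, or -1 once 'depth' commands are outstanding.
 * A caller that gets -1 must hold the command back instead of sending it,
 * so the disk never sees more queued commands than the configured depth. */
int tag_alloc(struct tag_pool *p)
{
    unsigned t;

    for (t = 0; t < p->depth; t++) {
        if ((p->in_use & (UINT32_C(1) << t)) == 0) {
            p->in_use |= UINT32_C(1) << t;
            return (int)t;
        }
    }
    return -1;
}

/* Release a tag when its command completes. */
void tag_free(struct tag_pool *p, int tag)
{
    if (tag >= 0 && (unsigned)tag < p->depth)
        p->in_use &= ~(UINT32_C(1) << tag);
}
```

With a depth of 2, the third tag_alloc() call fails until an earlier command completes and its tag is freed, so the disk is never offered more than two queued commands at once.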
* Re: [parisc-linux] Re: 53c700 (LASI SCSI 53c700) hang
From: Carlos O'Donell Jr. @ 2002-02-05 1:27 UTC
To: James Bottomley, parisc-linux
> What these errors tell me is that your HD accepted more tags than it could
> cope with and then choked. Linux error handler isn't very good at handling
> this situation. Also, your disc:
>
> deller@gmx.de said:
> > Vendor: QUANTUM Model: FIREBALL_TM3200S Rev: 300X
>
> Is a known trouble causer with tag command queueing. Initially, try taking
> the #define NCR_700_MAX_TAGS in drivers/scsi/53c700.h down to 4 or 2 and
> recompiling the driver. Alternatively, turn off tagged command queueing
> altogether by commenting out this block of code:
>
> I am getting around to adding the code changes to make this able to be done as
> module/kernel command line options.
>
> James
>
I've been having problems with the driver for quite some time now.
SCSI subsystem driver Revision: 1.00
53c700: consistent memory allocation failed
53c700: Version 2.6 By James.Bottomley@HansenPartnership.com
scsi0: 53c700 rev 0
scsi0 : LASI SCSI 53c700
Vendor: FUJITSU Model: M2694ES-512 Rev: 8134
Type: Direct-Access ANSI SCSI revision: 02
Attached scsi disk sda at scsi0, channel 0, id 6, lun 0
SCSI device sda: 2117025 512-byte hdwr sectors (1084 MB)
Partition check:
sda: sda1 sda2
Compiled kernel with tag queue code _always_ disabled (2.4.17-pa18 from CVS).
#ifdef NEVERCOMPILE
    if(SCp->device->tagged_supported && !SCp->device->tagged_queue
       && (hostdata->tag_negotiated &(1<<SCp->target)) == 0
       && NCR_700_is_flag_clear(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING)) {
        /* upper layer has indicated tags are supported. We don't
         * necessarily believe it yet.
         *
         * NOTE: There is a danger here: the mid layer supports
         * tag queuing per LUN. We only support it per PUN because
         * of potential reselection issues */
        printk(KERN_INFO "scsi%d: (%d:%d) Enabling Tag Command Queuing\n",
               SCp->device->host->host_no, SCp->target, SCp->lun);
        hostdata->tag_negotiated |= (1<<SCp->target);
        NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_TAG_QUEUEING);
        SCp->device->tagged_queue = 1;
    }
#endif
in drivers/scsi/53c700.c at about line 1891.
Start up one of those real-world scripts :}
#!/bin/tcsh
while ( 1 )
find /bin | xargs cat > /dev/null
find /boot | xargs cat > /dev/null
find /etc | xargs cat > /dev/null
find /root | xargs cat > /dev/null
find /sbin | xargs cat > /dev/null
find /tmp | xargs cat > /dev/null
find /usr | xargs cat > /dev/null
find /var | xargs cat > /dev/null
end
root@node44:/proc/scsi/lasi700# cat 0
Total commands outstanding: 1
Target Depth Active Next Tag
====== ===== ====== ========
6: 0 16 1 0
Ten minutes into the run, both find _and_ cat are in state D (uninterruptible
sleep) on the process list.
The drive is officially unresponsive around this point... maybe it was
just cat and find, you say?
Soon after, kupdated goes into D as well. From there on in, the box is
locking up left, right and center. I wish I had kdb and could see what's
going on.
I've repeated this lockup 3 times.
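For anyone wanting to watch for this without staring at ps, here is a minimal sketch of picking the state letter out of a /proc/<pid>/stat line. It assumes the usual "pid (comm) state ..." layout; because comm may itself contain spaces or parentheses, the sketch searches for the last ')'. `proc_stat_state()` and `is_uninterruptible()` are hypothetical helper names, and reading the file itself is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Return the state letter from one /proc/<pid>/stat line,
 * or '\0' if the line does not have the expected shape. */
char proc_stat_state(const char *line)
{
    const char *close;

    if (line == NULL)
        return '\0';

    /* The command name is wrapped in parentheses and may contain ')',
     * so the field boundary is the last closing parenthesis. */
    close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';

    return close[2];
}

/* Nonzero when the process is in uninterruptible sleep (state D). */
int is_uninterruptible(const char *line)
{
    return proc_stat_state(line) == 'D';
}
```

A process that stays in D across repeated samples is most likely blocked on I/O that never completes, which matches what find, cat and kupdated are doing here.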
Most interesting is that when I re-enable the tag queueing code but change
the tag depth to 2 (instead of 16), the machine doesn't seem to hang.
I have a box currently running well over the 10 minute mark that I will
leave running until tomorrow.
The sim700 driver runs poorly, but happily for days... generating heat :)
Sadly, the sim700 driver is currently only functional with the older kernels.
I'm using 2.4.9-pa25 to run the 715/50's in our cluster (diskless boxes run
the latest kernel with no problems).
Any thoughts?
Is the issue as simple as:
Leave tag queueing in, but set the depth to something low (2 or 4)?
Good: Tag Queue, Depth = 2.
Bad:  No Tag Queue.
      Tag Queue, Depth = 16.
c.