public inbox for linux-kernel@vger.kernel.org
* RE: Reproducible SCSI Error with Adaptec 7902 & ST336607LW
From: Cress, Andrew R @ 2003-03-18 15:58 UTC
  To: 'Terry Barnaby'; +Cc: linux-kernel

Terry,

From your description in (1) below, it sounds like the disk firmware did go
out to lunch.
Disk firmware is complex, and there is always a possibility that it can
hang or crash and stop responding, especially since a new disk model comes
out every 9-12 months.

To get a disk firmware problem escalated you will need to gather some
evidence:
  1) The current firmware level (you have this already)
  2) The mode pages of the drive (you can use sgmode or other tools to
     get/set these).
     See http://scsirastools.sourceforge.net for sgmode.
     One thing to check is whether turning SMART on or off has an effect
     (page 1c 0a 88 means completely off).
     SMART processing runs in the background on the drive and can cause
     strange errors.
  3) A SCSI trace of the problem (requires a SCSI analyzer).
     If you don't have a SCSI analyzer, the bug conditions would have to be
     very well-defined so that Seagate could reproduce it readily.

Hopefully you can devise a workaround by tweaking the disk mode pages, since
reporting, analyzing, and producing a new disk firmware version would take
longer.
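
As a rough illustration of the SMART check in item (2), here is a minimal
Python sketch that decodes the leading bytes of the Informational Exceptions
Control mode page (page code 0x1c), which is the page that governs SMART
reporting. The bit names follow the SCSI Primary Commands layout; the helper
name decode_iec_prefix and the shape of the returned dict are hypothetical,
and the drive's real page contents should be read with sgmode or a similar
tool rather than assumed.

```python
# Sketch only: decode the first three bytes of mode page 0x1c
# (Informational Exceptions Control). Names are illustrative.

IEC_PAGE_CODE = 0x1C

# Flag bits in byte 2 of the page, per the SPC layout.
IEC_FLAGS = {
    "PERF": 0x80,    # skip exception work that would delay commands
    "EBF": 0x20,     # enable background functions
    "EWASC": 0x10,   # enable warning reporting
    "DEXCPT": 0x08,  # disable exception control (reporting off)
    "TEST": 0x04,    # generate a test failure notification
    "LOGERR": 0x01,  # log informational exceptions
}


def decode_iec_prefix(hex_bytes: str) -> dict:
    """Decode a string such as '1c 0a 88' (page code, length, flags)."""
    raw = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces
    if len(raw) < 3:
        raise ValueError("need page code, page length, and flag byte")
    page_code = raw[0] & 0x3F  # the top two bits are not part of the code
    if page_code != IEC_PAGE_CODE:
        raise ValueError("not mode page 0x1c: got 0x%02x" % page_code)
    flags = raw[2]
    return {
        "page_length": raw[1],
        "flags": {name: bool(flags & mask) for name, mask in IEC_FLAGS.items()},
        "smart_reporting_disabled": bool(flags & IEC_FLAGS["DEXCPT"]),
    }


if __name__ == "__main__":
    print(decode_iec_prefix("1c 0a 88"))
```

With the bytes "1c 0a 88", the flag byte 0x88 has PERF and DEXCPT set and
everything else clear, which matches the "completely off" setting described
above.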

Andy

-----Original Message-----
From: Terry Barnaby [mailto:terry@beam.ltd.uk] 
Sent: Tuesday, March 18, 2003 4:38 AM
To: Cress, Andrew R
Cc: 'Ingo Oeser'; Michael Madore; Justin T. Gibbs;
linux-kernel@vger.kernel.org
Subject: Re: Reproducible SCSI Error with Adaptec 7902


Hi Andy,

We have just updated to the latest driver 1.3.4. This has stopped the
drive locking up, but we are now getting nasty SCSI error reports
in /var/log/messages. Will continue to delve into this.

However, whatever the fault that triggers our drive to lock up, the
drive certainly locks up. It locks up with LED on and will not respond
to a SCSI bus reset. We need to power cycle the system to get the drive
working again. We have tried two Seagate ST336607LW drives; both exhibit
the same behaviour. It appears to happen only when Linux is running in
SMP mode and when the drive is running in packetized mode.

So there is certainly the possibility of the Seagate ST336607LW not
responding to resets. This may be a firmware fault, so we have talked
to Seagate about the issue. Their statement (quoted below) was the result
of our direct question:

> I realise that the problem could be due to the Linux SCSI driver, the
> Motherboard SCSI controller, the SCSI lead or the drive. We are used to
> tracking down such nasty problems. However, I have one firm pointer:
> 
> 1. Once the drive is locked up, with its LED on, a SCSI bus reset will
>     not clear the drive. A full poweroff/poweron cycle is needed.
> 
> So I ask again, is there a case where the drive will not respond to a
> SCSI bus reset ? 

Is there any way of getting this information to higher level Seagate 
support ?

Terry


Cress, Andrew R wrote:
> Ingo,
> 
> Our testing with that drive (same firmware, using same aic7902 chipset) has
> not shown any problems like this.  However, we were using a later aic79xx
> driver version (1.3.x).  That upgrade should be the first step.
> 
> I wouldn't get too excited about the statement by a level-1 Seagate support
> guy; it is probably just a blanket statement for when they want to disclaim
> responsibility.
> 
> Andy
> 
> -----Original Message-----
> From: Ingo Oeser [mailto:ingo.oeser@informatik.tu-chemnitz.de] 
> Sent: Saturday, March 15, 2003 8:12 AM
> To: Terry Barnaby
> Cc: Michael Madore; Justin T. Gibbs; linux-kernel@vger.kernel.org
> Subject: Re: Reproducible SCSI Error with Adaptec 7902
> 
> 
> On Fri, Mar 14, 2003 at 04:17:59PM +0000, Terry Barnaby wrote:
> 
>>The Seagate ST336607LW has firmware: 0004.
>>Seagate have stated to me that this is the latest.
>>They have also stated to me:
>>
>>  Issuing an unrecognized or illegal command to the drive can cause the
>>  drive to go into a hardware fault mode where it will no longer respond,
>>  and may or may not respond to a SCSI BUS reset. It seems, in this case,
>>  the drive will no longer respond to any commands issued by the
>>  controller.
>>
>>Is this "feature" now common on SCSI drives ????
> 
> 
> Could we add a KERN_WARNING printk in sd.c quoting/referencing
> this message when INQUIRY detects this device?
> 
> Then sysadmins who are used to SCSI being robust could, after reading
> this message, return the drive to their vendors in exchange for a drive
> that follows the SCSI specs.
> 
> Thanks in the name of the sysadmins.
> 
> Regards
> 
> Ingo Oeser
> 
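
Ingo's suggestion quoted above, warning when INQUIRY identifies this drive,
could be prototyped in userspace roughly as follows. This is only a sketch
of the matching logic, not a patch for sd.c: the field offsets are those of
standard INQUIRY data, the model and firmware level come from this thread,
and the vendor string "SEAGATE", the WARN_LIST table, and both function
names are assumptions made for illustration.

```python
# Sketch only: match a device by its standard INQUIRY strings and
# produce a warning message. Not kernel code.

# (vendor, product, revision) entries to warn about. The vendor string
# is an assumption; model and firmware level are taken from the thread.
WARN_LIST = [("SEAGATE", "ST336607LW", "0004")]


def parse_inquiry(data: bytes) -> tuple:
    """Return (vendor, product, revision) from standard INQUIRY data."""
    if len(data) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")
    vendor = data[8:16].decode("ascii", "replace").strip()     # bytes 8-15
    product = data[16:32].decode("ascii", "replace").strip()   # bytes 16-31
    revision = data[32:36].decode("ascii", "replace").strip()  # bytes 32-35
    return vendor, product, revision


def reset_warning(data: bytes):
    """Return a warning string for listed devices, otherwise None."""
    ident = parse_inquiry(data)
    if ident in WARN_LIST:
        return ("warning: %s %s (fw %s) may not respond to a SCSI bus reset"
                % ident)
    return None


if __name__ == "__main__":
    # Hand-built INQUIRY buffer for demonstration; fields are space-padded.
    sample = bytes(8) + b"SEAGATE " + b"ST336607LW".ljust(16) + b"0004"
    print(reset_warning(sample))
```

Keying on the firmware revision as well as the model keeps the warning from
firing on a later firmware level that may behave differently.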

-- 
Dr Terry Barnaby                     BEAM Ltd
Phone: +44 1454 324512               Northavon Business Center, Dean Rd
Fax:   +44 1454 313172               Yate, Bristol, BS37 5NH, UK
Email: terry@beam.ltd.uk             Web: www.beam.ltd.uk
BEAM for: Visually Impaired X-Terminals, Parallel Processing, Software
                       "Tandems are twice the fun !"

* RE: Reproducible SCSI Error with Adaptec 7902 & ST336607LW
From: Cress, Andrew R @ 2003-03-20 16:46 UTC
  To: 'Terry Barnaby'; +Cc: 'linux-kernel@vger.kernel.org'

Terry,

Did you try changing the mode pages on the Seagate ST336607LW to turn off
SMART?
The symptoms indicate to me that SMART might be an issue.

Andy

-----Original Message-----
From: Terry Barnaby [mailto:terry@beam.ltd.uk] 
Sent: Thursday, March 20, 2003 5:07 AM
To: linux-kernel@vger.kernel.org
Cc: Justin T. Gibbs; mmadore@aslab.com
Subject: Re: Reproducible SCSI Error with Adaptec 7902


Hi,

We have continued to try and get to the bottom of the problem we have
with the Seagate ST336607LW drive with an Adaptec 7902 SCSI controller
under Linux on an SMP machine. We have recently tried the latest
Adaptec Linux driver (1.3.4) from Justin Gibbs who is one of the Adaptec
SCSI driver developers. This has stopped the drive locking up but now
lists SCSI errors in the log files. I enclose a portion of this log
file. I have run the error logs past Justin and he has stated:

"The drive has unexpectedly dropped off the bus during a connection.
Without a SCSI bus trace it is impossible to know why the drive might
have done this or if perhaps a glitch on the BSY line is causing the
controller to detect a spurious busfree."

My current conclusions are:

1. The Seagate ST336607LW drive has a bug where in certain circumstances
	the drive can lock up, with LED on. In this state it will not
	respond to a hardware reset and a power off/on cycle is needed to
	reset the drive. There is a difference between the way the Linux
	Adaptec AIC79XX 1.1.0 driver and the 1.3.4 driver handle a SCSI
	error condition that triggers this behaviour.

2. There is a problem with one of the following: the Seagate ST336607LW
	drive, the Adaptec 7902 SCSI controller on the SuperMicro X5DA8
	Motherboard, or the Linux AIC79XX driver, that causes a SCSI bus
	fault.

I am now giving up on the Seagate ST336607LW drive and intend to try a
Maxtor Atlas 10K IV drive instead.
I include this information in the hope that it assists others who encounter
this problem, and to list the bugs so that those who are in a position to
fix them know about them.

Terry

[...snip...]
