public inbox for linux-scsi@vger.kernel.org
* [2.4.25], [aic79xx-2.0.8] problems with aic79xx on 2.4 (fwd)
@ 2004-04-01 23:41 Abhishek Rai
  2004-04-02 14:13 ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Abhishek Rai @ 2004-04-01 23:41 UTC (permalink / raw)
  To: linux-scsi

Hi,
I have an Adaptec 29320A Ultra320 SCSI host bus adapter with which I
connect to a SCSI array of 9 disks, each of which is capable of 320MBps.
Initially, in the BIOS settings, I configure each disk to operate at
160MBps. aic79xx-2.0.8 driver for 2.4 works just fine on 2.4.25 with this
setting. I am able to create a raid device on top of this 9-disk array,
and everything works fine.

Now, I configure each disk in the BIOS settings to operate at 320MBps. I
now start having trouble beginning with aic79xx module insertion itself.

1. modprobe aic79xx.o (version 2.0.8 dated 3/16/2004, latest from justin's
page) gives some warnings/errors (a couple of "overrun detected")
2. Sometimes upon insertion aic79xx.o is able to configure all 9 disks,
while at other times it misses one of them (this happened only once, so I
could not tell whether it is always the same disk or whether more than
one can go missing).
3. After this, I can see all the 9 disks in /proc/scsi/scsi and I try to
build raid on top of this array (mkraid -R /dev/md0). This eventually
causes the kernel to crash with a NULL pointer dereference. But before
that a lot of stuff happens. The Internet url to my log file is at the
bottom of this email. The log file shows the trace following insmod of
aic79xx (2.0.8, dated 3/16/2004), and later shows what all happens when a
mkraid is started.

4. In addition, during this mkraid, no LEDs light up on any of the 9 disks
(as they do in the OK case, when the disks operate at 160MBps), while the
driver times out again and again (as the following log shows) and
/proc/mdstat keeps showing an increasingly long time until completion.
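
To check whether every disk was detected after loading the module (point 2 above), the Direct-Access entries in /proc/scsi/scsi can be counted. This is a minimal sketch, assuming the usual 2.4 layout of that file. The sample text, the vendor/model strings and the function name are made up for illustration and are not taken from my logs.

```python
import re

# Illustrative sample only; real output comes from /proc/scsi/scsi.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: VENDOR   Model: DISK-A           Rev: 0000
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: VENDOR   Model: DISK-B           Rev: 0000
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: VENDOR   Model: ENCLOSURE        Rev: 0000
  Type:   Processor                        ANSI SCSI revision: 03
"""

HOST_RE = re.compile(
    r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)


def direct_access_ids(text):
    """Return the SCSI target IDs of all Direct-Access (disk) entries."""
    ids = []
    current_id = None
    for line in text.splitlines():
        match = HOST_RE.search(line)
        if match:
            # Remember the target ID until the matching "Type:" line appears.
            current_id = int(match.group(3))
        elif line.strip().startswith("Type:") and current_id is not None:
            if "Direct-Access" in line:
                ids.append(current_id)
            current_id = None
    return ids
```

On a live system the same function could be fed `open("/proc/scsi/scsi").read()`, and the length of the result compared against the expected 9 disks.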


The log files are at:

http://www.fsl.cs.sunysb.edu/~abba/scsi-raid-1.txt
(this instance ended with a kernel NULL pointer dereference within 2
minutes, couldn't ksymoops it)

http://www.fsl.cs.sunysb.edu/~abba/scsi-raid-2.txt
(this instance ran with errors and without any actual disk activity (no
LEDs glowing) for around 90 minutes before I terminated it).


Abhishek



* Re: [2.4.25], [aic79xx-2.0.8] problems with aic79xx on 2.4 (fwd)
  2004-04-01 23:41 Abhishek Rai
@ 2004-04-02 14:13 ` Matthew Wilcox
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2004-04-02 14:13 UTC (permalink / raw)
  To: Abhishek Rai; +Cc: linux-scsi

On Thu, Apr 01, 2004 at 06:41:47PM -0500, Abhishek Rai wrote:
> I have an Adaptec 29320A Ultra320 SCSI host bus adapter with which I
> connect to a SCSI array of 9 disks, each of which is capable of 320MBps.
> Initially, in the BIOS settings, I configure each disk to operate at
> 160MBps. aic79xx-2.0.8 driver for 2.4 works just fine on 2.4.25 with this
> setting. I am able to create a raid device on top of this 9-disk array,
> and everything works fine.
> 
> Now, I configure each disk in the BIOS settings to operate at 320MBps. I
> now start having trouble beginning with aic79xx module insertion itself.

Are you sure your cables are within spec (length and quality) for
320MB/s operation?

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain


* Re: [2.4.25], [aic79xx-2.0.8] problems with aic79xx on 2.4 (fwd)
@ 2004-04-06 19:21 Ari
  2004-04-06 21:24 ` David Haring
  0 siblings, 1 reply; 5+ messages in thread
From: Ari @ 2004-04-06 19:21 UTC (permalink / raw)
  To: linux-scsi; +Cc: abba

On Thu, Apr 01, 2004 at 06:41:47PM -0500, Abhishek Rai wrote:
> I have an Adaptec 29320A Ultra320 SCSI host bus adapter with which I
> connect to a SCSI array of 9 disks, each of which is capable of 320MBps.
> Initially, in the BIOS settings, I configure each disk to operate at
> 160MBps. aic79xx-2.0.8 driver for 2.4 works just fine on 2.4.25 with this
> setting. I am able to create a raid device on top of this 9-disk array,
> and everything works fine.
> 
> Now, I configure each disk in the BIOS settings to operate at 320MBps. I
> now start having trouble beginning with aic79xx module insertion itself.

Hi,
I've run into similar problems with the same controller and the same disks
(Maxtor Atlas 10K4 73G WLS; mine are revision DFV0, however).
After a lot of frustration, including swapping all the related hardware
(especially the cabling) and fruitless googling, I found the solution.
Slowing the disks down does not really help: it works better, but under
stress load (3 bonnie++ runs at the same time) they quickly produce lost
SCB commands, even at 80MHz.
What fixed it for me was limiting the tagged queuing depth to 16 (in the
kernel config). After that, my raid5 array ran flawlessly through a 24h
test with 3 simultaneous bonnie++ runs.
While googling I found some hints that a firmware problem might be
behind this (it should be fixed in the DFV0 version on my drives, but
obviously is not).
Anyhow, try reducing tagged queuing to 16. If that fixes it, it will
also confirm my findings.
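
If rebuilding the kernel is inconvenient, the depth can probably also be limited at module load time. The sketch below builds a `tag_info` option string of the form described in the aic79xx driver README. Treat the exact syntax, and the "options" line in the usage note, as assumptions to verify against the README shipped with your driver version. The function name and its defaults (16 targets, 1 controller) are my own illustration. The kernel config setting I used should be CONFIG_AIC79XX_CMDS_PER_DEVICE, but please double-check that name as well.

```python
def tag_info_option(depth, targets=16, controllers=1):
    """Build a tag_info string that sets the same queue depth everywhere.

    Assumed format (from the driver README): one brace group per
    controller, each holding a comma-separated depth per target ID.
    """
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    if targets < 1 or controllers < 1:
        raise ValueError("targets and controllers must be at least 1")
    per_controller = "{" + ",".join([str(depth)] * targets) + "}"
    return "tag_info:{" + ",".join([per_controller] * controllers) + "}"


def modules_conf_line(depth):
    """Hypothetical /etc/modules.conf line using the option above."""
    return "options aic79xx aic79xx=" + tag_info_option(depth)
```

For example, `tag_info_option(16, targets=2)` yields `tag_info:{{16,16}}`, and `modules_conf_line(16)` gives a line with sixteen entries of 16 for a single controller.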

Best Regards
Andy




* Re: [2.4.25], [aic79xx-2.0.8] problems with aic79xx on 2.4 (fwd)
  2004-04-06 19:21 [2.4.25], [aic79xx-2.0.8] problems with aic79xx on 2.4 (fwd) Ari
@ 2004-04-06 21:24 ` David Haring
  2004-04-15 16:42   ` Abhishek Rai
  0 siblings, 1 reply; 5+ messages in thread
From: David Haring @ 2004-04-06 21:24 UTC (permalink / raw)
  To: linux-scsi; +Cc: Ari, abba

Hello,

I would recommend reducing the TCQ depth too. I have also had
trouble with recent Maxtor Atlas disks / Adaptec / aic79xx,
and reducing the TCQ depth solved the problem.

An interesting note, though: the trouble only showed up with the Adaptec
controller. When I replaced the Adaptec aic79xx SCSI card
with a Tekram Ultra320 one, the system worked without problems.

David Haring




* Re: [2.4.25], [aic79xx-2.0.8] problems with aic79xx on 2.4 (fwd)
  2004-04-06 21:24 ` David Haring
@ 2004-04-15 16:42   ` Abhishek Rai
  0 siblings, 0 replies; 5+ messages in thread
From: Abhishek Rai @ 2004-04-15 16:42 UTC (permalink / raw)
  To: David Haring; +Cc: linux-scsi, Ari, Erez Zadok



> I would recommend reducing TCQ depth too - I have also had
> troubles with recent Maxtor Atlas disks / Adaptec / aic79xx
> and reducing TCQ depth solved the problem.
>
> Interesting note though - the troubles only showed with Adaptec
> controller. When I have replaced the Adaptec aic79xx SCSI card
> with Tekram Ultra320 one the system worked without problem.



Yes, it worked! Reducing the TCQ depth from 32 to 16 did it. In summary:
with a tagged queue depth of 32 and 320MBps per disk, all is well when I
use a single disk out of my 9-disk SCSI array. However, when I try to use
all of them (as in a raid), they fail terribly. Reducing the disk speed to
160MBps resolves the problem and all is fine again. But of course I need
the higher speed. For that, reducing the TCQ depth to 16 while still
running each disk at 320MBps works fine.
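
To put numbers on the cases above, here is a small sketch of how the worst-case count of outstanding tagged commands scales with the number of disks and the queue depth. It is only arithmetic under the assumption that every disk fills its queue. It does not show what actually goes wrong, which this thread has not established, and the function name is just for illustration.

```python
def max_outstanding_commands(disks, queue_depth):
    """Upper bound on tagged commands in flight if every disk's queue is full."""
    if disks < 0 or queue_depth < 0:
        raise ValueError("disks and queue_depth must be non-negative")
    return disks * queue_depth


# The three configurations from the summary above.
single_disk_depth_32 = max_outstanding_commands(1, 32)
nine_disks_depth_32 = max_outstanding_commands(9, 32)
nine_disks_depth_16 = max_outstanding_commands(9, 16)
```

With these inputs the single-disk case tops out at 32 commands, the failing 9-disk case at 288, and the working 9-disk case at 144.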

thanks
Abhishek

