* RE: aacraid died on kernel 2.4.27
@ 2005-03-11 16:06 Salyzyn, Mark
2005-03-11 20:31 ` Nic Ferrier
0 siblings, 1 reply; 6+ messages in thread
From: Salyzyn, Mark @ 2005-03-11 16:06 UTC (permalink / raw)
To: Nic Ferrier, linux-scsi
This is a processor-based RAID card, so stability and complexity are
virtually all rooted in the card. The problems you are experiencing are
most probably specific to your platform, and do not show up in other
adapters and systems that communicate with this same driver. Although
the driver can be used to mitigate some problems in the adapter
Firmware, it can not solve Power Supply, Hardware, Internal Firmware
Failures, Drive or Cabling issues. All these issues need to be resolved
first by communicating with Dell Technical Support.
Sadly, all these issues can end up locking up the card or the scsi bus.
The end result being the scsi system timing out, taking the devices
offline, then the file-system driver panicking. Similar symptoms, wide
variety of causes.
Sincerely -- Mark Salyzyn
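[Illustrative sketch, not part of the original message.] The failure
sequence described above (commands time out, then the SCSI layer takes
the devices offline) can be checked from user space on a 2.6 kernel.
The sketch below assumes sysfs is mounted at /sys and that the kernel
exposes each disk's SCSI state in /sys/block/<dev>/device/state; that
file is not available on 2.4.27. The function name
offline_scsi_devices is hypothetical and chosen only for this example.

```python
import glob
import os


def offline_scsi_devices(sys_block="/sys/block"):
    """Return {device: state} for block devices whose SCSI state is not 'running'.

    Assumption: the kernel exposes <sys_block>/<dev>/device/state (2.6 sysfs).
    Devices without that file are silently skipped.
    """
    result = {}
    pattern = os.path.join(sys_block, "*", "device", "state")
    for state_path in sorted(glob.glob(pattern)):
        # The path ends in .../<dev>/device/state, so the device name
        # is the third component from the end.
        name = state_path.split(os.sep)[-3]
        with open(state_path) as fh:
            state = fh.read().strip()
        if state != "running":
            result[name] = state
    return result


if __name__ == "__main__":
    for name, state in offline_scsi_devices().items():
        print(f"{name}: {state}")
```

Run from cron or a monitoring loop, a check like this only shows that
a device has already been taken offline; it says nothing about which
of the causes listed above is responsible.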
-----Original Message-----
From: linux-scsi-owner@vger.kernel.org
[mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Nic Ferrier
Sent: Thursday, March 10, 2005 4:02 PM
To: linux-scsi@vger.kernel.org
Subject: aacraid died on kernel 2.4.27
I've been trying, without success, to get the aacraid driver to work
reliably on a Dell 2650 Poweredge.
I'm using Debian so I tried kernel 2.6.8 first (Debian has it packaged
nicely but from a SCSI driver point of view it's just a standard
kernel.org source).
2.6.8 worked... but died as soon as we put the server under load.
2.6.11 also died under load.
Yesterday I put 2.4.27 on the box because I had understood that the
aacraid is stable in that release of the kernel. But a few hours ago
the box died in exactly the same way (it was under quite heavy load).
Unfortunately, I can't give you error messages because I don't have
any log from the failures. It comes out on the console and I don't
have it saved anywhere. But it definitely is the raid controller.
Maybe someone can answer the following for me:
- is the driver understood to be stable in 2.4.27?
- is there another driver I could try (would the pre-Cox one work?)
- is there anything I can do to alleviate the problem?
Nic Ferrier
http://www.tapsellferrier.co.uk
* Re: aacraid died on kernel 2.4.27
2005-03-11 16:06 aacraid died on kernel 2.4.27 Salyzyn, Mark
@ 2005-03-11 20:31 ` Nic Ferrier
2005-03-11 21:34 ` Andrew Kinney
0 siblings, 1 reply; 6+ messages in thread
From: Nic Ferrier @ 2005-03-11 20:31 UTC (permalink / raw)
To: Salyzyn, Mark; +Cc: linux-scsi
"Salyzyn, Mark" <mark_salyzyn@adaptec.com> writes:
> This is a processor based RAID card, so stability and complexity is
> virtually all rooted in the card. The problems you are experiencing are
> most probably specific to your platform, and do not show up in other
> adapters and systems that communicate with this same driver. Although
> the driver can be used to mitigate some problems in the adapter
> Firmware, it can not solve Power Supply, Hardware, Internal Firmware
> Failures, Drive or Cabling issues. All these issues need to be resolved
> first by communicating with Dell Technical Support.
>
> Sadly, all these issues can end up locking up the card or the scsi bus.
> The end result being the scsi system timing out, taking the devices
> offline, then the file-system driver panicking. Similar symptoms, wide
> variety of causes.
The machine I am having trouble with has been running MS Windows for 2
years.
I just put linux on it (with no other changes) and we get regular
(twice daily) catastrophic crashes.
Can this be a controller problem? I'm not a hardware expert but it
doesn't sound like one to me.
Nic
* Re: aacraid died on kernel 2.4.27
2005-03-11 20:31 ` Nic Ferrier
@ 2005-03-11 21:34 ` Andrew Kinney
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Kinney @ 2005-03-11 21:34 UTC (permalink / raw)
To: Nic Ferrier; +Cc: linux-scsi
On 11 Mar 2005 at 20:31, Nic Ferrier wrote:
> The machine I am having trouble with has been running MS Windows for 2
> years.
>
> I just put linux on it (with no other changes) and we get regular
> (twice daily) catastrophic crashes.
>
> Can this be a controller problem? I'm not a hardware expert but it
> doesn't sound like one to me.
It can be a controller problem, but it can also be a drive problem,
cable problem, firmware problem, or a backplane problem. We had a
similar instance that was resolved by replacing the drive with a
different brand, replacing the backplane, replacing the cabling,
replacing the ROMB, and getting the newest firmware. Now, drives
fail gracefully instead of taking the whole container offline. Who
knows what the actual cause was, but the problem is fixed and that's
what I was looking for.
Like Mark S. said, many causes, one symptom. That Dell trouble
ticket is going to be the best way to get it solved. Their Linux
guys have seen it all and can escalate it to an engineer if they
haven't. They're going to ask you for the diagnostic output of
afacli, so you'll want to get that installed if you haven't already.
They can also swap in new components for you.
Sincerely,
Andrew Kinney
President and
Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
* RE: aacraid died on kernel 2.4.27
@ 2005-03-11 20:54 Salyzyn, Mark
0 siblings, 0 replies; 6+ messages in thread
From: Salyzyn, Mark @ 2005-03-11 20:54 UTC (permalink / raw)
To: Nic Ferrier; +Cc: linux-scsi
The I/O patterns are just not the same. Windoze typically can't even
approach more than 32 commands outstanding to the controller. And only
recently has the Cache bug been solved in the ROMB Firmware.
Sincerely -- Mark Salyzyn
-----Original Message-----
From: Nic Ferrier [mailto:nferrier@tapsellferrier.co.uk]
Sent: Friday, March 11, 2005 3:32 PM
To: Salyzyn, Mark
Cc: linux-scsi@vger.kernel.org
Subject: Re: aacraid died on kernel 2.4.27
"Salyzyn, Mark" <mark_salyzyn@adaptec.com> writes:
> This is a processor based RAID card, so stability and complexity is
> virtually all rooted in the card. The problems you are experiencing are
> most probably specific to your platform, and do not show up in other
> adapters and systems that communicate with this same driver. Although
> the driver can be used to mitigate some problems in the adapter
> Firmware, it can not solve Power Supply, Hardware, Internal Firmware
> Failures, Drive or Cabling issues. All these issues need to be resolved
> first by communicating with Dell Technical Support.
>
> Sadly, all these issues can end up locking up the card or the scsi bus.
> The end result being the scsi system timing out, taking the devices
> offline, then the file-system driver panicking. Similar symptoms, wide
> variety of causes.
The machine I am having trouble with has been running MS Windows for 2
years.
I just put linux on it (with no other changes) and we get regular
(twice daily) catastrophic crashes.
Can this be a controller problem? I'm not a hardware expert but it
doesn't sound like one to me.
Nic
* aacraid died on kernel 2.4.27
@ 2005-03-10 21:01 Nic Ferrier
2005-03-11 6:04 ` Ryan Anderson
0 siblings, 1 reply; 6+ messages in thread
From: Nic Ferrier @ 2005-03-10 21:01 UTC (permalink / raw)
To: linux-scsi
I've been trying, without success, to get the aacraid driver to work
reliably on a Dell 2650 Poweredge.
I'm using Debian so I tried kernel 2.6.8 first (Debian has it packaged
nicely but from a SCSI driver point of view it's just a standard
kernel.org source).
2.6.8 worked... but died as soon as we put the server under load.
2.6.11 also died under load.
Yesterday I put 2.4.27 on the box because I had understood that the
aacraid is stable in that release of the kernel. But a few hours ago
the box died in exactly the same way (it was under quite heavy load).
Unfortunately, I can't give you error messages because I don't have
any log from the failures. It comes out on the console and I don't
have it saved anywhere. But it definitely is the raid controller.
Maybe someone can answer the following for me:
- is the driver understood to be stable in 2.4.27?
- is there another driver I could try (would the pre-Cox one work?)
- is there anything I can do to alleviate the problem?
Nic Ferrier
http://www.tapsellferrier.co.uk
* Re: aacraid died on kernel 2.4.27
2005-03-10 21:01 Nic Ferrier
@ 2005-03-11 6:04 ` Ryan Anderson
0 siblings, 0 replies; 6+ messages in thread
From: Ryan Anderson @ 2005-03-11 6:04 UTC (permalink / raw)
To: Nic Ferrier; +Cc: linux-scsi
On Thu, Mar 10, 2005 at 09:01:50PM +0000, Nic Ferrier wrote:
> I've been trying, without success, to get the aacraid driver to work
> reliably on a Dell 2650 Poweredge.
I've had problems there, too. With kernel 2.6.8.
I just had another failure, and I got some better logs.
Can you install the "afacli" utility (Dell-branded version of aacli),
and run
diag show history /old
from afacli?
If you can't seem to get the OLD history to show (compare diag show
history /current with diag show history /old), try diag dump text - you
may find some useful messages there.
The other, standard, suggestion is to go through and open a ticket with
Dell, and do all the hardware diagnostics to try and see if the error
can be traced to hardware problems or not.
Good luck, I hope you find a fix faster than I have (still trying,
unfortunately)
--
Ryan Anderson
AutoWeb Communications, Inc.
email: ryan@autoweb.net