* [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Mark Haverkamp @ 2005-07-08 17:36 UTC
To: James Bottomley; +Cc: Mark Salyzyn, linux-scsi, Martin Drab
Martin Drab found that he could get aacraid timeouts with high load on
his controller / disk drive combinations. After some experimentation
Mark Salyzyn has come up with a patch to reduce the default max_sectors
to something that will keep the controller from being overloaded and
will eliminate the timeout issues.
Patch against scsi-misc-2.6 git tree.
Signed-off-by: Mark Haverkamp <markh@osdl.org>
Index: scsi-misc-2.6/drivers/scsi/aacraid/aacraid.h
===================================================================
--- scsi-misc-2.6.orig/drivers/scsi/aacraid/aacraid.h 2005-07-08 09:22:20.000000000 -0700
+++ scsi-misc-2.6/drivers/scsi/aacraid/aacraid.h 2005-07-08 09:23:52.000000000 -0700
@@ -15,11 +15,7 @@
#define AAC_MAX_LUN (8)
#define AAC_MAX_HOSTPHYSMEMPAGES (0xfffff)
-/*
- * max_sectors is an unsigned short, otherwise limit is 0x100000000 / 512
- * Linux has starvation problems if we permit larger than 4MB I/O ...
- */
-#define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)8192)
+#define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)512)
/*
* These macros convert from physical channels to virtual channels
Index: scsi-misc-2.6/drivers/scsi/aacraid/linit.c
===================================================================
--- scsi-misc-2.6.orig/drivers/scsi/aacraid/linit.c 2005-06-27 09:57:38.000000000 -0700
+++ scsi-misc-2.6/drivers/scsi/aacraid/linit.c 2005-07-08 09:23:52.000000000 -0700
@@ -374,7 +374,8 @@
else
scsi_adjust_queue_depth(sdev, 0, 1);
- if (host->max_sectors < AAC_MAX_32BIT_SGBCOUNT)
+ if (!(((struct aac_dev *)host->hostdata)->adapter_info.options
+ & AAC_OPT_NEW_COMM))
blk_queue_max_segment_size(sdev->request_queue, 65536);
return 0;
--
Mark Haverkamp <markh@osdl.org>
* Re: [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Ryan Anderson @ 2005-07-08 17:41 UTC
To: Mark Haverkamp; +Cc: James Bottomley, Mark Salyzyn, linux-scsi, Martin Drab
On Fri, 2005-07-08 at 10:36 -0700, Mark Haverkamp wrote:
> Martin Drab found that he could get aacraid timeouts with high load on
> his controller / disk drive combinations. After some experimentation
> Mark Salyzyn has come up with a patch to reduce the default max_sectors
> to something that will keep the controller from being overloaded and
> will eliminate the timeout issues.
Would hitting this timeout issue cause the container to go offline?
If so, I think this may fix the issues I was having 6 months ago. (We
ended up taking the aacraid controller out of our production
environment, in frustration.)
I'll try to get some testing time in on this next week, though the
problems I've run into were very hard to reproduce on demand.
--
Ryan Anderson
AutoWeb Communications, Inc.
email: ryan@autoweb.net
* Re: [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Mark Haverkamp @ 2005-07-08 17:59 UTC
To: ryan; +Cc: James Bottomley, Mark Salyzyn, linux-scsi, Martin Drab
On Fri, 2005-07-08 at 13:41 -0400, Ryan Anderson wrote:
> On Fri, 2005-07-08 at 10:36 -0700, Mark Haverkamp wrote:
> > Martin Drab found that he could get aacraid timeouts with high load on
> > his controller / disk drive combinations. After some experimentation
> > Mark Salyzyn has come up with a patch to reduce the default max_sectors
> > to something that will keep the controller from being overloaded and
> > will eliminate the timeout issues.
>
> Would hitting this timeout issue cause the container to go offline?
Yes, the usual result of the overload timeouts is that the container
goes offline.
>
> If so, I think this may fix the issues I was having 6 months ago. (We
> ended up taking the aacraid controller out of our production
> environment, in frustration.)
>
> I'll try to get some testing time in on this next week, though the
> problems I've run into were very hard to reproduce on demand.
>
--
Mark Haverkamp <markh@osdl.org>
* RE: [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Salyzyn, Mark @ 2005-07-08 18:00 UTC
To: ryan, Mark Haverkamp; +Cc: James Bottomley, linux-scsi, Martin Drab
Yes, containers will go offline, but this fix addresses a recent change to
the driver; if your problem was 6 months ago, it was a totally different problem.
You can probably resolve your problems by making sure you have the
latest firmware. I don't believe there have been any changes in the driver in
the past 6 months that would have worked around any
firmware/hardware/compatibility issues.
Sadly, anything that goes wrong (including card, power supply, drives)
can cause containers to go offline; it is a pretty generic symptom of a
multitude of possible problems. Martin's initial problems were
associated with using the WD JD drives, which are not compatible with
RAID cards because of their internal error recovery paths.
Sincerely -- Mark Salyzyn
* Re: [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Martin Drab @ 2005-07-08 18:17 UTC
To: Ryan Anderson; +Cc: Mark Haverkamp, James Bottomley, Mark Salyzyn, linux-scsi
On Fri, 8 Jul 2005, Ryan Anderson wrote:
> On Fri, 2005-07-08 at 10:36 -0700, Mark Haverkamp wrote:
> > Martin Drab found that he could get aacraid timeouts with high load on
> > his controller / disk drive combinations. After some experimentation
> > Mark Salyzyn has come up with a patch to reduce the default max_sectors
> > to something that will keep the controller from being overloaded and
> > will eliminate the timeout issues.
>
> Would hitting this timeout issue cause the container to go offline?
Yes. See my previous report to LKML.
(http://lkml.org/lkml/2005/7/5/194)
> If so, I think this may fix the issues I was having 6 months ago. (We
> ended up taking the aacraid controller out of our production
> environment, in frustration.)
>
> I'll try to get some testing time in on this next week, though the
> problems I've run into were very hard to reproduce on demand.
Yes, just try to do some heavy copying of large files.
Martin
* RE: [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Martin Drab @ 2005-07-08 18:22 UTC
To: Salyzyn, Mark; +Cc: ryan, Mark Haverkamp, James Bottomley, linux-scsi
On Fri, 8 Jul 2005, Salyzyn, Mark wrote:
...
> Sadly, anything that goes wrong (including card, power supply, drives)
> can cause containers to go offline; it is a pretty generic symptom of a
> multitude of possible problems. Martin's initial problems were
> associated with using the WD JD drives, which are not compatible with
> RAID cards because of their internal error recovery paths.
Yes; however, a single WD SD drive didn't work either, and that _should_
be a RAID-compatible drive. (Perhaps not with _this_ RAID controller, I don't
know.)
Martin
* Re: [PATCH] 2.6 aacraid: Fix for controller load based timeouts
From: Mark Overmeer @ 2005-07-09 10:03 UTC
To: Martin Drab
Cc: Ryan Anderson, Mark Haverkamp, James Bottomley, Mark Salyzyn,
linux-scsi
* Martin Drab (drab@kepler.fjfi.cvut.cz) [050708 20:19]:
> > On Fri, 2005-07-08 at 10:36 -0700, Mark Haverkamp wrote:
> > > Martin Drab found that he could get aacraid timeouts with high load on
> > > his controller / disk drive combinations. After some experimentation
> > > Mark Salyzyn has come up with a patch to reduce the default max_sectors
> > > to something that will keep the controller from being overloaded and
> > > will eliminate the timeout issues.
> >
> > Would hitting this timeout issue cause the container to go offline?
>
> Yes. See my previous report to LKML.
> (http://lkml.org/lkml/2005/7/5/194)
>
> > If so, I think this may fix the issues I was having 6 months ago. (We
> > ended up taking the aacraid controller out of our production
> > environment, in frustration.)
The figures you have are comparable to my measurements: ~40MB/s read/write,
which is very poor when each disk can do better on its own.
I have three configurations:
Adaptec 2410SA with 4x250GB, stable since it came in, about 18 months
Adaptec 2810SA with 8x250GB, not really stable (1)
Adaptec 2810SA with 7x300GB, stable (2)
Regarding (2): with 8 disks it was unstable. Going over 2TB is not a good idea,
not even when split into 2x4 disks or otherwise. I didn't get it to work.
Regarding (1): it crashed every 3 months, losing disks. Then it started to give
firmware kernel crashes. I got a replacement card, which didn't crash:
the data was back, but one of the ports (#7) was dead. I replaced it with yet
another new card: now all disks were seen... but somewhere along the way I lost all my data :(((
That was not the end of the disaster: after two weeks, the data was lost again, under
load conditions. It reports different disks as failing, but especially on
the high port numbers. I have used various firmware versions and many
different kernels.
Now I have changed my strategy: I buy only motherboards with 4x SATA on
board (ICH6). Four disks striped give me 200MB/s (a 5x performance gain).
From
1 system with 7x300GB RAID5 2810SA, 2.8GHz P4, 2GB, Asus P4P800-E Deluxe
eff. ~40MB/s, 1675GB net
Sept 2004: 3450 euro ex VAT
To
2 systems, each 4x400GB RAID0, 3.0GHz P4, 1GB, Asus P5GDC-V Deluxe
each eff. ~200MB/s, 1100GB net
June 2005: 2850 euro ex VAT together!
Comparing apples and oranges: the second configuration is stable, faster,
and redundant. The first is larger and pseudo-redundant.
--
MarkOv
------------------------------------------------------------------------
drs Mark A.C.J. Overmeer MARKOV Solutions
Mark@Overmeer.net solutions@overmeer.net
http://Mark.Overmeer.net http://solutions.overmeer.net