* Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030?
@ 2006-07-24 13:42 Scott Lowrey
2006-07-28 16:15 ` Warren Volz
From: Scott Lowrey @ 2006-07-24 13:42 UTC (permalink / raw)
To: linux-scsi
Hello,
My apologies for the long-winded overview. I want to include all
relevant information. If you'd like to skip to the questions at the
end, feel free.
We have recently encountered a severe I/O wait-bound condition on Intel
servers using the SE7520JR2 mainboard. This board contains a built-in
LSI Logic 53c1030 SCSI adaptor capable of RAID 0 and RAID 1. Our
servers are configured with two 72GB Fujitsu drives in a RAID 1 array.
Our Linux system is a bit of a hybrid. We started out a year or so ago
with a system based on SuSE 9.1 Pro and the 2.6.8 kernel (SuSE 9.1 is
packaged with 2.6.5 but we had an immediate need to update to fix a
malloc() bug). We have since made several updates, one of which
included an update to kernel 2.6.11 (specifically, 2.6.11.4-21.11) to
fix a critical problem related to packet capture. We always use
kernel source packages from SuSE. So, what we now have is a relatively
recent SuSE kernel running with a relatively old set of SuSE packages
(some have been updated but most are from 9.1).
The problem occurs when a fairly high number of disk writes occur - we
can reproduce the problem by copying large files around or by using
'cat /dev/urandom > /tmp/file'. The symptoms are shown by 'iostat' as a
very high average wait time (7000 to 12000 ms) and 100% CPU utilization.
This condition persists for several minutes after the disk writing has
stopped. The machine slows down and, on some occasions, becomes
unusable for long periods.
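The await figure iostat reports comes from the kernel's per-device counters in /proc/diskstats. For writes, it is the change in milliseconds spent writing divided by the change in writes completed over the sample interval. Older sysstat versions fold reads into the same figure. A minimal sketch that reproduces the write-side calculation from two snapshots. It assumes a whole-disk line with the standard 11 counters. The function name write_await and the device name sda are illustrative only:

```shell
# write_await: average milliseconds per completed write for one device,
# computed from two /proc/diskstats snapshots taken some seconds apart.
# Assumes the whole-disk line layout: major minor name, then 11 counters,
# where $8 = writes completed and $11 = milliseconds spent writing.
write_await() {
    before=$1   # earlier snapshot file
    after=$2    # later snapshot file
    dev=$3      # device name, e.g. sda (example only)
    awk -v dev="$dev" '
        # First file (FNR == NR): remember the starting counters.
        FNR == NR { if ($3 == dev) { w0 = $8; ms0 = $11 }; next }
        # Second file: remember the ending counters.
        $3 == dev { w1 = $8; ms1 = $11 }
        END {
            dw = w1 - w0
            if (dw > 0) printf "%.1f\n", (ms1 - ms0) / dw
            else        print "0.0"    # no writes completed in the interval
        }' "$before" "$after"
}
```

To use it, save /proc/diskstats to ds.0, wait two seconds (the interval used with iostat in this thread), save it again to ds.1, and run write_await ds.0 ds.1 sda. A value in the thousands of milliseconds corresponds to the symptom described above.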
The problem goes away immediately if we disable RAID - either by
hot-pulling one of the drives or by deleting the RAID in the MPT BIOS.
I've searched many web sites and mailing lists, including this one, and
found several reports of similar problems. From what I gather, there is
something going wrong with the RAID resync process. I can't follow the
discussions much past that point because I'm not a SCSI expert. At
any rate, the number of solutions seems to equal the number of system
configurations, so I'll describe our kernel situation and a possible
solution that we've found before asking The Questions.
Anyway, we think we have found a potential solution that involves
updating the MPT modules in our kernel. We stumbled across the
mptlinux-3.02.60-3 DKMS patch at ftp.lsil.com (the LSI web site offers
3.02.52-1 for SuSE 9.1 users - we figured we'd go with the newer version
since we have a newer kernel). After applying this "module swap", the
problem appears to be fixed. But we'd prefer not to distribute a DKMS
patch to our customers, so we are currently attempting to rebuild our
kernel with a patch generated from the DKMS source.
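One way to generate such a patch is to overlay the updated driver sources onto a copy of the kernel's drivers/message/fusion directory and diff the two trees. The sketch below assumes the DKMS package's .c and .h files map one-to-one onto that directory. The function name make_fusion_patch and all paths are hypothetical. Any Makefile or Kconfig changes in the DKMS package would still need to be merged by hand:

```shell
# make_fusion_patch: diff the kernel's Fusion MPT driver directory against
# the same directory with updated driver sources copied over it.
# Assumes the new .c/.h files belong directly in drivers/message/fusion;
# Makefile/Kconfig differences are deliberately left for manual review.
make_fusion_patch() {
    ktree=$1    # top of the unpacked kernel source tree
    newsrc=$2   # directory containing the updated driver sources
    out=$3      # patch file to write
    work=$(mktemp -d) || return 1
    # a/ is the untouched driver directory, b/ gets the new sources.
    for side in a b; do
        mkdir -p "$work/$side/drivers/message"
        cp -R "$ktree/drivers/message/fusion" "$work/$side/drivers/message/fusion"
    done
    for f in "$newsrc"/*.c "$newsrc"/*.h; do
        [ -f "$f" ] || continue   # skip unmatched glob patterns
        cp "$f" "$work/b/drivers/message/fusion/"
    done
    # diff exits 0 (no changes) or 1 (changes); 2 means trouble.
    (cd "$work" && diff -Nru a b) > "$out"
    rc=$?
    rm -rf "$work"
    [ "$rc" -le 1 ]
}
```

For example, make_fusion_patch linux-2.6.11.4-21.11 /usr/src/mptlinux-3.02.60 mptlinux.patch would write the patch, which could then be applied from the top of a clean kernel tree with patch -p1 < mptlinux.patch. Both source paths are placeholders.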
As one more point of interest: we can reproduce this problem with a
stock SuSE 9.1 distro, but it goes away with SuSE 9.3 and SuSE 10.0. If,
however, we transplant a 9.3 or 10.0 kernel into our distro, the problem
returns! Argh.
So, my questions are these:
Is there a "correct" version of the MPT Fusion modules we should be
using with our kernel?
Is there something in our system configuration that might be aggravating
the problem?
Thanks very much.
--
Scott Lowrey
slowrey@NexTone.com
* Re: Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030?
2006-07-24 13:42 Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030? Scott Lowrey
@ 2006-07-28 16:15 ` Warren Volz
2006-08-01 14:38 ` Scott Lowrey
From: Warren Volz @ 2006-07-28 16:15 UTC (permalink / raw)
To: Scott Lowrey; +Cc: linux-scsi
On 7/24/2006 7:42 AM, Scott Lowrey wrote:
> Is there a "correct" version of the MPT Fusion modules we should be
> using with our kernel?
Scott,
My advice would be to use the latest driver you can get your hands on.
Our website usually has the latest that's been released although it can
be hard to navigate at times... It looks like the latest released driver
is 3.02.68 (works for both SAS and SCSI parts). You can get this by
going to the downloads section, and selecting the "SAS3442X" part.
-Warren
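It helps to confirm which driver release is actually loaded before and after an update, and to compare it against a target such as 3.02.68. The sketch below assumes the module declares a version string that appears in /sys/module/mptbase/version or in modinfo output, which is not guaranteed for every driver build. It also assumes GNU sort with the -V option is available. Both function names are illustrative:

```shell
# version_lt A B: succeed if dotted version A is strictly older than B.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# mpt_version: print the Fusion MPT base driver version, if it declares one.
# Checks sysfs for a loaded module first, then the module file via modinfo.
mpt_version() {
    if [ -r /sys/module/mptbase/version ]; then
        cat /sys/module/mptbase/version
    else
        modinfo -F version mptbase 2>/dev/null
    fi
}
```

Usage: v=$(mpt_version); if [ -n "$v" ] && version_lt "$v" 3.02.68; then echo "mptbase $v is older than 3.02.68"; fi. Version sort is used because a plain string comparison would incorrectly treat 3.02.9 as newer than 3.02.60.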
* RE: Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030?
2006-07-28 16:15 ` Warren Volz
@ 2006-08-01 14:38 ` Scott Lowrey
2006-08-01 16:49 ` Warren Volz
From: Scott Lowrey @ 2006-08-01 14:38 UTC (permalink / raw)
To: Warren Volz; +Cc: linux-scsi
> On 7/24/2006 7:42 AM, Scott Lowrey wrote:
> > Is there a "correct" version of the MPT Fusion modules we should be
> > using with our kernel?
>
> Scott,
>
> My advice would be to use the latest driver you can get your hands on.
> Our website usually has the latest that's been released although it can
> be hard to navigate at times... It looks like the latest released driver
> is 3.02.68 (works for both SAS and SCSI parts). You can get this by
> going to the downloads section, and selecting the "SAS3442X" part.
>
> -Warren
Thanks, Warren. We've updated to 3.02.60 and so far things appear to be
working.
You're right, the Downloads page is confusing. There are two SAS
selections (neither of which is 3442X) and they lead to pages with no
matching results. If I simply select "LSI53C1030" from the Specific
Product menu, I get a page that offers 3.02.52 for SuSE 9.1. There are
no SAS products in the Specific Product menu.
Am I looking in the wrong place?
-Scott
* Re: Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030?
2006-08-01 14:38 ` Scott Lowrey
@ 2006-08-01 16:49 ` Warren Volz
2006-08-01 18:03 ` Scott Lowrey
From: Warren Volz @ 2006-08-01 16:49 UTC (permalink / raw)
To: Scott Lowrey; +Cc: linux-scsi
On 8/1/2006 8:38 AM, Scott Lowrey wrote:
> Thanks, Warren. We've updated to 3.02.60 and so far things appear to be
> working.
Good to hear.
> You're right, the Downloads page is confusing. There are two SAS
> selections (neither of which is 3442X) and they lead to pages with no
> matching results. If I simply select "LSI53C1030" from the Specific
> Product menu, I get a page that offers 3.02.52 for SuSE 9.1. There are
> no SAS products in the Specific Product menu.
>
> Am I looking in the wrong place?
Hmm strange. To get to the SAS drivers I click "Downloads" at the top,
use the "Select a specific product" dropdown and find "LSISAS3442X", and
finally hit "Go". That should bring up the correct versions. Sorry it's
such a mess!
-Warren
* RE: Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030?
2006-08-01 16:49 ` Warren Volz
@ 2006-08-01 18:03 ` Scott Lowrey
2006-08-01 18:35 ` Warren Volz
From: Scott Lowrey @ 2006-08-01 18:03 UTC (permalink / raw)
To: Warren Volz; +Cc: linux-scsi
> > You're right, the Downloads page is confusing. There are two SAS
> > selections (neither of which is 3442X) and they lead to pages with no
> > matching results. If I simply select "LSI53C1030" from the Specific
> > Product menu, I get a page that offers 3.02.52 for SuSE 9.1. There are
> > no SAS products in the Specific Product menu.
> >
> > Am I looking in the wrong place?
>
> Hmm strange. To get to the SAS drivers I click "Downloads" at the top,
> use the "Select a specific product" dropdown and find "LSISAS3442X", and
> finally hit "Go". That should bring up the correct versions. Sorry it's
> such a mess!
>
> -Warren
DOH! I was scrolling to products that began with "SAS", not "LSISAS".
Is there any reason you are referring to 3442X and not, say, 3800X? Or
are they all the same? (The driver version numbers appear to be the
same for each product.)
-Scott
* Re: Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030?
2006-08-01 18:03 ` Scott Lowrey
@ 2006-08-01 18:35 ` Warren Volz
2006-08-03 17:49 ` Performance Problem with LSI Logic 53c1030 RAID Scott Lowrey
From: Warren Volz @ 2006-08-01 18:35 UTC (permalink / raw)
To: Scott Lowrey; +Cc: linux-scsi
On 8/1/2006 12:03 PM, Scott Lowrey wrote:
> DOH! I was scrolling to products that began with "SAS", not "LSISAS".
It's OK, I do the same thing sometimes.
> Is there any reason you are referring to 3442X and not, say, 3800X? Or
> are they all the same? (The driver version numbers appear to be the
> same for each product.)
The driver is the same for all of our SAS parts (for now).
-Warren
* Performance Problem with LSI Logic 53c1030 RAID
2006-08-01 18:35 ` Warren Volz
@ 2006-08-03 17:49 ` Scott Lowrey
From: Scott Lowrey @ 2006-08-03 17:49 UTC (permalink / raw)
To: linux-scsi
Problem:
Deteriorating disk write performance with large files.
The problem *disappears* completely if RAID is disabled or if the array
is put into a degraded state by pulling the secondary array drive. The
problem *remains* if the primary array drive is pulled.
Setup:
Intel TIGI2U Chassis (carrier grade)
Intel SE7520JR2 server board
Dual 3.2 GHz Xeons, 800MHz FSB, 4GB RAM
No PCI cards, 2 builtin E1000 NICs
Integrated LSI 53c1030 Dual Channel SCSI HBA with integrated RAID
Two 72GB Fujitsu drives in RAID 1 Configuration
RAID array state optimal
SUSE 2.6.11.4-21.11-bigsmp kernel with mods:
  LSI MPT 3.02.68 SCSI modules
  Fix to include shared memory in core dumps
test command: cat /dev/urandom > junk
iostat command: iostat -d 2 -x -t
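The test command above runs until it is interrupted. A bounded variant makes runs easier to repeat and compare. The sketch below writes a fixed amount of random data with head -c instead of cat and samples with the same iostat command if iostat is installed. It keeps sampling after the writes stop, because the slowdown in the report appears after that point. The function name, the default 1024 MiB size, the file names, and the 60-second tail are arbitrary assumptions, not values from the original test:

```shell
# run_write_test FILE SIZE_MB TAIL_S LOG: bounded form of the
# "cat /dev/urandom > junk" test. Writes SIZE_MB MiB of random data,
# then waits TAIL_S seconds while iostat (if installed) keeps logging.
run_write_test() {
    file=${1:-junk}       # output file (the original test used "junk")
    size_mb=${2:-1024}    # amount of random data to write, in MiB
    tail_s=${3:-60}       # seconds to keep sampling after writes stop
    log=${4:-iostat.log}  # iostat output file
    iostat_pid=
    if command -v iostat >/dev/null 2>&1; then
        iostat -d 2 -x -t > "$log" &   # same sampling options as above
        iostat_pid=$!
    fi
    # head -c stops after exactly size_mb MiB, unlike an open-ended cat.
    head -c $((size_mb * 1024 * 1024)) /dev/urandom > "$file"
    sleep "$tail_s"
    if [ -n "$iostat_pid" ]; then
        kill "$iostat_pid" 2>/dev/null
        wait "$iostat_pid" 2>/dev/null
    fi
    return 0
}
```

Running it with the same arguments on a RAID 1 and a non-RAID configuration gives two iostat logs that cover identical workloads.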
The condensed report below shows what happens: the test command was executed
shortly after the report begins. The average wait time (await) jumps around
erratically, as does the write speed (wkB/s).
Then the weirdness occurs. The test command was stopped with Ctrl-C just
before 12:34:00. A few seconds later, the CPU pegs and average wait time goes
through the roof. The activity LED on the secondary array drive is on solid;
the primary drive appears to be quiet. The entire system becomes sluggish and
seems to hang. After about 40 seconds, all returns to normal.
Does anyone know what's wrong? Is there any other data I can post that
would be useful?
Time         CPU   wrqm/s    w/s     wkB/s  avgrq-sz     await   svctm
12:33:20   18.85     72.7    1.6     297.2       8.7    2498.9    54.0
12:33:22    5.30      8.0    9.5      70.0       0.3      29.9     5.6
12:33:24    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:33:26    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:33:28   59.45   1766.0    0.5    7138.0      21.5     240.0  1189.0
12:33:30  100.05      0.0    0.0       0.0      36.0       0.0     0.0
12:33:32  100.05      0.0    0.0       0.0      36.0       0.0     0.0
12:33:34   72.30     10.0   24.0      64.0      18.7    4669.8    30.1
12:33:36    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:33:38    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:33:40   80.00   6151.2    0.0   24817.9      85.3       0.0     0.0
12:33:42  100.55      0.0    8.0       0.0      99.4    2590.4   125.1
12:33:44   99.55      0.0    0.0       0.0      90.6       0.0     0.0
12:33:46  100.05      0.0    0.0       0.0      91.0       0.0     0.0
12:33:48  100.50      0.0    7.0       0.0      88.7    9213.2   142.9
12:33:50  100.05      0.0    8.0       0.0      69.1   10613.7   125.1
12:33:52  100.05      0.0    1.0       0.0      59.1   11652.0  1000.5
12:33:54  100.05      0.0    0.0       0.0      59.0       0.0     0.0
12:33:56  100.05      0.0    9.5       0.0      51.9   16857.8   105.3
12:33:58   99.55      0.0    6.5       0.0      32.3   18442.2   153.9
12:34:00  100.05      0.0    0.0       0.0      27.0       0.0     0.0
12:34:02   66.78     15.1   17.6      76.4      16.5   17605.2    38.0
12:34:04    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:34:06    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:34:08   84.63   7825.4    2.0   31568.2     110.1     358.2   425.2
12:34:10  100.05      0.0    0.0       0.0     130.1       0.0     0.0
12:34:12  100.05      0.0    0.0       0.0     130.1       0.0     0.0
12:34:14  100.05      0.0   10.0       0.0     121.3    6831.0   100.0
12:34:16  100.05      0.0    6.0       0.0     102.1    8369.8   166.8
12:34:18  100.05      0.0    0.0       0.0      98.0       0.0     0.0
12:34:20  100.05      0.0    0.0       0.0      98.0       0.0     0.0
12:34:22   99.55      0.0   14.9       0.0      85.6   14898.2    66.7
12:34:24  100.00      0.0    2.0       0.0      65.2   16302.0   500.0
12:34:26  100.05      0.0    2.0       0.0      61.0   18192.8   500.2
12:34:28  100.05      0.0    4.0       0.0      55.7   20619.6   250.1
12:34:30  100.05      0.0    4.0       0.0      47.2   22506.1   250.1
12:34:32  100.05      0.0    4.5       0.0      39.0   24579.6   222.3
12:34:34  100.05      0.0    2.5       0.0      31.9   26468.8   400.2
12:34:36   77.79     21.1   27.1     132.7      19.2   16093.9    28.7
12:34:38    0.00      0.0    0.0       0.0       0.0       0.0     0.0
12:34:40    0.00      0.0    0.0       0.0       0.0       0.0     0.0
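Summarizing a condensed report like the one above makes it easier to compare runs, for example before and after a driver change. The awk-based sketch below assumes exactly the eight columns shown, with the CPU value in the second column and await in the seventh. The function name and the default busy threshold of 99 are arbitrary choices:

```shell
# summarize_report FILE [THRESHOLD]: for rows laid out as
#   Time CPU wrqm/s w/s wkB/s avgrq-sz await svctm
# print the peak await with its timestamp and the longest run of
# consecutive rows whose CPU column is >= THRESHOLD (default 99).
summarize_report() {
    awk -v thr="${2:-99}" '
        # Only data rows: an HH:MM:SS timestamp followed by 7 numbers.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && NF == 8 {
            if ($7 + 0 > peak + 0) { peak = $7 + 0; peak_t = $1 }
            if ($2 + 0 >= thr + 0) { run++; if (run > best) best = run }
            else run = 0
        }
        END {
            printf "peak await %.1f ms at %s; longest busy run %d samples\n",
                   peak, peak_t, best
        }' "$1"
}
```

Run against the rows above, it should report a peak await of 26468.8 ms at 12:34:34 and a longest busy run of 13 samples, about 26 seconds at the 2-second interval.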
--
Scott Lowrey
Thread overview: 7+ messages
2006-07-24 13:42 Fusion MPT Modules for Linux 2.6.11 and LSI Logic 53c1030? Scott Lowrey
2006-07-28 16:15 ` Warren Volz
2006-08-01 14:38 ` Scott Lowrey
2006-08-01 16:49 ` Warren Volz
2006-08-01 18:03 ` Scott Lowrey
2006-08-01 18:35 ` Warren Volz
2006-08-03 17:49 ` Performance Problem with LSI Logic 53c1030 RAID Scott Lowrey