* What's the "right" qla2100 driver?
@ 2004-01-22 0:32 Poul Petersen
2004-01-22 5:22 ` Andrew Vasquez
0 siblings, 1 reply; 6+ messages in thread
From: Poul Petersen @ 2004-01-22 0:32 UTC (permalink / raw)
To: linux-scsi
I've got a Qlogic qla2100 card connected via copper to an array
of 28 ~18GB disks. I'm trying to create two RAID level 5 md devices with
13 disks + 1 spare each. What I'm experiencing is a lot of hangs and strange
disk failures that appear to be related to the drivers for the 2100. As
there seem to be many different drivers to choose from for the qla2100, I've
tried them all and collected my (all bad) experiences, a summary of which I
have attached below.
What I am wondering is if anyone else is using this card with Linux
and if so what driver are they using? More importantly, which driver is
actively being worked on (if any) so that I might contribute some failure
information?
It's possible that I am experiencing a hardware problem, since these
disks and controller are a few years old now, but I doubt it since most of
the errors seem inconsistent with bad hardware (all disks failing, etc). If
nothing else, if I get feedback saying that someone else is successfully
using a certain driver, then I can start playing with hardware using the
same driver and maybe get to the bottom of this...
Many thanks for any help,
-poul
Oh, machine specs:
Dell Optiplex GX110
PIII-1GHz w/ 260MB RAM
RedHat 9
Summary of test results:
---
Test #1
kernel 2.4.24 with kernel qlogicfc driver
---
# modprobe qlogicfc
# mkraid /dev/md0
At this point the system hangs with:
qlogicfc0: no handle slots, this should not happen
hostdata->queued is 58, in_ptr: 77
---
Test #2
kernel 2.4.24 with Qlogic 4.46.12b
---
I am unable to find a recent driver on the Qlogic site as support for
this card has been discontinued. I just happened to have this 4.x
series driver around.
# modprobe qla2x00
# mkraid /dev/md0
# mkraid /dev/md1
# cat /proc/mdstat | grep speed
[=>...................] resync = 5.1% (904396/17671424) finish=7654.5min speed=35K/sec
[====>................] resync = 22.1% (3919392/17671424) finish=34.8min speed=6577K/sec
The driver seems to work, but its ability to balance requests between the
two raid sets seems very poor. Watching the activity lights reflects the
stats shown above: the second raid set blinks once or twice, then hangs for
5~10 seconds, blinks, hangs, etc.
I haven't tried running with this driver much longer than the mkraid...
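As a rough cross-check of the resync figures above, here is a minimal Python sketch that parses an mdstat resync line and recomputes the remaining time from the block counts and the reported speed. The function name parse_resync is hypothetical, and the sketch assumes that the block counts are 1K blocks and that the speed is in KB/sec, as the output suggests.

```python
import re

# Matches an md resync progress line in the form shown in /proc/mdstat above.
RESYNC_RE = re.compile(
    r"resync\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*"
    r"speed=(?P<speed>\d+)K/sec"
)


def parse_resync(line):
    """Return resync progress fields from one mdstat line, or None if no match."""
    m = RESYNC_RE.search(line)
    if m is None:
        return None
    done, total = int(m.group("done")), int(m.group("total"))
    speed = int(m.group("speed"))  # assumed to be KB/sec, with 1K blocks
    # Remaining blocks divided by speed gives seconds; convert to minutes.
    estimate = (total - done) / speed / 60 if speed else float("inf")
    return {
        "percent": 100.0 * done / total,
        "speed_kb_s": speed,
        "reported_finish_min": float(m.group("finish")),
        "estimated_finish_min": estimate,
    }


if __name__ == "__main__":
    for line in (
        "[=>...................] resync = 5.1% (904396/17671424) finish=7654.5min speed=35K/sec",
        "[====>................] resync = 22.1% (3919392/17671424) finish=34.8min speed=6577K/sec",
    ):
        print(parse_resync(line))
```

Run on the two lines above, the second set's estimate should land close to the reported 34.8 minutes, while the first set is being starved to tens of KB/sec.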
---
Test #3
kernel 2.4.24 with kernel qlogicfc driver + backported patches
---
In doing a bunch of Google searching, I found the following threads
that seem to be related to the "no handle slots" problem:
http://www.ussg.iu.edu/hypermail/linux/kernel/0101.2/0267.html
http://groups.google.com/groups?selm=linux.scsi.1019759258.2413.1.camel%40lvadp.fc.hp.com
Since one of these patches was for a 2.5 kernel, I "massaged" the
patch into the 2.4.24 kernel.
# modprobe qlogicfc
# mkraid /dev/md0
# mkraid /dev/md1
The resync operation got to about 1% done, then stopped. "cat
/proc/mdstat" shows no activity, while "dd if=/dev/sda of=/dev/null" shows
the disk is still responsive. MD is hung. Strange.
---
Test #4
2.6.1 with kernel qlogicfc driver
---
# modprobe qlogicfc
# mkraid /dev/md0
qlogicfc0: no handle slots, this should not happen
hostdata->queued is 49, in_ptr: f8
Same as 2.4.24 qlogicfc driver (Test #1)
---
Test #5
2.6.1 with sourceforge qlogic driver 8.00.00.b8
( http://sourceforge.net/projects/linux-qla2xxx/ )
---
# modprobe qla2xxx
# mkraid /dev/md0
# mkraid /dev/md1
# mkfs.ext3 /dev/md0
After a while, the mkfs hung, and when I tried to reboot the machine
I got a bunch of errors like the following for all 28 disks:
qla2xxx_eh_abort scsi(1:0:12:0) cmd_timeout_in_sec=0x1e.
qla2xxx_eh_abort Exiting: status=Failed
---
Test #6
RHAS 2.1 Update 3 kernel (2.4.9-e3)
---
# modprobe qlogicfc
# mkraid /dev/md0
At this point the system hangs with a bunch of:
qlogicfc0: no handle slots, this should not happen
hostdata->queued is 58, in_ptr: 77
etc.
So, the RedHat kernel qlogicfc driver appears to be no more
functional than the 2.4.24 driver.
---
Test #7
RHAS 2.1 Update 3 kernel (2.4.9-e3) + patches from Test #3
---
# modprobe qlogicfc
# mkraid /dev/md0
# mkraid /dev/md1
# mkfs.ext3 /dev/md0
This has been the most successful test setup, but the machine has
revealed strange behavior under load tests. At one point issuing a "raidstop
/dev/md0" led to a kernel panic. Another time, tar-copying 100GB of data
began failing disks. Pretty weird stuff.
---
Test #8
2.6.1 with qlogicfc + patches from Test #3
---
# modprobe qlogicfc
# mkraid /dev/md0
# mkraid /dev/md1
# mkfs.ext3
At this point, the machine hung with no error messages.
* Re: What's the "right" qla2100 driver?
2004-01-22 0:32 What's the "right" qla2100 driver? Poul Petersen
@ 2004-01-22 5:22 ` Andrew Vasquez
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Vasquez @ 2004-01-22 5:22 UTC (permalink / raw)
To: linux-scsi
On Wed, 21 Jan 2004, Poul Petersen wrote:
> I've got a Qlogic qla2100 card connected via copper to an array of
> 28 ~18GB disks. I'm trying to create two raid level 5 md devices
> with 13disk+1spare each. What I'm experiencing is a lot of hangs and
> strange disk failures that appear to be related to the drivers for
> the 2100. As there seems to be many different drivers to choose from
> for the qla2100, I've tried them all and collected my (all bad)
> experiences, a summary of which I have attached below.
>
> What I am wondering is if anyone else is using this card with Linux
> and if so what driver are they using? More importantly, which driver
> is actively being worked on (if any) so that I might contribute some
> failure information?
>
For 2.4, you could try another driver -- 6.06.10 (available at the
QLogic website). This driver has support for the 2100 (though it's
not actually documented). The makefile will not build the driver,
though; try something similar to the following:
# make qla2100.o
For 2.6, why don't we start out with the default qla2xxx driver in
2.6.2-rc1.
> It's possible that I am experiencing a hardware problem, since these
> disks and controller are a few years old now, but I doubt it since
> most of the errors seem inconsistent with bad hardware (all disks
> failing, etc). If nothing else, if I get feedback saying that
> someone else is successfully using a certain driver, then I can
> start playing with hardware using the same driver and maybe get to
> the bottom of this...
>
From there let's see how far you get with the drivers. We may need to
enable some extra debugging for additional information.
> ---
> Test #5
> 2.6.1 with sourceforge qlogic driver 8.00.00.b8 (
> http://sourceforge.net/projects/linux-qla2xxx/ )
> ---
>
> # modprobe qla2xxx
> # mkraid /dev/md0
> # mkraid /dev/md1
> # mkfs.ext3 /dev/md0
>
> After awhile, the mkfs hung and when I tried to reboot the machine,
> I got a bunch of errors like the following for all 28 disks:
>
> qla2xxx_eh_abort scsi(1:0:12:0) cmd_timeout_in_sec=0x1e.
> qla2xxx_eh_abort Exiting: status=Failed
>
Interesting...
Regards,
Andrew Vasquez
* Re: What's the "right" qla2100 driver?
[not found] <F888C30C3021D411B9DA00B0D0209BE8038F9BD1@cvo-exchange.cvo.roguewave.com>
@ 2004-01-22 0:40 ` Lincoln Dale
0 siblings, 0 replies; 6+ messages in thread
From: Lincoln Dale @ 2004-01-22 0:40 UTC (permalink / raw)
To: Poul Petersen; +Cc: linux-scsi
At 11:32 AM 22/01/2004, Poul Petersen wrote:
> I've got a Qlogic qla2100 card connected via copper to an array
>of 28 ~18GB disks. I'm trying to create two raid level 5 md devices with
>13disk+1spare each. What I'm experiencing is a lot of hangs and strange
>disk failures that appear to be related to the drivers for the 2100.
[..]
># cat /proc/mdstat | grep speed
>
> [=>...................] resync = 5.1% (904396/17671424)
>finish=7654.5min speed=35K/sec
> [====>................] resync = 22.1% (3919392/17671424)
>finish=34.8min speed=6577K/sec
>
>The driver seems to work, but the ability of the driver to balance requests
>between the two
>raid sets seems very poor. Watching the activity lights reflects the stats
>showed above: the
>second raid set blinks once or twice, then hangs for 5~10 seconds, blinks,
>hangs, etc.
>I haven't tried running with this driver much longer than the mkraid...
no one said that FC Arbitrated Loop provides FAIR arbitration between
devices ...
* RE: What's the "right" qla2100 driver?
[not found] <47F3C2BE74738E4683574107469DFA201DF5BF@XYUSEX01.xyus.xyratex.com>
@ 2004-01-22 5:06 ` Lincoln Dale
0 siblings, 0 replies; 6+ messages in thread
From: Lincoln Dale @ 2004-01-22 5:06 UTC (permalink / raw)
To: Frank Borich; +Cc: Poul Petersen, linux-scsi
At 12:39 PM 22/01/2004, Frank Borich wrote:
>I heard the fc driver is bad.
>Has anyone experienced very poor performance when using MD- SCSI
>mid-layer, and FC HBA?
>I create a 3 drive raid 5 array using MD, during initialization I get 50 +
>MB/sec
>When I write 00's using dd I get 3 MB/sec. I can tell by using a fc
>analyzer that my write commands
>are being chopped up into pieces. Just wondering if anyone else has seen
>this or knows of any issues using MD, and SCSI mid-layer, so I can shift
>focus to FC side? I lost all of my SCSI arrays! argh......
<shrug>
works ok here. i wonder if you're realizing exactly how RAID5 works ...?:
Linux 2.6.2-rc1, using the QLogic driver that is integrated into the kernel
with a QLA2300 attached to a Cisco MDS FC switch in turn connected to a FC
JBOD with 8 x 15K RPM disks:
[root@mel-stglab-host31 linux]# head -2 /proc/scsi/qla2xxx/0
QLogic PCI to Fibre Channel Host Adapter for QLA2310:
Firmware version 3.02.18 TPX, Driver version 8.00.00b8
constructing a 3-drive raid5 array (2 active 1 spare) using MD, i get
~31MB/sec:
to 'create' a 3-disk RAID5 array requires reading the data off 2 disks to
write the parity to the 3rd disk -- the overall speed of constructing a
RAID5 array is limited by the performance of a single disk spindle.
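To make the parity step concrete, here is a minimal Python sketch of XOR parity, the general technique RAID5 uses. It is not the md implementation: the helper name xor_blocks and the chunk contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two made-up data chunks standing in for the two data disks of a stripe.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]

# Building the array: read every data chunk, write their XOR as parity.
parity = xor_blocks(data)

# Losing one data chunk: XOR of the survivors and the parity recovers it.
recovered = xor_blocks([data[1], parity])

if __name__ == "__main__":
    print(parity.hex(), recovered == data[0])
```

Every parity chunk needs the matching chunk from each data disk, which is why the initial build proceeds at roughly the pace of one spindle.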
(output from mdstat)
[root@mel-stglab-host31 root]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
md0 : active raid5 sdg1[2] sdd1[1] sdb1[0]
17498880 blocks level 5, 256k chunk, algorithm 2 [2/2] [UU]
[==>..................] resync = 10.5% (1852292/17498880)
finish=8.5min speed=30369K/sec
unused devices: <none>
(output from my FC switch connected to this HBA showing the true
i/o that is going on - both reads & writes):
mel-stglab-mds9509-1# show interface fc2/9 | include rate
5 minutes input rate 137039200 bits/sec, 17129900 bytes/sec, 8709 frames/sec
5 minutes output rate 347203424 bits/sec, 43400428 bytes/sec, 21507 frames/sec
once the RAID5 array is built, i get ~44 MB/sec on a 'dd' -- once again,
about the speed limit of a single disk spindle:
[root@mel-stglab-host31 root]# time dd if=/dev/md0 of=/dev/null bs=256K count=4000
4000+0 records in
4000+0 records out
real 0m22.639s
user 0m0.020s
sys 0m4.235s
if i do a 'dd' on one of the disk spindles that makes up the raid5 array, i
get around the same number (actually, it's a bit better: i get 57MB/sec -
that is probably because ALL I/O is guaranteed to be sequential now):
[root@mel-stglab-host31 root]# time dd if=/dev/sdg1 of=/dev/null bs=256K count=4000
4000+0 records in
4000+0 records out
real 0m17.538s
user 0m0.013s
sys 0m2.698s
.... now, if i convert the RAID5 array to be a RAID0 array instead, its
performance is much much better -- 105MB/sec:
[root@mel-stglab-host31 root]# !mkraid
mkraid --really-force /dev/md0
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 17499120kB, raid superblock at 17499008kB
disk 1: /dev/sdd1, 17499120kB, raid superblock at 17499008kB
disk 2: /dev/sdf1, 17775891kB, raid superblock at 17775808kB
disk 3: /dev/sdg1, 17775891kB, raid superblock at 17775808kB
[root@mel-stglab-host31 root]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
md0 : active raid0 sdg1[3] sdf1[2] sdd1[1] sdb1[0]
70548992 blocks 256k chunks
unused devices: <none>
[root@mel-stglab-host31 root]# time dd if=/dev/md0 of=/dev/null bs=256K count=4000
4000+0 records in
4000+0 records out
real 0m9.557s
user 0m0.013s
sys 0m2.357s
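The throughput figures quoted for the three dd runs follow directly from the block size, count and elapsed "real" time. A minimal Python sketch of that arithmetic (the function name is hypothetical, and 1 MB is taken as 1024 KB):

```python
def dd_throughput_mib_s(block_size_kib, count, seconds):
    """Throughput of a dd run in MiB/s from block size (KiB), count and wall time."""
    return block_size_kib * count / 1024 / seconds


# "real" times taken from the three dd runs above (bs=256K count=4000 each).
runs = {
    "raid5 /dev/md0": 22.639,
    "single disk /dev/sdg1": 17.538,
    "raid0 /dev/md0": 9.557,
}

if __name__ == "__main__":
    for name, seconds in runs.items():
        print(f"{name}: {dd_throughput_mib_s(256, 4000, seconds):.1f} MiB/s")
```

Each run moves 1000 MiB, so the rates reduce to 1000 divided by the elapsed seconds.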
cheers,
lincoln.
* RE: What's the "right" qla2100 driver?
@ 2004-01-22 22:54 Poul Petersen
0 siblings, 0 replies; 6+ messages in thread
From: Poul Petersen @ 2004-01-22 22:54 UTC (permalink / raw)
To: linux-scsi
> For 2.4, you could try another driver -- 6.06.10 (available at the
Ah, so there is. I hadn't noticed that! After running the raidstart
resync with this qla2100 driver, disks eventually started failing with
errors like:
SCSI disk error : host 4 channel 0 id 6 lun 0 return code = 20000
I/O error: dev 08:71, sector 10072520
SCSI disk error : host 4 channel 0 id 5 lun 0 return code = 20000
I/O error: dev 08:61, sector 10072520
SCSI disk error : host 4 channel 0 id 4 lun 0 return code = 20000
I/O error: dev 08:51, sector 10072520
SCSI disk error : host 4 channel 0 id 3 lun 0 return code = 20000
I/O error: dev 08:41, sector 10072520
SCSI disk error : host 4 channel 0 id 2 lun 0 return code = 20000
I/O error: dev 08:31, sector 10072520
SCSI disk error : host 4 channel 0 id 1 lun 0 return code = 20000
I/O error: dev 08:21, sector 10072520
SCSI disk error : host 4 channel 0 id 0 lun 0 return code = 20000
I/O error: dev 08:11, sector 10072520
Doesn't it seem odd that these seven disks all failed complaining
about the same sector? All 28 disks eventually failed, some with the same
sector errors. The machine hung when I tried to reboot it. Strange.
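For reading these messages, the following Python sketch splits the return code into its four bytes and maps the "dev" field to a disk name. It rests on assumptions: that the 2.4 sd driver prints both values in hex, that the result word is laid out as driver/host/message/status bytes, and that the host byte names follow the kernel's scsi.h. The function names are hypothetical.

```python
# Host byte values as named in the Linux SCSI headers (subset).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_result(result):
    """Split a SCSI result word into status, message, host and driver bytes."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": HOST_BYTE_NAMES.get(host, hex(host)),
        "driver": (result >> 24) & 0xFF,
    }


def decode_dev(dev):
    """Map 'MM:mm' (hex major:minor) to an sdXN name; handles major 8 only."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    # Each sd disk owns 16 minors: minor // 16 is the disk, minor % 16 the partition.
    disk, part = divmod(minor, 16)
    return f"sd{chr(ord('a') + disk)}{part if part else ''}"


if __name__ == "__main__":
    print(decode_result(0x20000), decode_dev("08:71"))
```

Under those assumptions, "return code = 20000" carries a host byte of DID_BUS_BUSY with no status from the disk itself, which would point at the transport rather than at a bad sector on each drive.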
> For 2.6, why don't we start out with the default qla2xxx driver in
> 2.6.2-rc1.
I'm getting the same results with this combination as I did with
2.6.1 before. I should point out that I am not waiting for the resync to
start before trying to lay down a file system:
# mkraid /dev/md0
# mkraid /dev/md1
At this point I waited a few minutes, and both resyncs cruised
along. Everything was looking good, so I did a:
# mkfs.ext3 /dev/md0
mke2fs 1.32 (09-Nov-2002)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
26509312 inodes, 53014272 blocks
2650713 blocks (5.00%) reserved for the super user
First data block=0
1618 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Writing inode tables: 120/1618
At this point, the mkfs hung and the resync stopped. The status of
md is:
# cat /proc/mdstat
Personalities : [raid5]
md1 : active raid5 sdab1[13] sdaa1[12] sdz1[11] sdy1[10] sdx1[9] sdw1[8] sdv1[7] sdu1[6] sdt1[5] sds1[4] sdr1[3] sdq1[2] sdp1[1] sdo1[0]
212057088 blocks level 5, 64k chunk, algorithm 2 [13/13] [UUUUUUUUUUUUU]
[>....................] resync = 3.6% (645636/17671424) finish=1271.7min speed=222K/sec
md0 : active raid5 sdn1[13] sdm1[12] sdl1[11] sdk1[10] sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
212057088 blocks level 5, 64k chunk, algorithm 2 [13/13] [UUUUUUUUUUUUU]
[=>...................] resync = 6.6% (1173248/17671424) finish=1381.3min speed=198K/sec
unused devices: <none>
But the "finish" time is the only thing that changes, as it
continually increases. I also see the following in dmesg:
md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 17671424 blocks.
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(2:0:7:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(2:0:12:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(2:0:12:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(2:0:11:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(2:0:10:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
Doing a simple collection of these error messages:
# dmesg | grep scsi\( | cut -d: -f6 | sort -un | xargs
0 1 2 3 4 5 6 7 8 9 10 11 12
So, all 13 disks in the first array (md0) are generating errors. I
decided to try this again (after a reboot), without issuing the mkfs. I let
the sync run and watched dmesg for strange errors. Here is a summary:
md: using 128k window, over a total of 17671424 blocks.
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:1:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: Performing ISP error recovery - ha= cd7b01c0.
qla2100 0000:01:09.0: LIP reset occured (f8ef).
qla2100 0000:01:09.0: LIP occured (f8ef).
qla2100 0000:01:09.0: LOOP UP detected (1 Gbps).
qla2100 0000:01:09.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:01:09.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:01:09.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:01:09.0: qla2xxx_eh_abort: cmd already done sp=00000000
... Repeated ~150 times ...
qla2100 0000:01:09.0: ISP System Error - mbx1=1935h mbx2=0h mbx3=8004h.
qla2100 0000:01:09.0: Failed to dump firmware (256)!!!
qla2100 0000:01:09.0: Performing ISP error recovery - ha= cd7b01c0.
qla2100 0000:01:09.0: LIP reset occured (f8f7).
qla2100 0000:01:09.0: LIP occured (f8f7).
qla2100 0000:01:09.0: LOOP UP detected (1 Gbps).
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:23:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:22:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:21:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed
... Repeated lots ...
This time it failed all of the disks in md1, plus one from md0:
# dmesg | grep '^qla2100.*scsi(' | cut -d: -f6 | sort -un | xargs
1 4 14 15 16 17 18 19 20 21 22 23 24 25 26
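The grep/cut/sort pipelines above pull the target id out of each scsi(host:channel:target:lun) tuple. A minimal Python sketch of the same collection follows; it matches on the tuple itself rather than counting colon-separated fields, so it does not depend on the PCI address prefix. The function name aborted_targets is hypothetical.

```python
import re

# Captures host, channel, target and lun from a qla2xxx_eh_abort log line.
ABORT_RE = re.compile(r"qla2xxx_eh_abort scsi\((\d+):(\d+):(\d+):(\d+)\)")


def aborted_targets(log_text):
    """Return the sorted, de-duplicated target ids that had an abort logged."""
    return sorted({int(m.group(3)) for m in ABORT_RE.finditer(log_text)})


if __name__ == "__main__":
    sample = "\n".join([
        "qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:1:0): cmd_timeout_in_sec=0x1e.",
        "qla2100 0000:01:09.0: qla2xxx_eh_abort: cmd already done sp=00000000",
        "qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:23:0): cmd_timeout_in_sec=0x1e.",
        "qla2100 0000:01:09.0: qla2xxx_eh_abort Exiting: status=Failed",
        "qla2100 0000:01:09.0: qla2xxx_eh_abort scsi(1:0:23:0): cmd_timeout_in_sec=0x1e.",
    ])
    print(aborted_targets(sample))
```

Fed the full dmesg text (for example from a saved file), it should give the same list as the shell pipeline.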
Thoughts?
Thanks for your help,
-poul
* RE: What's the "right" qla2100 driver?
@ 2004-05-27 0:25 Poul Petersen
0 siblings, 0 replies; 6+ messages in thread
From: Poul Petersen @ 2004-05-27 0:25 UTC (permalink / raw)
To: 'Andrew Vasquez', linux-scsi
OK - The box is now running kernel 2.6.6 and the qla2xxx driver
version 8.00.00b12. I tried making a single raid set and about 17% of the
way into the sync I got the following:
qla2100 0000:00:0e.0: LIP reset occured (f7ef).
qla2100 0000:00:0e.0: LIP occured (f7ef).
qla2100 0000:00:0e.0: Mailbox command timeout occured. Issuing ISP abort.
qla2100 0000:00:0e.0: Performing ISP error recovery - ha= c121c218.
qla2100 0000:00:0e.0: LIP reset occured (f8f7).
qla2100 0000:00:0e.0: LIP occured (f8f7).
qla2100 0000:00:0e.0: LOOP UP detected (1 Gbps).
qla2100 0000:00:0e.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:00:0e.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:00:0e.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:00:0e.0: qla2xxx_eh_abort: cmd already done sp=00000000
qla2100 0000:00:0e.0: qla2xxx_eh_abort scsi(0:0:2:0): cmd_timeout_in_sec=0x1e.
qla2100 0000:00:0e.0: qla2xxx_eh_abort Exiting: status=Failed
And so on, each disc failing with an error similar to the last two
lines. I changed DEBUG_QLA2100 to "1" in qla_settings.h and then I got a
bunch of these when I loaded the driver:
scsi(1): RLC failed to issue iocb! fcport=[000c/cf5e84ec] rval=0 cs=0 ss=b02
Followed by a bunch of:
cmd_timeout: Found in ISP
And after resyncing the raid set for some time, the failure messages:
qla2xxx_eh_abort: refcount 1
qla2100 0000:00:0e.0: qla2xxx_eh_abort scsi(1:0:3:0): cmd_timeout_in_sec=0x1e.
SCSI Command @=0xc4eabaa8, Handle=0x00000335
chan=0x00, target=0x03, lun=0x00, cmd_len=0x0a
CDB: 0x28 0x00 0x00 0xe6 0xc6 0x3f 0x00 0x01 0x00 0x00
seg_cnt=32, allowed=20, retries=0, serial_number_at_timeout=0x103982
request buffer=0xca53365c, request buffer len=0x20000
tag=90, transfersize=0x200
serial_number=103982, SP=cb214b38
data direction=2
sp flags=0x2
r_start=0x190678, u_start=0x190678, f_start=0x5a5a5a5a, state=7
e_start= 0x5a5a5a5a, ext_history=1515870810, fo retry=0, loopid=4, port path=0
qla2100 0000:00:0e.0: qla2xxx_eh_abort Exiting: status=Failed
qla2xxx_eh_abort: refcount 1
qla2100 0000:00:0e.0: qla2xxx_eh_abort scsi(1:0:6:0): cmd_timeout_in_sec=0x1e.
SCSI Command @=0xca4f033c, Handle=0x00000339
chan=0x00, target=0x06, lun=0x00, cmd_len=0x0a
CDB: 0x28 0x00 0x00 0xe6 0xc7 0xbf 0x00 0x00 0x48 0x00
seg_cnt=9, allowed=20, retries=0, serial_number_at_timeout=0x103986
request buffer=0xc82088b4, request buffer len=0x9000
tag=90, transfersize=0x200
serial_number=103986, SP=cb214e98
data direction=2
sp flags=0x2
r_start=0x19067a, u_start=0x19067a, f_start=0x5a5a5a5a, state=7
e_start= 0x5a5a5a5a, ext_history=1515870810, fo retry=0, loopid=7, port path=0
qla2100 0000:00:0e.0: qla2xxx_eh_abort Exiting: status=Failed
qla2xxx_eh_abort: refcount 1
qla2100 0000:00:0e.0: qla2xxx_eh_abort scsi(1:0:0:0): cmd_timeout_in_sec=0x1e.
SCSI Command @=0xcbc51c24, Handle=0x0000033b
chan=0x00, target=0x00, lun=0x00, cmd_len=0x0a
CDB: 0x28 0x00 0x00 0xe6 0xc7 0xbf 0x00 0x00 0x48 0x00
seg_cnt=9, allowed=20, retries=0, serial_number_at_timeout=0x103988
request buffer=0xcf5dd9c0, request buffer len=0x9000
tag=90, transfersize=0x200
serial_number=103988, SP=cb214238
data direction=2
sp flags=0x2
r_start=0x19067a, u_start=0x19067a, f_start=0x5a5a5a5a, state=7
e_start= 0x5a5a5a5a, ext_history=1515870810, fo retry=0, loopid=1, port path=0
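The CDB lines in the debug output can be decoded by hand. The Python sketch below assumes they are standard 10-byte READ(10) commands (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and that the block size is the 0x200 bytes shown as transfersize. The function name decode_read10 is hypothetical.

```python
def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, nblocks): starting logical block and transfer length in blocks.
    """
    cdb = bytes(int(token, 16) for token in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    nblocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, nblocks


if __name__ == "__main__":
    BLOCK_SIZE = 0x200  # transfersize from the log above
    for text in (
        "0x28 0x00 0x00 0xe6 0xc6 0x3f 0x00 0x01 0x00 0x00",
        "0x28 0x00 0x00 0xe6 0xc7 0xbf 0x00 0x00 0x48 0x00",
    ):
        lba, nblocks = decode_read10(text)
        print(f"lba={lba:#x} blocks={nblocks} bytes={nblocks * BLOCK_SIZE:#x}")
```

Under those assumptions the decoded lengths match the "request buffer len" values in the log (0x20000 and 0x9000), so the timed-out commands look like ordinary reads issued by the resync.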
Thoughts? Many thanks,
-poul