* megaraid problems
@ 2001-05-15 22:49 Gabriel Rocha
  2001-05-15 22:55 ` Alan Cox
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Rocha @ 2001-05-15 22:49 UTC (permalink / raw)
  To: linux-kernel

Hi,
	I have a MegaRAID 466 controller, which both AMI and the Linux
	kernel say is supported under 2.4.4. I tried the AMI patches to
	the drivers in the vanilla kernel, to no avail. The card works
	under Windows, and it is even detected under Linux now, but
	it refuses to show any drives and I can't get the management
	software to work. So my question is this:
		Has anyone with this card gotten it to work under Linux
		2.4.4-acX? If so, how?
	I have read about many people upgrading firmware and using the
	'latest drivers', both of which I have done, so I am at a loss
	here. Any help is appreciated. --gabe


-- 

"It's not brave if you're not scared."


* Re: megaraid problems
  2001-05-15 22:49 megaraid problems Gabriel Rocha
@ 2001-05-15 22:55 ` Alan Cox
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Cox @ 2001-05-15 22:55 UTC (permalink / raw)
  To: Gabriel Rocha; +Cc: linux-kernel

> 	it refuses to show any drives and I can't get the management
> 	software to work. So my question is this:
> 		Has anyone with this card gotten it to work under Linux
> 		2.4.4-acX? If so, how?

Mine works. In fact the 2.4.4ac source tree currently lives on it.


* Megaraid problems
@ 2005-12-01  4:34 Vladimir Dergachev
  0 siblings, 0 replies; 15+ messages in thread
From: Vladimir Dergachev @ 2005-12-01  4:34 UTC (permalink / raw)
  To: linux-kernel


Hi all :)

    I am wondering whether someone can shed some light on issues I am 
having with LSI MegaRAID cards? The problem described below is with a
SATA card, but I am also having difficulties with an LSI HBA SCSI adaptor, 
although I have not yet eliminated all the obvious things to try.

    Please CC your replies as I am not on the list.

    My setup: dual Opteron 252 with 8 GB RAM, Tyan Thunder K8W, SUSE 9.3
(I tried both the native SUSE kernel and 2.6.14.3 from kernel.org), and four
400 GB Western Digital drives in a RAID5 configuration connected to an LSI
MegaRAID SATA 150-6.

    There are also three 250 GB drives connected to the on-board SIL3114, where
Linux is installed.

    Problem: the card recognizes the disks fine and initialization goes OK.
I can create a partition with no problem. However, when copying files I get
filesystem corruption (with both ext3 and XFS). For ext3 I see the 
following messages:

EXT3-fs error (device sdd1): ext3_new_block: Allocating block in system zone - 
block = 100663296
Aborting journal on device sdd1.
EXT3-fs error (device sdd1) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
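
For reference, the block number in the first error line can be put in context with a little arithmetic. The following is only a sketch: the 4 KiB filesystem block size is an assumption (the log does not state it), and the helper name is made up for illustration.

```python
# Block number reported by ext3_new_block in the log above.
block = 100663296

# ASSUMPTION: 4 KiB filesystem blocks; the log does not state the block size.
BLOCK_SIZE = 4096


def describe_block(block: int, block_size: int) -> dict:
    """Return the block number in hex and its byte offset on the device."""
    offset = block * block_size
    return {
        "hex": hex(block),
        "multiple_of_2_20": block % (1 << 20) == 0,
        "offset_bytes": offset,
        "offset_gib": offset / (1 << 30),
    }


info = describe_block(block, BLOCK_SIZE)
print(info)
```

Under that assumption the block number is a round value in hex and lies well inside a 4 x 400 GB RAID5 volume. Whether this has any bearing on the corruption is not established anywhere in this thread.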

    The hardware is otherwise stable - this box also has 3 disks in software 
RAID5 connected to SIL3114 and a similar box has 3ware card with 8 drives, 
which function fine.

    I would appreciate any suggestions on what to try/tweak/patch.

                 thank you very much !

                       Vladimir Dergachev

PS One more thing: none of the LSI binary configuration tools work. They report 
that they cannot find the adaptor even though /dev/megadev0 points to the correct 
adaptor (I tried both 253 and 254 for major numbers and 0,1,2,3,4 for 
minor numbers). I was not successful in finding an open-source management 
tool.
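
Rather than trying major numbers by hand, the number the kernel actually assigned can usually be read from /proc/devices. The following is a minimal sketch, assuming the driver registers a character device named "megadev"; that name and the sample listing are assumptions for illustration, and some driver versions may instead expose the node as a misc device listed in /proc/misc.

```python
from typing import Optional


def find_char_major(proc_devices_text: str, name: str) -> Optional[int]:
    """Return the major number of the character device `name`, or None."""
    in_char_section = False
    for line in proc_devices_text.splitlines():
        stripped = line.strip()
        if stripped == "Character devices:":
            in_char_section = True
            continue
        if stripped == "Block devices:":
            in_char_section = False
            continue
        parts = stripped.split()
        if in_char_section and len(parts) == 2 and parts[1] == name:
            return int(parts[0])
    return None


# Hypothetical sample; on a real system, read the text from /proc/devices.
SAMPLE = """Character devices:
  1 mem
 10 misc
253 megadev

Block devices:
  8 sd
"""

print(find_char_major(SAMPLE, "megadev"))
```

If the lookup returns None, the device is not registered under that name, and creating a node with a guessed major number is unlikely to help.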



* RE: Megaraid problems
@ 2005-12-01 13:41 Ju, Seokmann
  0 siblings, 0 replies; 15+ messages in thread
From: Ju, Seokmann @ 2005-12-01 13:41 UTC (permalink / raw)
  To: 'Vladimir Dergachev', linux-kernel

Hi,

On Wednesday, November 30, 2005 11:35 PM, Vladimir Dergachev wrote:
>     Problem: card recognizes the disks fine, initialization goes ok.
> I can create partition with no problem. However when copying files I get
> filesystem corruption (holds both with ext3 and XFS). For ext3 I see the
> following message:
Can you please try the following changes to see if it works?
- configure the controller with a single 400GB drive as RAID0 instead of four 400GB drives as RAID5
- run with 4GB (or less) of memory
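
One way to run with less memory without removing DIMMs is the kernel's mem= boot parameter (for example mem=4G). The following is a small sketch for confirming afterwards that the limit took effect, assuming /proc/meminfo is available; the sample text and function names are hypothetical.

```python
import re


def mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (in kB, as the kernel reports it) from /proc/meminfo text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def within_limit(meminfo_text: str, limit_gib: int = 4) -> bool:
    """True if the reported total memory is at or below limit_gib GiB."""
    return mem_total_kib(meminfo_text) <= limit_gib * 1024 * 1024


# Hypothetical sample; on a real system, read the text from /proc/meminfo.
SAMPLE = "MemTotal:        4046912 kB\nMemFree:          123456 kB\n"
print(within_limit(SAMPLE))
```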

Also, please provide the following information:
- the F/W version on the MegaRAID controller

Thank you,

Seokmann



* Megaraid problems.
@ 2006-01-13 14:04 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-13 14:04 UTC (permalink / raw)
  To: linux-scsi

Hi list,

I have a Dell PowerEdge 850 with their PERC4sc RAID card driving a Dell PowerVault 220s external drive enclosure running Ubuntu 5.10.  This machine and all the parts that make it up are less than 2 months old.  In that time, I have had both logical drives supplied by PV220s taken offline by the megaraid driver twice.  The only cure for this has been a reboot of the machine.  Luckily, with the exception of the process that was running at the time of the problem, nothing else was damaged or hurt; no loss of data has been experienced (yet).

Both times the failure has occurred, it happened while creating a gzipped tarball of some backup data.  The final tarball created is averaging about 92+ GB in size and the machine is under heavy disk I/O for more than 7 hours.  I have been able to grab this information from the syslog after the failure (gathered with LogWatch):

 1 Time(s): [5535381.561000] megaraid abort: 55592075:43[255:128], fw owner
 1 Time(s): [5535381.561000] megaraid abort: 55592077:62[255:128], fw owner
 1 Time(s): [5535381.561000] megaraid abort: 55592078[255:128], driver owner
 1 Time(s): [5535381.561000] megaraid mbox: Wait for 2 commands to complete:180
 1 Time(s): [5535381.561000] megaraid: 2 outstanding commands. Max wait 180 sec
 1 Time(s): [5535381.561000] megaraid: aborting-55592075 cmd=28 <c=1 t=0 l=0>
 1 Time(s): [5535381.561000] megaraid: aborting-55592077 cmd=28 <c=1 t=0 l=0>
 1 Time(s): [5535381.561000] megaraid: aborting-55592078 cmd=28 <c=1 t=0 l=0>
 1 Time(s): [5535381.561000] megaraid: reseting the host...
 1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175

[... The above line repeats every 5 seconds, counting down to 0 ...]

 1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
 2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device

The only difference between the two instances is the number of "commands" that are waiting to complete.  The snippet above is from the first instance; the second instance had 10 commands waiting.
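
The timestamps in the excerpt are consistent with the 180-second wait that the driver announces. The following is a minimal sketch that measures the interval between the reset message and the first offline rejection, using lines copied from the excerpt above; the function names are made up for illustration.

```python
import re

LOG = """\
 1 Time(s): [5535381.561000] megaraid: 2 outstanding commands. Max wait 180 sec
 1 Time(s): [5535381.561000] megaraid: reseting the host...
 1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
 1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
 2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
"""

# Kernel timestamp in square brackets, followed by the message text.
LINE_RE = re.compile(r"\[(\d+\.\d+)\]\s+(.*)$")


def parse(log: str):
    """Yield (timestamp_seconds, message) for each log line."""
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            yield float(m.group(1)), m.group(2)


def reset_to_offline_seconds(log: str) -> float:
    """Seconds from the host reset message to the first offline rejection."""
    start = end = None
    for ts, msg in parse(log):
        if "reseting the host" in msg and start is None:
            start = ts
        if "offline device" in msg and end is None:
            end = ts
    if start is None or end is None:
        raise ValueError("reset or offline message not found")
    return end - start


elapsed = reset_to_offline_seconds(LOG)
print(round(elapsed, 2))
```

The interval comes out at roughly 181 seconds, i.e. the device was marked offline only after the full wait expired without the outstanding commands completing.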

The machine is running the default Ubuntu kernel, which is their patched version of 2.6.12.  In addition, both the megaraid_mbox and megaraid_mm modules are loaded.  Here is the output of 'modinfo' for both of those modules:

========================================================================================
megaraid_mbox
----------------------------------------------------------------------------------------
filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
author:         LSI Logic Corporation
description:    LSI Logic MegaRAID Mailbox Driver
license:        GPL
version:        2.20.4.5
vermagic:       2.6.12-10-386 386 gcc-3.4
depends:        megaraid_mm,scsi_mod
alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
alias:          pci:v00001000d00000407sv00001028sd00000531bc*sc*i*
alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
alias:          pci:v00001000d00000408sv00001028sd00000002bc*sc*i*
alias:          pci:v00001000d00000408sv00001028sd00000001bc*sc*i*
alias:          pci:v0000101Ed00001960sv00001028sd00000471bc*sc*i*
alias:          pci:v0000101Ed00001960sv00001028sd00000493bc*sc*i*
alias:          pci:v0000101Ed00001960sv00001028sd00000475bc*sc*i*
alias:          pci:v0000101Ed00001960sv0000101Esd00000475bc*sc*i*
alias:          pci:v0000101Ed00001960sv0000101Esd00000493bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd0000A520bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000520bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000518bc*sc*i*
alias:          pci:v00001000d00000407sv00001000sd00000530bc*sc*i*
alias:          pci:v00001000d00000407sv00001000sd00000532bc*sc*i*
alias:          pci:v00001000d00000407sv00001000sd00000531bc*sc*i*
alias:          pci:v00001000d00000408sv00001000sd00000001bc*sc*i*
alias:          pci:v00001000d00000408sv00001000sd00000002bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000522bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00004523bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000523bc*sc*i*
alias:          pci:v00001000d00000409sv00001000sd00003004bc*sc*i*
alias:          pci:v00001000d00000409sv00001000sd00003008bc*sc*i*
alias:          pci:v00001000d00000407sv00008086sd00000532bc*sc*i*
alias:          pci:v00001000d00001960sv00008086sd00000523bc*sc*i*
alias:          pci:v00001000d00000408sv00008086sd00000002bc*sc*i*
alias:          pci:v00001000d00000407sv00008086sd00000530bc*sc*i*
alias:          pci:v00001000d00000409sv00008086sd00003008bc*sc*i*
alias:          pci:v00001000d00000408sv00008086sd00003431bc*sc*i*
alias:          pci:v00001000d00000408sv00008086sd00003499bc*sc*i*
alias:          pci:v00001000d00001960sv00008086sd00000520bc*sc*i*
alias:          pci:v00001000d00000408sv00001734sd00001065bc*sc*i*
alias:          pci:v00001000d00000408sv00001025sd0000004Dbc*sc*i*
alias:          pci:v00001000d00000408sv00001033sd00008287bc*sc*i*
srcversion:     042A4371A952248BEF860F4
parm:           debug_level:Debug level for driver (default=0) (int)
parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)

----------------------------------------------------------------------------------------
megaraid_mm:
----------------------------------------------------------------------------------------
filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mm.ko
author:         LSI Logic Corporation
description:    LSI Logic Management Module
license:        GPL
version:        2.20.2.5
vermagic:       2.6.12-10-386 386 gcc-3.4
depends:
srcversion:     D2DA33EA7F3FEA9EBE4A603
parm:           dlevel:Debug level (default=0) (int)
========================================================================================
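
Each alias line in the modinfo output encodes the PCI IDs the driver will bind to: vendor (v), device (d), subsystem vendor (sv) and subsystem device (sd), with wildcards for the class fields. The following is a minimal sketch for splitting one alias into its IDs so it can be compared against the numeric IDs shown by `lspci -n`; the function name is hypothetical.

```python
import re

# Matches fully specified aliases (no wildcards in the four ID fields).
ALIAS_RE = re.compile(
    r"^pci:v(?P<vendor>[0-9A-F]{8})d(?P<device>[0-9A-F]{8})"
    r"sv(?P<subvendor>[0-9A-F]{8})sd(?P<subdevice>[0-9A-F]{8})"
)


def parse_pci_alias(alias: str) -> dict:
    """Split a modinfo PCI alias into integer vendor/device/subsystem IDs."""
    m = ALIAS_RE.match(alias)
    if m is None:
        raise ValueError(f"not a fully specified PCI alias: {alias!r}")
    return {k: int(v, 16) for k, v in m.groupdict().items()}


ids = parse_pci_alias("pci:v00001000d00001960sv00001028sd00000520bc*sc*i*")
print({k: f"{v:04x}" for k, v in ids.items()})
```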

I have contacted Dell - via their linux-poweredge mailing list - and have discovered that I am not the only one experiencing these problems.  What bothers me is that this problem has apparently been around for a while and no fix has yet been discovered by Dell or anyone else.

My research also leads me to believe that this is not just an Ubuntu thing either.  I have reports that this happens under Redhat, Debian and SuSE.  It also appears as though the problem started happening around kernel version 2.6.9.

So, I'm hoping that someone here:

a). Knows about the problem and is working on it.

- and, more importantly - 

b). Can lead me to a fix.

My machine is in production and I do not have any additional hardware to test with, but I can do limited testing with it as long as it is completely functional by 8:00 pm eastern time.  I'm using it as an offsite backup machine and that's when my backup processes kick in.  If more information is needed, let me know how to get it, and I'll supply it.

I need to get this solved ASAP.

Thanks in advance,

--
Kevin L. Collins, MCSE
Systems Manager
Nesbitt Engineering, Inc. 


* RE: Megaraid problems.
@ 2006-01-13 14:39 Ju, Seokmann
  0 siblings, 0 replies; 15+ messages in thread
From: Ju, Seokmann @ 2006-01-13 14:39 UTC (permalink / raw)
  To: Collins, Kevin, linux-scsi

Hi,

Thank you for posting details regarding megaraid.
From the log, the messages are OK except for the following.
---
>  1 Time(s): [5535381.561000] megaraid: reseting the host...
>  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
> 
> [... The above line repeats every 5 seconds, counting down to 0 ...]
> 
>  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
>  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
---
> a). Knows about the problem and is working on it.
> 
> - and, more importantly - 
It seems that, for some reason, the controller couldn't return the commands (2 commands in this case) within the given timeout period.
Because of that, the driver decided to reset the controller, and as part of the reset it triggers the F/W to take the device offline.


> b). Can lead me to a fix.
Can you tell me which F/W version is on the controller?
Besides disk I/O, are there other operations involved, such as tape R/W?
How about applications? Was any application communicating with the MegaRAID through IOCTL at that time?

Thank you,




* RE: Megaraid problems.
@ 2006-01-13 15:59 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-13 15:59 UTC (permalink / raw)
  To: linux-scsi, Seokmann.Ju

On Friday, January 13, 2006 9:39 AM, Seokmann Ju wrote:

> Hi,
Hey, glad to get a response!  :-)

> Thank you for posting details regarding megaraid.
> From the log, the messages are OK except for the following.
> ---
> >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
> > 
> > [... The above line repeats every 5 seconds, counting down to 0 ...]
> > 
> >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
> >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
> ---
> > a). Knows about the problem and is working on it.
> > 
> > - and, more importantly -
> It seems like that, for some reason, controller couldn't 
> return commands (2 commands for this case) within given 
> timeout period.
> And because of it, driver decided to reset the controller and 
> as part of reset, it triggers the F/W to make the device offline.

And I'm assuming that this is why my data isn't damaged or otherwise corrupted - which is a good thing! ;-)

> > b). Can lead me to a fix.
> Can you clarify what is F/W version on the controller?
Firmware on the controller (from /proc/scsi/scsi):  351S
=================================================================
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: PV22XS           Rev: E.18
  Type:   Processor                        ANSI SCSI revision: 03
Host: scsi2 Channel: 01 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi2 Channel: 01 Id: 01 Lun: 00
  Vendor: MegaRAID Model: LD 1 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
=================================================================

I have seen reports on Dell's mailing list that allude to the fact that the E18 and 351S firmwares are supposed to help this situation, but not in my case.  My system shipped with these firmwares in place.  Dell, to my knowledge, does not offer any newer versions of either firmware.
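
For anyone comparing firmware levels across machines, the revision strings can be pulled out of /proc/scsi/scsi mechanically. The following is a minimal sketch using the listing above as input; the regular expression and function name are illustrative and assume the "Vendor: ... Model: ... Rev: ..." layout shown here.

```python
import re

PROC_SCSI = """\
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: PV22XS           Rev: E.18
  Type:   Processor                        ANSI SCSI revision: 03
Host: scsi2 Channel: 01 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi2 Channel: 01 Id: 01 Lun: 00
  Vendor: MegaRAID Model: LD 1 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
"""

# Vendor and model may contain spaces, so match lazily up to the next label.
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S+)")


def revisions(text: str) -> dict:
    """Map each (vendor, model) pair to its reported revision string."""
    return {(m.group(1), m.group(2)): m.group(3) for m in VENDOR_RE.finditer(text)}


revs = revisions(PROC_SCSI)
for (vendor, model), rev in revs.items():
    print(f"{vendor:10} {model:20} {rev}")
```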

> Besides disk I/O, are there other operations involved like, tape R/W?
No tape R/W, but...

> How about application? Any application that is communicating 
> with MegaRAID through IOCTL at that time?
As for other tasks, the machine also serves as a web server (Apache, MySQL and PHP) and e-mail relay (Postfix).  The mail relay does more work than the web server, but even that is light.

Besides the external storage (powered by megaraid, the PERC4 and the PV220s) the machine has two internal SATA drives.  These internal drives house the OS, the web server and the mail queue.  The only I/O running through megaraid at the time of the failures has been the creation of the tar files.

> 
> Thank you,

You're welcome.  I hope I have helped with the information and not hindered.  ;-)

Kevin

> > alias:          pci:v00001000d00000407sv00008086sd00000530bc*sc*i*
> > alias:          pci:v00001000d00000409sv00008086sd00003008bc*sc*i*
> > alias:          pci:v00001000d00000408sv00008086sd00003431bc*sc*i*
> > alias:          pci:v00001000d00000408sv00008086sd00003499bc*sc*i*
> > alias:          pci:v00001000d00001960sv00008086sd00000520bc*sc*i*
> > alias:          pci:v00001000d00000408sv00001734sd00001065bc*sc*i*
> > alias:          pci:v00001000d00000408sv00001025sd0000004Dbc*sc*i*
> > alias:          pci:v00001000d00000408sv00001033sd00008287bc*sc*i*
> > srcversion:     042A4371A952248BEF860F4
> > parm:           debug_level:Debug level for driver (default=0) (int)
> > parm:           fast_load:Faster loading of the driver, skips 
> > physical devices! (default=0) (int)
> > parm:           cmd_per_lun:Maximum number of commands per 
> > logical unit (default=64) (int)
> > parm:           max_sectors:Maximum number of sectors per IO 
> > command (default=128) (int)
> > parm:           busy_wait:Max wait for mailbox in 
> > microseconds if busy (default=10) (int)
> > parm:           unconf_disks:Set to expose unconfigured disks 
> > to kernel (default=0) (int)
> > 
> > --------------------------------------------------------------
> > --------------------------
> > megaraid_mm:
> > --------------------------------------------------------------
> > --------------------------
> > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mm.ko
> > author:         LSI Logic Corporation
> > description:    LSI Logic Management Module
> > license:        GPL
> > version:        2.20.2.5
> > vermagic:       2.6.12-10-386 386 gcc-3.4
> > depends:
> > srcversion:     D2DA33EA7F3FEA9EBE4A603
> > parm:           dlevel:Debug level (default=0) (int)
> > ==============================================================
> > ==========================
> > 
> > I have contacted Dell - via their linux-poweredge mailing 
> list - and 
> > have discovered that I am not the only one experiencing these 
> > problems.  What bothers me is that while this problem, 
> apparently, has 
> > been around a while and no fix has yet been discovered by Dell or 
> > anyone else.
> > 
> > My research also leads me to believe that this is not just 
> an Ubuntu 
> > thing either.  I have reports that this happens under 
> Redhat, Debian 
> > and SuSE.  It also appears as though the problem started happening 
> > around kernel version 2.6.9.
> > 
> > So, I'm hoping that someone here:
> > 
> > a). Knows about the problem and is working on it.
> > 
> > - and, more importantly -
> > 
> > b). Can lead me to a fix.
> > 
> > My machine is in production and I do not have any 
> additional hardware 
> > to test with, but I can do limited testing with it as long as it is 
> > completely functional by 8:00 pm eastern time.  I'm using it as 
> > offsite backup machine and that's when my backup processes 
> kick in.  
> > If more information is needed, let me know how to get it, and I'll 
> > supply it.
> > 
> > I need to get this solved ASAP.
> > 
> > Thanks in advance,
> > 
> > --
> > Kevin L. Collins, MCSE
> > Systems Manager
> > Nesbitt Engineering, Inc. 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Megaraid problems.
@ 2006-01-13 16:55 Ju, Seokmann
  0 siblings, 0 replies; 15+ messages in thread
From: Ju, Seokmann @ 2006-01-13 16:55 UTC (permalink / raw)
  To: Collins, Kevin, linux-scsi

Hi,
> Besides the external storage (powered by megaraid, the PERC4 
> and the PV220s) the machine has two internal SATA drives.  
> These internal drives house the OS, the web server and the 
> mail queue.  The only I/O running through megaraid at the 
> time of the failures has been the creation of the tar files.
If it is happening during disk I/O, I would like to investigate further.
It would be very helpful if you could provide detailed steps to reproduce the issue, including how you create that large file.
I'll also check with the F/W team to see whether an updated version is available, and will get back to you if so.
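
In case it helps when gathering those details, here is a rough sketch (not part of the driver or any LSI tool) of how the relevant lines could be pulled out of the syslog. It assumes the message formats shown in the log excerpt quoted below; the function name `summarize` and the returned field names are made up for illustration.

```python
import re

# Countdown lines printed while the driver waits on the firmware, e.g.
# "megaraid mbox: Wait for 2 commands to complete:175" (format taken from
# the log excerpt in this thread).
WAIT_RE = re.compile(r"megaraid mbox: Wait for (\d+) commands to\s+complete:\s*(\d+)")
# Line printed by the SCSI layer once the device has been taken offline.
OFFLINE_RE = re.compile(r"rejecting I/O to offline device")


def summarize(lines):
    """Collect the outstanding-command count, the first and last countdown
    values seen, and whether the device ended up offline."""
    summary = {
        "outstanding": None,
        "first_wait": None,
        "last_wait": None,
        "offline": False,
    }
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            summary["outstanding"] = int(match.group(1))
            remaining = int(match.group(2))
            if summary["first_wait"] is None:
                summary["first_wait"] = remaining
            summary["last_wait"] = remaining
        elif OFFLINE_RE.search(line):
            summary["offline"] = True
    return summary
```

Feeding it the lines of the syslog (one string per line, unwrapped) should give the number of stuck commands and show whether the 180-second wait ran out before the device went offline.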

Thank you, again.

> -----Original Message-----
> From: Collins, Kevin [mailto:kCollins@nesbittengineering.com] 
> Sent: Friday, January 13, 2006 11:00 AM
> To: linux-scsi@vger.kernel.org; Ju, Seokmann
> Subject: RE: Megaraid problems.
> 
> On Friday, January 13, 2006 9:39 AM, Seokmann Ju wrote:
> 
> > Hi,
> Hey, glad to get a response!  :-)
> 
> > Thank you for posting details regarding megaraid.
> > From the log, the messages are OK except for the following.
> > ---
> > >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> > >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 
> commands to 
> > > complete:175
> > > 
> > > [... The above line repeat every 5 seconds, counting down 
> to 0 ...]
> > > 
> > >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 
> commands to 
> > > complete:5
> > >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O 
> to offline 
> > > device
> > ---
> > > a). Knows about the problem and is working on it.
> > > 
> > > - and, more importantly -
> > It seems like that, for some reason, controller couldn't 
> > return commands (2 commands for this case) within given 
> > timeout period.
> > And because of it, driver decided to reset the controller and 
> > as part of reset, it triggers the F/W to make the device offline.
> 
> And I'm assuming that this is why my data isn't damaged or 
> otherwise corrupted - which is a good thing! ;-)
> 
> > > b). Can lead me to a fix.
> > Can you clarify what is F/W version on the controller?
> Firmware on the controller (from /proc/scsi/scsi):  351S
> =================================================================
> Host: scsi2 Channel: 00 Id: 06 Lun: 00
>   Vendor: DELL     Model: PV22XS           Rev: E.18
>   Type:   Processor                        ANSI SCSI revision: 03
> Host: scsi2 Channel: 01 Id: 00 Lun: 00
>   Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> Host: scsi2 Channel: 01 Id: 01 Lun: 00
>   Vendor: MegaRAID Model: LD 1 RAID5  858G Rev: 351S
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> =================================================================
> 
> I have seen reports on Dell's mailing list that allude to the 
> fact that the E18 and 351S firmwares are supposed to help this 
> situation, but not in my case.  My system shipped with these 
> firmwares in place.  Dell, to my knowledge, does not offer 
> any newer versions of either firmware.
> 
> > Besides disk I/O, are there other operations involved like, 
> tape R/W?
> No tape R/W, but...
> 
> > How about application? Any application that is communicating 
> > with MegaRAID through IOCTL at that time?
> As for other tasks, the machine also serves as a web server 
> (Apache, MySQL and PHP) and e-mail relay (Postfix).  The mail 
> relay does more work than the web server, but even that is light.
> 
> Besides the external storage (powered by megaraid, the PERC4 
> and the PV220s) the machine has two internal SATA drives.  
> These internal drives house the OS, the web server and the 
> mail queue.  The only I/O running through megaraid at the 
> time of the failures has been the creation of the tar files.
> 
> > 
> > Thank you,
> 
> You're welcome.  I hope I have helped with the information 
> and not hindered.  ;-)
> 
> Kevin
> 
> > 
> > 
> > > -----Original Message-----
> > > From: linux-scsi-owner@vger.kernel.org 
> > > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of 
> > Collins, Kevin
> > > Sent: Friday, January 13, 2006 9:05 AM
> > > To: linux-scsi@vger.kernel.org
> > > Subject: Megaraid problems.
> > > 
> > > Hi list,
> > > 
> > > I have a Dell PowerEdge 850 with their PERC4sc RAID card 
> driving a 
> > > Dell PowerVault 220s external drive enclosure running 
> Ubuntu 5.10.  
> > > This machine and all the parts that make it up are less 
> > than 2 months 
> > > old.  In that time, I have had both logical drives supplied 
> > by PV220s 
> > > taken offline by the megaraid driver twice.  The only 
> cure for this 
> > > has been a reboot of the machine.  Luckily, with the 
> > exception of the 
> > > process that was running at the time of the problem, 
> > nothing else was 
> > > damaged or hurt; no loss of data has been experienced (yet).
> > > 
> > > Both times the failure has occurred, it happened while creating a 
> > > gzipped tarball of some backup data.  The final tarball 
> created is 
> > > averaging about 92+ GB in size and the machine is under 
> > heavy disk I/O 
> > > for more than 7 hours.  I have been able to grab this 
> > information from 
> > > the syslog after the failure (gathered with LogWatch):
> > > 
> > >  1 Time(s): [5535381.561000] megaraid abort: 
> > > 55592075:43[255:128], fw owner
> > >  1 Time(s): [5535381.561000] megaraid abort: 
> > > 55592077:62[255:128], fw owner
> > >  1 Time(s): [5535381.561000] megaraid abort: 
> > > 55592078[255:128], driver owner
> > >  1 Time(s): [5535381.561000] megaraid mbox: Wait for 2 
> commands to 
> > > complete:180
> > >  1 Time(s): [5535381.561000] megaraid: 2 outstanding 
> commands. Max 
> > > wait 180 sec
> > >  1 Time(s): [5535381.561000] megaraid: aborting-55592075
> > > cmd=28 <c=1 t=0 l=0>
> > >  1 Time(s): [5535381.561000] megaraid: aborting-55592077
> > > cmd=28 <c=1 t=0 l=0>
> > >  1 Time(s): [5535381.561000] megaraid: aborting-55592078
> > > cmd=28 <c=1 t=0 l=0>
> > >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> > >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 
> commands to 
> > > complete:175
> > > 
> > > [... The above line repeat every 5 seconds, counting down 
> to 0 ...]
> > > 
> > >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 
> commands to 
> > > complete:5
> > >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O 
> to offline 
> > > device
> > > 
> > > The only difference in the two instances is the number of 
> > "commands" 
> > > that are waiting to complete.  This snippet above is from 
> the first 
> > > instance, the second instance had 10 commands waiting.
> > > 
> > > The machine is running the default Ubuntu kernel, which is their 
> > > patched version of 2.6.12.  In addition, both the 
> megaraid_mbox and 
> > > megaraid_mm modules are loaded.  Here is an output of 
> 'modinfo' for 
> > > both of those modules:
> > > 
> > > ==============================================================
> > > ==========================
> > > megaraid_mbox
> > > --------------------------------------------------------------
> > > --------------------------
> > > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
> > > author:         LSI Logic Corporation
> > > description:    LSI Logic MegaRAID Mailbox Driver
> > > license:        GPL
> > > version:        2.20.4.5
> > > vermagic:       2.6.12-10-386 386 gcc-3.4
> > > depends:        megaraid_mm,scsi_mod
> > > alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001028sd00000531bc*sc*i*
> > > alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001028sd00000002bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001028sd00000001bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv00001028sd00000471bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv00001028sd00000493bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv00001028sd00000475bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv0000101Esd00000475bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv0000101Esd00000493bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd0000A520bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000520bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000518bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001000sd00000530bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001000sd00000532bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001000sd00000531bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001000sd00000001bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001000sd00000002bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000522bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00004523bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000523bc*sc*i*
> > > alias:          pci:v00001000d00000409sv00001000sd00003004bc*sc*i*
> > > alias:          pci:v00001000d00000409sv00001000sd00003008bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00008086sd00000532bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00008086sd00000523bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00008086sd00000002bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00008086sd00000530bc*sc*i*
> > > alias:          pci:v00001000d00000409sv00008086sd00003008bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00008086sd00003431bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00008086sd00003499bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00008086sd00000520bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001734sd00001065bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001025sd0000004Dbc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001033sd00008287bc*sc*i*
> > > srcversion:     042A4371A952248BEF860F4
> > > parm:           debug_level:Debug level for driver 
> (default=0) (int)
> > > parm:           fast_load:Faster loading of the driver, skips 
> > > physical devices! (default=0) (int)
> > > parm:           cmd_per_lun:Maximum number of commands per 
> > > logical unit (default=64) (int)
> > > parm:           max_sectors:Maximum number of sectors per IO 
> > > command (default=128) (int)
> > > parm:           busy_wait:Max wait for mailbox in 
> > > microseconds if busy (default=10) (int)
> > > parm:           unconf_disks:Set to expose unconfigured disks 
> > > to kernel (default=0) (int)
> > > 
> > > --------------------------------------------------------------
> > > --------------------------
> > > megaraid_mm:
> > > --------------------------------------------------------------
> > > --------------------------
> > > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mm.ko
> > > author:         LSI Logic Corporation
> > > description:    LSI Logic Management Module
> > > license:        GPL
> > > version:        2.20.2.5
> > > vermagic:       2.6.12-10-386 386 gcc-3.4
> > > depends:
> > > srcversion:     D2DA33EA7F3FEA9EBE4A603
> > > parm:           dlevel:Debug level (default=0) (int)
> > > ==============================================================
> > > ==========================
> > > 
> > > I have contacted Dell - via their linux-poweredge mailing 
> > list - and 
> > > have discovered that I am not the only one experiencing these 
> > > problems.  What bothers me is that while this problem, 
> > apparently, has 
> > > been around a while and no fix has yet been discovered by Dell or 
> > > anyone else.
> > > 
> > > My research also leads me to believe that this is not just 
> > an Ubuntu 
> > > thing either.  I have reports that this happens under 
> > Redhat, Debian 
> > > and SuSE.  It also appears as though the problem started 
> happening 
> > > around kernel version 2.6.9.
> > > 
> > > So, I'm hoping that someone here:
> > > 
> > > a). Knows about the problem and is working on it.
> > > 
> > > - and, more importantly -
> > > 
> > > b). Can lead me to a fix.
> > > 
> > > My machine is in production and I do not have any 
> > additional hardware 
> > > to test with, but I can do limited testing with it as 
> long as it is 
> > > completely functional by 8:00 pm eastern time.  I'm using it as 
> > > offsite backup machine and that's when my backup processes 
> > kick in.  
> > > If more information is needed, let me know how to get it, 
> and I'll 
> > > supply it.
> > > 
> > > I need to get this solved ASAP.
> > > 
> > > Thanks in advance,
> > > 
> > > --
> > > Kevin L. Collins, MCSE
> > > Systems Manager
> > > Nesbitt Engineering, Inc. 
> > > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Megaraid problems.
@ 2006-01-13 18:25 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-13 18:25 UTC (permalink / raw)
  To: linux-scsi, Seokmann.Ju

On Friday, January 13, 2006 11:56 AM, Seokmann Ju wrote:

> Hi,
> > Besides the external storage (powered by megaraid, the 
> PERC4 and the 
> > PV220s) the machine has two internal SATA drives.
> > These internal drives house the OS, the web server and the 
> mail queue.  
> > The only I/O running through megaraid at the time of the 
> failures has 
> > been the creation of the tar files.
> If it is happening during disk I/O, I would like to 
> investigate further.
> It would be greatly helpful if you could provide some detail 
> steps to get the issue including how to create that big size file.

This server is an rsync hub for my company's three offices.  Every night an on-site backup cache in each office pushes the day's data to this machine.  After that process is complete, this machine creates a tarball of the combined data from the three offices to keep for short-term storage.  The machine also rotates these tarballs for a week, so I end up with seven 90+ GB tarballs.

The uncompressed data contains everything from Word documents to AutoCAD drawings to a backup of my e-mail data store and JPEG pictures from a company party.  You name it, I probably have it. ;-)  The uncompressed data is currently around 140 GB.

To create my tarball I simply run "tar -zcvf /daily/backup1/archive.tar.gz ." from inside a Perl script of my own creation.  Nothing special about it.
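
For anyone who wants to reproduce the load without my Perl script, the archive-and-rotate step looks roughly like the sketch below. This is a Python approximation, not the actual script: the function names, the date-stamped file naming, and sorting by name to find the oldest archives are all assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def create_archive(source_dir, archive_path):
    """Write a gzipped tarball of source_dir, roughly what
    `tar -zcf archive.tar.gz .` does when run from inside source_dir.
    The archive itself should live outside source_dir."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=".")


def rotate(archive_dir, keep=7):
    """Delete the oldest *.tar.gz files so that at most `keep` remain.
    Assumes file names sort chronologically (hypothetical naming such as
    backup-2006-01-13.tar.gz). Returns the names that were removed."""
    archives = sorted(Path(archive_dir).glob("*.tar.gz"), key=lambda p: p.name)
    removed = archives[:-keep] if len(archives) > keep else []
    for old in removed:
        old.unlink()
    return [p.name for p in removed]
```

Running `create_archive` on roughly 140 GB of mixed data and then `rotate` with `keep=7` should approximate the nightly job: several hours of sustained reads and writes against the logical drives.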

> And also, I'll check with F/W team to see if any updated 
> version of it and will get back to you if so.

Thanks.

Kevin

> 
> > -----Original Message-----
> > From: Collins, Kevin [mailto:kCollins@nesbittengineering.com]
> > Sent: Friday, January 13, 2006 11:00 AM
> > To: linux-scsi@vger.kernel.org; Ju, Seokmann
> > Subject: RE: Megaraid problems.
> > 
> > On Friday, January 13, 2006 9:39 AM, Seokmann Ju wrote:
> > 
> > > Hi,
> > Hey, glad to get a response!  :-)
> > 
> > > Thank you for posting details regarding megaraid.
> > > From the log, the messages are OK except for the following.
> > > ---
> > > >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> > > >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2
> > commands to
> > > > complete:175
> > > > 
> > > > [... The above line repeat every 5 seconds, counting down
> > to 0 ...]
> > > > 
> > > >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2
> > commands to
> > > > complete:5
> > > >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O
> > to offline
> > > > device
> > > ---
> > > > a). Knows about the problem and is working on it.
> > > > 
> > > > - and, more importantly -
> > > It seems like that, for some reason, controller couldn't return 
> > > commands (2 commands for this case) within given timeout period.
> > > And because of it, driver decided to reset the controller and as 
> > > part of reset, it triggers the F/W to make the device offline.
> > 
> > And I'm assuming that this is why my data isn't damaged or otherwise
> > corrupted - which is a good thing! ;-)
> > 
> > > > b). Can lead me to a fix.
> > > Can you clarify what is F/W version on the controller?
> > Firmware on the controller (from /proc/scsi/scsi):  351S 
> > =================================================================
> > Host: scsi2 Channel: 00 Id: 06 Lun: 00
> >   Vendor: DELL     Model: PV22XS           Rev: E.18
> >   Type:   Processor                        ANSI SCSI revision: 03
> > Host: scsi2 Channel: 01 Id: 00 Lun: 00
> >   Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
> >   Type:   Direct-Access                    ANSI SCSI revision: 02
> > Host: scsi2 Channel: 01 Id: 01 Lun: 00
> >   Vendor: MegaRAID Model: LD 1 RAID5  858G Rev: 351S
> >   Type:   Direct-Access                    ANSI SCSI revision: 02
> > =================================================================
> > 
> > I have seen reports on Dell's mailing list that allude to the fact that
> > the E18 and 351S firmwares are supposed to help this situation, but 
> > not in my case.  My system shipped with these firmwares in place.  
> > Dell, to my knowledge, does not offer any newer versions of either 
> > firmware.
> > 
> > > Besides disk I/O, are there other operations involved like,
> > tape R/W?
> > No tape R/W, but...
> > 
> > > How about application? Any application that is communicating with 
> > > MegaRAID through IOCTL at that time?
> > As for other tasks, the machine also serves as a web server 
> (Apache, 
> > MySQL and PHP) and e-mail relay (Postfix).  The mail relay 
> does more 
> > work than the web server, but even that is light.
> > 
> > Besides the external storage (powered by megaraid, the 
> PERC4 and the 
> > PV220s) the machine has two internal SATA drives.
> > These internal drives house the OS, the web server and the 
> mail queue.  
> > The only I/O running through megaraid at the time of the 
> failures has 
> > been the creation of the tar files.
> > 
> > > 
> > > Thank you,
> > 
> > You're welcome.  I hope I have helped with the information and not 
> > hindered.  ;-)
> > 
> > Kevin
> > 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: linux-scsi-owner@vger.kernel.org 
> > > > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of
> > > Collins, Kevin
> > > > Sent: Friday, January 13, 2006 9:05 AM
> > > > To: linux-scsi@vger.kernel.org
> > > > Subject: Megaraid problems.
> > > > 
> > > > Hi list,
> > > > 
> > > > I have a Dell PowerEdge 850 with their PERC4sc RAID card
> > driving a
> > > > Dell PowerVault 220s external drive enclosure running
> > Ubuntu 5.10.  
> > > > This machine and all the parts that make it up are less
> > > than 2 months
> > > > old.  In that time, I have had both logical drives supplied
> > > by PV220s
> > > > taken offline by the megaraid driver twice.  The only
> > cure for this
> > > > has been a reboot of the machine.  Luckily, with the
> > > exception of the
> > > > process that was running at the time of the problem,
> > > nothing else was
> > > > damaged or hurt; no loss of data has been experienced (yet).
> > > > 
> > > > Both times the failure has occurred, it happened while 
> creating a 
> > > > gzipped tarball of some backup data.  The final tarball
> > created is
> > > > averaging about 92+ GB in size and the machine is under
> > > heavy disk I/O
> > > > for more than 7 hours.  I have been able to grab this
> > > information from
> > > > the syslog after the failure (gathered with LogWatch):
> > > > 
> > > >  1 Time(s): [5535381.561000] megaraid abort: 
> > > > 55592075:43[255:128], fw owner
> > > >  1 Time(s): [5535381.561000] megaraid abort: 
> > > > 55592077:62[255:128], fw owner
> > > >  1 Time(s): [5535381.561000] megaraid abort: 
> > > > 55592078[255:128], driver owner
> > > >  1 Time(s): [5535381.561000] megaraid mbox: Wait for 2
> > commands to
> > > > complete:180
> > > >  1 Time(s): [5535381.561000] megaraid: 2 outstanding
> > commands. Max
> > > > wait 180 sec
> > > >  1 Time(s): [5535381.561000] megaraid: aborting-55592075
> > > > cmd=28 <c=1 t=0 l=0>
> > > >  1 Time(s): [5535381.561000] megaraid: aborting-55592077
> > > > cmd=28 <c=1 t=0 l=0>
> > > >  1 Time(s): [5535381.561000] megaraid: aborting-55592078
> > > > cmd=28 <c=1 t=0 l=0>
> > > >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> > > >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2
> > commands to
> > > > complete:175
> > > > 
> > > > [... The above line repeat every 5 seconds, counting down
> > to 0 ...]
> > > > 
> > > >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2
> > commands to
> > > > complete:5
> > > >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O
> > to offline
> > > > device
> > > > 
> > > > The only difference in the two instances is the number of
> > > "commands" 
> > > > that are waiting to complete.  This snippet above is from
> > the first
> > > > instance, the second instance had 10 commands waiting.
> > > > 
> > > > The machine is running the default Ubuntu kernel, which 
> is their 
> > > > patched version of 2.6.12.  In addition, both the
> > megaraid_mbox and
> > > > megaraid_mm modules are loaded.  Here is an output of
> > 'modinfo' for
> > > > both of those modules:
> > > > 
> > > > ==============================================================
> > > > ==========================
> > > > megaraid_mbox
> > > > --------------------------------------------------------------
> > > > --------------------------
> > > > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
> > > > author:         LSI Logic Corporation
> > > > description:    LSI Logic MegaRAID Mailbox Driver
> > > > license:        GPL
> > > > version:        2.20.4.5
> > > > vermagic:       2.6.12-10-386 386 gcc-3.4
> > > > depends:        megaraid_mm,scsi_mod
> > > > alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
> > > > alias:          pci:v00001000d00000407sv00001028sd00000531bc*sc*i*
> > > > alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
> > > > alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
> > > > alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
> > > > alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
> > > > alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
> > > > alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001028sd00000002bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001028sd00000001bc*sc*i*
> > > > alias:          pci:v0000101Ed00001960sv00001028sd00000471bc*sc*i*
> > > > alias:          pci:v0000101Ed00001960sv00001028sd00000493bc*sc*i*
> > > > alias:          pci:v0000101Ed00001960sv00001028sd00000475bc*sc*i*
> > > > alias:          pci:v0000101Ed00001960sv0000101Esd00000475bc*sc*i*
> > > > alias:          pci:v0000101Ed00001960sv0000101Esd00000493bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001000sd0000A520bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001000sd00000520bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001000sd00000518bc*sc*i*
> > > > alias:          pci:v00001000d00000407sv00001000sd00000530bc*sc*i*
> > > > alias:          pci:v00001000d00000407sv00001000sd00000532bc*sc*i*
> > > > alias:          pci:v00001000d00000407sv00001000sd00000531bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001000sd00000001bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001000sd00000002bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001000sd00000522bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001000sd00004523bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00001000sd00000523bc*sc*i*
> > > > alias:          pci:v00001000d00000409sv00001000sd00003004bc*sc*i*
> > > > alias:          pci:v00001000d00000409sv00001000sd00003008bc*sc*i*
> > > > alias:          pci:v00001000d00000407sv00008086sd00000532bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00008086sd00000523bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00008086sd00000002bc*sc*i*
> > > > alias:          pci:v00001000d00000407sv00008086sd00000530bc*sc*i*
> > > > alias:          pci:v00001000d00000409sv00008086sd00003008bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00008086sd00003431bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00008086sd00003499bc*sc*i*
> > > > alias:          pci:v00001000d00001960sv00008086sd00000520bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001734sd00001065bc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001025sd0000004Dbc*sc*i*
> > > > alias:          pci:v00001000d00000408sv00001033sd00008287bc*sc*i*
> > > > srcversion:     042A4371A952248BEF860F4
> > > > parm:           debug_level:Debug level for driver (default=0) (int)
> > > > parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
> > > > parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
> > > > parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
> > > > parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
> > > > parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
> > > > 
> > > > ----------------------------------------------------------------------------------------
> > > > megaraid_mm:
> > > > ----------------------------------------------------------------------------------------
> > > > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mm.ko
> > > > author:         LSI Logic Corporation
> > > > description:    LSI Logic Management Module
> > > > license:        GPL
> > > > version:        2.20.2.5
> > > > vermagic:       2.6.12-10-386 386 gcc-3.4
> > > > depends:
> > > > srcversion:     D2DA33EA7F3FEA9EBE4A603
> > > > parm:           dlevel:Debug level (default=0) (int)
> > > > ========================================================================================
> > > > 
> > > > I have contacted Dell - via their linux-poweredge mailing list - and
> > > > have discovered that I am not the only one experiencing these
> > > > problems.  What bothers me is that this problem, apparently, has
> > > > been around a while and no fix has yet been discovered by Dell or
> > > > anyone else.
> > > > 
> > > > My research also leads me to believe that this is not just an Ubuntu
> > > > thing either.  I have reports that this happens under Redhat, Debian
> > > > and SuSE.  It also appears as though the problem started happening
> > > > around kernel version 2.6.9.
> > > > 
> > > > So, I'm hoping that someone here:
> > > > 
> > > > a). Knows about the problem and is working on it.
> > > > 
> > > > - and, more importantly -
> > > > 
> > > > b). Can lead me to a fix.
> > > > 
> > > > My machine is in production and I do not have any additional hardware
> > > > to test with, but I can do limited testing with it as long as it is
> > > > completely functional by 8:00 pm eastern time.  I'm using it as an
> > > > offsite backup machine and that's when my backup processes kick in.
> > > > If more information is needed, let me know how to get it, and I'll
> > > > supply it.
> > > > 
> > > > I need to get this solved ASAP.
> > > > 
> > > > Thanks in advance,
> > > > 
> > > > --
> > > > Kevin L. Collins, MCSE
> > > > Systems Manager
> > > > Nesbitt Engineering, Inc. 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Megaraid problems.
@ 2006-01-20 13:19 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-20 13:19 UTC (permalink / raw)
  To: linux-scsi, Seokmann.Ju


> > If it is happening during disk I/O, I would like to investigate 
> > further.

Ju,

Any word on this problem yet?  Anything further that I can do to assist?

FYI:  I'm going to be ordering a new server next week.  Its hardware will be a close match to the troublesome system that I have in production.  Hopefully, once it arrives I'll be able to do some serious testing on it and will be able to "hack" on it more severely than we could the production machine.

In the meantime, if there is anything you can think of, or want me to try, to help the production machine, I'd love to hear about it.

--
Kevin L. Collins, MCSE
Systems Manager
Nesbitt Engineering, Inc. 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* how to set stride / raid-howto still up to date?
@ 2006-09-24 13:03 Dexter Filmore
  2006-09-25  4:40 ` Mark Hahn
  0 siblings, 1 reply; 15+ messages in thread
From: Dexter Filmore @ 2006-09-24 13:03 UTC (permalink / raw)
  To: linux-raid

Want to change a partition from xfs to ext3 but can't tell what to put for 
stride.

man page says:

"stride=stripe-size
                          Configure  the  filesystem  for  a  RAID  array with
                          stripe-size filesystem blocks per stripe."

So, what is stripe size anyway? The same as chunk size? 

Is the example in the raid howto section 5.11 still valid? (It still refers to 
-R instead of -E. Seems a bit old.)

So going with -b 4096 for the ext3 with a 32k chunk size still comes down to a 
stride of 8, correct?

Dex

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d--(+)@ s-:+ a- C++++ UL++ P+>++ L+++>++++ E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
------END GEEK CODE BLOCK------

http://www.stop1984.com
http://www.againsttcpa.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: how to set stride / raid-howto still up to date?
  2006-09-24 13:03 how to set stride / raid-howto still up to date? Dexter Filmore
@ 2006-09-25  4:40 ` Mark Hahn
  2006-10-12  9:06   ` MegaRaid problems Gordon Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Hahn @ 2006-09-25  4:40 UTC (permalink / raw)
  To: Dexter Filmore; +Cc: linux-raid

> "stride=stripe-size
>                          Configure  the  filesystem  for  a  RAID  array with
>                          stripe-size filesystem blocks per stripe."

my understanding (hey, I even had a quick look at the source) is that
you want blocksize * stripe-size = raid-stripe-size, where the latter 
is chunksize * num-data-elements.

> So, what is stripe size anyway? The same as chunk size?

I don't know whether there's any agreed-on terminology.  mke2fs wants to know
how many fs-blocks it'll take to span a whole set of raid chunks.  that is, 
a write of X will hit all the disks in parallel.  writes that are full-stripe
(and aligned) don't require the read-mod-write cycle.

> So going with -b 4096 for the ext3 with a 32k chunk size still comes down to a
> stride of 8, correct?

no, I don't think so - I think it should be chunksize*ndatadisks/fsblocksize...
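
The two candidate figures in this exchange can be worked out side by side. This is only a sketch of the arithmetic: the function names are made up for illustration, and the count of 3 data disks is an assumption, since the original question does not say how many disks the array has. It does not settle which figure a given mke2fs version expects for stride; check the man page of the version in use.

```python
def blocks_per_chunk(chunk_bytes: int, fs_block_bytes: int) -> int:
    """Filesystem blocks that fit in one RAID chunk (chunksize / fsblocksize)."""
    if chunk_bytes % fs_block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the fs block size")
    return chunk_bytes // fs_block_bytes


def blocks_per_full_stripe(chunk_bytes: int, n_data_disks: int,
                           fs_block_bytes: int) -> int:
    """Filesystem blocks spanning one chunk on every data disk
    (chunksize * ndatadisks / fsblocksize)."""
    return blocks_per_chunk(chunk_bytes, fs_block_bytes) * n_data_disks


# Values from the question: 32k chunks, -b 4096.
CHUNK = 32 * 1024
FS_BLOCK = 4096
N_DATA_DISKS = 3  # hypothetical, e.g. a 4-disk RAID5; not given in the thread

print(blocks_per_chunk(CHUNK, FS_BLOCK))                      # per-chunk figure
print(blocks_per_full_stripe(CHUNK, N_DATA_DISKS, FS_BLOCK))  # full-stripe figure
```

The per-chunk figure is the 8 from the original question; the full-stripe figure scales it by the number of data disks, which is the quantity described above.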

^ permalink raw reply	[flat|nested] 15+ messages in thread

* MegaRaid problems..
  2006-09-25  4:40 ` Mark Hahn
@ 2006-10-12  9:06   ` Gordon Henderson
  2006-10-12 11:13     ` Joshua Baker-LePain
  2006-10-12 13:15     ` Mike Hardy
  0 siblings, 2 replies; 15+ messages in thread
From: Gordon Henderson @ 2006-10-12  9:06 UTC (permalink / raw)
  To: linux-raid


This might not be strictly on-topic here, but you may provide
enlightenment, as a lot of web searching hasn't helped me so far )-:

A client has bought some Dell hardware - Dell 1950 1U server, 2 on-board
SATA drives connected to a Fusion MPT SAS controller. This works just
fine. The on-board drives are mirrored using s/w RAID, which is great and
just how I want it.

The server also has 2 x Dell PERC dual-port SAS Raid Cards which have LSI
MegaRaid chipsets on them. One cable from each raid card connects to half
of a Dell external disk array box - 15 500GB SATA drives with a SAS
backplane, one card has 8 drives, the other 7. I want to run the RAID
cards in JBOD mode, so I can use linux s/w RAID. A nice little package,
which takes up 4U of rack space in total. (although the disk box is f'ing
heavy!!!)

And this is where I'm a little frustrated! I've compiled up a custom
kernel, (which is what I always do for my servers - no modules, no initrd,
this is 2.6.18), and at boot time the dmesg output sees the drives in the
external enclosure, but does not associate them to sdX drives! The
underlying distro is Debian stable, but I doubt there's anything of issue
there.

ie.:

megasas: 00.00.03.01 Sun May 14 22:49:52 PDT 2006
megasas: 0x1028:0x0015:0x1028:0x1f01: bus 12:slot 14:func 0
ACPI: PCI Interrupt 0000:0c:0e.0[A] -> GSI 18 (level, low) -> IRQ 66
scsi0 : LSI Logic SAS based MegaRAID driver
  Vendor: DELL      Model: MD1000            Rev: A.00
  Type:   Enclosure                          ANSI SCSI revision: 05
  Vendor: ATA       Model: HDS725050KLA360   Rev: AB5A
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: HDS725050KLA360   Rev: AB5A
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: HDS725050KLA360   Rev: AB5A
  Type:   Direct-Access                      ANSI SCSI revision: 05

etc. and it's repeated for the 2nd card. I've never seen this behaviour
before - normally it sees the drives, then associates them to sdX devices
- which is exactly what it does for the internal 2 drives:

On the internal drives, it goes like:

Fusion MPT base driver 3.04.01
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SAS Host driver 3.04.01
ACPI: PCI Interrupt 0000:02:08.0[A] -> GSI 64 (level, low) -> IRQ 74
mptbase: Initiating ioc0 bringup
ioc0: SAS1068: Capabilities={Initiator}
scsi2 : ioc0: LSISAS1068, FwRev=00062800h, Ports=1, MaxQ=511, IRQ=74
  Vendor: ATA       Model: WDC WD1600JS-75N  Rev: 2E04
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 312500000 512-byte hdwr sectors (160000 MB)
sda: Write Protect is off
sda: Mode Sense: 67 00 00 08
SCSI device sda: drive cache: write back
SCSI device sda: 312500000 512-byte hdwr sectors (160000 MB)
sda: Write Protect is off
sda: Mode Sense: 67 00 00 08
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 2:0:0:0: Attached scsi disk sda
sd 2:0:0:0: Attached scsi generic sg2 type 0

and the same for the 2nd drive, sdb.

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: DELL     Model: MD1000           Rev: A.00
  Type:   Enclosure                        ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: DELL     Model: MD1000           Rev: A.00
  Type:   Enclosure                        ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD1600JS-75N Rev: 2E04
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 01 Lun: 00
  Vendor: ATA      Model: WDC WD1600JS-75N Rev: 2E04
  Type:   Direct-Access                    ANSI SCSI revision: 05

So it's a bit frustrating.
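
The listing above can also be summarised mechanically. What follows is a rough sketch that assumes the /proc/scsi/scsi layout is exactly as pasted above; the function names are illustrative only, and the sample text is just that listing. It groups the reported device types by host and picks out the hosts that show no Direct-Access entry.

```python
import re
from collections import defaultdict

# The /proc/scsi/scsi listing from this message, used as sample input.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: DELL     Model: MD1000           Rev: A.00
  Type:   Enclosure                        ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: DELL     Model: MD1000           Rev: A.00
  Type:   Enclosure                        ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD1600JS-75N Rev: 2E04
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 01 Lun: 00
  Vendor: ATA      Model: WDC WD1600JS-75N Rev: 2E04
  Type:   Direct-Access                    ANSI SCSI revision: 05
"""


def device_types_by_host(text: str) -> dict:
    """Map each host name (e.g. 'scsi0') to the list of device types it reports."""
    types = defaultdict(list)
    host = None
    for line in text.splitlines():
        m = re.match(r"Host:\s+(\S+)", line)
        if m:
            host = m.group(1)
            continue
        m = re.match(r"\s*Type:\s+(.+?)\s+ANSI", line)
        if m and host is not None:
            types[host].append(m.group(1))
    return dict(types)


def hosts_without_disks(text: str) -> list:
    """Hosts that report no Direct-Access (disk) device at all."""
    by_host = device_types_by_host(text)
    return sorted(h for h, t in by_host.items() if "Direct-Access" not in t)


print(device_types_by_host(SAMPLE))
print(hosts_without_disks(SAMPLE))
```

On a live system the same functions could be fed the contents of /proc/scsi/scsi read from the file instead of the pasted sample.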

LUN support is enabled, and the boot-line is:

  auto BOOT_IMAGE=Linux ro root=901 max_luns=8

So if it's a LUN issue, then they ought to be being probed...

I'm wondering about things like the controllers needing some poking at the
BIOS level, but I've looked and there isn't a JBOD mode - only various
built-in RAID modes, so I've not created any RAID sets through the BIOS (I
want raid 6 on this box over all 15 drives)

I'm sure it's something dead obvious, so any clues would be appreciated!

Thanks,

Gordon

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: MegaRaid problems..
  2006-10-12  9:06   ` MegaRaid problems Gordon Henderson
@ 2006-10-12 11:13     ` Joshua Baker-LePain
  2006-10-12 13:15     ` Mike Hardy
  1 sibling, 0 replies; 15+ messages in thread
From: Joshua Baker-LePain @ 2006-10-12 11:13 UTC (permalink / raw)
  To: Gordon Henderson; +Cc: linux-raid

On Thu, 12 Oct 2006 at 10:06am, Gordon Henderson wrote

> I'm wondering about things like the controllers needing some poking at the
> BIOS level, but I've looked and there isn't a JBOD mode - only various
> built-in RAID modes, so I've not created any RAID sets through the BIOS (I
> want raid 6 on this box over all 15 drives)

Some RAID cards don't explicitly have a JBOD mode but instead an option 
that says something like "Export unconfigured disks", and ISTR that 
MegaRAID may be that sort of card.  Poke about in the card BIOS a bit 
more.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: MegaRaid problems..
  2006-10-12  9:06   ` MegaRaid problems Gordon Henderson
  2006-10-12 11:13     ` Joshua Baker-LePain
@ 2006-10-12 13:15     ` Mike Hardy
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Hardy @ 2006-10-12 13:15 UTC (permalink / raw)
  To: linux-raid



Gordon Henderson wrote:
> This might not be strictly on-topic here, but you may provide
> enlightenment, as a lot of web searching hasn't helped me so far )-:
> 
> A client has bought some Dell hardware - Dell 1950 1U server, 2 on-board
> SATA drives connected to a Fusion MPT SAS controller. This works just
> fine. The on-board drives are mirrored using s/w RAID, which is great and
> just how I want it.
> 
> The server also has 2 x Dell PERC dual-port SAS Raid Cards which have LSI
> MegaRaid chipsets on them. One cable from each raid card connects to half


> this is 2.6.18), and at boot time the dmesg output sees the drives in the
> external enclosure, but does not associate them to sdX drives! The
> underlying distro is Debian stable, but I doubt there's anything of issue
> there.

I have several Dell 2950s (same chassis) and they have this problem.

Basically, you can't use the PERC card and get JBOD. The PERC5 card has no
JBOD mode, whereas the PERC4 card did.

Dell said they may get a BIOS update, but wouldn't commit.

In the meantime, you have to exchange the PERC5 card for a SAS5 card,
then you can have JBOD.

I was a little disappointed, as the PERC5 card can drive 6 or 8 devices,
but the SAS5 card can only drive 4. Lame.

-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread
