* how to set stride / raid-howto still up to date?
@ 2006-09-24 13:03 Dexter Filmore
  2006-09-25  4:40 ` Mark Hahn
  0 siblings, 1 reply; 15+ messages in thread
From: Dexter Filmore @ 2006-09-24 13:03 UTC (permalink / raw)
  To: linux-raid

Want to change a partition from xfs to ext3 but can't tell what to put for 
stride.

man page says:

"stride=stripe-size
                          Configure  the  filesystem  for  a  RAID  array with
                          stripe-size filesystem blocks per stripe."

So, what is stripe size anyway? The same as chunk size? 

Is the example in the raid howto section 5.11 still valid? (It still refers 
to -R instead of -E, so it seems a bit old.)

So going with -b 4096 for the ext3 with a 32k chunk size still comes down to a 
stride of 8, correct?
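
The arithmetic behind that figure can be written out as a small sketch. It assumes, as the HOWTO example does, that "stripe size" here means the per-disk chunk size, so stride is the chunk size divided by the filesystem block size. The function name and the device /dev/md0 are placeholders for illustration, not anything from the man page.

```python
def ext3_stride(chunk_size_bytes: int, block_size_bytes: int) -> int:
    """Return the number of filesystem blocks that fit in one RAID chunk."""
    if chunk_size_bytes % block_size_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_size_bytes // block_size_bytes


chunk = 32 * 1024  # 32k chunk size, as in the question
block = 4096       # matches -b 4096
stride = ext3_stride(chunk, block)

# /dev/md0 is a hypothetical device name used only for this example.
print(f"mke2fs -j -b {block} -E stride={stride} /dev/md0")
```

With a 64k chunk and the same block size the same division would give 16.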

Dex

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d--(+)@ s-:+ a- C++++ UL++ P+>++ L+++>++++ E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
------END GEEK CODE BLOCK------

http://www.stop1984.com
http://www.againsttcpa.com

* RE: Megaraid problems.
@ 2006-01-20 13:19 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-20 13:19 UTC (permalink / raw)
  To: linux-scsi, Seokmann.Ju


> > If it is happening during disk I/O, I would like to investigate 
> > further.

Ju,

Any word on this problem yet?  Anything further that I can do to assist?

FYI:  I'm going to be ordering a new server next week.  Its hardware will be a close match to the troublesome system that I have in production.  Hopefully, once it arrives I'll be able to do some serious testing on it and will be able to "hack" on it more severely than we could the production machine.

In the meantime, if there is anything you can think of, or want me to try, to help the production machine, I'd love to hear about it.

--
Kevin L. Collins, MCSE
Systems Manager
Nesbitt Engineering, Inc. 

* RE: Megaraid problems.
@ 2006-01-13 18:25 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-13 18:25 UTC (permalink / raw)
  To: linux-scsi, Seokmann.Ju

On Friday, January 13, 2006 11:56 AM, Seokmann Ju wrote:

> Hi,
> > Besides the external storage (powered by megaraid, the PERC4 and the
> > PV220s) the machine has two internal SATA drives.
> > These internal drives house the OS, the web server and the mail queue.
> > The only I/O running through megaraid at the time of the failures has
> > been the creation of the tar files.
> If it is happening during disk I/O, I would like to
> investigate further.
> It would be very helpful if you could provide detailed steps
> to reproduce the issue, including how you create that large file.

This server is an RSYNC hub for my company's three offices.  Every night an on-site backup cache in each office pushes the day's data to this machine.  After that process is complete, this machine creates a tarball of the combined data from the three offices to keep for short-term storage.  The machine also rotates these tarballs for a week, so I end up with seven 90+ GB tarballs.

The uncompressed data contains everything from Word documents to AutoCAD drawings to a backup of my e-mail data store and Jpeg pictures from a company party.  You name it, I probably have it. ;-)  The uncompressed data is floating around 140GB.

To create my tarball I simply run "tar -zcvf /daily/backup1/archive.tar.gz ." from inside a Perl script of my own creation.  Nothing special about it.
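
For anyone wanting to reproduce a similar workload without the Perl wrapper, here is a rough Python sketch of the same step: a gzip-compressed tarball of one directory tree with paths stored relative to ".". It is only an approximation of the tar command above. The function name is made up for this example, and the archive path should sit outside the source directory.

```python
import tarfile
from pathlib import Path


def make_tarball(source_dir: str, archive_path: str) -> list:
    """Write a gzip-compressed tarball of source_dir; return the member names."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores relative member paths, like "tar -zcf ... ."
        tar.add(str(source), arcname=".")
    # Re-open the archive to confirm it is readable and list what it holds.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Called as make_tarball("/path/to/data", "/daily/backup1/archive.tar.gz"), it would produce roughly the same kind of sustained read and write load, though Python's gzip path is not guaranteed to match the throughput of GNU tar.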

> I'll also check with the F/W team to see whether an updated
> version is available, and will get back to you if so.

Thanks.

Kevin


* RE: Megaraid problems.
@ 2006-01-13 16:55 Ju, Seokmann
  0 siblings, 0 replies; 15+ messages in thread
From: Ju, Seokmann @ 2006-01-13 16:55 UTC (permalink / raw)
  To: Collins, Kevin, linux-scsi

Hi,
> Besides the external storage (powered by megaraid, the PERC4 
> and the PV220s) the machine has two internal SATA drives.  
> These internal drives house the OS, the web server and the 
> mail queue.  The only I/O running through megaraid at the 
> time of the failures has been the creation of the tar files.
If it is happening during disk I/O, I would like to investigate further.
It would be very helpful if you could provide detailed steps to reproduce the issue, including how you create that large file.
I'll also check with the F/W team to see whether an updated version is available, and will get back to you if so.

Thank you, again.

> -----Original Message-----
> From: Collins, Kevin [mailto:kCollins@nesbittengineering.com] 
> Sent: Friday, January 13, 2006 11:00 AM
> To: linux-scsi@vger.kernel.org; Ju, Seokmann
> Subject: RE: Megaraid problems.
> 
> On Friday, January 13, 2006 9:39 AM, Seokmann Ju wrote:
> 
> > Hi,
> Hey, glad to get a response!  :-)
> 
> > Thank you for posting details regarding megaraid.
> > From the log, the messages are OK except for the following.
> > ---
> > >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> > >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
> > > 
> > > [... The above line repeats every 5 seconds, counting down to 0 ...]
> > > 
> > >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
> > >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
> > ---
> > > a). Knows about the problem and is working on it.
> > > 
> > > - and, more importantly -
> > It seems that, for some reason, the controller couldn't
> > return commands (2 commands in this case) within the given
> > timeout period.
> > Because of that, the driver decided to reset the controller and,
> > as part of the reset, it triggered the F/W to take the device offline.
> 
> And I'm assuming that this is why my data isn't damaged or 
> otherwise corrupted - which is a good thing! ;-)
> 
> > > b). Can lead me to a fix.
> > Can you clarify what is F/W version on the controller?
> Firmware on the controller (from /proc/scsi/scsi):  351S
> =================================================================
> Host: scsi2 Channel: 00 Id: 06 Lun: 00
>   Vendor: DELL     Model: PV22XS           Rev: E.18
>   Type:   Processor                        ANSI SCSI revision: 03
> Host: scsi2 Channel: 01 Id: 00 Lun: 00
>   Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> Host: scsi2 Channel: 01 Id: 01 Lun: 00
>   Vendor: MegaRAID Model: LD 1 RAID5  858G Rev: 351S
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> =================================================================
> 
> I have seen reports on Dell's mailing list that allude to the 
> fact that the E18 and 351S firmwares are supposed to help this 
> situation, but not in my case.  My system shipped with these 
> firmwares in place.  Dell, to my knowledge, does not offer 
> any newer versions of either firmware.
> 
> > Besides disk I/O, are there other operations involved, like
> > tape R/W?
> No tape R/W, but...
> 
> > How about applications? Is any application communicating
> > with MegaRAID through IOCTL at that time?
> As for other tasks, the machine also serves as a web server 
> (Apache, MySQL and PHP) and e-mail relay (Postfix).  The mail 
> relay does more work than the web server, but even that is light.
> 
> Besides the external storage (powered by megaraid, the PERC4 
> and the PV220s) the machine has two internal SATA drives.  
> These internal drives house the OS, the web server and the 
> mail queue.  The only I/O running through megaraid at the 
> time of the failures has been the creation of the tar files.
> 
> > 
> > Thank you,
> 
> You're welcome.  I hope I have helped with the information 
> and not hindered.  ;-)
> 
> Kevin
> 
> > 
> > 
> > > -----Original Message-----
> > > From: linux-scsi-owner@vger.kernel.org
> > > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Collins, Kevin
> > > Sent: Friday, January 13, 2006 9:05 AM
> > > To: linux-scsi@vger.kernel.org
> > > Subject: Megaraid problems.
> > > 
> > > Hi list,
> > > 
> > > I have a Dell PowerEdge 850 with their PERC4sc RAID card driving a
> > > Dell PowerVault 220s external drive enclosure running Ubuntu 5.10.
> > > This machine and all the parts that make it up are less than 2 months
> > > old.  In that time, I have had both logical drives supplied by PV220s
> > > taken offline by the megaraid driver twice.  The only cure for this
> > > has been a reboot of the machine.  Luckily, with the exception of the
> > > process that was running at the time of the problem, nothing else was
> > > damaged or hurt; no loss of data has been experienced (yet).
> > > 
> > > Both times the failure has occurred, it happened while creating a
> > > gzipped tarball of some backup data.  The final tarball created is
> > > averaging about 92+ GB in size and the machine is under heavy disk I/O
> > > for more than 7 hours.  I have been able to grab this information from
> > > the syslog after the failure (gathered with LogWatch):
> > > 
> > >  1 Time(s): [5535381.561000] megaraid abort: 55592075:43[255:128], fw owner
> > >  1 Time(s): [5535381.561000] megaraid abort: 55592077:62[255:128], fw owner
> > >  1 Time(s): [5535381.561000] megaraid abort: 55592078[255:128], driver owner
> > >  1 Time(s): [5535381.561000] megaraid mbox: Wait for 2 commands to complete:180
> > >  1 Time(s): [5535381.561000] megaraid: 2 outstanding commands. Max wait 180 sec
> > >  1 Time(s): [5535381.561000] megaraid: aborting-55592075 cmd=28 <c=1 t=0 l=0>
> > >  1 Time(s): [5535381.561000] megaraid: aborting-55592077 cmd=28 <c=1 t=0 l=0>
> > >  1 Time(s): [5535381.561000] megaraid: aborting-55592078 cmd=28 <c=1 t=0 l=0>
> > >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> > >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
> > > 
> > > [... The above line repeats every 5 seconds, counting down to 0 ...]
> > > 
> > >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
> > >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
> > > 
> > > The only difference in the two instances is the number of "commands"
> > > that are waiting to complete.  The snippet above is from the first
> > > instance; the second instance had 10 commands waiting.
> > > 
> > > The machine is running the default Ubuntu kernel, which is their
> > > patched version of 2.6.12.  In addition, both the megaraid_mbox and
> > > megaraid_mm modules are loaded.  Here is the output of 'modinfo' for
> > > both of those modules:
> > > 
> > > ========================================================================================
> > > megaraid_mbox
> > > ----------------------------------------------------------------------------------------
> > > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
> > > author:         LSI Logic Corporation
> > > description:    LSI Logic MegaRAID Mailbox Driver
> > > license:        GPL
> > > version:        2.20.4.5
> > > vermagic:       2.6.12-10-386 386 gcc-3.4
> > > depends:        megaraid_mm,scsi_mod
> > > alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001028sd00000531bc*sc*i*
> > > alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
> > > alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001028sd00000002bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001028sd00000001bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv00001028sd00000471bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv00001028sd00000493bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv00001028sd00000475bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv0000101Esd00000475bc*sc*i*
> > > alias:          pci:v0000101Ed00001960sv0000101Esd00000493bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd0000A520bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000520bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000518bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001000sd00000530bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001000sd00000532bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00001000sd00000531bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001000sd00000001bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001000sd00000002bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000522bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00004523bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00001000sd00000523bc*sc*i*
> > > alias:          pci:v00001000d00000409sv00001000sd00003004bc*sc*i*
> > > alias:          pci:v00001000d00000409sv00001000sd00003008bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00008086sd00000532bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00008086sd00000523bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00008086sd00000002bc*sc*i*
> > > alias:          pci:v00001000d00000407sv00008086sd00000530bc*sc*i*
> > > alias:          pci:v00001000d00000409sv00008086sd00003008bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00008086sd00003431bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00008086sd00003499bc*sc*i*
> > > alias:          pci:v00001000d00001960sv00008086sd00000520bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001734sd00001065bc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001025sd0000004Dbc*sc*i*
> > > alias:          pci:v00001000d00000408sv00001033sd00008287bc*sc*i*
> > > srcversion:     042A4371A952248BEF860F4
> > > parm:           debug_level:Debug level for driver (default=0) (int)
> > > parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
> > > parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
> > > parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
> > > parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
> > > parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
> > > 
> > > ----------------------------------------------------------------------------------------
> > > megaraid_mm:
> > > ----------------------------------------------------------------------------------------
> > > filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mm.ko
> > > author:         LSI Logic Corporation
> > > description:    LSI Logic Management Module
> > > license:        GPL
> > > version:        2.20.2.5
> > > vermagic:       2.6.12-10-386 386 gcc-3.4
> > > depends:
> > > srcversion:     D2DA33EA7F3FEA9EBE4A603
> > > parm:           dlevel:Debug level (default=0) (int)
> > > ========================================================================================
> > > 
> > > I have contacted Dell - via their linux-poweredge mailing list - and
> > > have discovered that I am not the only one experiencing these
> > > problems.  What bothers me is that this problem, apparently, has
> > > been around a while and no fix has yet been discovered by Dell or
> > > anyone else.
> > > 
> > > My research also leads me to believe that this is not just an Ubuntu
> > > thing either.  I have reports that this happens under Redhat, Debian
> > > and SuSE.  It also appears as though the problem started happening
> > > around kernel version 2.6.9.
> > > 
> > > So, I'm hoping that someone here:
> > > 
> > > a). Knows about the problem and is working on it.
> > > 
> > > - and, more importantly -
> > > 
> > > b). Can lead me to a fix.
> > > 
> > > My machine is in production and I do not have any additional hardware
> > > to test with, but I can do limited testing with it as long as it is
> > > completely functional by 8:00 pm eastern time.  I'm using it as an
> > > offsite backup machine and that's when my backup processes kick in.
> > > If more information is needed, let me know how to get it, and I'll
> > > supply it.
> > > 
> > > I need to get this solved ASAP.
> > > 
> > > Thanks in advance,
> > > 
> > > --
> > > Kevin L. Collins, MCSE
> > > Systems Manager
> > > Nesbitt Engineering, Inc. 
> > > 
> > 
> > 
> 

* RE: Megaraid problems.
@ 2006-01-13 15:59 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-13 15:59 UTC (permalink / raw)
  To: linux-scsi, Seokmann.Ju

On Friday, January 13, 2006 9:39 AM, Seokmann Ju wrote:

> Hi,
Hey, glad to get a response!  :-)

> Thank you for posting details regarding megaraid.
> From the log, the messages are OK except for the following.
> ---
> >  1 Time(s): [5535381.561000] megaraid: reseting the host...
> >  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
> > 
> > [... The above line repeats every 5 seconds, counting down to 0 ...]
> > 
> >  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
> >  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
> ---
> > a). Knows about the problem and is working on it.
> > 
> > - and, more importantly -
> It seems that, for some reason, the controller couldn't 
> return commands (2 commands in this case) within the given 
> timeout period.
> Because of that, the driver decided to reset the controller and, 
> as part of the reset, it triggers the F/W to take the device offline.

And I'm assuming that this is why my data isn't damaged or otherwise corrupted - which is a good thing! ;-)

> > b). Can lead me to a fix.
> Can you tell me which F/W version is on the controller?
Firmware on the controller (from /proc/scsi/scsi):  351S
=================================================================
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: PV22XS           Rev: E.18
  Type:   Processor                        ANSI SCSI revision: 03
Host: scsi2 Channel: 01 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi2 Channel: 01 Id: 01 Lun: 00
  Vendor: MegaRAID Model: LD 1 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
=================================================================
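
For anyone who wants to collect the same details from several machines, here is a minimal Python sketch that pulls the vendor, model and revision fields out of text in the /proc/scsi/scsi layout shown above.  The function name parse_proc_scsi and the regular expressions are my own assumptions based only on this sample; they are not part of any megaraid or Dell tooling.

```python
import re

# Assumed layout, taken from the sample above: a "Host:" line followed by
# an indented "Vendor: ... Model: ... Rev: ..." line for each device.
HOST_RE = re.compile(
    r"Host:\s*(?P<host>\S+)\s+Channel:\s*(?P<channel>\d+)"
    r"\s+Id:\s*(?P<id>\d+)\s+Lun:\s*(?P<lun>\d+)"
)
VENDOR_RE = re.compile(
    r"Vendor:\s*(?P<vendor>.*?)\s+Model:\s*(?P<model>.*?)\s+Rev:\s*(?P<rev>\S*)"
)


def parse_proc_scsi(text):
    """Return one dict per device found in /proc/scsi/scsi style text."""
    devices = []
    current = None
    for line in text.splitlines():
        host = HOST_RE.search(line)
        if host:
            # A new "Host:" line starts a new device record.
            current = host.groupdict()
            devices.append(current)
            continue
        vendor = VENDOR_RE.search(line)
        if vendor and current is not None:
            current.update(vendor.groupdict())
    return devices


SAMPLE = """\
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: PV22XS           Rev: E.18
  Type:   Processor                        ANSI SCSI revision: 03
Host: scsi2 Channel: 01 Id: 00 Lun: 00
  Vendor: MegaRAID Model: LD 0 RAID5  858G Rev: 351S
  Type:   Direct-Access                    ANSI SCSI revision: 02
"""

if __name__ == "__main__":
    for dev in parse_proc_scsi(SAMPLE):
        print(dev["host"], dev["channel"], dev["id"],
              dev.get("vendor"), dev.get("model"), dev.get("rev"))
```

On a live system the same function could be fed the contents of /proc/scsi/scsi instead of SAMPLE.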

I have seen reports on Dell's mailing list that allude to the fact that the E18 and 351S firmwares are supposed to help this situation, but not in my case.  My system shipped with these firmwares in place.  Dell, to my knowledge, does not offer any newer versions of either firmware.

> Besides disk I/O, are there other operations involved, like tape R/W?
No tape R/W, but...

> How about applications? Is any application communicating 
> with MegaRAID through IOCTL at that time?
As for other tasks, the machine also serves as a web server (Apache, MySQL and PHP) and e-mail relay (Postfix).  The mail relay does more work than the web server, but even that is light.

Besides the external storage (powered by megaraid, the PERC4 and the PV220s) the machine has two internal SATA drives.  These internal drives house the OS, the web server and the mail queue.  The only I/O running through megaraid at the time of the failures has been the creation of the tar files.

> 
> Thank you,

You're welcome.  I hope I have helped with the information and not hindered.  ;-)

Kevin


^ permalink raw reply	[flat|nested] 15+ messages in thread
* RE: Megaraid problems.
@ 2006-01-13 14:39 Ju, Seokmann
  0 siblings, 0 replies; 15+ messages in thread
From: Ju, Seokmann @ 2006-01-13 14:39 UTC (permalink / raw)
  To: Collins, Kevin, linux-scsi

Hi,

Thank you for posting details regarding megaraid.
From the log, the messages are OK except for the following.
---
>  1 Time(s): [5535381.561000] megaraid: reseting the host...
>  1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 
> commands to complete:175
> 
> [... The above line repeat every 5 seconds, counting down to 0 ...]
> 
>  1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 
> commands to complete:5
>  2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to 
> offline device
---
> a). Knows about the problem and is working on it.
> 
> - and, more importantly - 
It seems that, for some reason, the controller couldn't return commands (2 commands in this case) within the given timeout period.
Because of that, the driver decided to reset the controller and, as part of the reset, it triggers the F/W to take the device offline.


> b). Can lead me to a fix.
Can you tell me which F/W version is on the controller?
Besides disk I/O, are there other operations involved, like tape R/W?
How about applications? Is any application communicating with MegaRAID through IOCTL at that time?

Thank you,



^ permalink raw reply	[flat|nested] 15+ messages in thread
* Megaraid problems.
@ 2006-01-13 14:04 Collins, Kevin
  0 siblings, 0 replies; 15+ messages in thread
From: Collins, Kevin @ 2006-01-13 14:04 UTC (permalink / raw)
  To: linux-scsi

Hi list,

I have a Dell PowerEdge 850 with their PERC4sc RAID card driving a Dell PowerVault 220s external drive enclosure running Ubuntu 5.10.  This machine and all the parts that make it up are less than 2 months old.  In that time, I have had both logical drives supplied by PV220s taken offline by the megaraid driver twice.  The only cure for this has been a reboot of the machine.  Luckily, with the exception of the process that was running at the time of the problem, nothing else was damaged or hurt; no loss of data has been experienced (yet).

Both times the failure has occurred, it happened while creating a gzipped tarball of some backup data.  The final tarball created is averaging about 92+ GB in size and the machine is under heavy disk I/O for more than 7 hours.  I have been able to grab this information from the syslog after the failure (gathered with LogWatch):

 1 Time(s): [5535381.561000] megaraid abort: 55592075:43[255:128], fw owner
 1 Time(s): [5535381.561000] megaraid abort: 55592077:62[255:128], fw owner
 1 Time(s): [5535381.561000] megaraid abort: 55592078[255:128], driver owner
 1 Time(s): [5535381.561000] megaraid mbox: Wait for 2 commands to complete:180
 1 Time(s): [5535381.561000] megaraid: 2 outstanding commands. Max wait 180 sec
 1 Time(s): [5535381.561000] megaraid: aborting-55592075 cmd=28 <c=1 t=0 l=0>
 1 Time(s): [5535381.561000] megaraid: aborting-55592077 cmd=28 <c=1 t=0 l=0>
 1 Time(s): [5535381.561000] megaraid: aborting-55592078 cmd=28 <c=1 t=0 l=0>
 1 Time(s): [5535381.561000] megaraid: reseting the host...
 1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175

[... The above line repeats every 5 seconds, counting down to 0 ...]

 1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
 2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device

The only difference between the two instances is the number of "commands" that are waiting to complete.  The snippet above is from the first instance; the second instance had 10 commands waiting.
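
As a rough cross-check of the timing in that log, here is a minimal Python sketch that measures the gap between the "reseting the host" line and the first "offline device" line.  The regular expression and the helper names are my own assumptions about the LogWatch layout shown above, not anything supplied by the driver.  For this excerpt the gap comes out at about 181 seconds, which lines up with the "Max wait 180 sec" message.

```python
import re

# Assumed layout: each LogWatch line carries a "[seconds.micros]" stamp
# followed by the kernel message text.
LINE_RE = re.compile(r"\[(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")


def parse_lines(text):
    """Return (timestamp_seconds, message) pairs for every stamped line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            events.append((float(m.group("ts")), m.group("msg").strip()))
    return events


def reset_to_offline_seconds(events):
    """Seconds between the host reset and the first 'offline device' message."""
    # "reseting" is spelled as the driver prints it in the log.
    reset_ts = next((ts for ts, msg in events if "reseting the host" in msg), None)
    offline_ts = next((ts for ts, msg in events if "offline device" in msg), None)
    if reset_ts is None or offline_ts is None:
        return None
    return offline_ts - reset_ts


SAMPLE = """\
 1 Time(s): [5535381.561000] megaraid: reseting the host...
 1 Time(s): [5535386.566000] megaraid mbox: Wait for 2 commands to complete:175
 1 Time(s): [5535556.736000] megaraid mbox: Wait for 2 commands to complete:5
 2 Time(s): [5535562.611000] scsi2 (0:0): rejecting I/O to offline device
"""

if __name__ == "__main__":
    elapsed = reset_to_offline_seconds(parse_lines(SAMPLE))
    print(f"reset -> offline: {elapsed:.2f} s")
```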

The machine is running the default Ubuntu kernel, which is their patched version of 2.6.12.  In addition, both the megaraid_mbox and megaraid_mm modules are loaded.  Here is the output of 'modinfo' for both of those modules:

========================================================================================
megaraid_mbox
----------------------------------------------------------------------------------------
filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
author:         LSI Logic Corporation
description:    LSI Logic MegaRAID Mailbox Driver
license:        GPL
version:        2.20.4.5
vermagic:       2.6.12-10-386 386 gcc-3.4
depends:        megaraid_mm,scsi_mod
alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
alias:          pci:v00001000d00000407sv00001028sd00000531bc*sc*i*
alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
alias:          pci:v00001000d00000408sv00001028sd00000002bc*sc*i*
alias:          pci:v00001000d00000408sv00001028sd00000001bc*sc*i*
alias:          pci:v0000101Ed00001960sv00001028sd00000471bc*sc*i*
alias:          pci:v0000101Ed00001960sv00001028sd00000493bc*sc*i*
alias:          pci:v0000101Ed00001960sv00001028sd00000475bc*sc*i*
alias:          pci:v0000101Ed00001960sv0000101Esd00000475bc*sc*i*
alias:          pci:v0000101Ed00001960sv0000101Esd00000493bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd0000A520bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000520bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000518bc*sc*i*
alias:          pci:v00001000d00000407sv00001000sd00000530bc*sc*i*
alias:          pci:v00001000d00000407sv00001000sd00000532bc*sc*i*
alias:          pci:v00001000d00000407sv00001000sd00000531bc*sc*i*
alias:          pci:v00001000d00000408sv00001000sd00000001bc*sc*i*
alias:          pci:v00001000d00000408sv00001000sd00000002bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000522bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00004523bc*sc*i*
alias:          pci:v00001000d00001960sv00001000sd00000523bc*sc*i*
alias:          pci:v00001000d00000409sv00001000sd00003004bc*sc*i*
alias:          pci:v00001000d00000409sv00001000sd00003008bc*sc*i*
alias:          pci:v00001000d00000407sv00008086sd00000532bc*sc*i*
alias:          pci:v00001000d00001960sv00008086sd00000523bc*sc*i*
alias:          pci:v00001000d00000408sv00008086sd00000002bc*sc*i*
alias:          pci:v00001000d00000407sv00008086sd00000530bc*sc*i*
alias:          pci:v00001000d00000409sv00008086sd00003008bc*sc*i*
alias:          pci:v00001000d00000408sv00008086sd00003431bc*sc*i*
alias:          pci:v00001000d00000408sv00008086sd00003499bc*sc*i*
alias:          pci:v00001000d00001960sv00008086sd00000520bc*sc*i*
alias:          pci:v00001000d00000408sv00001734sd00001065bc*sc*i*
alias:          pci:v00001000d00000408sv00001025sd0000004Dbc*sc*i*
alias:          pci:v00001000d00000408sv00001033sd00008287bc*sc*i*
srcversion:     042A4371A952248BEF860F4
parm:           debug_level:Debug level for driver (default=0) (int)
parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)

----------------------------------------------------------------------------------------
megaraid_mm:
----------------------------------------------------------------------------------------
filename:       /lib/modules/2.6.12-10-386/kernel/drivers/scsi/megaraid/megaraid_mm.ko
author:         LSI Logic Corporation
description:    LSI Logic Management Module
license:        GPL
version:        2.20.2.5
vermagic:       2.6.12-10-386 386 gcc-3.4
depends:
srcversion:     D2DA33EA7F3FEA9EBE4A603
parm:           dlevel:Debug level (default=0) (int)
========================================================================================
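
In case the alias lines are hard to read: each one packs the PCI vendor, device, subsystem vendor and subsystem device IDs into one string, with "*" meaning "match anything".  Below is a minimal Python sketch that splits such a string into numbers.  The decode_alias name and the small vendor table are my own illustrative assumptions (the labels are shorthand; check pci.ids for the authoritative names).

```python
import re

# Field prefixes in a PCI module alias:
#   v = vendor, d = device, sv = subsystem vendor, sd = subsystem device,
#   bc = base class, sc = subclass, i = programming interface.
ALIAS_RE = re.compile(
    r"^pci:v(?P<vendor>[0-9A-F]{8}|\*)d(?P<device>[0-9A-F]{8}|\*)"
    r"sv(?P<subvendor>[0-9A-F]{8}|\*)sd(?P<subdevice>[0-9A-F]{8}|\*)"
    r"bc(?P<bc>[0-9A-F]{2}|\*)sc(?P<sc>[0-9A-F]{2}|\*)i(?P<i>[0-9A-F]{2}|\*)$"
)

# Shorthand labels for a few vendor IDs that appear in the list above.
VENDOR_NAMES = {0x1028: "Dell", 0x1000: "LSI Logic", 0x8086: "Intel"}


def decode_alias(alias):
    """Return the alias fields as ints, with None for '*' wildcards."""
    m = ALIAS_RE.match(alias.strip())
    if m is None:
        return None
    return {
        key: (None if value == "*" else int(value, 16))
        for key, value in m.groupdict().items()
    }


if __name__ == "__main__":
    fields = decode_alias("pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*")
    name = VENDOR_NAMES.get(fields["vendor"], "unknown")
    print(f"vendor {fields['vendor']:04x} ({name}), device {fields['device']:04x}, "
          f"subsystem {fields['subvendor']:04x}:{fields['subdevice']:04x}")
```

Comparing the decoded IDs against 'lspci -n' output for the card would show which alias the driver matched on.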

I have contacted Dell - via their linux-poweredge mailing list - and have discovered that I am not the only one experiencing these problems.  What bothers me is that this problem has apparently been around a while, yet no fix has been discovered by Dell or anyone else.

My research also leads me to believe that this is not just an Ubuntu thing either.  I have reports that this happens under Redhat, Debian and SuSE.  It also appears as though the problem started happening around kernel version 2.6.9.

So, I'm hoping that someone here:

a). Knows about the problem and is working on it.

- and, more importantly - 

b). Can lead me to a fix.

My machine is in production and I do not have any additional hardware to test with, but I can do limited testing with it as long as it is completely functional by 8:00 pm eastern time.  I'm using it as an offsite backup machine and that's when my backup processes kick in.  If more information is needed, let me know how to get it, and I'll supply it.

I need to get this solved ASAP.

Thanks in advance,

--
Kevin L. Collins, MCSE
Systems Manager
Nesbitt Engineering, Inc. 

^ permalink raw reply	[flat|nested] 15+ messages in thread
* RE: Megaraid problems
@ 2005-12-01 13:41 Ju, Seokmann
  0 siblings, 0 replies; 15+ messages in thread
From: Ju, Seokmann @ 2005-12-01 13:41 UTC (permalink / raw)
  To: 'Vladimir Dergachev', linux-kernel

Hi,

On Wednesday, November 30, 2005 11:35 PM, Vladimir Dergachev wrote:
>     Problem: card recognizes the disks fine, initialization goes ok.
> I can create partition with no problem. However when copying 
> files I get
> filesystem corruption (holds both with ext3 and XFS). For 
> ext3 I see the 
> following message:
Can you please try with the following changes to see if it works?
- configure the controller with 1 400GB RAID0 instead of 4 400GB RAID5
- 4GB (or less) memory

Also, please provide the following information.
- F/W version on the MegaRAID controller

Thank you,

Seokmann

> -----Original Message-----
> From: Vladimir Dergachev [mailto:volodya@mindspring.com] 
> Sent: Wednesday, November 30, 2005 11:35 PM
> To: linux-kernel@vger.kernel.org
> Subject: Megaraid problems
> 
> 
> Hi all :)
> 
>     I am wondering whether someone can shed some light on issues I am 
> having with LSI Megaraid cards ? The problem described below is with
> SATA card, but I am also having difficulties with LSI HBA 
> Scsi adaptor, 
> albeit I have not yet eliminated all the obvious things to try.
> 
>     Please CC your replies as I am not on the list..
> 
>     My setup: dual Opteron 252 with 8 GB RAM, Tyan Thunder 
> K8W. SUSE 9.3 -
> I tried both the native SUSE kernel and 2.6.14.3 from 
> kernel.org, 4 400GB 
> Western Digital drives in RAID5 configuration connected to 
> LSI MegaRAID 
> SATA 150-6
> 
>     There are also 3 250 GB drives connected to on-board 
> SIL3114 where Linux 
> is installed.
> 
>     Problem: card recognizes the disks fine, initialization goes ok.
> I can create partition with no problem. However when copying 
> files I get
> filesystem corruption (holds both with ext3 and XFS). For 
> ext3 I see the 
> following message:
> 
> EXT3-fs error (device sdd1): ext3_new_block: Allocating block 
> in system zone - 
> block = 100663296
> Aborting journal on device sdd1.
> EXT3-fs error (device sdd1) in ext3_prepare_write: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected 
> aborted journal
> Remounting filesystem read-only
> 
>     The hardware is otherwise stable - this box also has 3 
> disks in software 
> RAID5 connected to SIL3114 and a similar box has 3ware card 
> with 8 drives, 
> which function fine.
> 
>     I would appreciate any suggestions on what to try/tweak/patch.
> 
>                  thank you very much !
> 
>                        Vladimir Dergachev
> 
> PS One more thing - none of LSI binary configuration tools 
> work - they report 
> they cannot find the adaptor even though /dev/megadev0 is 
> pointing to correct 
> adaptor (I tried both 253 and 254 for major numbers and 0,1,2,3,4 for 
> minor numbers). I was not successful in finding an 
> open-source management 
> tool..
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Megaraid problems
@ 2005-12-01  4:34 Vladimir Dergachev
  0 siblings, 0 replies; 15+ messages in thread
From: Vladimir Dergachev @ 2005-12-01  4:34 UTC (permalink / raw)
  To: linux-kernel


Hi all :)

    I am wondering whether someone can shed some light on issues I am 
having with LSI MegaRAID cards? The problem described below is with a
SATA card, but I am also having difficulties with an LSI HBA SCSI adaptor, 
albeit I have not yet eliminated all the obvious things to try.

    Please CC your replies as I am not on the list.

    My setup: dual Opteron 252 with 8 GB RAM, Tyan Thunder K8W. SUSE 9.3 -
I tried both the native SUSE kernel and 2.6.14.3 from kernel.org, 4 400GB 
Western Digital drives in RAID5 configuration connected to LSI MegaRAID 
SATA 150-6

    There are also 3 250 GB drives connected to on-board SIL3114 where Linux 
is installed.

    Problem: card recognizes the disks fine, initialization goes ok.
I can create partition with no problem. However when copying files I get
filesystem corruption (holds both with ext3 and XFS). For ext3 I see the 
following message:

EXT3-fs error (device sdd1): ext3_new_block: Allocating block in system zone - 
block = 100663296
Aborting journal on device sdd1.
EXT3-fs error (device sdd1) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
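
To make the reported block number easier to reason about, here is a minimal Python sketch that pulls it out of the log text above and converts it to hex and to a byte offset. The 4096-byte filesystem block size is an assumption (the message does not state it), and `reported_block` is a hypothetical helper name used only for illustration.

```python
import re

# Log excerpt copied from the message above.
LOG = """\
EXT3-fs error (device sdd1): ext3_new_block: Allocating block in system zone -
block = 100663296
Aborting journal on device sdd1.
"""

# Assumption: the filesystem uses 4 KiB blocks; the message does not say.
BLOCK_SIZE = 4096


def reported_block(log):
    """Return the first 'block = N' value found in the log, or None."""
    match = re.search(r"block\s*=\s*(\d+)", log)
    return int(match.group(1)) if match else None


block = reported_block(LOG)
offset = block * BLOCK_SIZE

print(f"block number: {block} ({block:#x})")
print(f"byte offset : {offset} ({offset / 2**30:.0f} GiB)")
```

If the actual block size differs, only `BLOCK_SIZE` needs changing. The sketch reports where the bad allocation landed; it does not by itself identify the cause of the corruption.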

    The hardware is otherwise stable - this box also has 3 disks in software 
RAID5 connected to SIL3114 and a similar box has 3ware card with 8 drives, 
which function fine.

    I would appreciate any suggestions on what to try/tweak/patch.

                 thank you very much !

                       Vladimir Dergachev

PS One more thing - none of the LSI binary configuration tools work - they
report they cannot find the adaptor even though /dev/megadev0 is pointing to
the correct adaptor (I tried both 253 and 254 for major numbers and
0,1,2,3,4 for minor numbers). I was not successful in finding an
open-source management tool.
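
Rather than guessing major numbers, the number a character driver actually registered can be read from /proc/devices. The Python sketch below parses text in that format. It assumes the driver registers under the name "megadev", which may not hold for every driver version (some may expose the node as a misc device instead), and the sample table and the `char_major` helper are hypothetical, not output from the system described here.

```python
def char_major(proc_devices_text, name):
    """Return the major number of a character driver, or None if absent."""
    in_char_section = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if line.endswith("devices:"):
            # Section headers are "Character devices:" and "Block devices:".
            in_char_section = line.startswith("Character")
            continue
        if in_char_section and line:
            parts = line.split()
            if len(parts) == 2 and parts[1] == name:
                return int(parts[0])
    return None


# Hypothetical sample in /proc/devices format, for illustration only.
SAMPLE = """\
Character devices:
  1 mem
  4 tty
253 megadev

Block devices:
  8 sd
"""

print(char_major(SAMPLE, "megadev"))
```

On a live system the same function can be fed the contents of /proc/devices, e.g. `char_major(open("/proc/devices").read(), "megadev")`. A result of `None` would suggest the driver did not register a character device under that name.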


^ permalink raw reply	[flat|nested] 15+ messages in thread
* megaraid problems
@ 2001-05-15 22:49 Gabriel Rocha
  2001-05-15 22:55 ` Alan Cox
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Rocha @ 2001-05-15 22:49 UTC (permalink / raw)
  To: linux-kernel

Hi,
	I have a MegaRAID 466 controller, which both AMI and the Linux
	kernel say is supported under 2.4.4. I tried the AMI patches to
	the drivers in the vanilla kernel to no avail. The card works
	under Windows, and it is even detected under Linux now, but
	it refuses to show any drives and I can't get the management
	software to work. So my question is this:
		Has anyone with this card gotten it to work under Linux
		2.4.4-acX? If so, how?
	I have read about many people upgrading firmware and using the
	'latest drivers', both of which I have done, so I am at a loss
	here. Any help is appreciated. --gabe


-- 

"It's not brave if you're not scared."

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-10-12 13:15 UTC | newest]

Thread overview: 15+ messages
2006-09-24 13:03 how to set stride / raid-howto still up to date? Dexter Filmore
2006-09-25  4:40 ` Mark Hahn
2006-10-12  9:06   ` MegaRaid problems Gordon Henderson
2006-10-12 11:13     ` Joshua Baker-LePain
2006-10-12 13:15     ` Mike Hardy
  -- strict thread matches above, loose matches on Subject: below --
2006-01-20 13:19 Megaraid problems Collins, Kevin
2006-01-13 18:25 Collins, Kevin
2006-01-13 16:55 Ju, Seokmann
2006-01-13 15:59 Collins, Kevin
2006-01-13 14:39 Ju, Seokmann
2006-01-13 14:04 Collins, Kevin
2005-12-01 13:41 Ju, Seokmann
2005-12-01  4:34 Vladimir Dergachev
2001-05-15 22:49 megaraid problems Gabriel Rocha
2001-05-15 22:55 ` Alan Cox
