linux-raid.vger.kernel.org archive mirror
* problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver
From: Louis-David Mitterrand @ 2010-10-21  7:31 UTC
  To: linux-raid@vger.kernel.org

Hi,

I am setting up a new Dell T610 server with eight WD Caviar Black SATA3 1 TB
disks on an LSISAS2008 controller:

	Oct 21 09:12:37 grml kernel: [   83.377388] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)

My layout is as follows:

- small unencrypted RAID1 boot partition on /dev/md0

- dm-crypt main partition on /dev/md1 (actually /dev/mapper/cmd1)

A recent grml64 is used to create the partitions, install the system and
run lilo.

When running lilo I get these errors from the controller:

	Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!
	Oct 21 08:57:11 grml kernel: [40832.015210] mpt2sas0: sending diag reset !!
	Oct 21 08:57:12 grml kernel: [40833.209839] mpt2sas0: diag reset: SUCCESS
	Oct 21 08:57:12 grml ata_id[3570]: HDIO_GET_IDENTITY failed for '/dev/sde'
	Oct 21 08:57:13 grml kernel: [40833.407078] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
	Oct 21 08:57:13 grml kernel: [40833.407084] mpt2sas0: Dell PERC H200 Integrated: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1E)
	Oct 21 08:57:13 grml kernel: [40833.407087] mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)

etc., and then all disks are kicked out of /dev/md1:

	Oct 21 08:57:20 grml kernel: [40840.361581] md: super_written gets error=-5, uptodate=0
	Oct 21 08:57:20 grml kernel: [40840.361586] md/raid:md1: Disk failure on sdd2, disabling device.
	Oct 21 08:57:20 grml kernel: [40840.361587] <1>md/raid:md1: Operation continuing on 0 devices.

On another attempt, /dev/md0 was stopped as well.
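When collecting more complete logs, it can help to separate the controller faults and resets from the md member failures they trigger. The following is a minimal Python sketch, assuming syslog-formatted lines like the ones quoted above; the regular expressions and the `scan_events` function are hypothetical illustrations, not part of mdadm or the mpt2sas driver.

```python
import re

# Hypothetical patterns, tuned only to the message formats quoted in this thread.
CONTROLLER_EVENT = re.compile(
    r"(mpt2sas\d+): (fault_state\(0x[0-9a-fA-F]+\)|sending diag reset)"
)
MD_FAILURE = re.compile(r"md/raid:(md\d+): Disk failure on (\w+), disabling device")


def scan_events(lines):
    """Return (controller_events, md_failures) found in syslog-style lines."""
    controller_events = []
    md_failures = []
    for line in lines:
        m = CONTROLLER_EVENT.search(line)
        if m:
            controller_events.append((m.group(1), m.group(2)))
        m = MD_FAILURE.search(line)
        if m:
            md_failures.append((m.group(1), m.group(2)))
    return controller_events, md_failures


if __name__ == "__main__":
    sample = [
        "Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!",
        "Oct 21 08:57:11 grml kernel: [40832.015210] mpt2sas0: sending diag reset !!",
        "Oct 21 08:57:20 grml kernel: [40840.361586] md/raid:md1: Disk failure on sdd2, disabling device.",
    ]
    events, failures = scan_events(sample)
    print(events)
    print(failures)
```

Feeding the whole kernel log through this makes it easy to see whether every md failure is preceded by a controller reset.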

Any suggestions for fixing this problem would be welcome. I can send more
complete logs.

Thanks,


* Re: problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver
From: Tim Small @ 2010-10-21 11:08 UTC
  To: linux-raid@vger.kernel.org, linux-poweredge@dell.com

On 21/10/10 08:31, Louis-David Mitterrand wrote:
> Hi,
>
> I am setting up a new Dell T610 server with eight WD Caviar Black SATA3 1 TB
> disks on an LSISAS2008 controller:
>
> 	Oct 21 09:12:37 grml kernel: [   83.377388] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
>
> My layout is as follows:
>
> - small unencrypted RAID1 boot partition on /dev/md0
>
> - dm-crypt main partition on /dev/md1 (actually /dev/mapper/cmd1)
>
> A recent grml64 is used to create the partitions, install the system and
> run lilo.
>
> When running lilo I get these errors from the controller:
>
> 	Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!
> 	Oct 21 08:57:11 grml kernel: [40832.015210] mpt2sas0: sending diag reset !!
>    


> Any suggestion on fixing that problem would be welcome. I can send more
> complete logs.
>    

Looks like a firmware bug - do you have the latest firmware?  Drive 
firmwares?  Anything in the drive error logs (using smartctl)?
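The drive checks suggested above can be scripted across all eight disks. A minimal Python sketch, assuming the disks appear as /dev/sda through /dev/sdh and that smartctl (from smartmontools) is installed and run as root: `smartctl -i` reports the model and firmware revision, and `smartctl -l error` prints the SMART error log. The helper names are hypothetical.

```python
import subprocess


def smartctl_commands(devices):
    """Build smartctl invocations for each drive: identity and SMART error log."""
    commands = []
    for dev in devices:
        commands.append(["smartctl", "-i", dev])           # model, serial, firmware revision
        commands.append(["smartctl", "-l", "error", dev])  # SMART error log
    return commands


def run_all(commands):
    """Run each command and map its command line to stdout (needs root and smartctl)."""
    results = {}
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.stdout
    return results


if __name__ == "__main__":
    # Assumption: the eight disks are sda..sdh; adjust to the actual device names.
    disks = [f"/dev/sd{letter}" for letter in "abcdefgh"]
    for cmd in smartctl_commands(disks):
        print(" ".join(cmd))
```

Printing the commands first and only then calling `run_all` keeps the sketch safe to try on a machine without smartctl.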

If not, then try opening a bug on the kernel bugzilla - LSI engineers 
read that (and sometimes even fix things).

Otherwise, you could try replacing with a straight SATA controller, if 
that box doesn't have a SAS backplane - I've not been too impressed by 
the quality of engineering for LSI controllers, and SATA-on-SAS in 
general hasn't been very reliable IMO.  Just go for a well-supported 
SATA controller (e.g. Sil 3132 etc.).

Tim.


-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309



* Re: problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver
From: Louis-David Mitterrand @ 2010-10-21 12:50 UTC
  To: linux-raid

On Thu, Oct 21, 2010 at 12:08:51PM +0100, Tim Small wrote:
> 
> >Any suggestion on fixing that problem would be welcome. I can send more
> >complete logs.
> 
> Looks like a firmware bug - do you have the latest firmware?  Drive
> firmwares?  Anything in the drive error logs (using smartctl)?
> 
> If not, then try opening a bug on the kernel bugzilla - LSI
> engineers read that (and sometimes even fix things).
> 
> Otherwise, you could try replacing with a straight SATA controller,
> if that box doesn't have a SAS backplane - I've not been too
> impressed by the quality of engineering for LSI controllers, and
> SATA-on-SAS in general hasn't been very reliable IMO.  Just go for a
> well-supported SATA controller (e.g. Sil 3132 etc.).

Hi Tim and thanks for your feedback.

I was eventually able to "fix" the problem. After very carefully running
lilo on each disk with "raid-extra-boot=/dev/sdX" (instead of "mbr"), I
rebooted into my live system with a freshly compiled 2.6.36 and the
problem vanished. lilo now runs fine even with "raid-extra-boot=mbr", and
several reboots have not triggered any further issues.

All the firmware is at the latest version, so I guess the mpt2sas kernel
driver must have been improved between 2.6.35 and 2.6.36.

For reference, here is part of the 2.6.36 boot log, with a few ominous "!!"
messages and one "failure", none of which had any apparent consequence.

Cheers,

Oct 21 14:25:47 zenon kernel: mpt2sas version 06.100.00.00 loaded
Oct 21 14:25:47 zenon kernel: scsi0 : Fusion MPT SAS Host
Oct 21 14:25:47 zenon kernel: mpt2sas 0000:02:00.0: PCI INT A -> GSI 41 (level, low) -> IRQ 41
Oct 21 14:25:47 zenon kernel: mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16426776 kB)
Oct 21 14:25:47 zenon kernel: mpt2sas0: IO-APIC enabled: IRQ 41
Oct 21 14:25:47 zenon kernel: mpt2sas0: iomem(0x00000000df2b0000), mapped(0xffffc90000060000), size(65536)
Oct 21 14:25:47 zenon kernel: mpt2sas0: ioport(0x000000000000fc00), size(256)
Oct 21 14:25:47 zenon kernel: mpt2sas0: sending diag reset !!
Oct 21 14:25:47 zenon kernel: mpt2sas0: diag reset: SUCCESS
Oct 21 14:25:47 zenon kernel: mpt2sas0: Allocated physical memory: size(1091 kB)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Current Controller Queue Depth(467), Max Controller Queue Depth(3439)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Scatter Gather Elements per IO(128)
Oct 21 14:25:47 zenon kernel: mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Dell PERC H200 Integrated: Vendor(0x1000), Device(0x0072), SSVID(0x1028), SSDID(0x1F1E)
Oct 21 14:25:47 zenon kernel: mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
Oct 21 14:25:47 zenon kernel: mpt2sas0: sending port enable !!
Oct 21 14:25:47 zenon kernel: mpt2sas0: host_add: handle(0x0001), sas_addr(0x5842b2b05020c600), phys(8)
Oct 21 14:25:47 zenon kernel: mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:4546/_scsih_add_device()!
Oct 21 14:25:47 zenon kernel: mpt2sas0: port enable: SUCCESS
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: Direct-Access     ATA      WDC WD1002FAEX-0 1D05 PQ: 0 ANSI: 5
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: SATA: handle(0x0011), sas_addr(0x4433221107000000), phy(7), device_name(0x4ee25001c38204eb)
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: SATA: enclosure_logical_id(0x5842b2b05020c600), slot(0)
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
Oct 21 14:25:47 zenon kernel: scsi 0:0:0:0: qdepth(32), tagged(1), simple(1), ordered(0), scsi_level(6), cmd_que(1)

	etc.
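The single "failure" line above encodes the driver source location that logged it. A minimal Python sketch for pulling such locations out of a boot log, assuming the `failure at <file>:<line>/<function>()!` format seen in this excerpt; the pattern and `find_failures` are hypothetical helpers, not part of the driver.

```python
import re

# Hypothetical pattern for lines such as:
# "mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:4546/_scsih_add_device()!"
FAILURE_AT = re.compile(r"(\w+): failure at (\S+?):(\d+)/(\w+)\(\)!")


def find_failures(lines):
    """Return (adapter, source file, line number, function) for each match."""
    found = []
    for line in lines:
        m = FAILURE_AT.search(line)
        if m:
            found.append((m.group(1), m.group(2), int(m.group(3)), m.group(4)))
    return found


if __name__ == "__main__":
    log = [
        "Oct 21 14:25:47 zenon kernel: mpt2sas0: failure at "
        "drivers/scsi/mpt2sas/mpt2sas_scsih.c:4546/_scsih_add_device()!",
        "Oct 21 14:25:47 zenon kernel: mpt2sas0: port enable: SUCCESS",
    ]
    for adapter, path, lineno, func in find_failures(log):
        print(f"{adapter}: {func}() at {path}:{lineno}")
```

The file and line number point into the kernel source of the running version, so they are only meaningful together with the exact kernel release (2.6.36 here).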


* Re: problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver
From: Stefan /*St0fF*/ Hübner @ 2010-10-22  5:25 UTC
  To: Tim Small; +Cc: linux-raid@vger.kernel.org, linux-poweredge@dell.com

On 21.10.2010 13:08, Tim Small wrote:
> On 21/10/10 08:31, Louis-David Mitterrand wrote:
>> Hi,
>>
>> I am setting up a new Dell T610 server with eight WD Caviar Black SATA3 1 TB
>> disks on an LSISAS2008 controller:
>>
>>     Oct 21 09:12:37 grml kernel: [   83.377388] mpt2sas0: LSISAS2008: FWVersion(02.15.63.00), ChipRevision(0x02), BiosVersion(07.01.09.00)
>>
>> My layout is as follows:
>>
>> - small unencrypted RAID1 boot partition on /dev/md0
>>
>> - dm-crypt main partition on /dev/md1 (actually /dev/mapper/cmd1)
>>
>> A recent grml64 is used to create the partitions, install the system and
>> run lilo.
>>
>> When running lilo I get these errors from the controller:
>>
>>     Oct 21 08:57:11 grml kernel: [40832.015207] mpt2sas0: fault_state(0x265d)!
>>     Oct 21 08:57:11 grml kernel: [40832.015210] mpt2sas0: sending diag reset !!
>>    
> 
> 
>> Any suggestion on fixing that problem would be welcome. I can send more
>> complete logs.
>>    
> 
> Looks like a firmware bug - do you have the latest firmware?  Drive
> firmwares?  Anything in the drive error logs (using smartctl)?
> 
> If not, then try opening a bug on the kernel bugzilla - LSI engineers
> read that (and sometimes even fix things).
> 
> Otherwise, you could try replacing with a straight SATA controller, if
> that box doesn't have a SAS backplane - I've not been too impressed by
> the quality of engineering for LSI controllers, and SATA-on-SAS in
> general hasn't been very reliable IMO.  Just go for a well-supported
> SATA controller (e.g. Sil 3132 etc.).
> 
> Tim.
> 
> 
I have to disagree on the matter of SATA drives on SAS controllers.  We
use 3ware/LSI 9650, 9690 and 9750 controllers a lot and have rarely had
any problems; the problems we did encounter were caused by hardware
failures.  As for the LSISAS2008, it's good to hear that most problems
were fixed in later kernels.  As we are trying to get our lower-cost
storage systems running on this controller (onboard a Supermicro
motherboard), this shows which way to go.  Thank you for this information!

Stefan


Thread overview: 4+ messages
2010-10-21  7:31 problems with "LSISAS2008 6Gb/s SAS" kernel mpt2sas driver Louis-David Mitterrand
2010-10-21 11:08 ` Tim Small
2010-10-21 12:50   ` Louis-David Mitterrand
2010-10-22  5:25   ` Stefan /*St0fF*/ Hübner
