* megaraid Error 40005 on cluster
@ 2005-05-19 14:11 pg
2005-05-19 14:23 ` Matt Domsch
0 siblings, 1 reply; 3+ messages in thread
From: pg @ 2005-05-19 14:11 UTC
To: linux-scsi
I experienced the following error on a Red Hat cluster configuration with Dell hardware (PERC 3/DC controller and PowerVault 220 disk array).
When the error occurs, the cluster manager shuts down the cluster node, but the filesystem is corrupted and the other node cannot mount it until a manual fsck is run.
Any ideas?
SCSI and system configuration
--------------------------------
Redhat AS 2.1 + Cluster Manager
DELL PowerEdge 2650 with PERC 3/DC
DELL PowerVault 220s - cluster configuration
# uname -a
Linux myHost 2.4.9-e.40smp #1 SMP Thu Apr 8 16:53:29 EDT 2004 i686 unknown
# cat /etc/modules.conf
options scsi_mod max_scsi_luns=255
alias scsi_hostadapter aacraid
alias scsi_hostadapter1 megaraid_2009
....
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: DELL Model: PERCRAID Mirror Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: MegaRAID Model: LD 0 RAID1 34G Rev: 1.92
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: MegaRAID Model: LD 1 RAID5 69G Rev: 1.92
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 04 Id: 07 Lun: 00
Vendor: DELL Model: PERC 3/DC Rev: 1.92
Type: Processor ANSI SCSI revision: 02
Host: scsi1 Channel: 04 Id: 15 Lun: 00
Vendor: DELL Model: PV22XS Rev: E.14
Type: Processor ANSI SCSI revision: 03
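To tie a host/channel/id tuple from a kernel message back to a device in this listing, the text can be parsed with a few lines of Python. This is only a sketch: the sample is a subset of the listing above, and the function name `parse_proc_scsi` and the regular expressions are illustrative assumptions, not part of any tool.

```python
import re

# Subset of the /proc/scsi/scsi listing above, whitespace as shown in this post.
PROC_SCSI = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: DELL Model: PERCRAID Mirror Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: MegaRAID Model: LD 0 RAID1 34G Rev: 1.92
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: MegaRAID Model: LD 1 RAID5 69G Rev: 1.92
Type: Direct-Access ANSI SCSI revision: 02
"""

HOST_RE = re.compile(r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
MODEL_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S+)")

def parse_proc_scsi(text):
    """Return {(host, channel, id, lun): (vendor, model)} for each device."""
    devices = {}
    current = None
    for line in text.splitlines():
        m = HOST_RE.search(line)
        if m:
            # Remember the address; the Vendor/Model line follows it.
            current = tuple(int(x) for x in m.groups())
            continue
        m = MODEL_RE.search(line)
        if m and current is not None:
            devices[current] = (m.group(1), m.group(2))
            current = None
    return devices

devices = parse_proc_scsi(PROC_SCSI)
# Which device is "host 1 channel 0 id 1 lun 0"?
print(devices[(1, 0, 1, 0)])
```

With this mapping, "host 1 channel 0 id 1 lun 0" corresponds to the RAID5 logical drive and "host 1 channel 0 id 0 lun 0" to the RAID1 logical drive.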
# cat /proc/scsi/megaraid/1
LSI Logic MegaRAID 1.92 254 commands 16 targs 5 chans 7 luns
# cat /proc/megaraid/hba1/config
v2.00.9 (Release Date: Thu Sep 4 17:49:42 EDT 2003)
PERC 3/DC
Controller Type: 438/466/467/471/493/518/520/531/532
Controller Supports 40 Logical Drives
Controller capable of 64-bit memory addressing
Controller using 64-bit memory addressing
Base = f9030000, Irq = 16, Logical Drives = 2, Channels = 2
Version =1.92:3.31, DRAM = 128Mb
Controller Queue Depth = 254, Driver Queue Depth = 126
support_ext_cdb = 1
support_random_del = 1
boot_ldrv_enabled = 1
boot_ldrv = 0
boot_pdrv_enabled = 0
boot_pdrv_ch = 0
boot_pdrv_tgt = 0
quiescent = 0
has_cluster = 1
Module Parameters:
max_cmd_per_lun = 63
max_sectors_per_io = 128
# cat /proc/megaraid/hba1/diskdrives-ch0
Channel: 0 Id: 0 State: Online.
Vendor: FUJITSU Model: MAP3367NC Rev: 5605
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 1 State: Online.
Vendor: FUJITSU Model: MAP3367NC Rev: 5605
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 2 State: Hot spare.
Vendor: FUJITSU Model: MAP3367NC Rev: 5605
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 3 State: Online.
Vendor: FUJITSU Model: MAP3367NC Rev: 5605
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 4 State: Online.
Vendor: FUJITSU Model: MAP3367NC Rev: 5605
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 5 State: Online.
Vendor: FUJITSU Model: MAP3367NC Rev: 5605
Type: Direct-Access ANSI SCSI revision: 03
# cat /proc/megaraid/hba1/raiddrives-0-9
Logical drive: 0:, state: optimal
Span depth: 1, RAID level: 1, Stripe size: 64, Row size: 2
Read Policy: No read ahead, Write Policy: Write thru, Cache Policy: Direct IO
Logical drive: 1:, state: optimal
Span depth: 1, RAID level: 5, Stripe size: 64, Row size: 3
Read Policy: No read ahead, Write Policy: Write thru, Cache Policy: Direct IO
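The cache policies of each logical drive can be pulled out of this output for a quick check. The following is a minimal sketch that parses the text exactly as pasted above; `parse_raiddrives` is a hypothetical helper, and the member-disk count assumes that "Row size" is the number of physical disks in each span.

```python
import re

# Output of /proc/megaraid/hba1/raiddrives-0-9 as pasted above.
RAIDDRIVES = """\
Logical drive: 0:, state: optimal
Span depth: 1, RAID level: 1, Stripe size: 64, Row size: 2
Read Policy: No read ahead, Write Policy: Write thru, Cache Policy: Direct IO
Logical drive: 1:, state: optimal
Span depth: 1, RAID level: 5, Stripe size: 64, Row size: 3
Read Policy: No read ahead, Write Policy: Write thru, Cache Policy: Direct IO
"""

def parse_raiddrives(text):
    """Return one dict per logical drive with its 'key: value' fields."""
    drives = []
    for line in text.splitlines():
        m = re.match(r"Logical drive:\s*(\d+):,\s*state:\s*(\w+)", line)
        if m:
            drives.append({"ld": int(m.group(1)), "state": m.group(2)})
            continue
        if not drives:
            continue
        # Remaining lines are comma-separated "key: value" pairs.
        for field in line.split(","):
            key, sep, value = field.partition(":")
            if sep:
                drives[-1][key.strip()] = value.strip()
    return drives

drives = parse_raiddrives(RAIDDRIVES)
for d in drives:
    print(d["ld"], "RAID", d["RAID level"], d["state"], "/", d["Write Policy"])

# Assumption: Row size = physical disks per span.
members = sum(int(d["Span depth"]) * int(d["Row size"]) for d in drives)
print("member disks:", members)
```

Under that assumption the two logical drives use five member disks, which agrees with the five drives reported as Online on channel 0 (the sixth is the hot spare). Both logical drives report write-through, so the controller write cache is already disabled.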
------------------------------------
Error report
------------------------------------
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 2290456
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 2290464
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 11488
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 11496
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 11504
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 528528
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 2283712
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 2283720
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:21, sector 2283728
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:21, sector 2283736
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:21, sector 2281160
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:21, sector 2281168
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:21, sector 2281176
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:21, sector 2281184
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:22, sector 1052480
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:22, sector 2363240
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 266776
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 266776
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 3673920
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 3670072
May 13 05:31:29 clu2a kernel: EXT3-fs error (device sd(8,34)): ext3_get_inode_loc: unable to read inode block - inode=216936, block=458759
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 0
May 13 05:31:29 clu2a kernel: EXT3-fs error (device sd(8,34)) in ext3_reserve_inode_write: IO failure
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 0
May 13 05:31:29 clu2a kernel: EXT3-fs error (device sd(8,34)) in ext3_new_inode: IO failure
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:22, sector 0
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:11, sector 8
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:11, sector 8
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:21, sector 3670080
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:30 clu2a kernel: I/O error: dev 08:22, sector 2359352
May 13 05:31:30 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:30 clu2a kernel: I/O error: dev 08:21, sector 3670064
May 13 05:31:30 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:30 clu2a kernel: I/O error: dev 08:21, sector 3674152
[...]
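For reading the log above, it may help to split the return code and the device numbers into their parts. This sketch assumes the usual Linux layout of the SCSI result word (driver, host, message and status bytes, from high to low) and that both "return code" and "dev" are printed in hex; the helper names are hypothetical. The low byte (0x05) is left undecoded because its meaning depends on the driver.

```python
import re
from collections import Counter

# Host-byte names from the Linux SCSI layer (DID_* constants).
HOST_BYTE = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
             0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
             0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
             0x09: "DID_BAD_INTR"}

def decode_result(code_hex):
    """Split a hex 'return code' into its four bytes."""
    result = int(code_hex, 16)
    host = (result >> 16) & 0xFF
    return {
        "driver": (result >> 24) & 0xFF,
        "host": HOST_BYTE.get(host, hex(host)),
        "msg": (result >> 8) & 0xFF,
        "low": result & 0xFF,  # left undecoded: meaning depends on the driver
    }

def decode_dev(dev):
    """Map a hex 'major:minor' pair such as '08:21' to an sd device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk = "sd" + chr(ord("a") + minor // 16)  # 16 minors per disk
    partition = minor % 16                      # 0 means the whole disk
    return disk + (str(partition) if partition else "")

# A few lines taken from the log above.
LOG = """\
May 13 05:31:27 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:27 clu2a kernel: I/O error: dev 08:21, sector 2290456
May 13 05:31:28 clu2a kernel: SCSI disk error : host 1 channel 0 id 1 lun 0 return code = 40005
May 13 05:31:28 clu2a kernel: I/O error: dev 08:22, sector 1052480
May 13 05:31:29 clu2a kernel: SCSI disk error : host 1 channel 0 id 0 lun 0 return code = 40005
May 13 05:31:29 clu2a kernel: I/O error: dev 08:11, sector 8
"""

codes = set(re.findall(r"return code = ([0-9a-fA-F]+)", LOG))
errors_per_dev = Counter(
    decode_dev(d)
    for d in re.findall(r"I/O error: dev ([0-9a-f]{2}:[0-9a-f]{2})", LOG)
)

for code in sorted(codes):
    print(code, decode_result(code))
print(dict(errors_per_dev))
```

Read this way, 40005 carries host byte 0x04 (DID_BAD_TARGET), and the failing devices are sdb1, sdc1 and sdc2. The hex minor 0x22 is 34 in decimal, which matches the sd(8,34) in the EXT3 messages. The log pairs id 0 with 08:11 and id 1 with 08:21 and 08:22, so both logical drives on the PERC 3/DC are affected at the same moment.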
* Re: megaraid Error 40005 on cluster
From: Matt Domsch @ 2005-05-19 14:23 UTC
To: pg@atlavia.it; +Cc: linux-scsi
On Thu, May 19, 2005 at 02:11:43PM +0000, pg@atlavia.it wrote:
> I experienced the following error on a Red Hat cluster configuration
> with Dell hardware (PERC 3/DC controller and PowerVault 220 disk
> array). When the error occurs, the cluster manager shuts down the
> cluster node, but the filesystem is corrupted and the other node
> cannot mount it until a manual fsck is run.
>
> Any ideas?
The firmware on the PERC 3/DC is not multi-initiator cluster-capable
under Linux. For this reason, neither Dell nor Red Hat recommends
trying to create an HA shared-storage cluster with this configuration.
Even with write cache disabled, the lock sectors used by the cluster
manager do not stay coherent.
I understand that the newest versions of the Red Hat Cluster Suite may
no longer use lock sectors on the disk as the I/O fencing mechanism,
which may then enable such configurations. But neither Dell nor Red
Hat has done any testing with the hardware config you've got.
The price is right, until you actually need your data to be highly
available and it crashes.
Thanks,
Matt
--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
* Re: megaraid Error 40005 on cluster
From: Pg @ 2005-05-19 19:39 UTC
To: linux-scsi
Matt Domsch wrote:
>On Thu, May 19, 2005 at 02:11:43PM +0000, pg@atlavia.it wrote:
>
>>I experienced the following error on a Red Hat cluster configuration
>>with Dell hardware (PERC 3/DC controller and PowerVault 220 disk
>>array). When the error occurs, the cluster manager shuts down the
>>cluster node, but the filesystem is corrupted and the other node
>>cannot mount it until a manual fsck is run.
>>
>>Any ideas?
>
>The firmware on the PERC 3/DC is not multi-initiator cluster-capable
>under Linux. For this reason, neither Dell nor Red Hat recommend
>trying to create a HA shared storage cluster with this configuration.
>Even with write cache disabled, the lock sectors used by the cluster
>manager don't manage to stay coherent.
>
>I understand that newest versions of the Red Hat Cluster Suite may no
>longer use lock sectors on the disk as the I/O fencing mechanism,
>which may then enable such configurations. But neither Dell nor Red
>Hat have done any testing with the hardware config you've got.
>
>The price is right, until you actually need your data to be highly
>available and it crashes.
>
>Thanks,
>Matt
>
>
>
As I'm not an expert in HA clusters, I got a hardware and software
solution "suggested" by Dell, and my hw/sw configuration has been
working quite well for a couple of years. Maybe I've been lucky.
As recommended, I don't mount the same filesystem on both nodes of the
cluster at the same time; every service has its own filesystem, and a
service is active on a single node.
The only shared partition is the quorum, which is on a RAID-1 volume.
The other partitions are on a single RAID-5 volume: do you think that
making a separate volume for each partition could help?
Thanks