* Trouble with StorageTek 2530 (SAS) and RDAC
@ 2010-01-26 22:09 Jakov Sosic
2010-01-27 2:23 ` Chandra Seetharaman
0 siblings, 1 reply; 3+ messages in thread
From: Jakov Sosic @ 2010-01-26 22:09 UTC (permalink / raw)
To: device-mapper development
Hi!
I contacted the list almost half a year ago about this storage, and I
still haven't figured out how to set it up. I have 3 nodes connected
to it, and 2 volumes shared across all 3 nodes. I'm using CentOS 5.4.
Here is my multipath.conf:
defaults {
    udev_dir /dev
    polling_interval 10
    selector "round-robin 0"
    path_grouping_policy multibus
    getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout /bin/true
    path_checker readsector0
    rr_min_io 100
    max_fds 8192
    rr_weight priorities
    failback immediate
    no_path_retry fail
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^sda"
}
multipaths {
    multipath {
        wwid 3600a0b80003abc5c000011504b52f919
        alias sas-qd
    }
    multipath {
        wwid 3600a0b80002fcd1800001a374b52fa1e
        alias sas-data
    }
}
devices {
    device {
        vendor "SUN"
        product "LCSM100_S"
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout "/sbin/mpath_prio_rdac /dev/%n"
        features "0"
        hardware_handler "1 rdac"
        path_grouping_policy group_by_prio
        failback immediate
        path_checker rdac
        rr_weight uniform
        no_path_retry 300
        rr_min_io 1000
    }
}
And here is multipath -ll:
# multipath -ll sas-data
sas-data (3600a0b80002fcd1800001a374b52fa1e) dm-1 SUN,LCSM100_S
[size=2.7T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][enabled]
\_ 1:0:3:1 sde 8:64 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:1 sdc 8:32 [active][ghost]
On that volume, I have set up CLVM and created one clustered logical
volume. If I try to format it with ext3, here is what I end up
with:
Jan 26 23:00:43 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360267648
Jan 26 23:00:43 node01 kernel: device-mapper: multipath: Failing path 8:64.
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360269696
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360527744
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360528768
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360529792
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 multipathd: 8:64: mark as failed
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360530816
Jan 26 23:00:44 node01 multipathd: sas-data: remaining active paths: 1
Jan 26 23:00:44 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:44 node01 multipathd: dm-1: add map (uevent)
Jan 26 23:00:44 node01 kernel: end_request: I/O error, dev sde, sector
1360531840
Jan 26 23:00:44 node01 multipathd: dm-1: devmap already registered
Jan 26 23:00:44 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:44 node01 multipathd: sdd: remove path (uevent)
Jan 26 23:00:44 node01 kernel: end_request: I/O error, dev sde, sector
1360789888
Jan 26 23:00:44 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:44 node01 kernel: end_request: I/O error, dev sde, sector
1360790912
.
.
.
lot of similar messages
.
.
.
Jan 26 23:00:50 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:50 node01 kernel: end_request: I/O error, dev sde, sector
1358694784
Jan 26 23:00:50 node01 kernel: mptsas: ioc1: removing ssp device,
channel 0, id 4, phy 7
Jan 26 23:00:50 node01 kernel: scsi 1:0:1:0: rdac Dettached
Jan 26 23:00:50 node01 kernel: scsi 1:0:1:1: rdac Dettached
Jan 26 23:00:50 node01 kernel: sd 1:0:0:1: queueing MODE_SELECT command.
Jan 26 23:00:50 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 26 23:00:51 node01 kernel: sd 1:0:0:0: rdac Dettached
Jan 26 23:00:51 node01 multipathd: sas-qd: load table [0 204800
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:16 1000]
Jan 26 23:00:51 node01 multipathd: sde: remove path (uevent)
Jan 26 23:00:51 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 26 23:00:52 node01 kernel: sd 1:0:0:1: rdac Dettached
Jan 26 23:00:52 node01 multipathd: sas-data: load table [0 5855165440
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:32 1000]
Jan 26 23:00:52 node01 multipathd: dm-0: add map (uevent)
Jan 26 23:00:52 node01 multipathd: dm-0: devmap already registered
Jan 26 23:00:52 node01 multipathd: dm-1: add map (uevent)
Jan 26 23:00:52 node01 multipathd: dm-1: devmap already registered
Jan 26 23:00:52 node01 kernel: device-mapper: multipath: Cannot failover
device because scsi_dh_rdac was not loaded.
Any ideas?
--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/ |
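When comparing a `multipath -ll` listing against kernel log lines, as with the output and log in the message above, it can help to pull the H:C:T:L address of each path out mechanically. The following is a minimal Python sketch; the helper name `hctl_by_device` and the regular expression are illustrative assumptions, not part of multipath-tools, and the pattern has only been matched against the legacy output format shown in this thread.

```python
import re

# Path lines in the listing look like: "\_ 1:0:3:1 sde 8:64 [active][ready]".
# Path-group lines ("\_ round-robin 0 ...") do not match because they lack an H:C:T:L.
PATH_LINE = re.compile(r"\\_\s+(\d+:\d+:\d+:\d+)\s+(\w+)\s+(\d+:\d+)")


def hctl_by_device(multipath_ll_output):
    """Map each sd device name to the H:C:T:L address shown in the listing."""
    return {m.group(2): m.group(1) for m in PATH_LINE.finditer(multipath_ll_output)}


# Listing copied from the message above.
listing = r"""
\_ round-robin 0 [prio=100][enabled]
 \_ 1:0:3:1 sde 8:64 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:1 sdc 8:32 [active][ghost]
"""

paths = hctl_by_device(listing)
logged_hctl = "1:0:1:1"  # address the kernel log prints next to the sde I/O errors
print(paths["sde"], paths["sde"] == logged_hctl)
```

With the data from this thread the two addresses for sde differ (1:0:3:1 in the listing, 1:0:1:1 in the log), which suggests the listing and the log were not captured in the same device state.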
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Trouble with StorageTek 2530 (SAS) and RDAC
2010-01-26 22:09 Trouble with StorageTek 2530 (SAS) and RDAC Jakov Sosic
@ 2010-01-27 2:23 ` Chandra Seetharaman
2010-01-29 0:24 ` Jakov Sosic
0 siblings, 1 reply; 3+ messages in thread
From: Chandra Seetharaman @ 2010-01-27 2:23 UTC (permalink / raw)
To: device-mapper development
On Tue, 2010-01-26 at 23:09 +0100, Jakov Sosic wrote:
> [...]
> And here is multipath -ll:
> # multipath -ll sas-data
> sas-data (3600a0b80002fcd1800001a374b52fa1e) dm-1 SUN,LCSM100_S
> [size=2.7T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=100][enabled]
> \_ 1:0:3:1 sde 8:64 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
> \_ 1:0:0:1 sdc 8:32 [active][ghost]
> [...]
> Jan 26 23:00:43 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
> Originator={PL}, Code={IO Executed}, SubCode(0x0000)
> Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
> 0x00010000
> Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
> 1360267648
Is this message from the same node where you got the multipath -ll
output? From these messages it looks like sde is 1:0:1:1, but from the
multipath -ll output it looks like it is 1:0:3:1.
> Jan 26 23:00:43 node01 kernel: device-mapper: multipath: Failing path 8:64.
> Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
> 0x00010000
This return code means that the host is returning DID_NO_CONNECT, which
means that the host is not able to connect to the end point.
I would suggest going step by step:
1. Try to access both paths of a LUN (on all nodes).
   One should succeed and the other should fail.
2. Try to access the multipath device and see if all is good.
3. Create an LVM volume on a single node (not clustered) and see if
   that works.
4. Create a clustered LVM volume on top of all the active (non-ghost)
   sd devices and see if it works.
When you send the results, include the output of "dmsetup table" and
"dmsetup ls".
> [...]
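The reading of return code 0x00010000 as DID_NO_CONNECT given above can be reproduced by splitting the 32-bit SCSI result word into its four bytes. This is a minimal Python sketch under that assumption: the function names are hypothetical, and the table lists only a handful of host byte values as defined in the Linux SCSI midlayer headers.

```python
# A few host byte values from the Linux SCSI midlayer; the list is not exhaustive.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result):
    """Split a SCSI result word into status, message, host and driver bytes."""
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": (result >> 16) & 0xFF,
        "driver_byte": (result >> 24) & 0xFF,
    }


def host_byte_name(result):
    """Return the symbolic name of the host byte, or a hex fallback if unlisted."""
    host = decode_scsi_result(result)["host_byte"]
    return HOST_BYTE_NAMES.get(host, "unknown (0x%02x)" % host)


print(host_byte_name(0x00010000))
```

Only the host byte is non-zero in the value from the log, so the error originates in the host adapter layer rather than in a SCSI status returned by the target.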
* Re: Trouble with StorageTek 2530 (SAS) and RDAC
2010-01-27 2:23 ` Chandra Seetharaman
@ 2010-01-29 0:24 ` Jakov Sosic
0 siblings, 0 replies; 3+ messages in thread
From: Jakov Sosic @ 2010-01-29 0:24 UTC (permalink / raw)
To: sekharan, device-mapper development
On 01/27/2010 03:23 AM, Chandra Seetharaman wrote:
> This return code means that the host is returning DID_NO_CONNECT. which
> means that the host is not able to connect to the end point.
>
> I would suggest you to go step-by-step.
> 1. Try to access both the paths of a lun (in all nodes).
> one should succeed and other should fail.
> 2. Try to access the multipath device and see if all is good.
> 3. Create a LVM on a single node (not clusters) and see if that works.
> 4. Create a clustered LVM on top of all the Active (non-ghost) sd
> devices and see if it works.
>
> When you send the results include o/p "dmsetup table" and "dmsetup ls"
Thank you! I've solved the multipath problems with a new kernel that I
built with my device added to scsi_dh_rdac.c. I added the "SUN"
"LCSM100_S" entry, just as Charlie Brady suggested to me a few months
back. That was the solution for the multipath problems; now multipath
is able to do its part.
But after the failover, the secondary path works for just a bit and
then hangs. When I disconnect the active SAS cable from the server,
multipath and scsi_dh_rdac do their thing, but if I have active
read/write processes (for example, copying a file from the volume
mounted from storage to the same partition), everything hangs a few
seconds after the multipath failover. Very strange behaviour indeed.
This is what happens now:
Jan 28 20:26:12 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:26:12 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:26:12 node01 kernel: sd 1:0:0:1: SCSI error: return code = 0x00010000
Jan 28 20:26:12 node01 kernel: end_request: I/O error, dev sdc, sector 7012168
Jan 28 20:26:12 node01 kernel: device-mapper: multipath: Failing path 8:32.
Jan 28 20:26:12 node01 kernel: sd 1:0:0:1: SCSI error: return code = 0x00010000
Jan 28 20:26:12 node01 kernel: end_request: I/O error, dev sdc, sector 7012424
So, multipath activated... Lots of similar SCSI I/O error messages
follow, and in between I see this:
Jan 28 20:26:12 node01 multipathd: dm-1: add map (uevent)
Jan 28 20:26:12 node01 multipathd: dm-1: devmap already registered
Jan 28 20:26:12 node01 multipathd: 8:32: mark as failed
Jan 28 20:26:12 node01 multipathd: sas-data: remaining active paths: 1
Jan 28 20:26:12 node01 multipathd: sdb: remove path (uevent)
and then
Jan 28 20:26:13 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:26:13 node01 last message repeated 61 times
Jan 28 20:26:18 node01 multipathd: sas-qd: load table [0 204800 multipath 0 1 rdac 1 1 round-robin 0 1 1 8:80 3000]
Jan 28 20:26:18 node01 multipathd: sdc: remove path (uevent)
Jan 28 20:26:18 node01 multipathd: sas-data: load table [0 3774873600 multipath 0 1 rdac 1 1 round-robin 0 1 1 8:96 1000]
Jan 28 20:26:18 node01 multipathd: sdd: remove path (uevent)
Jan 28 20:26:18 node01 kernel: mptsas: ioc1: removing ssp device, channel 0, id 1, phy 3
Jan 28 20:26:18 node01 multipathd: sas-os: load table [0 2080291840 multipath 0 1 rdac 1 1 round-robin 0 1 1 8:112 3000]
Jan 28 20:26:18 node01 multipathd: sde: remove path (uevent)
Jan 28 20:26:18 node01 kernel: scsi 1:0:0:0: rdac Dettached
Jan 28 20:26:19 node01 multipathd: sde: spurious uevent, path not in pathvec
Jan 28 20:26:19 node01 kernel: scsi 1:0:0:1: rdac Dettached
Jan 28 20:26:19 node01 multipathd: uevent trigger error
Jan 28 20:26:19 node01 kernel: scsi 1:0:0:2: rdac Dettached
Jan 28 20:26:19 node01 multipathd: dm-0: add map (uevent)
Jan 28 20:26:19 node01 kernel: sd 1:0:3:1: queueing MODE_SELECT command.
Jan 28 20:26:19 node01 multipathd: dm-0: devmap already registered
Jan 28 20:26:19 node01 kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Jan 28 20:26:19 node01 multipathd: dm-1: add map (uevent)
Jan 28 20:26:19 node01 multipathd: dm-1: devmap already registered
Jan 28 20:26:19 node01 multipathd: dm-2: add map (uevent)
Jan 28 20:26:19 node01 kernel: scsi 1:0:0:1: rejecting I/O to dead device
Jan 28 20:26:19 node01 multipathd: dm-2: devmap already registered
Jan 28 20:26:19 node01 kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Jan 28 20:26:19 node01 kernel: device-mapper: multipath: Using scsi_dh module scsi_dh_rdac for failover/failback and device management.
Jan 28 20:26:20 node01 multipathd: 8:96: reinstated
Jan 28 20:27:08 node01 multipathd: dm-1: add map (uevent)
Jan 28 20:27:08 node01 multipathd: dm-1: devmap already registered
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29045144
Jan 28 20:27:08 node01 kernel: device-mapper: multipath: Failing path 8:96.
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29089224
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29090248
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29091272
Jan 28 20:27:08 node01 multipathd: 8:96: mark as failed
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 multipathd: sas-data: Entering recovery mode: max_retries=300
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29092296
Jan 28 20:27:08 node01 multipathd: sas-data: remaining active paths: 0
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 multipathd: sdf: remove path (uevent)
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29093320
Jan 28 20:27:08 node01 multipathd: sas-qd: stop event checker thread
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 multipathd: sdg: remove path (uevent)
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29094344
Jan 28 20:27:08 node01 multipathd: sas-data: map in use
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 multipathd: sas-data: can't flush
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29095368
Jan 28 20:27:08 node01 multipathd: sdh: remove path (uevent)
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 multipathd: sas-os: stop event checker thread
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29096400
Jan 28 20:27:08 node01 multipathd: sdi: remove path (uevent)
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 multipathd: sdi: spurious uevent, path not in pathvec
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 multipathd: uevent trigger error
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 last message repeated 60 times
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector 29097424
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code = 0x00010000
lots of SCSI errors...
Jan 28 20:27:14 node01 kernel: mptsas: ioc1: removing ssp device, channel 0, id 4, phy 7
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:0: rdac Dettached
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:1: rdac Dettached
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:2: rdac Dettached
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:28:18 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:28:18 node01 multipathd: sdg: rdac checker reports path is down
Jan 28 20:29:29 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:29:29 node01 multipathd: sdg: rdac checker reports path is down
Jan 28 20:30:40 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:30:40 node01 multipathd: sdg: rdac checker reports path is down
And that's it... all paths are lost. The node is still alive: I can
access it, read from it and write to it, but commands like
"multipath -ll" just hang forever, and if I try to restart the server,
it hangs too.
I do use a CLVM partition, but I'm willing to try a raw SAS volume if
you think that would be a solution.
And about your suggestions:
1. Try to access both the paths of a lun (in all nodes).
one should succeed and other should fail.
This works OK. No problems noticed.
2. Try to access the multipath device and see if all is good.
This works too, if I don't disconnect one of the two cables :)
3. Create a LVM on a single node (not clusters) and see if that works.
4. Create a clustered LVM on top of all the Active (non-ghost) sd
devices and see if it works.
I did not try 3 and 4.
The problem is that after I get the errors, I lose all the volumes on
the nodes. It is OK to lose one path, but on the secondary path I get
something like
# # # # (failed)(failed)
in the multipath -ll output. Also, all other volumes are simply lost;
there are no devices present. It seems to me like the controller
itself, or maybe the mptsas driver, goes berserk in the process.
Any ideas? :)
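Step 1 of the suggestions quoted above (accessing each path of a LUN directly) can be scripted as a simple first-sector read, similar in spirit to the readsector0 path checker. This is a minimal Python sketch, not a replacement for the multipath checkers: the function name is hypothetical, the device names are taken from the first message's `multipath -ll` output, and a plain buffered read may be served from the page cache, so a real test would likely need direct I/O.

```python
import os


def can_read_first_sector(device, sector_size=512):
    """Return True if one full sector can be read from `device`, False on any OS error."""
    try:
        fd = os.open(device, os.O_RDONLY)
    except OSError:
        return False
    try:
        # A short read (fewer bytes than requested) is treated as a failure.
        return len(os.read(fd, sector_size)) == sector_size
    except OSError:
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumed device names; substitute the sd devices behind the multipath map.
    for dev in ("/dev/sde", "/dev/sdc"):
        print(dev, "readable" if can_read_first_sector(dev) else "not readable")
```

On an RDAC array the expectation stated in the thread is that the path to the owning controller succeeds and the ghost path fails.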