* Clariion CX600 automatic failback support
From: Phil Lowden (plowden) @ 2005-10-26 19:37 UTC
To: dm-devel
I'm using LVM2 with device-mapper-multipath 0.4.5 on the GA release of RHEL 4
Update 2, kernel version 2.6.9-22.ELsmp. Storage is 4 CLARiiON CX600 LUNs, 2
with a primary path on SP A and 2 on SP B. The HBAs are QLA3240 cards with
firmware 3.03.15 IPX and the Red Hat-distributed driver version 8.01.00b5-rh2.
When I disrupt one path by disabling a host or SP switch port, failover works
great, but failback doesn't happen automatically with the current config
(below). By this I mean that when the connection to, e.g., SP B is restored,
all 4 LUNs stay trespassed to SP A. Is this by design? Or is there support for
automatic failback, i.e., I/O is paused and a trespass is issued to return the
2 SP B LUNs to their primary paths?

Of course, manual failback is possible, but without quiescing I/O I found I
was able to munge my LVM2 objects up quite nicely. Manual failback with
quiesced I/O is fine.
Here's the /etc/multipath.conf:
defaults {
    multipath_tool                  "/sbin/multipath -v0"
    udev_dir                        /dev
    polling_interval                10
    default_selector                "round-robin 0"
    default_path_grouping_policy    multibus
    default_getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
    default_prio_callout            "/bin/true"
    default_features                "0"
    rr_min_io                       100
    failback                        immediate
}
devices {
    device {
        vendor                  "DGC "
        product                 "RAID 5 "
        #path_grouping_policy   group_by_serial
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        path_checker            emc_clariion
        path_selector           "round-robin 0"
        features                "0"
        hardware_handler        "1 emc"
    }
}
Here's the multipath -l output (with both paths up) and a few commands to
help make sense of it:
[root@sandbox ~]# multipath -l
3600601604b600d00743e69d8862fda11
[size=1 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active]
  \_ 1:0:0:3 sdd 8:48 [active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:3 sdh 8:112 [active]
3600601604b600d00773e69d8862fda11
[size=4 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active]
  \_ 1:0:0:0 sda 8:0 [active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:0 sde 8:64 [active]
3600601604b600d00763e69d8862fda11
[size=3 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active]
  \_ 1:0:0:1 sdb 8:16 [active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:1 sdf 8:80 [active]
3600601604b600d00753e69d8862fda11
[size=2 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active]
  \_ 1:0:0:2 sdc 8:32 [active]
\_ round-robin 0 [enabled]
  \_ 0:0:0:2 sdg 8:96 [active]
# ls -l /dev/mpath
total 0
lrwxrwxrwx 1 root root 7 Oct 21 12:56 3600601604b600d00743e69d8862fda11 -> ../dm-3
lrwxrwxrwx 1 root root 7 Oct 21 12:56 3600601604b600d00753e69d8862fda11 -> ../dm-2
lrwxrwxrwx 1 root root 7 Oct 21 12:56 3600601604b600d00763e69d8862fda11 -> ../dm-1
lrwxrwxrwx 1 root root 7 Oct 21 12:56 3600601604b600d00773e69d8862fda11 -> ../dm-0
# pvs
  PV        VG      Fmt  Attr    PSize   PFree
  /dev/dm-0 vgtest2 lvm2 a-      4.00G   3.90G
  /dev/dm-1 vgtest2 lvm2 a-      3.00G   2.90G
  /dev/dm-2 vgtest2 lvm2 a-      2.00G   1.90G
  /dev/dm-3 vgtest2 lvm2 a-   1020.00M 920.00M
Thanks in advance,
Phil Lowden
* Re: Clariion CX600 automatic failback support
From: Brian Long @ 2005-10-31 13:58 UTC
To: device-mapper development
On Wed, 2005-10-26 at 15:37 -0400, Phil Lowden (plowden) wrote:
> I'm using LVM2 with device-mapper-multipath 0.4.5 on GA RHEL 4 update 2
> release, kernel version 2.6.9-22.ELsmp. Storage is 4 Clariion CX600
> LUNs,
> 2 with a primary path on SP A and 2 on SP B. HBAs are QLA3240 with
> firmware 3.03.15 IPX and the RedHat-distributed driver version
> 8.01.00b5-rh2.
>
> When I disrupt one path by disabling a host or SP switch port,
> failover works great but failback doesn't happen automatically
> with the current config (below). By this I mean
> when the connection to e.g. SP B is restored, all
> 4 LUNs stay trespassed to SP A. Is this by design?
> Or is there support for automatic failback, i.e.
> I/O is paused and a trespass is issued to restore
> the 2 SP B LUNs to their primary paths?
>
> Of course manual failback is possible, but without
> quiescing I/O I found I was able to munge my LVM2
> objects up quite nicely. Manual failback with quiesced
> I/O is fine.
No one has automatic failback working on CLARiiON storage?
/Brian/
--
Brian Long | | |
IT Data Center Systems | .|||. .|||.
Cisco Linux Developer | ..:|||||||:...:|||||||:..
Phone: (919) 392-7363 | C i s c o S y s t e m s
* Re: Clariion CX600 automatic failback support
From: Bernd Zeimetz @ 2005-10-31 14:02 UTC
To: device-mapper development
Hi,
> No one has automatic failback working on Clarrion storage?
Failback seems to work with the latest multipath-tools, but since EVMS (with
the LVM plugin) no longer recognizes the multipath devices (it uses only the
bare sd* devices), I never had a real chance to test it; see my mail to the
list from Saturday.
Bernd
* Re: Clariion CX600 automatic failback support
From: Christophe Varoqui @ 2005-10-31 14:07 UTC
To: device-mapper development
>
> No one has automatic failback working on Clarrion storage?
>
Can you reproduce it with upstream?
Regards,
cvaroqui
* Re: Clariion CX600 automatic failback support
From: Brian Long @ 2005-10-31 15:51 UTC
To: device-mapper development
On Mon, 2005-10-31 at 15:07 +0100, Christophe Varoqui wrote:
> >
> > No one has automatic failback working on Clarrion storage?
> >
> Do you reproduce with upstream ?
No. I imagine we will log a support request with Red Hat to get it working on
RHEL 4 U2. If they want to give us an updated device-mapper-multipath RPM to
try to reproduce the problem with upstream code, that's up to them. Phil and I
were just wondering whether there is a specific configuration parameter that
makes the trespass happen again once all paths are back online.
/Brian/
* Re: Clariion CX600 automatic failback support
From: Christophe Varoqui @ 2005-10-31 16:06 UTC
To: device-mapper development
On Mon, Oct 31, 2005 at 10:51:38AM -0500, Brian Long wrote:
> On Mon, 2005-10-31 at 15:07 +0100, Christophe Varoqui wrote:
> > >
> > > No one has automatic failback working on Clarrion storage?
> > >
> > Do you reproduce with upstream ?
>
> No. I imagine we will log a support request with Red Hat to get it
> working on RHEL 4 U2. If they want to give us an updated device-mapper-
> multipath RPM to try to reproduce with upstream, that's up to them.
> Phil and I were just wondering if there is a specific configuration
> parameter to get the trespass to happen again once all paths are back
> online.
>
Every check interval, the daemon re-evaluates path priorities (through
external prioritizers), then recomputes path-group priorities. If the PG
priority order changes, the highest-priority PG may be activated:
 - immediately, if "failback" is set to immediate
 - never, if "failback" is set to manual
 - after a number of consecutive checks with a stable priority order, if
   "failback" is set to that number (#)
Thus, failback/no-failback is a property of the external prioritizer, plus
a framework policy switch in the form of the "failback" config file keyword.
Well, at least that is the current design upstream.
You'll have to dig through the changelog to see if Red Hat packages all the
necessary bits.
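The per-check failback decision described above can be sketched in C as follows. This is a minimal, hypothetical model, not multipathd's code: the struct, the function names and the numeric encoding of "manual" and "immediate" are assumptions made for illustration, and path-group priorities are passed in as plain integers (higher is better) instead of coming from a real prioritizer.

```c
#include <assert.h>

/*
 * Hypothetical encoding of the "failback" keyword, for this sketch only:
 * a positive value N means "switch after N consecutive stable checks".
 */
enum { FAILBACK_MANUAL = -1, FAILBACK_IMMEDIATE = -2 };

struct mp_sketch {
    int failback;       /* policy taken from multipath.conf */
    int current_pg;     /* index of the path group currently in use */
    int candidate_pg;   /* better PG seen on earlier checks, or -1 */
    int stable_checks;  /* consecutive checks that preferred candidate_pg */
};

/* Index of the highest-priority path group; ties keep the lower index. */
static int best_pg(const int *pg_prio, int npg)
{
    assert(npg > 0);
    int best = 0;
    for (int i = 1; i < npg; i++)
        if (pg_prio[i] > pg_prio[best])
            best = i;
    return best;
}

/*
 * Called once per check, after path and path-group priorities have been
 * re-evaluated. Returns 1 when a switch to a better PG is issued.
 */
static int check_failback(struct mp_sketch *m, const int *pg_prio, int npg)
{
    int best = best_pg(pg_prio, npg);

    if (best == m->current_pg) {          /* already on the best PG */
        m->candidate_pg = -1;
        m->stable_checks = 0;
        return 0;
    }
    if (m->failback == FAILBACK_MANUAL)   /* never switch automatically */
        return 0;
    if (best != m->candidate_pg) {        /* order changed: restart the count */
        m->candidate_pg = best;
        m->stable_checks = 0;
    }
    m->stable_checks++;
    if (m->failback == FAILBACK_IMMEDIATE || m->stable_checks >= m->failback) {
        m->current_pg = best;
        m->candidate_pg = -1;
        m->stable_checks = 0;
        return 1;
    }
    return 0;
}
```

With failback set to 3, a restored path group that stays best is switched to on the third consecutive check (about 30 seconds with the polling_interval of 10 from the first message's config, assuming one check per interval); with manual, it is never switched to automatically.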
Regards,
cvaroqui
* RE: Clariion CX600 automatic failback support
From: goggin, edward @ 2005-11-02 21:32 UTC
To: 'dm-devel@redhat.com'
On Date: Mon, 31 Oct 2005 08:58:06 -0500
Brian Long <brilong@cisco.com> wrote
>
> On Wed, 2005-10-26 at 15:37 -0400, Phil Lowden (plowden) wrote:
> > I'm using LVM2 with device-mapper-multipath 0.4.5 on GA RHEL 4 update 2
> > release, kernel version 2.6.9-22.ELsmp. Storage is 4 Clariion CX600
> > LUNs,
> > 2 with a primary path on SP A and 2 on SP B. HBAs are QLA3240 with
> > firmware 3.03.15 IPX and the RedHat-distributed driver version
> > 8.01.00b5-rh2.
> >
> > When I disrupt one path by disabling a host or SP switch port,
> > failover works great but failback doesn't happen automatically
> > with the current config (below). By this I mean
> > when the connection to e.g. SP B is restored, all
> > 4 LUNs stay trespassed to SP A. Is this by design?
> > Or is there support for automatic failback, i.e.
> > I/O is paused and a trespass is issued to restore
> > the 2 SP B LUNs to their primary paths?
> >
> > Of course manual failback is possible, but without
> > quiescing I/O I found I was able to munge my LVM2
> > objects up quite nicely. Manual failback with quiesced
> > I/O is fine.
>
> No one has automatic failback working on Clarrion storage?
Unfortunately, this doesn't just work "out of the box" yet.

You need to set up a default failback setting of immediate in the
/etc/multipath.conf configuration file, ...

defaults {
    failback immediate
}

... or selectively set the failback setting to immediate in the
multipath config file for each EMC CLARiiON logical unit:

multipaths {
    multipath {
        wwid 360061 ...
        failback immediate
    }
    ...
}
Setting this attribute for the entire EMC CLARiiON class of devices via a
"device" entry in the multipath config file doesn't work, because of a bug in
the multipathd code: the mpp->hwe field is never set in multipathd. The
single-line patch below fixes this. Because of the same bug, a patch I
submitted many weeks ago to libmultipath/hwtable.c, which sets the default
failback policy for EMC CLARiiON to immediate, also has no effect.

Once the distributors pick up both this single-line change and the change in
hwtable.c, immediate path group failback for CLARiiON will just work.
-------------------------------------------------------------------
diff --git a/multipathd/main.c b/multipathd/main.c
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -146,6 +146,7 @@ adopt_paths (struct vectors * vecs, stru
         if (!strncmp(mpp->wwid, pp->wwid, WWID_SIZE)) {
             condlog(4, "%s ownership set", pp->dev_t);
             pp->mpp = mpp;
+            mpp->hwe = pp->hwe;
         }
     }
 }
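To illustrate why a "device" entry has no effect while mpp->hwe is never set, here is a hypothetical C sketch of a most-specific-wins lookup (multipaths entry, then hardware entry, then defaults). The types and select_failback_sketch() are stand-ins invented for this example; they only approximate the idea behind libmultipath's property selection and are not its real code or constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for this sketch only. */
#define FAILBACK_UNSET 0
enum { FAILBACK_MANUAL = -1, FAILBACK_IMMEDIATE = -2 };

struct mpentry { int pgfailback; };  /* from multipaths { multipath { ... } } */
struct hwentry { int pgfailback; };  /* from devices { device { ... } } or hwtable.c */

struct multipath_sketch {
    struct mpentry *mpe;  /* matching per-LUN entry, if any */
    struct hwentry *hwe;  /* hardware entry; NULL until something sets it */
    int pgfailback;       /* resolved policy */
};

/*
 * Resolve the failback policy: the most specific setting wins.
 * If mp->hwe was never assigned, the device-level value is skipped
 * and the default applies, which matches the symptom described above.
 */
static void select_failback_sketch(struct multipath_sketch *mp, int default_failback)
{
    assert(mp != NULL);
    if (mp->mpe && mp->mpe->pgfailback != FAILBACK_UNSET)
        mp->pgfailback = mp->mpe->pgfailback;
    else if (mp->hwe && mp->hwe->pgfailback != FAILBACK_UNSET)
        mp->pgfailback = mp->hwe->pgfailback;
    else
        mp->pgfailback = default_failback;
}
```

In this model, assigning the hardware entry (the analogue of the one-line `mpp->hwe = pp->hwe;` change) is all it takes for a device-level "immediate" to be picked up, while a per-LUN multipaths entry still overrides it.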
* Re: Clariion CX600 automatic failback support
From: Christophe Varoqui @ 2005-11-03 13:58 UTC
To: device-mapper development
>
> Once the distributors pick up both this single line change and the
> change in hwtable.c, immediate path group failback for CLARiiON
> will just work.
>
> -------------------------------------------------------------------
> diff --git a/multipathd/main.c b/multipathd/main.c
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -146,6 +146,7 @@ adopt_paths (struct vectors * vecs, stru
> if (!strncmp(mpp->wwid, pp->wwid, WWID_SIZE)) {
> condlog(4, "%s ownership set", pp->dev_t);
> pp->mpp = mpp;
> + mpp->hwe = pp->hwe;
> }
> }
> }
>
Upstream already has a patch for that. It plugs the mpp->hwe refresh into
setup_multipath().
Can you check that it has no regression with respect to your suggested fix?
Regards,
cvaroqui
* Re: Clariion CX600 automatic failback support
From: Benjamin Marzinski @ 2005-11-07 19:25 UTC
To: device-mapper development
On Wed, Nov 02, 2005 at 04:32:06PM -0500, goggin, edward wrote:
> On Date: Mon, 31 Oct 2005 08:58:06 -0500
> Brian Long <brilong@cisco.com> wrote
>
> >
> > On Wed, 2005-10-26 at 15:37 -0400, Phil Lowden (plowden) wrote:
> > > I'm using LVM2 with device-mapper-multipath 0.4.5 on GA RHEL 4 update 2
> > > release, kernel version 2.6.9-22.ELsmp. Storage is 4 Clariion CX600
> > > LUNs,
> > > 2 with a primary path on SP A and 2 on SP B. HBAs are QLA3240 with
> > > firmware 3.03.15 IPX and the RedHat-distributed driver version
> > > 8.01.00b5-rh2.
> > >
> > > When I disrupt one path by disabling a host or SP switch port,
> > > failover works great but failback doesn't happen automatically
> > > with the current config (below). By this I mean
> > > when the connection to e.g. SP B is restored, all
> > > 4 LUNs stay trespassed to SP A. Is this by design?
> > > Or is there support for automatic failback, i.e.
> > > I/O is paused and a trespass is issued to restore
> > > the 2 SP B LUNs to their primary paths?
> > >
> > > Of course manual failback is possible, but without
> > > quiescing I/O I found I was able to munge my LVM2
> > > objects up quite nicely. Manual failback with quiesced
> > > I/O is fine.
> >
> > No one has automatic failback working on Clarrion storage?
>
> Unfortunately, this doesn't just work "out of the box" yet.
>
> You need to setup a default failback setting of immediate in the
> /etc/multipath.conf configuration file, ...
>
> defaults {
> failback immediate
> }
>
> ... or selectively set the failback setting to immediate in the
> multipath config file for each EMC CLARiiON logical unit
>
> multipaths {
> multipath {
> wwid 360061 ...
> failback immediate
> }
> ...
> }
>
> Setting this attribute value for the entire EMC CLARiiON class of
> devices via the multipath config file "device" attribute doesn't
> work because of a bug (mpp->hwe field is never set for multipathd)
> in the multipathd code which is patched to work by the single line
> patch below. Due to the same bug, a patch I submitted many weeks
> ago to libmultipath/hwtable.c to setup the default failback policy
> for EMC CLARiiON to immediate also does not work.
>
> Once the distributors pick up both this single line change and the
> change in hwtable.c, immediate path group failback for CLARiiON
> will just work.
These changes are both in the RHEL4 U3 code.
> -------------------------------------------------------------------
> diff --git a/multipathd/main.c b/multipathd/main.c
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -146,6 +146,7 @@ adopt_paths (struct vectors * vecs, stru
> if (!strncmp(mpp->wwid, pp->wwid, WWID_SIZE)) {
> condlog(4, "%s ownership set", pp->dev_t);
> pp->mpp = mpp;
> + mpp->hwe = pp->hwe;
> }
> }
> }
* RE: Clariion CX600 automatic failback support
From: goggin, edward @ 2005-11-03 18:16 UTC
To: 'dm-devel@redhat.com'
On Date: Thu, 3 Nov 2005 14:58:36 +0100
Christophe Varoqui <christophe.varoqui@free.fr> wrote
> > Once the distributors pick up both this single line change and the
> > change in hwtable.c, immediate path group failback for CLARiiON
> > will just work.
> >
> > -------------------------------------------------------------------
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -146,6 +146,7 @@ adopt_paths (struct vectors * vecs, stru
> > if (!strncmp(mpp->wwid, pp->wwid, WWID_SIZE)) {
> > condlog(4, "%s ownership set", pp->dev_t);
> > pp->mpp = mpp;
> > + mpp->hwe = pp->hwe;
> > }
> > }
> > }
> >
> Upstream has had a patch for that. It plug the mpp->hwe refresh in
> setup_multipath().
>
> Can you check it has no regression wrt your suggested fix ?
>
> Regards,
> cvaroqui
Looks good, except that the call to extract_hwe_from_path in setup_multipath
must come before the call to select_pgfailback.
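The ordering constraint can be shown with a small, hypothetical sketch: if the failback policy is selected before the hardware entry is copied from a path, the selection still sees a NULL hwe and falls back to the default. extract_hwe() and select_failback() below are simplified stand-ins for extract_hwe_from_path and select_pgfailback, not their real implementations, and using -2 to mean "immediate" is an arbitrary choice for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal types for this sketch only. */
struct hwentry { int pgfailback; };   /* 0 = not set at device level */
struct path    { struct hwentry *hwe; };
struct mpath {
    struct path *first_path;
    struct hwentry *hwe;
    int pgfailback;
};

/* Stand-in for extract_hwe_from_path: copy the path's hardware entry. */
static void extract_hwe(struct mpath *mp)
{
    assert(mp != NULL);
    mp->hwe = mp->first_path ? mp->first_path->hwe : NULL;
}

/* Stand-in for select_pgfailback: device value if known, else the default. */
static void select_failback(struct mpath *mp, int default_failback)
{
    mp->pgfailback = (mp->hwe && mp->hwe->pgfailback) ? mp->hwe->pgfailback
                                                      : default_failback;
}

/* Problematic order: the policy is chosen before mp->hwe is known. */
static void setup_wrong_order(struct mpath *mp, int default_failback)
{
    select_failback(mp, default_failback);
    extract_hwe(mp);
}

/* Order requested above: hardware entry first, then the policy. */
static void setup_right_order(struct mpath *mp, int default_failback)
{
    extract_hwe(mp);
    select_failback(mp, default_failback);
}
```

In the wrong order, mp->hwe ends up set but too late: the policy has already been resolved to the default.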
* RE: Clariion CX600 automatic failback support
From: Brian Long @ 2005-11-03 20:55 UTC
To: device-mapper development
On Thu, 2005-11-03 at 13:16 -0500, goggin, edward wrote:
> On Date: Thu, 3 Nov 2005 14:58:36 +0100
> Christophe Varoqui <christophe.varoqui@free.fr> wrote
>
> > > Once the distributors pick up both this single line change and the
> > > change in hwtable.c, immediate path group failback for CLARiiON
> > > will just work.
> > >
> > > -------------------------------------------------------------------
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -146,6 +146,7 @@ adopt_paths (struct vectors * vecs, stru
> > > if (!strncmp(mpp->wwid, pp->wwid, WWID_SIZE)) {
> > > condlog(4, "%s ownership set", pp->dev_t);
> > > pp->mpp = mpp;
> > > + mpp->hwe = pp->hwe;
> > > }
> > > }
> > > }
> > >
> > Upstream has had a patch for that. It plug the mpp->hwe refresh in
> > setup_multipath().
> >
> > Can you check it has no regression wrt your suggested fix ?
> >
> > Regards,
> > cvaroqui
>
> Looks good, except the call to extract_hwe_from_path in setup_multipath
> must be before the call to select_pgfailback.
Ed,
So it sounds like I need Red Hat to issue me a hotfix for
device-mapper-multipath that includes either the upstream patch or your patch
to get CLARiiON failback working properly.
I'll ask Red Hat about this in my existing support case.
/Brian/
* Re: Clariion CX600 automatic failback support
From: Bernd Zeimetz @ 2005-11-04 8:19 UTC
To: device-mapper development
Hi,
> Looks good, except the call to extract_hwe_from_path in setup_multipath
> must be before the call to select_pgfailback.
>
We now have a ton of these log messages.
Over roughly the last 2 hours:
580x multipathd: 3600601f4a20c000099dc229956b0d711: switch to path group #1
714x multipathd: 3600601f4a20c000096dc229956b0d711: switch to path group #1
714x multipathd: 3600601f2a20c0000fb13caa656b0d711: switch to path group #1
45x kernel: device-mapper: dm-emc: emc_pg_init: sending switch-over command
It happens only for these 3 of our 8 LUNs.
Kernel is 2.6.13.4, devmapper 1.01, latest multipath-tools from git, EVMS 2.5.3.
zeus-1:/# cat /etc/multipath.conf
defaults {
    failback immediate
}
Since we had a lot of trouble with our CX400 (hardware problems, which are
supposed to be fixed now), I'm wondering what these messages mean, and
whether we still have hardware or software bugs somewhere.
Any ideas?
Thanks for your help!
Bernd
* Re: Clariion CX600 automatic failback support
From: Christophe Varoqui @ 2005-11-04 9:47 UTC
To: device-mapper development
On Fri, Nov 04, 2005 at 09:19:06AM +0100, Bernd Zeimetz wrote:
> Hi,
>
> > Looks good, except the call to extract_hwe_from_path in setup_multipath
> > must be before the call to select_pgfailback.
> >
>
> We have a ton of these log messages now.
> From the last about 2 hours:
>
> 580x multipathd: 3600601f4a20c000099dc229956b0d711: switch to path group #1
> 714x multipathd: 3600601f4a20c000096dc229956b0d711: switch to path group #1
> 714x multipathd: 3600601f2a20c0000fb13caa656b0d711: switch to path group #1
> 45x kernel: device-mapper: dm-emc: emc_pg_init: sending switch-over command
>
> It happens only for these 3 of 8 luns.
>
> Kernel is 2.6.13.4, devmapper 1.01, latest multipath-tools from git, evms
> 2.5.3
>
> zeus-1:/# cat /etc/multipath.conf
> defaults {
> failback immediate
> }
>
>
> Since we had a lot of trouble with our CX400 (Hardware problems, but they're
> supposed to be fixed) I'm wondering what these messages mean, and if we still
> have hardware/software bugs somewhere.
>
Yes, the test that determines whether a switch is needed is suboptimal: the
case where no PG is activated (all Enabled, for example) keeps triggering a
switch.
I checked in a patch to solve that issue.
Regards,
cvaroqui
* RE: Clariion CX600 automatic failback support
From: goggin, edward @ 2005-11-04 19:36 UTC
To: 'dm-devel@redhat.com'
On Date: Fri, 4 Nov 2005 10:47:51 +0100
Christophe Varoqui <christophe.varoqui@free.fr> wrote
> On Fri, Nov 04, 2005 at 09:19:06AM +0100, Bernd Zeimetz wrote:
> > Hi,
> >
> > > Looks good, except the call to extract_hwe_from_path in setup_multipath
> > > must be before the call to select_pgfailback.
> > >
> >
> > We have a ton of these log messages now.
> > From the last about 2 hours:
> >
> > 580x multipathd: 3600601f4a20c000099dc229956b0d711: switch to path group #1
> > 714x multipathd: 3600601f4a20c000096dc229956b0d711: switch to path group #1
> > 714x multipathd: 3600601f2a20c0000fb13caa656b0d711: switch to path group #1
> > 45x kernel: device-mapper: dm-emc: emc_pg_init: sending switch-over command
> >
> > It happens only for these 3 of 8 luns.
> >
> > Kernel is 2.6.13.4, devmapper 1.01, latest multipath-tools from git, evms
> > 2.5.3
> >
> > zeus-1:/# cat /etc/multipath.conf
> > defaults {
> > failback immediate
> > }
> >
> >
> > Since we had a lot of trouble with our CX400 (Hardware problems, but they're
> > supposed to be fixed) I'm wondering what these messages mean, and if we still
> > have hardware/software bugs somewhere.
> >
> Yes, the test to determine if a switch is needed is sub-optimal :
> The no activated PG case (all Enabled for example) keeps triggering a switch.
> I checked-in a patch to solve that issue.
I've dealt with this problem by patching the device-mapper multipath driver
in the kernel.
The patch below changes drivers/md/dm-mpath.c:multipath_status() to report a
path group as active if it is either the current path group or the one set up
to be the next path group, rather than only if it is the currently active
one.
*** ../base/linux-2.6.14-rc4/drivers/md/dm-mpath.c Mon Oct 10 20:19:19 2005
--- drivers/md/dm-mpath.c Thu Nov 3 04:17:48 2005
***************
*** 1158,1164 ****
      list_for_each_entry(pg, &m->priority_groups, list) {
          if (pg->bypassed)
              state = 'D';    /* Disabled */
!         else if (pg == m->current_pg)
              state = 'A';    /* Currently Active */
          else
              state = 'E';    /* Enabled */
--- 1158,1164 ----
      list_for_each_entry(pg, &m->priority_groups, list) {
          if (pg->bypassed)
              state = 'D';    /* Disabled */
!         else if ((pg == m->current_pg) || (pg == m->next_pg))
              state = 'A';    /* Currently Active */
          else
              state = 'E';    /* Enabled */
* RE: Clariion CX600 automatic failback support
From: Christophe Varoqui @ 2005-11-04 20:55 UTC
To: device-mapper development
> > Yes, the test to determine if a switch is needed is sub-optimal :
> > The no activated PG case (all Enabled for example) keeps
> > triggering a switch.
> > I checked-in a patch to solve that issue.
>
> I've dealt with this problem by patching the device mapper
> multipath driver in the kernel.
>
> The patch below patches drivers/md/dm-mpath.c:multipath_status()
> to return an active state for a path group which is either the
> current path group or setup to be the next path group instead
> of just returning an active state for a path group which is
> currently active.
>
Right, that needed to be done.
I'm inclined to keep the tools patch anyway: it costs an extra "int" per
multipath, but it is easier to reason about, since it avoids using
mpp->nextpg to represent both the current setting *and* the best-rated
setting.
Are interested parties OK with that?
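The difference between the old and new need-switch tests can be sketched as follows. This is an assumption-laden model, not the checked-in multipath-tools patch: the per-PG state string (one of 'A', 'E', 'D' per group), the struct and the function names are hypothetical. The requested_pg field plays the role of the extra "int" per multipath, so the best-rated group and the group already requested are tracked separately.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical: pg_state holds one state char per path group as parsed
 * from the kernel status, e.g. "EE" when no group is marked active.
 */
static int active_pg_from_status(const char *pg_state)
{
    assert(pg_state != NULL);
    const char *a = strchr(pg_state, 'A');
    return a ? (int)(a - pg_state) : -1;   /* -1: no PG reported active */
}

/* Old test: switch whenever the reported active PG is not the best one. */
static int need_switch_old(const char *pg_state, int best)
{
    return active_pg_from_status(pg_state) != best;
}

/* New test: remember which PG was already requested. */
struct mp_switch { int requested_pg; };   /* -1 until a switch is requested */

static int need_switch_new(struct mp_switch *m, const char *pg_state, int best)
{
    int active = active_pg_from_status(pg_state);

    if (active >= 0)                  /* kernel state is authoritative when known */
        m->requested_pg = active;
    if (m->requested_pg == best)      /* already there, or already asked for */
        return 0;
    m->requested_pg = best;
    return 1;
}
```

With the old test, a status in which no group is marked active (the "all Enabled" case) requests the same switch on every check, which would be consistent with the repeated "switch to path group #1" messages reported earlier in the thread; the new test requests it once and then waits.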
Regards,
cvaroqui
* Re: Clariion CX600 automatic failback support
From: Bernd Zeimetz @ 2005-11-05 0:03 UTC
To: device-mapper development
Hi,
> I'm inclined to keep the tools patch anyway : it wastes an "int" per
> multipath but is brain-friendlier (as it avoids using mpp->nextpg to
> represent the current setting *and* the best-rated setting).
>
> Are interested parties ok with that ?
I'm OK with whatever you prefer, as long as it works and doesn't change again
with the next kernel/tools release ;)
Thank you,
Bernd
* Re: Clariion CX600 automatic failback support
From: Christophe Varoqui @ 2005-11-05 12:26 UTC
To: device-mapper development
On Sat, 2005-11-05 at 01:03 +0100, Bernd Zeimetz wrote:
> Hi,
>
> > I'm inclined to keep the tools patch anyway : it wastes an "int" per
> > multipath but is brain-friendlier (as it avoids using mpp->nextpg to
> > represent the current setting *and* the best-rated setting).
> >
> > Are interested parties ok with that ?
>
> I'm ok with anything you prefer, as long as it works and it doesn't change
> again with the next kernel/tools-release ;)
>
Want to write a regression test?
Regards,
cvaroqui
* Re: Clariion CX600 automatic failback support
From: Bernd Zeimetz @ 2005-11-10 19:24 UTC
To: device-mapper development
Hi,
> Want to write a regression test ?
I doubt that's really necessary, at least not for us (which means I'll have to
work on the other pending projects before doing anything else here).
Because the company we bought the machine from (unfortunately not directly
from EMC) couldn't support us with their own closed-source multipath modules
on a current kernel.org kernel, it took me more time than planned to get the
CX running stably again. Many thanks again to Christophe, Edward and Kevin for
the support and patches!
Within the next week I'll complete the DebianInstall page on your wiki, which
will include a link to an updated multipath-tools package (latest git) for
Debian Sarge.
A few minutes ago I added the CX to
http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=TestedEnvironments
I hope it'll be useful. Please let me know if I've forgotten anything.
Thanks again,
Bernd
* RE: Clariion CX600 automatic failback support
From: goggin, edward @ 2005-11-06 18:29 UTC
To: 'dm-devel@redhat.com'
On Date: Fri, 04 Nov 2005 21:55:20 +0100
Christophe Varoqui <christophe.varoqui@free.fr> wrote
> Subject: RE: [dm-devel] Clariion CX600 automatic failback support
> To: device-mapper development <dm-devel@redhat.com>
> Message-ID: <1131137721.7463.12.camel@zezette>
> Content-Type: text/plain; charset=utf-8
>
>
> > > Yes, the test to determine if a switch is needed is sub-optimal :
> > > The no activated PG case (all Enabled for example) keeps
> > > triggering a switch.
> > > I checked-in a patch to solve that issue.
> >
> > I've dealt with this problem by patching the device mapper
> > multipath driver in the kernel.
> >
> > The patch below patches drivers/md/dm-mpath.c:multipath_status()
> > to return an active state for a path group which is either the
> > current path group or setup to be the next path group instead
> > of just returning an active state for a path group which is
> > currently active.
> >
> Right, it deserved to be done.
>
> I'm inclined to keep the tools patch anyway : it wastes an "int" per
> multipath but is brain-friendlier (as it avoids using mpp->nextpg to
> represent the current setting *and* the best-rated setting).
>
> Are interested parties ok with that ?
Good point. This slightly modified patch handles the "*and*" case by
returning active only if the path group is the currently active one, or if
there is no currently active path group and this path group is set to be the
next (first) deferred active path group.
*** ../base/linux-2.6.14-rc4/drivers/md/dm-mpath.c Mon Oct 10 20:19:19 2005
--- drivers/md/dm-mpath.c Sat Nov 5 03:02:50 2005
***************
*** 1158,1164 ****
      list_for_each_entry(pg, &m->priority_groups, list) {
          if (pg->bypassed)
              state = 'D';    /* Disabled */
!         else if (pg == m->current_pg)
              state = 'A';    /* Currently Active */
          else
              state = 'E';    /* Enabled */
--- 1158,1164 ----
      list_for_each_entry(pg, &m->priority_groups, list) {
          if (pg->bypassed)
              state = 'D';    /* Disabled */
!         else if ((pg == m->current_pg) || ((!m->current_pg) && (pg == m->next_pg)))
              state = 'A';    /* Currently Active */
          else
              state = 'E';    /* Enabled */
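The difference between the two kernel patches can be modelled with one small stand-alone C function per rule. The pg_view struct and the state_v1/state_v2 names are invented for this sketch; the pointer comparisons from multipath_status() are replaced with booleans, and the scenario in which current_pg and next_pg refer to different groups is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-PG facts multipath_status() looks at. */
struct pg_view {
    bool bypassed;    /* pg->bypassed */
    bool is_current;  /* pg == m->current_pg */
    bool is_next;     /* pg == m->next_pg */
};

/* Rule from the first patch: current OR next is reported active. */
static char state_v1(struct pg_view pg)
{
    if (pg.bypassed)
        return 'D';
    return (pg.is_current || pg.is_next) ? 'A' : 'E';
}

/* Rule from the revised patch: next counts only when there is no current PG. */
static char state_v2(struct pg_view pg, bool have_current)
{
    if (pg.bypassed)
        return 'D';
    return (pg.is_current || (!have_current && pg.is_next)) ? 'A' : 'E';
}
```

Under the first rule, two groups can be reported 'A' at the same time; under the revised rule, next_pg is reported active only while there is no current path group, so at most one group shows as active.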