Re: INFO: task md2_resync:7950 blocked for more than 120 seconds

* Re: INFO: task md2_resync:7950 blocked for more than 120 seconds
       [not found] <8a6b34b03e0524aa66c862534fee9b7a@www3.mail.volny.cz>
@ 2008-05-21 11:41 ` Neil Brown
  2008-05-21 14:48   ` Round Robin vs Active/Passive Craig Simpson
  0 siblings, 1 reply; 21+ messages in thread
From: Neil Brown @ 2008-05-21 11:41 UTC (permalink / raw)
  To: wylda; +Cc: dm-devel, linux-raid

On Sunday May 4, wylda@volny.cz wrote:
> Hello,
> 
> today I noticed some strange entries in syslog, triggered by checkarray.
> I cannot judge whether these messages are harmless or mean that something
> bad is happening. So is it bad, or just a bug?

Hi,
 your attachment was very large and so the original mail didn't go out
to the list but went to the moderator instead, and I ended up with it
(thanks Alasdair!).

The syslog contained lots of things like:

May  4 00:57:02 ss4000 mdadm: RebuildStarted event detected on md device /dev/md0
May  4 00:57:02 ss4000 kernel: md: delaying data-check of md1 until md2 has finished (they share one or more physical units)
May  4 00:57:02 ss4000 kernel: md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
May  4 01:20:26 ss4000 kernel: INFO: task md1_resync:7949 blocked for more than 120 seconds.
May  4 01:20:26 ss4000 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  4 01:20:26 ss4000 kernel: md1_resync    D c02c01b8     0  7949      2
May  4 01:20:26 ss4000 kernel: [<c02bff50>] (schedule+0x0/0x2d4) from [<c0221ed0>] (md_do_sync+0x1dc/0x950)
May  4 01:20:26 ss4000 kernel: [<c0221cf4>] (md_do_sync+0x0/0x950) from [<c02237a0>] (md_thread+0x114/0x130)
May  4 01:20:26 ss4000 kernel: [<c022368c>] (md_thread+0x0/0x130) from [<c004ea7c>] (kthread+0x5c/0x94)
May  4 01:20:26 ss4000 kernel: [<c004ea20>] (kthread+0x0/0x94) from [<c003c3dc>] (do_exit+0x0/0x328)
May  4 01:20:26 ss4000 kernel:  r6:00000000 r5:00000000 r4:00000000
May  4 01:20:27 ss4000 kernel: no locks held by md1_resync/7949.
May  4 01:20:27 ss4000 kernel: INFO: task md2_resync:7950 blocked for more than 120 seconds.

suggesting that 'md1_resync' was repeatedly blocked for more than 2 minutes
at a time, waiting for disk I/O.
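
To see how often each task tripped the hung-task detector, the warnings can be tallied straight from the log. The following is a minimal Python sketch, not part of the original report; the regular expression is modelled on the excerpt above, and the name count_hung_tasks is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the hung-task warning in the syslog excerpt above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Return a Counter mapping (task name, pid) to the number of warnings."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# Three lines copied from the excerpt; only two of them are warnings.
sample = [
    "May  4 01:20:26 ss4000 kernel: INFO: task md1_resync:7949 blocked for more than 120 seconds.",
    "May  4 01:20:27 ss4000 kernel: no locks held by md1_resync/7949.",
    "May  4 01:20:27 ss4000 kernel: INFO: task md2_resync:7950 blocked for more than 120 seconds.",
]

for (name, pid), n in sorted(count_hung_tasks(sample).items()):
    print(f"{name} (pid {pid}): {n} warning(s)")
```

Feeding it the whole syslog instead of the three sample lines would show whether one resync thread or several keep getting stuck.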

I have heard of another case of this happening when the md arrays were
built on top of dm logical volumes.

Could you please report exactly what kernel and distro you are using,
and describe your storage setup (which devices, and which of LVM, DM, MD,
loop, crypto, filesystem, ... are being used, and how).

NeilBrown


* Round Robin vs Active/Passive
  2008-05-21 11:41 ` INFO: task md2_resync:7950 blocked for more than 120 seconds Neil Brown
@ 2008-05-21 14:48   ` Craig Simpson
  2008-05-21 15:32     ` Craig Simpson
  0 siblings, 1 reply; 21+ messages in thread
From: Craig Simpson @ 2008-05-21 14:48 UTC (permalink / raw)
  To: device-mapper development

For multipathd and dm, is round-robin the only mode for multipathing? The
array we have is a newer Hitachi AMS200, and the documentation says it is
Active/Passive.

Using round-robin seems to work fine with it. We have 4 paths, and each
carries 1/4 of the load.

Are there other options that can be set in /etc/multipath.conf? I only
know of round-robin. 

Craig  


* RE: Round Robin vs Active/Passive
  2008-05-21 14:48   ` Round Robin vs Active/Passive Craig Simpson
@ 2008-05-21 15:32     ` Craig Simpson
  2008-05-21 18:23       ` Tore Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Craig Simpson @ 2008-05-21 15:32 UTC (permalink / raw)
  To: device-mapper development; +Cc: Michael Denney



Hitachi says my AMS200 is an Active/Passive array.
Looking at asm13, I see all my paths active. I did use
"path_grouping_policy    multibus" when creating that alias.

The LUNs that are picked up and not aliased in /etc/multipath.conf seem
to show an active/passive setup. 

I am wondering how multipath decides between the all-[active] setup and
the [active]/[enabled] setup.


(This is a different machine; with no alias I see one [active] and one
[enabled] path group.)
mpath5 (1HITACHI_D60090910036) dm-10 HITACHI,DF600F
[size=5.5G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:1:36 sdq 65:0  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:36 sdf 8:80  [active][undef]


This one was aliased with the settings below in /etc/multipath.conf.
multipath -l
asm13 (1HITACHI_730600240012) dm-53 HITACHI,DF600F
[size=64G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:1:12 sdba 67:64   [active][undef]
 \_ 1:0:0:12 sdco 69:192  [active][undef]
 \_ 1:0:1:12 sdec 128:64  [active][undef]
 \_ 0:0:0:12 sdm  8:192   [active][undef]




From /etc/multipath.conf:
        multipath {
                wwid                    1HITACHI_730600240012
                alias                   asm13
                path_grouping_policy    multibus
                path_checker            readsector0
                path_selector           "round-robin 0"
                failback                immediate
        }




From /etc/multipath.conf:
defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_name      yes
}






Craig





--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


* Re: Round Robin vs Active/Passive
  2008-05-21 15:32     ` Craig Simpson
@ 2008-05-21 18:23       ` Tore Anderson
  2008-05-21 20:21         ` Craig Simpson
  2008-05-22  8:24         ` Domenico Viggiani
  0 siblings, 2 replies; 21+ messages in thread
From: Tore Anderson @ 2008-05-21 18:23 UTC (permalink / raw)
  To: device-mapper development; +Cc: Michael Denney

* Craig Simpson
> 
> Hitachi says my AMS200 is an Active/Passive array.
> Looking at asm13, I see all my paths active. I did use
> "path_grouping_policy    multibus" when creating that alias.

The AMS200 is indeed an active/passive array, but it "fakes"
active/active behaviour: if the passive controller receives an I/O
operation, it redirects it internally to the active one, which
processes it and hands the result back to the passive controller, which in
turn returns it to the initiator. The initiator has no idea that any of
this happens.

So you can use it as a true active/active array, but I'd recommend
against it for two reasons.  First, there might be a slight processing
overhead to route I/O through the passive controller (as well as a
slight increase in latency).  Second, you risk saturating the
interconnect between the controllers with re-routed I/O if you have lots
of volumes using the array in this way (this might or might not be a
real problem depending on how the hardware is built).

So what you should do is distinguish the paths to the active
controller and run round-robin on all of these, while failing over
to the set of paths to the passive controller.  An example of how this
looks:

mysql (36006016034301f0004582492ab21dd11)
[size=40 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 4:0:2:0 sds 65:32 [active][ready]
 \_ 3:0:2:0 sdu 65:64 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:3:0 sdr 65:16 [active][ready]
 \_ 3:0:3:0 sdt 65:48 [active][ready]

I/O is here balanced between sds and sdu, which have the highest
priority.  sdr and sdt will only be used should both sds and sdu fail.
This is accomplished by the following two configuration settings:

path_grouping_policy group_by_prio
prio_callout "/sbin/mpath_prio_emc_silent /dev/%n"

(This is an EMC array.)
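
The topology above can also be read mechanically. Below is a minimal sketch in Python, assuming the 0.4.x-era "multipath -ll" layout shown in this thread (newer multipath-tools releases print a different layout, so the regexes would need adjusting). The names parse_path_groups and active_paths are illustrative and not a multipath-tools API.

```python
import re

# A path-group line starts at column 0: "\_ round-robin 0 [prio=2][active]"
GROUP = re.compile(
    r"^\\_ (?P<selector>.+?) \[prio=(?P<prio>\d+)\]\[(?P<state>\w+)\]\s*$"
)
# A path line is indented: " \_ 4:0:2:0 sds 65:32 [active][ready]"
PATH = re.compile(
    r"^\s+\\_ (?P<hctl>\S+)\s+(?P<dev>\S+)\s+(?P<devt>\d+:\d+)\s+"
    r"\[(?P<dm_state>\w+)\]\[(?P<chk_state>\w+)\]\s*$"
)

SAMPLE = r"""mysql (36006016034301f0004582492ab21dd11)
[size=40 GB][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 4:0:2:0 sds 65:32 [active][ready]
 \_ 3:0:2:0 sdu 65:64 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 4:0:3:0 sdr 65:16 [active][ready]
 \_ 3:0:3:0 sdt 65:48 [active][ready]
"""


def parse_path_groups(text):
    """Return one dict per path group, with its priority, state and paths."""
    groups = []
    for line in text.splitlines():
        group = GROUP.match(line)
        if group:
            groups.append({
                "prio": int(group.group("prio")),
                "state": group.group("state"),
                "paths": [],
            })
            continue
        path = PATH.match(line)
        if path and groups:
            groups[-1]["paths"].append(path.group("dev"))
    return groups


def active_paths(text):
    """Devices in path groups marked [active], i.e. the ones carrying I/O."""
    return [dev for g in parse_path_groups(text)
            if g["state"] == "active" for dev in g["paths"]]


print(active_paths(SAMPLE))
```

For the map above this separates sds and sdu (the group marked [active]) from sdr and sdt (the group marked [enabled], used only for failover).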

You should be able to do the same using mpath_prio_hds_modular as the
prio_callout.  Last I checked, this callout wasn't actually able to
determine which controller is the preferred one for a given volume (one of
the reasons I bought an EMC instead), but did a simplistic check
along the lines of "controller 0 is preferred for all
volumes with an even LUN;  controller 1 for all volumes with an odd
LUN".  So even though this probably won't match reality unless you take
care to configure the AMS accordingly, you will get the desired effect:
round robin between the paths to one controller, failover to the paths
to the other.  The AMS is also clever enough that, if
you're only sending I/O to the passive controller, it will automatically
change the ownership of the volume to the controller actually receiving
I/O, so you won't have the problem of I/O being re-routed between
controllers.

The downside is that you can't decide which controller is the preferred
one for a given volume, so if you have two highly active volumes with
odd LUNs and two mostly idle ones with even LUNs, you won't be able to
split the load equally between the controllers.
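
The even/odd rule and its load-balancing downside can be sketched as follows. This Python only restates the heuristic as described in this message; it is not the source of mpath_prio_hds_modular, and the controller numbering, the 1/0 priority values and the example workload are assumptions for illustration.

```python
from collections import Counter


def preferred_controller(lun: int) -> int:
    """Even LUN -> controller 0, odd LUN -> controller 1 (as described above)."""
    return lun % 2


def path_priority(lun: int, controller: int) -> int:
    """Give paths to the preferred controller the higher priority."""
    return 1 if controller == preferred_controller(lun) else 0


# Hypothetical workload: two busy volumes on odd LUNs, two idle on even LUNs.
volumes = {1: "busy", 3: "busy", 0: "idle", 2: "idle"}

load = Counter()
for lun, activity in volumes.items():
    load[(preferred_controller(lun), activity)] += 1

for (controller, activity), count in sorted(load.items()):
    print(f"controller {controller}: {count} {activity} volume(s)")
```

With this workload both busy volumes land on controller 1 and both idle ones on controller 0, which is the imbalance described above. The only way to change it under this rule is to renumber the LUNs.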

Regards,
-- 
Tore Anderson


* RE: Round Robin vs Active/Passive
  2008-05-21 18:23       ` Tore Anderson
@ 2008-05-21 20:21         ` Craig Simpson
  2008-05-21 21:01           ` Tore Anderson
  2008-05-22  8:24         ` Domenico Viggiani
  1 sibling, 1 reply; 21+ messages in thread
From: Craig Simpson @ 2008-05-21 20:21 UTC (permalink / raw)
  To: device-mapper development; +Cc: Michael Denney

Amazing Info, thanks!

Changed my Defaults to this:

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    group_by_prio
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /sbin/mpath_prio_hds_modular
        path_checker            readsector0
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_name      yes
}

So figure I don't need to include anything in my aliases, since the
defaults are set.

        multipath {
                wwid                    1HITACHI_D60090910032
                alias                   asm01
        }

Did a multipathd -k
And a reconfigure

But when doing a multipath -l Not sure if it looks correct:

asm01 (1HITACHI_D60090910032) dm-6 HITACHI,DF600F
[size=32G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:32 sdm 8:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:32 sdb 8:16  [active][undef]

Also a multipathd> show topology

reload: asm01  (1HITACHI_D60090910032) dm-6  HITACHI,DF600F
[size=32G ][features=0       ][hwhandler=0        ]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:1:32 sdm 8:192 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:32 sdb 8:16  [active][ready]

Looks like I have [enabled] [enabled] ...
But it should be [active] [enabled]

Thanks for any feedback. 

Craig


* Re: Round Robin vs Active/Passive
  2008-05-21 20:21         ` Craig Simpson
@ 2008-05-21 21:01           ` Tore Anderson
  2008-05-21 21:49             ` Craig Simpson
  0 siblings, 1 reply; 21+ messages in thread
From: Tore Anderson @ 2008-05-21 21:01 UTC (permalink / raw)
  To: device-mapper development; +Cc: Michael Denney

Hi,

* Craig Simpson

> Amazing Info, thanks!

Glad I could help.

> Changed my Defaults to this:
> 
> defaults {
>         udev_dir                /dev
>         polling_interval        10
>         selector                "round-robin 0"
>         path_grouping_policy    group_by_prio
>         getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
>         prio_callout            /sbin/mpath_prio_hds_modular
>         path_checker            readsector0
>         rr_min_io               100
>         rr_weight               priorities
>         failback                immediate
>         no_path_retry           fail
>         user_friendly_name      yes
> }

You need

  prio_callout "/sbin/mpath_prio_hds_modular /dev/%n"

for the priority to be determined correctly.
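
Why the device argument matters can be sketched as below. This assumes that multipath substitutes %n with the kernel device name (e.g. sdm) and %d with the major:minor pair (e.g. 8:192) before running the callout. The function expand_callout is a hypothetical stand-in for that substitution, not the multipath-tools code.

```python
def expand_callout(template: str, devname: str, dev_t: str) -> str:
    """Expand the %n and %d wildcards in a callout command line.

    Assumption: %n is the kernel device name, %d is "major:minor".
    """
    return template.replace("%n", devname).replace("%d", dev_t)


# Path values taken from the asm01 map quoted in this thread.
devname, dev_t = "sdm", "8:192"

for template in (
    "/sbin/mpath_prio_hds_modular /dev/%n",
    "/sbin/mpath_prio_hds_modular %d",
    "/sbin/mpath_prio_hds_modular",
):
    print(f"{template!r} -> {expand_callout(template, devname, dev_t)!r}")
```

The last template contains no wildcard, so the callout would be run without being told which path to examine.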

Anyway, I'm a bit surprised that you need to specify these things; the
AMS series does have an entry in hwtable.c, in multipath-tools 0.4.8 at
least.  Running an old version, maybe?  The built-in defaults don't differ
much from what you have there, though.

> So figure I don't need to include anything in my aliases, since the
> defaults are set.

You figure correctly.

> Did a multipathd -k
> And a reconfigure

You also need to actually reload the multipath maps in the kernel, by
invoking e.g. "multipath -v2".

> But when doing a multipath -l Not sure if it looks correct:
> 
> asm01 (1HITACHI_D60090910032) dm-6 HITACHI,DF600F
> [size=32G][features=0][hwhandler=0]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:1:32 sdm 8:192 [active][undef]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:0:32 sdb 8:16  [active][undef]

You need to use "multipath -ll" (two l's) for it to show you the
priority, but it looks like multipathd has everything figured out:

> Also a multipathd> show topology
> 
> reload: asm01  (1HITACHI_D60090910032) dm-6  HITACHI,DF600F
> [size=32G ][features=0       ][hwhandler=0        ]
> \_ round-robin 0 [prio=1][enabled]
>  \_ 0:0:1:32 sdm 8:192 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:0:32 sdb 8:16  [active][ready]

It's a bit strange that it actually is able to determine the priority,
considering that you have the prio_callout set incorrectly in your
defaults section.

I suspect that the default values from hwtable.c come into play and
override your settings anyway.  You can check this by running "show
config" from inside "multipathd -k" - if you have a device section for
your vendor HITACHI, product DF.* there, that might be what's going on.

> Looks like I have [enabled] [enabled] ...
> But it should be [active] [enabled]

Have you sent any I/O to the device after the configuration change?  The
path group (PG) doesn't transition from enabled to active until some
regular I/O has been sent to it.  Just reading some data from the device
should suffice.

Regards,
-- 
Tore Anderson


* RE: Round Robin vs Active/Passive
  2008-05-21 21:01           ` Tore Anderson
@ 2008-05-21 21:49             ` Craig Simpson
  2008-05-21 23:41               ` Craig Simpson
  0 siblings, 1 reply; 21+ messages in thread
From: Craig Simpson @ 2008-05-21 21:49 UTC (permalink / raw)
  To: device-mapper development; +Cc: Geoff Quan, Michael Denney, Kevin Koplar




OK, I did notice that my Multipath Tools are a little old

[root@wpe02 sbin]# rpm -qa |grep -i multipath
device-mapper-multipath-0.4.7-12.el5_1.3

Running Oracle Linux 5. Which in truth is:

[root@wpe02 sbin]# cat /etc/redhat-release
Enterprise Linux Enterprise Linux Server release 5.1 (Carthage)

Guess I could grab the latest multipath tools from RedHat for ES 5.1.


Looks like I am on track now. 

I looked in
/usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.defaults 
And see the below. I guess that is what multipath is trying to use.
#       device {
#               vendor                  "HITACHI"
#               product                 "DF.*"
#               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#               prio_callout            "/sbin/mpath_prio_hds_modular %d"
#               features                "0"
#               hardware_handler        "0"
#               path_grouping_policy    group_by_prio
#               failback                immediate
#               rr_weight               uniform
#               rr_min_io               1000
#               path_checker            readsector0
#       }

Created my own version of defaults in /etc/multipath.conf from that:
defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    group_by_prio
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            "/sbin/mpath_prio_hds_modular /dev/%n"
        path_checker            readsector0
        rr_min_io               1000
        rr_weight               uniform
        failback                immediate
        no_path_retry           fail
        user_friendly_name      yes
}



Then for an alias I use this:
        multipath {
                wwid                    1HITACHI_D60090910032
                alias                   asm01
        }


Output from multipath -ll
asm01 (1HITACHI_D60090910032) dm-6 HITACHI,DF600F
[size=32G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:1:32 sdm 8:192 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:32 sdb 8:16  [active][ready]



From a multipathd -k, "show config"
        device {
                vendor HITACHI
                product DF.*
                prio_callout mpath_prio_hds_modular %d
        }

So looks like maybe it is incorrect there? 
Usually if that is messing up, it shows in /var/log/messages. 


THANKS THANKS THANKS Tore!!!!!!!! Would be lost without the help!

Craig




* RE: Round Robin vs Active/Passive
  2008-05-21 21:49             ` Craig Simpson
@ 2008-05-21 23:41               ` Craig Simpson
  2008-05-22 12:00                 ` Tore Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Craig Simpson @ 2008-05-21 23:41 UTC (permalink / raw)
  To: device-mapper development; +Cc: Geoff Quan, Michael Denney, Kevin Koplar


All is beautiful now. No errors are being logged to /var/log/messages, so
it looks like "mpath_prio_hds_modular" is getting called correctly.

multipath -ll
asm14 (1HITACHI_730600240013) dm-55 HITACHI,DF600F
[size=64G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:0:13 sdcp 69:208  [active][ready]
 \_ 0:0:0:13 sdn  8:208   [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:13 sdbb 67:80   [active][ready]
 \_ 1:0:1:13 sded 128:80  [active][ready]

Thanks,
Craig





* RE: Round Robin vs Active/Passive
  2008-05-21 18:23       ` Tore Anderson
  2008-05-21 20:21         ` Craig Simpson
@ 2008-05-22  8:24         ` Domenico Viggiani
  2008-05-22 11:57           ` Tore Anderson
  2008-05-23  7:16           ` Hannes Reinecke
  1 sibling, 2 replies; 21+ messages in thread
From: Domenico Viggiani @ 2008-05-22  8:24 UTC (permalink / raw)
  To: 'device-mapper development'

* Tore Anderson
>
> The AMS200 is indeed an active/passive array, but it "fakes"
> active/active behaviour - if the passive controller receives 
> an I/O operation it will redirect it internally to the active 
> one which will process it and return it back to the passive 
> controller, which in turn returns it back to the initiator, 
> which has no idea that this happens at all.
> 
> So you can use it as a true active/active array, but I'd 
> recommend against it for two reasons;  first, there might be 
> a slight processing overhead to route I/O through the passive 
> controller (as well as a slight increase in latency), second, 
> you might risk saturating the interconnect between the 
> controllers with re-routed I/O if you have lots of volumes 
> using the array in this way (this might or might not be a 
> real problem depending on how the hardware is built).
> 
> So what you should do is to distinguish between paths to the 
> active controller and run round-robin on all of these, while 
> having fail-over to the set of paths to the passive 
> controller.  An example on how this
> looks:
> 
> mysql (36006016034301f0004582492ab21dd11)
> [size=40 GB][features=1 queue_if_no_path][hwhandler=0]
> \_ round-robin 0 [prio=2][active]
>  \_ 4:0:2:0 sds 65:32 [active][ready]
>  \_ 3:0:2:0 sdu 65:64 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 4:0:3:0 sdr 65:16 [active][ready]
>  \_ 3:0:3:0 sdt 65:48 [active][ready]
> 
> I/O is here balanced between sds and sdu, which have the 
> highest priority.  sdr and sdt will only be used should both 
> sds and sdu fail.
> This is accomplished by the following two configuration settings:
> 
> path_grouping_policy group_by_prio
> prio_callout "/sbin/mpath_prio_emc_silent /dev/%n"
> 
> (This is an EMC array.)
> 
> You should be able to do the same using 
> mpath_prio_hds_modular as the prio_callout.  Last I checked 
> this callout wasn't actually able to determine which 
> controller is the preferred for a given volume (one of the 
> reasons I bought an EMC instead), but did a simplistic check 
> which was something along the lines of "controller 0 is 
> preferred for all volumes with an even LUN;  controller 1 for 
> all volumes with an odd LUN".  So even though this probably 
> won't match reality unless you take care to configure the AMS 
> accordingly, you will get the desired effect - round robin 
> between the paths to one controller, failover to the paths to 
> the other.  The AMS is also clever enough to understand that 
> if you're only sending I/O to the passive controller it will 
> automatically change the ownership of the volume to the 
> controller actually receiving I/O, so you won't have the 
> problem of I/O being re-routed between controllers.
> 
> The downside is that you can't decide which controller is the 
> preferred one for a given volume, so if you have two highly 
> active volumes with odd LUNs and two mostly idle one with 
> even LUNs you won't be able to split the load equally between 
> the controllers.

Sorry for the question: is this how the new ALUA mode works for EMC
Clariion CX arrays?
Are the default settings suitable for this new failover mode?

I just upgraded my CX700 to FLARE26 with ALUA mode...

Thanks
--
Domenico Viggiani

* Re: Round Robin vs Active/Passive
  2008-05-22  8:24         ` Domenico Viggiani
@ 2008-05-22 11:57           ` Tore Anderson
  2008-05-23  7:16           ` Hannes Reinecke
  1 sibling, 0 replies; 21+ messages in thread
From: Tore Anderson @ 2008-05-22 11:57 UTC (permalink / raw)
  To: device-mapper development

* Domenico Viggiani

> Sorry for question: is this how new ALUA mode works for EMC Clariion
> CX arrays?

Yes, that's right.  Except for the fact that mpath_prio_emc will
correctly detect the preferred controller, while mpath_prio_hds_modular
only checks even/odd LUNs.
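To make the even/odd rule concrete, here is a minimal sketch of the heuristic as described earlier in the thread ("controller 0 is preferred for all volumes with an even LUN;  controller 1 for all volumes with an odd LUN").  This is not the callout's actual code: the function names are hypothetical, the controller number is simply passed in rather than detected, and the 1/0 priority values are assumed from the multipath output shown in this thread.

```python
def preferred_controller(lun):
    """Simplistic rule: even LUN -> controller 0, odd LUN -> controller 1."""
    return lun % 2


def path_priority(lun, path_controller):
    """Return 1 for a path to the controller assumed preferred, else 0.

    path_controller is the controller (0 or 1) the path leads to; how a
    real callout finds that out is outside the scope of this sketch.
    """
    return 1 if path_controller == preferred_controller(lun) else 0


# LUN 13 is odd, so paths to controller 1 get the higher priority.
for controller in (0, 1):
    print("LUN 13 via controller", controller, "-> prio", path_priority(13, controller))
```

The point is only that the result depends on the LUN number alone, not on which controller the array itself considers the owner.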

> Are default settings suitable for this new failover mode?

You don't have to use the EMC specific hardware handler or path checker
any longer.  This is what I use for my CX3:

        device {
                vendor                  DGC
                product                 *
                product_blacklist       LUNZ
                path_grouping_policy    group_by_prio
                path_checker            tur
                no_path_retry           queue
                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
                failback                immediate
        }

Note that the host is no longer able to explicitly trespass the volume
between controllers.  I actually see that as an advantage, especially in
cluster environments.  If the host wants to change controllers it can
simply do so and wait for the CX to implicitly trespass the volume (due
to the I/O coming mostly to the passive controller).  This works very
well, and I consider it a huge improvement over the old
Passive-Not-Ready mode you had to use earlier (hwhandler "emc").

Note that there's also a new ALUA-specific hardware handler available
now.  I never tried it, so I can't tell how it differs from my setup.

Regards,
-- 
Tore Anderson

* Re: Round Robin vs Active/Passive
  2008-05-21 23:41               ` Craig Simpson
@ 2008-05-22 12:00                 ` Tore Anderson
  0 siblings, 0 replies; 21+ messages in thread
From: Tore Anderson @ 2008-05-22 12:00 UTC (permalink / raw)
  To: device-mapper development; +Cc: Michael Denney, Geoff Quan, Kevin Koplar

* Craig Simpson

> All is Beautiful now. Not logging any errors to messages so looks like "
> mpath_prio_hds_modular" is getting called correctly. 
> 
> Multipath -ll
> asm14 (1HITACHI_730600240013) dm-55 HITACHI,DF600F
> [size=64G][features=0][hwhandler=0]
> \_ round-robin 0 [prio=2][active]
>  \_ 1:0:0:13 sdcp 69:208  [active][ready]
>  \_ 0:0:0:13 sdn  8:208   [active][ready]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 0:0:1:13 sdbb 67:80   [active][ready]
>  \_ 1:0:1:13 sded 128:80  [active][ready]

Yes, you can see from this output that sdcp and sdn each have a prio of
1, placing them in the same priority group (which in sum has a prio of
2), so the prio_callout definitely works as it is supposed to.
Everything looks good;  now you should try yanking some fibres (with
lots of I/O activity running) to make sure that it actually is able to
handle a failure scenario.  Good luck.
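The grouping arithmetic can be sketched like this.  It is only an illustration of what group_by_prio appears to do in the output above (paths with equal priority share a group, the group priority is the sum of its path priorities, the highest group is used first), not the multipath-tools implementation; the function name and data layout are made up.

```python
from collections import defaultdict


def group_by_prio(paths):
    """Group (device, prio) pairs by prio.

    Returns a list of (group_prio, devices), best group first, where
    group_prio is the sum of the priorities of the paths in the group.
    """
    groups = defaultdict(list)
    for dev, prio in paths:
        groups[prio].append(dev)
    result = [(prio * len(devs), devs) for prio, devs in groups.items()]
    result.sort(key=lambda group: group[0], reverse=True)
    return result


# Per-path priorities as reported for asm14 above.
paths = [("sdcp", 1), ("sdn", 1), ("sdbb", 0), ("sded", 0)]
for group_prio, devs in group_by_prio(paths):
    print(group_prio, devs)
```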

Regards,
-- 
Tore Anderson

* Re: Round Robin vs Active/Passive
  2008-05-22  8:24         ` Domenico Viggiani
  2008-05-22 11:57           ` Tore Anderson
@ 2008-05-23  7:16           ` Hannes Reinecke
  2008-05-23  8:00             ` Tore Anderson
  1 sibling, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2008-05-23  7:16 UTC (permalink / raw)
  To: device-mapper development

Hi Domenico,

On Thu, May 22, 2008 at 10:24:52AM +0200, Domenico Viggiani wrote:
> * Tore Anderson
> >
> > path_grouping_policy group_by_prio
> > prio_callout "/sbin/mpath_prio_emc_silent /dev/%n"
> > 
> > (This is an EMC array.)
> > 
[ .. ]
> 
> Sorry for question: is this how new ALUA mode works for EMC Clariion CX
> arrays?
> Are default settings suitable for this new failover mode?
> 
> I just upgraded my CX700 to FLARE26 with ALUA mode...
> 
No. ALUA is completely different. You have to use

prio_callout "/sbin/mpath_prio_alua /dev/%n"

for this.
Although the normal EMC configuration continues to work, too.
And also note that you have to change the failover mode
to '4' to enable ALUA on the Clariion.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: Round Robin vs Active/Passive
  2008-05-23  7:16           ` Hannes Reinecke
@ 2008-05-23  8:00             ` Tore Anderson
  2008-05-23  8:55               ` Hannes Reinecke
  2008-05-23 10:36               ` Domenico Viggiani
  0 siblings, 2 replies; 21+ messages in thread
From: Tore Anderson @ 2008-05-23  8:00 UTC (permalink / raw)
  To: device-mapper development

* Hannes Reinecke

> No. Alua is completely different. You have to use
> 
> prio_callout "/sbin/mpath_prio_alua /dev/%n"
> 
> for this.
> Although the normal EMC configuration continues to work, too.
> And also note that you have to change the failover mode
> to '4' to enable ALUA on the Clariion.

Hmm, interesting.  Apologies if I've been spreading misinformation!

Now you made me curious.  How does using an array (in ALUA failover mode
4) with my configuration:

        device {
                vendor                  DGC
                product                 *
                product_blacklist       LUNZ
                path_grouping_policy    group_by_prio
                path_checker            tur
                no_path_retry           queue
                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
                failback                immediate
        }

differ from using the ALUA-specific code in multipath-tools?  I believe
it would look something like this:

        device {
                vendor                  DGC
                product                 *
                product_blacklist       LUNZ
                path_grouping_policy    group_by_prio
                path_checker            tur
                no_path_retry           queue
                prio_callout            "/sbin/mpath_prio_alua /dev/%n"
                failback                immediate
                hardware_handler        "1 alua"
        }

I assume the ALUA bits are able to explicitly tell the CLARiiON to
transfer volume ownership from one controller to another (something I
don't think is desired in clustered environments anyway - the array
should have a better understanding of the optimal location of the volume
than the hosts, who could be in disagreement and end up moving the
volume back and forth), but what other differences are there?

I'm speaking strictly from a user's point of view here - the differences
"under the hood" aren't that interesting to me as long as it ends up
working the same way and in an equally reliable manner.

Regards,
-- 
Tore Anderson

* Re: Round Robin vs Active/Passive
  2008-05-23  8:00             ` Tore Anderson
@ 2008-05-23  8:55               ` Hannes Reinecke
  2008-05-23  9:42                 ` Tore Anderson
  2008-05-23 10:36               ` Domenico Viggiani
  1 sibling, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2008-05-23  8:55 UTC (permalink / raw)
  To: device-mapper development

Hi Tore,

On Fri, May 23, 2008 at 10:00:15AM +0200, Tore Anderson wrote:
> * Hannes Reinecke
> 
> > No. Alua is completely different. You have to use
> > 
> > prio_callout "/sbin/mpath_prio_alua /dev/%n"
> > 
> > for this.
> > Although the normal EMC configuration continues to work, too.
> > And also note that you have to change the failover mode
> > to '4' to enable ALUA on the Clariion.
> 
> Hmm, interesting.  Apologies if I've been spreading misinformation!
> 
> Now you made me curious.  How does using an array (in ALUA failover mode
> 4) with my configuration:
> 
>         device {
>                 vendor                  DGC
>                 product                 *
>                 product_blacklist       LUNZ
>                 path_grouping_policy    group_by_prio
>                 path_checker            tur
>                 no_path_retry           queue
>                 prio_callout            "/sbin/mpath_prio_emc /dev/%n"
>                 failback                immediate
>         }
> 
> differ from using the ALUA specific code in multipath-tools?  I believe
> it would look something like this?
> 
>         device {
>                 vendor                  DGC
>                 product                 *
>                 product_blacklist       LUNZ
>                 path_grouping_policy    group_by_prio
>                 path_checker            tur
>                 no_path_retry           queue
>                 prio_callout            "/sbin/mpath_prio_alua /dev/%n"
>                 failback                immediate
>                 hardware_handler        "1 alua"
>         }
> 
Yes, this looks about okay.

> I assume the ALUA bits are able to explicitly tell the CLARiiON to
> transfer volume ownership from one controller to another (something I
> don't think is desired in clustered environments anyway - the array
> should have a better understanding of the optimal location of the volume
> than the hosts, who could be in disagreement and end up moving the
> volume back and forth), but what other differences are there?
>
By moving the assignment to another controller you get more efficient
I/O throughput, as in general the internal link between the two
controllers isn't the fastest. So you really want to move the
LUN to the controller which should handle the I/O.
(Remember, the original EMC failover couldn't even handle I/O on
the non-active controller ...)

And for ALUA support in general this is just a standardized method of
signalling the required failover/multipath support mode to the initiator.
You can map just about any existing failover method onto ALUA modes,
so in principle every IHV can update their firmware to support ALUA.

And most vendors already did so. Funnily enough, none of those chose
to implement the _exact_ modes the original firmware supported.
With EMC we suddenly can do I/O on the secondary path, the active/active
HP boxes suddenly have optimal and non-optimal paths, the old HP
MSA firmware starts to do I/O on its standby path, too, etc.
Bit of a shame, really. I was really curious how ALUA paths in 'standby'
mode would react ...

> I'm speaking strictly from a user's point of view here - the differences
> "under the hood" isn't that interesting to me as long as it ends up
> working the same way and in an equally reliable manner.
> 
Well, the big advantage of the ALUA support in the EMC is that even the
secondary path will function, albeit slower. Hence you wouldn't get
the I/O errors during boot anymore.

HTH,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: Round Robin vs Active/Passive
  2008-05-23  8:55               ` Hannes Reinecke
@ 2008-05-23  9:42                 ` Tore Anderson
  0 siblings, 0 replies; 21+ messages in thread
From: Tore Anderson @ 2008-05-23  9:42 UTC (permalink / raw)
  To: device-mapper development

Hi,

* Hannes Reinecke

> In moving the assignment to another controller you will have a more
> efficient I/O throughput, as in general the internal link between those
> two controller isn't the fastest. So you really want to move the
> LUN to that controller which should handle the I/O.

Yes.  But the EMC will take care of moving the volume where it gets the
most I/O on its own (they call it "implicit trespass"), no
host-initiated volume movement is necessary.  I'll elaborate on why I
believe this to be better (especially in clustered environments):

Consider a simple dual-fabric topology, with controller A connected to
fabric A and controller B connected to fabric B.  Ten nodes share one
volume, which by default is owned by controller A.  They all generate
about the same amount of I/O.

  +-----------[ Switch - Fabric A ]
  |            |    |     |    |
Ctrl A         |    |     |    |
              N1   N2   [..]  N10
Ctrl B         |    |     |    |
  |            |    |     |    |
  +-----------[ Switch - Fabric B ]

Normally all traffic to the volume is passed through fabric A to
controller A, while fabric B and controller B are completely idle.

Now, say that the patch cord between N10 and Fabric A breaks.  In the
situation where there's no host-initiated volume movement, N10 will
start using the path through fabric B to the passive controller with a
slight performance impact, while the CLARiiON will leave the volume on
controller A since it can tell that that's where 90% of the I/O is
coming in anyway.

If all the hosts move the volume ("explicit trespass") whenever
they see fit, then in the above scenario N10 would move the volume to
controller B, making 90% of all I/O come in the wrong way.  Depending on
the failback settings one of N1-9 would move it back to controller A
later, and until the broken patch cord was fixed the volume would keep
moving back and forth between controllers - not exactly optimal.

At least that's how I _think_ it would work, and that's why I don't use
the ALUA bits.  I'd appreciate your comments on whether or not this
makes sense...

Note that in the case of an error that redirects all I/O to the passive
controller (for example if N10 was the only node using the volume, or
the switch in fabric A failed), the volume would still get moved to
controller B (even though the hosts aren't able to do this themselves),
because of the "implicit trespass" functionality.  The only drawback
that I can see of relying on this is that the I/O will pass through the
passive controller for some minutes instead of the volume being
transferred immediately.  That isn't really a problem as the performance
degradation is very slight, at least on the CX3;  I'm having problems
measuring it at all.
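To illustrate the two cases above, here is a toy model.  It assumes the array simply moves the volume once the non-owning controller receives more than half of the I/O, with every node generating an equal share;  that rule is my simplification for the sake of the example, not documented CLARiiON behaviour, and the function name is hypothetical.

```python
def implicit_owner(current_owner, node_paths):
    """Return the controller owning the volume after an implicit trespass.

    node_paths maps node name -> controller ("A" or "B") that node sends
    its I/O to.  Assumed rule: the volume moves only if the other
    controller receives more than half of the (equally weighted) I/O.
    """
    counts = {"A": 0, "B": 0}
    for ctrl in node_paths.values():
        counts[ctrl] += 1
    other = "B" if current_owner == "A" else "A"
    if counts[other] * 2 > len(node_paths):
        return other
    return current_owner


# Broken patch cord on N10 only: 9 of 10 nodes still reach controller A.
nodes = {f"N{i}": "A" for i in range(1, 11)}
nodes["N10"] = "B"
print(implicit_owner("A", nodes))

# Fabric A switch failure: every node ends up on controller B.
all_on_b = {name: "B" for name in nodes}
print(implicit_owner("A", all_on_b))
```

In the first case the volume stays where it is, in the second it follows the I/O, and in neither case does any host have to ask for the move.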

> (Remember, the original EMC failover couldn't even handle I/O on
> the non-active controller ...)

Yes, PNR mode sucked.  I'm really glad the new CLARiiONs support ALUA,
now I don't need a Symmetrix anymore.  :-)

> Well, the big advantage of the ALUA support in the EMC is that even the
> secondary path will function, albeit slower. Hence you wouldn't get
> the I/O errors during boot anymore.

Ah, yes, I am aware of this.  That is how the HDS AMS that was discussed
earlier in the thread works too, though I'm not sure if that is actually
ALUA or some HDS-specific implementation that does basically the same thing.

I understood Domenico to be asking whether this is functionally
equivalent to the new ALUA mode on the CLARiiONs, not whether the old
PNR mode is equivalent to ALUA, which I think is maybe how you
understood it?  If so I agree with you - those two are indeed completely
different.

Regards,
-- 
Tore Anderson

* RE: Round Robin vs Active/Passive
  2008-05-23  8:00             ` Tore Anderson
  2008-05-23  8:55               ` Hannes Reinecke
@ 2008-05-23 10:36               ` Domenico Viggiani
  2008-05-23 10:46                 ` Tore Anderson
  1 sibling, 1 reply; 21+ messages in thread
From: Domenico Viggiani @ 2008-05-23 10:36 UTC (permalink / raw)
  To: 'device-mapper development'

* Tore Anderson
> 4) with my configuration:
> 
>         device {
>                 vendor                  DGC
>                 product                 *
>                 product_blacklist       LUNZ
>                 path_grouping_policy    group_by_prio
>                 path_checker            tur
>                 no_path_retry           queue
>                 prio_callout            "/sbin/mpath_prio_emc /dev/%n"
>                 failback                immediate
>         }

Red Hat 4.6 defaults for EMC are:
       device {
               vendor                  "DGC"
               product                 "*"
               bl_product              "LUNZ"
               path_grouping_policy    group_by_prio
               getuid_callout          "/sbin/scsi_id -g -u -s"
               prio_callout            "/sbin/mpath_prio_emc /dev/%n"
               hardware_handler        "1 emc"
               features                "1 queue_if_no_path"
               path_checker            emc_clariion
               failback                immediate
       }
(from  /usr/share/doc/device-mapper-multipath-0.4.5/multipath.conf.defaults)
Why do you use different settings? Are they not "optimal"?

--
DV

* Re: Round Robin vs Active/Passive
  2008-05-23 10:36               ` Domenico Viggiani
@ 2008-05-23 10:46                 ` Tore Anderson
  2008-05-23 21:16                   ` Sebastian Herbszt
  0 siblings, 1 reply; 21+ messages in thread
From: Tore Anderson @ 2008-05-23 10:46 UTC (permalink / raw)
  To: device-mapper development

* Domenico Viggiani

> Red Hat 4.6 defaults for EMC are:
>        device {
>                vendor                  "DGC"
>                product                 "*"
>                bl_product              "LUNZ"
>                path_grouping_policy    group_by_prio
>                getuid_callout          "/sbin/scsi_id -g -u -s"
>                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
>                hardware_handler        "1 emc"
>                features                "1 queue_if_no_path"
>                path_checker            emc_clariion
>                failback                immediate
>        }
> (from  /usr/share/doc/device-mapper-multipath-0.4.5/multipath.conf.defaults)
> Why do you use different settings? Are they not "optimal"?

These settings are suitable for PNR mode (failover mode 1, where the
passive paths are unable to process I/O - this will show up as large
amounts of I/O errors during boot).  When all paths to the currently
active controller fail, dm-multipath will instruct the CX to move the
volume from the active controller to the passive one.  This is bad in a
cluster environment, where two cluster nodes might have differing
opinions of which controller should own the volume and you'll end up
with a volume that constantly moves back and forth between controllers.

My settings are better suited for ALUA mode (failover mode 4, all paths
are able to process I/O), especially if the ALUA-specific support in
dm-multipath isn't available due to old kernels or similar.  I sent an
email to the list one hour ago detailing the advantages I see with this
setup.

Unfortunately I have found no way to detect if an array is operating in
ALUA or PNR mode and have dm-multipath automatically apply different
device{} sections based on that.  I have some nodes that are connected
to both my CX3 and an old CX200 (which doesn't support ALUA), and due to
this I need to use PNR mode on the CX3 too, which kinda sucks.  Time to
get rid of the CX200 I guess.

Regards,
-- 
Tore Anderson

* Re: Round Robin vs Active/Passive
  2008-05-23 10:46                 ` Tore Anderson
@ 2008-05-23 21:16                   ` Sebastian Herbszt
  2008-06-05  6:54                     ` Tore Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Sebastian Herbszt @ 2008-05-23 21:16 UTC (permalink / raw)
  To: device-mapper development

From: "Tore Anderson"
> Unfortunately I have found no way to detect if an array is operating in
> ALUA or PNR mode and have dm-multipath automatically apply different
> device{} sections based on that.  I have some nodes that are connected
> to both my CX3 and an old CX200 (which doesn't support ALUA), and due to
> this I need to use PNR mode on the CX3 too, wich kinda sucks.  Time to
> get rid of the CX200 I guess.

The comment and code from libmultipath/prioritizers/emc.c

 if ( /* Effective initiator type */
      sense_buffer[27] != 0x03
  /*
   * Failover mode should be set to 1 (PNR failover mode)
   * or 4 (ALUA failover mode).
   */
  || (((sense_buffer[28] & 0x07) != 0x04) &&
      ((sense_buffer[28] & 0x07) != 0x06))
  /* Arraycommpath should be set to 1 */
  || (sense_buffer[30] & 0x04) != 0x04) {
  pp_emc_log(0, "path not correctly configured for failover");
 }

doesn't help with the detection part?
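For readers who want to poke at the condition, here is the same check transcribed into a small sketch.  It only restates the quoted C expression in negated form;  the function names are hypothetical, and since the excerpt does not say which of the values 0x04 and 0x06 corresponds to PNR and which to ALUA, the sketch just exposes the raw bits.

```python
def failover_mode_bits(sense_buffer):
    """Return the low three bits of byte 28, which the check inspects."""
    return sense_buffer[28] & 0x07


def failover_config_ok(sense_buffer):
    """True when none of the three conditions in the quoted check fire."""
    initiator_ok = sense_buffer[27] == 0x03
    mode_ok = failover_mode_bits(sense_buffer) in (0x04, 0x06)
    arraycommpath_ok = (sense_buffer[30] & 0x04) == 0x04
    return initiator_ok and mode_ok and arraycommpath_ok


# A made-up buffer that satisfies all three conditions.
buf = bytearray(31)
buf[27] = 0x03
buf[28] = 0x06
buf[30] = 0x04
print(failover_config_ok(buf), hex(failover_mode_bits(buf)))
```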

- Sebastian

* Re: Re: Round Robin vs Active/Passive
  2008-05-23 21:16                   ` Sebastian Herbszt
@ 2008-06-05  6:54                     ` Tore Anderson
  2008-06-05  7:20                       ` Hannes Reinecke
  0 siblings, 1 reply; 21+ messages in thread
From: Tore Anderson @ 2008-06-05  6:54 UTC (permalink / raw)
  To: device-mapper development

* Sebastian Herbszt

> The comment and code from libmultipath/prioritizers/emc.c
> 
>  if ( /* Effective initiator type */
>       sense_buffer[27] != 0x03
>   /*
>    * Failover mode should be set to 1 (PNR failover mode)
>    * or 4 (ALUA failover mode).
>    */
>   || (((sense_buffer[28] & 0x07) != 0x04) &&
>       ((sense_buffer[28] & 0x07) != 0x06))
>   /* Arraycommpath should be set to 1 */
>   || (sense_buffer[30] & 0x04) != 0x04) {
>   pp_emc_log(0, "path not correctly configured for failover");
>  }
> 
> doesn't help with the detection part?

Not really, there's no way I can put this into a device{} section to
differentiate between two CLARiiONs running different failover modes.

What I need to do is this:

# My new CX3-40f which supports ALUA (preferred) as well as PNR
device {
	vendor DGC
	product *
	product_blacklist LUNZ
	alua_capable yes
	[..ALUA optimised settings...]
}

# My old CX200, only supports PNR
device {
	vendor DGC
	product *
	product_blacklist LUNZ
	alua_capable no
	[..PNR optimised settings...]
}

...but there's no such thing as the "alua_capable" setting or any other
setting that can be used to distinguish between the two CLARiiONs, as
far as I know, so I have to use both arrays in PNR mode even though the
newest one of them supports ALUA.

Please prove me wrong...  ;-)

Regards,
-- 
Tore Anderson

* Re: Re: Round Robin vs Active/Passive
  2008-06-05  6:54                     ` Tore Anderson
@ 2008-06-05  7:20                       ` Hannes Reinecke
  2008-06-06  7:19                         ` Tore Anderson
  0 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2008-06-05  7:20 UTC (permalink / raw)
  To: device-mapper development

Hi Tore,

Tore Anderson wrote:
[ .. ]
> What I need to do is this:
> 
> # My new CX3-40f which supports ALUA (preferred) as well as PNR
> device {
> 	vendor DGC
> 	product *
> 	product_blacklist LUNZ
> 	alua_capable yes
> 	[..ALUA optimised settings...]
> }
> 
> # My old CX200, only supports PNR
> device {
> 	vendor DGC
> 	product *
> 	product_blacklist LUNZ
> 	alua_capable no
> 	[..PNR optimised settings...]
> }
> 
> ...but there's no such thing as the "alua_capable" setting or any other
> setting that can be used to distinguish between the two CLARiiONs, as
> far as I know, so I have to use both arrays in PNR mode even though the
> newest one of them supports ALUA.
> 
No need. I wrote a patch a while ago which allows you to set the
hardware handler in the multipaths section, for exactly this scenario.
Can you test if it works?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: Re: Round Robin vs Active/Passive
  2008-06-05  7:20                       ` Hannes Reinecke
@ 2008-06-06  7:19                         ` Tore Anderson
  0 siblings, 0 replies; 21+ messages in thread
From: Tore Anderson @ 2008-06-06  7:19 UTC (permalink / raw)
  To: device-mapper development

Hi Hannes,

* Hannes Reinecke

> No need. I did some patch once ago which allowed you to set the hardware
> handler in the multipaths section.
> For exactly this scenario. Can you test if it works?

Certainly!  Can you send me the patch or point me to it, though?  I 
didn't find it in my archive, but I might have missed it as you've been 
submitting a lot of patches here.

Regards,
-- 
Tore Anderson

end of thread, other threads:[~2008-06-06  7:19 UTC | newest]

Thread overview: 21+ messages
     [not found] <8a6b34b03e0524aa66c862534fee9b7a@www3.mail.volny.cz>
2008-05-21 11:41 ` INFO: task md2_resync:7950 blocked for more than 120 seconds Neil Brown
2008-05-21 14:48   ` Round Robin vs Active/Passive Craig Simpson
2008-05-21 15:32     ` Craig Simpson
2008-05-21 18:23       ` Tore Anderson
2008-05-21 20:21         ` Craig Simpson
2008-05-21 21:01           ` Tore Anderson
2008-05-21 21:49             ` Craig Simpson
2008-05-21 23:41               ` Craig Simpson
2008-05-22 12:00                 ` Tore Anderson
2008-05-22  8:24         ` Domenico Viggiani
2008-05-22 11:57           ` Tore Anderson
2008-05-23  7:16           ` Hannes Reinecke
2008-05-23  8:00             ` Tore Anderson
2008-05-23  8:55               ` Hannes Reinecke
2008-05-23  9:42                 ` Tore Anderson
2008-05-23 10:36               ` Domenico Viggiani
2008-05-23 10:46                 ` Tore Anderson
2008-05-23 21:16                   ` Sebastian Herbszt
2008-06-05  6:54                     ` Tore Anderson
2008-06-05  7:20                       ` Hannes Reinecke
2008-06-06  7:19                         ` Tore Anderson