* DM-Multipath path failure questions..
From: Michael Vallaly @ 2007-11-14  6:07 UTC
  To: dm-devel


Hello,

I am currently using dm-multipath (multipath-tools) to provide high availability and increased capacity for our EqualLogic iSCSI SAN. I was wondering whether anyone has found a way to re-instantiate a failed path (or paths) in a multipath target when the backend device (iSCSI initiator) goes away.

All goes well until we have a lengthy network hiccup or a non-recoverable iSCSI error, in which case the multipather seems to get wedged. The path gets stuck in an [active][faulty] state and the backend block device (sdX) is actually removed from the system. I have tried reconnecting the iSCSI session after this happens, and I get a new backend block device with a different name (e.g. sdg vs. sdf), but the multipather never picks it up or resumes IO, and I generally then have to power-cycle the box.
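multipath-tools assembles maps by the WWID that getuid_callout reports for each path (here scsi_id), not by the sdX name, which is why the maps in this thread are labelled with WWIDs. The sketch below models only that keying step, so a device that returns under a new name still carries the same identity. PathDevice and group_by_wwid are hypothetical names, and the sample WWID is copied from the mpath89 output further down.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PathDevice:
    """Hypothetical record for one path: kernel name plus the WWID reported for it."""
    name: str
    wwid: str


def group_by_wwid(paths):
    """Collect path names into maps keyed by WWID; the sdX name plays no part."""
    maps = defaultdict(list)
    for p in paths:
        maps[p.wwid].append(p.name)
    return dict(maps)


WWID = "36090a0281051367df57194d2a37392d5"  # copied from the mpath89 listing

before = [PathDevice("sdf", WWID), PathDevice("sdg", WWID)]
# Hypothetical: sdf's session is re-established and the LUN returns as sdh.
after = [PathDevice("sdg", WWID), PathDevice("sdh", WWID)]

print(group_by_wwid(before))
print(group_by_wwid(after))  # still one map with the same key, now holding sdg and sdh
```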

We have anywhere from 2 to 4 iSCSI sessions open per multipath target, but even one path failing seems to take down the whole multipath device. I am hoping there is a way to carry on after a path failure rather than power-cycling. I have tried multipath-tools 0.4.6, 0.4.7 and 0.4.8, and almost every permutation of the configuration I can think of. Maybe I am missing something quite obvious.

Working Multipather
<snip>
mpath89 (36090a0281051367df57194d2a37392d5) dm-4 EQLOGIC ,100E-00       
[size=300G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 5:0:0:0  sdf 8:80  [active][ready]
 \_ 6:0:0:0  sdg 8:96  [active][ready]
</snip>

Wedged Multipather (when an iSCSI session terminates; all IO queues indefinitely)
<snip>
mpath94 (36090a0180087e6045673743d3c01401c) dm-10 ,
[size=600G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:#  -   #:#   [active][faulty]
</snip>
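The wedged listing combines queue_if_no_path with a map that has no usable path, so dm-multipath holds IO rather than failing it. Below is a minimal check for that combination, assuming the 0.4.x "multipath -ll" layout shown above (path lines indented under their group and ending in [dm state][checker state]). map_is_wedged is a hypothetical helper, not part of multipath-tools.

```python
import re

# Path lines are indented under their priority group and end in
# [dm path state][checker state], e.g. " \_ 5:0:0:0  sdf 8:80  [active][ready]".
PATH_LINE = re.compile(r"^\s+\\_ .*\[(\w+)\]\[(\w+)\]\s*$")


def map_is_wedged(listing: str) -> bool:
    """True if a one-map 'multipath -ll' listing would hold IO indefinitely.

    That is the case when queue_if_no_path is set and no path's checker
    reports "ready". Priority-group lines are not indented, so they never match.
    """
    queueing = "queue_if_no_path" in listing
    checker_states = [m.group(2) for line in listing.splitlines()
                      if (m := PATH_LINE.match(line))]
    return queueing and "ready" not in checker_states


working = r"""mpath89 (36090a0281051367df57194d2a37392d5) dm-4 EQLOGIC ,100E-00
[size=300G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 5:0:0:0  sdf 8:80  [active][ready]
 \_ 6:0:0:0  sdg 8:96  [active][ready]
"""

wedged = r"""mpath94 (36090a0180087e6045673743d3c01401c) dm-10 ,
[size=600G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:#  -   #:#   [active][faulty]
"""

print(map_is_wedged(working))  # False
print(map_is_wedged(wedged))   # True
```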

Our multipath.conf looks like this: 
<snip>
defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
        #prio_callout            /bin/true
        #path_checker            readsector0
        path_checker            directio
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        #user_friendly_names     no
        user_friendly_names     yes
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
        devnode "^hd[a-z][[0-9]*]"
        devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}


devices {
        device {
                vendor                  "EQLOGIC"
                product                 "100E-00"
                path_grouping_policy    multibus
                getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
                #path_checker            directio
                path_checker            readsector0
                path_selector           "round-robin 0"
                ##hardware_handler        "0"
                failback                immediate
                rr_weight               priorities
                no_path_retry           queue
                #no_path_retry           fail
                rr_min_io               100
                product_blacklist       LUN_Z
        }
}

</snip>
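In multipath.conf, attributes set in a matching devices entry take precedence over the same attributes in defaults. Resolving the file above that way suggests why both maps show "features=1 queue_if_no_path" even though defaults says "no_path_retry fail": the EQLOGIC entry sets "no_path_retry queue". The sketch is a simplified model of that precedence only (it ignores the multipaths section and the built-in hardware table) and uses a hand-copied subset of the settings. effective_settings and map_features are hypothetical helpers.

```python
# Hand-copied subset of the multipath.conf above.
DEFAULTS = {
    "polling_interval": "10",
    "path_grouping_policy": "multibus",
    "path_checker": "directio",
    "rr_min_io": "100",
    "failback": "immediate",
    "no_path_retry": "fail",
}

EQLOGIC_DEVICE = {
    "vendor": "EQLOGIC",
    "product": "100E-00",
    "path_grouping_policy": "multibus",
    "path_checker": "readsector0",
    "failback": "immediate",
    "no_path_retry": "queue",
    "rr_min_io": "100",
}

MATCH_KEYS = ("vendor", "product")  # used to select the entry, not as settings


def effective_settings(defaults, device):
    """Simplified precedence: a matching devices entry overrides defaults."""
    merged = dict(defaults)
    merged.update({k: v for k, v in device.items() if k not in MATCH_KEYS})
    return merged


def map_features(settings):
    """Model only the two values used in this file: 'queue' adds the
    queue_if_no_path feature to the map, 'fail' leaves it out."""
    value = settings["no_path_retry"]
    if value not in ("queue", "fail"):
        raise ValueError(f"no_path_retry value not modelled here: {value}")
    return ["queue_if_no_path"] if value == "queue" else []


eff = effective_settings(DEFAULTS, EQLOGIC_DEVICE)
print(eff["no_path_retry"], eff["path_checker"])  # queue readsector0
print(map_features(eff))                          # ['queue_if_no_path']
```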

Thanks for your help.

- Mike Vallaly


* Re: DM-Multipath path failure questions..
From: Mike Christie @ 2007-11-14 17:28 UTC
  To: device-mapper development

Michael Vallaly wrote:
> Hello,
> 
> I am currently using the dm-multipather (multipath-tools) to allow high-availability / increased capacity to our Equallogic iSCSI SAN. I was wondering if anyone had come across a way to re-instantiate a failed path / paths from a multipath target, when the backend device (iscsi initiator) goes away. 
> 
> All goes well until we have a lengthy network hiccup or non-recoverable iSCSI error in which case the multipather seems to get wedged. The path seems to get stuck in a [active][faulty] state and the backend block device (sdX) actually gets removed from the system. I have tried reconnecting the iSCSI session, after this happens, and get a new (different IE: sdg vs. sdf) backend block level device, but the multipather never picks it up / never resumes IO operations, and I generally have then to power cycle the box.
> 
> We have anywhere from 2 to 4 iSCSI sessions open per multipath target, but even one path failing seems to cause the whole multipath to die. I am hoping there is a way to continue on after a path failure, rather than the power cycle. I have tried multipath-tools 0.4.6/0.4.7/0.4.8, and almost every permutation of the configuration I can think of. Maybe I am missing something quite obvious.  
> 

I was wondering what you are doing on the target to cause the device/sdX
to be removed, or what error you get? Normally that only happens if you
run the iscsiadm logout command, if the target sends the initiator an
error indicating that it is going away for good, or if there is some other
error, such as the CHAP values having changed on the target. Also, older
versions of open-iscsi have a bug where they kill the session and remove
the sdXs a little early on errors that should be recoverable (we found the
bug in 865-*; it is fixed in the open-iscsi git tree and will be fixed in
the new release), so I just want to make sure I caught all the recoverable
errors.

What kernel are you using, and what happens when you reconnect the 
session and get a new sdX if you run the multipath command by hand?


* Re: DM-Multipath path failure questions..
From: Kevin Foote @ 2007-11-14 18:26 UTC
  To: device-mapper development

Mike,
If you are going for failover, things should look like this.
We connect to an EqualLogic PS box as well, through a QLA4052 HBA.

You need to change this line in your /etc/multipath.conf file to
reflect what you want multipathd to do.
>         path_grouping_policy    multibus
should read ...
         path_grouping_policy    failover

In turn, your maps will look like this (multipath -ll):
media-oracle (30690a018a0b3d369dd4f04191c4090f9)
[size=8 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
  \_ 2:0:8:0 sdj 8:144 [active][ready]
\_ round-robin 0 [enabled]
  \_ 3:0:8:0 sdr 65:16 [active][ready]

and dmsetup table <dev> will show a multipath failover setup:
#> dmsetup table /dev/mapper/media-oracle
0 16803840 multipath 0 0 2 1 round-robin 0 1 1 8:144 100 round-robin 0 1 1 65:16 100
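Reading that table line field by field shows the failover layout directly: two priority groups, each with one path. The parser below is a sketch following the dm-multipath table layout (feature count and features, hardware-handler argument count and arguments, number of priority groups, initial group, then per group: selector, selector argument count, path count, per-path argument count, and each path with its arguments). It assumes a well-formed line; parse_multipath_table is a hypothetical helper, and the second, multibus-style table is hand-written for illustration with a made-up length.

```python
def parse_multipath_table(line):
    """Split a dm-multipath table line into features and priority groups.

    Sketch only: assumes a well-formed line; hardware-handler and selector
    arguments are skipped without being interpreted.
    """
    tok = line.split()
    if tok[2] != "multipath":
        raise ValueError(f"not a multipath table: {tok[2]}")
    i = 3
    n_features = int(tok[i])
    features = tok[i + 1:i + 1 + n_features]
    i += 1 + n_features
    i += 1 + int(tok[i])                 # hardware handler argument count + args
    n_groups, initial_group = int(tok[i]), int(tok[i + 1])
    i += 2
    groups = []
    for _ in range(n_groups):
        selector = tok[i]
        i += 2 + int(tok[i + 1])         # selector name, arg count, args
        n_paths, n_path_args = int(tok[i]), int(tok[i + 1])
        i += 2
        paths = []
        for _ in range(n_paths):
            paths.append((tok[i], tok[i + 1:i + 1 + n_path_args]))
            i += 1 + n_path_args
        groups.append({"selector": selector, "paths": paths})
    return {"length": int(tok[1]), "features": features,
            "initial_group": initial_group, "groups": groups}


failover = ("0 16803840 multipath 0 0 2 1 round-robin 0 1 1 8:144 100 "
            "round-robin 0 1 1 65:16 100")
t = parse_multipath_table(failover)
print([g["paths"] for g in t["groups"]])
# [[('8:144', ['100'])], [('65:16', ['100'])]]

# Hypothetical multibus table for a map like mpath89 (length made up):
multibus = ("0 629145600 multipath 1 queue_if_no_path 0 1 1 "
            "round-robin 0 2 1 8:80 100 8:96 100")
print(parse_multipath_table(multibus)["groups"])
```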


-- 
:wq!
kevin.foote



* Re: DM-Multipath path failure questions..
From: Michael Vallaly @ 2007-11-14 21:44 UTC
  To: dm-devel

Kevin,

Correct me if I'm wrong, but if I change the path_grouping_policy to "failover", I lose the ability to aggregate IO traffic across multiple active paths at once. Unfortunately, in our situation the performance hit would be undesirable.

With your setup, do your backend block device names ever change? Say, if you had an extended network outage and had to manually reconnect to the SAN?

-Mike

On Wed, 14 Nov 2007 13:26:32 -0500
"Kevin Foote" <kevin.foote@gmail.com> wrote:

> Mike,
> If you are going for failover things should look like this..
> We go to an Equallogic PS box as well .. through a QLA4052 HBA
> 
> You need to change this line in your /etc/multipath.conf file to
> reflect what you want multipathd to do.
> >         path_grouping_policy    multibus
> should read ...
>          path_grouping_policy    failover
> 
> In turn your maps will look like this ..  (multipath -ll)
> media-oracle (30690a018a0b3d369dd4f04191c4090f9)
> [size=8 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>   \_ 2:0:8:0 sdj 8:144 [active][ready]
> \_ round-robin 0 [enabled]
>   \_ 3:0:8:0 sdr 65:16 [active][ready]
> 
> and dmsetup table <dev>
> #> dmsetup table /dev/mapper/media-oracle
> 0 16803840 multipath 0 0 2 1 round-robin 0 1 1 8:144 100 round-robin 0
> 1 1 65:16 100
> 
> will show a multipath failover setup
> 
> 
> -- 
> :wq!
> kevin.foote
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


-- 

Michael Vallaly
Senior System Administrator
CashNetUSA
200 W Jackson Blvd, Suite 2400
Chicago, IL 60606
mvallaly@cashnetusa.com
(W) 312-568-4230 
(P) 312-933-9589
(C) 847-769-5568



* Re: DM-Multipath path failure questions..
From: Michael Vallaly @ 2007-11-14 22:33 UTC
  To: dm-devel


Mike,

Long time no chat ;)

We recently discovered a "bug" in EqualLogic's firmware (it's a rare corner case, I'm told) which has the unfortunate side effect of logging out our first iSCSI initiator when using MPIO. It does so with what I would consider a "bogus" (non-recoverable) error code. This in turn kills one of our open-iscsi sessions (MPIO paths), and the multipather seems to wedge itself once the backend device associated with that session gets removed. We are currently working with EqualLogic to fix the issue (they have verified the bug, are able to reproduce it, and have a fix available in the next firmware release). My line of questioning here is to see whether there is a way to catch this or similar issues (iSCSI session termination) in the future and prevent them from affecting IO at the DM-Multipath layer.

Thanks again for all your help.

-Mike

On Wed, 14 Nov 2007 11:28:37 -0600
Mike Christie <michaelc@cs.wisc.edu> wrote:

> I was wondering what you are doing on the target to cause the device/sdX 
> to be removed or what error you get? Normally that only happens if you 
> run the iscsiadm logout command, or if the target is sends the initiator 
> a error indicating that is going away for good, or there is some other 
> error like the CHAP values changed on the target. And in older versions 
> of open-iscsi there is a bug where it kills the session and removes sdXs 
> a little early on errors that should be recoverable (We found the bug in 
> 865-* but this is fixed in the open-iscsi git tree and will be fixed in 
> the new release), so I just want to make sure I got all the recoverable 
> errors.
> 
> What kernel are you using, and what happens when you reconnect the 
> session and get a new sdX if you run the multipath command by hand?
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


* Re: DM-Multipath path failure questions..
From: Michael Vallaly @ 2007-11-14 23:39 UTC
  To: dm-devel

Kevin,

Correct me if I'm wrong, but if I change the path_grouping_policy to "failover", I lose the ability to aggregate IO traffic across multiple active paths at once. Unfortunately, in our situation the performance hit would be undesirable.

With your setup, do your backend block device names ever change? Say, if you had an extended network outage and had to manually reconnect to the SAN?

-Mike

On Wed, 14 Nov 2007 13:26:32 -0500
"Kevin Foote" <kevin.foote@gmail.com> wrote:

> Mike,
> If you are going for failover things should look like this..
> We go to an Equallogic PS box as well .. through a QLA4052 HBA
> 
> You need to change this line in your /etc/multipath.conf file to
> reflect what you want multipathd to do.
> >         path_grouping_policy    multibus
> should read ...
>          path_grouping_policy    failover
> 
> In turn your maps will look like this ..  (multipath -ll)
> media-oracle (30690a018a0b3d369dd4f04191c4090f9)
> [size=8 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>   \_ 2:0:8:0 sdj 8:144 [active][ready]
> \_ round-robin 0 [enabled]
>   \_ 3:0:8:0 sdr 65:16 [active][ready]
> 
> and dmsetup table <dev>
> #> dmsetup table /dev/mapper/media-oracle
> 0 16803840 multipath 0 0 2 1 round-robin 0 1 1 8:144 100 round-robin 0
> 1 1 65:16 100
> 
> will show a multipath failover setup
> 
> 
> -- 
> :wq!
> kevin.foote
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel


* Re: DM-Multipath path failure questions..
From: Kevin Foote @ 2007-11-15  4:55 UTC
  To: device-mapper development



As far as I know, you should be able to do the same as with a SAN: you can
have multiple block-device entries (to handle your aggregated IO) per path
group. Someone on the list, please correct me if I'm wrong. So in my
previous example, things would expand like this to handle your aggregated IO
to the device.

media-oracle (30690a018a0b3d369dd4f04191c4090f9)
[size=8 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 2:0:8:0 sdj 8:144 [active][ready]  }
 \_ another dev here                   } These all make up path group one
 \_ another dev here                   }
\_ round-robin 0 [enabled]
 \_ 3:0:8:0 sdr 65:16 [active][ready]  }
 \_ another dev here                   }
 \_ another dev here                   } These all make up path group two
 \_ another dev here                   }
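Per the multipath.conf documentation, failover places one path in each priority group, so only one path carries IO at a time, while multibus places all paths in a single group that the round-robin selector cycles through. The sketch below models just those two placements; group_paths and paths_carrying_io are hypothetical, and the sketch treats the first group as the active one. Under those definitions failover always yields single-path groups; groups holding several paths, as drawn above, would come from one of the attribute-based policies such as group_by_prio, and only when the paths differ in that attribute.

```python
from typing import List


def group_paths(paths: List[str], policy: str) -> List[List[str]]:
    """Place paths into priority groups for two path_grouping_policy values.

    failover: one path per priority group.
    multibus: every path in a single priority group.
    Other policies are deliberately not modelled.
    """
    if policy == "failover":
        return [[p] for p in paths]
    if policy == "multibus":
        return [list(paths)] if paths else []
    raise ValueError(f"policy not modelled in this sketch: {policy}")


def paths_carrying_io(groups: List[List[str]]) -> int:
    """IO goes to one priority group at a time; this sketch treats the first as active."""
    return len(groups[0]) if groups else 0


sessions = ["sdf", "sdg"]  # the two sessions from the mpath89 example
for policy in ("failover", "multibus"):
    groups = group_paths(sessions, policy)
    print(policy, groups, paths_carrying_io(groups))
# failover [['sdf'], ['sdg']] 1
# multibus [['sdf', 'sdg']] 2
```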

Our iSCSI network is very isolated and built with failover, so unless we lose
two switches at once there should be no hiccup. Sorry I can't fill in any more
on that end.

On a side note, can we go off-list? I'd like to hear more about your
EqualLogic issue. I've got some weird nonsense filling up my OS logs and
EqualLogic can't figure it out either.

-- 
:wq!
kevin.foote

On 11/14/07, Michael Vallaly <vaio@nolatency.com> wrote:
>
> Kevin,
>
> Correct me if im wrong, but I if I changed the path_grouping_policy to
> "failover" I lose the ability to aggregate IO traffic across multiple active
> paths at once. Unfortunately in our situation the performance hit would be
> undesirable.
>
> With your setup do your backend block device names ever change? Say if you
> had an extended network outage and had to manually reconnect to the SAN?
>
> -Mike
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>


Thread overview: 7 messages
2007-11-14  6:07 DM-Multipath path failure questions Michael Vallaly
2007-11-14 17:28 ` Mike Christie
2007-11-14 22:33   ` Michael Vallaly
2007-11-14 18:26 ` Kevin Foote
2007-11-14 21:44   ` Michael Vallaly
2007-11-14 23:39   ` Michael Vallaly
2007-11-15  4:55     ` Kevin Foote
