* rdac priority checker changing priorities
@ 2009-04-29 22:34 Lucas Brasilino
  2009-04-30  6:25 ` Hannes Reinecke
  0 siblings, 1 reply; 8+ messages in thread
From: Lucas Brasilino @ 2009-04-29 22:34 UTC (permalink / raw)
  To: dm-devel

Hi

I don't know if I'm misunderstanding something. I've got a DS4700 and I'm
switching from RDAC[1] to multipath, since multipath is natively supported
in the distribution I use here (SLES 10 SP2).

Since RDAC[1] works perfectly, I'm trying to use the 'rdac' priority in multipath.

My /etc/multipath.conf is quite tiny, since I'm building it step-by-step :-) :

blacklist {
	devnode "^sda[0-9]*"
}

defaults {
	user_friendly_names	yes
	prio			rdac
	path_checker		tur
}

multipaths {
	multipath {
		wwid	3600a0b8000327b900000107549f85224
		alias	mpath0
	}
}

I think that using 'prio rdac' makes multipath use the 'mpath_prio_rdac' tool.

 # multipath -v2 -ll
mpath0 (3600a0b8000327b900000107549f85224) dm-0 IBM,1814      FAStT
[size=140G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 9:0:0:0  sdb 8:16  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 10:0:0:0 sdc 8:32  [active][ghost]

So the first path has priority 6, as I can confirm:

# mpath_prio_rdac /dev/sdb
6
# mpath_prio_rdac /dev/sdc
1
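A minimal Python sketch of how these two values could decompose, assuming the per-path weights quoted later in this thread (preferred = 4, active = 2, non-preferred = 1). The function name is hypothetical and this is not the actual mpath_prio_rdac code:

```python
# Hypothetical model of the rdac per-path priority, assuming the weights
# quoted in this thread: preferred = 4, active = 2, non-preferred = 1.
PREFERRED = 4
ACTIVE = 2
NON_PREFERRED = 1


def rdac_path_prio(preferred: bool, active: bool) -> int:
    """Return the modelled priority of a single, non-failed path."""
    prio = PREFERRED if preferred else NON_PREFERRED
    if active:
        prio += ACTIVE
    return prio


# sdb: path to the preferred controller, which currently owns the LUN
print(rdac_path_prio(preferred=True, active=True))    # 6
# sdc: path to the other controller, passive ("ghost")
print(rdac_path_prio(preferred=False, active=False))  # 1
```

Under this model, once the LUN has moved to the other controller, a restored sdb would score 4 and sdc 3; the 2 and 5 reported below after the restore do not fit it.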

After the first path (prio=6) fails, I get:

# multipath -v2 -ll
sdb: rdac prio: inquiry command indicates error
mpath0 (3600a0b8000327b900000107549f85224) dm-0 IBM,1814      FAStT
[size=140G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 9:0:0:0  sdb 8:16  [failed][faulty]
\_ round-robin 0 [prio=1][enabled]
 \_ 10:0:0:0 sdc 8:32  [active][ghost]

OK, working great: the second path takes over. But after the faulty
path is restored:

# multipath -v2 -ll
mpath0 (3600a0b8000327b900000107549f85224) dm-0 IBM,1814      FAStT
[size=140G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=2][enabled]
 \_ 9:0:0:0  sdb 8:16  [active][ghost]
\_ round-robin 0 [prio=5][active]
 \_ 10:0:0:0 sdc 8:32  [active][ready]

The second path now has the higher priority!!! And of course it does not
fail back! By the way, my LUN is configured on the DS4700 in such a way
that the first path *is* the path to the preferred controller.

I think path priorities should not change; if they didn't, the first path
would go back to 'active' status.
Am I misunderstanding something? Or messing things up?

By the way, here is the default 'multipath.conf':

#defaults {
#       udev_dir                /dev
#       polling_interval        10
#       selector                "round-robin 0"
#       path_grouping_policy    multibus
#       getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
#       prio                    const
#       path_checker            directio
#       rr_min_io               100
#       max_fds                 8192
#       rr_weight               priorities
#       failback                immediate
#       no_path_retry           fail
#       user_friendly_names     no
#}
#blacklist {
#       wwid 26353900f02796769
#       devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
#       devnode "^hd[a-z][[0-9]*]"
#       devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
#       device {
#               vendor DEC.*
#               product MSA[15]00
#       }
#}
[...]
#devices {
#       device {
#               vendor                  "COMPAQ  "
#               product                 "HSV110 (C)COMPAQ"
#               path_grouping_policy    multibus
#               getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
#               path_checker            directio
#               path_selector           "round-robin 0"
#               hardware_handler        "0"
#               failback                15
#               rr_weight               priorities
#               no_path_retry           queue
#               rr_min_io               100
#               product_blacklist       LUN_Z
#       }
#       device {
#               vendor                  "COMPAQ  "
#               product                 "MSA1000         "
#               path_grouping_policy    multibus
#       }
#}

Thanks a lot in advance
Lucas Brasilino

[1] http://www.lsi.com/rdac/ds4000.html


* Re: rdac priority checker changing priorities
  2009-04-29 22:34 rdac priority checker changing priorities Lucas Brasilino
@ 2009-04-30  6:25 ` Hannes Reinecke
  2009-04-30 18:05   ` Chandra Seetharaman
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2009-04-30  6:25 UTC (permalink / raw)
  To: device-mapper development

Hi Lucas,

Lucas Brasilino wrote:
> [...]
> 
> I think path priorities should not change; if they didn't, the first path
> would go back to 'active' status.
> Am I misunderstanding something? Or messing things up?
> 
You are using an old version of the multipathing tools for SLES10 SP2.
It had a bug that triggered priority inversion on RDAC.
Please update to the latest version.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


* Re: rdac priority checker changing priorities
  2009-04-30  6:25 ` Hannes Reinecke
@ 2009-04-30 18:05   ` Chandra Seetharaman
  2009-05-04 10:43     ` Hannes Reinecke
  2009-05-05 17:59     ` Lucas Brasilino
  0 siblings, 2 replies; 8+ messages in thread
From: Chandra Seetharaman @ 2009-04-30 18:05 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

Hannes,

I think we need to revisit the priority value we provide for a preferred
path (4) relative to an active path (2) and a non-preferred path (1).

Consider the following scenario:

Access to a LUN is through 2 preferred and 2 non-preferred paths. Let's
call the path group with the preferred paths pg1 and the one with the
non-preferred paths pg2.

Initially pg1 has a priority of 12 (2 paths times (preferred + active))
and pg2 has a priority of 2. pg1 is chosen and I/O goes through pg1, all
good.

Both paths in pg1 fail, so pg2 is made the active path group and I/O is
sent through it. Since its paths are now "active", its priority rises to
6 (2 paths times (active + non-preferred)).

When one of the paths in pg1 comes back, one would expect failback to
happen. It doesn't happen, because pg1's priority (4) is smaller than
that of pg2 (6), which is not correct.

We can do the same exercise with more paths (6, 8, etc.), and the
results are worse.

So, IMO we need to give the preferred path a disproportionately large
value relative to the active and non-preferred values. What do you think?
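The scenario above can be modelled in a few lines of Python, assuming a path group's priority is the sum of its usable paths' priorities, with weights preferred = 4, active = 2, non-preferred = 1 and failed paths scoring 0. The Path class and pg_prio() are hypothetical illustrations, not multipath-tools code. With two preferred active paths the initial pg1 value comes out as 12, the figure shown in the typescript attached later in the thread:

```python
# Hypothetical model of summed path group priorities, assuming per-path
# weights preferred = 4, active = 2, non-preferred = 1; failed paths score 0.
from dataclasses import dataclass


@dataclass
class Path:
    preferred: bool
    active: bool = False
    failed: bool = False

    def prio(self) -> int:
        if self.failed:
            return 0
        base = 4 if self.preferred else 1
        return base + (2 if self.active else 0)


def pg_prio(paths: list[Path]) -> int:
    """Path group priority as the sum of its paths' priorities."""
    return sum(p.prio() for p in paths)


# (1) pg1 (two preferred paths) owns the LUN.
pg1 = [Path(True, active=True), Path(True, active=True)]
pg2 = [Path(False), Path(False)]
print(pg_prio(pg1), pg_prio(pg2))  # 12 2

# (2) Both pg1 paths fail; pg2 takes over and its paths become active.
pg1 = [Path(True, failed=True), Path(True, failed=True)]
pg2 = [Path(False, active=True), Path(False, active=True)]
print(pg_prio(pg1), pg_prio(pg2))  # 0 6

# (3) One pg1 path returns, still passive because pg2 owns the LUN.
pg1 = [Path(True), Path(True, failed=True)]
print(pg_prio(pg1), pg_prio(pg2))  # 4 6: pg1 loses, so no failback
```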

chandra





* Re: rdac priority checker changing priorities
  2009-04-30 18:05   ` Chandra Seetharaman
@ 2009-05-04 10:43     ` Hannes Reinecke
  2009-05-04 17:30       ` Chandra Seetharaman
  2009-06-23  0:47       ` Chandra Seetharaman
  2009-05-05 17:59     ` Lucas Brasilino
  1 sibling, 2 replies; 8+ messages in thread
From: Hannes Reinecke @ 2009-05-04 10:43 UTC (permalink / raw)
  To: sekharan; +Cc: dm-devel

Hi Chandra,

Chandra Seetharaman wrote:
> [...]
> 
> Both paths in pg1 fail, so pg2 is made the active path group and I/O is
> sent through it. Since its paths are now "active", its priority rises to
> 6 (2 paths times (active + non-preferred)).
> 
> When one of the paths in pg1 comes back, one would expect failback to
> happen. It doesn't happen, because pg1's priority (4) is smaller than
> that of pg2 (6), which is not correct.
> 
Is this really a valid case?
This means we'll have a setup like this:

rdac
 pg1
  sda failed
  sdb failed
 pg2
  sdc active
  sdd active

Correct?
So, given your assumptions, the proposed scenario would be represented
like this:

rdac
 pg1
  sda active
  sdb failed
 pg2
  sdc active
  sdd active

So is it really a good idea to switch paths in this case? The 'sdb'
path would not be reachable here, so any path switch command wouldn't
have been received, either. I'm not sure _what_ is going to happen
when we switch paths now and sdb comes back later; but most likely
the entire setup will be messed up then:
  sda (pref & owned) 6
  sdb                0
  sdc (sec)          1
  sdd (sec & owned)  3
and the path layout will be thoroughly jumbled.
So I don't really like this idea. We should only switch paths
when _all_ paths of a path group become available again.
That assumes the active group still has at least one working path,
of course; if all of its paths have failed, we should switch regardless.
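A minimal sketch of the switching rule proposed above, assuming each path group is represented only by a list of per-path "usable" flags; should_switch() is a hypothetical helper, not part of multipath-tools. The candidate is the higher-priority (preferred) group and the active group is the one currently carrying I/O:

```python
# Hypothetical sketch of the proposed failback rule. Each group is given
# as a list of booleans, one per path: True if the path is usable.


def should_switch(candidate_up: list[bool], active_up: list[bool]) -> bool:
    """Decide whether to switch I/O from the active group to the candidate."""
    if not any(active_up):
        # The active group has no usable path left: switch regardless,
        # as long as the candidate has at least one usable path.
        return any(candidate_up)
    # Otherwise only switch once *all* candidate paths are back.
    return all(candidate_up)


print(should_switch([True, False], [True, True]))    # False: sdb still down
print(should_switch([True, True], [True, True]))     # True: all of pg1 back
print(should_switch([True, False], [False, False]))  # True: pg2 is down
```

In the case under discussion (sda back, sdb still failed, pg2 healthy), this rule keeps I/O on pg2.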

Cheers,

Hannes


* Re: rdac priority checker changing priorities
  2009-05-04 10:43     ` Hannes Reinecke
@ 2009-05-04 17:30       ` Chandra Seetharaman
  2009-06-23  0:47       ` Chandra Seetharaman
  1 sibling, 0 replies; 8+ messages in thread
From: Chandra Seetharaman @ 2009-05-04 17:30 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

Hi Hannes
On Mon, 2009-05-04 at 12:43 +0200, Hannes Reinecke wrote:

<snip>
> > 
> Is this really a valid case?

Yes, having more than one path to one (or both) controller(s) is a valid
case.

> This means we'll have a setup like this:
> 
> rdac
>  pg1
>   sda failed
>   sdb failed
>  pg2
>   sdc active
>   sdd active
> 
> Correct?

Correct.
> So, given your assumptions, the proposed scenario would be represented

It is not an assumption; it is the behavior I have seen :)
> like this:
> 
> rdac
>  pg1
>   sda active
>   sdb failed
>  pg2
>   sdc active
>   sdd active
> 
> So is it really a good idea to switch paths in this case? The 'sdb'

Yes. We need to switch for two reasons:
 - Since a preferred path is available, we _should_ use it
   (otherwise it will throw off the load balancing the admin has set up
   in the storage).
 - To be consistent with multipath's state before access to the
   preferred controller failed, i.e. if multipath configures a dm
   device in this state, multipath _does_ make pg1 the active path
   group.

> path would not be reachable here, so any path switch command wouldn't
> have been received, either. I'm not sure _what_ is going to happen

Since both paths lead to the same controller, the mode select sent
for sda would also have made sdb a path to the active controller. But,
as you mentioned, this is not seen by dm-multipath.

> when we switch paths now and sdb comes back later; but most likely

The patch I re-submitted last week ("Handle multipath paths in a path
group properly during pg_init":
http://marc.info/?l=dm-devel&m=124094710300894&w=2) handles this
situation correctly by sending an activate during reinstate.

> the entire setup will be messed up then:
>   sda (pref & owned) 6
>   sdb                0
>   sdc (sec)          1
>   sdd (sec & owned)  3

No, this will not be the case. As soon as access to sdb comes back,
it will be seen as preferred and owned, and hence will get a priority
value of 6.

Also, as soon as sda has been made active, sdd will become
passive/ghost, and hence will have a priority value of 1.

> and the path layout will be thoroughly jumbled.
> So I don't really like this idea. We should only switch paths
> when _all_ paths of a path group become available again.
> That assumes the active group still has at least one working path,
> of course; if all of its paths have failed, we should switch regardless.
> 
Here are the details:

===========================================================
(1) Initial configuration (all are good):
pg1
  sda (pref and active) - 6
  sdb (pref and active) - 6
pg2
  sdc (sec and passive) - 1
  sdd (sec and passive) - 1
------
(2) Access to sdb goes down
------
pg1
  sda (pref and active) - 6
  sdb (not there)       - 0
pg2
  sdc (sec and passive) - 1
  sdd (sec and passive) - 1
------
(3) Access to sda goes down, path group switches
------
pg1
  sda (not there)       - 0
  sdb (not there)       - 0
pg2
  sdc (sec and active)  - 3
  sdd (sec and active)  - 3
------
(4) sda comes back; a path group switch _should_ happen here,
    to be consistent with (1). If the path group switch happens, sda
    will have a priority of 6 and sdc/sdd will have a priority
    of 1 each (as they will become passive).
    The path switch can happen only if the priority we give a preferred
    path is a lot more than the sum of the priorities of all the paths
    in the other path group.
------
pg1
  sda (pref and passive)- 4
  sdb (not there)       - 0
pg2
  sdc (sec and active)  - 3
  sdd (sec and active)  - 3

Hope it is clear now.
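A small sketch of the condition in step (4), assuming summed priorities, that a restored preferred path stays passive and scores only the preferred weight, and that each active non-preferred path scores active + non-preferred = 3. The helper names are hypothetical. It also shows that a larger fixed weight, such as the 50 proposed later in the thread, only moves the limit:

```python
# Hypothetical sketch of the step (4) condition under summed priorities:
# one restored, still passive preferred path scores `pref_weight`; each
# active non-preferred path scores ACTIVE + NON_PREFERRED.
ACTIVE = 2
NON_PREFERRED = 1


def failback_possible(pref_weight: int, n_active_nonpref: int) -> bool:
    """True if one passive preferred path outscores the whole other group."""
    return pref_weight > n_active_nonpref * (ACTIVE + NON_PREFERRED)


def max_nonpref_paths(pref_weight: int) -> int:
    """Largest number of active non-preferred paths one preferred path beats."""
    return (pref_weight - 1) // (ACTIVE + NON_PREFERRED)


print(failback_possible(4, 2))   # False: 4 < 2 * 3, as in step (4)
print(max_nonpref_paths(4))      # 1
print(max_nonpref_paths(50))     # 16
```

With the current weight of 4, a single preferred path outscores only one active non-preferred path; with 50, the limit under this model is 16.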
> Cheers,
> 
> Hannes


* Re: rdac priority checker changing priorities
  2009-04-30 18:05   ` Chandra Seetharaman
  2009-05-04 10:43     ` Hannes Reinecke
@ 2009-05-05 17:59     ` Lucas Brasilino
  1 sibling, 0 replies; 8+ messages in thread
From: Lucas Brasilino @ 2009-05-05 17:59 UTC (permalink / raw)
  To: device-mapper development

Well, maybe it's a newbie opinion, but it looks like things would be
easier if each path had a 'fixed value' priority, wouldn't they? I've
written a little code here that returns a chosen fixed value for each
path, and everything works fine.....
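A hypothetical sketch of the 'fixed value' approach, assuming a static table keyed by device name; the names and values below are illustrative and are not Lucas's actual code:

```python
# Hypothetical "fixed value" prioritizer: priorities come from a static
# table instead of controller state. Names and values are illustrative.
FIXED_PRIO = {
    "sdb": 10,  # path to the preferred controller
    "sdc": 1,   # path to the alternate controller
}


def fixed_prio(dev: str, default: int = 1) -> int:
    """Return the configured priority for a device, or a default."""
    return FIXED_PRIO.get(dev, default)


def pg_prio(devs: list[str], failed: set[str]) -> int:
    """Summed priority of a path group, ignoring failed paths."""
    return sum(fixed_prio(d) for d in devs if d not in failed)


# While sdb is down, the group holding sdc wins ...
print(pg_prio(["sdb"], {"sdb"}), pg_prio(["sdc"], set()))  # 0 1
# ... and once sdb is back, its group outranks sdc's again.
print(pg_prio(["sdb"], set()), pg_prio(["sdc"], set()))    # 10 1
```

Because the values do not depend on controller ownership, the preferred group keeps its higher priority after a failover. The trade-off is that the table has to be kept in sync with the storage configuration by hand, and sdX names are not guaranteed to be stable across reboots.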

regards
Lucas Brasilino



* Re: rdac priority checker changing priorities
  2009-05-04 10:43     ` Hannes Reinecke
  2009-05-04 17:30       ` Chandra Seetharaman
@ 2009-06-23  0:47       ` Chandra Seetharaman
  2009-06-23  6:20         ` Hannes Reinecke
  1 sibling, 1 reply; 8+ messages in thread
From: Chandra Seetharaman @ 2009-06-23  0:47 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

Hi Hannes,

Please see the attached file for a real example.

Can I go ahead and generate a patch to increase the priority of the
preferred path to, say, 50?

chandra

[-- Attachment #2: typescript --]
[-- Type: text/plain, Size: 4379 bytes --]

$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=12][active]
 \_ 2:0:1:4 sdu 65:64 [active][ready]
 \_ 2:0:0:4 sdp 8:240 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 1:0:1:4 sdk 8:160 [active][ghost]
 \_ 1:0:0:4 sdf 8:80  [active][ghost]
$ # disabled one preferred path
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
sdp: rdac prio: inquiry command indicates error
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 2:0:1:4 sdu 65:64 [active][ready]
 \_ 2:0:0:4 sdp 8:240 [failed][faulty]
\_ round-robin 0 [prio=2][enabled]
 \_ 1:0:1:4 sdk 8:160 [active][ghost]
 \_ 1:0:0:4 sdf 8:80  [active][ghost]
$ # ALL GOOD
$ # disabled another preferred path
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
sdu: rdac prio: inquiry command indicates error
sdp: rdac prio: inquiry command indicates error
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:1:4 sdu 65:64 [failed][faulty]
 \_ 2:0:0:4 sdp 8:240 [failed][faulty]
\_ round-robin 0 [prio=6][active]
 \_ 1:0:1:4 sdk 8:160 [active][ready]
 \_ 1:0:0:4 sdf 8:80  [active][ready]
$ # failed over to the non-preferred path
$ # that is good
$ # disabled a non-preferred path
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
sdu: rdac prio: inquiry command indicates error
sdp: rdac prio: inquiry command indicates error
sdk: rdac prio: inquiry command indicates error
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:1:4 sdu 65:64 [failed][faulty]
 \_ 2:0:0:4 sdp 8:240 [failed][faulty]
\_ round-robin 0 [prio=3][active]
 \_ 1:0:1:4 sdk 8:160 [failed][faulty]
 \_ 1:0:0:4 sdf 8:80  [active][ready]
$ # all good
$ # enabled a non-preferred path
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
sdu: rdac prio: inquiry command indicates error
sdp: rdac prio: inquiry command indicates error
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:1:4 sdu 65:64 [failed][faulty]
 \_ 2:0:0:4 sdp 8:240 [failed][faulty]
\_ round-robin 0 [prio=6][active]
 \_ 1:0:1:4 sdk 8:160 [active][ready]
 \_ 1:0:0:4 sdf 8:80  [active][ready]
$ # Good
$ # enabled a preferred path.
$ # expected failback to the preferred path group
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
sdp: rdac prio: inquiry command indicates error
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=4][enabled]
 \_ 2:0:1:4 sdu 65:64 [active][ghost]
 \_ 2:0:0:4 sdp 8:240 [failed][faulty]
\_ round-robin 0 [prio=6][active]
 \_ 1:0:1:4 sdk 8:160 [active][ready]
 \_ 1:0:0:4 sdf 8:80  [active][ready]
$ # no, failback did not happen (note the first path group still shows "ghost").
$ # the reason is that the priority of the preferred path group is less than
$ # that of the non-preferred path group.
$ # Basically, the non-preferred paths are used even though a preferred path is available,
$ # which is not correct.
$ # wait for a minute, maybe
$ sleep 60
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
sdp: rdac prio: inquiry command indicates error
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=4][enabled]
 \_ 2:0:1:4 sdu 65:64 [active][ghost]
 \_ 2:0:0:4 sdp 8:240 [failed][faulty]
\_ round-robin 0 [prio=6][active]
 \_ 1:0:1:4 sdk 8:160 [active][ready]
 \_ 1:0:0:4 sdf 8:80  [active][ready]
$ # nope... failback didn't happen.
$ # enabled the other preferred path.
$ # only now the failback happens.
$ 
$ multipath -ll 3600a0b800011a1ee00003f834a3f7a65
3600a0b800011a1ee00003f834a3f7a65 dm-0 IBM,1815      FAStT
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=12][active]
 \_ 2:0:1:4 sdu 65:64 [active][ready]
 \_ 2:0:0:4 sdp 8:240 [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 1:0:1:4 sdk 8:160 [active][ghost]
 \_ 1:0:0:4 sdf 8:80  [active][ghost]
$ exit






* Re: rdac priority checker changing priorities
  2009-06-23  0:47       ` Chandra Seetharaman
@ 2009-06-23  6:20         ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2009-06-23  6:20 UTC (permalink / raw)
  To: sekharan; +Cc: dm-devel

Chandra Seetharaman wrote:
> Hi Hannes,
> 
> Please see the attached file for the real example.
> 
> Can I go ahead and generate a patch to increase the priority of the
> preferred path to say, 50 ?
> 
No. That's just wrong and we'll run into the same problem once
someone increases the number of paths to 50.

What we should do here is to modify the priority value, or
rather the way priority is used.

We should split the current priority value into two
fields: pg priority and number of paths in a pg.
The pg priority here is the priority of a _single_ path and,
by definition, since we're using group_by_prio, the priority
of each path in the pg.

Then we should modify the algorithm that chooses the
next pg to do something like this:

-> Choose the pg with the highest priority.
-> If two pgs have the same priority, choose the pg
   with the highest path count. Maybe we could
   even use the highest _valid_ path count here,
   depending on whether we have that information at that point.
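A minimal Python sketch of the proposed selection rule, assuming each path group carries a single per-path priority (as with group_by_prio) and a count of valid paths. PathGroup and choose_pg() are hypothetical names, and skipping groups with no valid path is an added assumption:

```python
# Hypothetical sketch of the proposed path group selection: compare by
# per-path priority first, then by number of valid paths.
from dataclasses import dataclass


@dataclass
class PathGroup:
    name: str
    path_prio: int    # priority of a single path in the group
    valid_paths: int  # number of usable paths in the group


def choose_pg(groups: list[PathGroup]) -> PathGroup:
    """Pick the next path group; groups without valid paths are skipped."""
    usable = [g for g in groups if g.valid_paths > 0]
    if not usable:
        raise ValueError("no usable path group")
    # Tuples compare element by element: priority wins, path count breaks ties.
    return max(usable, key=lambda g: (g.path_prio, g.valid_paths))


# Step (4) of the earlier example: one passive preferred path (4) versus
# two active non-preferred paths (3 each).
pg1 = PathGroup("pg1", path_prio=4, valid_paths=1)
pg2 = PathGroup("pg2", path_prio=3, valid_paths=2)
print(choose_pg([pg1, pg2]).name)  # pg1
```

Because the number of paths no longer inflates a group's priority, a single restored preferred path is enough to win back I/O in this model, regardless of how many non-preferred paths exist.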

This algorithm would solve the problem we're having
now once and for all.

Just adding up the priorities of the individual paths will
always lead to this type of problem.

I'll see if I can find some time to draw up a patch.

Cheers,

Hannes


Thread overview: 8+ messages
2009-04-29 22:34 rdac priority checker changing priorities Lucas Brasilino
2009-04-30  6:25 ` Hannes Reinecke
2009-04-30 18:05   ` Chandra Seetharaman
2009-05-04 10:43     ` Hannes Reinecke
2009-05-04 17:30       ` Chandra Seetharaman
2009-06-23  0:47       ` Chandra Seetharaman
2009-06-23  6:20         ` Hannes Reinecke
2009-05-05 17:59     ` Lucas Brasilino
