From: Hannes Reinecke <hare@suse.de>
To: dm-devel@redhat.com
Subject: Re: multipath-tools: scsi_id based path priorities and multiple prioritizers
Date: Tue, 21 May 2013 16:19:34 +0200
Message-ID: <519B8276.9070203@suse.de>
In-Reply-To: <001301ce53be$64265b10$2c731130$@larionov@salva.ee>

On 05/18/2013 01:54 PM, Viktor Larionov wrote:
> Hi everybody!
> 
> First of all, thanks for all the hard work you guys have been doing
> developing dm. It’s an amazing piece of work you have done!
> 
> While working with dm-multipath we bumped into some limitations
> that we felt a bit uncomfortable with, and that we seem to have managed
> to work around. I thought I’d share the experience with others, in
> the hope that it helps somebody.
> 
> Long story short: our servers are connected to our SAN with both FC
> and iSCSI links (the same targets and the same WWIDs are exported
> through both FC and iSCSI).
> 
> It is pretty much a standard installation: two independent controllers on
> the storage side (FC and iSCSI each), and dual-port FC controllers plus
> iSCSI on the server side.
> 
> All this leaves us with approximately 6 paths per device (2 FC
> and 4 iSCSI: 1 FC and 2 iSCSI per storage controller).
> 
> 
> Now if we use ALUA, which is standard for our infrastructure (IBM Storwize
> V3700), the picture looks pretty much like this:
> 
> alessandra viktor.larionov # multipath -ll www-2-mysql
> www-2-mysql (360050763008080581000000000000029) dm-37 IBM,2145
> size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
> |-+- policy='round-robin 0' prio=50 status=active
> | |- 2:0:0:9  sdak 66:64   active ready running
> | |- 3:0:0:9  sdcf 69:48   active ready running
> | `- 4:0:0:9  sdcy 70:96   active ready running
> `-+- policy='round-robin 0' prio=10 status=enabled
>   |- 1:0:0:9  sdl  8:176   active ready running
>   |- 5:0:0:9  sdcb 68:240  active ready running
>   `- 6:0:0:9  sdct 70:16   active ready running
> 
> Here sdak and sdl are fiber links and the rest are iSCSI.
> The priorities come from ALUA and correspond to the SAN controller
> preference at this particular moment.
> 
> What we don’t like about this setup is that FC and iSCSI links end
> up with the same priority in the same group. The whole point of having
> iSCSI links on machines that already have FC is redundancy against FC
> failures.
> 
> But we surely don’t want to use the iSCSI links while
> either the primary or the backup FC link is fully operational.
> 
> 
> This led us to the idea of somehow telling the prioritizer to be
> more granular and to separate FC and iSCSI controller priorities. After
> several hours of googling, I found out that we are not the
> only ones with this problem, and that there has been no solution so
> far (take this one for example:
> http://www.redhat.com/archives/dm-devel/2008-August/msg00083.html).
> In fact prio_callout, which could possibly solve this kind of thing,
> is deprecated.
> 
> 
> It’s true that there’s no easy or trivial way to determine whether a path
> behind an sg device is fiber or iSCSI (or something else). But thinking
> about this issue, we realized that we would actually be satisfied if we
> could just assign a custom priority based on the scsi_id of the device
> (its host:channel:target:LUN address). The idea behind it is simple: in
> our case we have an IBM ServeRAID controller, which is SCSI host 0, an
> Emulex Light Pulse, which is SCSI hosts 1 and 2 (one for each port), and
> all of the rest is iSCSI. So if we could give static priorities based on
> this information, that would do the trick.
> 
> 
> So we poked around in the code a bit and wrote a custom prioritizer
> called sg_id (a patch for the latest multipath-tools is available here:
> http://viktor.ee/multipath-tools-patches/sg_id_prio.patch).
> 
> Usage is very simple: set prio "sg_id" in /etc/multipath.conf and
> pass the priorities through prio_args as regexes. For example, a
> prio_args of
> 
> prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=40 prio_sg_id(^5:[2-3]:)=30
> 
> will give prio 40 to everything on SCSI hosts 0, 1 and 2, channel 0,
> prio 30 to everything on SCSI host 5, channels 2 and 3, and 0 to
> everything else.
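
The matching described above can be sketched roughly as follows. This is
only an illustration in Python, not the code from the patch: the function
names parse_prio_args and sg_id_prio are made up here, and "first matching
rule wins" is an assumption, since the post does not say how overlapping
regexes are resolved.

```python
import re

def parse_prio_args(prio_args):
    """Parse 'prio_sg_id(<regex>)=<prio>' tokens into (rules, default)."""
    rules, default = [], 0
    for token in prio_args.split():
        m = re.fullmatch(r"prio_sg_id\((.+)\)=(\d+)", token)
        if m is None:
            raise ValueError(f"unrecognized token: {token!r}")
        pattern, prio = m.group(1), int(m.group(2))
        if pattern == "default":
            default = prio            # fallback when no regex matches
        else:
            rules.append((re.compile(pattern), prio))
    return rules, default

def sg_id_prio(hctl, rules, default):
    """Priority for a host:channel:target:lun string (assumed: first match wins)."""
    for regex, prio in rules:
        if regex.search(hctl):
            return prio
    return default

rules, default = parse_prio_args(
    "prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=40 prio_sg_id(^5:[2-3]:)=30"
)
for hctl in ("1:0:0:9", "2:0:0:9", "5:2:0:1", "3:0:0:9"):
    print(hctl, sg_id_prio(hctl, rules, default))
```

With the prio_args from the example, the two FC paths 1:0:0:9 and 2:0:0:9
get 40 and a path such as 3:0:0:9 falls back to the default of 0.
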
> 
>  
> 
> Using sg_id in the example above, we get sdl and sdak in the
> first group and all the other paths in the second. Which is OK,
> but not quite enough.
> 
> The problem with this approach for us is that ALUA gives us valuable
> information on our storage priorities (which controller is primary
> and which is secondary for a particular LUN at a particular
> moment), and we’re not ready to sacrifice this information
> even for sg_id prios. If only there were a way to use multiple
> prioritizers.
> 
> And so we spent another couple of hours on the multipath-tools
> code, making it accept multiple prioritizers in the prio
> configuration (patch here:
> http://viktor.ee/multipath-tools-patches/multiprio.patch).
> 
> In this case, the prioritizers should be separated by a comma, semicolon
> or space, and the final priority is the sum of the priorities given by
> all of the specified prioritizers (a single prioritizer value is
> of course still accepted).
> 
> As an example:
> 
>         prio                  "sg_id, alua"
>         prio_args             "prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=100"
> 
>  
> 
> Combining the two patches above with the same example, we get:
> 
> alessandra multipath-tools-0.4.9 # multipath -r www-2-mysql
> reload: www-2-mysql (360050763008080581000000000000029) undef IBM,2145
> size=10G features='1 queue_if_no_path' hwhandler='0' wp=undef
> |-+- policy='round-robin 0' prio=150 status=undef
> | `- 2:0:0:9  sdak 66:64   active ready running
> |-+- policy='round-robin 0' prio=110 status=undef
> | `- 1:0:0:9  sdl  8:176   active ready running
> |-+- policy='round-robin 0' prio=50 status=undef
> | |- 3:0:0:9  sdcf 69:48   active ready running
> | `- 4:0:0:9  sdcy 70:96   active ready running
> `-+- policy='round-robin 0' prio=10 status=undef
>   |- 5:0:0:9  sdcb 68:240  active ready running
>   `- 6:0:0:9  sdct 70:16   active ready running
> 
> Exactly what we needed: the primary FC link with 150, the secondary
> with 110, followed by the primary and secondary iSCSI links with 50
> and 10 respectively.
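
The arithmetic behind these groups can be reproduced with a small sketch.
It assumes the ALUA values from the first listing (50 and 10) and the
prio_args from the example; static_host_prio, alua_prio and group_by_prio
are illustrative names, and the lookup table only stands in for the real
ALUA prioritizer, which queries the target.

```python
import re
from collections import defaultdict

# ALUA priorities per path, copied from the first multipath -ll listing.
ALUA_PRIO = {
    "2:0:0:9": 50, "3:0:0:9": 50, "4:0:0:9": 50,
    "1:0:0:9": 10, "5:0:0:9": 10, "6:0:0:9": 10,
}

def static_host_prio(hctl):
    # Mirrors prio_args "prio_sg_id(default)=0 prio_sg_id(^[0-2]:0)=100".
    return 100 if re.search(r"^[0-2]:0", hctl) else 0

def alua_prio(hctl):
    # Stand-in for the ALUA prioritizer: a plain table lookup.
    return ALUA_PRIO[hctl]

def combined_prio(hctl, prioritizers):
    # The final priority is the sum over all configured prioritizers.
    return sum(p(hctl) for p in prioritizers)

def group_by_prio(paths, prioritizers):
    # Paths with equal combined priority share a group; highest group first.
    groups = defaultdict(list)
    for hctl in paths:
        groups[combined_prio(hctl, prioritizers)].append(hctl)
    return sorted(groups.items(), reverse=True)

groups = group_by_prio(sorted(ALUA_PRIO), [static_host_prio, alua_prio])
for prio, members in groups:
    print(prio, members)
```

The FC paths get 100 + 50 = 150 and 100 + 10 = 110, while the iSCSI paths
keep their plain ALUA values of 50 and 10, giving the four groups shown in
the listing.
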
> 
> All in all this seems to have solved our problem, and maybe
> it can help somebody else too.
> 
Actually, I like the idea of stackable prioritizers.
I'm not sure about the 'sg_id' thing; that's still too much to configure.
We should be identifying the transport and assigning priorities
based on that.
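
A very rough sketch of what that might look like, written in Python for
brevity rather than as multipath-tools code. It assumes the transport can
be guessed from the sysfs transport classes (fc_host and iscsi_host) for a
given SCSI host number; the priority table and the function names are
purely hypothetical.

```python
from pathlib import Path

# Hypothetical priority table; the values are illustrative only.
TRANSPORT_PRIO = {"fc": 100, "iscsi": 10, "unknown": 0}

def transport_of(host_no, sysfs_root="/sys"):
    """Guess the transport of SCSI host `host_no` from sysfs class entries."""
    host = f"host{host_no}"
    root = Path(sysfs_root)
    if (root / "class" / "fc_host" / host).exists():
        return "fc"
    if (root / "class" / "iscsi_host" / host).exists():
        return "iscsi"
    return "unknown"

def transport_prio(hctl, sysfs_root="/sys"):
    """Map a host:channel:target:lun string to a transport-based priority."""
    host_no = int(hctl.split(":")[0])
    return TRANSPORT_PRIO[transport_of(host_no, sysfs_root)]
```

Such a prioritizer would need no per-host regexes in multipath.conf, and
its result could be stacked with ALUA in the same way as sg_id.
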

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

