From: Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Alex Netes <alexne-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Hal Rosenstock <hal-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
Date: Wed, 29 Feb 2012 11:22:29 -0800
Message-ID: <20120229112229.136f25b7.weiny2@llnl.gov>
In-Reply-To: <4F4DB11C.5080203-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Doug,
First, thanks for this. Some comments below.
On Wed, 29 Feb 2012 00:01:16 -0500
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> There are two things that stand in the way of opensm being run on
> redundant fabrics easily:
>
> 1) The opensm init script only starts one instance of opensm and opensm
> will only work on one fabric per instance
> 2) Even if you start multiple instances, you have to hand modify config
> files for each instance and then when you upgrade the opensm rpm you
> either lose your modifications or lose getting new default settings
>
> I worked around both of these issues; I've attached the files I used to
> do so.
>
> First, I have an opensm init script that allows starting multiple opensm
> instances. It supports configuring this in one of two ways:
>
> 1) Create multiple opensm.conf files, each with a numbered suffix (so
> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
> instance per config file. This allows an admin to copy the default
> config over and edit the things they need, and on rpm upgrade there will
> be a new default opensm.conf file so they can diff between their edited
> version and the new default and see if there are changes they need to
> bring back in. This also allows for complete flexibility in setting up
> the different fabrics, for instance you could use one type of routing on
> one and a totally different type on the others.
>
> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
> the GUIDs variable. This will cause the opensm init script to
> automatically start one instance per GUID, passing the GUID in on the
> command line.
I know you are going for ease of use here, which is good. However, I worry about this file becoming a redefinition of opensm.conf.
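To make sure I am reading the two modes correctly, here is a rough Python sketch of the selection logic as I understand it. It is not the attached init script. The function name `plan_instances`, the rule that numbered config files take precedence over the GUID list, and the fallback to a single plain instance are all my assumptions. The only opensm options used are `--config` and `--guid`.

```python
import re
from pathlib import Path


def plan_instances(conf_dir, guids=()):
    """Return one opensm argument list per instance to start.

    Hypothetical helper: the precedence between the two modes
    (numbered config files first, then GUIDs) is an assumption.
    """
    conf_dir = Path(conf_dir)
    # Mode 1: opensm.conf.1, opensm.conf.2, ... (numeric suffix only,
    # so files such as opensm.conf.rpmnew are ignored).
    numbered = sorted(
        (p for p in conf_dir.glob("opensm.conf.*")
         if re.fullmatch(r"opensm\.conf\.\d+", p.name)),
        key=lambda p: int(p.name.rsplit(".", 1)[1]),
    )
    if numbered:
        return [["opensm", "--config", str(p)] for p in numbered]
    # Mode 2: one instance per GUID listed in the sysconfig file.
    if guids:
        return [["opensm", "--guid", g] for g in guids]
    # Neither mode configured: a single default instance.
    return [["opensm"]]
```

With mode 1 everything fabric-specific stays in the per-instance config file, which is why I would rather not grow a second set of knobs in /etc/sysconfig/opensm.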
>
> For the most part, this works well. However, openmpi in particular
> doesn't like you to have physically separate fabrics that have the same
> subnet_prefix, and you can't specify a subnet_prefix on the command line
> to go along with the GUIDs. So I wrote a patch for that and made the
> init script unilaterally increment the subnet prefix for each different
> GUID it's attaching to.
If you only allow option 1 above, this takes care of itself by making the admin configure his subnet prefixes in each config file as appropriate. The only downside is the loss of new configuration options as you upgrade. However, that is probably better taken care of by a default config file with each package. I mentioned this to Sasha years back and was denied since "you can always generate a new one with '-c'". :-(
Alex, would a default config file be acceptable? It would mean more work on your part.
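As an illustration of what a shipped default file would buy us, here is a small Python sketch of the two checks an admin (or a packaging script) could run against the numbered config files. It is only a sketch under stated assumptions: the helper names are made up, I am assuming opensm.conf lines have the form `key value` with `#` starting a comment, and I am assuming a config that does not set `subnet_prefix` ends up on the usual default of 0xfe80000000000000.

```python
DEFAULT_PREFIX = "0xfe80000000000000"  # assumed default when unset


def parse_conf(text):
    """Parse 'key value' lines, ignoring blanks and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1] if len(parts) > 1 else ""
    return options


def duplicate_prefixes(configs):
    """Map each subnet prefix used by more than one config to the
    names of the configs that share it.

    configs: dict of config name -> options from parse_conf().
    """
    seen = {}
    for name, opts in configs.items():
        prefix = int(opts.get("subnet_prefix", DEFAULT_PREFIX), 16)
        seen.setdefault(prefix, []).append(name)
    return {hex(p): names for p, names in seen.items() if len(names) > 1}


def missing_options(admin, default):
    """Options present in the packaged default but absent from the
    admin's edited copy, i.e. what an upgrade may have introduced."""
    return sorted(set(default) - set(admin))
```

The first check catches the openmpi problem Doug describes (two fabrics sharing a prefix), and the second is the "what did I miss on upgrade" question, which only works if the package actually ships a default file to compare against.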
Ira
>
> All in all, we use the attached opensm file in /etc/sysconfig as the
> standard place you put options belonging to an init script, we have the
> opensm init script, the subnet_prefix patch I wrote, and with those
> combined things work quite well.
>
> However, I will note that our init script does not (and will not ever)
> play the passwordless root ssh stuff that upstream does. This is
> considered a serious security risk on our side. The idea that a customer
> (let's say a wall street bank) should set up passwordless root ssh on
> their cluster that's a backend to their web farm? Oh hell no...
>
> I might recommend that it is long since past time for that particular
> misfeature of the upstream opensm init script to be done away with.
> Personally, I would simply recommend that on failover from a primary to
> a backup that it simply scan the fabric and build a "current guid2lid"
> map from what it finds, then start updating from there. Or something
> like that. But passwordless ssh...bleh.
>
> Oh, and while I've got your ear...is there a good reason the opensm libs
> have been soname bumping so frequently? Is it not possible to extend
> the APIs without soname bumps quite so often?
>
> --
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> GPG KeyID: 0E572FDD
> http://people.redhat.com/dledford
>
--
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org