public inbox for linux-rdma@vger.kernel.org
* [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
@ 2012-02-29  5:01 Doug Ledford
       [not found] ` <4F4DB11C.5080203-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Doug Ledford @ 2012-02-29  5:01 UTC (permalink / raw)
  To: Alex Netes
  Cc: Ira Weiny, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org


[-- Attachment #1.1: Type: text/plain, Size: 3223 bytes --]

There are two things that stand in the way of opensm being run on
redundant fabrics easily:

1) The opensm init script only starts one instance of opensm and opensm
will only work on one fabric per instance
2) Even if you start multiple instances, you have to hand modify config
files for each instance and then when you upgrade the opensm rpm you
either lose your modifications or lose the new default settings

I worked around both of these issues and have attached the files I used
to do so.

First, I have an opensm init script that allows starting multiple opensm
instances.  It supports configuring this in one of two ways:

1) Create multiple opensm.conf files, each with a numbered suffix (so
opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
instance per config file.  This allows an admin to copy the default
config over and edit the things they need, and on rpm upgrade there will
be a new default opensm.conf file so they can diff between their edited
version and the new default and see if there are changes they need to
bring back in.  This also allows for complete flexibility in setting up
the different fabrics, for instance you could use one type of routing on
one and a totally different type on the others.

2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
the GUIDS variable.  This will cause the opensm init script to
automatically start one instance per GUID, passing the GUID in on the
command line.
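
As an illustration of how the two methods interact (not part of the
attached files), here is a minimal sketch of the selection logic.  The
helper names list_configs and plan_instances and the directory argument
are hypothetical; the attached init script hard-codes /etc/rdma and does
this inline in start().

```shell
#!/bin/bash
# Sketch only: decide which opensm instances would be started.
# list_configs and plan_instances are hypothetical helper names.

# Print one numbered config file per line; print nothing if none exist.
list_configs() {
    local dir=$1 conf
    for conf in "$dir"/opensm.conf.[0-9]*; do
        # An unmatched glob is left as the literal pattern, so only
        # report entries that are real files.
        if [ -f "$conf" ]; then
            echo "$conf"
        fi
    done
}

# A non-empty GUID list (method 2) takes precedence over numbered
# config files (method 1), mirroring the attached sysconfig comments.
plan_instances() {
    local dir=$1 guids=$2 guid conf
    if [ -n "$guids" ]; then
        for guid in $guids; do
            echo "guid $guid"
        done
    else
        list_configs "$dir" | while read -r conf; do
            echo "config $conf"
        done
    fi
}
```

Calling plan_instances with a directory and an empty GUID list prints one
"config" line per numbered file; passing a GUID list prints one "guid"
line per GUID and ignores the files.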

For the most part, this works well.  However, openmpi in particular
doesn't like you to have physically separate fabrics that have the same
subnet_prefix, and you can't specify a subnet_prefix on the command line
to go along with the GUIDs.  So I wrote a patch for that and made the
init script unilaterally increment the subnet prefix for each different
GUID it's attaching to.
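
To make the prefix increment concrete, here is a small sketch (an
illustration, not the attached script).  The helper name
subnet_prefix_for is hypothetical, the two GUIDs are the example values
from the attached sysconfig file, and --subnet_prefix only exists with
the attached patch applied.  The commands are printed, not executed.

```shell
#!/bin/bash
# Sketch only: derive a distinct subnet prefix for each GUID.
# subnet_prefix_for is a hypothetical helper.  It formats the low byte
# with %02x (hexadecimal); the attached script uses %02d, which gives
# the same result for the first ten instances.
subnet_prefix_for() {
    printf '0xfe800000000000%02x\n' "$1"
}

count=0
for guid in 0x0002c90300048ca1 0x0002c90300048ca2; do
    # Print the command instead of running it.
    echo "opensm -B -g $guid --subnet_prefix $(subnet_prefix_for "$count")"
    count=$((count + 1))
done
```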

All in all, we use the attached opensm file in /etc/sysconfig (the
standard place for options belonging to an init script), the opensm
init script, and the subnet_prefix patch I wrote, and with those
combined things work quite well.

However, I will note that our init script does not (and will not ever)
do the passwordless root ssh stuff that upstream does.  This is
considered a serious security risk on our side.  The idea that a customer
(let's say a Wall Street bank) should set up passwordless root ssh on
their cluster that's a backend to their web farm?  Oh hell no...

I might recommend that it is long since past time for that particular
misfeature of the upstream opensm init script to be done away with.
Personally, I would recommend that, on failover from a primary to a
backup, the backup simply scan the fabric and build a "current guid2lid"
map from what it finds, then start updating from there.  Or something
like that.  But passwordless ssh...bleh.

Oh, and while I've got your ear...is there a good reason the opensm libs
have been soname bumping so frequently?  Is it not possible to extend
the APIs without soname bumps quite so often?

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford


[-- Attachment #1.2: opensm.initd --]
[-- Type: text/plain, Size: 3706 bytes --]

#!/bin/bash
#
# Bring up/down opensm
#
# chkconfig: - 15 85
# description: Activates/Deactivates InfiniBand Subnet Manager
# config: /etc/rdma/opensm.conf
#
### BEGIN INIT INFO
# Provides:       opensm
# Default-Stop: 0 1 2 3 4 5 6
# Required-Start: rdma
# Required-Stop: rdma
# Short-Description: Starts/Stops the InfiniBand Subnet Manager
# Description: Starts/Stops the InfiniBand Subnet Manager
### END INIT INFO

. /etc/rc.d/init.d/functions

prog=/usr/sbin/opensm
PID_FILE=/var/run/opensm.pid
[ -f /etc/sysconfig/opensm ] && . /etc/sysconfig/opensm

[ -n "$PRIORITY" ] && prio="-p $PRIORITY"

ACTION=$1

start()
{
    local OSM_PID=
    if [ -z "$GUIDS" ]; then
        CONFIGS=""
        CONFIG_CNT=0
        for conf in /etc/rdma/opensm.conf.[0-9]*; do
            # Skip the literal pattern left behind when no file matches
            [ -f "$conf" ] || continue
            CONFIGS="$CONFIGS $conf"
            let CONFIG_CNT++
        done
    else
        GUID_CNT=0
        for guid in $GUIDS; do
            let GUID_CNT++
        done
    fi
    [ -f /var/lock/subsys/opensm ] && return 0
    # Start opensm
    echo -n "Starting IB Subnet Manager: "
    [ -n "$PRIORITY" ] && echo -n "Priority=$PRIORITY "
    [ -n "$GUIDS" ] && echo -n "$GUID_CNT guids "
    [ -n "$CONFIGS" ] && echo -n "$CONFIG_CNT instances "
    if [ -n "$GUIDS" ]; then
	SUBNET_COUNT=0
        for guid in $GUIDS; do
	    SUBNET_PREFIX=`printf "0xfe800000000000%02d" $SUBNET_COUNT`
            $prog -B $prio -g $guid --subnet_prefix $SUBNET_PREFIX >/dev/null 2>&1
	    let SUBNET_COUNT++
        done
    elif [ -n "$CONFIGS" ]; then
        for config in $CONFIGS; do
            $prog -B $prio -F $config >/dev/null 2>&1
        done
    else
        $prog -B $prio >/dev/null 2>&1
    fi
    sleep 1
    OSM_PID=`pidof $prog`
    checkpid $OSM_PID
    RC=$?
    [ $RC -eq 0 ] && echo_success || echo_failure
    [ $RC -eq 0 ] && touch /var/lock/subsys/opensm
    [ $RC -eq 0 ] && echo $OSM_PID > $PID_FILE
    echo
    return $RC    
}

stop()
{
    [ -f /var/lock/subsys/opensm ] || return 0

    echo -n "Stopping IB Subnet Manager(s)."

    OSM_PID=`cat $PID_FILE`

    checkpid $OSM_PID
    RC=$?
    if [ $RC -ne 0 ]; then
	rm -f $PID_FILE
	rm -f /var/lock/subsys/opensm
	echo_success
	return 0
    fi
    # Kill opensm
    kill -15 $OSM_PID >/dev/null 2>&1
    cnt=0
    while [ $cnt -lt 6 ]; do
        checkpid $OSM_PID
	if [ $? -ne 0 ]; then
	    break
	fi
        echo -n "."
	sleep 1
	let cnt++
    done

    checkpid $OSM_PID
    if [ $? -eq 0 ]; then
	kill -KILL $OSM_PID > /dev/null 2>&1
	echo -n "+"
        sleep 1
    fi
    checkpid $OSM_PID
    DEAD=$?
    if [ $DEAD -eq 0 ]; then
	echo_failure
	echo
	return 1
    fi
    echo_success 
    echo
 
    # Remove pid file if any.
    rm -f $PID_FILE
    rm -f /var/lock/subsys/opensm
    return 0    
}

restart ()
{
	stop
	start
}

condrestart ()
{
	[ -f /var/lock/subsys/opensm ] && restart || return 0
}

reload ()
{
	[ -f $PID_FILE ] || return 0
	OSM_PID=`cat $PID_FILE`
	action $"Rescanning IB Subnet:" kill -HUP $OSM_PID
	return $?
}

usage ()
{
	echo
	echo "Usage: `basename $0` {start|stop|restart|reload|condrestart|try-restart|force-reload|status}"
	echo
	return 2
}

case $ACTION in
	start|stop|restart|reload|condrestart|try-restart|force-reload)
	    [ `id -u` != "0" ] && exit 4 ;;
esac

case $ACTION in
	start) start; RC=$? ;;
	stop) stop; RC=$? ;;
	restart) restart; RC=$? ;;
	reload) reload; RC=$? ;;
	condrestart) condrestart; RC=$? ;;
	try-restart) condrestart; RC=$? ;;
	force-reload) condrestart; RC=$? ;;
	status) status $prog; RC=$? ;;
	*) usage; RC=$? ;;
esac

exit $RC

[-- Attachment #1.3: opensm.sysconfig --]
[-- Type: text/plain, Size: 3920 bytes --]

# Problem #1: Multiple IB fabrics needing a subnet manager
#
# In the event that a machine has more than one IB subnet attached,
# and that machine is an opensm server, by default, opensm will
# only attach to one port and will not manage the fabric on the
# other port.  There are two ways to solve this problem:
#
# 1) Start opensm on multiple machines and configure it to manage
#    different fabrics on each machine
# 2) Configure opensm to start multiple instances on a single
#    machine
#
# Both solutions to this problem require non-standard configurations.
# In other words, you would normally have to modify /etc/rdma/opensm.conf
# and once you do that, the file will no longer be updated for new
# options when opensm is upgraded.  In an effort to allow people to
# have more than one subnet managed by opensm without having to modify
# the system default opensm.conf file, we have enabled two methods
# for modifying the default opensm config items needed to enable
# multiple fabric management.
#
# Method #1: Create multiple opensm.conf files in non-standard locations
#   Copy /etc/rdma/opensm.conf to /etc/rdma/opensm.conf.<number>
#     (do this once for each instance you want started)
#   Edit each copy of the opensm.conf file to reflect the necessary changes
#     for a multiple instance startup.  If you need to manage more than
#     one fabric, you will have to change the guid option in each file
#     to specify the guid of the specific port you want opensm attached
#     to.
#
# The advantage to method #1 is that, on the off chance you want to do
# really special custom things on different ports, like have different
# QoS settings depending on which port you are attached to, you have the
# freedom to edit any and all settings for each instance without those
# changes affecting other instances or being lost when opensm upgrades.
#
# Method #2: Specify multiple GUIDS variable entries in this file
#   Uncomment the below GUIDS variable and enter each guid you need to attach
#     to into the list.  If using this method you need to enter each
#     guid into the list as we won't attach to any default ports, only
#     those specified in the list.
#
#GUIDS="0x0002c90300048ca1 0x0002c90300048ca2"
#
# The obvious advantage to method #2 is that it's simple and doesn't
# clutter up your file system, but it is far more limited in what you
# can do.  If you enable method #2, then even if you create the files
# referenced in method #1, they will be ignored.
#
# Problem #2: Activating a backup subnet manager
#
# The default priority of opensm is set so that it wants to be the
# primary subnet manager.  This is great when you are only running
# opensm on one server, but if you want to have a non-primary opensm
# instance for failover, then you have to manually edit the opensm.conf
# file like for problem #1.  This carries with it all the problems
# listed above.  If you wish to enable opensm as a non-primary manager,
# then you can uncomment the PRIORITY variable below and set it to
# some number between 0 and 15, where 15 is the highest priority and
# the primary manager, with 0 being the lowest backup server.  This method
# will work with the GUIDS option above, and also with the multiple
# config files in method #1 above.  However, only a single priority is
# supported here.  If you wanted more than one priority (say this machine
# is the primary on the first fabric and secondary on the second fabric,
# while the other opensm server is primary on the second fabric and
# secondary on the first), then the only way to do that is to use method #1
# above and individually edit the config files.  If you edit the config
# files to set the priority and then also set the priority here, then
# this setting will override the config files and render that particular
# edit useless.
#
#PRIORITY=15

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.4: opensm-3.3.13-prefix.patch --]
[-- Type: text/x-patch; name="opensm-3.3.13-prefix.patch", Size: 2575 bytes --]

diff -up opensm-3.3.13/man/opensm.8.in.prefix opensm-3.3.13/man/opensm.8.in
--- opensm-3.3.13/man/opensm.8.in.prefix	2012-02-28 18:27:33.297714661 -0500
+++ opensm-3.3.13/man/opensm.8.in	2012-02-28 18:31:00.957696942 -0500
@@ -11,6 +11,7 @@ opensm \- InfiniBand subnet manager and 
 [\-g(uid) <GUID in hex>]
 [\-l(mc) <LMC>]
 [\-p(riority) <PRIORITY>]
+[\-\-subnet_prefix <PREFIX in hex>]
 [\-smkey <SM_Key>]
 [\-\-sm_sl <SL number>]
 [\-r(eassign_lids)]
@@ -130,6 +131,13 @@ This will effect the handover cases, whe
 is chosen by priority and GUID.  Range goes from 0
 (default and lowest priority) to 15 (highest).
 .TP
+\fB\-\-subnet_prefix\fR <PREFIX in hex>
+This option specifies the subnet prefix to use
+on the fabric.  The default prefix is
+0xfe80000000000000.  OpenMPI in particular requires
+separate fabrics plugged into different ports to
+have different prefixes or else it won't run.
+.TP
 \fB\-smkey\fR <SM_Key value>
 This option specifies the SM\'s SM_Key (64 bits).
 This will effect SM authentication.
diff -up opensm-3.3.13/opensm/main.c.prefix opensm-3.3.13/opensm/main.c
--- opensm-3.3.13/opensm/main.c.prefix	2012-01-17 08:22:40.000000000 -0500
+++ opensm-3.3.13/opensm/main.c	2012-02-28 18:31:34.224694111 -0500
@@ -156,6 +156,9 @@ static void show_usage(void)
 	       "          This will effect the handover cases, where master\n"
 	       "          is chosen by priority and GUID.  Range goes\n"
 	       "          from 0 (lowest priority) to 15 (highest).\n\n");
+	printf("--subnet_prefix <prefix>\n"
+	       "          Set the subnet prefix to something other than the\n"
+	       "          default value of 0xfe80000000000000\n\n");
 	printf("--smkey, -k <SM_Key>\n"
 	       "          This option specifies the SM's SM_Key (64 bits).\n"
 	       "          This will effect SM authentication.\n"
@@ -607,6 +610,7 @@ int main(int argc, char *argv[])
 		{"once", 0, NULL, 'o'},
 		{"reassign_lids", 0, NULL, 'r'},
 		{"priority", 1, NULL, 'p'},
+		{"subnet_prefix", 1, NULL, 13},
 		{"smkey", 1, NULL, 'k'},
 		{"routing_engine", 1, NULL, 'R'},
 		{"ucast_cache", 0, NULL, 'A'},
@@ -911,6 +915,11 @@ int main(int argc, char *argv[])
 			printf(" Priority = %d\n", temp);
 			break;
 
+		case 13:
+			opt.subnet_prefix = cl_hton64(strtoull(optarg, NULL, 16));
+			printf(" Subnet_Prefix = <0x%" PRIx64 ">\n", cl_hton64(opt.subnet_prefix));
+			break;
+
 		case 'k':
 			sm_key = cl_hton64(strtoull(optarg, NULL, 16));
 			printf(" SM Key <0x%" PRIx64 ">\n", cl_hton64(sm_key));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found] ` <4F4DB11C.5080203-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-02-29 19:22   ` Ira Weiny
       [not found]     ` <20120229112229.136f25b7.weiny2-i2BcT+NCU+M@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Ira Weiny @ 2012-02-29 19:22 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Doug,

First thanks for this.  Some comments below.

On Wed, 29 Feb 2012 00:01:16 -0500
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> There are two things that stand in the way of opensm being run on
> redundant fabrics easily:
> 
> 1) The opensm init script only starts one instance of opensm and opensm
> will only work on one fabric per instance
> 2) Even if you start multiple instances, you have to hand modify config
> files for each instance and then when you upgrade the opensm rpm you
> either lose your modifications or lose the new default settings
> 
> I worked around both of these issues, I've attached the files I used to
> do so.
> 
> First, I have an opensm init script that allows starting multiple opensm
> instances.  It supports configuring this in one of two ways:
> 
> 1) Create multiple opensm.conf files, each with a numbered suffix (so
> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
> instance per config file.  This allows an admin to copy the default
> config over and edit the things they need, and on rpm upgrade there will
> be a new default opensm.conf file so they can diff between their edited
> version and the new default and see if there are changes they need to
> bring back in.  This also allows for complete flexibility in setting up
> the different fabrics, for instance you could use one type of routing on
> one and a totally different type on the others.
> 
> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
> the GUIDs variable.  This will cause the opensm init script to
> automatically start one instance per GUID, passing the GUID in on the
> command line.

I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.

> 
> For the most part, this works well.  However, openmpi in particular
> doesn't like you to have physically separate fabrics that have the same
> subnet_prefix, and you can't specify a subnet_prefix on the command line
> to go along with the GUIDs.  So I wrote a patch for that and made the
> init script unilaterally increment the subnet prefix for each different
> GUID it's attaching to.

If you only allow option 1 above, this takes care of itself by making the admin configure his subnet prefixes in each config file as appropriate.  The only downside is the loss of new configuration options as you upgrade.  However, that is probably better taken care of by shipping a default config file with each package.  I mentioned this to Sasha years back and was turned down since "you can always generate a new one with '-c'".  :-(
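
For what it's worth, spotting options that a new default config has and
an edited copy lacks can be scripted.  The following is only a sketch:
it assumes the config uses "name value" lines with '#' comments, and the
helper name new_options is hypothetical.

```shell
#!/bin/bash
# Sketch only: list option names present in a new default config but
# missing from an edited copy.  new_options is a hypothetical helper.
# Usage: new_options <new-default-file> <edited-file>
new_options() {
    awk '
        /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
        FILENAME == ARGV[1] { seen[$1] = 1; next }   # first file read: the edited copy
        !($1 in seen) { print $1 }                   # second file read: the new default
    ' "$2" "$1"
}
```

Each name printed is an option the admin may want to carry over into the
per-fabric copy.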

Alex, would a default config file be acceptable?  It would mean more work on your part.

Ira

> 
> All in all, we use the attached opensm file in /etc/sysconfig as the
> standard place you put options belonging to an init script, we have the
> opensm init script, the subnet_prefix patch I wrote, and with those
> combined things work quite well.
> 
> However, I will note that our init script does not (and will not ever)
> play the passwordless root ssh stuff that upstream does.  This is
> considered a serious security risk on side.  The idea that a customer
> (let's say a wall street bank) should set up passwordless root ssh on
> their cluster that's a backend to their web farm?  Oh hell no...
> 
> I might recommend that it is long since past time for that particular
> misfeature of the upstream opensm init script to be done away with.
> Personally, I would simply recommend that on failover from a primary to
> a backup that it simply scan the fabric and build a "current guid2lid"
> map from what it finds, then start updating from there.  Or something
> like that.  But passwordless ssh...bleh.
> 
> Oh, and while I've got your ear...is there a good reason the opensm libs
> have been soname bumping so frequently?  Is it not possible to extend
> the APIs without soname bumps quite so often?
> 
> -- 
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>               GPG KeyID: 0E572FDD
> 	      http://people.redhat.com/dledford
> 


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]     ` <20120229112229.136f25b7.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2012-02-29 19:47       ` Doug Ledford
       [not found]         ` <20120301021501.GB961@bukharin.us.cray.com>
  2012-03-02 10:30       ` Alex Netes
  1 sibling, 1 reply; 14+ messages in thread
From: Doug Ledford @ 2012-02-29 19:47 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[-- Attachment #1: Type: text/plain, Size: 2735 bytes --]

On 02/29/2012 02:22 PM, Ira Weiny wrote:
> Doug,
> 
> First thanks for this.  Some comments below.
> 
> On Wed, 29 Feb 2012 00:01:16 -0500
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
>> There are two things that stand in the way of opensm being run on
>> redundant fabrics easily:
>>
>> 1) The opensm init script only starts one instance of opensm and opensm
>> will only work on one fabric per instance
>> 2) Even if you start multiple instances, you have to hand modify config
>> files for each instance and then when you upgrade the opensm rpm you
>> either lose your modifications or lose the new default settings
>>
>> I worked around both of these issues, I've attached the files I used to
>> do so.
>>
>> First, I have an opensm init script that allows starting multiple opensm
>> instances.  It supports configuring this in one of two ways:
>>
>> 1) Create multiple opensm.conf files, each with a numbered suffix (so
>> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
>> instance per config file.  This allows an admin to copy the default
>> config over and edit the things they need, and on rpm upgrade there will
>> be a new default opensm.conf file so they can diff between their edited
>> version and the new default and see if there are changes they need to
>> bring back in.  This also allows for complete flexibility in setting up
>> the different fabrics, for instance you could use one type of routing on
>> one and a totally different type on the others.
>>
>> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
>> the GUIDs variable.  This will cause the opensm init script to
>> automatically start one instance per GUID, passing the GUID in on the
>> command line.
> 
> I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.

Hehehe, I don't think you'll ever have to worry about that.  You have
looked at opensm.conf in recent times I take it?  Replacing that with
command line options in a shell startup script isn't reasonable.

However, if you are going to run a redundant fabric setup, then the two
things you *know* you will have to set are the guid and subnet_prefix
(assuming you want to use openmpi).  If you are going to run a
master/slave setup, then the one thing you *know* you will have to set
is the priority.  Supporting setting those items in an init script is
reasonable.  Beyond that, I would agree, you should just edit the config
files.
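
As a sketch of how small the init-script side of this can stay, the
priority handling could be reduced to one check.  This is an
illustration only: priority_opt is a hypothetical helper, and the 0..15
range is the one documented in the attached sysconfig file.

```shell
#!/bin/bash
# Sketch only: emit the opensm priority option when the value is an
# integer in the range 0..15, and fail otherwise.
# priority_opt is a hypothetical helper name.
priority_opt() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    [ "$1" -le 15 ] || return 1    # out of range
    echo "-p $1"
}
```

An init script could then use something like
prio=$(priority_opt "$PRIORITY") and fall back to no option when the
helper fails.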


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]


* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]           ` <20120301021501.GB961-7GFyYy+Av7rWWZS0+0nfmVaTQe2KTcn/@public.gmane.org>
@ 2012-03-01 13:31             ` Doug Ledford
       [not found]               ` <4F4F7A4B.4060007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2012-03-01 22:46             ` Ira Weiny
  1 sibling, 1 reply; 14+ messages in thread
From: Doug Ledford @ 2012-03-01 13:31 UTC (permalink / raw)
  To: Brian Ginsbach
  Cc: Ira Weiny, Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[-- Attachment #1: Type: text/plain, Size: 4215 bytes --]

On 02/29/2012 09:15 PM, Brian Ginsbach wrote:
> On Wed, Feb 29, 2012 at 02:47:00PM -0500, Doug Ledford wrote:
>> On 02/29/2012 02:22 PM, Ira Weiny wrote:
>>> Doug,
>>>
>>> First thanks for this.  Some comments below.
>>>
>>> On Wed, 29 Feb 2012 00:01:16 -0500
>>> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>
>>>> There are two things that stand in the way of opensm being run on
>>>> redundant fabrics easily:
>>>>
>>>> 1) The opensm init script only starts one instance of opensm and opensm
>>>> will only work on one fabric per instance
>>>> 2) Even if you start multiple instances, you have to hand modify config
>>>> files for each instance and then when you upgrade the opensm rpm you
>>>> either lose your modifications or lose the new default settings
>>>>
>>>> I worked around both of these issues, I've attached the files I used to
>>>> do so.
>>>>
>>>> First, I have an opensm init script that allows starting multiple opensm
>>>> instances.  It supports configuring this in one of two ways:
>>>>
>>>> 1) Create multiple opensm.conf files, each with a numbered suffix (so
>>>> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
>>>> instance per config file.  This allows an admin to copy the default
>>>> config over and edit the things they need, and on rpm upgrade there will
>>>> be a new default opensm.conf file so they can diff between their edited
>>>> version and the new default and see if there are changes they need to
>>>> bring back in.  This also allows for complete flexibility in setting up
>>>> the different fabrics, for instance you could use one type of routing on
>>>> one and a totally different type on the others.
>>>>
>>>> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
>>>> the GUIDs variable.  This will cause the opensm init script to
>>>> automatically start one instance per GUID, passing the GUID in on the
>>>> command line.
>>>
>>> I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.
>>
>> Hehehe, I don't think you'll ever have to worry about that.  You have
>> looked at opensm.conf in recent times I take it?  Replacing that with
>> command line options in a shell startup script isn't reasonable.
>>
>> However, if you are going to run a redundant fabric setup, then the two
>> things you *know* you will have to set are the guid and subnet_prefix
>> (assuming you want to use openmpi).  If you are going to run
> 
> Assuming you are doing this for openmpi.  The subnet_prefix should
> not be needed if the separate subnets are for disjoint networks
> (mpi and storage) or multiple storage networks.

True enough, but that's why I said openmpi.  It is, after all, a primary
IB fabric user.

>> master/slave setup, then the one thing you *know* you will have to set
>> is the priority.  Supporting setting those items in an init script is
>> reasonable.  Beyond that, I would agree, you should just edit the config
>> files.
>>
> 
> Not everything can be done in the config files.  I'm not sure that
> it is a good idea to have every opensm instance using the same
> temporary and cache directories (OSM_TMP_DIR and OSM_CACHE_DIR
> environment variables).  Seems like these fall into the *know* you
> will have to set category.

Unless opensm is smart enough to allow more than one instance to open
the same log file and interleave their log messages successfully.
Temporary files or cache files could do something like use a pid suffix
if need be.  But yes, I see your point.  Opensm has lots of junk it
likes to put on the drive :-/
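
One way to keep instances from sharing directories, sketched here with
an instance-number suffix rather than a pid suffix.  This is only an
illustration: the base paths and the helper name instance_env are
hypothetical, and only the OSM_TMP_DIR and OSM_CACHE_DIR variable names
come from the discussion above.  The commands are printed, not executed.

```shell
#!/bin/bash
# Sketch only: give each opensm instance its own temporary and cache
# directory.  instance_env and both base paths are hypothetical.
instance_env() {
    local n=$1
    echo "OSM_TMP_DIR=/var/log/opensm.$n OSM_CACHE_DIR=/var/cache/opensm.$n"
}

for n in 1 2; do
    # Print the command rather than run it, so the sketch has no side effects.
    echo "env $(instance_env "$n") opensm -B -F /etc/rdma/opensm.conf.$n"
done
```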

> You'd also want to make sure that other potentially very useful
> things are configured in the config files (e.g. log_file and
> log_prefix).  Aren't these also things you *know* you will have to
> set.

I would say we are simply getting to the point where we *know* we need
opensm to handle more than one fabric from a single instance ;-)

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]


* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]           ` <20120301021501.GB961-7GFyYy+Av7rWWZS0+0nfmVaTQe2KTcn/@public.gmane.org>
  2012-03-01 13:31             ` Doug Ledford
@ 2012-03-01 22:46             ` Ira Weiny
       [not found]               ` <20120301144645.09aa0d80.weiny2-i2BcT+NCU+M@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Ira Weiny @ 2012-03-01 22:46 UTC (permalink / raw)
  To: Brian Ginsbach
  Cc: Doug Ledford, Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, 29 Feb 2012 20:15:02 -0600
Brian Ginsbach <ginsbach-WVYJKLFxKCc@public.gmane.org> wrote:

> On Wed, Feb 29, 2012 at 02:47:00PM -0500, Doug Ledford wrote:
> > On 02/29/2012 02:22 PM, Ira Weiny wrote:
> > > Doug,
> > > 
> > > First thanks for this.  Some comments below.
> > > 
> > > On Wed, 29 Feb 2012 00:01:16 -0500
> > > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > >> There are two things that stand in the way of opensm being run on
> > >> redundant fabrics easily:
> > >>
> > >> 1) The opensm init script only starts one instance of opensm and opensm
> > >> will only work on one fabric per instance
> > >> 2) Even if you start multiple instances, you have to hand modify config
> > >> files for each instance and then when you upgrade the opensm rpm you
> > >> either lose your modifications or lose the new default settings
> > >>
> > >> I worked around both of these issues, I've attached the files I used to
> > >> do so.
> > >>
> > >> First, I have an opensm init script that allows starting multiple opensm
> > >> instances.  It supports configuring this in one of two ways:
> > >>
> > >> 1) Create multiple opensm.conf files, each with a numbered suffix (so
> > >> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
> > >> instance per config file.  This allows an admin to copy the default
> > >> config over and edit the things they need, and on rpm upgrade there will
> > >> be a new default opensm.conf file so they can diff between their edited
> > >> version and the new default and see if there are changes they need to
> > >> bring back in.  This also allows for complete flexibility in setting up
> > >> the different fabrics, for instance you could use one type of routing on
> > >> one and a totally different type on the others.
> > >>
> > >> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
> > >> the GUIDs variable.  This will cause the opensm init script to
> > >> automatically start one instance per GUID, passing the GUID in on the
> > >> command line.
> > > 
> > > I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.
> > 
> > Hehehe, I don't think you'll ever have to worry about that.  You have
> > looked at opensm.conf in recent times I take it?  Replacing that with
> > command line options in a shell startup script isn't reasonable.
> > 
> > However, if you are going to run a redundant fabric setup, then the two
> > things you *know* you will have to set are the guid and subnet_prefix
> > (assuming you want to use openmpi).  If you are going to run
> 
> Assuming you are doing this for openmpi.  The subnet_prefix should
> not be needed if the separate subnets are for disjoint networks
> (mpi and storage) or multiple storage networks.
> 
> > master/slave setup, then the one thing you *know* you will have to set
> > is the priority.  Supporting setting those items in an init script is
> > reasonable.  Beyond that, I would agree, you should just edit the config
> > files.
> > 
> 
> Not everything can be done in the config files.  I'm not sure that
> it is a good idea to have every opensm instance using the same
> temporary and cache directories (OSM_TMP_DIR and OSM_CACHE_DIR
> environment variables).  Seems like these fall into the *know* you
> will have to set category.

Brian brings up a really good point.  Even though some things can't be configured now, opensm.conf is the better way to configure log file placement etc.  So in my mind this re-emphasises the need to simply allow for multiple opensm.conf's and not introduce another config file.  But as I said before it is Alex's call.

Ira

> 
> You'd also want to make sure that other potentially very useful
> things are configured in the config files (e.g. log_file and
> log_prefix).  Aren't these also things you *know* you will have to
> set.
> 
> -- 
> Brian Ginsbach                          Cray Inc.


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]               ` <20120301144645.09aa0d80.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2012-03-02 10:13                 ` Alex Netes
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Netes @ 2012-03-02 10:13 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Brian Ginsbach, Doug Ledford, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 14:46 Thu 01 Mar     , Ira Weiny wrote:
> On Wed, 29 Feb 2012 20:15:02 -0600
> Brian Ginsbach <ginsbach-WVYJKLFxKCc@public.gmane.org> wrote:
> 
> > On Wed, Feb 29, 2012 at 02:47:00PM -0500, Doug Ledford wrote:
> > > On 02/29/2012 02:22 PM, Ira Weiny wrote:
> > > > Doug,
> > > > 
> > > > First thanks for this.  Some comments below.
> > > > 
> > > > On Wed, 29 Feb 2012 00:01:16 -0500
> > > > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > 
> > > >> There are two things that stand in the way of opensm being run on
> > > >> redundant fabrics easily:
> > > >>
> > > >> 1) The opensm init script only starts one instance of opensm and opensm
> > > >> will only work on one fabric per instance
> > > >> 2) Even if you start multiple instances, you have to hand modify config
> > > >> files for each instance and then when you upgrade the opensm rpm you
> > > >> either loose your modifications or loose getting new default settings
> > > >>
> > > >> I worked around both of these issues, I've attached the files I used to
> > > >> do so.
> > > >>
> > > >> First, I have an opensm init script that allows starting multiple opensm
> > > >> instances.  It supports configuring this in one of two ways:
> > > >>
> > > >> 1) Create multiple opensm.conf files, each with a numbered suffix (so
> > > >> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
> > > >> instance per config file.  This allows an admin to copy the default
> > > >> config over and edit the things they need, and on rpm upgrade there will
> > > >> be a new default opensm.conf file so they can diff between their edited
> > > >> version and the new default and see if there are changes they need to
> > > >> bring back in.  This also allows for complete flexibility in setting up
> > > >> the different fabrics, for instance you could use one type of routing on
> > > >> one and a totally different type on the others.
> > > >>
> > > >> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
> > > >> the GUIDs variable.  This will cause the opensm init script to
> > > >> automatically start one instance per GUID, passing the GUID in on the
> > > >> command line.
> > > > 
> > > > I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.
> > > 
> > > Hehehe, I don't think you'll ever have to worry about that.  You have
> > > looked at opensm.conf in recent times I take it?  Replacing that with
> > > command line options in a shell startup script isn't reasonable.
> > > 
> > > However, if you are going to run a redundant fabric setup, then the two
> > > things you *know* you will have to set are the guid and subnet_prefix
> > > (assuming you want to use openmpi).  If you are going to run
> > 
> > Assuming you are doing this for openmpi.  The subnet_prefix should
> > not be needed if the separate subnets are for disjoint networks
> > (mpi and storage) or multiple storage networks.
> > 
> > > master/slave setup, then the one thing you *know* you will have to set
> > > is the priority.  Supporting setting those items in an init script is
> > > reasonable.  Beyond that, I would agree, you should just edit the config
> > > files.
> > > 
> > 
> > Not everything can be done in the config files.  I'm not sure that
> > it is a good idea to have every opensm instance using the same
> > temporary and cache directories (OSM_TMP_DIR and OSM_CACHE_DIR
> > environment variables).  Seems like these fall into the *know* you
> > will have to set category.
> 
> Brian brings up a really good point.  Even though some things can't be configured now, opensm.conf is the better way to configure log file placement etc.  So in my mind this re-emphasises the need to simply allow for multiple opensm.conf's and not introduce another config file.  But as I said before it is Alex's call.
> 
> Ira
> 

I agree with Ira on that point. sysconfig/opensm is used for syncing the
guid2lid cache file between remote SMs, which is outside the scope of opensm
configuration and is a completely external mechanism.

Regarding the first approach (using multiple opensm.conf files), I think that
beyond starting all opensm instances, the init script should be able to
manipulate (restart/stop/signal HUP, USR1) any of the opensm instances
separately.
Moreover, OSM_TMP_DIR and OSM_CACHE_DIR can be added as options to opensm.conf,
so that, if configured, they override the global setting; alternatively, we
could add the pid or opensm id as a suffix to the log/state files.
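
A minimal sketch of how an init script might handle the two points above
(per-instance control and per-instance directories).  The function names,
the directory layout and the pid-file naming are all assumptions made for
illustration; none of them come from the posted init script.

```shell
# Hypothetical init-script helpers; function names, directories, and the
# pid-file layout are assumptions, not taken from the posted script.

# Derive an instance id from a config file name: opensm.conf.2 -> 2
instance_id() {
    printf '%s\n' "${1##*.}"
}

# Print per-instance environment settings so that instances do not share
# temporary or cache directories.
instance_env() {
    id=$(instance_id "$1")
    printf 'OSM_TMP_DIR=/var/log/opensm/%s\n' "$id"
    printf 'OSM_CACHE_DIR=/var/cache/opensm/%s\n' "$id"
}

# Send a signal (e.g. HUP or USR1) to a single instance via its pid file.
# Arguments: signal name, instance id, optional pid-file directory.
signal_instance() {
    pidfile="${3:-/var/run}/opensm.$2.pid"
    [ -r "$pidfile" ] || return 1
    kill -s "$1" "$(cat "$pidfile")"
}
```

With helpers like these, an init script could start each instance with its
own environment (for example via env(1)) and later run something like
"signal_instance HUP 2" to touch only the second instance.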

> > 
> > You'd also want to make sure that other potentially very useful
> > things are configured in the config files (e.g. log_file and
> > log_prefix).  Aren't these also things you *know* you will have to
> > set.
> > 
> > -- 
> > Brian Ginsbach                          Cray Inc.
> 
> 
> -- 
> Ira Weiny
> Member of Technical Staff
> Lawrence Livermore National Lab
> 925-423-8008
> weiny2-i2BcT+NCU+M@public.gmane.org

-- 

-- Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]     ` <20120229112229.136f25b7.weiny2-i2BcT+NCU+M@public.gmane.org>
  2012-02-29 19:47       ` Doug Ledford
@ 2012-03-02 10:30       ` Alex Netes
  2012-03-02 15:31         ` Doug Ledford
  2012-03-05 20:51         ` Ira Weiny
  1 sibling, 2 replies; 14+ messages in thread
From: Alex Netes @ 2012-03-02 10:30 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Doug Ledford, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 11:22 Wed 29 Feb     , Ira Weiny wrote:
> Doug,
> 
> First thanks for this.  Some comments below.
> 
> On Wed, 29 Feb 2012 00:01:16 -0500
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > There are two things that stand in the way of opensm being run on
> > redundant fabrics easily:
> > 
> > 1) The opensm init script only starts one instance of opensm and opensm
> > will only work on one fabric per instance
> > 2) Even if you start multiple instances, you have to hand modify config
> > files for each instance and then when you upgrade the opensm rpm you
> > either loose your modifications or loose getting new default settings
> > 
> > I worked around both of these issues, I've attached the files I used to
> > do so.
> > 
> > First, I have an opensm init script that allows starting multiple opensm
> > instances.  It supports configuring this in one of two ways:
> > 
> > 1) Create multiple opensm.conf files, each with a numbered suffix (so
> > opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
> > instance per config file.  This allows an admin to copy the default
> > config over and edit the things they need, and on rpm upgrade there will
> > be a new default opensm.conf file so they can diff between their edited
> > version and the new default and see if there are changes they need to
> > bring back in.  This also allows for complete flexibility in setting up
> > the different fabrics, for instance you could use one type of routing on
> > one and a totally different type on the others.
> > 
> > 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
> > the GUIDs variable.  This will cause the opensm init script to
> > automatically start one instance per GUID, passing the GUID in on the
> > command line.
> 
> I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.
> 
> > 
> > For the most part, this works well.  However, openmpi in particular
> > doesn't like you to have physically separate fabrics that have the same
> > subnet_prefix, and you can't specify a subnet_prefix on the command line
> > to go along with the GUIDs.  So I wrote a patch for that and made the
> > init script unilaterally increment the subnet prefix for each different
> > GUID it's attaching to.
> 
> If you only allow option 1 above this takes care of itself by making the admin configure his subnet prefixes in each config file as appropriate.  The only down side is the loss of new configuration options as you upgrade.  However, that is probably better taken care of by a default config file with each package.  I mentioned this to Sasha years back and was denied since "you can always generate a new one with '-c'".  :-(
> 
> Alex would a default config file be acceptable?  It would mean more work on your part.
> 

What would the default opensm.conf be used for? Just as a reference to the
default values?

-- Alex

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
  2012-03-02 10:30       ` Alex Netes
@ 2012-03-02 15:31         ` Doug Ledford
       [not found]           ` <4F50E7CE.6050204-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2012-03-05 20:51         ` Ira Weiny
  1 sibling, 1 reply; 14+ messages in thread
From: Doug Ledford @ 2012-03-02 15:31 UTC (permalink / raw)
  To: Alex Netes
  Cc: Ira Weiny, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

[-- Attachment #1: Type: text/plain, Size: 4150 bytes --]

On 3/2/2012 5:30 AM, Alex Netes wrote:
> On 11:22 Wed 29 Feb     , Ira Weiny wrote:
>> Doug,
>>
>> First thanks for this.  Some comments below.
>>
>> On Wed, 29 Feb 2012 00:01:16 -0500
>> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>
>>> There are two things that stand in the way of opensm being run on
>>> redundant fabrics easily:
>>>
>>> 1) The opensm init script only starts one instance of opensm and opensm
>>> will only work on one fabric per instance
>>> 2) Even if you start multiple instances, you have to hand modify config
>>> files for each instance and then when you upgrade the opensm rpm you
>>> either loose your modifications or loose getting new default settings
>>>
>>> I worked around both of these issues, I've attached the files I used to
>>> do so.
>>>
>>> First, I have an opensm init script that allows starting multiple opensm
>>> instances.  It supports configuring this in one of two ways:
>>>
>>> 1) Create multiple opensm.conf files, each with a numbered suffix (so
>>> opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
>>> instance per config file.  This allows an admin to copy the default
>>> config over and edit the things they need, and on rpm upgrade there will
>>> be a new default opensm.conf file so they can diff between their edited
>>> version and the new default and see if there are changes they need to
>>> bring back in.  This also allows for complete flexibility in setting up
>>> the different fabrics, for instance you could use one type of routing on
>>> one and a totally different type on the others.
>>>
>>> 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
>>> the GUIDs variable.  This will cause the opensm init script to
>>> automatically start one instance per GUID, passing the GUID in on the
>>> command line.
>>
>> I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.
>>
>>>
>>> For the most part, this works well.  However, openmpi in particular
>>> doesn't like you to have physically separate fabrics that have the same
>>> subnet_prefix, and you can't specify a subnet_prefix on the command line
>>> to go along with the GUIDs.  So I wrote a patch for that and made the
>>> init script unilaterally increment the subnet prefix for each different
>>> GUID it's attaching to.
>>
>> If you only allow option 1 above this takes care of itself by making the admin configure his subnet prefixes in each config file as appropriate.  The only down side is the loss of new configuration options as you upgrade.  However, that is probably better taken care of by a default config file with each package.  I mentioned this to Sasha years back and was denied since "you can always generate a new one with '-c'".  :-(
>>
>> Alex would a default config file be acceptable?  It would mean more work on your part.
>>
> 
> What the default opensm.conf would be used for? Just as a reference to the
> default values?

No, he's referring to having a default config file that is parsed, then
an override config file that is parsed where you only put options you
want to update in the override config file.  That way you could have,
for instance, a default opensm.conf in the normal location and totally
unedited so that it gets updated with each update of the opensm rpm,
then you could create an opensm.conf.1 that is empty except for just a
guid setting, a subnet_prefix setting, maybe a cache dir setting, etc.
In that way, if say the default routing engine gets a new option in the
future, your override config file won't already be populated with the
old stuff.  It's a means of inheritance that is functionally identical
to specifying all this stuff on the command line, but doesn't require a
huge command line or a complex init script.
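
To illustrate the inheritance idea, here is a rough sketch of "parse the
default file, then let the override file win" for simple "name value"
lines.  It only models the semantics; it is not how opensm itself parses
its config, and the merge_conf name is made up for this example.

```shell
# Merge "name value" config files; settings in later files override
# settings with the same name from earlier files.
merge_conf() {
    awk '
        /^[ \t]*#/ { next }                    # skip comment lines
        NF == 0    { next }                    # skip blank lines
        {
            key = $1
            if (!(key in seen)) order[++n] = key   # remember first-seen order
            seen[key] = 1
            line[key] = $0                     # the last file to set a key wins
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

Called as "merge_conf opensm.conf opensm.conf.1", this prints the effective
settings for instance 1: everything from the unedited default, with only the
guid, subnet_prefix and similar lines replaced by the override file.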


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 898 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]           ` <4F50E7CE.6050204-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-03-02 15:47             ` Doug Ledford
  0 siblings, 0 replies; 14+ messages in thread
From: Doug Ledford @ 2012-03-02 15:47 UTC (permalink / raw)
  To: Alex Netes
  Cc: Ira Weiny, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org


[-- Attachment #1.1: Type: text/plain, Size: 1394 bytes --]

On 3/2/2012 10:31 AM, Doug Ledford wrote:
> On 3/2/2012 5:30 AM, Alex Netes wrote:
>> What the default opensm.conf would be used for? Just as a reference to the
>> default values?
> 
> No, he's referring to having a default config file that is parsed, then
> an override config file that is parsed where you only put options you
> want to update in the override config file.  That way you could have,
> for instance, a default opensm.conf in the normal location and totally
> unedited so that it gets updated with each update of the opensm rpm,
> then you could create an opensm.conf.1 that is empty except for just a
> guid setting, a subnet_prefix setting, maybe a cache dir setting, etc.
> In that way, if say the default routing engine gets a new option in the
> future, your override config file won't already be populated with the
> old stuff.  It's a means of inheritance that is functionally identical
> to specifying all this stuff on the command line, but doesn't require a
> huge command line or a complex init script.

And for what it's worth, it could be done as simply as the attached
(untested, but compiled) patch.
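
As a sketch of how an init script might use it, assuming the patch is
applied and that the override files sit next to the default config as
opensm.conf.1, opensm.conf.2, and so on (the paths and function names here
are illustrative only):

```shell
# Print the opensm command line for one override file.  -B (daemon) and
# -F (config) are existing options; -E is the option the patch proposes.
opensm_cmd() {
    printf 'opensm -B -F %s -E %s\n' "$1" "$2"
}

# Print one command line per numbered override file found in a directory.
# Arguments: path of the shared default config, directory to scan.
list_cmds() {
    for f in "$2"/opensm.conf.[0-9]*; do
        [ -e "$f" ] || continue        # the glob matched nothing
        opensm_cmd "$1" "$f"
    done
}
```

The functions only print the commands, so the result can be inspected
before anything is actually started.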

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband

[-- Attachment #1.2: opensm-config.patch --]
[-- Type: text/plain, Size: 3087 bytes --]

diff -up opensm-3.3.13/opensm/main.c.config opensm-3.3.13/opensm/main.c
--- opensm-3.3.13/opensm/main.c.config	2012-03-02 10:35:26.783996345 -0500
+++ opensm-3.3.13/opensm/main.c	2012-03-02 10:46:33.471939369 -0500
@@ -131,6 +131,13 @@ static void show_usage(void)
 	       "          The name of the OpenSM config file. When not specified\n"
 	       "          " OSM_DEFAULT_CONFIG_FILE
 	       " will be used (if exists).\n\n");
+	printf("--extra-config, -E <file-name>\n"
+	       "          The name of an OpenSM config file used to override\n"
+	       "          the entries in the primary config file.  This is\n"
+	       "          useful when you have more than one opensm instance\n"
+	       "          to manage and you want them all to have a central,\n"
+	       "          shared set of options and you want a second, smaller\n"
+	       "          config file to hold their fabric specific options.\n\n");
 	printf("--create-config, -c <file-name>\n"
 	       "          OpenSM will dump its configuration to the specified file and exit.\n"
 	       "          This is a way to generate OpenSM configuration file template.\n\n");
@@ -569,10 +576,10 @@ int main(int argc, char *argv[])
 	boolean_t run_once_flag = FALSE;
 	int32_t vendor_debug = 0;
 	int next_option;
-	char *conf_template = NULL, *config_file = NULL;
+	char *conf_template, *config_file, *extra_config_file;
 	uint32_t val;
 	const char *const short_option =
-	    "F:c:i:w:O:f:ed:D:g:l:L:s:t:a:u:m:X:R:zM:U:S:P:Y:ANBIQvVhoryxp:n:q:k:C:G:H:";
+	    "F:E:c:i:w:O:f:ed:D:g:l:L:s:t:a:u:m:X:R:zM:U:S:P:Y:ANBIQvVhoryxp:n:q:k:C:G:H:";
 
 	/*
 	   In the array below, the 2nd parameter specifies the number
@@ -584,6 +591,7 @@ int main(int argc, char *argv[])
 	const struct option long_option[] = {
 		{"version", 0, NULL, 12},
 		{"config", 1, NULL, 'F'},
+		{"extra-config", 1, NULL, 'E'},
 		{"create-config", 1, NULL, 'c'},
 		{"debug", 1, NULL, 'd'},
 		{"guid", 1, NULL, 'g'},
@@ -647,6 +655,7 @@ int main(int argc, char *argv[])
 		{"torus_config", 1, NULL, 10},
 		{NULL, 0, NULL, 0}	/* Required at the end of the array */
 	};
+	conf_template = config_file = extra_config_file = NULL;
 
 	/* force stdout to be line-buffered */
 	setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
@@ -672,6 +681,11 @@ int main(int argc, char *argv[])
 			config_file = optarg;
 			printf("Config file is `%s`:\n", config_file);
 			break;
+		case 'E':
+			extra_config_file = optarg;
+			printf("Extra Config file is `%s`:\n",
+			       extra_config_file);
+			break;
 		default:
 			break;
 		}
@@ -687,6 +701,11 @@ int main(int argc, char *argv[])
 	if (osm_subn_parse_conf_file(config_file, &opt) < 0)
 		printf("\nFail to parse config file \'%s\'\n", config_file);
 
+	if (extra_config_file)
+		if (osm_subn_parse_conf_file(extra_config_file, &opt) < 0)
+			printf("\nFailed to parse extra config file `%s`\n",
+			       extra_config_file);
+
 	printf("Command Line Arguments:\n");
 	do {
 		next_option = getopt_long_only(argc, argv, short_option,

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 898 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]               ` <4F4F7A4B.4060007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2012-03-05 12:52                 ` Hal Rosenstock
       [not found]                   ` <4F54B707.1070606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Hal Rosenstock @ 2012-03-05 12:52 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Brian Ginsbach, Ira Weiny, Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 3/1/2012 8:31 AM, Doug Ledford wrote:
> I would say we are simply getting to the point where we *know* we need
> opensm to handle more than one fabric from a single instance ;-)

Why does a single OpenSM need to handle multiple subnets/fabrics?
What's the issue with running multiple OpenSMs, each handling its
own subnet/fabric?

-- Hal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]                   ` <4F54B707.1070606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-03-05 15:28                     ` Doug Ledford
       [not found]                       ` <2962b1d0-a679-45d0-a82b-5d624e2081f9-HOthUlaS0a9+R5eDjrG6zsCp5Q1pQRjfhaY/URYTgi6ny3qCrzbmXA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Doug Ledford @ 2012-03-05 15:28 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Brian Ginsbach, Ira Weiny, Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

----- Original Message -----
> On 3/1/2012 8:31 AM, Doug Ledford wrote:
> > I would say we are simply getting to the point where we *know* we
> > need
> > opensm to handle more than one fabric from a single instance ;-)
> 
> Why does a single OpenSM need to handle multiple subnets/fabrics ?
> What's the issue with running multiple OpenSMs with each handling
> it's
> own subnet/fabric ?

Because it wasn't built with the idea in mind of sharing logs or cache
directories, etc.  Not to mention that the proliferation of files when
you start multiple instances of opensm is pretty insane (have you
counted how many different files opensm can be configured to require
nowadays...).

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]                       ` <2962b1d0-a679-45d0-a82b-5d624e2081f9-HOthUlaS0a9+R5eDjrG6zsCp5Q1pQRjfhaY/URYTgi6ny3qCrzbmXA@public.gmane.org>
@ 2012-03-05 15:53                         ` Hal Rosenstock
       [not found]                           ` <4F54E177.9030302-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Hal Rosenstock @ 2012-03-05 15:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Brian Ginsbach, Ira Weiny, Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 3/5/2012 10:28 AM, Doug Ledford wrote:
> ----- Original Message -----
>> On 3/1/2012 8:31 AM, Doug Ledford wrote:
>>> I would say we are simply getting to the point where we *know* we
>>> need
>>> opensm to handle more than one fabric from a single instance ;-)
>>
>> Why does a single OpenSM need to handle multiple subnets/fabrics ?
>> What's the issue with running multiple OpenSMs with each handling
>> it's
>> own subnet/fabric ?
> 
> Because it wasn't built with the idea in mind of sharing logs or cache
> directories, etc.  

Yes, each instance has its own configuration and output files.

> Not to mention that the proliferation of files when
> you start multiple instances of opensm is pretty insane (have you
> counted how many different files opensm can be configured to require
> now a days...).

I'm not disagreeing with the fact that there are numerous config files
(the main config file is already large enough without the myriad of
features with their own separate config files) but different subnets
have totally different configuration requirements. It's not just the
subnet prefix, SM priority, and the few other config items that you
identified.

So what would make this better in your mind?

-- Hal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
       [not found]                           ` <4F54E177.9030302-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-03-05 17:25                             ` Doug Ledford
  0 siblings, 0 replies; 14+ messages in thread
From: Doug Ledford @ 2012-03-05 17:25 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Brian Ginsbach, Ira Weiny, Alex Netes, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

----- Original Message -----
> On 3/5/2012 10:28 AM, Doug Ledford wrote:
> > ----- Original Message -----
> >> On 3/1/2012 8:31 AM, Doug Ledford wrote:
> >>> I would say we are simply getting to the point where we *know* we
> >>> need
> >>> opensm to handle more than one fabric from a single instance ;-)
> >>
> >> Why does a single OpenSM need to handle multiple subnets/fabrics ?
> >> What's the issue with running multiple OpenSMs with each handling
> >> it's
> >> own subnet/fabric ?
> > 
> > Because it wasn't built with the idea in mind of sharing logs or
> > cache
> > directories, etc.
> 
> Yes, each instance has it's own configuration and output files.
> 
> > Not to mention that the proliferation of files when
> > you start multiple instances of opensm is pretty insane (have you
> > counted how many different files opensm can be configured to
> > require
> > now a days...).
> 
> I'm not disagreeing with the fact that there are numerous config
> files
> (the main config file is already large enough without the myriad of
> features with their own separate config files) but different subnets
> have totally different configuration requirements. It's not just the
> subnet prefix, SM priority, and the few other config items that you
> identified.

In some cases, yes; in other cases the different subnets are perfect mirrors
of each other (redundant, identical fabrics).

> So what would make this better in your mind ?

If I got to wave a magic wand, I would say it's time that opensm
management started getting simpler, not more complex.  With all of
the various options for routing engines and QoS and partitions, it's
gotten to where you need the equivalent of the old Cisco CNA in order
to configure opensm ;-)


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD
	      http://people.redhat.com/dledford


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server
  2012-03-02 10:30       ` Alex Netes
  2012-03-02 15:31         ` Doug Ledford
@ 2012-03-05 20:51         ` Ira Weiny
  1 sibling, 0 replies; 14+ messages in thread
From: Ira Weiny @ 2012-03-05 20:51 UTC (permalink / raw)
  To: Alex Netes
  Cc: Doug Ledford, Hal Rosenstock,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, 2 Mar 2012 12:30:15 +0200
Alex Netes <alexne-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:

> On 11:22 Wed 29 Feb     , Ira Weiny wrote:
> > Doug,
> > 
> > First thanks for this.  Some comments below.
> > 
> > On Wed, 29 Feb 2012 00:01:16 -0500
> > Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > There are two things that stand in the way of opensm being run on
> > > redundant fabrics easily:
> > > 
> > > 1) The opensm init script only starts one instance of opensm and opensm
> > > will only work on one fabric per instance
> > > 2) Even if you start multiple instances, you have to hand modify config
> > > files for each instance and then when you upgrade the opensm rpm you
> > > either loose your modifications or loose getting new default settings
> > > 
> > > I worked around both of these issues, I've attached the files I used to
> > > do so.
> > > 
> > > First, I have an opensm init script that allows starting multiple opensm
> > > instances.  It supports configuring this in one of two ways:
> > > 
> > > 1) Create multiple opensm.conf files, each with a numbered suffix (so
> > > opensm.conf.1, opensm.conf.2, etc.) and it will start one opensm
> > > instance per config file.  This allows an admin to copy the default
> > > config over and edit the things they need, and on rpm upgrade there will
> > > be a new default opensm.conf file so they can diff between their edited
> > > version and the new default and see if there are changes they need to
> > > bring back in.  This also allows for complete flexibility in setting up
> > > the different fabrics, for instance you could use one type of routing on
> > > one and a totally different type on the others.
> > > 
> > > 2) Edit the file /etc/sysconfig/opensm and define more than one GUID in
> > > the GUIDs variable.  This will cause the opensm init script to
> > > automatically start one instance per GUID, passing the GUID in on the
> > > command line.
> > 
> > I know you are going for ease of use here, which is good, however, I worry about this file becoming a redefinition of opensm.conf.
> > 
> > > 
> > > For the most part, this works well.  However, openmpi in particular
> > > doesn't like you to have physically separate fabrics that have the same
> > > subnet_prefix, and you can't specify a subnet_prefix on the command line
> > > to go along with the GUIDs.  So I wrote a patch for that and made the
> > > init script unilaterally increment the subnet prefix for each different
> > > GUID it's attaching to.
> > 
> > If you only allow option 1 above this takes care of itself by making the admin configure his subnet prefixes in each config file as appropriate.  The only down side is the loss of new configuration options as you upgrade.  However, that is probably better taken care of by a default config file with each package.  I mentioned this to Sasha years back and was denied since "you can always generate a new one with '-c'".  :-(
> > 
> > Alex would a default config file be acceptable?  It would mean more work on your part.
> > 
> 
> What the default opensm.conf would be used for? Just as a reference to the
> default values?

Mainly as a reference.  Right now the "-c" option generates the config with options enabled and set to the defaults.  However, most examples I have seen supply the options commented out, so that the user can uncomment the ones they want set.

This is the method I chose to use in /etc/infiniband-diags/ibdiag.conf.

Another example is httpd.

bash-4.1# rpm -qil httpd | egrep "httpd\.conf"
/etc/httpd/conf/httpd.conf

bash-4.1# head /etc/httpd/conf/httpd.conf
#
# This is the main Apache server configuration file.  It contains the
# configuration directives that give the server its instructions.
# See <URL:http://httpd.apache.org/docs/2.2/> for detailed information.
# In particular, see
# <URL:http://httpd.apache.org/docs/2.2/mod/directives.html>
# for a discussion of each configuration directive.
#
#
# Do NOT simply read the instructions in here without understanding


Ira

> 
> -- Alex


-- 
Ira Weiny
Member of Technical Staff
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org


end of thread, other threads:[~2012-03-05 20:51 UTC | newest]

Thread overview: 14+ messages
2012-02-29  5:01 [Patch opensm] Allow for easily configuring multiple fabrics on one opensm server Doug Ledford
     [not found] ` <4F4DB11C.5080203-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-02-29 19:22   ` Ira Weiny
     [not found]     ` <20120229112229.136f25b7.weiny2-i2BcT+NCU+M@public.gmane.org>
2012-02-29 19:47       ` Doug Ledford
     [not found]         ` <20120301021501.GB961@bukharin.us.cray.com>
     [not found]           ` <20120301021501.GB961-7GFyYy+Av7rWWZS0+0nfmVaTQe2KTcn/@public.gmane.org>
2012-03-01 13:31             ` Doug Ledford
     [not found]               ` <4F4F7A4B.4060007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-03-05 12:52                 ` Hal Rosenstock
     [not found]                   ` <4F54B707.1070606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-03-05 15:28                     ` Doug Ledford
     [not found]                       ` <2962b1d0-a679-45d0-a82b-5d624e2081f9-HOthUlaS0a9+R5eDjrG6zsCp5Q1pQRjfhaY/URYTgi6ny3qCrzbmXA@public.gmane.org>
2012-03-05 15:53                         ` Hal Rosenstock
     [not found]                           ` <4F54E177.9030302-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2012-03-05 17:25                             ` Doug Ledford
2012-03-01 22:46             ` Ira Weiny
     [not found]               ` <20120301144645.09aa0d80.weiny2-i2BcT+NCU+M@public.gmane.org>
2012-03-02 10:13                 ` Alex Netes
2012-03-02 10:30       ` Alex Netes
2012-03-02 15:31         ` Doug Ledford
     [not found]           ` <4F50E7CE.6050204-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-03-02 15:47             ` Doug Ledford
2012-03-05 20:51         ` Ira Weiny
