From: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
To: Dale Purdy <purdy-sJ/iWh9BUns@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] Dimension port order file support
Date: Wed, 24 Mar 2010 15:32:21 +0200 [thread overview]
Message-ID: <20100324133221.GK4808@me> (raw)
In-Reply-To: <alpine.DEB.2.00.1003031803390.14048-QmwFlmbGB5cTG1waqwXmH7Cf4lofQVJ7@public.gmane.org>
Hi Dale,
On 18:06 Wed 03 Mar , Dale Purdy wrote:
>
> Provide a means to specify on a per switch basis the mapping (order)
> between switch ports and dimensions for Dimension Order Routing. This
> allows the DOR routing engine to be used when the cabling is not
> properly aligned for DOR, either initially, or for an upgrade.
Nice stuff.
Is this something useful with ! '-R dor'?
>
> Signed-off-by: Dale Purdy <purdy-sJ/iWh9BUns@public.gmane.org>
The patch itself is broken somehow - it has double space at start of
non-changed line (it is fixable with sed -e 's/^ / /', so don't resend
patch only for this).
Some more minor comments are below.
> ---
> opensm/include/opensm/osm_subnet.h | 1 +
> opensm/include/opensm/osm_switch.h | 30 +++++++++
> opensm/man/opensm.8.in | 31 +++++++--
> opensm/opensm/main.c | 13 ++++-
> opensm/opensm/osm_subnet.c | 7 ++
> opensm/opensm/osm_switch.c | 2 +-
> opensm/opensm/osm_ucast_mgr.c | 120 ++++++++++++++++++++++++++++++++++++
> 7 files changed, 195 insertions(+), 9 deletions(-)
>
> diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h
> index 3970e98..e4e298e 100644
> --- a/opensm/include/opensm/osm_subnet.h
> +++ b/opensm/include/opensm/osm_subnet.h
> @@ -186,6 +186,7 @@ typedef struct osm_subn_opt {
> uint16_t console_port;
> char *port_prof_ignore_file;
> char *hop_weights_file;
> + char *dimn_ports_file;
> boolean_t port_profile_switch_nodes;
> boolean_t sweep_on_trap;
> char *routing_engine_names;
> diff --git a/opensm/include/opensm/osm_switch.h b/opensm/include/opensm/osm_switch.h
> index cb6e5ac..1c6807e 100644
> --- a/opensm/include/opensm/osm_switch.h
> +++ b/opensm/include/opensm/osm_switch.h
> @@ -100,6 +100,7 @@ typedef struct osm_switch {
> uint16_t num_hops;
> uint8_t **hops;
> osm_port_profile_t *p_prof;
> + uint8_t *dimn_ports;
> uint8_t *lft;
> uint8_t *new_lft;
> uint16_t lft_size;
> @@ -871,6 +872,35 @@ static inline uint8_t osm_switch_get_mft_max_position(IN osm_switch_t * p_sw)
> * RETURN VALUE
> */
>
> +/****f* OpenSM: Switch/osm_switch_get_dimn_port
> +* NAME
> +* osm_switch_get_dimn_port
> +*
> +* DESCRIPTION
> +* Get the routing ordered port
> +*
> +* SYNOPSIS
> +*/
> +static inline uint8_t osm_switch_get_dimn_port(IN const osm_switch_t * p_sw,
> + IN uint8_t port_num)
> +{
> + CL_ASSERT(p_sw);
> + if (p_sw->dimn_ports == NULL)
> + return port_num;
> + return p_sw->dimn_ports[port_num];
> +}
> +/*
> +* PARAMETERS
> +* p_sw
> +* [in] Pointer to the switch object.
> +*
> +* port_num
> +* [in] Port number in the switch
> +*
> +* RETURN VALUES
> +* Returns the port number ordered for routing purposes.
> +*/
> +
> /****f* OpenSM: Switch/osm_switch_recommend_path
> * NAME
> * osm_switch_recommend_path
> diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in
> index 7aca8f9..9053611 100644
> --- a/opensm/man/opensm.8.in
> +++ b/opensm/man/opensm.8.in
> @@ -37,6 +37,7 @@ opensm \- InfiniBand subnet manager and administration (SM/SA)
> [\-console-port <port>]
> [\-i(gnore-guids) <equalize-ignore-guids-file>]
> [\-w | \-\-hop_weights_file <path to file>]
> +[\-O | \-\-dimn_ports_file <path to file>]
> [\-f <log file path> | \-\-log_file <log file path> ]
> [\-L | \-\-log_limit <size in MB>] [\-e(rase_log_file)]
> [\-P(config) <partition config file> ]
> @@ -273,6 +274,16 @@ factor of 1. Lines starting with # are comments. Weights affect only the
> output route from the port, so many useful configurations will require weights
> to be specified in pairs.
> .TP
> +\fB\-O\fR, \fB\-\-dimn_ports_file\fR <path to file>
> +This option provides a mapping between hypercube dimensions and ports
> +on a per switch basis for the DOR routing engine. The file consists
> +of lines containing a switch node GUID (specified as a 64 bit hex
> +number, with leading 0x) followed by a list of non-zero port numbers,
> +separated by spaces, one switch per line. The order for the port
> +numbers is in one to one correspondence to the dimensions. Ports not
> +listed on a line are assigned to the remaining dimensions, in port
> +order. Anything after a # is a comment.
> +.TP
> \fB\-x\fR, \fB\-\-honor_guid2lid\fR
> This option forces OpenSM to honor the guid2lid file,
> when it comes out of Standby state, if such file exists
> @@ -969,17 +980,20 @@ algorithm and so uses shortest paths. Instead of spreading traffic
> out across different paths with the same shortest distance, it chooses
> among the available shortest paths based on an ordering of dimensions.
> Each port must be consistently cabled to represent a hypercube
> -dimension or a mesh dimension. Paths are grown from a destination
> -back to a source using the lowest dimension (port) of available paths
> -at each step. This provides the ordering necessary to avoid deadlock.
> +dimension or a mesh dimension. Alternatively, the -O option can be
> +used to assign a custom mapping between the ports on a given switch,
> +and the associated dimension. Paths are grown from a destination back
> +to a source using the lowest dimension (port) of available paths at
> +each step. This provides the ordering necessary to avoid deadlock.
> When there are multiple links between any two switches, they still
> represent only one dimension and traffic is balanced across them
> unless port equalization is turned off. In the case of hypercubes,
> the same port must be used throughout the fabric to represent the
> -hypercube dimension and match on both ends of the cable. In the case
> -of meshes, the dimension should consistently use the same pair of
> -ports, one port on one end of the cable, and the other port on the
> -other end, continuing along the mesh dimension.
> +hypercube dimension and match on both ends of the cable, or the -O
> +option used to accomplish the alignment. In the case of meshes, the
> +dimension should consistently use the same pair of ports, one port on
> +one end of the cable, and the other port on the other end, continuing
> +along the mesh dimension, or the -O option used as an override.
>
> Use '-R dor' option to activate the DOR algorithm.
>
> @@ -1111,3 +1125,6 @@ Thomas Sodring
> .TP
> Ira Weiny
> .RI < weiny2-i2BcT+NCU+M@public.gmane.org >
> +.TP
> +Dale Purdy
> +.RI < purdy-sJ/iWh9BUns@public.gmane.org >
> diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c
> index f9a33af..0093aa7 100644
> --- a/opensm/opensm/main.c
> +++ b/opensm/opensm/main.c
> @@ -275,6 +275,10 @@ static void show_usage(void)
> " This option provides the means to define a weighting\n"
> " factor per port for customizing the least weight\n"
> " hops for the routing.\n\n");
> + printf("--dimn_ports_file, -O <path to file>\n"
> + " This option provides the means to define a mapping\n"
> + " between ports and dimension (Order) for controlling\n"
> + " Dimension Order Routing (DOR).\n\n");
> printf("--honor_guid2lid, -x\n"
> " This option forces OpenSM to honor the guid2lid file,\n"
> " when it comes out of Standby state, if such file exists\n"
> @@ -543,7 +547,7 @@ int main(int argc, char *argv[])
> char *conf_template = NULL, *config_file = NULL;
> uint32_t val;
> const char *const short_option =
> - "F:c:i:w:f:ed:D:g:l:L:s:t:a:u:m:X:R:zM:U:S:P:Y:ANBIQvVhoryxp:n:q:k:C:G:H:";
> + "F:c:i:w:O:f:ed:D:g:l:L:s:t:a:u:m:X:R:zM:U:S:P:Y:ANBIQvVhoryxp:n:q:k:C:G:H:";
>
> /*
> In the array below, the 2nd parameter specifies the number
> @@ -560,6 +564,7 @@ int main(int argc, char *argv[])
> {"guid", 1, NULL, 'g'},
> {"ignore_guids", 1, NULL, 'i'},
> {"hop_weights_file", 1, NULL, 'w'},
> + {"dimn_ports_file", 1, NULL, 'O'},
> {"lmc", 1, NULL, 'l'},
> {"sweep", 1, NULL, 's'},
> {"timeout", 1, NULL, 't'},
> @@ -696,6 +701,12 @@ int main(int argc, char *argv[])
> opt.hop_weights_file);
> break;
>
> + case 'O':
> + opt.dimn_ports_file = optarg;
> + printf(" Dimension Ports File = %s\n",
> + opt.dimn_ports_file);
> + break;
> +
> case 'g':
> /*
> Specifies port guid with which to bind.
> diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
> index e4126bc..e3f118b 100644
> --- a/opensm/opensm/osm_subnet.c
> +++ b/opensm/opensm/osm_subnet.c
> @@ -324,6 +324,7 @@ static const opt_rec_t opt_tbl[] = {
> { "force_heavy_sweep", OPT_OFFSET(force_heavy_sweep), opts_parse_boolean, NULL, 1 },
> { "port_prof_ignore_file", OPT_OFFSET(port_prof_ignore_file), opts_parse_charp, NULL, 0 },
> { "hop_weights_file", OPT_OFFSET(hop_weights_file), opts_parse_charp, NULL, 0 },
> + { "dimn_ports_file", OPT_OFFSET(dimn_ports_file), opts_parse_charp, NULL, 0 },
> { "port_profile_switch_nodes", OPT_OFFSET(port_profile_switch_nodes), opts_parse_boolean, NULL, 1 },
> { "sweep_on_trap", OPT_OFFSET(sweep_on_trap), opts_parse_boolean, NULL, 1 },
> { "routing_engine", OPT_OFFSET(routing_engine_names), opts_parse_charp, NULL, 0 },
> @@ -742,6 +743,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * p_opt)
> p_opt->accum_log_file = TRUE;
> p_opt->port_prof_ignore_file = NULL;
> p_opt->hop_weights_file = NULL;
> + p_opt->dimn_ports_file = NULL;
> p_opt->port_profile_switch_nodes = FALSE;
> p_opt->sweep_on_trap = TRUE;
> p_opt->use_ucast_cache = FALSE;
> @@ -1385,6 +1387,11 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t * p_opts)
> p_opts->hop_weights_file ? p_opts->hop_weights_file : null_str);
>
> fprintf(out,
> + "# The file holding non-default port order per switch for DOR routing \n"
> + "dimn_ports_file %s\n\n",
> + p_opts->dimn_ports_file ? p_opts->dimn_ports_file : null_str);
> +
> + fprintf(out,
> "# Routing engine\n"
> "# Multiple routing engines can be specified separated by\n"
> "# commas so that specific ordering of routing algorithms will\n"
> diff --git a/opensm/opensm/osm_switch.c b/opensm/opensm/osm_switch.c
> index 1cd8bfc..398fd65 100644
> --- a/opensm/opensm/osm_switch.c
> +++ b/opensm/opensm/osm_switch.c
> @@ -341,7 +341,7 @@ uint8_t osm_switch_recommend_path(IN const osm_switch_t * p_sw,
>
> /* port number starts with one and num_ports is 1 + num phys ports */
> for (i = start_from; i < start_from + num_ports; i++) {
> - port_num = i%num_ports;
> + port_num = osm_switch_get_dimn_port(p_sw, i%num_ports);
> if (!port_num ||
> osm_switch_get_hop_count(p_sw, base_lid, port_num) !=
> least_hops)
> diff --git a/opensm/opensm/osm_ucast_mgr.c b/opensm/opensm/osm_ucast_mgr.c
> index 7ec58b5..e74775c 100644
> --- a/opensm/opensm/osm_ucast_mgr.c
> +++ b/opensm/opensm/osm_ucast_mgr.c
> @@ -46,6 +46,7 @@
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> +#include <ctype.h>
> #include <iba/ib_types.h>
> #include <complib/cl_qmap.h>
> #include <complib/cl_debug.h>
> @@ -488,6 +489,112 @@ static void set_default_hop_wf(cl_map_item_t * p_map_item, void *ctx)
> }
> }
>
> +static void free_dimn_ports(cl_map_item_t * p_map_item, void *ctx)
> +{
> + osm_switch_t *sw = (osm_switch_t *) p_map_item;
> +
> + if (sw->dimn_ports) {
> + free(sw->dimn_ports);
> + sw->dimn_ports = NULL;
> + }
> +}
> +
> +static int set_dimn_ports(void *ctx, uint64_t guid, char *p)
> +{
> + osm_ucast_mgr_t *m = ctx;
> + osm_node_t *node = osm_get_node_by_guid(m->p_subn, cl_hton64(guid));
> + osm_switch_t *sw;
> + uint8_t *dimn_ports = NULL;
> + uint8_t port;
> + uint *ports = NULL;
'uint' is not something standard (we had some build compatibility issues
with 'uint' in infiniband-diags in the past), so what about 'unsigned
int'?
> + const int bpw = sizeof(*ports)*8;
> + int words;
> + int i = 1; /* port 0 maps to port 0 */
> +
> + if (!node || !(sw = node->sw)) {
> + OSM_LOG(m->p_log, OSM_LOG_DEBUG,
> + "switch with guid 0x%016" PRIx64 " is not found\n",
> + guid);
> + return 0;
> + }
> +
> + if (sw->dimn_ports) {
> + OSM_LOG(m->p_log, OSM_LOG_DEBUG,
> + "switch with guid 0x%016" PRIx64 " already listed\n",
> + guid);
It is GIUD double listed case, right? Wouldn't OSM_LOG_VERBOSE be more
appropriate?
> + return 0;
> + }
> +
> + dimn_ports = malloc(sizeof(*dimn_ports)*sw->num_ports);
> + if (!dimn_ports) {
> + OSM_LOG(m->p_log, OSM_LOG_ERROR,
> + "ERR 3A07: cannot allocate memory for dimn_ports\n");
> + return -1;
> + }
> + memset(dimn_ports, 0, sizeof(*dimn_ports)*sw->num_ports);
> +
> + /* the ports array is for record keeping of which ports have
> + * been seen */
> + words = (sw->num_ports + bpw - 1)/bpw;
> + ports = malloc(words*sizeof(*ports));
> + if (!ports) {
> + OSM_LOG(m->p_log, OSM_LOG_ERROR,
> + "ERR 3A08: cannot allocate memory for ports\n");
> + return -1;
> + }
> + memset(ports, 0, words*sizeof(*ports));
> +
> + while ((*p != '\0') && (*p != '#')) {
> + char *e;
> +
> + port = strtoul(p, &e, 0);
> + if ((p == e) || (port == 0) || (port >= sw->num_ports) ||
> + !osm_node_get_physp_ptr(node, port)) {
> + OSM_LOG(m->p_log, OSM_LOG_DEBUG,
> + "bad port %d specified for guid 0x%016" PRIx64 "\n",
> + port, guid);
> + free(dimn_ports);
> + free(ports);
Ditto.
> + return 0;
> + }
> +
> + if (ports[port/bpw] & (1u << (port%bpw))) {
> + OSM_LOG(m->p_log, OSM_LOG_DEBUG,
> + "port %d already specified for guid 0x%016" PRIx64 "\n",
> + port, guid);
Ditto.
> + free(dimn_ports);
> + free(ports);
> + return 0;
> + }
> +
> + ports[port/bpw] |= (1u << (port%bpw));
> + dimn_ports[i++] = port;
> +
> + p = e;
> + while (isspace(*p)) {
> + p++;
> + }
> + }
> +
> + if (i > 1) {
> + for (port = 1; port < sw->num_ports; port++) {
> + /* fill out the rest of the dimn_ports array
> + * in sequence using the remaining unspecified
> + * ports.
> + */
> + if (!(ports[port/bpw] & (1u << (port%bpw)))) {
> + dimn_ports[i++] = port;
> + }
> + }
> + sw->dimn_ports = dimn_ports;
> + } else {
> + free(dimn_ports);
> + }
> +
> + free(ports);
> + return 0;
> +}
> +
> int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * p_mgr)
> {
> uint32_t i;
> @@ -515,6 +622,19 @@ int osm_ucast_mgr_build_lid_matrices(IN osm_ucast_mgr_t * p_mgr)
> }
> }
>
> + cl_qmap_apply_func(p_sw_guid_tbl, free_dimn_ports, NULL);
> + if (p_mgr->p_subn->opt.dimn_ports_file) {
> + OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
> + "Fetching dimension ports file \'%s\'\n",
> + p_mgr->p_subn->opt.dimn_ports_file);
> + if (parse_node_map(p_mgr->p_subn->opt.dimn_ports_file,
> + set_dimn_ports, p_mgr)) {
> + OSM_LOG(p_mgr->p_log, OSM_LOG_ERROR, "ERR 3A05: "
> + "cannot parse dimn_ports_file \'%s\'\n",
> + p_mgr->p_subn->opt.dimn_ports_file);
> + }
> + }
> +
Hmm, if it is DOR only it can be done under 'if (is_dor)' (to save
cycles of other REs). Otherwise (generic usability)
ucast_mgr_setup_all_switches() seems as more appropriate place to have
such setup, no?
And what about adding:
if (sw->dimn_ports)
free(dimn_ports);
in osm_switch_delete()?
Sasha
> /*
> Set the switch matrices for each switch's own port 0 LID(s)
> then set the lid matrices for the each switch's leaf nodes.
> --
> 1.7.0
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-03-24 13:32 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-04 0:06 [PATCH] Dimension port order file support Dale Purdy
[not found] ` <alpine.DEB.2.00.1003031803390.14048-QmwFlmbGB5cTG1waqwXmH7Cf4lofQVJ7@public.gmane.org>
2010-03-24 13:32 ` Sasha Khapyorsky [this message]
2010-04-07 19:27 ` Dale Purdy
2010-04-13 14:02 ` Sasha Khapyorsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100324133221.GK4808@me \
--to=sashak-smomgflxvozwk0htik3j/w@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=purdy-sJ/iWh9BUns@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox