Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.

public inbox for linux-rdma@vger.kernel.org

* Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.
       [not found]         ` <20090831170144.da0e7185.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2009-10-23 17:45           ` Sasha Khapyorsky
  0 siblings, 0 replies; 14+ messages in thread
From: Sasha Khapyorsky @ 2009-10-23 17:45 UTC (permalink / raw)
  To: Ira Weiny; +Cc: linux-rdma

On 17:01 Mon 31 Aug     , Ira Weiny wrote:
> 
> The discussion on the list has digressed from this patch.  I still think this
> patch is valid and adds a level of flexibility which is needed regardless of
> what is decided about libibmad.  Do you agree?

Not really. I still think the needed level of flexibility is available
to us without adding new mandatory APIs.

> Also, the last patch in the series ([PATCH 5/5] infiniband-diags/libibnetdisc:
> remove members of the fabric struct which are used in the scan only) cleans up
> some stuff from the external interface.  If you really don't want to introduce
> a context object, then I can regenerate that final patch without the context.

Yes, since that patch is logically unrelated to the context changes. It
would be good to rebase it and repost it as an independent change.

Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
       [not found]         ` <20090826164026.8dcce4b2.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2009-10-23 23:43           ` Sasha Khapyorsky
  2009-12-20 12:14             ` [PATCH] tests/subnet_discover: discover test utility Sasha Khapyorsky
  0 siblings, 1 reply; 14+ messages in thread
From: Sasha Khapyorsky @ 2009-10-23 23:43 UTC (permalink / raw)
  To: Ira Weiny; +Cc: linux-rdma

On 16:40 Wed 26 Aug     , Ira Weiny wrote:
> 
> Of course!  :-)  But first I would like to mention some numbers from the
> prototype code I have.  When running on a small fabric the additional overhead
> of thread creation actually slows down the scan.  :-(
> 
> Current master:         Threaded version:
> real    0m0.101s         0m0.266s
> user    0m0.000s         0m0.000s
> sys     0m0.011s         0m0.014s
> 
> 
> But, as expected, on a large system (1152 nodes) there is a decent speed up.
> 
> Current Master:         Threaded version:
> real    0m3.046s         0m1.748s
> user    0m0.073s         0m0.331s
> sys     0m0.158s         0m0.822s
> 
> However, the biggest speed up comes when there are errors on the fabric.  This
> is the same 1152 node cluster with just 14 "bad" ports on the fabric.  This is
> of course because the scan continues "around" the bad ports.
> 
> Current Master:         Threaded version:
> real    0m33.051s        0m5.609s
> user    0m0.071s         0m0.353s
> sys     0m0.156s         0m1.113s
> 
> Since you are usually running these tools when things are bad I think there is
> a big gain here.  Even running with a faster timeout of 200ms results in a big
> difference.
> 
> Current Master:        Threaded version:
> real    0m9.149s        0m2.223s
> user    0m0.016s        0m0.374s
> sys     0m0.372s        0m1.056s
> 
> With that in mind...

Good. So which factor do you think accounts for most of this performance
gain: the use of multiple threads, or the parallelism of the SMP queries?
I would suspect that it is the parallelism.

> > implemented with libibnetdisc: goals (in particular is it support for
> > multithreaded apps or just multithreaded discovery function), interaction
> > with caller application, etc.?
> 
> My initial goal was to make the libibnetdisc safe for multithreaded apps

Ok.

> and
> make a multithreaded discovery function.

And this is not the same as the above. Before doing a multithreaded
implementation, I would really suggest checking single-threaded but
parallel querying first. I believe that impressive numbers could be
shown there too, without the multithreading overhead and complexity.
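
To illustrate what "single threaded but parallel" querying could look
like, here is a minimal sketch. It is a toy model, not libibnetdisc or
libibumad code: the fabric is a small tree, the "wire" is an in-memory
FIFO, and all names (toy_fabric, toy_wire, toy_send, toy_recv,
toy_discover) are hypothetical. The point is only that one thread can
keep several requests in flight and loop until none are outstanding.

```c
#include <stddef.h>

#define MAX_NODES 16
#define QUEUE_LEN 64

/* Toy fabric: children[n] lists the nodes reachable from node n.
 * It is tree-shaped, so no node is reached twice. Illustrative only. */
struct toy_fabric {
	int nchildren[MAX_NODES];
	int children[MAX_NODES][4];
};

/* Stand-in for the wire: requests "sent" but not yet answered. */
struct toy_wire {
	int pending[QUEUE_LEN];
	size_t head, tail;
	unsigned outstanding;
	unsigned max_outstanding;	/* high-water mark, for inspection */
};

/* "Send" a query to a node without waiting for the answer. */
static void toy_send(struct toy_wire *w, int node)
{
	w->pending[w->tail++ % QUEUE_LEN] = node;
	if (++w->outstanding > w->max_outstanding)
		w->max_outstanding = w->outstanding;
}

/* "Receive" the oldest pending answer; returns the responding node. */
static int toy_recv(struct toy_wire *w)
{
	w->outstanding--;
	return w->pending[w->head++ % QUEUE_LEN];
}

/* One thread, many requests in flight: every response immediately
 * triggers queries to all neighbours, and the loop ends only when
 * nothing is outstanding. Returns the number of nodes discovered. */
static unsigned toy_discover(const struct toy_fabric *f, struct toy_wire *w)
{
	unsigned found = 0;
	int i, node;

	toy_send(w, 0);
	while (w->outstanding) {
		node = toy_recv(w);
		found++;
		for (i = 0; i < f->nchildren[node]; i++)
			toy_send(w, f->children[node][i]);
	}
	return found;
}
```

In a real tool the FIFO would be replaced by actual MAD sends and
receives, and a slow or dead port would delay only its own response
rather than the whole scan.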

> However, since libibmad itself is
> not thread safe, and you expressed a desire to keep it that way[*], I reduced
> that goal to just making the discovery function multithreaded (using
> mad_[send|receive]_via).
> 
> Although I don't like this restriction I can see it as a valid design decision
> as long as it is documented that the discover function is not thread safe in
> regards to the ibmad_port object.  This is because the ibnd_discover_fabric
> uses libibmad calls and would require a complicated API to allow the user app
> to synchronize with those calls.
> 
> In order to make things thread safe for the user apps as well as the library I
> can see 3 options.
> 
>    1) make libibmad thread safe (which you were hesitant to do)
> 
>    2) add a thread safe interface to libibmad.  User apps will need to know to
>       use this interface while using libibnetdisc and libibnetdisc will use
>       this interface.

Why should it be related to libibmad? Make libibnetdisc itself thread
safe and that is all (see below).

>    3) Create a wrapper lib which is thread safe.  In this case the apps and
>       libibnetdisc would call into this wrapper lib and we would have to
>       change the API to libibnetdisc.

4) Instead of bothering with the slow (blocking) libibmad rpc, use
umad_send() and umad_recv() directly in a parallel hop-by-hop fabric
discovery.

(In my personal experience, using umad directly instead of rpc ends up
in smaller and simpler code for applications that are slightly more
complex than 'smpquery' style tools.)

> [*] http://lists.openfabrics.org/pipermail/general/2009-July/060677.html
> 
>    "madrpc() is too primitive interface for such applications. There would be
>    better to use umad_send/recv() directly or may be mad_send_via().  Example
>    is mcast_storm.c distributed with ibsim."

Another option is adding a new thread-safe API. But I don't really see
how it is related to the libibnetdisc discussion.

> [$] It is my opinion that mad_rpc is _not_ primitive.  In my mind it _is_
>    the wrapper around the primitive umad_send/recv calls.  If you are
>    interested perhaps I can try to explain what I wanted to do in the library
>    to make it thread safe more clearly.  The point I might not have made clear
>    was that I don't think the library will have to do any threading on its
>    own, just some locks and storing of responses.  Of course the down side to
>    this is the libibmad code would be slightly slower.  But I don't think by
>    very much.

I don't like introducing a lock there. Right, it is unneeded overhead
(and extra dependencies) for single-threaded tools. And again, I have
nothing against a thread-safe rpc++ API, but I would prefer reentrant
state over locking.

Sasha

* [PATCH] tests/subnet_discover: discover test utility
  2009-10-23 23:43           ` Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.) Sasha Khapyorsky
@ 2009-12-20 12:14             ` Sasha Khapyorsky
       [not found]               ` <20091220182809.f7e17fae.weiny2@llnl.gov>
  0 siblings, 1 reply; 14+ messages in thread
From: Sasha Khapyorsky @ 2009-12-20 12:14 UTC (permalink / raw)
  To: Ira Weiny; +Cc: linux-rdma, Al Chu


'subnet_discover' is a simple test utility which implements a
"non-blocking" discovery method where MADs are sent "in parallel"
(unlike the current implementation of 'ibnetdiscover' and similar to how
OpenSM does it). For this, the id of a recently discovered node is
encoded in the lower 16 bits of the MAD transaction id.

Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
---

Hi Ira,

On 01:43 Sat 24 Oct     , Sasha Khapyorsky wrote:
> > 
> > Current Master:        Threaded version:
> > real    0m9.149s        0m2.223s
> > user    0m0.016s        0m0.374s
> > sys     0m0.372s        0m1.056s
> > 
> > With that in mind...
> 
> Good. So what do you think due to which factor most of this performance
> gain was achieved? Due to using multiple threads or due to SMP queries
> parallelism? I would suspect that it is a parallelism.

For some purposes in ibsim/tests I wrote a simple utility,
'subnet_discover'. It runs as a single thread, uses a "parallel" MAD
sending method, and uses libibumad for all MAD sending and receiving.
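
The bookkeeping that makes this work in one thread is the transaction
id: each request carries a node id in its low 16 bits, so a response
can be matched to a node without any per-request state. A minimal
sketch of that packing follows. The helper names (trid_encode,
trid_node_id) are hypothetical; the posted code does the same thing
inline in send_query() and recv_smp_resp().

```c
#include <stdint.h>

/* Hypothetical helpers; names are illustrative, not from the patch. */

/* Pack a per-request sequence number and a node id into one
 * transaction id, with the node id in the low 16 bits. As in the
 * posted code, the arithmetic is done in 32 bits, so only the low
 * 16 bits of the sequence number survive the shift. */
static uint64_t trid_encode(uint32_t seq, unsigned node_id)
{
	return (seq << 16) | (node_id & 0xffff);
}

/* Recover the node id from a transaction id echoed in a response. */
static unsigned trid_node_id(uint64_t trid)
{
	return (unsigned)(trid & 0xffff);
}
```

With 16 bits for the node id this scheme can address at most 65536
nodes, which is more than the 32 * 1024 entries of node_array in the
patch below.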

I think that a similar implementation in libibnetdisc (I can do it if we
are in agreement :)) would improve its performance.

Would you like to look at this?

Sasha

 tests/Makefile          |    2 +-
 tests/subnet_discover.c |  495 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 496 insertions(+), 1 deletions(-)
 create mode 100644 tests/subnet_discover.c

diff --git a/tests/Makefile b/tests/Makefile
index dd4cd55..bd415d8 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -1,4 +1,4 @@
-progs:= mcast_storm
+progs:= subnet_discover mcast_storm
 
 -include ../defs.mk
 
diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
new file mode 100644
index 0000000..a577cc7
--- /dev/null
+++ b/tests/subnet_discover.c
@@ -0,0 +1,495 @@
+/*
+ * Copyright (c) 2009 Voltaire, Inc. All rights reserved.
+ *
+ * This is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <inttypes.h>
+#include <getopt.h>
+
+#include <infiniband/umad.h>
+#include <infiniband/mad.h>
+
+struct port {
+	struct node *node;
+	uint64_t guid;
+	struct port *remote;
+	uint8_t port_info[IB_SMP_DATA_SIZE];
+};
+
+struct node {
+	uint64_t guid;
+	unsigned num_ports;
+	unsigned is_switch;
+	uint8_t node_info[IB_SMP_DATA_SIZE];
+	uint8_t node_desc[IB_SMP_DATA_SIZE];
+	uint8_t switch_info[IB_SMP_DATA_SIZE];
+	struct port ports[];
+};
+
+static struct node *node_array[32 * 1024];
+static unsigned node_count = 0;
+static unsigned trid_cnt = 0;
+static unsigned outstanding = 0;
+static unsigned timeout = 100;
+static unsigned retries = 3;
+static unsigned verbose = 0;
+
+#define ERROR(fmt, ...) fprintf(stderr, "ERR: " fmt, ##__VA_ARGS__)
+#define VERBOSE(fmt, ...) if (verbose) fprintf(stderr, fmt, ##__VA_ARGS__)
+#define NOISE(fmt, ...) if (verbose > 1) fprintf(stderr, fmt, ##__VA_ARGS__)
+
+static const char *print_path(uint8_t path[], size_t path_cnt)
+{
+	static char buf[256];
+	int i, n = 0;
+	for (i = 0; i <= path_cnt; i++)
+		n += snprintf(buf + n, sizeof(buf) - n, "%u,", path[i]);
+	buf[n] = '\0';
+	return buf;
+}
+
+#define DBG_DUMP_FUNC(name) static void dbg_dump_##name(void *data) \
+{ \
+	char buf[2048]; \
+	mad_dump_##name(buf, sizeof(buf), data, IB_SMP_DATA_SIZE); \
+	NOISE("### "#name":\n%s\n", buf); \
+}
+
+DBG_DUMP_FUNC(nodeinfo);
+DBG_DUMP_FUNC(nodedesc);
+DBG_DUMP_FUNC(portinfo);
+DBG_DUMP_FUNC(switchinfo);
+
+static void build_umad_req(void *umad, uint8_t * path, unsigned path_cnt,
+			   uint64_t trid, uint8_t method,
+			   uint16_t attr_id, uint32_t attr_mod, uint64_t mkey)
+{
+	void *mad = umad_get_mad(umad);
+
+	memset(umad, 0, umad_size() + IB_MAD_SIZE);
+	umad_set_addr(umad, 0xffff, 0, 0, 0);
+	mad_set_field(mad, 0, IB_MAD_METHOD_F, method);
+	mad_set_field(mad, 0, IB_MAD_CLASSVER_F, 1);
+	mad_set_field(mad, 0, IB_MAD_MGMTCLASS_F, IB_SMI_DIRECT_CLASS);
+	mad_set_field(mad, 0, IB_MAD_BASEVER_F, 1);
+	mad_set_field(mad, 0, IB_DRSMP_HOPCNT_F, path_cnt);
+	mad_set_field(mad, 0, IB_DRSMP_HOPPTR_F, 0);
+	mad_set_field64(mad, 0, IB_MAD_TRID_F, trid);
+	mad_set_field(mad, 0, IB_DRSMP_DRDLID_F, 0xffff);
+	mad_set_field(mad, 0, IB_DRSMP_DRSLID_F, 0xffff);
+	mad_set_array(mad, 0, IB_DRSMP_PATH_F, path);
+	mad_set_field(mad, 0, IB_MAD_ATTRID_F, attr_id);
+	mad_set_field(mad, 0, IB_MAD_ATTRMOD_F, attr_mod);
+	mad_set_field64(mad, 0, IB_MAD_MKEY_F, mkey);
+}
+
+static int send_query(int fd, int agent, void *umad, unsigned node_id,
+		      uint8_t * path, size_t path_cnt, uint16_t attr_id,
+		      uint32_t attr_mod)
+{
+	uint64_t trid;
+	int ret;
+
+	trid = (trid_cnt++ << 16) | (node_id & 0xffff);
+	build_umad_req(umad, path, path_cnt, trid, IB_MAD_METHOD_GET, attr_id,
+		       attr_mod, 0);
+
+	ret = umad_send(fd, agent, umad, IB_MAD_SIZE, timeout, retries);
+	if (ret < 0) {
+		ERROR("umad_send failed: trid 0x%016" PRIx64
+		      ", attr_id %x, attr_mod %x: %s\n",
+		      trid, attr_id, attr_mod, strerror(errno));
+		return -1;
+	}
+
+	outstanding++;
+
+	VERBOSE("send %016" PRIx64 ": attr %x, mod %x to %s\n", trid, attr_id,
+		attr_mod, print_path(path, path_cnt));
+
+	return ret;
+}
+
+static int recv_response(int fd, int agent, uint8_t * umad, size_t length)
+{
+	int len = length, ret;
+
+	do {
+		ret = umad_recv(fd, umad, &len, timeout);
+	} while (ret >= 0 && ret != agent);
+
+	if (ret < 0 || umad_status(umad)) {
+		ERROR("umad_recv failed: umad status %x: %s\n",
+		      umad_status(umad), strerror(errno));
+		return -1;
+	}
+
+	return ret;
+}
+
+static int query_node_info(int fd, int agent, void *umad, unsigned node_id,
+			   uint8_t * path, size_t path_cnt)
+{
+	return send_query(fd, agent, umad, node_id, path, path_cnt,
+			  IB_ATTR_NODE_INFO, 0);
+}
+
+static int query_node_desc(int fd, int agent, void *umad, unsigned node_id,
+			   uint8_t * path, size_t path_cnt)
+{
+	return send_query(fd, agent, umad, node_id, path, path_cnt,
+			  IB_ATTR_NODE_DESC, 0);
+}
+
+static int query_switch_info(int fd, int agent, void *umad, unsigned node_id,
+			     uint8_t * path, size_t path_cnt)
+{
+	return send_query(fd, agent, umad, node_id, path, path_cnt,
+			  IB_ATTR_SWITCH_INFO, 0);
+}
+
+static int query_port_info(int fd, int agent, void *umad, unsigned node_id,
+			   uint8_t * path, size_t path_cnt, unsigned port_num)
+{
+	return send_query(fd, agent, umad, node_id, path, path_cnt,
+			  IB_ATTR_PORT_INFO, port_num);
+}
+
+static int add_node(uint8_t * node_info)
+{
+	struct node *node;
+	unsigned i, num_ports = mad_get_field(node_info, 0, IB_NODE_NPORTS_F);
+
+	node = malloc(sizeof(*node) + (num_ports + 1) * sizeof(node->ports[0]));
+	if (!node)
+		return -1;
+	memset(node, 0,
+	       sizeof(*node) + (num_ports + 1) * sizeof(node->ports[0]));
+
+	node->num_ports = num_ports;
+	node->guid = mad_get_field64(node_info, 0, IB_NODE_GUID_F);
+	node->is_switch = ((mad_get_field(node_info, 0, IB_NODE_TYPE_F)) ==
+			   IB_NODE_SWITCH);
+	memcpy(node->node_info, node_info, sizeof(node->node_info));
+	for (i = 0; i <= num_ports; i++)
+		node->ports[i].node = node;
+
+	node_array[node_count] = node;
+
+	return node_count++;
+}
+
+static int find_node(uint8_t * node_info)
+{
+	uint64_t guid = mad_get_field64(node_info, 0, IB_NODE_GUID_F);
+	unsigned i;
+
+	for (i = 0; i < node_count; i++)
+		if (node_array[i]->guid == guid)
+			return i;
+	return -1;
+}
+
+static int process_port_info(void *umad, unsigned node_id, int fd, int agent,
+			     uint8_t path[], size_t path_cnt)
+{
+	struct node *node = node_array[node_id];
+	struct port *port;
+	uint8_t *port_info = umad + umad_size() + IB_SMP_DATA_OFFS;
+	unsigned port_num, local_port;
+
+	dbg_dump_portinfo(port_info);
+
+	port_num = mad_get_field(umad_get_mad(umad), 0, IB_MAD_ATTRMOD_F);
+	local_port = mad_get_field(port_info, 0, IB_PORT_LOCAL_PORT_F);
+
+	port = &node->ports[port_num];
+	memcpy(port->port_info, port_info, sizeof(port->port_info));
+
+	if (port_num &&
+	    mad_get_field(port_info, 0, IB_PORT_PHYS_STATE_F) == 5 &&
+	    ((node->is_switch && port_num != local_port) ||
+	     (node_id == 0 && port_num == local_port))) {
+		path[++path_cnt] = port_num;
+		return query_node_info(fd, agent, umad, node_id, path,
+				       path_cnt);
+	}
+
+	return 0;
+}
+
+static int process_switch_info(unsigned node_id, uint8_t * switch_info)
+{
+	struct node *node = node_array[node_id];
+
+	dbg_dump_switchinfo(switch_info);
+	memcpy(node->switch_info, switch_info, sizeof(node->switch_info));
+
+	return 0;
+}
+
+static int process_node_desc(unsigned node_id, uint8_t * node_desc)
+{
+	struct node *node = node_array[node_id];
+
+	dbg_dump_nodedesc(node_desc);
+	memcpy(node->node_desc, node_desc, sizeof(node->node_desc));
+
+	return 0;
+}
+
+static void connect_ports(unsigned node1_id, unsigned port1_num,
+			  unsigned node2_id, unsigned port2_num)
+{
+	struct port *port1 = &node_array[node1_id]->ports[port1_num];
+	struct port *port2 = &node_array[node2_id]->ports[port2_num];
+	VERBOSE("connecting %u:%u <--> %u:%u\n",
+		node1_id, port1_num, node2_id, port2_num);
+	port1->remote = port2;
+	port2->remote = port1;
+}
+
+static int process_node(void *umad, unsigned remote_id, int fd, int agent,
+			uint8_t path[], size_t path_cnt)
+{
+	struct node *node;
+	uint8_t *node_info = umad_get_mad(umad) + IB_SMP_DATA_OFFS;
+	unsigned port_num = mad_get_field(node_info, 0, IB_NODE_LOCAL_PORT_F);
+	unsigned node_is_new = 0;
+	int i, id;
+
+	dbg_dump_nodeinfo(node_info);
+
+	if ((id = find_node(node_info)) < 0) {
+		id = add_node(node_info);
+		if (id < 0)
+			return -1;
+		node_is_new = 1;
+	}
+
+	node = node_array[id];
+
+	node->ports[port_num].guid =
+	    mad_get_field64(node_info, 0, IB_NODE_PORT_GUID_F);
+
+	if (id)			/* skip connect for very first node */
+		connect_ports(id, port_num, remote_id, path[path_cnt]);
+
+	if (!node_is_new)
+		return 0;
+
+	query_node_desc(fd, agent, umad, id, path, path_cnt);
+
+	if (node->is_switch)
+		query_switch_info(fd, agent, umad, id, path, path_cnt);
+
+	for (i = !node->is_switch; i <= node->num_ports; i++)
+		query_port_info(fd, agent, umad, id, path, path_cnt, i);
+
+	return 0;
+}
+
+static int recv_smp_resp(int fd, int agent, uint8_t * umad, uint8_t path[])
+{
+	void *mad;
+	uint64_t trid;
+	uint8_t method;
+	uint16_t status;
+	uint16_t attr_id;
+	uint32_t attr_mod;
+	size_t path_cnt;
+	unsigned node_id;
+	int ret;
+
+	ret = recv_response(fd, agent, umad, IB_MAD_SIZE);
+
+	mad = umad_get_mad(umad);
+	status = mad_get_field(mad, 0, IB_DRSMP_STATUS_F);
+	method = mad_get_field(mad, 0, IB_MAD_METHOD_F);
+	trid = mad_get_field64(mad, 0, IB_MAD_TRID_F);
+	attr_id = mad_get_field(mad, 0, IB_MAD_ATTRID_F);
+	attr_mod = mad_get_field(mad, 0, IB_MAD_ATTRMOD_F);
+	path_cnt = mad_get_field(mad, 0, IB_DRSMP_HOPCNT_F);
+	mad_get_array(mad, 0, IB_DRSMP_PATH_F, path);
+
+	if (method != IB_MAD_METHOD_GET)
+		return 0;
+
+	outstanding--;
+
+	if (ret < 0 || status) {
+		ERROR("error response 0x%016" PRIx64 ": attr_id %x"
+		      ", attr_mod %x from %s with status %x\n", trid,
+		      attr_id, attr_mod, print_path(path, path_cnt), status);
+		return -1;
+	}
+
+	node_id = trid & 0xffff;
+
+	VERBOSE("recv %016" PRIx64 ": attr %x, mod %x from %s\n", trid, attr_id,
+		attr_mod, print_path(path, path_cnt));
+
+	switch (attr_id) {
+	case IB_ATTR_NODE_INFO:
+		process_node(umad, node_id, fd, agent, path, path_cnt);
+		break;
+	case IB_ATTR_NODE_DESC:
+		process_node_desc(node_id, mad + IB_SMP_DATA_OFFS);
+		break;
+	case IB_ATTR_SWITCH_INFO:
+		process_switch_info(node_id, mad + IB_SMP_DATA_OFFS);
+		break;
+	case IB_ATTR_PORT_INFO:
+		process_port_info(umad, node_id, fd, agent, path, path_cnt);
+		break;
+	default:
+		VERBOSE("unsolicited response 0x%016" PRIx64 ": attr_id %x"
+			", attr_mod %x\n", trid, attr_id, attr_mod);
+		return 0;
+	}
+
+	return ret;
+}
+
+static int discovery(int fd, int agent)
+{
+	uint8_t path[64] = { 0 };
+	void *umad;
+	int ret;
+
+	umad = malloc(IB_MAD_SIZE + umad_size());
+	if (!umad)
+		return -ENOMEM;
+
+	ret = query_node_info(fd, agent, umad, 0, path, 0);
+	if (ret < 0)
+		return ret;
+
+	while (outstanding)
+		if (recv_smp_resp(fd, agent, umad, path))
+			ret = 1;
+
+	free(umad);
+
+	return ret;
+}
+
+static int umad_discovery(char *card_name, unsigned int port_num)
+{
+	int fd, agent, ret;
+
+	ret = umad_init();
+	if (ret) {
+		ERROR("cannot init umad\n");
+		return -1;
+	}
+
+	fd = umad_open_port(card_name, port_num);
+	if (fd < 0) {
+		ERROR("cannot open umad port %s:%u: %s\n",
+		      card_name ? card_name : "NULL", port_num,
+		      strerror(errno));
+		return -1;
+	}
+
+	agent = umad_register(fd, IB_SMI_DIRECT_CLASS, 1, 0, NULL);
+	if (agent < 0) {
+		ERROR("cannot register SMI DR class for umad port %s:%u: %s\n",
+		      card_name ? card_name : "NULL", port_num,
+		      strerror(errno));
+		return -1;
+	}
+
+	ret = discovery(fd, agent);
+	if (ret)
+		ERROR("Failed to discover.\n");
+
+	umad_unregister(fd, agent);
+	umad_close_port(fd);
+
+	umad_done();
+
+	return ret;
+}
+
+static void print_subnet()
+{
+	struct node *node;
+	struct port *local, *remote;
+	unsigned i, j;
+
+	for (i = 0; i < node_count; i++) {
+		node = node_array[i];
+		printf("%s %u \"%s-%016" PRIx64 "\" \t# %s\n",
+		       node->is_switch ? "Switch" : "Ca", node->num_ports,
+		       node->is_switch ? "S" : "H", node->guid,
+		       node->node_desc);
+		for (j = 1; j <= node->num_ports; j++) {
+			local = &node->ports[j];
+			remote = local->remote;
+			if (!remote)
+				continue;
+			printf("[%u] \t\"%s-%016" PRIx64 "\"[%lu] \t# %s\n", j,
+			       remote->node->is_switch ? "S" : "H",
+			       remote->node->guid, remote - remote->node->ports,
+			       remote->node->node_desc);
+		}
+		printf("\n");
+	}
+}
+
+int main(int argc, char **argv)
+{
+	const struct option long_opts[] = {
+		{"Card", 1, 0, 'C'},
+		{"Port", 1, 0, 'P'},
+		{"timeout", 1, 0, 't'},
+		{"retries", 1, 0, 'r'},
+		{}
+	};
+	char *card_name = NULL;
+	unsigned int port_num = 0;
+	int ch, ret;
+
+	while (1) {
+		ch = getopt_long(argc, argv, "C:P:t:r:v", long_opts, NULL);
+		if (ch == -1)
+			break;
+		switch (ch) {
+		case 'C':
+			card_name = optarg;
+			break;
+		case 'P':
+			port_num = strtoul(optarg, NULL, 0);
+			break;
+		case 't':
+			timeout = strtoul(optarg, NULL, 0);
+			break;
+		case 'r':
+			retries = strtoul(optarg, NULL, 0);
+			break;
+		case 'v':
+			verbose++;
+			break;
+		default:
+			printf("usage: %s [-C card_name] [-P port_num]"
+			       " [-t timeout] [-r retries] [-v[v]]\n", argv[0]);
+			exit(2);
+			break;
+		}
+	}
+
+	ret = umad_discovery(card_name, port_num);
+
+	print_subnet();
+
+	return ret;
+}
-- 
1.6.6.rc3


* Re: [PATCH] tests/subnet_discover: discover test utility
       [not found]                 ` <20091220182809.f7e17fae.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2009-12-21  7:35                   ` Sasha Khapyorsky
  2009-12-21 14:02                     ` Hal Rosenstock
  2009-12-28  9:22                     ` Sasha Khapyorsky
  0 siblings, 2 replies; 14+ messages in thread
From: Sasha Khapyorsky @ 2009-12-21  7:35 UTC (permalink / raw)
  To: Ira Weiny; +Cc: linux-rdma, Al Chu

Hi Ira,

On 18:28 Sun 20 Dec     , Ira Weiny wrote:
> 
> Yes, a similar mechanism would work in libibnetdisc.  However, it looks like you are doing a depth first search

I wouldn't call it that. It is "parallel" rather than depth-first or
breadth-first: discovery continues from whichever node responds first,
no matter whether it was queried in depth or in breadth.

> which I fear might exceed the path count limit in a DR path?

Right, the hop count should be limited. I will add this.
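
A minimal sketch of such a limit, under stated assumptions: the path
array has 64 entries with entry 0 unused (this mirrors the
`uint8_t path[64]` buffer and the `path[++path_cnt]` indexing in the
posted code), which leaves room for 63 hops. The helper name
path_extend and the MAX_HOPS constant are hypothetical here.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HOPS 63	/* assumed: 64-entry path array, entry 0 unused */

/* Hypothetical helper: append one hop to a directed-route path.
 * path[1..path_cnt] hold the exit ports taken so far. Returns the
 * new hop count, or 0 if the path is full and was left unchanged. */
static size_t path_extend(uint8_t path[MAX_HOPS + 1], size_t path_cnt,
			  uint8_t port_num)
{
	if (path_cnt >= MAX_HOPS)
		return 0;
	path[++path_cnt] = port_num;
	return path_cnt;
}
```

A caller would simply skip the NodeInfo query for the far end of a
port when this returns 0, so discovery stops growing along that path.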

> Is this how OpenSM works?

I think so. The difference is that OpenSM has a limit of outstanding
MADs on the wire and subnet_discover doesn't (and there could be a lot
of MADs).

> I was trying to do a breadth first search like ibnetdiscover does.
> 
> > 
> > Would you like to look at this?
> 
> No problem, I have enclosed the output from a run on Hyperion.
> There appear to be a lot of errors.  I am not sure what the issue is right off.  I have included an ibnetdiscover output for comparison.

Thanks.

The errors are response timeouts. I guess that most of them are due
to VL15 overflow on the switches (this could be verified by checking
the VL15Dropped counter). I will look at this more closely.

Sasha

* Re: [PATCH] tests/subnet_discover: discover test utility
  2009-12-21  7:35                   ` Sasha Khapyorsky
@ 2009-12-21 14:02                     ` Hal Rosenstock
       [not found]                       ` <f0e08f230912210602i5e3f528h2b0630420346db82-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2009-12-28  9:22                     ` Sasha Khapyorsky
  1 sibling, 1 reply; 14+ messages in thread
From: Hal Rosenstock @ 2009-12-21 14:02 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: Ira Weiny, linux-rdma, Al Chu

On Mon, Dec 21, 2009 at 2:35 AM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> Hi Ira,
>
> On 18:28 Sun 20 Dec     , Ira Weiny wrote:
>>
>> Yes, a similar mechanism would work in libibnetdisc.  However, it looks like you are doing a depth first search
>
> I wouldn't call it so, it is rather "parallel" than "first" depth or
> breadth - discovery continues at first responding node doesn't matter how
> was it queried in depth or in breadth.

Does anything limit the amount of parallelism ?

>
>> which I fear might exceed the path count limit in a DR path?
>
> Right, hops count should be limited. I will add this.
>
>> Is this how OpenSM works?
>
> I think so. The difference is that OpenSM has a limit of outstanding
> MADs on the wire and subnet_discover doesn't (and there could be a lot
> of MADs).

Isn't that dangerous in a real subnet ?

-- Hal

>
>> I was trying to do a breadth first search like ibnetdiscover does.
>>
>> >
>> > Would you like to look at this?
>>
>> No problem, I have enclosed the output from a run on Hyperion.
>> There appear to be a lot of errors.  I am not sure what the issue is right off.  I have included an ibnetdiscover output for comparison.
>
> Thanks.
>
> An errors are response timeouts. I guess that most of them are due
> to switches' VL15 overflow (could be verified by VL15Dropped counter
> evaluation). Will look at this deeply.
>
> Sasha

* Re: [PATCH] tests/subnet_discover: discover test utility
       [not found]                       ` <f0e08f230912210602i5e3f528h2b0630420346db82-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-12-22 11:27                         ` Sasha Khapyorsky
  0 siblings, 0 replies; 14+ messages in thread
From: Sasha Khapyorsky @ 2009-12-22 11:27 UTC (permalink / raw)
  To: Hal Rosenstock; +Cc: Ira Weiny, linux-rdma, Al Chu

On 09:02 Mon 21 Dec     , Hal Rosenstock wrote:
> > I wouldn't call it so, it is rather "parallel" than "first" depth or
> > breadth - discovery continues at first responding node doesn't matter how
> > was it queried in depth or in breadth.
> 
> Does anything limit the amount of parallelism ?

Nothing in this version.

> > I think so. The difference is that OpenSM has a limit of outstanding
> > MADs on the wire and subnet_discover doesn't (and there could be a lot
> > of MADs).
> 
> Isn't that dangerous in a real subnet ?

This is a test tool, so I don't think that it is "dangerous". And of
course this can lead to issues with the test results.

Sasha

* Re: [PATCH] tests/subnet_discover: discover test utility
  2009-12-21  7:35                   ` Sasha Khapyorsky
  2009-12-21 14:02                     ` Hal Rosenstock
@ 2009-12-28  9:22                     ` Sasha Khapyorsky
  2010-01-11 13:56                       ` Hal Rosenstock
  1 sibling, 1 reply; 14+ messages in thread
From: Sasha Khapyorsky @ 2009-12-28  9:22 UTC (permalink / raw)
  To: Ira Weiny; +Cc: linux-rdma, Al Chu, Hal Rosenstock

Hi Ira,

On 09:35 Mon 21 Dec     , Sasha Khapyorsky wrote:
> 
> An errors are response timeouts. I guess that most of them are due
> to switches' VL15 overflow (could be verified by VL15Dropped counter
> evaluation). Will look at this deeply.

I made a couple of modifications to the code (the exact log is listed
below). In particular, there is now a default limit on the number of
outstanding MADs on the wire and proper tracking of failed (timed-out)
MADs. I tested this where possible. Could you re-run this? Thanks.
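
As an illustration of what a limit on outstanding MADs can look like,
here is a minimal sketch. It is not the code from the commits below:
the struct and function names (mad_window, window_submit,
window_complete) and the backlog size are hypothetical. Requests beyond
the limit wait in a FIFO and are released one at a time as responses
or timeouts free a slot.

```c
#include <stddef.h>

#define BACKLOG_LEN 256

/* Hypothetical limiter: at most max_outstanding requests are on the
 * wire; anything beyond that waits in a FIFO backlog. */
struct mad_window {
	unsigned outstanding;
	unsigned max_outstanding;
	int backlog[BACKLOG_LEN];	/* request ids waiting to be sent */
	size_t head, tail;
	unsigned sent;			/* total requests put on the wire */
};

/* Queue a request. Returns 1 if it went out now, 0 if it was
 * deferred, -1 if the backlog is full. */
static int window_submit(struct mad_window *w, int req)
{
	if (w->outstanding < w->max_outstanding) {
		w->outstanding++;
		w->sent++;	/* a real tool would call umad_send() here */
		return 1;
	}
	if (w->tail - w->head >= BACKLOG_LEN)
		return -1;
	w->backlog[w->tail++ % BACKLOG_LEN] = req;
	return 0;
}

/* Call once per response or timeout: frees a slot and, if anything
 * is waiting, sends the oldest deferred request. Returns that
 * request id, or -1 if the backlog was empty. */
static int window_complete(struct mad_window *w)
{
	int req;

	if (w->outstanding)
		w->outstanding--;
	if (w->head == w->tail)
		return -1;
	req = w->backlog[w->head++ % BACKLOG_LEN];
	w->outstanding++;
	w->sent++;
	return req;
}
```

The discovery loop then runs until both the window and the backlog are
empty, instead of only checking the outstanding counter.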

Sasha


commit 9e24853c30351f6ea65ffcccf184bdf7586dfe8e
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date:   Fri Dec 25 13:01:35 2009 +0200

    tests/subnet_discover: limit possible number of hops
    
    As pointed out by Ira, there was no limit on the number of hops. Add one.
    
    Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>

diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
index a577cc7..9857913 100644
--- a/tests/subnet_discover.c
+++ b/tests/subnet_discover.c
@@ -17,6 +17,8 @@
 #include <infiniband/umad.h>
 #include <infiniband/mad.h>
 
+#define MAX_HOPS 63
+
 struct port {
 	struct node *node;
 	uint64_t guid;
@@ -217,8 +219,9 @@ static int process_port_info(void *umad, unsigned node_id, int fd, int agent,
 	if (port_num &&
 	    mad_get_field(port_info, 0, IB_PORT_PHYS_STATE_F) == 5 &&
 	    ((node->is_switch && port_num != local_port) ||
-	     (node_id == 0 && port_num == local_port))) {
-		path[++path_cnt] = port_num;
+	     (node_id == 0 && port_num == local_port)) &&
+	    path_cnt++ < MAX_HOPS) {
+		path[path_cnt] = port_num;
 		return query_node_info(fd, agent, umad, node_id, path,
 				       path_cnt);
 	}

commit 6e7817433c17bf2b8861639852dc0e70e8d0ec5f
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date:   Fri Dec 25 16:11:53 2009 +0200

    tests/subnet_discover: add --help option
    
    Add --help command line option. Also cosmetic improvements.
    
    Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>

diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
index 9857913..7f8a85c 100644
--- a/tests/subnet_discover.c
+++ b/tests/subnet_discover.c
@@ -70,7 +70,7 @@ DBG_DUMP_FUNC(nodedesc);
 DBG_DUMP_FUNC(portinfo);
 DBG_DUMP_FUNC(switchinfo);
 
-static void build_umad_req(void *umad, uint8_t * path, unsigned path_cnt,
+static void build_umad_req(void *umad, uint8_t path[], unsigned path_cnt,
 			   uint64_t trid, uint8_t method,
 			   uint16_t attr_id, uint32_t attr_mod, uint64_t mkey)
 {
@@ -94,7 +94,7 @@ static void build_umad_req(void *umad, uint8_t * path, unsigned path_cnt,
 }
 
 static int send_query(int fd, int agent, void *umad, unsigned node_id,
-		      uint8_t * path, size_t path_cnt, uint16_t attr_id,
+		      uint8_t path[], size_t path_cnt, uint16_t attr_id,
 		      uint32_t attr_mod)
 {
 	uint64_t trid;
@@ -138,28 +138,28 @@ static int recv_response(int fd, int agent, uint8_t * umad, size_t length)
 }
 
 static int query_node_info(int fd, int agent, void *umad, unsigned node_id,
-			   uint8_t * path, size_t path_cnt)
+			   uint8_t path[], size_t path_cnt)
 {
 	return send_query(fd, agent, umad, node_id, path, path_cnt,
 			  IB_ATTR_NODE_INFO, 0);
 }
 
 static int query_node_desc(int fd, int agent, void *umad, unsigned node_id,
-			   uint8_t * path, size_t path_cnt)
+			   uint8_t path[], size_t path_cnt)
 {
 	return send_query(fd, agent, umad, node_id, path, path_cnt,
 			  IB_ATTR_NODE_DESC, 0);
 }
 
 static int query_switch_info(int fd, int agent, void *umad, unsigned node_id,
-			     uint8_t * path, size_t path_cnt)
+			     uint8_t path[], size_t path_cnt)
 {
 	return send_query(fd, agent, umad, node_id, path, path_cnt,
 			  IB_ATTR_SWITCH_INFO, 0);
 }
 
 static int query_port_info(int fd, int agent, void *umad, unsigned node_id,
-			   uint8_t * path, size_t path_cnt, unsigned port_num)
+			   uint8_t path[], size_t path_cnt, unsigned port_num)
 {
 	return send_query(fd, agent, umad, node_id, path, path_cnt,
 			  IB_ATTR_PORT_INFO, port_num);
@@ -456,6 +456,8 @@ int main(int argc, char **argv)
 		{"Port", 1, 0, 'P'},
 		{"timeout", 1, 0, 't'},
 		{"retries", 1, 0, 'r'},
+		{"verbose", 0, 0, 'v'},
+		{"help", 0, 0, 'h'},
 		{}
 	};
 	char *card_name = NULL;
@@ -463,7 +465,7 @@ int main(int argc, char **argv)
 	int ch, ret;
 
 	while (1) {
-		ch = getopt_long(argc, argv, "C:P:t:r:v", long_opts, NULL);
+		ch = getopt_long(argc, argv, "C:P:t:r:vh", long_opts, NULL);
 		if (ch == -1)
 			break;
 		switch (ch) {
@@ -482,6 +484,7 @@ int main(int argc, char **argv)
 		case 'v':
 			verbose++;
 			break;
+		case 'h':
 		default:
 			printf("usage: %s [-C card_name] [-P port_num]"
 			       " [-t timeout] [-r retries] [-v[v]]\n", argv[0]);

commit da6aa19840cb2d37e8cd3daa3874b87657a76ddc
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date:   Fri Dec 25 16:24:13 2009 +0200

    tests/subnet_discover: --maxsmps (-n) option
    
    Limit the number of SMPs outstanding on the wire at any one time.
    --maxsmps=0 means no limit.
    
    Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>

diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
index 7f8a85c..42e7aee 100644
--- a/tests/subnet_discover.c
+++ b/tests/subnet_discover.c
@@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
 static unsigned node_count = 0;
 static unsigned trid_cnt = 0;
 static unsigned outstanding = 0;
+static unsigned max_outstanding = 8;
 static unsigned timeout = 100;
 static unsigned retries = 3;
 static unsigned verbose = 0;
@@ -93,14 +94,12 @@ static void build_umad_req(void *umad, uint8_t path[], unsigned path_cnt,
 	mad_set_field64(mad, 0, IB_MAD_MKEY_F, mkey);
 }
 
-static int send_query(int fd, int agent, void *umad, unsigned node_id,
-		      uint8_t path[], size_t path_cnt, uint16_t attr_id,
-		      uint32_t attr_mod)
+static int send_request(int fd, int agent, uint64_t trid, uint8_t * path,
+			size_t path_cnt, uint16_t attr_id, uint32_t attr_mod)
 {
-	uint64_t trid;
+	uint8_t umad[IB_MAD_SIZE + umad_size()];
 	int ret;
 
-	trid = (trid_cnt++ << 16) | (node_id & 0xffff);
 	build_umad_req(umad, path, path_cnt, trid, IB_MAD_METHOD_GET, attr_id,
 		       attr_mod, 0);
 
@@ -112,14 +111,85 @@ static int send_query(int fd, int agent, void *umad, unsigned node_id,
 		return -1;
 	}
 
-	outstanding++;
-
 	VERBOSE("send %016" PRIx64 ": attr %x, mod %x to %s\n", trid, attr_id,
 		attr_mod, print_path(path, path_cnt));
 
 	return ret;
 }
 
+static struct request_queue {
+	struct request_queue *next;
+	uint64_t trid;
+	uint16_t attr_id;
+	uint32_t attr_mod;
+	size_t path_cnt;
+	uint8_t path[0];
+} request_queue;
+
+static struct request_queue *request_last = &request_queue;
+
+static void run_request_queue(int fd, int agent)
+{
+	struct request_queue *prev, *q = request_queue.next;
+
+	while (q) {
+		if (outstanding > max_outstanding)
+			break;
+		if (send_request(fd, agent, q->trid, q->path, q->path_cnt,
+				 q->attr_id, q->attr_mod) < 0)
+			break;
+		prev = q;
+		q = q->next;
+		free(prev);
+		outstanding++;
+	}
+	request_queue.next = q;
+	if (!q)
+		request_last = &request_queue;
+}
+
+static int queue_request(uint64_t trid, uint8_t * path, size_t path_cnt,
+			 uint16_t attr_id, uint32_t attr_mod)
+{
+	struct request_queue *q = malloc(sizeof(*q) + path_cnt + 1);
+	if (!q)
+		return -1;
+	q->next = NULL;
+	q->trid = trid;
+	q->attr_id = attr_id;
+	q->attr_mod = attr_mod;
+	memcpy(q->path, path, path_cnt + 1);
+	q->path_cnt = path_cnt;
+
+	request_last->next = q;
+	request_last = q;
+
+	return 0;
+}
+
+static int send_query(int fd, int agent, unsigned node_id, uint8_t path[],
+		      size_t path_cnt, uint16_t attr_id, uint32_t attr_mod)
+{
+	uint64_t trid;
+	int ret;
+
+	trid = (trid_cnt++ << 16) | (node_id & 0xffff);
+
+	ret = queue_request(trid, path, path_cnt, attr_id, attr_mod);
+	if (ret < 0) {
+		ERROR("queue failed: trid 0x%016" PRIx64 ", attr_id %x,"
+		      " attr_mod %x\n", trid, attr_id, attr_mod);
+		return -1;
+	}
+
+	VERBOSE("queue %016" PRIx64 ": attr %x, mod %x to %s\n", trid, attr_id,
+		attr_mod, print_path(path, path_cnt));
+
+	run_request_queue(fd, agent);
+
+	return ret;
+}
+
 static int recv_response(int fd, int agent, uint8_t * umad, size_t length)
 {
 	int len = length, ret;
@@ -137,31 +207,31 @@ static int recv_response(int fd, int agent, uint8_t * umad, size_t length)
 	return ret;
 }
 
-static int query_node_info(int fd, int agent, void *umad, unsigned node_id,
+static int query_node_info(int fd, int agent, unsigned node_id,
 			   uint8_t path[], size_t path_cnt)
 {
-	return send_query(fd, agent, umad, node_id, path, path_cnt,
+	return send_query(fd, agent, node_id, path, path_cnt,
 			  IB_ATTR_NODE_INFO, 0);
 }
 
-static int query_node_desc(int fd, int agent, void *umad, unsigned node_id,
+static int query_node_desc(int fd, int agent, unsigned node_id,
 			   uint8_t path[], size_t path_cnt)
 {
-	return send_query(fd, agent, umad, node_id, path, path_cnt,
+	return send_query(fd, agent, node_id, path, path_cnt,
 			  IB_ATTR_NODE_DESC, 0);
 }
 
-static int query_switch_info(int fd, int agent, void *umad, unsigned node_id,
+static int query_switch_info(int fd, int agent, unsigned node_id,
 			     uint8_t path[], size_t path_cnt)
 {
-	return send_query(fd, agent, umad, node_id, path, path_cnt,
+	return send_query(fd, agent, node_id, path, path_cnt,
 			  IB_ATTR_SWITCH_INFO, 0);
 }
 
-static int query_port_info(int fd, int agent, void *umad, unsigned node_id,
+static int query_port_info(int fd, int agent, unsigned node_id,
 			   uint8_t path[], size_t path_cnt, unsigned port_num)
 {
-	return send_query(fd, agent, umad, node_id, path, path_cnt,
+	return send_query(fd, agent, node_id, path, path_cnt,
 			  IB_ATTR_PORT_INFO, port_num);
 }
 
@@ -222,8 +292,7 @@ static int process_port_info(void *umad, unsigned node_id, int fd, int agent,
 	     (node_id == 0 && port_num == local_port)) &&
 	    path_cnt++ < MAX_HOPS) {
 		path[path_cnt] = port_num;
-		return query_node_info(fd, agent, umad, node_id, path,
-				       path_cnt);
+		return query_node_info(fd, agent, node_id, path, path_cnt);
 	}
 
 	return 0;
@@ -289,13 +358,13 @@ static int process_node(void *umad, unsigned remote_id, int fd, int agent,
 	if (!node_is_new)
 		return 0;
 
-	query_node_desc(fd, agent, umad, id, path, path_cnt);
+	query_node_desc(fd, agent, id, path, path_cnt);
 
 	if (node->is_switch)
-		query_switch_info(fd, agent, umad, id, path, path_cnt);
+		query_switch_info(fd, agent, id, path, path_cnt);
 
 	for (i = !node->is_switch; i <= node->num_ports; i++)
-		query_port_info(fd, agent, umad, id, path, path_cnt, i);
+		query_port_info(fd, agent, id, path, path_cnt, i);
 
 	return 0;
 }
@@ -327,6 +396,7 @@ static int recv_smp_resp(int fd, int agent, uint8_t * umad, uint8_t path[])
 		return 0;
 
 	outstanding--;
+	run_request_queue(fd, agent);
 
 	if (ret < 0 || status) {
 		ERROR("error response 0x%016" PRIx64 ": attr_id %x"
@@ -362,17 +432,13 @@ static int recv_smp_resp(int fd, int agent, uint8_t * umad, uint8_t path[])
 	return ret;
 }
 
-static int discovery(int fd, int agent)
+static int discover(int fd, int agent)
 {
+	uint8_t umad[IB_MAD_SIZE + umad_size()];
 	uint8_t path[64] = { 0 };
-	void *umad;
 	int ret;
 
-	umad = malloc(IB_MAD_SIZE + umad_size());
-	if (!umad)
-		return -ENOMEM;
-
-	ret = query_node_info(fd, agent, umad, 0, path, 0);
+	ret = query_node_info(fd, agent, 0, path, 0);
 	if (ret < 0)
 		return ret;
 
@@ -380,12 +446,10 @@ static int discovery(int fd, int agent)
 		if (recv_smp_resp(fd, agent, umad, path))
 			ret = 1;
 
-	free(umad);
-
 	return ret;
 }
 
-static int umad_discovery(char *card_name, unsigned int port_num)
+static int umad_discover(char *card_name, unsigned int port_num)
 {
 	int fd, agent, ret;
 
@@ -411,7 +475,7 @@ static int umad_discovery(char *card_name, unsigned int port_num)
 		return -1;
 	}
 
-	ret = discovery(fd, agent);
+	ret = discover(fd, agent);
 	if (ret)
 		ERROR("Failed to discover.\n");
 
@@ -454,6 +518,7 @@ int main(int argc, char **argv)
 	const struct option long_opts[] = {
 		{"Card", 1, 0, 'C'},
 		{"Port", 1, 0, 'P'},
+		{"maxsmps", 1, 0, 'n'},
 		{"timeout", 1, 0, 't'},
 		{"retries", 1, 0, 'r'},
 		{"verbose", 0, 0, 'v'},
@@ -465,7 +530,7 @@ int main(int argc, char **argv)
 	int ch, ret;
 
 	while (1) {
-		ch = getopt_long(argc, argv, "C:P:t:r:vh", long_opts, NULL);
+		ch = getopt_long(argc, argv, "C:P:n:t:r:vh", long_opts, NULL);
 		if (ch == -1)
 			break;
 		switch (ch) {
@@ -475,6 +540,11 @@ int main(int argc, char **argv)
 		case 'P':
 			port_num = strtoul(optarg, NULL, 0);
 			break;
+		case 'n':
+			max_outstanding = strtoul(optarg, NULL, 0);
+			if (!max_outstanding)
+				max_outstanding = -1;
+			break;
 		case 't':
 			timeout = strtoul(optarg, NULL, 0);
 			break;
@@ -487,13 +557,14 @@ int main(int argc, char **argv)
 		case 'h':
 		default:
 			printf("usage: %s [-C card_name] [-P port_num]"
-			       " [-t timeout] [-r retries] [-v[v]]\n", argv[0]);
+			       " [-n maxsmps] [-t timeout] [-r retries]"
+			       " [-v[v]]\n", argv[0]);
 			exit(2);
 			break;
 		}
 	}
 
-	ret = umad_discovery(card_name, port_num);
+	ret = umad_discover(card_name, port_num);
 
 	print_subnet();
 

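The queueing scheme in this patch can be illustrated on its own. What follows is a rough sketch under stated assumptions, not the patch itself: the names pending, window, drain, on_response and fake_send are hypothetical, and fake_send stands in for the real umad send call. Note that the sketch stops sending once in_flight reaches the limit, whereas the patch compares with '>' and so lets one more request through than max_outstanding.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the utility's globals. */
struct pending {
	struct pending *next;
	unsigned id;
};

static struct pending *head, *tail;
static unsigned in_flight;	/* requests sent but not yet answered */
static unsigned window = 8;	/* upper bound on in_flight */
static unsigned sent_total;

/* Stub transport: a real tool would send a MAD here. */
static int fake_send(unsigned id)
{
	(void)id;
	sent_total++;
	return 0;
}

/* Append a request to the tail of the FIFO. */
static int enqueue(unsigned id)
{
	struct pending *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->next = NULL;
	p->id = id;
	if (tail)
		tail->next = p;
	else
		head = p;
	tail = p;
	return 0;
}

/* Send queued requests until the window is full or the queue is empty. */
static void drain(void)
{
	while (head && in_flight < window) {
		struct pending *p = head;

		if (fake_send(p->id) < 0)
			break;
		head = p->next;
		if (!head)
			tail = NULL;
		free(p);
		in_flight++;
	}
}

/* Called once per received response: frees one slot and refills it. */
static void on_response(void)
{
	assert(in_flight > 0);
	in_flight--;
	drain();
}
```

Setting window to UINT_MAX would give the "no limit" behaviour that the patch maps --maxsmps=0 to.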
commit a422ea90334441144f2a1212de40085bbe36cf7e
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date:   Sun Dec 27 18:55:35 2009 +0200

    tests/subnet_discover.c: print useful information
    
    Print additional useful information about the subnet and the
    discovery process, such as the number of MADs used, the number of
    hops reached, and the direct path by which each node was discovered.
    Improve error messages; in particular, don't print MAD content in an
    error message when the only valid data returned from umad_recv() is
    the umad header.
    
    Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>

diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
index 42e7aee..b6aada9 100644
--- a/tests/subnet_discover.c
+++ b/tests/subnet_discover.c
@@ -30,6 +30,8 @@ struct node {
 	uint64_t guid;
 	unsigned num_ports;
 	unsigned is_switch;
+	size_t path_size;
+	uint8_t path[64];
 	uint8_t node_info[IB_SMP_DATA_SIZE];
 	uint8_t node_desc[IB_SMP_DATA_SIZE];
 	uint8_t switch_info[IB_SMP_DATA_SIZE];
@@ -45,6 +47,9 @@ static unsigned timeout = 100;
 static unsigned retries = 3;
 static unsigned verbose = 0;
 
+static unsigned total_mads = 0;
+static unsigned max_hops = 0;
+
 #define ERROR(fmt, ...) fprintf(stderr, "ERR: " fmt, ##__VA_ARGS__)
 #define VERBOSE(fmt, ...) if (verbose) fprintf(stderr, fmt, ##__VA_ARGS__)
 #define NOISE(fmt, ...) if (verbose > 1) fprintf(stderr, fmt, ##__VA_ARGS__)
@@ -142,6 +147,7 @@ static void run_request_queue(int fd, int agent)
 		q = q->next;
 		free(prev);
 		outstanding++;
+		total_mads++;
 	}
 	request_queue.next = q;
 	if (!q)
@@ -201,10 +207,10 @@ static int recv_response(int fd, int agent, uint8_t * umad, size_t length)
 	if (ret < 0 || umad_status(umad)) {
 		ERROR("umad_recv failed: umad status %x: %s\n",
 		      umad_status(umad), strerror(errno));
-		return -1;
+		return len > umad_size() ? 1 : -1;
 	}
 
-	return ret;
+	return 0;
 }
 
 static int query_node_info(int fd, int agent, unsigned node_id,
@@ -235,7 +241,7 @@ static int query_port_info(int fd, int agent, unsigned node_id,
 			  IB_ATTR_PORT_INFO, port_num);
 }
 
-static int add_node(uint8_t * node_info)
+static int add_node(uint8_t * node_info, uint8_t path[], size_t path_size)
 {
 	struct node *node;
 	unsigned i, num_ports = mad_get_field(node_info, 0, IB_NODE_NPORTS_F);
@@ -250,6 +256,8 @@ static int add_node(uint8_t * node_info)
 	node->guid = mad_get_field64(node_info, 0, IB_NODE_GUID_F);
 	node->is_switch = ((mad_get_field(node_info, 0, IB_NODE_TYPE_F)) ==
 			   IB_NODE_SWITCH);
+	memcpy(node->path, path, path_size + 1);
+	node->path_size = path_size;
 	memcpy(node->node_info, node_info, sizeof(node->node_info));
 	for (i = 0; i <= num_ports; i++)
 		node->ports[i].node = node;
@@ -291,6 +299,8 @@ static int process_port_info(void *umad, unsigned node_id, int fd, int agent,
 	    ((node->is_switch && port_num != local_port) ||
 	     (node_id == 0 && port_num == local_port)) &&
 	    path_cnt++ < MAX_HOPS) {
+		if (path_cnt > max_hops)
+			max_hops = path_cnt;
 		path[path_cnt] = port_num;
 		return query_node_info(fd, agent, node_id, path, path_cnt);
 	}
@@ -341,7 +351,7 @@ static int process_node(void *umad, unsigned remote_id, int fd, int agent,
 	dbg_dump_nodeinfo(node_info);
 
 	if ((id = find_node(node_info)) < 0) {
-		id = add_node(node_info);
+		id = add_node(node_info, path, path_cnt);
 		if (id < 0)
 			return -1;
 		node_is_new = 1;
@@ -398,7 +408,9 @@ static int recv_smp_resp(int fd, int agent, uint8_t * umad, uint8_t path[])
 	outstanding--;
 	run_request_queue(fd, agent);
 
-	if (ret < 0 || status) {
+	if (ret < 0)
+		return ret;
+	else if (ret || status) {
 		ERROR("error response 0x%016" PRIx64 ": attr_id %x"
 		      ", attr_mod %x from %s with status %x\n", trid,
 		      attr_id, attr_mod, print_path(path, path_cnt), status);
@@ -477,7 +489,7 @@ static int umad_discover(char *card_name, unsigned int port_num)
 
 	ret = discover(fd, agent);
 	if (ret)
-		ERROR("Failed to discover.\n");
+		fprintf(stderr, "\nThere are problems during discovery.\n");
 
 	umad_unregister(fd, agent);
 	umad_close_port(fd);
@@ -493,12 +505,15 @@ static void print_subnet()
 	struct port *local, *remote;
 	unsigned i, j;
 
+	printf("\n# The subnet discovered using %u mads, reaching %d hops\n\n",
+	       total_mads, max_hops);
+
 	for (i = 0; i < node_count; i++) {
 		node = node_array[i];
-		printf("%s %u \"%s-%016" PRIx64 "\" \t# %s\n",
+		printf("%s %u \"%s-%016" PRIx64 "\" \t# %s %s\n",
 		       node->is_switch ? "Switch" : "Ca", node->num_ports,
 		       node->is_switch ? "S" : "H", node->guid,
-		       node->node_desc);
+		       print_path(node->path, node->path_size), node->node_desc);
 		for (j = 1; j <= node->num_ports; j++) {
 			local = &node->ports[j];
 			remote = local->remote;

commit 4a23f9e7f339e93f2a77f213d4ce80e4bc7d7b9f
Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Date:   Sun Dec 27 21:19:30 2009 +0200

    tests/subnet_discover: report unresponded transactions
    
    Report unresponded transactions (requests) in case of MAD failures.
    
    Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>

diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
index b6aada9..acc8c23 100644
--- a/tests/subnet_discover.c
+++ b/tests/subnet_discover.c
@@ -133,9 +133,60 @@ static struct request_queue {
 
 static struct request_queue *request_last = &request_queue;
 
+static unsigned tr_table_size;
+static struct request_queue **tr_table;
+
+static void add_to_tr_table(struct request_queue *q, uint64_t trid)
+{
+	unsigned n = trid >> 16;
+	if (n >= tr_table_size) {
+		unsigned new_size = tr_table_size ? tr_table_size * 2 : 4096;
+		if (n > new_size)
+			new_size = n + 1;
+		tr_table = realloc(tr_table, new_size * sizeof(tr_table[0]));
+		if (!tr_table) {
+			ERROR("cannot realloc request table\n");
+			tr_table_size = 0;
+			return;
+		}
+		memset(tr_table + tr_table_size, 0,
+		       (new_size - tr_table_size) * sizeof(tr_table[0]));
+		tr_table_size = new_size;
+	}
+
+	tr_table[n] = q;
+}
+
+static void clean_from_tr_table(uint64_t trid)
+{
+	unsigned n = (trid >> 16) & 0xffff;
+	if (n >= tr_table_size) {
+		ERROR("invalid request table index %u\n", n);
+		return;
+	}
+	free(tr_table[n]);
+	tr_table[n] = NULL;
+}
+
+static void free_unresponded()
+{
+	struct request_queue *q;
+	unsigned i;
+
+	for (i = 0 ; i < tr_table_size; i++) {
+		if (!(q = tr_table[i]))
+			continue;
+		fprintf(stderr, "Unresponded transaction %016" PRIx64 ": %s "
+			"attr_id %x, attr_mod %x\n", q->trid,
+			print_path(q->path, q->path_cnt), q->attr_id,
+			q->attr_mod);
+		free(q);
+	}
+}
+
 static void run_request_queue(int fd, int agent)
 {
-	struct request_queue *prev, *q = request_queue.next;
+	struct request_queue *q = request_queue.next;
 
 	while (q) {
 		if (outstanding > max_outstanding)
@@ -143,9 +194,7 @@ static void run_request_queue(int fd, int agent)
 		if (send_request(fd, agent, q->trid, q->path, q->path_cnt,
 				 q->attr_id, q->attr_mod) < 0)
 			break;
-		prev = q;
 		q = q->next;
-		free(prev);
 		outstanding++;
 		total_mads++;
 	}
@@ -170,6 +219,8 @@ static int queue_request(uint64_t trid, uint8_t * path, size_t path_cnt,
 	request_last->next = q;
 	request_last = q;
 
+	add_to_tr_table(q, trid);
+
 	return 0;
 }
 
@@ -417,6 +468,8 @@ static int recv_smp_resp(int fd, int agent, uint8_t * umad, uint8_t path[])
 		return -1;
 	}
 
+	clean_from_tr_table(trid);
+
 	node_id = trid & 0xffff;
 
 	VERBOSE("recv %016" PRIx64 ": attr %x, mod %x from %s\n", trid, attr_id,
@@ -458,6 +511,8 @@ static int discover(int fd, int agent)
 		if (recv_smp_resp(fd, agent, umad, path))
 			ret = 1;
 
+	free_unresponded();
+
 	return ret;
 }
 
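The transaction table above is indexed by the sequence part of the transaction ID. Here is a minimal sketch of that packing, assuming the sequence number is meant to occupy 16 bits (as the '& 0xffff' in clean_from_tr_table suggests) with the node index in the low 16 bits. The helper names are hypothetical; the patch does this arithmetic inline:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 16-bit sequence number and a 16-bit node index into one ID. */
static uint64_t make_trid(uint32_t seq, unsigned node_id)
{
	uint64_t trid = ((uint64_t)(seq & 0xffff) << 16) | (node_id & 0xffff);

	assert((trid >> 32) == 0);	/* only the low 32 bits are used */
	return trid;
}

/* Sequence part: the index a request table like tr_table would use. */
static uint32_t trid_seq(uint64_t trid)
{
	return (uint32_t)((trid >> 16) & 0xffff);
}

/* Node part: which discovered node the request was issued for. */
static unsigned trid_node(uint64_t trid)
{
	return (unsigned)(trid & 0xffff);
}
```

Under this assumption sequence numbers repeat after 65536 requests, so a table keyed on them is only unambiguous while fewer than that many transactions are tracked at once.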
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
  2009-12-28  9:22                     ` Sasha Khapyorsky
@ 2010-01-11 13:56                       ` Hal Rosenstock
       [not found]                         ` <f0e08f231001110556y7c47cc54oa3cfd5859f9a4e76-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Hal Rosenstock @ 2010-01-11 13:56 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: Ira Weiny, linux-rdma, Al Chu

Hi Sasha,

On Mon, Dec 28, 2009 at 4:22 AM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> Hi Ira,
>
> On 09:35 Mon 21 Dec     , Sasha Khapyorsky wrote:
>>
>> An errors are response timeouts. I guess that most of them are due
>> to switches' VL15 overflow (could be verified by VL15Dropped counter
>> evaluation). Will look at this deeply.
>
> I did a couple of modifications in the code (exact log is listed below).
> In particular there are default limitation for number of outstanding MADs
> on the wire and proper tracking for failed (timedout) MADs. I tested
> this where possible. Could you re-run this? Thanks.
>
> Sasha
>

[snip...]

> commit da6aa19840cb2d37e8cd3daa3874b87657a76ddc
> Author: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
> Date:   Fri Dec 25 16:24:13 2009 +0200
>
>    tests/subnet_discover: --maxsmps (-n) option
>
>    This implements the limitation of outstanding SMPs on a wire at any
>    one time. --maxsmps=0 means - no limit.
>
>    Signed-off-by: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>
> diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
> index 7f8a85c..42e7aee 100644
> --- a/tests/subnet_discover.c
> +++ b/tests/subnet_discover.c
> @@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
>  static unsigned node_count = 0;
>  static unsigned trid_cnt = 0;
>  static unsigned outstanding = 0;
> +static unsigned max_outstanding = 8;

Any reason why this default is different from the one which OpenSM
uses ? Seems to me it should be the same (or less).

-- Hal

>  static unsigned timeout = 100;
>  static unsigned retries = 3;
>  static unsigned verbose = 0;

[snip...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
       [not found]                         ` <f0e08f231001110556y7c47cc54oa3cfd5859f9a4e76-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-01-12  9:31                           ` Sasha Khapyorsky
  2010-01-13 20:11                             ` Hal Rosenstock
  0 siblings, 1 reply; 14+ messages in thread
From: Sasha Khapyorsky @ 2010-01-12  9:31 UTC (permalink / raw)
  To: Hal Rosenstock; +Cc: Ira Weiny, linux-rdma, Al Chu

Hi Hal,

On 08:56 Mon 11 Jan     , Hal Rosenstock wrote:
> >
> > diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
> > index 7f8a85c..42e7aee 100644
> > --- a/tests/subnet_discover.c
> > +++ b/tests/subnet_discover.c
> > @@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
> >  static unsigned node_count = 0;
> >  static unsigned trid_cnt = 0;
> >  static unsigned outstanding = 0;
> > +static unsigned max_outstanding = 8;
> 
> Any reason why this default is different from the one which OpenSM
> uses ? Seems to me it should be the same (or less).

In my tests I found that '8' is a better value (the tool runs faster
and without drops) than the '4' used in OpenSM.

Of course it would be helpful to run this on a bigger cluster than
the one I have, to see whether the results are consistent.

Sasha

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
  2010-01-12  9:31                           ` Sasha Khapyorsky
@ 2010-01-13 20:11                             ` Hal Rosenstock
       [not found]                               ` <f0e08f231001131211y64489a51nd2621cefdb27ad25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Hal Rosenstock @ 2010-01-13 20:11 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: Ira Weiny, linux-rdma, Al Chu

Hi Sasha,

On Tue, Jan 12, 2010 at 4:31 AM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> Hi Hal,
>
> On 08:56 Mon 11 Jan     , Hal Rosenstock wrote:
>> >
>> > diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
>> > index 7f8a85c..42e7aee 100644
>> > --- a/tests/subnet_discover.c
>> > +++ b/tests/subnet_discover.c
>> > @@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
>> >  static unsigned node_count = 0;
>> >  static unsigned trid_cnt = 0;
>> >  static unsigned outstanding = 0;
>> > +static unsigned max_outstanding = 8;
>>
>> Any reason why this default is different from the one which OpenSM
>> uses ? Seems to me it should be the same (or less).
>
> In my tests I found that '8' is more optimal number (the tool works
> faster and without drops) than '4' used in OpenSM.
>
> Of course it would be helpful to run this over bigger cluster than
> what I have to see that the results are consistent.

This is exactly my concern. Not only cluster size but use cases
including concurrent diag discover and SM operation where SMPs are
heavily in use.

There already have been a number of reports of dropped SMPs on this
list with the current diags and this change will only make things
worse IMO.

Also, the OpenSM default should be at least as large as the diags for this.

-- Hal

> Sasha
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
       [not found]                               ` <f0e08f231001131211y64489a51nd2621cefdb27ad25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-01-16 19:36                                 ` Sasha Khapyorsky
  2010-01-23 12:24                                   ` Hal Rosenstock
  2010-01-21 20:38                                 ` Ira Weiny
  1 sibling, 1 reply; 14+ messages in thread
From: Sasha Khapyorsky @ 2010-01-16 19:36 UTC (permalink / raw)
  To: Hal Rosenstock; +Cc: Ira Weiny, linux-rdma, Al Chu

On 15:11 Wed 13 Jan     , Hal Rosenstock wrote:
> >
> > In my tests I found that '8' is more optimal number (the tool works
> > faster and without drops) than '4' used in OpenSM.
> >
> > Of course it would be helpful to run this over bigger cluster than
> > what I have to see that the results are consistent.
> 
> This is exactly my concern. Not only cluster size but use cases
> including concurrent diag discover and SM operation where SMPs are
> heavily in use.

Was it investigated? What were the conclusions?

> There already have been a number of reports of dropped SMPs on this
> list with the current diags and this change will only make things
> worse IMO.
>
> Also, the OpenSM default should be at least as large as the diags for this.

I don't know where concurrent MADs are used in the current "diags", do
you?

This utility is not 'diags' at all. It is placed in ibsim/tests and I
wrote it for testing purposes.

Sasha

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
       [not found]                               ` <f0e08f231001131211y64489a51nd2621cefdb27ad25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2010-01-16 19:36                                 ` Sasha Khapyorsky
@ 2010-01-21 20:38                                 ` Ira Weiny
       [not found]                                   ` <20100121123841.43df4cdc.weiny2-i2BcT+NCU+M@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Ira Weiny @ 2010-01-21 20:38 UTC (permalink / raw)
  To: Hal Rosenstock; +Cc: Sasha Khapyorsky, linux-rdma, Al Chu

Hey Sasha,

I am finally getting back to this...  Sorry.

On Wed, 13 Jan 2010 15:11:44 -0500
Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Hi Sasha,
> 
> On Tue, Jan 12, 2010 at 4:31 AM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> > Hi Hal,
> >
> > On 08:56 Mon 11 Jan     , Hal Rosenstock wrote:
> >> >
> >> > diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
> >> > index 7f8a85c..42e7aee 100644
> >> > --- a/tests/subnet_discover.c
> >> > +++ b/tests/subnet_discover.c
> >> > @@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
> >> >  static unsigned node_count = 0;
> >> >  static unsigned trid_cnt = 0;
> >> >  static unsigned outstanding = 0;
> >> > +static unsigned max_outstanding = 8;
> >>
> >> Any reason why this default is different from the one which OpenSM
> >> uses ? Seems to me it should be the same (or less).
> >
> > In my tests I found that '8' is more optimal number (the tool works
> > faster and without drops) than '4' used in OpenSM.
> >
> > Of course it would be helpful to run this over bigger cluster than
> > what I have to see that the results are consistent.

Here is some test data on a real cluster.

09:49:10 > ibhosts | wc -l
1158

09:49:28 > ibswitches | wc -l
281

09:44:45 > time ./subnet_discover -n 1 > /dev/null

real    0m1.414s
user    0m0.309s
sys     0m0.244s

09:44:55 > time ./subnet_discover -n 2 > /dev/null

real    0m1.025s
user    0m0.284s
sys     0m0.201s

09:45:00 > time ./subnet_discover -n 4 > /dev/null

real    0m0.644s
user    0m0.268s
sys     0m0.228s

09:45:04 > time ./subnet_discover -n 8 > /dev/null

real    0m0.550s
user    0m0.253s
sys     0m0.184s

09:45:08 > time ./subnet_discover -n 12 > /dev/null

real    0m0.524s
user    0m0.207s
sys     0m0.201s

09:45:14 > time ./subnet_discover -n 16 > /dev/null

real    0m0.432s
user    0m0.248s
sys     0m0.144s

09:45:18 > time ./subnet_discover -n 32 > /dev/null

real    0m0.484s
user    0m0.260s
sys     0m0.150s


09:45:57 > time ibnetdiscover  > /dev/null

real    0m3.180s
user    0m0.068s
sys     0m0.672s


What I find most interesting is that your test utility runs more than 2x
faster even when there is only 1 outstanding MAD.  :-/  ibnetdiscover
(libibnetdisc) does do a lot more with the data but I would not have
expected such a difference.

As a comparison I ran iblinkinfo; it would seem that there is something in
the library which takes a lot more time.

09:51:59 > time iblinkinfo > /dev/null

real    0m3.159s
user    0m0.063s
sys     0m0.526s


For further comparison I rebuilt the parallel version of libibnetdisc.

12:39:02 > time ./ibnetdiscover > /dev/null

real    0m2.552s
user    0m0.295s
sys     0m0.863s

This is with 8 threads (i.e. 8 outstanding SMPs).

It would appear that your algorithm is superior.  I will look at converting
libibnetdisc, test, and submit a patch.  I still don't know why there would be
so much difference when only using 1 outstanding MAD though?  :-/
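For reference, the ratios implied by the timings above can be worked out with a few lines of C. This is only a sketch over the single runs quoted in this message, so the figures are indicative rather than statistically meaningful; the names runs, speedup and print_speedups are made up for the illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock ("real") seconds copied from the runs above. */
static const struct {
	unsigned n;
	double secs;
} runs[] = {
	{1, 1.414}, {2, 1.025}, {4, 0.644}, {8, 0.550},
	{12, 0.524}, {16, 0.432}, {32, 0.484},
};

static const double ibnetdiscover_secs = 3.180;

/* Ratio of a baseline time to a measured time. */
static double speedup(double baseline, double measured)
{
	assert(measured > 0.0);
	return baseline / measured;
}

/* Print each -n run relative to -n 1 and to the serial ibnetdiscover run. */
static void print_speedups(void)
{
	size_t i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("-n %-2u  %.3fs  %.2fx vs -n 1  %.2fx vs ibnetdiscover\n",
		       runs[i].n, runs[i].secs,
		       speedup(runs[0].secs, runs[i].secs),
		       speedup(ibnetdiscover_secs, runs[i].secs));
}
```

On these numbers subnet_discover -n 1 comes out at roughly 2.2x faster than ibnetdiscover, and -n 16 at roughly 3.3x faster than -n 1.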

> 
> This is exactly my concern. Not only cluster size but use cases
> including concurrent diag discover and SM operation where SMPs are
> heavily in use.
> 
> There already have been a number of reports of dropped SMPs on this
> list with the current diags and this change will only make things
> worse IMO.

This is a problem.  I have seen this issue with large systems which are having
trouble.  OpenSM is trying to discover and route.  We are running diags trying
to figure out what is going on.  There is hardware going up and down; bad
switches or nodes which are booting/rebooting.

I plan to go forward with this but having an option for outstanding MADs is a
good idea.  I don't have an opinion on what the default should be.

> 
> Also, the OpenSM default should be at least as large as the diags for this.

I agree.  OpenSM should have some priority in this matter.

Ira

> 
> -- Hal
> 
> > Sasha
> >


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
  2010-01-16 19:36                                 ` Sasha Khapyorsky
@ 2010-01-23 12:24                                   ` Hal Rosenstock
  0 siblings, 0 replies; 14+ messages in thread
From: Hal Rosenstock @ 2010-01-23 12:24 UTC (permalink / raw)
  To: Sasha Khapyorsky; +Cc: Ira Weiny, linux-rdma, Al Chu

On Sat, Jan 16, 2010 at 2:36 PM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> On 15:11 Wed 13 Jan     , Hal Rosenstock wrote:
>> >
>> > In my tests I found that '8' is a better number (the tool works
>> > faster and without drops) than the '4' used in OpenSM.
>> >
>> > Of course it would be helpful to run this over a bigger cluster than
>> > what I have to see that the results are consistent.
>>
>> This is exactly my concern. Not only cluster size but use cases
>> including concurrent diag discover and SM operation where SMPs are
>> heavily in use.
>
> Was it investigated? What were the conclusions?

I was pointing out testing that needs to be done rather than testing
already done. The use cases I mentioned are based on current
experience. I'm sure there are more.

>> There already have been a number of reports of dropped SMPs on this
>> list with the current diags and this change will only make things
>> worse IMO.
>>
>> Also, the OpenSM default should be at least as large as the diags for this.
>
> I don't know where concurrent MADs are used in the current "diags", do
> you?
>
> This utility is not 'diags' at all. It is placed in ibsim/tests and I
> wrote it for testing purposes.

In your original email on this, you proposed that this algorithm be
incorporated in the diags (you wrote "I think that similar implementation
in libibnetdisc (I can do it if we are in agreement :)) would improve its
performance"), so this comment thread appears relevant to me.

-- Hal

> Sasha
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tests/subnet_discover: discover test utility
       [not found]                                   ` <20100121123841.43df4cdc.weiny2-i2BcT+NCU+M@public.gmane.org>
@ 2010-01-25 15:18                                     ` Sasha Khapyorsky
  0 siblings, 0 replies; 14+ messages in thread
From: Sasha Khapyorsky @ 2010-01-25 15:18 UTC (permalink / raw)
  To: Ira Weiny; +Cc: Hal Rosenstock, linux-rdma, Al Chu

Hi Ira,

On 12:38 Thu 21 Jan     , Ira Weiny wrote:
> 
> Here is some test data on a real cluster.
> 
> 09:49:10 > ibhosts | wc -l
> 1158
> 
> 09:49:28 > ibswitches | wc -l
> 281
> 
> 09:44:45 > time ./subnet_discover -n 1 > /dev/null
> 
> real    0m1.414s
> user    0m0.309s
> sys     0m0.244s
> 
> 09:44:55 > time ./subnet_discover -n 2 > /dev/null
> 
> real    0m1.025s
> user    0m0.284s
> sys     0m0.201s
> 
> 09:45:00 > time ./subnet_discover -n 4 > /dev/null
> 
> real    0m0.644s
> user    0m0.268s
> sys     0m0.228s
> 
> 09:45:04 > time ./subnet_discover -n 8 > /dev/null
> 
> real    0m0.550s
> user    0m0.253s
> sys     0m0.184s
> 
> 09:45:08 > time ./subnet_discover -n 12 > /dev/null
> 
> real    0m0.524s
> user    0m0.207s
> sys     0m0.201s
> 
> 09:45:14 > time ./subnet_discover -n 16 > /dev/null
> 
> real    0m0.432s
> user    0m0.248s
> sys     0m0.144s

These look like very nice results. Thanks for probing this.
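
As a quick way to read the numbers quoted above, the small Python sketch below
computes the speedup of each '-n' setting relative to '-n 1'. It uses only the
"real" times from the quoted runs; the helper name `speedups` is illustrative
and not part of subnet_discover.

```python
# "real" wall-clock times (seconds) from the quoted subnet_discover runs,
# keyed by the -n value (number of outstanding MADs).
real_seconds = {1: 1.414, 2: 1.025, 4: 0.644, 8: 0.550, 12: 0.524, 16: 0.432}


def speedups(timings, baseline=1):
    """Return the speedup of every run relative to the baseline run."""
    base = timings[baseline]
    return {n: base / t for n, t in timings.items()}


for n, s in speedups(real_seconds).items():
    print(f"-n {n:>2}: {real_seconds[n]:.3f}s  speedup {s:.2f}x")
```

Going from 1 to 16 outstanding MADs gives roughly a 3.3x speedup on this
cluster, with most of the gain already reached at '-n 4'.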

> 09:45:18 > time ./subnet_discover -n 32 > /dev/null
> 
> real    0m0.484s
> user    0m0.260s
> sys     0m0.150s

With '-n 32' the total time increases. This is probably due to VL15 packet
drops on the switches, which can be confirmed by checking the VL15Dropped
counter value. In this case, running ./subnet_discover with a non-default
timeout (for example --timeout=1000; the default is 100ms) will also affect
the total running time. This does not happen when no packets are dropped and
the ib_mad retry mechanism is not activated.
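
The dependence on the timeout can be pictured with a deliberately crude model,
sketched below in Python. It assumes that every dropped MAD waits one full
timeout before it is retried and that these waits do not overlap, so it is an
upper-bound estimate only. The function name, the base time, and the drop
counts are hypothetical, not measurements.

```python
def estimated_runtime_ms(base_ms, dropped_mads, timeout_ms):
    """Crude estimate: each dropped MAD adds one full timeout before retry."""
    if base_ms < 0 or dropped_mads < 0 or timeout_ms < 0:
        raise ValueError("arguments must be non-negative")
    return base_ms + dropped_mads * timeout_ms


# Hypothetical numbers: a 432 ms sweep with 0 or 1 dropped MADs.
for drops in (0, 1):
    for timeout in (100, 1000):
        total = estimated_runtime_ms(432, drops, timeout)
        print(f"drops={drops} timeout={timeout}ms -> {total}ms")
```

With no drops the timeout value has no effect on the estimate; with even one
drop, raising the timeout from 100ms to 1000ms dominates the total.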

> 09:45:57 > time ibnetdiscover  > /dev/null
> 
> real    0m3.180s
> user    0m0.068s
> sys     0m0.672s
> 
> 
> What I find most interesting is that your test utility runs nearly 2x faster
> even when there is only 1 outstanding MAD.  :-/  ibnetdiscover (libibnetdisc)
> does do a lot more with the data but I would not have expected such a
> difference.

Your fix for subnet_discover explains the difference.

> It would appear that your algorithm is superior.  I will look at converting
> libibnetdisc, test, and submit a patch.

Thanks.

Sasha

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-01-25 15:18 UTC | newest]

Thread overview: 14+ messages
     [not found] <20090813204306.dffc3237.weiny2@llnl.gov>
     [not found] ` <20090816110200.GS25501@me>
     [not found]   ` <20090817083023.da17378b.weiny2@llnl.gov>
     [not found]     ` <20090823120609.GG9547@me>
     [not found]       ` <20090831170144.da0e7185.weiny2@llnl.gov>
     [not found]         ` <20090831170144.da0e7185.weiny2-i2BcT+NCU+M@public.gmane.org>
2009-10-23 17:45           ` [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object Sasha Khapyorsky
     [not found]       ` <20090826164026.8dcce4b2.weiny2@llnl.gov>
     [not found]         ` <20090826164026.8dcce4b2.weiny2-i2BcT+NCU+M@public.gmane.org>
2009-10-23 23:43           ` Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.) Sasha Khapyorsky
2009-12-20 12:14             ` [PATCH] tests/subnet_discover: discover test utility Sasha Khapyorsky
     [not found]               ` <20091220182809.f7e17fae.weiny2@llnl.gov>
     [not found]                 ` <20091220182809.f7e17fae.weiny2-i2BcT+NCU+M@public.gmane.org>
2009-12-21  7:35                   ` Sasha Khapyorsky
2009-12-21 14:02                     ` Hal Rosenstock
     [not found]                       ` <f0e08f230912210602i5e3f528h2b0630420346db82-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-12-22 11:27                         ` Sasha Khapyorsky
2009-12-28  9:22                     ` Sasha Khapyorsky
2010-01-11 13:56                       ` Hal Rosenstock
     [not found]                         ` <f0e08f231001110556y7c47cc54oa3cfd5859f9a4e76-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-01-12  9:31                           ` Sasha Khapyorsky
2010-01-13 20:11                             ` Hal Rosenstock
     [not found]                               ` <f0e08f231001131211y64489a51nd2621cefdb27ad25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-01-16 19:36                                 ` Sasha Khapyorsky
2010-01-23 12:24                                   ` Hal Rosenstock
2010-01-21 20:38                                 ` Ira Weiny
     [not found]                                   ` <20100121123841.43df4cdc.weiny2-i2BcT+NCU+M@public.gmane.org>
2010-01-25 15:18                                     ` Sasha Khapyorsky
