From: Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
To: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>,
linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Al Chu <chu11-i2BcT+NCU+M@public.gmane.org>
Subject: Re: [PATCH] tests/subnet_discover: discover test utility
Date: Thu, 21 Jan 2010 12:38:41 -0800
Message-ID: <20100121123841.43df4cdc.weiny2@llnl.gov>
In-Reply-To: <f0e08f231001131211y64489a51nd2621cefdb27ad25-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hey Sasha,
I am finally getting back to this... Sorry.
On Wed, 13 Jan 2010 15:11:44 -0500
Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Sasha,
>
> On Tue, Jan 12, 2010 at 4:31 AM, Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> > Hi Hal,
> >
> > On 08:56 Mon 11 Jan , Hal Rosenstock wrote:
> >> >
> >> > diff --git a/tests/subnet_discover.c b/tests/subnet_discover.c
> >> > index 7f8a85c..42e7aee 100644
> >> > --- a/tests/subnet_discover.c
> >> > +++ b/tests/subnet_discover.c
> >> > @@ -40,6 +40,7 @@ static struct node *node_array[32 * 1024];
> >> > static unsigned node_count = 0;
> >> > static unsigned trid_cnt = 0;
> >> > static unsigned outstanding = 0;
> >> > +static unsigned max_outstanding = 8;
> >>
> >> Any reason why this default is different from the one which OpenSM
> >> uses ? Seems to me it should be the same (or less).
> >
> > In my tests I found that '8' is more optimal number (the tool works
> > faster and without drops) than '4' used in OpenSM.
> >
> > Of course it would be helpful to run this over bigger cluster than
> > what I have to see that the results are consistent.
Here is some test data on a real cluster.
09:49:10 > ibhosts | wc -l
1158
09:49:28 > ibswitches | wc -l
281
09:44:45 > time ./subnet_discover -n 1 > /dev/null
real 0m1.414s
user 0m0.309s
sys 0m0.244s
09:44:55 > time ./subnet_discover -n 2 > /dev/null
real 0m1.025s
user 0m0.284s
sys 0m0.201s
09:45:00 > time ./subnet_discover -n 4 > /dev/null
real 0m0.644s
user 0m0.268s
sys 0m0.228s
09:45:04 > time ./subnet_discover -n 8 > /dev/null
real 0m0.550s
user 0m0.253s
sys 0m0.184s
09:45:08 > time ./subnet_discover -n 12 > /dev/null
real 0m0.524s
user 0m0.207s
sys 0m0.201s
09:45:14 > time ./subnet_discover -n 16 > /dev/null
real 0m0.432s
user 0m0.248s
sys 0m0.144s
09:45:18 > time ./subnet_discover -n 32 > /dev/null
real 0m0.484s
user 0m0.260s
sys 0m0.150s
09:45:57 > time ibnetdiscover > /dev/null
real 0m3.180s
user 0m0.068s
sys 0m0.672s
What I find most interesting is that your test utility runs more than 2x
faster (1.414s vs. 3.180s) even when there is only 1 outstanding MAD. :-/
ibnetdiscover (libibnetdisc) does do a lot more with the data but I would not
have expected such a difference.
As a comparison I ran iblinkinfo; it would seem that there is something in the
library which takes a lot more time.
09:51:59 > time iblinkinfo > /dev/null
real 0m3.159s
user 0m0.063s
sys 0m0.526s
For further comparison I rebuilt the parallel version of libibnetdisc.
12:39:02 > time ./ibnetdiscover > /dev/null
real 0m2.552s
user 0m0.295s
sys 0m0.863s
This is with 8 threads (i.e. 8 outstanding SMPs).
It would appear that your algorithm is superior. I will look at converting
libibnetdisc, test it, and submit a patch. I still don't know why there would
be so much difference when only using 1 outstanding MAD, though. :-/
>
> This is exactly my concern. Not only cluster size but use cases
> including concurrent diag discover and SM operation where SMPs are
> heavily in use.
>
> There already have been a number of reports of dropped SMPs on this
> list with the current diags and this change will only make things
> worse IMO.
This is a problem. I have seen this issue on large systems which are having
trouble: OpenSM is trying to discover and route while we are running diags
trying to figure out what is going on, and there is hardware going up and
down (bad switches, or nodes which are booting/rebooting).
I plan to go forward with this, but having an option for outstanding MADs is a
good idea. I don't have an opinion on what the default should be.
>
> Also, the OpenSM default should be at least as large as the diags for this.
I agree. OpenSM should have some priority in this matter.
Ira
>
> -- Hal
>
> > Sasha
> >
--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org