From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
OpenFabrics EWG <ewg-G2znmakfqn7U1rindQTSdQ@public.gmane.org>,
Jeff Squyres <jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>,
Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes
Date: Thu, 04 Feb 2010 17:04:23 -0600
Message-ID: <4B6B5277.80307@opengridcomputing.com>
In-Reply-To: <DB361897EE7C40C5AB494AAC2D8625AC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>

Sean Hefty wrote:
>> Well then the rdma-cm needs to know which devices support hw loopback.
>> Cuz on a T3-only system, no hwloop...
>>
>
> The problem sounds like it's more than just whether 127.0.0.1 is usable. That
> check may fix openmpi, but it sounds more like the app needs to know whether the
> device can actually support loopback, regardless of what addresses are used. Is
> this correct?
>
> What would openmpi do if there were two addresses assigned to the T3 device?
>

It would use them and might even create two connections.

> Does openmpi simply bypass RDMA for all connections on the local machine?
>
>
OpenMPI can be run to use hw loopback if it's available. For T3
clusters, OMPI is run in a mode that uses shared memory for intra-node
communications.

> Basically, I'm not sure that this is *just* an rdma_cm issue. Although it
> definitely appears that some sort of change needs to be made to the rdma_cm.
>
>
I think the OpenMPI rdmacm code needs to skip 127.0.0.1 in this
particular case. Prior to ofed-1.5.1, however, the bind would fail and
thus OpenMPI would not advertise 127.0.0.1 to its peer. I will work to
get that change done.
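
For illustration only, the kind of check described above could look roughly like the sketch below. This is not the actual OpenMPI rdmacm code; the helper names is_ipv4_loopback and should_advertise are hypothetical, and the sketch assumes IPv4 addresses given as dotted-quad strings. It only shows the idea of filtering loopback addresses before they are advertised to a peer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Returns 1 if addr (network byte order) lies in 127.0.0.0/8, else 0. */
int is_ipv4_loopback(struct in_addr addr)
{
    uint32_t host = ntohl(addr.s_addr);
    return (host & 0xff000000u) == 0x7f000000u;
}

/*
 * Hypothetical helper: returns 1 if the dotted-quad address is a
 * candidate to advertise to a peer, 0 if it should be skipped
 * (loopback, or not parsable as an IPv4 address).
 */
int should_advertise(const char *dotted_quad)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, dotted_quad, &addr) != 1)
        return 0;
    return !is_ipv4_loopback(addr);
}
```

With a filter like this, an address such as 127.0.0.1 would be dropped from the advertised list even when the bind succeeds, while a regular interface address would still be offered.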

But let's also add a device attribute so the rdmacm can know whether a
device supports loopback. Clearly, if the rdma-cm allows binds to T3,
loopback connections will fail at connect time.

Hey Roland, are you ok with a device attribute to indicate hw-loopback
support?

Steve.