* Re: bug 1918 - openmpi broken due to rdma-cm changes
@ 2010-02-05 11:32 Jeff Squyres (jsquyres)
[not found] ` <58D723FE08DC6A4398E6596E38F3FA170566DA-2KNrN6/GZtCAsgjym8flbKBKnGwkPULj@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Jeff Squyres (jsquyres) @ 2010-02-05 11:32 UTC (permalink / raw)
To: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
sean.hefty-ral2JQCrhuEAvxtiuMwx3w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Roland Dreier (rdreier)
Note that it is highly unlikely that we will release open mpi 1.4.2 in time for ofed 1.5.1.
Also note that trying to bind rdma cm to all interface ip addresses was the way that we were advised by openfabrics to figure out which devices are rdma-capable.
As such, it is highly desirable to get the fix transparently in rdmacm and preserve the old semantic. More specifically, it seems undesirable to change this semantic in a minor ofed point release.
-jms
Sent from my PDA. No type good.
----- Original Message -----
From: Steve Wise <swise@opengridcomputing.com>
To: Sean Hefty <sean.hefty@intel.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>; OpenFabrics EWG <ewg@openfabrics.org>; Jeff Squyres (jsquyres); Roland Dreier (rdreier)
Sent: Thu Feb 04 18:04:23 2010
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes
Sean Hefty wrote:
>> Well then the rdma-cm needs to know which devices support hw loopback.
>> Cuz on a T3-only system, no hwloop...
>>
>
> The problem sounds like it's more than just whether 127.0.0.1 is usable. That
> check may fix openmpi, but it sounds more like the app needs to know whether the
> device can actually support loopback, regardless of what addresses are used. Is
> this correct?
>
> What would openmpi do if there were two addresses assigned to the T3 device?
>
It would use them and might even create two connections.
> Does openmpi simply bypass RDMA for all connections on the local machine?
>
>
OpenMPI can be run to use hw loopback if it's available. For T3
clusters, OMPI is run in a mode to use shared memory for intra-node
communications.
> Basically, I'm not sure that this is *just* an rdma_cm issue. Although it
> definitely appears that some sort of change needs to be made to the rdma_cm.
>
>
I think the OpenMPI rdmacm code needs to skip 127.0.0.1, in this
particular case. Prior to ofed-1.5.1, however, the bind would fail and
thus OpenMPI would not advertise 127.0.0.1 to its peer. I will work to
get that change done.
But let's also add a device attribute so the rdmacm can know if a device
supports loopback. Clearly, if the rdma-cm allows binds to T3,
loopback connections will fail at connect time.
Hey Roland, are you ok with a device attribute to indicate hw-loopback
support?
Steve.
_______________________________________________
ewg mailing list
ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <58D723FE08DC6A4398E6596E38F3FA170566DA-2KNrN6/GZtCAsgjym8flbKBKnGwkPULj@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <58D723FE08DC6A4398E6596E38F3FA170566DA-2KNrN6/GZtCAsgjym8flbKBKnGwkPULj@public.gmane.org>
@ 2010-02-05 16:16 ` Steve Wise
[not found] ` <4B6C4460.3050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 16:22 ` Sean Hefty
1 sibling, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-05 16:16 UTC (permalink / raw)
To: Jeff Squyres (jsquyres)
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

Jeff Squyres (jsquyres) wrote:
>
> Note that it is highly unlikely that we will release open mpi 1.4.2 in
> time for ofed 1.5.1.
>

Jeff, is there no way to handle high-priority bug fixes in the current
released stream?

> Also note that trying to bind rdma cm to all interface ip addresses
> was the way that we were advised by openfabrics to figure out which
> devices are rdma-capable.
>
> As such, it is highly desirable to get the fix transparently in rdmacm
> and preserve the old semantic. More specifically, it seems undesirable
> to change this semantic in a minor ofed point release.
>

I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1
at all because it regresses OpenMPI. Even with IB systems, if the bind
to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that
rdma interface and advertises this address to its peer as an address to
which that peer can rdma connect! This will break IB clusters too, not
just T3/iWARP clusters. While I think OpenMPI needs to skip 127.0.0.1 in
its logic, I think we should probably defer allowing 127.0.0.1 binds
until ofed-1.6.

But Jeff, note that if someone uses the upstream kernel and OpenMPI,
it's busted...

So I recommend:

1) Don't allow 127.0.0.1 binds in ofed-1.5.1

2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
connect address (get it in ofed-1.5.2 or ofed-1.6).

Steve.
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6C4460.3050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C4460.3050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-05 16:45 ` Steve Wise
2010-02-05 17:51 ` Roland Dreier
2010-02-05 17:57 ` Jeff Squyres
2 siblings, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-05 16:45 UTC (permalink / raw)
To: Jeff Squyres (jsquyres)
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

> I agree that we should probably not allow 127.0.0.1 binds in
> ofed-1.5.1 at all because it regresses OpenMPI. Even with IB systems,
> if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is
> bound to that rdma interface and advertises this address to its peer
> as an address to which that peer can rdma connect! This will break IB
> clusters too, not just T3/iWARP clusters. While I think OpenMPI needs
> to skip 127.0.0.1 in its logic, I think we should probably defer
> allowing 127.0.0.1 binds until ofed-1.6.
>
> But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> it's busted...
>
> So I recommend:
>
> 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
>
> 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
> connect address (get it in ofed-1.5.2 or ofed-1.6).

Also, there is a good argument for never allowing 127.0.0.1 for rdma
anyway. It implies a _software_ loopback. It should NEVER be bound to a
real NIC interface, and thus rdma binds shouldn't be allowed to it since
there is no software rdma loopback support...

Unless someone implements software rdma loopback... ;)
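The second recommendation above (never advertise 127.0.0.1 to a peer) can be sketched roughly as follows. This is only an illustration in Python, not Open MPI's actual code: the function name `advertisable` and the sample address list are assumptions, and the list stands in for "addresses where the rdma-cm bind probe succeeded".

```python
import ipaddress


def advertisable(bound_addrs):
    """Return the successfully bound addresses that may be sent to a peer.

    `bound_addrs` is a hypothetical list of address strings for which a
    bind probe succeeded. Loopback addresses are dropped because a remote
    peer connecting to 127.0.0.1 would reach itself, not this host.
    """
    result = []
    for text in bound_addrs:
        addr = ipaddress.ip_address(text)
        if addr.is_loopback:  # all of 127.0.0.0/8, and ::1 for IPv6
            continue
        result.append(text)
    return result


# Hypothetical probe result on a host where the loopback bind now succeeds.
bound = ["127.0.0.1", "10.10.30.165", "10.10.20.165"]
print(advertisable(bound))  # ['10.10.30.165', '10.10.20.165']
```

With a filter like this on the application side, it would no longer matter whether the bind to 127.0.0.1 fails (the old behaviour) or succeeds (the new one).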
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C4460.3050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 16:45 ` Steve Wise
@ 2010-02-05 17:51 ` Roland Dreier
[not found] ` <ada4olvefl4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-05 17:57 ` Jeff Squyres
2 siblings, 1 reply; 72+ messages in thread
From: Roland Dreier @ 2010-02-05 17:51 UTC (permalink / raw)
To: Steve Wise
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, ewg-G2znmakfqn7U1rindQTSdQ

> But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> it's busted...

Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.

- R.
--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Cisco.com - http://www.cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
[parent not found: <ada4olvefl4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <ada4olvefl4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-02-05 17:58 ` Jeff Squyres
[not found] ` <324EFA68-12F6-46E9-B876-7F4847B53224-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-05 18:42 ` Sean Hefty
1 sibling, 1 reply; 72+ messages in thread
From: Jeff Squyres @ 2010-02-05 17:58 UTC (permalink / raw)
To: Roland Dreier (rdreier)
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, ewg-G2znmakfqn7U1rindQTSdQ

On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
> > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
> > it's busted...
>
> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
> just went in for 2.6.33, which is still at -rc6, so if we can quickly
> reach a consensus, there is still time to get a fix in for 2.6.33.

Oh oh oh! Yes, that would be fabulous...

Thanks!

--
Jeff Squyres <jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Cisco.com - http://www.cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[parent not found: <324EFA68-12F6-46E9-B876-7F4847B53224-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <324EFA68-12F6-46E9-B876-7F4847B53224-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-02-05 18:32 ` Steve Wise
[not found] ` <4B6C6453.9090706-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-05 18:32 UTC (permalink / raw)
To: Jeff Squyres
Cc: Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Jeff Squyres wrote:
> On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
>
>> > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
>> > it's busted...
>>
>> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>> just went in for 2.6.33, which is still at -rc6, so if we can quickly
>> reach a consensus, there is still time to get a fix in for 2.6.33.
>>
>
> Oh oh oh! Yes, that would be fabulous...
>
> Thanks!
>

I think we should remove the feature of allowing binds to 127.0.0.1
altogether, based on Jeff's arguments and my assertion that 127.0.0.1 is
a sw-loopback mechanism anyway...

I'm not sure whether that commit does more than that...

Steve.
[parent not found: <4B6C6453.9090706-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C6453.9090706-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-05 18:49 ` Roland Dreier
2010-02-05 18:56 ` Jason Gunthorpe
1 sibling, 0 replies; 72+ messages in thread
From: Roland Dreier @ 2010-02-05 18:49 UTC (permalink / raw)
To: Steve Wise
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, ewg-G2znmakfqn7U1rindQTSdQ

> I think we should remove the feature of allowing binds to 127.0.0.1
> altogether based on Jeff's arguments and my assertion that 127.0.0.1
> is a sw-loopback mechanism anyway...

Well, someone propose a patch please.

--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C6453.9090706-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 18:49 ` Roland Dreier
@ 2010-02-05 18:56 ` Jason Gunthorpe
[not found] ` <20100205185616.GS16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Jason Gunthorpe @ 2010-02-05 18:56 UTC (permalink / raw)
To: Steve Wise
Cc: Jeff Squyres, Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

On Fri, Feb 05, 2010 at 12:32:51PM -0600, Steve Wise wrote:
> I think we should remove the feature of allowing binds to 127.0.0.1
> altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> a sw-loopback mechanism anyway...

I don't agree. The kernel should be free to provide a loopback service
any way it likes, and if that means using one of the HW adaptors to
accelerate the work, then fine.

Consider: if we see the RDMAoE (soft RDMA) patches, then it would be
reasonable for all kernels to support RDMA on the loopback.

At a minimum, RDMA CM is an IP service, so whatever logic you use to
determine addresses for TCP must also be applied after determining a
list of valid RDMA IPs. Trying an RDMA CM bind just gives you the list
of candidate addresses, no different from what netlink does for TCP.

One of those steps must be at least filtering 127.0.0.0/8.

The user should also be able to have some input into the IP filter;
software RDMAoE, for instance, really makes this important.

Jason
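Jason's two points (filter 127.0.0.0/8 out of the candidate list, and let the user influence the filter) could look something like the sketch below. It is a hedged illustration only: `filter_candidates`, `DEFAULT_EXCLUDE`, and the `exclude`/`include` parameters are invented names, not options of the rdma_cm or of Open MPI.

```python
import ipaddress

# Assumed default: drop the whole IPv4 loopback block.
DEFAULT_EXCLUDE = ("127.0.0.0/8",)


def filter_candidates(candidates, exclude=DEFAULT_EXCLUDE, include=None):
    """Filter candidate RDMA addresses with user-adjustable CIDR lists.

    candidates: address strings for which a bind probe succeeded.
    exclude:    networks that must never be used (loopback by default).
    include:    optional allow-list; when given, only addresses inside
                one of these networks are kept.
    """
    excluded = [ipaddress.ip_network(net) for net in exclude]
    included = None if include is None else [ipaddress.ip_network(net) for net in include]

    kept = []
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if any(addr in net for net in excluded):
            continue
        if included is not None and not any(addr in net for net in included):
            continue
        kept.append(text)
    return kept


candidates = ["127.0.0.1", "10.10.30.165", "10.10.20.165"]
print(filter_candidates(candidates))
# ['10.10.30.165', '10.10.20.165']
print(filter_candidates(candidates, include=["10.10.30.0/24"]))
# ['10.10.30.165']
```

Passing `exclude=()` would keep loopback addresses, which is the kind of override a software RDMA loopback setup might want.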
[parent not found: <20100205185616.GS16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <20100205185616.GS16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-02-05 20:08 ` Jeff Squyres
[not found] ` <E8FF8BD1-80AC-4AA7-BC2A-CE7547FB9ABA-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Jeff Squyres @ 2010-02-05 20:08 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Steve Wise, Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
> > I think we should remove the feature of allowing binds to 127.0.0.1
> > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> > a sw-loopback mechanism anyway...
>
> I don't agree, the kernel should be free to provide a loop back
> service any way it likes, and if that means using one of the HW

Ok, fine. Should we push back OFED 1.5.1 until Open MPI can get 1.4.2
out? I don't know when that will be.

In short: you're breaking backward compatibility with zero warning.
There is real software out there that will break if people upgrade their
kernel/OFED/RDMA CM/whatever (e.g., Open MPI). Isn't this supposed to be
the Enterprise distribution (meaning: stability)?

(trying to keep the frustration out of my voice...)

This is a terrible, terrible idea.

How about this: back out the change for now. Give everyone time to
upgrade. If nothing else, ***give those of us who are involved in this
community*** time to upgrade. Then put the feature back in after
adequate time has passed.

--
Jeff Squyres <jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
[parent not found: <E8FF8BD1-80AC-4AA7-BC2A-CE7547FB9ABA-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <E8FF8BD1-80AC-4AA7-BC2A-CE7547FB9ABA-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-02-05 21:14 ` Jason Gunthorpe
[not found] ` <20100205211455.GT16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Jason Gunthorpe @ 2010-02-05 21:14 UTC (permalink / raw)
To: Jeff Squyres
Cc: Steve Wise, Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

On Fri, Feb 05, 2010 at 03:08:10PM -0500, Jeff Squyres wrote:
> On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
>
> > > I think we should remove the feature of allowing binds to 127.0.0.1
> > > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is
> > > a sw-loopback mechanism anyway...
> >
> > I don't agree, the kernel should be free to provide a loop back
> > service any way it likes, and if that means using one of the HW
>
> Ok, fine. Should we push back OFED 1.5.1 until Open MPI can get 1.4.2 out? I don't know when that will be.

> In short: you're breaking backward compatibility with zero warning.
> There is real software out there that will break if people upgrade
> their kernel/OFED/RDMA CM/whatever (e.g., Open MPI). Isn't this
> supposed to be the Enterprise distribution (meaning: stability)?
> (trying to keep the frustration out of my voice...)

Well, I think you are right. This kind of change seems appropriate to me
for mainline, but OFED/RHEL should carry the responsibility to manage an
identified incompatibility: either patch their kernel, patch their OMPI,
or publish an erratum. That is the role of a distribution.

> How about this: back out the change for now. Give everyone time to
> upgrade. If nothing else, ***give those of us who are involved in
> this community*** time to upgrade. Then put the feature back in
> after adequate time has passed.

I've seen this approach go badly too :( If it isn't actually in a
mainline kernel, userspace devs tend to ignore it...

Sounds like this is taken care of for now anyhow: Sean's patch to remove
it for iwarp, since it doesn't work today with any iwarp drivers, does
obscure the problem. But it does seem like rdma_cm mode for IB networks
will still be broken in OMPI with the new kernels.

Jason
[parent not found: <20100205211455.GT16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <20100205211455.GT16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-02-05 21:40 ` Jeff Squyres
[not found] ` <697C6107-13A9-48E3-B451-02529305100D-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Jeff Squyres @ 2010-02-05 21:40 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Steve Wise, Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
> Well, I think you are right. This kind of change seems appropriate to
> me for mainline, but OFED/RHEL should carry a responsibility to manage
> an identified incompatibility, either patch their kernel, patch their
> OMPI, or publish an errata. That is the role of a distribution.

RHEL has said, multiple times, that they rely on OpenFabrics to do the
Right Thing. They don't do a lot of testing, validating, etc.

> Sounds like this is taken care for now anyhow, Sean's patch to remove
> it for iwarp since it doesn't work today with any iwarp drivers does
> obscure the problem.. But it does seem like rdma_cm mode for IB
> networks will still be broken in OMPI with the new kernels.

Correct.

So why not back off putting this in the kernel that's coming out now now
now? Why not put it in *next* kernel? (or even better, the one after
that)

Is there a rush / need to have this in *now*?

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
[parent not found: <697C6107-13A9-48E3-B451-02529305100D-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <697C6107-13A9-48E3-B451-02529305100D-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-02-05 21:53 ` Steve Wise
[not found] ` <4B6C9369.1070208-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-06 0:54 ` Roland Dreier
1 sibling, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-05 21:53 UTC (permalink / raw)
To: Jeff Squyres
Cc: Jason Gunthorpe, Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Jeff Squyres wrote:
> On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
>
>> Well, I think you are right. This kind of change seems appropriate to
>> me for mainline, but OFED/RHEL should carry a responsibility to manage
>> an identified incompatibility, either patch their kernel, patch their
>> OMPI, or publish an errata. That is the role of a distribution.
>>
>
> RHEL has said, multiple times, that they rely on OpenFabrics to do the Right Thing. They don't do a lot of testing, validating, etc.
>
>> Sounds like this is taken care for now anyhow, Sean's patch to remove
>> it for iwarp since it doesn't work today with any iwarp drivers does
>> obscure the problem.. But it does seem like rdma_cm mode for IB
>> networks will still be broken in OMPI with the new kernels.
>>
>
> Correct.
>
> So why not back off putting this in the kernel that's coming out now now now? Why not put it in *next* kernel? (or even better, the one after that)
>
> Is there a rush / need to have this in *now*?
>

There is still some inconsistency here. Sean, you claimed binds to
127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
IB/openmpi/rdmacm should be seeing issues. We need to dig a little
more...
[parent not found: <4B6C9369.1070208-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C9369.1070208-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-05 22:15 ` Sean Hefty
[not found] ` <77E29960440B4806B112A7158F4FA1C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 22:20 ` Jeff Squyres
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-05 22:15 UTC (permalink / raw)
To: 'Steve Wise', Jeff Squyres
Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier (rdreier), ewg-G2znmakfqn7U1rindQTSdQ

>There is still some inconsistency here. Sean, you claimed binds to
>127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...

You can verify this by running ucmatose -b 127.0.0.1 and see if the test
enters the listening state.

Can you also try testing iwarp with the patch that I sent?

- Sean
[parent not found: <77E29960440B4806B112A7158F4FA1C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <77E29960440B4806B112A7158F4FA1C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-05 22:21 ` Steve Wise
2010-02-06 16:18 ` Steve Wise
1 sibling, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-05 22:21 UTC (permalink / raw)
To: Sean Hefty
Cc: Jeff Squyres, Jason Gunthorpe, Roland Dreier (rdreier), linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Sean Hefty wrote:
>> There is still some inconsistency here. Sean, you claimed binds to
>> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
>>
>
> You can verify this by running ucmatose -b 127.0.0.1 and see if the test enters
> the listening state.
>

Well, ofed-1.4.1 with openmpi gets failures when binding to 127.0.0.1 on
mthca devs. Jeff will post the results soon.

Are you sure ucmatose is really binding to that address? :)

> Can you also try testing iwarp with the patch that I sent?
>

I will soon. Can't do it right now. I'll try tonight or tomorrow.
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <77E29960440B4806B112A7158F4FA1C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 22:21 ` Steve Wise
@ 2010-02-06 16:18 ` Steve Wise
1 sibling, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-06 16:18 UTC (permalink / raw)
To: Sean Hefty
Cc: Jeff Squyres, Jason Gunthorpe, Roland Dreier (rdreier), linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Sean Hefty wrote:
>> There is still some inconsistency here. Sean, you claimed binds to
>> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
>> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...
>>
>
> You can verify this by running ucmatose -b 127.0.0.1 and see if the test enters
> the listening state.
>
> Can you also try testing iwarp with the patch that I sent?
>

I backported your patch to ofed-1.5.1 and tried it, and apparently binds
to 127.0.0.1 are still working even though the only device in the system
is iWARP. I'm debugging now.

Steve.
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C9369.1070208-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 22:15 ` Sean Hefty
@ 2010-02-05 22:20 ` Jeff Squyres
1 sibling, 0 replies; 72+ messages in thread
From: Jeff Squyres @ 2010-02-05 22:20 UTC (permalink / raw)
To: Steve Wise
Cc: Jason Gunthorpe, Roland Dreier (rdreier), sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

On Feb 5, 2010, at 4:53 PM, Steve Wise wrote:
> There is still some inconsistency here. Sean, you claimed binds to
> 127.0.0.1 succeed in ofed-1.4 for IB devices. If so, then folks running
> IB/openmpi/rdmacm should be seeing issues. We need to dig a little more...

FWIW, I can run Open MPI v1.4.2beta on my OFED 1.4.1 cluster over IB
devices using RDMA CM with no problems. I added some debug statements in
OMPI showing which rdma_cm_bind's it attempts, just to be sure. Here's a
run across 2 nodes, each with a single 2-port mthca (each port connected
to a different IB subnet, not that that matters):

$ mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring
[svbu-mpi025:05592] FAILED to bind to 127.0.0.1
[svbu-mpi025:05592] FAILED to bind to 172.29.218.165
[svbu-mpi025:05592] SUCCEEDED to bind to 10.10.30.165
[svbu-mpi025:05592] SUCCEEDED to bind to 10.10.20.165
[svbu-mpi026:05529] FAILED to bind to 127.0.0.1
[svbu-mpi026:05529] FAILED to bind to 172.29.218.166
[svbu-mpi026:05529] SUCCEEDED to bind to 10.10.30.166
[svbu-mpi026:05529] SUCCEEDED to bind to 10.10.20.166
...

The 172.x address is my gigE device (eth0).

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <697C6107-13A9-48E3-B451-02529305100D-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-05 21:53 ` Steve Wise
@ 2010-02-06 0:54 ` Roland Dreier
1 sibling, 0 replies; 72+ messages in thread
From: Roland Dreier @ 2010-02-06 0:54 UTC (permalink / raw)
To: Jeff Squyres
Cc: Jason Gunthorpe, Steve Wise, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

> > Well, I think you are right. This kind of change seems appropriate to
> > me for mainline, but OFED/RHEL should carry a responsibility to manage
> > an identified incompatibility, either patch their kernel, patch their
> > OMPI, or publish an errata. That is the role of a distribution.
>
> RHEL has said, multiple times, that they rely on OpenFabrics to do the Right Thing. They don't do a lot of testing, validating, etc.

In that case OFED plays the role of distribution.

--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <ada4olvefl4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-05 17:58 ` Jeff Squyres
@ 2010-02-05 18:42 ` Sean Hefty
[not found] ` <3762D25FD9474444A4B3E2240EFB8D0E-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-05 18:42 UTC (permalink / raw)
To: 'Roland Dreier', Steve Wise
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

>Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>just went in for 2.6.33, which is still at -rc6, so if we can quickly
>reach a consensus, there is still time to get a fix in for 2.6.33.

That should be the patch in question. I'm not sure about reaching consensus. :)
If the other changes to the rdma_cm aren't closely tied to that change, we may
be able to back that one patch out until we can get whatever other fix may be
needed.

In my view, openmpi has a bug in that it can pass a loopback address to a remote
peer and expect it to be used to establish a connection. Steve seems to agree
with this.

My original intent was to allow the use of the loopback address with the
rdma_cm. I.e. 127.0.0.1 meant 'this host', and not 'software loopback'. I just
had Arlin run a quick test with OFED 1.4 over IB, and it allows binding to
127.0.0.1, but never forms connections. I.e. ucmatose -b 127.0.0.1 succeeds in
listening, but ucmatose -s 127.0.0.1 fails to connect because of a route error.
(Hmm... I'm still confused about what openmpi is doing then.)

Even if an application were to use non-loopback IP addresses, there's no
guarantee of forming a connection if those addresses map to an iwarp device.
So, even if the rdma_cm fails binding to 127.0.0.1 unless there's some RDMA
device (software or hardware - not sure why we care) capable of supporting it,
an application would need to also deal with failures from rdma_resolve_addr.

Indicating loopback through a device capability flag seems like the right
approach, and the rdma_cm can use this to fail rdma_bind_addr/rdma_resolve_addr
calls. That's probably not a trivial patch however.

- Sean
^ permalink raw reply [flat|nested] 72+ messages in thread
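The capability-flag idea in the message above can be sketched in a few lines of user-space C. This is only an illustration under loud assumptions: MOCK_DEV_CAP_LOOPBACK, struct mock_dev, and mock_check_bind are hypothetical names invented here, not anything in the rdma_cm or the verbs API, and the -ENODEV return value is just one plausible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bit: "this device can carry loopback connections". */
#define MOCK_DEV_CAP_LOOPBACK (1u << 0)

/* Hypothetical stand-in for an RDMA device; not a kernel structure. */
struct mock_dev {
	const char *name;
	unsigned int caps;
};

/*
 * Decide whether a bind may attach to dev.
 * Returns 0 if allowed, -ENODEV if the address is a loopback address
 * and the device does not advertise loopback support.
 */
int mock_check_bind(const struct mock_dev *dev, bool addr_is_loopback)
{
	assert(dev != NULL);
	if (addr_is_loopback && !(dev->caps & MOCK_DEV_CAP_LOOPBACK))
		return -ENODEV;
	return 0;
}
```

With a check like this, the decision follows the device rather than the transport type, so a device that can do loopback is not excluded merely for being iWARP.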
[parent not found: <3762D25FD9474444A4B3E2240EFB8D0E-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <3762D25FD9474444A4B3E2240EFB8D0E-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-05 19:01 ` Steve Wise
[not found] ` <4B6C6B23.4010704-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-05 19:01 UTC (permalink / raw)
To: Sean Hefty
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, 'Roland Dreier', ewg-G2znmakfqn7U1rindQTSdQ

Sean Hefty wrote:
>> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")? This
>> just went in for 2.6.33, which is still at -rc6, so if we can quickly
>> reach a consensus, there is still time to get a fix in for 2.6.33.
>>
>
> That should be the patch in question. I'm not sure about reaching consensus. :)
> If the other changes to the rdma_cm aren't closely tied to that change, we may
> be able to back that one patch out until we can get whatever other fix may be
> needed.
>

I'd like to do this approach. Then re-submit once we come to consensus...

> In my view, openmpi has a bug in that it can pass a loopback address to a remote
> peer and expect it to be used to establish a connection. Steve seems to agree
> with this.
>
> My original intent was to allow the use of the loopback address with the
> rdma_cm. I.e. 127.0.0.1 meant 'this host', and not 'software loopback'. I just
> had Arlin run a quick test with OFED 1.4 over IB, and it allows binding to
> 127.0.0.1, but never forms connections. I.e. ucmatose -b 127.0.0.1 succeeds in
> listening, but ucmatose -s 127.0.0.1 fails to connect because of a route error.
> (Hmm... I'm still confused about what openmpi is doing then.)
>

But it must fail in OFED-1.4 if binding to an iwarp interface. Maybe there was
IB-only logic allowing 127.0.0.1 binds in OFED-1.4? The reason openmpi might
still work on IB is that it's not typical to use the rdma-cm for IB setups.
It's required for iwarp though. Jeff, what's the default CPC for IB devices?

> Even if an application were to use non-loopback IP addresses, there's no
> guarantee of forming a connection if those addresses map to an iwarp device.
> So, even if the rdma_cm fails binding to 127.0.0.1 unless there's some RDMA
> device (software or hardware - not sure why we care) capable of supporting it,
> an application would need to also deal with failures from rdma_resolve_addr.
>
> Indicating loopback through a device capability flag seems like the right
> approach, and the rdma_cm can use this to fail rdma_bind_addr/rdma_resolve_addr
> calls. That's probably not a trivial patch however.
>
> - Sean
>
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6C6B23.4010704-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C6B23.4010704-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-05 19:24 ` Roland Dreier
0 siblings, 0 replies; 72+ messages in thread
From: Roland Dreier @ 2010-02-05 19:24 UTC (permalink / raw)
To: Steve Wise
Cc: Sean Hefty, Jeff Squyres (jsquyres), linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ

> > That should be the patch in question. I'm not sure about reaching consensus. :)
> > If the other changes to the rdma_cm aren't closely tied to that change, we may
> > be able to back that one patch out until we can get whatever other fix may be
> > needed.
> I'd like to do this approach. Then re-submit once we come to consensus...

That makes sense to me. Someone please send me a tested revert.
--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> Cisco.com - http://www.cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C4460.3050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 16:45 ` Steve Wise
2010-02-05 17:51 ` Roland Dreier
@ 2010-02-05 17:57 ` Jeff Squyres
2 siblings, 0 replies; 72+ messages in thread
From: Jeff Squyres @ 2010-02-05 17:57 UTC (permalink / raw)
To: Steve Wise
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

On Feb 5, 2010, at 11:16 AM, Steve Wise wrote:
> > Note that it is highly unlikely that we will release open mpi 1.4.2 in
> > time for ofed 1.5.1.
>
> Jeff, there is no way to handle high priority bug fixes in the current
> released stream?

We have 1.4.2 cooking, but it's not ready yet. I'll take it back to the OMPI
community to see if they want to do a high-priority release, but I'm not
excited about it (see below).

> > Also note that trying to bind rdma cm to all interface ip addresses
> > was the way that we were advised by openfabrics to figure out which
> > devices are rdma-capable.
> >
> > As such, it is highly desirable to get the fix transparently in rdmacm
> > and preserve the old semantic. More specifically, it seems undesirable
> > to change this semantic in a minor ofed point release.
>
> I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1
> at all because it regresses OpenMPI. Even with IB systems, if the bind
> to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that
> rdma interface and advertises this address to its peer as an address
> to-which that peer can rdma connect! This will break IB clusters too,
> not just T3/iWARP cluster. While I think OpenMPI needs to skip
> 127.0.0.1 in its logic, I think we should probably defer allowing
> 127.0.0.1 binds until ofed-1.6.

I agree that Open MPI should not advertise 127.0.0.1 to peers.

However, the logic that we were advised to use was to try to RDMA CM bind to
each IP address. If the bind succeeds, then it's an RDMA-capable device and
therefore it's advertisable. The rationale was that 127.0.0.1 (really, any
loopback address) is *not* an RDMA device and therefore the RDMA CM bind should
*never* succeed on it. Hence, it wasn't necessary to add a "is this a loopback
address?" check in the logic.

I guess I don't understand why that rationale is now incorrect -- 127.0.0.1 is
still not an RDMA-capable device, right?

> But Jeff, note that if someone uses the upstream kernel and OpenMPI, its
> busted...
>
> So I recommend:
>
> 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
>
> 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
> connect address (get it in ofed-1.5.2 or ofed-1.6).

We can add this logic (because I understand that some upstream kernels now
allow binding to loopback addresses), but I'm still confused (in principle) as
to why it should be necessary.

Can you clarify what kernel versions allow binding LOOPBACK addresses with
RDMA CM?
--
Jeff Squyres <jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> Cisco.com - http://www.cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
^ permalink raw reply [flat|nested] 72+ messages in thread
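The "is this a loopback address?" check discussed in the message above could look roughly like the following user-space helper, which a bind-probing loop might call before advertising an address to a peer. This is a minimal sketch, not Open MPI's actual code: the function name addr_is_loopback is hypothetical, and the choice to treat all of 127.0.0.0/8 (not only 127.0.0.1) as loopback is an assumption that follows Jeff's "really, any loopback address" remark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return true if sa is an IPv4 address in
 * 127.0.0.0/8 or the IPv6 loopback address ::1.
 * Addresses of any other family are reported as non-loopback.
 */
bool addr_is_loopback(const struct sockaddr *sa)
{
	assert(sa != NULL);

	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		/* Compare the top octet in host byte order. */
		return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr);
	}
	return false;
}
```

A caller would skip any interface address for which this returns true, regardless of whether the RDMA CM bind on it succeeds.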
* RE: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <58D723FE08DC6A4398E6596E38F3FA170566DA-2KNrN6/GZtCAsgjym8flbKBKnGwkPULj@public.gmane.org>
2010-02-05 16:16 ` Steve Wise
@ 2010-02-05 16:22 ` Sean Hefty
[not found] ` <0D5487526204477AA2ABED06E46768E2-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-05 16:22 UTC (permalink / raw)
To: 'Jeff Squyres (jsquyres)', swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Roland Dreier (rdreier)

>Also note that trying to bind rdma cm to all interface ip addresses was the way
>that we were advised by openfabrics to figure out which devices are rdma-
>capable.
>
>As such, it is highly desirable to get the fix transparently in rdmacm and
>preserve the old semantic. More specifically, it seems undesirable to change
>this semantic in a minor ofed point release.

I think the issue is larger than just the rdma_cm.

First, it sounds like openmpi tries to bind to 127.0.0.1, which now works. If
openmpi uses shared memory for connections on the same machine, I'm not sure why
this is a problem, unless it is passing that address to another machine to use
for a connection. If this is the case, then that is a bug in openmpi.

Second, I still don't understand whether iwarp is limited to 'loopback'
connections that are not bound to 127.0.0.1. For instance, if the RDMA device
is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
<-> 192.168.0.1? If it can't, then the rdma_cm can't help in this case when
bind is called. The failure has to come during connect, which sounds like the
behavior that's seen today with 127.0.0.1.

So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
support loopback, I'm still not sure how much of a fix this is.

- Sean
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <0D5487526204477AA2ABED06E46768E2-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <0D5487526204477AA2ABED06E46768E2-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-05 16:38 ` Steve Wise
[not found] ` <4B6C498F.3060708-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-05 16:38 UTC (permalink / raw)
To: Sean Hefty
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

Sean Hefty wrote:
>> Also note that trying to bind rdma cm to all interface ip addresses was the way
>> that we were advised by openfabrics to figure out which devices are rdma-
>> capable.
>>
>> As such, it is highly desirable to get the fix transparently in rdmacm and
>> preserve the old semantic. More specifically, it seems undesirable to change
>> this semantic in a minor ofed point release.
>>
>
> I think the issue is larger than just the rdma_cm.
>
> First, it sounds like openmpi tries to bind to 127.0.0.1, which now works. If
> openmpi uses shared memory for connections on the same machine, I'm not sure why
> this is a problem, unless it is passing that address to another machine to use
> for a connection. If this is the case, then that is a bug in openmpi.
>

Yes, OpenMPI incorrectly advertises 127.0.0.1 as a valid address to-which the
peer can connect. This needs to be fixed.

> Second, I still don't understand whether iwarp is limited to 'loopback'
> connections that are not bound to 127.0.0.1. For instance, if the RDMA device
> is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
> <-> 192.168.0.1? If it can't, then the rdma_cm can't help in this case when
> bind is called. The failure has to come during connect, which sounds like the
> behavior that's seen today with 127.0.0.1.
>

It's not iWARP specific. A device may or may not support hw loopback. Now the
IB spec mandates this support, but the iWARP spec doesn't. Ammasso and Chelsio
T3 rnics do not support HW loopback. They will fail if you try to connect to a
local address. The rdma-cm shouldn't allow binds to 127.0.0.1 for these devices
since it 100% implies that the connection will require hw loopback for that
device.

> So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
> support loopback, I'm still not sure how much of a fix this is.
>

My concern is breaking an existing working OpenMPI in a point release
because we changed semantics of the rdma-cm in an ofed point release...

BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
version?

Steve.

> - Sean
>
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6C498F.3060708-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* RE: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C498F.3060708-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-05 16:52 ` Sean Hefty
[not found] ` <F6DF49B759AD49EEB44BECD99FE26DCF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 20:09 ` Sean Hefty
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-05 16:52 UTC (permalink / raw)
To: 'Steve Wise'
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

>My concern is breaking an existing working OpenMPI in a point release
>because we changed semantics of the rdma-cm in an ofed point release...

OFED can call this release a point release, but in reality, the content makes it
a major release...

>BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>version?

apparently
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <F6DF49B759AD49EEB44BECD99FE26DCF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <F6DF49B759AD49EEB44BECD99FE26DCF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-05 17:08 ` Steve Wise
2010-02-07 21:44 ` [ewg] " Tziporet Koren
1 sibling, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-05 17:08 UTC (permalink / raw)
To: Sean Hefty
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

Sean Hefty wrote:
>> My concern is breaking an existing working OpenMPI in a point release
>> because we changed semantics of the rdma-cm in an ofed point release...
>>
>
> OFED can call this release a point release, but in reality, the content makes it
> a major release...
>
>
>> BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>> version?
>>
>
> apparently
>
>

Well, as it stands now, OpenMPI on ofed-1.5.1 is broken for IB clusters that
use the rdma-cm for connection setup, and for all iWARP clusters, which require
the rdma-cm connect method.
^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <F6DF49B759AD49EEB44BECD99FE26DCF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 17:08 ` Steve Wise
@ 2010-02-07 21:44 ` Tziporet Koren
[not found] ` <4B6F3451.2070304-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Tziporet Koren @ 2010-02-07 21:44 UTC (permalink / raw)
To: Sean Hefty
Cc: 'Steve Wise', linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Roland Dreier (rdreier), dwilder-r/Jw6+rmf7HQT0dZR+AlfA

On 2/5/2010 6:52 PM, Sean Hefty wrote:
>
>> BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>> version?
>>
> apparently
>
>

Sorry to jump into this thread late.
OFED 1.5.1 was not rebased on a new kernel - it's still based on 2.6.30.
But many times we take patches that were accepted by the kernel into OFED.
These patches were pushed to OFED by David Wilder from IBM, since
loopback support of CMA was important for them.
Therefore I am adding David to the thread too.

Steve/Jeff - how come no one tested iWARP with the new kernel patches
when Sean submitted them?

Tziporet
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6F3451.2070304-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>]
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6F3451.2070304-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
@ 2010-02-08 5:38 ` Steve Wise
0 siblings, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-08 5:38 UTC (permalink / raw)
To: tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Roland Dreier (rdreier), dwilder-r/Jw6+rmf7HQT0dZR+AlfA

Tziporet Koren wrote:
> On 2/5/2010 6:52 PM, Sean Hefty wrote:
>>
>>> BTW: Was this change an artifact of rebasing ofed-1.5.1 on a new
>>> kernel
>>> version?
>>>
>> apparently
>>
>>
> Sorry to jump into this thread late.
> OFED 1.5.1 was not rebased on a new kernel - it's still based on 2.6.30.
> But many times we take patches that were accepted by the kernel into OFED.
> These patches were pushed to OFED by David Wilder from IBM, since
> loopback support of CMA was important for them.
> Therefore I am adding David to the thread too.
>
> Steve/Jeff - how come no one tested iWARP with the new kernel patches
> when Sean submitted them?
>

I tested iwarp in 2.6.33-rc4. But not using OpenMPI.

Steve.

> Tziporet
^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6C498F.3060708-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 16:52 ` Sean Hefty
@ 2010-02-05 20:09 ` Sean Hefty
[not found] ` <38B735478FE94F40BBA3E8BFD794B10F-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-05 20:09 UTC (permalink / raw)
To: 'Steve Wise'
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

>Ammasso and Chelsio T3 rnics do not support HW loopback.

It looks like the NES driver doesn't support 127.0.0.1, but does support
loopback connections (gurgle).

Here's an untested patch for 2.6.33 (not even compile tested) for consideration
then. I'll be testing this shortly unless there's disagreement.

rdma/cm: disallow loopback address for iwarp devices

From: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The current RDMA iWarp devices cannot be used to establish
connections using the loopback address. Prevent rdma_bind_addr
from associating the loopback address with an iWarp device.

This fixes an issue with openmpi, where it tries to identify which
IP addresses map to RDMA devices by calling rdma_bind_addr on
each address and seeing if the bind succeeds. Prior to patch
6f8372b6 "RDMA/cm: fix loopback address support", this process
worked. But the rdma_cm now allows rdma_bind_addr to bind to an
RDMA device using the loopback address, and attaches the rdma_cm_id
to the RDMA device as part of the bind.

Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---

 drivers/infiniband/core/cma.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..5850411 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1739,6 +1739,9 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_route);
 
+/*
+ * Only IB devices support loopback connections.
+ */
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
@@ -1753,11 +1756,16 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 		ret = -ENODEV;
 		goto out;
 	}
-	list_for_each_entry(cma_dev, &dev_list, list)
+	list_for_each_entry(cma_dev, &dev_list, list) {
+		if (rdma_node_get_transport(cma_dev->device->node_type) !=
+		    RDMA_TRANSPORT_IB)
+			continue;
+
 		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
 			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
 			    port_attr.state == IB_PORT_ACTIVE)
 				goto port_found;
+	}
 
 	p = 1;
 	cma_dev = list_entry(dev_list.next, struct cma_device, list);
@@ -1771,9 +1779,7 @@ port_found:
 	if (ret)
 		goto out;
 
-	id_priv->id.route.addr.dev_addr.dev_type =
-		(rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB) ?
-		ARPHRD_INFINIBAND : ARPHRD_ETHER;
+	id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
 
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
^ permalink raw reply related [flat|nested] 72+ messages in thread
[parent not found: <38B735478FE94F40BBA3E8BFD794B10F-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <38B735478FE94F40BBA3E8BFD794B10F-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-06 16:31 ` Steve Wise
[not found] ` <4B6D9948.6040007-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-06 16:31 UTC (permalink / raw)
To: Sean Hefty
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Roland Dreier (rdreier)

> rdma/cm: disallow loopback address for iwarp devices
>
> From: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> The current RDMA iWarp devices cannot be used to establish
> connections using the loopback address. Prevent rdma_bind_addr
> from associating the loopback address with an iWarp device.
>
> This fixes an issue with openmpi, where it tries to identify which
> IP addresses map to RDMA devices by calling rdma_bind_addr on
> each address and seeing if the bind succeeds. Prior to patch
> 6f8372b6 "RDMA/cm: fix loopback address support", this process
> worked. But the rdma_cm now allows rdma_bind_addr to bind to an
> RDMA device using the loopback address, and attaches the rdma_cm_id
> to the RDMA device as part of the bind.
>
> Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>
> drivers/infiniband/core/cma.c | 14 ++++++++++----
> 1 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index cc9b594..5850411 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1739,6 +1739,9 @@ err:
>  }
>  EXPORT_SYMBOL(rdma_resolve_route);
>
> +/*
> + * Only IB devices support loopback connections.
> + */
>  static int cma_bind_loopback(struct rdma_id_private *id_priv)
>  {
>  	struct cma_device *cma_dev;
> @@ -1753,11 +1756,16 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
>  		ret = -ENODEV;
>  		goto out;
>  	}
> -	list_for_each_entry(cma_dev, &dev_list, list)
> +	list_for_each_entry(cma_dev, &dev_list, list) {
> +		if (rdma_node_get_transport(cma_dev->device->node_type) !=
> +		    RDMA_TRANSPORT_IB)
> +			continue;
> +
>  		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
>  			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
>  			    port_attr.state == IB_PORT_ACTIVE)
>  				goto port_found;
> +	}
>

Here you need to:
	ret = -ENODEV;
	goto out;

instead of:

>
>  	p = 1;
>  	cma_dev = list_entry(dev_list.next, struct cma_device, list);
>

Otherwise it will still bind to the first device even if it's iwarp...

With this mod, it works.

> @@ -1771,9 +1779,7 @@ port_found:
>  	if (ret)
>  		goto out;
>
> -	id_priv->id.route.addr.dev_addr.dev_type =
> -		(rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB) ?
> -		ARPHRD_INFINIBAND : ARPHRD_ETHER;
> +	id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
>
>  	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
>  	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
^ permalink raw reply [flat|nested] 72+ messages in thread
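The effect of the correction in the message above can be modelled in user space. The sketch below is not kernel code: struct mock_device, enum mock_transport, and mock_bind_loopback are hypothetical stand-ins invented for illustration, and the model takes the suggestion literally, so the selection fails with -ENODEV whenever no IB device with an active port exists instead of falling back to the first device in the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport tags standing in for the kernel's node types. */
enum mock_transport { MOCK_TRANSPORT_IB, MOCK_TRANSPORT_IWARP };

/* Hypothetical device record; not the kernel's struct cma_device. */
struct mock_device {
	enum mock_transport transport;
	int active_port;	/* 1-based number of an active port, 0 if none */
};

/*
 * Pick the first IB device that has an active port for a loopback bind.
 * On success, store the device index in *idx and return 0.
 * Return -ENODEV (leaving *idx untouched) when no device qualifies.
 */
int mock_bind_loopback(const struct mock_device *devs, size_t n, size_t *idx)
{
	assert(idx != NULL);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].transport != MOCK_TRANSPORT_IB)
			continue;	/* skip iWARP devices, as the patch does */
		if (devs[i].active_port > 0) {
			*idx = i;
			return 0;
		}
	}
	/* The correction: fail rather than fall back to the first device. */
	return -ENODEV;
}
```

On an iWARP-only node this model returns -ENODEV, which is the bind failure that the address-probing logic in openmpi depends on.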
[parent not found: <4B6D9948.6040007-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6D9948.6040007-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-06 16:45 ` Steve Wise
2010-02-07 0:12 ` Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-06 16:45 UTC (permalink / raw)
To: Sean Hefty
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

Note, even though this patch resolved the openmpi failure on my iwarp nodes,
ucmatose -b 127.0.0.1 doesn't fail. I haven't looked at the src, but something
funny must be happening.

So we still have a regression issue with ofed-1.5.1/upstream kernels and
openmpi over IB with rdmacm.

Steve.

Steve Wise wrote:
>
>> rdma/cm: disallow loopback address for iwarp devices
>>
>> From: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>
>> The current RDMA iWarp devices cannot be used to establish
>> connections using the loopback address. Prevent rdma_bind_addr
>> from associating the loopback address with an iWarp device.
>>
>> This fixes an issue with openmpi, where it tries to identify which
>> IP addresses map to RDMA devices by calling rdma_bind_addr on
>> each address and seeing if the bind succeeds. Prior to patch
>> 6f8372b6 "RDMA/cm: fix loopback address support", this process
>> worked. But the rdma_cm now allows rdma_bind_addr to bind to an
>> RDMA device using the loopback address, and attaches the rdma_cm_id
>> to the RDMA device as part of the bind.
>>
>> Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> ---
>>
>> drivers/infiniband/core/cma.c | 14 ++++++++++----
>> 1 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/cma.c
>> b/drivers/infiniband/core/cma.c
>> index cc9b594..5850411 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -1739,6 +1739,9 @@ err:
>>  }
>>  EXPORT_SYMBOL(rdma_resolve_route);
>>
>> +/*
>> + * Only IB devices support loopback connections.
>> + */
>>  static int cma_bind_loopback(struct rdma_id_private *id_priv)
>>  {
>>  	struct cma_device *cma_dev;
>> @@ -1753,11 +1756,16 @@ static int cma_bind_loopback(struct
>> rdma_id_private *id_priv)
>>  		ret = -ENODEV;
>>  		goto out;
>>  	}
>> -	list_for_each_entry(cma_dev, &dev_list, list)
>> +	list_for_each_entry(cma_dev, &dev_list, list) {
>> +		if (rdma_node_get_transport(cma_dev->device->node_type) !=
>> +		    RDMA_TRANSPORT_IB)
>> +			continue;
>> +
>>  		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
>>  			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
>>  			    port_attr.state == IB_PORT_ACTIVE)
>>  				goto port_found;
>> +	}
>>
>
> Here you need to:
> 	ret = -ENODEV;
> 	goto out;
>
> instead of:
>>
>>  	p = 1;
>>  	cma_dev = list_entry(dev_list.next, struct cma_device, list);
>>
>
> Otherwise it will still bind to the first device even if its iwarp...
>
> With this mod, it works.
>
>> @@ -1771,9 +1779,7 @@ port_found:
>>  	if (ret)
>>  		goto out;
>>
>> -	id_priv->id.route.addr.dev_addr.dev_type =
>> -		(rdma_node_get_transport(cma_dev->device->node_type) ==
>> RDMA_TRANSPORT_IB) ?
>> -		ARPHRD_INFINIBAND : ARPHRD_ETHER;
>> +	id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
>>
>>  	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
>>  	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <4B6D9948.6040007-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-06 16:45 ` Steve Wise
@ 2010-02-07 0:12 ` Sean Hefty
[not found] ` <B41CA82E76BB439B892B4874D38EA652-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-07 0:12 UTC (permalink / raw)
To: 'Steve Wise'
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

>> -	list_for_each_entry(cma_dev, &dev_list, list)
>> +	list_for_each_entry(cma_dev, &dev_list, list) {
>> +		if (rdma_node_get_transport(cma_dev->device->node_type) !=
>> +		    RDMA_TRANSPORT_IB)
>> +			continue;
>> +
>>  		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
>>  			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
>>  			    port_attr.state == IB_PORT_ACTIVE)
>>  				goto port_found;
>> +	}
>>
>
>Here you need to:
>	ret = -ENODEV;
>	goto out;

Good catch, I'll update the patch and submit for 2.6.33 on Monday.
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <B41CA82E76BB439B892B4874D38EA652-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <B41CA82E76BB439B892B4874D38EA652-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-07 1:22 ` Steve Wise
[not found] ` <4B6E15C4.9020703-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-08 6:02 ` [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices Sean Hefty
1 sibling, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-07 1:22 UTC (permalink / raw)
To: Sean Hefty
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier)

>>
>> Good catch, I'll update the patch and submit for 2.6.33 on Monday.
>>
>>

NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6E15C4.9020703-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes [not found] ` <4B6E15C4.9020703-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> @ 2010-02-07 11:56 ` Tziporet Koren [not found] ` <4B6EAA5F.1000208-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Tziporet Koren @ 2010-02-07 11:56 UTC (permalink / raw) To: Steve Wise Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier) On 2/7/2010 3:22 AM, Steve Wise wrote: > >>> >>> Good catch, I'll update the patch and submit for 2.6.33 on Monday. >>> >>> >>> > NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1. > > If this patch will be accepted to the kernel 2.6.33 we can take it too Tziporet -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6EAA5F.1000208-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>]
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes [not found] ` <4B6EAA5F.1000208-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> @ 2010-02-07 16:39 ` Steve Wise [not found] ` <4B6EECBE.6020509-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Steve Wise @ 2010-02-07 16:39 UTC (permalink / raw) To: Tziporet Koren Cc: Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier) Tziporet Koren wrote: > On 2/7/2010 3:22 AM, Steve Wise wrote: >> >>>> >>>> Good catch, I'll update the patch and submit for 2.6.33 on Monday. >>>> >>>> >>>> >> NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1. >> >> > If this patch will be accepted to the kernel 2.6.33 we can take it too > If ofed-1.5.1 is based on 2.6.33 then it will get this patch automatically (assuming it goes upstream and makes 2.6.33). Or we can pull it in as a kernel_patches/fixes/ patch. My point, though, is that even with this patch in ofed-1.5.1, we still have an openmpi/IB/rdmacm regression. The only way to avoid this regression without changing openmpi is to disallow _all_ rdma binds to 127.0.0.1. Steve. > Tziporet -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6EECBE.6020509-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes [not found] ` <4B6EECBE.6020509-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> @ 2010-02-07 16:48 ` Roland Dreier [not found] ` <ada4oltxa8j.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org> 2010-02-08 11:52 ` Tziporet Koren 1 sibling, 1 reply; 72+ messages in thread From: Roland Dreier @ 2010-02-07 16:48 UTC (permalink / raw) To: Steve Wise Cc: Tziporet Koren, Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ > My point, though, is that even with this patch in ofed-1.5.1, we still > have an openmpi/IB/rdmacm regression. The only way to avoid this > regression without changing openmpi is to disallow _all_ rdma binds to > 127.0.0.1. Can you identify the source of the regression? ie what was the change that broke things? I'm most concerned that there is another regression in 2.6.33, and if so I would like to try and avoid letting that get into the final release. -- Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> Cisco.com - http://www.cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <ada4oltxa8j.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>]
* Re: bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <ada4oltxa8j.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-02-07 17:42 ` Steve Wise
2010-02-08 5:27 ` [ewg] " Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-07 17:42 UTC (permalink / raw)
To: Roland Dreier
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sean Hefty,
ewg-G2znmakfqn7U1rindQTSdQ

Roland Dreier wrote:
> > My point, though, is that even with this patch in ofed-1.5.1, we still
> > have an openmpi/IB/rdmacm regression. The only way to avoid this
> > regression without changing openmpi is to disallow _all_ rdma binds to
> > 127.0.0.1.
>
> Can you identify the source of the regression? ie what was the change
> that broke things?
>

It is the same commit you cited earlier. It enables binding rdma cm_ids to
127.0.0.1. Sean's proposed patch on top of that disables this only for
iwarp devices.

> I'm most concerned that there is another regression in 2.6.33, and if so
> I would like to try and avoid letting that get into the final release.
>

^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [ewg] bug 1918 - openmpi broken due to rdma-cm changes
[not found] ` <ada4oltxa8j.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-07 17:42 ` Steve Wise
@ 2010-02-08 5:27 ` Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Sean Hefty @ 2010-02-08 5:27 UTC (permalink / raw)
To: 'Roland Dreier', Steve Wise
Cc: Tziporet Koren, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ

>Can you identify the source of the regression? ie what was the change
>that broke things?

My understanding is that support for loopback addresses exposes an existing bug
in openmpi. It tries to bind to 127.0.0.1, which now succeeds. Openmpi passes
that address to a remote node for use in connections.

>I'm most concerned that there is another regression in 2.6.33, and if so
>I would like to try and avoid letting that get into the final release.

Unless we never support loopback addresses, openmpi will see a regression.

The only other problem that I'm aware of for 2.6.33 is that the bind to a
loopback address will succeed, even though the RDMA device may not support
loopback. This is true for the Chelsio and Ammasso drivers. Connections should
still fail, but the bind is basically useless in this case. I will try to get a
patch for that tomorrow.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes [not found] ` <4B6EECBE.6020509-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 2010-02-07 16:48 ` Roland Dreier @ 2010-02-08 11:52 ` Tziporet Koren [not found] ` <4B6FFB07.1070701-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> 1 sibling, 1 reply; 72+ messages in thread From: Tziporet Koren @ 2010-02-08 11:52 UTC (permalink / raw) To: Steve Wise Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sean Hefty, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier) On 2/7/2010 6:39 PM, Steve Wise wrote: > > If ofed-1.5.1 is based on 2.6.33 then it will get this patch > automatically (assuming it goes upstream and makes 2.6.33). Or we can > pull it in as a kernel_patches/fixes/ patch. > OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so we need the patch under fixes. Steve - can you prepare such a patch? Tziporet -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6FFB07.1070701-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>]
* Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes [not found] ` <4B6FFB07.1070701-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> @ 2010-02-08 14:29 ` Steve Wise 0 siblings, 0 replies; 72+ messages in thread From: Steve Wise @ 2010-02-08 14:29 UTC (permalink / raw) To: tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sean Hefty, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier) Tziporet Koren wrote: > On 2/7/2010 6:39 PM, Steve Wise wrote: >> >> If ofed-1.5.1 is based on 2.6.33 then it will get this patch >> automatically (assuming it goes upstream and makes 2.6.33). Or we can >> pull it in as a kernel_patches/fixes/ patch. >> > OFED 1.5.1 is not based on 2.6.33, but on 2.6.30, so we need the patch > under fixes. > Steve - can you prepare such a patch? > > Tziporet > > The reason I thought it was based on 2.6.33, is because I see 2.6.33 git tags in the ofed kernel tree. I misinterpreted what that meant. I can develop a patch, but it will disable _all_ 127.0.0.1 binds. Otherwise openmpi is still broken on IB. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <B41CA82E76BB439B892B4874D38EA652-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-07 1:22 ` Steve Wise
@ 2010-02-08 6:02 ` Sean Hefty
[not found] ` <79BAA34231304F1E84C5A5A53C50A207-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-08 6:02 UTC (permalink / raw)
To: Hefty, Sean, 'Steve Wise'
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Roland Dreier (rdreier)

Since iWarp devices are not guaranteed to support loopback connections,
prevent rdma_bind_addr from associating the loopback address with
an iWarp device.

Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
This includes feedback from Steve Wise based on the initial rfc patch.

Although this patch is needed to prevent binding to RDMA devices that may
not support loopback addressing, it also works around a bug in openmpi using
the loopback address to bind to an iwarp device.

This is not a perfect solution either, since it disables all iwarp devices.
The NES driver should be able to support loopback connections, though that
feature has never been tested. We may need a per device attribute. I will
look at creating such a patch, but wanted to post this in case I can't get
that one done in the next couple of days.

 drivers/infiniband/core/cma.c | 18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..fe8b0c0 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1739,6 +1739,9 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_route);

+/*
+ * Only IB devices are guaranteed to support loopback connections.
+ */
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
@@ -1753,14 +1756,19 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 		ret = -ENODEV;
 		goto out;
 	}
-	list_for_each_entry(cma_dev, &dev_list, list)
+	list_for_each_entry(cma_dev, &dev_list, list) {
+		if (rdma_node_get_transport(cma_dev->device->node_type) !=
+		    RDMA_TRANSPORT_IB)
+			continue;
+
 		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
 			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
 			    port_attr.state == IB_PORT_ACTIVE)
 				goto port_found;
+	}

-	p = 1;
-	cma_dev = list_entry(dev_list.next, struct cma_device, list);
+	ret = -ENODEV;
+	goto out;

 port_found:
 	ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid);
@@ -1771,9 +1779,7 @@ port_found:
 	if (ret)
 		goto out;

-	id_priv->id.route.addr.dev_addr.dev_type =
-		(rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB) ?
-		ARPHRD_INFINIBAND : ARPHRD_ETHER;
+	id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);

^ permalink raw reply related [flat|nested] 72+ messages in thread
[parent not found: <79BAA34231304F1E84C5A5A53C50A207-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices [not found] ` <79BAA34231304F1E84C5A5A53C50A207-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2010-02-08 11:52 ` Tziporet Koren [not found] ` <4B6FFB1B.40905-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> 2010-02-09 16:32 ` [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices Sean Hefty 1 sibling, 1 reply; 72+ messages in thread From: Tziporet Koren @ 2010-02-08 11:52 UTC (permalink / raw) To: Sean Hefty Cc: 'Steve Wise', linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier) On 2/8/2010 8:02 AM, Sean Hefty wrote: > Since iWarp devices are not guaranteed to support loopback connections, > prevent rdma_bind_addr from associating the loopback address with > an iWarp device. > > Signed-off-by: Sean Hefty<sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > Steve Have you tested this patch? When accepted to kernel can you prepare a patch for OFED 1.5.1 under fixes Thanks Tziporet -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B6FFB1B.40905-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>]
* Re: [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices [not found] ` <4B6FFB1B.40905-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org> @ 2010-02-08 14:29 ` Steve Wise [not found] ` <4B701FE6.60302-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Steve Wise @ 2010-02-08 14:29 UTC (permalink / raw) To: tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Sean Hefty, ewg-G2znmakfqn7U1rindQTSdQ, Roland Dreier (rdreier) This patch doesn't solve the openmpi/IB regression. So for OFED, IMO, we need a different patch... Tziporet Koren wrote: > On 2/8/2010 8:02 AM, Sean Hefty wrote: >> Since iWarp devices are not guaranteed to support loopback connections, >> prevent rdma_bind_addr from associating the loopback address with >> an iWarp device. >> >> Signed-off-by: Sean Hefty<sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> >> > > Steve > Have you tested this patch? > When accepted to kernel can you prepare a patch for OFED 1.5.1 under > fixes > > Thanks > Tziporet ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B701FE6.60302-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <4B701FE6.60302-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-08 16:52 ` Roland Dreier
[not found] ` <adawrynwtz9.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Roland Dreier @ 2010-02-08 16:52 UTC (permalink / raw)
To: Steve Wise
Cc: tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Sean Hefty,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

> This patch doesn't solve the openmpi/IB regression. So for OFED,
> IMO, we need a different patch...

If this doesn't solve the regression then we should have a different
patch for upstream too. The goal for 2.6.33 should be to keep open mpi
working, even if that requires us to go back to old breakage.

--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <adawrynwtz9.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>]
* Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices [not found] ` <adawrynwtz9.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org> @ 2010-02-08 19:19 ` Jason Gunthorpe [not found] ` <20100208191927.GU16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-02-09 0:41 ` [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback Sean Hefty 1 sibling, 1 reply; 72+ messages in thread From: Jason Gunthorpe @ 2010-02-08 19:19 UTC (permalink / raw) To: Roland Dreier Cc: Steve Wise, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ On Mon, Feb 08, 2010 at 08:52:10AM -0800, Roland Dreier wrote: > > This patch doesn't solve the openmpi/IB regression. So for OFED, > > IMO, we need a different patch... > > If this doesn't solve the regression the we should have a different > patch for upstream too. The goal for 2.6.33 should be to keep open mpi > working, even if that requires us to go back to old breakage. Steve, I thought you said earlier in the thread that the rdmacm OMPI method is not used that often with IB - and the other IB connect methods work fine. This really is a bug in OMPI, how long do you think this new feature should remain outside the upstream kernel? Is someone going to commit to fixing OMPI soon if the patch is removed? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <20100208191927.GU16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices [not found] ` <20100208191927.GU16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-02-08 20:02 ` Steve Wise [not found] ` <4B706DED.9080403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Steve Wise @ 2010-02-08 20:02 UTC (permalink / raw) To: Jason Gunthorpe Cc: Roland Dreier, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Sean Hefty, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Jeff Squyres Jason Gunthorpe wrote: > On Mon, Feb 08, 2010 at 08:52:10AM -0800, Roland Dreier wrote: > >> > This patch doesn't solve the openmpi/IB regression. So for OFED, >> > IMO, we need a different patch... >> >> If this doesn't solve the regression the we should have a different >> patch for upstream too. The goal for 2.6.33 should be to keep open mpi >> working, even if that requires us to go back to old breakage. >> > > Steve, I thought you said earlier in the thread that the rdmacm OMPI > method is not used that often with IB - and the other IB connect > methods work fine. > > Maybe Jeff can chime in here, but he mentioned to me that Sandia Labs were using IB/rdmacm. > This really is a bug in OMPI, how long do you think this new feature > should remain outside the upstream kernel? Is someone going to commit > to fixing OMPI soon if the patch is removed? > > IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback. But I believe Jeff asked at least that we pull it from 2.6.33 and let OMPI get its next release out with the OMPI fix. Then you can push it into 2.6.34 if we really want this feature. I will commit to get the fix in openmpi asap. Steve. 
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B706DED.9080403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* RE: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <4B706DED.9080403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-08 20:33 ` Sean Hefty
[not found] ` <C8A2C57AD5FA4141860DBFF60BFDE2DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-08 20:33 UTC (permalink / raw)
To: 'Steve Wise', Jason Gunthorpe
Cc: Roland Dreier, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Jeff Squyres

>IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.

I disagree, but what does it matter? So, we add a 'software' loopback that uses
127.0.0.1. Openmpi still wouldn't work.

>I will commit to get the fix in openmpi asap.

If we don't care if the fix is in the kernel or user space, then we could add
a 'disable-loopback-support' build option to librdmacm, which can fail any
attempt to bind to a loopback address.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
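[Editorial sketch, not part of the thread.] The 'disable-loopback-support' build option is only proposed in the message above; no such option is shown to exist in librdmacm. The fragment below is one hedged way such a guard might look in user space. The macro DISABLE_LOOPBACK_SUPPORT, the wrapper bind_unless_loopback and the stand-in stub_bind are all hypothetical names, and EADDRNOTAVAIL is chosen purely for illustration. The real bind call is passed in as a function pointer so the sketch builds without librdmacm.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical build switch; the thread only proposes the idea. */
#ifndef DISABLE_LOOPBACK_SUPPORT
#define DISABLE_LOOPBACK_SUPPORT 1
#endif

/* Return nonzero if addr is an IPv4 (127.0.0.0/8) or IPv6 (::1) loopback. */
static int addr_is_loopback(const struct sockaddr *addr)
{
	if (addr->sa_family == AF_INET) {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;

		return (ntohl(sin->sin_addr.s_addr) >> 24) == 127;
	}
	if (addr->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)addr;

		return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr);
	}
	return 0;
}

/* Signature of the underlying bind; stands in for the real library call. */
typedef int (*bind_fn)(const struct sockaddr *addr);

/* Stand-in for the real bind, used only to exercise the wrapper. */
int stub_bind(const struct sockaddr *addr)
{
	(void)addr;
	return 0;
}

/*
 * Fail loopback binds when the build switch is on; otherwise forward to
 * the real bind. Returns -1 with errno set on refusal.
 */
int bind_unless_loopback(const struct sockaddr *addr, bind_fn real_bind)
{
	if (DISABLE_LOOPBACK_SUPPORT && addr_is_loopback(addr)) {
		errno = EADDRNOTAVAIL; /* illustrative choice of error code */
		return -1;
	}
	return real_bind(addr);
}
```

A guard of this shape would restore the "bind to 127.0.0.1 fails" behaviour for applications that rely on it without touching the kernel, at the cost of hiding loopback support from every user of that library build.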
[parent not found: <C8A2C57AD5FA4141860DBFF60BFDE2DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <C8A2C57AD5FA4141860DBFF60BFDE2DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-08 21:16 ` Steve Wise
[not found] ` <4B707F2D.3030508-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-08 21:16 UTC (permalink / raw)
To: Sean Hefty
Cc: Jason Gunthorpe, Roland Dreier,
tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ,
Jeff Squyres

Sean Hefty wrote:
>> IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.
>>
>
> I disagree, but what does it matter? So, we add a 'software' loopback that uses
> 127.0.0.1. Openmpi still wouldn't work.
>
>

I guess that's true.

>> I will commit to get the fix in openmpi asap.
>>
>
> If we don't care if the fix is in the kernel or user space, then we could add an
> a 'disable-loopback-support' build option to librdmacm, which can fail any
> attempt to bind to a loopback address.
>
>

I'd rather see it removed from the 2.6.33 kernel before it ships, and then
we fix openmpi, and then re-submit 127.0.0.1 support once openmpi
publishes a release with its fix. See my other email that submits a
potential commit to remove 127.0.0.1 support for 2.6.33.

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B707F2D.3030508-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <4B707F2D.3030508-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-08 21:56 ` Jeff Squyres
[not found] ` <41CC15C4-0200-4C9E-9E10-3D2A9B76D16B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Jeff Squyres @ 2010-02-08 21:56 UTC (permalink / raw)
To: Steve Wise
Cc: Sean Hefty, Jason Gunthorpe, Roland Dreier (rdreier),
Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ

Sorry -- I missed many of these mails today due to mail filtering (don't ask).

FWIW:

- I'm not opposed to adding LOOPBACK checks into OMPI to avoid this problem
  (I'm waiting for a patch, actually). I'm just saying that we're not going
  to get a release out immediately with this fix. Our next release was
  scheduled to be 1.4.2, and it is still at least several weeks away. So
  allowing this in 2.6.33 would be Bad because a) we know it breaks OMPI,
  and b) OMPI can't get a release out immediately to fix the issue.

- There are customers who are using RDMA CM with IB (e.g., Sandia with their
  Mesh/IB routing stuff).

- I see the following in rdma_bind_addr(3):

-----
DESCRIPTION
       Associates a source address with an rdma_cm_id. The address may be
       wildcarded. If binding to a specific local address, the rdma_cm_id
       will also be bound to a local RDMA device.
-----

What RDMA device is bound to when you use 127.0.0.1? I'm not 100% sure, but
I think that this might be where we got the rationale that we didn't need
additional LOOPBACK tests in OMPI... (if anyone else agrees with this
interpretation, then it's at least one argument that allowing binding to
LOOPBACK devices *is* a change in semantics, and therefore should be treated
extremely carefully)

On Feb 8, 2010, at 4:16 PM, Steve Wise wrote:

>
> Sean Hefty wrote:
> >> IMO 127.0.0.1 should be for SW loopback, not HW RDMA loopback.
> >>
> >
> > I disagree, but what does it matter? So, we add a 'software' loopback that uses
> > 127.0.0.1. Openmpi still wouldn't work.
> >
> >
>
> I guess that's true.
>
> >> I will commit to get the fix in openmpi asap.
> >>
> >
> > If we don't care if the fix is in the kernel or user space, then we could add an
> > a 'disable-loopback-support' build option to librdmacm, which can fail any
> > attempt to bind to a loopback address.
> >
> >
>
> I'd rather see it removed from 2.6.33 kernel before it shipts, and then
> we fix openmpi, and then re-submit 127.0.0.1 support once openmpi
> publishes a release with its fix. See my other email that submits a
> potential commit to remove 127.0.0.1 support for 2.6.33.
>
> Steve.
>

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <41CC15C4-0200-4C9E-9E10-3D2A9B76D16B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <41CC15C4-0200-4C9E-9E10-3D2A9B76D16B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-02-08 22:09 ` Jason Gunthorpe
[not found] ` <20100208220903.GW16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-02-08 22:13 ` Sean Hefty
1 sibling, 1 reply; 72+ messages in thread
From: Jason Gunthorpe @ 2010-02-08 22:09 UTC (permalink / raw)
To: Jeff Squyres
Cc: Steve Wise, Sean Hefty, Roland Dreier (rdreier), Tziporet Koren,
Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ

On Mon, Feb 08, 2010 at 04:56:23PM -0500, Jeff Squyres wrote:

> DESCRIPTION
>        Associates a source address with an rdma_cm_id. The address may be
>        wildcarded. If binding to a specific local address, the rdma_cm_id
>        will also be bound to a local RDMA device.

> What RDMA device is bound to when you use 127.0.0.1? I'm not 100%
> sure, but I think that this might be where we got the rationale that
> we didn't need additional LOOPBACK tests in OMPI... (if anyone else
> agrees with this interpretation, then it's at least one argument
> that allowing binding to LOOPBACK devices *is* a change in
> semantics, and therefore should be treated extremely carefully)

This statement is trying to say that if a source address is given then
the rdma_cm_id will be bound to a device. Designating which APIs bind
the device is important for the API user: once the device is bound you
can allocate resources against it.

It doesn't matter which device is picked, that is up to the kernel. For
instance if the same IP is assigned to multiple RDMA devices then
the kernel will pick one.

Jason

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <20100208220903.GW16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices [not found] ` <20100208220903.GW16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-02-08 22:11 ` Jeff Squyres 0 siblings, 0 replies; 72+ messages in thread From: Jeff Squyres @ 2010-02-08 22:11 UTC (permalink / raw) To: Jason Gunthorpe Cc: Steve Wise, Sean Hefty, Roland Dreier (rdreier), Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ On Feb 8, 2010, at 5:09 PM, Jason Gunthorpe wrote: >> DESCRIPTION >> Associates a source address with an rdma_cm_id. The address may be >> wildcarded. If binding to a specific local address, the rdma_cm_id >> will also be bound to a local RDMA device. > This statement is trying to say that if a source address is given then > the rdma_cm_id will be bound to a device. Which device is bound to if you specify 127.0.0.1 as the source address? (which is what OMPI is doing) Is it possible to assign 127.0.0.1 to an RDMA device? -- Jeff Squyres jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rdma/cm: disallow loopback address for iwarp devices [not found] ` <41CC15C4-0200-4C9E-9E10-3D2A9B76D16B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> 2010-02-08 22:09 ` Jason Gunthorpe @ 2010-02-08 22:13 ` Sean Hefty [not found] ` <7CC17592BE414EFCA00A1CB6033D047A-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> [not found] ` <44864D85-03D1-412E-906C-D6FF9 04157C8@cisco.com> 1 sibling, 2 replies; 72+ messages in thread From: Sean Hefty @ 2010-02-08 22:13 UTC (permalink / raw) To: 'Jeff Squyres', Steve Wise Cc: Jason Gunthorpe, Linux RDMA List, Roland Dreier (rdreier), ewg-G2znmakfqn7U1rindQTSdQ >What RDMA device is bound to when you use 127.0.0.1? I'm not 100% sure, but I >think that this might be where we got the rationale that we didn't need >additional LOOPBACK tests in OMPI... (if anyone else agrees with this >interpretation, then it's at least one argument that allowing binding to >LOOPBACK devices *is* a change in semantics, and therefore should be treated >extremely carefully) Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that this is now the problem? It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose. Is the problem really with rdma_bind_addr succeeding, or with rdma_connect, which now works, or rdma_bind_addr now assigning a device? - Sean ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <7CC17592BE414EFCA00A1CB6033D047A-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <7CC17592BE414EFCA00A1CB6033D047A-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-08 22:17 ` Jeff Squyres
0 siblings, 0 replies; 72+ messages in thread
From: Jeff Squyres @ 2010-02-08 22:17 UTC (permalink / raw)
To: Sean Hefty
Cc: Steve Wise, Jason Gunthorpe, Roland Dreier (rdreier),
Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ

On Feb 8, 2010, at 5:13 PM, Sean Hefty wrote:

> Are you certain that rdma_bind_addr does NOT work with 127.0.0.1, and that this
> is now the problem?
>
> It does appear to work on OFED 1.4 and on 2.6.26 based on ucmatose. Is the
> problem really with rdma_bind_addr succeeding, or with rdma_connect, which now
> works, or rdma_bind_addr now assigning a device?

On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to
bind to 127.0.0.1 per the email I sent Friday:

http://www.spinics.net/lists/linux-rdma/msg02568.html

I have not checked any other combinations; Steve was saying that he saw
rdma_bind_addr() succeeding on his machines with OFED 1.5.1rcwhatever (I
don't recall the OS he said he was using).

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <44864D85-03D1-412E-906C-D6FF904157C8-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* RE: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <44864D85-03D1-412E-906C-D6FF904157C8-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-02-08 22:26 ` Sean Hefty
[not found] ` <F533284C543140B0994C54C83C4AFF2B-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-08 22:26 UTC (permalink / raw)
To: 'Jeff Squyres'
Cc: Steve Wise, Jason Gunthorpe, Roland Dreier (rdreier),
Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ

>On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to
>bind to 127.0.0.1 per the email I sent Friday:
>
> http://www.spinics.net/lists/linux-rdma/msg02568.html

This is what I see over IB on 2.6.26, with a couple extra prints added to
cmatose:

cst-lin1:/home/mshefty/librdmacm# examples/ucmatose -b 127.0.0.1
cmatose: starting server
src addr 0x100007f
rdma_bind_addr: 0

so we're missing something else.

^ permalink raw reply [flat|nested] 72+ messages in thread
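A side note on the "src addr 0x100007f" print in the message above: the value is consistent with 127.0.0.1 held in network byte order and printed as a plain integer on a little-endian host. The extra prints added to cmatose are not shown in the thread, so that reading is an assumption. A minimal sketch in C, where raw_s_addr() is a hypothetical helper and not part of librdmacm:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a dotted-quad string and return the raw
 * s_addr value, i.e. the address exactly as it sits in a sockaddr_in
 * (network byte order). Returns 0 if the string does not parse, which
 * is indistinguishable from 0.0.0.0 in this simplified sketch. */
static uint32_t raw_s_addr(const char *dotted)
{
    struct in_addr a;

    if (inet_pton(AF_INET, dotted, &a) != 1)
        return 0;
    return a.s_addr;
}
```

Printing raw_s_addr("127.0.0.1") with a "0x%x" format gives 0x100007f on a little-endian machine, because the bytes 7f 00 00 01 are read back least significant byte first. Passing the value through ntohl() yields 0x7f000001 on any host.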
[parent not found: <F533284C543140B0994C54C83C4AFF2B-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices [not found] ` <F533284C543140B0994C54C83C4AFF2B-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2010-02-08 22:28 ` Steve Wise [not found] ` <4B709008.9020902-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Steve Wise @ 2010-02-08 22:28 UTC (permalink / raw) To: Sean Hefty Cc: 'Jeff Squyres', Jason Gunthorpe, Roland Dreier (rdreier), Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds. I don't understand the difference yet... Sean Hefty wrote: >> On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when attempting to >> bind to 127.0.0.1 per the email I sent Friday: >> >> http://www.spinics.net/lists/linux-rdma/msg02568.html >> > > This is what I see over IB on 2.6.26, with a couple extra prints added to > cmatose: > > cst-lin1:/home/mshefty/librdmacm# examples/ucmatose -b 127.0.0.1 > cmatose: starting server > src addr 0x100007f > rdma_bind_addr: 0 > > so we're missing something else. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B709008.9020902-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* RE: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <4B709008.9020902-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-08 23:48 ` Sean Hefty
[not found] ` <1966FBDAD40C4EAC8611372D2B15AE84-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-09 0:30 ` Pradeep Satyanarayana
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-08 23:48 UTC (permalink / raw)
To: 'Steve Wise'
Cc: 'Jeff Squyres', Jason Gunthorpe, Roland Dreier (rdreier),
Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ

>Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds.
>I don't understand the difference yet...

I believe the issue is that rdma_bind_addr succeeds (returns 0), but no device
is assigned to the rdma_cm_id (verbs field is NULL). That behavior was changed
by commit 6f8372b69c3198e06cecb1df2cb9682d0c55e657:

    The defined behavior of rdma_bind_addr is to associate an RDMA
    device with an rdma_cm_id, as long as the user specified a non-zero
    address. (ie they weren't just trying to reserve a port)
    Currently, if the loopback address is passed to rdma_bind_addr,
    no device is associated with the rdma_cm_id. Fix this.

There are two places where rdma_bind_addr() is called in the openmpi source code
(based on a tarball download of the trunk). One is btl_openib_iwarp.c:

	rc = rdma_bind_addr(cm_id, ipaddr);
	if (rc || !cm_id->verbs) {
		rc = OMPI_SUCCESS;
		goto out3;
	}

The other is btl_openib_connect_rdmacm.c, but that deals with listening. I
can't quickly determine if btl_openib_iwarp.c is usually used for IB or not.

So, to fully keep the behavior of 2.6.32, rdma_bind_addr for 127.0.0.1 should
succeed, but not assign a device. I think this was the change from commit
..c55e657 that changed the behavior:

@@ -2089,7 +2096,9 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
 		return -EINVAL;
 
-	if (!cma_any_addr(addr)) {
+	if (cma_loopback_addr(addr)) {
+		ret = cma_bind_loopback(id_priv);
+	} else if (!cma_zero_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
 			goto err1;

I'll see if reverting this gives the desired(?) behavior.

- Sean

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <1966FBDAD40C4EAC8611372D2B15AE84-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <1966FBDAD40C4EAC8611372D2B15AE84-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-02-09 0:28 ` Jeff Squyres
0 siblings, 0 replies; 72+ messages in thread
From: Jeff Squyres @ 2010-02-09 0:28 UTC (permalink / raw)
To: Sean Hefty
Cc: Steve Wise, Jason Gunthorpe, Roland Dreier (rdreier),
Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ

On Feb 8, 2010, at 6:48 PM, Sean Hefty wrote:

> rc = rdma_bind_addr(cm_id, ipaddr);
> if (rc || !cm_id->verbs) {
> rc = OMPI_SUCCESS;
> goto out3;
> }

Ah, yes! Per the OMPI code you cited, I amended my printf's and see:

[svbu-mpi.cisco.com:19315] FAILED to bind to 127.0.0.1: rc=0, verbs=(nil)

So the rc from rdma_bind_addr was 0, but you're right that the verbs pointer
was NULL, and we therefore rule that it was no good.

> The other is btl_openib_connect_rdmacm.c, but that deals with listening. I
> can't quickly determine if btl_openib_iwarp.c is usually used for IB or not.

It is.

> So, to fully keep the behavior of 2.6.32, rdma_bind_addr for 127.0.0.1 should
> succeed, but not assign a device. I think this was the change from commit
> ..c55e657 that changed the behavior:
>
> @@ -2089,7 +2096,9 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr
> *addr)
> if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
> return -EINVAL;
>
> - if (!cma_any_addr(addr)) {
> + if (cma_loopback_addr(addr)) {
> + ret = cma_bind_loopback(id_priv);
> + } else if (!cma_zero_addr(addr)) {
> ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
> if (ret)
> goto err1;
>
> I'll see if reverting this gives the desired(?) behavior.

Thanks!

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

^ permalink raw reply [flat|nested] 72+ messages in thread
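The rc=0, verbs=(nil) result above comes down to a three-way outcome for each probed address. The following is a minimal sketch of that decision in C, modeled on the quoted `if (rc || !cm_id->verbs)` test. The enum and function names are made up for illustration and are not Open MPI or librdmacm identifiers; the sketch takes the return code and verbs pointer as plain arguments so it does not need librdmacm to build.

```c
#include <stddef.h>

/* Illustrative outcomes of probing one local IP address. */
enum probe_result {
    PROBE_BIND_FAILED,   /* rdma_bind_addr() returned non-zero */
    PROBE_NO_DEVICE,     /* bind returned 0 but no verbs context is attached */
    PROBE_RDMA_CAPABLE   /* bind returned 0 and a device is attached */
};

/* Classify the (rc, verbs) pair observed after a bind attempt. */
static enum probe_result classify_bind(int rc, const void *verbs)
{
    if (rc != 0)
        return PROBE_BIND_FAILED;
    if (verbs == NULL)
        return PROBE_NO_DEVICE;
    return PROBE_RDMA_CAPABLE;
}

/* Same verdict as the quoted check: anything short of "bound with a
 * device" means the address is skipped. */
static int address_is_usable(int rc, const void *verbs)
{
    return classify_bind(rc, verbs) == PROBE_RDMA_CAPABLE;
}
```

With a real rdma_cm_id, the return value of rdma_bind_addr() should be stored in a variable first and cm_id->verbs read afterwards, since C does not define the evaluation order of function arguments. The case reported above (rc=0, verbs NULL) maps to PROBE_NO_DEVICE.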
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices [not found] ` <4B709008.9020902-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 2010-02-08 23:48 ` Sean Hefty @ 2010-02-09 0:30 ` Pradeep Satyanarayana [not found] ` <4B70ACB6.5070008-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 1 sibling, 1 reply; 72+ messages in thread From: Pradeep Satyanarayana @ 2010-02-09 0:30 UTC (permalink / raw) To: Steve Wise Cc: Sean Hefty, 'Jeff Squyres', Jason Gunthorpe, Roland Dreier (rdreier), Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ Steve Wise wrote: > Sean, can you try openmpi? It fails for me, and yet ucmatose succeeds. > I don't understand the difference yet... > > > Sean Hefty wrote: >>> On my OFED 1.4.1 RHEL4u6 systems, rdma_bind_addr() fails when >>> attempting to >>> bind to 127.0.0.1 per the email I sent Friday: >>> >>> http://www.spinics.net/lists/linux-rdma/msg02568.html >>> >> >> This is what I see over IB on 2.6.26, with a couple extra prints added to >> cmatose: >> >> cst-lin1:/home/mshefty/librdmacm# examples/ucmatose -b 127.0.0.1 >> cmatose: starting server >> src addr 0x100007f >> rdma_bind_addr: 0 >> >> so we're missing something else. >> Hi Steve, I am attempting to duplicate the problem that you reported with today's OFED build (on Sles11, if that matters). I have rarely used openMPI, so suggestions would be helpful. Here is what I see: elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring -------------------------------------------------------------------------- mpirun was unable to launch the specified application as it could not find an executable: Executable: ring Node: elm3b199 while attempting to start process rank 0. -------------------------------------------------------------------------- elm3b199:/usr/lib # Incidentally tvflash did not build (this is a ppc64 machine). 
Thanks Pradeep -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B70ACB6.5070008-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices [not found] ` <4B70ACB6.5070008-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2010-02-09 0:45 ` Jeff Squyres [not found] ` <FE273021-D385-45EE-9376-6479A92211AF-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Jeff Squyres @ 2010-02-09 0:45 UTC (permalink / raw) To: Pradeep Satyanarayana Cc: Steve Wise, Sean Hefty, Jason Gunthorpe, Roland Dreier (rdreier), Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote: > elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring > -------------------------------------------------------------------------- > mpirun was unable to launch the specified application as it could not find an executable: > > Executable: ring > Node: elm3b199 > > while attempting to start process rank 0. > -------------------------------------------------------------------------- > elm3b199:/usr/lib # Is there an executable named "ring" either in your $PATH or in /usr/lib? Open MPI is telling you it can't find an executable named "ring". -- Jeff Squyres jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <FE273021-D385-45EE-9376-6479A92211AF-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices [not found] ` <FE273021-D385-45EE-9376-6479A92211AF-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> @ 2010-02-09 0:50 ` Pradeep Satyanarayana [not found] ` <4B70B152.4080308-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Pradeep Satyanarayana @ 2010-02-09 0:50 UTC (permalink / raw) To: Jeff Squyres Cc: Steve Wise, Sean Hefty, Jason Gunthorpe, Roland Dreier (rdreier), Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ Jeff Squyres wrote: > On Feb 8, 2010, at 7:30 PM, Pradeep Satyanarayana wrote: > >> elm3b199:/usr/lib # /usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring >> -------------------------------------------------------------------------- >> mpirun was unable to launch the specified application as it could not find an executable: >> >> Executable: ring >> Node: elm3b199 >> >> while attempting to start process rank 0. >> -------------------------------------------------------------------------- >> elm3b199:/usr/lib # > > Is there an executable named "ring" either in your $PATH or in /usr/lib? > > Open MPI is telling you it can't find an executable named "ring". Hi Jeff, No, there is none. I got this command from one of the mails in the thread. What should I use instead? Thanks Pradeep -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B70B152.4080308-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [ewg] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <4B70B152.4080308-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-02-09 1:02 ` Jeff Squyres
0 siblings, 0 replies; 72+ messages in thread
From: Jeff Squyres @ 2010-02-09 1:02 UTC (permalink / raw)
To: Pradeep Satyanarayana
Cc: Steve Wise, Sean Hefty, Jason Gunthorpe, Roland Dreier (rdreier),
Tziporet Koren, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ,
Brad Benton
[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]

On Feb 8, 2010, at 7:50 PM, Pradeep Satyanarayana wrote:

> No, there is none. I got this command from one of the mails in the thread. What should I use instead?

You need to compile and run an MPI program. "ring" is a typical test program
that sends a message around in a ring. I think that OFED installs those test
apps somewhere, but I don't recall where offhand.

ring_c.c is attached. Compile it with:

mpicc ring_c.c -o ring

(you might need the full path to mpicc if it's not in your path?)

A better mpirun command line would be:

/usr/mpi/gcc/openmpi-1.4.1/bin/mpirun -np 2 --host HOSTNAME1,HOSTNAME2 \
    --mca btl openib,sm,self --mca btl_openib_cpc_include rdmacm ring

Put in your own HOSTNAME1 and HOSTNAME2 values. You'll also need to ensure
that both Open MPI and "ring" are available on both nodes (preferably in the
same filesystem locations on both nodes, for simplicity) and that you can ssh
from one node to the other without being prompted for a password or
passphrase.

This will run a 2-process MPI job across the two nodes, passing a message
between the two processes a few times before quitting. The various --mca
parameters on this mpirun command line ensure that you are definitely using
the OpenFabrics verbs support and forcing the use of RDMA CM.

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

[-- Attachment #2: ring_c.c --]
[-- Type: application/octet-stream, Size: 2418 bytes --]

/*
 * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2006      Cisco Systems, Inc.  All rights reserved.
 *
 * Simple ring test program
 */

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, next, prev, message, tag = 201;

    /* Start up MPI */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Calculate the rank of the next process in the ring.  Use the
       modulus operator so that the last process "wraps around" to
       rank zero. */

    next = (rank + 1) % size;
    prev = (rank + size - 1) % size;

    /* If we are the "master" process (i.e., MPI_COMM_WORLD rank 0),
       put the number of times to go around the ring in the
       message. */

    if (0 == rank) {
        message = 10;

        printf("Process 0 sending %d to %d, tag %d (%d processes in ring)\n",
               message, next, tag, size);
        MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        printf("Process 0 sent to %d\n", next);
    }

    /* Pass the message around the ring.  The exit mechanism works as
       follows: the message (a positive integer) is passed around the
       ring.  Each time it passes rank 0, it is decremented.  When
       each process receives a message containing a 0 value, it
       passes the message on to the next process and then quits.  By
       passing the 0 message first, every process gets the 0 message
       and can quit normally. */

    while (1) {
        MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        if (0 == rank) {
            --message;
            printf("Process 0 decremented value: %d\n", message);
        }

        MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        if (0 == message) {
            printf("Process %d exiting\n", rank);
            break;
        }
    }

    /* The last process does one extra send to process 0, which needs
       to be received before the program can exit */

    if (0 == rank) {
        MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    /* All done */

    MPI_Finalize();
    return 0;
}

^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <adawrynwtz9.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-08 19:19 ` Jason Gunthorpe
@ 2010-02-09 0:41 ` Sean Hefty
[not found] ` <421D3D6710E847C5B7CAC00EB73117C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Sean Hefty @ 2010-02-09 0:41 UTC (permalink / raw)
To: 'Roland Dreier', Steve Wise
Cc: tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Revert the following change from commit
6f8372b69c3198e06cecb1df2cb9682d0c55e657:

    The defined behavior of rdma_bind_addr is to associate an RDMA
    device with an rdma_cm_id, as long as the user specified a non-zero
    address. (ie they weren't just trying to reserve a port)
    Currently, if the loopback address is passed to rdma_bind_addr,
    no device is associated with the rdma_cm_id. Fix this.

It turns out that openmpi depends on rdma_bind_addr NOT associating
any RDMA device when binding to a loopback address. Openmpi is
being updated to correct this, but until a new openmpi release is
available, maintain the previous behavior: allow rdma_bind_addr to
succeed, but do not bind to a device.

Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
I believe this patch comes the closest to matching the behavior of
rdma_bind_addr from 2.6.32 and before without completely reverting
all support for loopback connections. Loopback connections will
work, but rdma_bind_addr will not automatically select an RDMA
device if given a loopback address.

Steve, can you test this with iwarp and openmpi, and ack it if it
works for you?

 drivers/infiniband/core/cma.c | 4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..875e34e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2115,9 +2115,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
-	if (cma_loopback_addr(addr)) {
-		ret = cma_bind_loopback(id_priv);
-	} else if (!cma_zero_addr(addr)) {
+	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
 			goto err1;

^ permalink raw reply related [flat|nested] 72+ messages in thread
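To make the effect of the revert easier to follow, here is a small user-space model of the branch the patch touches. It is a sketch only, not kernel code: it covers IPv4, all names are invented for illustration, and it assumes that the kernel's cma_any_addr() is true for both the zero address and a loopback address, which is consistent with the patch description but is not shown in the thread.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative labels for what rdma_bind_addr does with an address. */
enum bind_path {
    PATH_NO_DEVICE,        /* succeed without attaching an RDMA device */
    PATH_LOOPBACK_DEVICE,  /* pick an RDMA device for loopback */
    PATH_TRANSLATE_IP      /* resolve the IP to the device that owns it */
};

static int is_zero_addr(const struct sockaddr_in *a)
{
    return a->sin_addr.s_addr == htonl(INADDR_ANY);
}

static int is_loopback_addr(const struct sockaddr_in *a)
{
    return (ntohl(a->sin_addr.s_addr) >> 24) == 127;   /* 127.0.0.0/8 */
}

/* Assumed model of cma_any_addr(): zero OR loopback. */
static int is_any_addr(const struct sockaddr_in *a)
{
    return is_zero_addr(a) || is_loopback_addr(a);
}

/* Branch structure introduced by commit 6f8372b (the lines removed above). */
static enum bind_path path_with_commit(const struct sockaddr_in *a)
{
    if (is_loopback_addr(a))
        return PATH_LOOPBACK_DEVICE;
    if (!is_zero_addr(a))
        return PATH_TRANSLATE_IP;
    return PATH_NO_DEVICE;
}

/* Branch structure restored by the revert (the line added above). */
static enum bind_path path_after_revert(const struct sockaddr_in *a)
{
    if (!is_any_addr(a))
        return PATH_TRANSLATE_IP;
    return PATH_NO_DEVICE;
}

/* Build a sockaddr_in from a dotted quad; unparsable input stays 0.0.0.0. */
static struct sockaddr_in make_addr(const char *dotted)
{
    struct sockaddr_in a;

    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    inet_pton(AF_INET, dotted, &a.sin_addr);
    return a;
}
```

Under these assumptions only the loopback case differs between the two functions: 127.0.0.1 gets a device with the commit applied and none after the revert, while ordinary and zero addresses take the same path in both.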
[parent not found: <421D3D6710E847C5B7CAC00EB73117C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback [not found] ` <421D3D6710E847C5B7CAC00EB73117C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2010-02-09 15:29 ` Steve Wise [not found] ` <4B717F5D.8020403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 2010-02-10 18:10 ` [PATCH] [for-2.6.33] " Roland Dreier 1 sibling, 1 reply; 72+ messages in thread From: Steve Wise @ 2010-02-09 15:29 UTC (permalink / raw) To: Sean Hefty Cc: 'Roland Dreier', tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ, Jeff Squyres This patch works. It also backports cleanly to ofed-1.5.1/RH5.3. Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> Steve. Sean Hefty wrote: > Revert the following change from commit > 6f8372b69c3198e06cecb1df2cb9682d0c55e657: > > The defined behavior of rdma_bind_addr is to associate an RDMA > device with an rdma_cm_id, as long as the user specified a non- > zero address. (ie they weren't just trying to reserve a port) > Currently, if the loopback address is passed to rdma_bind_addr, > no device is associated with the rdma_cm_id. Fix this. > > It turns out that openmpi depends on rdma_bind_addr NOT associating > any RDMA device when binding to a loopback address. Openmpi is > being updated to correct this, but until a new openmpi release is > available, maintain the previous behavior: allow rdma_bind_addr to > succeed, but do not bind to a device. > > Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > --- > I believe this patch comes the closest to matching the behavior of > rdma_bind_addr from 2.6.32 and before without completely reverting > all support for loopback connections. Loopback connections will > work, but rdma_bind_addr will not automatically select an RDMA > device if given a loopback address. > > Steve, can you test this with iwarp and openmpi, and ack it if it > works for you? 
> > drivers/infiniband/core/cma.c | 4 +--- > 1 files changed, 1 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index cc9b594..875e34e 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -2115,9 +2115,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > if (ret) > goto err1; > > - if (cma_loopback_addr(addr)) { > - ret = cma_bind_loopback(id_priv); > - } else if (!cma_zero_addr(addr)) { > + if (!cma_any_addr(addr)) { > ret = rdma_translate_ip(addr, &id->route.addr.dev_addr); > if (ret) > goto err1; > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B717F5D.8020403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* Re: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback [not found] ` <4B717F5D.8020403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> @ 2010-02-09 16:15 ` Pradeep Satyanarayana [not found] ` <4B718A2C.2030602-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 72+ messages in thread From: Pradeep Satyanarayana @ 2010-02-09 16:15 UTC (permalink / raw) To: Steve Wise Cc: Sean Hefty, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ Steve Wise wrote: > This patch works. It also backports cleanly to ofed-1.5.1/RH5.3. > > Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> > > Steve. Steve, Was this tested against both iWARP and IB? Thanks Pradeep -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B718A2C.2030602-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <4B718A2C.2030602-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-02-09 16:18 ` Steve Wise
[not found] ` <4B718ADB.5020602-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-09 22:01 ` Jeff Squyres
1 sibling, 1 reply; 72+ messages in thread
From: Steve Wise @ 2010-02-09 16:18 UTC (permalink / raw)
To: Pradeep Satyanarayana
Cc: Sean Hefty, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Pradeep Satyanarayana wrote:
> Steve Wise wrote:
>
>> This patch works. It also backports cleanly to ofed-1.5.1/RH5.3.
>>
>> Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>
>> Steve.
>>
> Steve, Was this tested against both iWARP and IB?
>
>

No. I only tested it on T3/iWARP. I don't have an IB setup available
for testing.

Steve.

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4B718ADB.5020602-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>]
* RE: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <4B718ADB.5020602-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-02-09 16:23 ` Sean Hefty
0 siblings, 0 replies; 72+ messages in thread
From: Sean Hefty @ 2010-02-09 16:23 UTC (permalink / raw)
To: 'Steve Wise', Pradeep Satyanarayana
Cc: tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

>No. I only tested it on T3/iWARP. I don't have an IB setup available
>for testing.

Before posting the patch, I tested with IB on 2.6.33-rc6 and verified that the
result of rdma_bind_addr was as expected (returned 0, with no device
attached). I didn't see any side effects with limited testing.

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <4B718A2C.2030602-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-02-09 16:18 ` Steve Wise
@ 2010-02-09 22:01 ` Jeff Squyres
[not found] ` <4FA7F42E-308A-4A4D-82D8-87794CB8C4DE-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Jeff Squyres @ 2010-02-09 22:01 UTC (permalink / raw)
To: Pradeep Satyanarayana
Cc: Linux RDMA List, Sean Hefty, ewg-G2znmakfqn7U1rindQTSdQ

Open MPI also now checks for addresses in 127.0.0.1/8 and skips them. This
behavior will be included in the upcoming Open MPI v1.4.2 (possibly within a
few weeks?) and Open MPI v1.5.0.

Two followup questions:

1. Is this now the recommended way to find all the IP interfaces that support
RDMA:

- loop over all local IP addresses
- if 127.0.0.1/8, skip
- try to rdma_bind_addr()
- if it succeeds and verbs ptr is != NULL, it's an RDMA device

(I believe Steve Wise proposed adding an API function to just return a list of
IP addresses of RDMA devices a while back; it was rejected, which is why we
use the try-to-rdma_bind_addr() approach)

2. Before Sean backed out the localhost behavior, when you
rdma_bind_addr(127.0.0.1), what did the id->verbs pointer correspond to?

On Feb 9, 2010, at 11:15 AM, Pradeep Satyanarayana wrote:

> Steve Wise wrote:
> > This patch works. It also backports cleanly to ofed-1.5.1/RH5.3.
> >
> > Acked-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> >
> > Steve.
> Steve, Was this tested against both iWARP and IB?
>
> Thanks
> Pradeep
>

--
Jeff Squyres
jsquyres-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <4FA7F42E-308A-4A4D-82D8-87794CB8C4DE-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: [ewg] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <4FA7F42E-308A-4A4D-82D8-87794CB8C4DE-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-02-09 22:17 ` Jason Gunthorpe
2010-02-09 22:20 ` Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Jason Gunthorpe @ 2010-02-09 22:17 UTC (permalink / raw)
To: Jeff Squyres
Cc: Pradeep Satyanarayana, Steve Wise, Linux RDMA List, Sean Hefty,
ewg-G2znmakfqn7U1rindQTSdQ

On Tue, Feb 09, 2010 at 05:01:21PM -0500, Jeff Squyres wrote:

> 1. Is this now the recommended way to find all the IP interfaces that support RDMA:
>
> - loop over all local IP addresses
> - if 127.0.0.1/8, skip
> - try to rdma_bind_addr()
> - if it succeeds and verbs ptr is != NULL, it's an RDMA device

RDMA is not special, it is just like any other IP service. RDMA is
supported on loopback.

To find the list of RDMA capable IPs you do the rdma_bind_addr test.

You then have to transform that list exactly as you would for TCP to
get a list of candidate addresses that could be used for remote
connection. This means removing loopback, doing something about link
local addresses, and matching as necessary IPs to interfaces, to
networks, and to source IPs on the connecting side.

It isn't trivial, but it is exactly the same as for TCP. I suppose
ideally OMPI would use the same code for both TCP and RDMACM - and it
should have user configurables!

It is worth reviewing what the OMPI TCP does and at least checking
that the RDMACM hits all the same points.

> 2. Before Sean backed out the localhost behavior, when you
> rdma_bind_addr(127.0.0.1), what did the id->verbs pointer
> correspond to?

One of the RDMA verbs devices in the system. The API does not define
which one the kernel will select. I think the current patches simply
picked the first one.

Jason

^ permalink raw reply [flat|nested] 72+ messages in thread
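The filtering step described above (drop loopback, do something about link-local addresses) can be sketched as a small address predicate in C. This is an illustration under stated assumptions, not Open MPI code: it simply rejects link-local addresses, which is only one possible policy, it also handles IPv6 even though the message above does not mention it, and the function names are invented. Matching IPs to interfaces and networks is out of scope here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 if the address could be offered to a remote peer, 0 if it is
 * unspecified, loopback, link-local, or of an unknown family. */
static int usable_for_remote(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        uint32_t ip = ntohl(((const struct sockaddr_in *)sa)->sin_addr.s_addr);

        if (ip == 0)
            return 0;                 /* 0.0.0.0 */
        if ((ip >> 24) == 127)
            return 0;                 /* 127.0.0.0/8 loopback */
        if ((ip >> 16) == 0xA9FE)
            return 0;                 /* 169.254.0.0/16 link-local */
        return 1;
    }
    if (sa->sa_family == AF_INET6) {
        const struct in6_addr *ip6 =
            &((const struct sockaddr_in6 *)sa)->sin6_addr;

        if (IN6_IS_ADDR_UNSPECIFIED(ip6) || IN6_IS_ADDR_LOOPBACK(ip6) ||
            IN6_IS_ADDR_LINKLOCAL(ip6))
            return 0;
        return 1;
    }
    return 0;
}

/* Convenience wrapper for text input: parse as IPv4, then as IPv6.
 * Returns -1 if the text is not an IP address. */
static int text_usable_for_remote(const char *text)
{
    struct sockaddr_storage ss;
    struct sockaddr_in *v4 = (struct sockaddr_in *)&ss;
    struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)&ss;

    memset(&ss, 0, sizeof(ss));
    if (inet_pton(AF_INET, text, &v4->sin_addr) == 1) {
        v4->sin_family = AF_INET;
        return usable_for_remote((const struct sockaddr *)&ss);
    }
    memset(&ss, 0, sizeof(ss));
    if (inet_pton(AF_INET6, text, &v6->sin6_addr) == 1) {
        v6->sin6_family = AF_INET6;
        return usable_for_remote((const struct sockaddr *)&ss);
    }
    return -1;
}
```

In a full enumeration this predicate would be applied to the addresses that already passed the rdma_bind_addr test, for example while walking the list returned by getifaddrs().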
* RE: [ewg] rdma/cm: revert associating an RDMA device when binding to loopback [not found] ` <4FA7F42E-308A-4A4D-82D8-87794CB8C4DE-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> 2010-02-09 22:17 ` [ewg] " Jason Gunthorpe @ 2010-02-09 22:20 ` Sean Hefty 1 sibling, 0 replies; 72+ messages in thread From: Sean Hefty @ 2010-02-09 22:20 UTC (permalink / raw) To: 'Jeff Squyres', Pradeep Satyanarayana Cc: Steve Wise, Linux RDMA List, ewg-G2znmakfqn7U1rindQTSdQ >1. Is this now the recommended way to find all the IP interfaces that support >RDMA: > >- loop over all local IP addresses >- if 127.0.0.1/8, skip Include ipv6 in the checks. >- try to rdma_bind_addr() >- if it succeeds and verbs ptr is != NULL, it's an RDMA device The intent is that rdma_bind_addr should bind to an RDMA device as long as the address isn't a wildcard, so that resource can be allocated. If an address is passed into rdma_bind_addr and the call returns success, then verbs ptr 'should' be set. (This is the bug that was fixed, then reverted.) >(I believe Steve Wise proposed adding an API function to just return a list of >IP addresses of RDMA devices a while back; it was rejected Not sure why the idea it was rejected. I can't think of a problem with returning a list of devices/ports and the corresponding IP addresses. Although, I'd expect the implementation to match what you have above. At the very least it could reduce code duplication in librdmacm users. >2. Before Sean backed out the localhost behavior, when you >rdma_addr_bind(127.0.0.1), what did the id->verbs pointer correspond to? id->verbs referenced the first device found with an active port, or the first device if no ports were active. This behavior was never updated when iwarp was added to the stack. Note that this patch reverted the behavior of rdma_bind_addr(127.0.0.1), but other new functionality was left in. The result is that rdma_connect(127.0.0.1) will still work as long as the selected device can support loopback. 
- Sean
^ permalink raw reply [flat|nested] 72+ messages in thread
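The discovery loop discussed in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions: is_loopback(), rdma_probe_fn and count_rdma_addrs() are hypothetical names, not part of librdmacm, and the probe callback stands in for the "rdma_bind_addr() succeeds and id->verbs != NULL" check so that the address filtering, including the IPv6 loopback address ::1, can be shown on its own.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns true for IPv4 127.0.0.0/8 and for the IPv6 loopback ::1. */
static bool is_loopback(const struct sockaddr *sa)
{
    if (sa == NULL)
        return false;
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        /* The top octet of the host-order address identifies 127/8. */
        return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr);
    }
    return false;
}

/*
 * Hypothetical probe: should return true when binding an RDMA CM id to
 * the address succeeds and the id ends up with a non-NULL verbs pointer.
 * A real probe would need librdmacm; it is left to the caller here.
 */
typedef bool (*rdma_probe_fn)(const struct sockaddr *sa);

/*
 * Walks all local IP addresses, skips loopback (v4 and v6) and non-IP
 * entries, and counts the addresses the probe accepts.
 * Returns -1 if the interface list cannot be read.
 */
static int count_rdma_addrs(rdma_probe_fn probe)
{
    struct ifaddrs *list, *ifa;
    int found = 0;

    assert(probe != NULL);
    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        const struct sockaddr *sa = ifa->ifa_addr;

        if (sa == NULL)
            continue;
        if (sa->sa_family != AF_INET && sa->sa_family != AF_INET6)
            continue;
        if (is_loopback(sa))
            continue;
        if (probe(sa))
            found++;
    }
    freeifaddrs(list);
    return found;
}
```

A caller would pass a probe that creates an id, calls rdma_bind_addr() on the address, and reports whether the bind succeeded with a non-NULL id->verbs. That part is omitted because it needs librdmacm and RDMA hardware to run.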
* Re: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <421D3D6710E847C5B7CAC00EB73117C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-09 15:29 ` Steve Wise
@ 2010-02-10 18:10 ` Roland Dreier
[not found] ` <ada6365hsgm.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
1 sibling, 1 reply; 72+ messages in thread
From: Roland Dreier @ 2010-02-10 18:10 UTC (permalink / raw)
To: Sean Hefty; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

OK, I'm planning on sending this upstream later today. Looks very small
and simple, and then we can figure out what if anything we want to do
for 2.6.34.

Make sense for everyone?

- R.
--
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
^ permalink raw reply [flat|nested] 72+ messages in thread
[parent not found: <ada6365hsgm.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>]
* Re: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <ada6365hsgm.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-02-10 18:18 ` Steve Wise
2010-02-10 19:13 ` Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Steve Wise @ 2010-02-10 18:18 UTC (permalink / raw)
To: Roland Dreier
Cc: Sean Hefty, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

Roland Dreier wrote:
> OK, I'm planning on sending this upstream later today. Looks very small
> and simple, and then we can figure out what if anything we want to do
> for 2.6.34.
>
> Make sense for everyone?
>
> - R.
>

Yes.
^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback
[not found] ` <ada6365hsgm.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-10 18:18 ` Steve Wise
@ 2010-02-10 19:13 ` Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Sean Hefty @ 2010-02-10 19:13 UTC (permalink / raw)
To: 'Roland Dreier'
Cc: Steve Wise, tziporet-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, ewg-G2znmakfqn7U1rindQTSdQ

>OK, I'm planning on sending this upstream later today. Looks very small
>and simple, and then we can figure out what if anything we want to do
>for 2.6.34.
>
>Make sense for everyone?

yes - thanks
^ permalink raw reply [flat|nested] 72+ messages in thread
* RE: [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices
[not found] ` <79BAA34231304F1E84C5A5A53C50A207-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-08 11:52 ` [ewg] " Tziporet Koren
@ 2010-02-09 16:32 ` Sean Hefty
1 sibling, 0 replies; 72+ messages in thread
From: Sean Hefty @ 2010-02-09 16:32 UTC (permalink / raw)
To: Hefty, Sean, 'Steve Wise', Roland Dreier (rdreier)
Cc: 'Jeff Squyres (jsquyres)', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
ewg-G2znmakfqn7U1rindQTSdQ

This patch should be dropped. The current proposed fix is:

rdma/cm: revert associating an RDMA device when binding to loopback
^ permalink raw reply [flat|nested] 72+ messages in thread
end of thread, other threads:[~2010-02-10 19:13 UTC | newest]
Thread overview: 72+ messages
2010-02-05 11:32 bug 1918 - openmpi broken due to rdma-cm changes Jeff Squyres (jsquyres)
[not found] ` <58D723FE08DC6A4398E6596E38F3FA170566DA-2KNrN6/GZtCAsgjym8flbKBKnGwkPULj@public.gmane.org>
2010-02-05 16:16 ` Steve Wise
[not found] ` <4B6C4460.3050908-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 16:45 ` Steve Wise
2010-02-05 17:51 ` Roland Dreier
[not found] ` <ada4olvefl4.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-05 17:58 ` Jeff Squyres
[not found] ` <324EFA68-12F6-46E9-B876-7F4847B53224-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-05 18:32 ` Steve Wise
[not found] ` <4B6C6453.9090706-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 18:49 ` Roland Dreier
2010-02-05 18:56 ` Jason Gunthorpe
[not found] ` <20100205185616.GS16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-02-05 20:08 ` Jeff Squyres
[not found] ` <E8FF8BD1-80AC-4AA7-BC2A-CE7547FB9ABA-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-05 21:14 ` Jason Gunthorpe
[not found] ` <20100205211455.GT16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-02-05 21:40 ` Jeff Squyres
[not found] ` <697C6107-13A9-48E3-B451-02529305100D-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-05 21:53 ` Steve Wise
[not found] ` <4B6C9369.1070208-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 22:15 ` Sean Hefty
[not found] ` <77E29960440B4806B112A7158F4FA1C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 22:21 ` Steve Wise
2010-02-06 16:18 ` Steve Wise
2010-02-05 22:20 ` Jeff Squyres
2010-02-06 0:54 ` Roland Dreier
2010-02-05 18:42 ` Sean Hefty
[not found] ` <3762D25FD9474444A4B3E2240EFB8D0E-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 19:01 ` Steve Wise
[not found] ` <4B6C6B23.4010704-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 19:24 ` Roland Dreier
2010-02-05 17:57 ` Jeff Squyres
2010-02-05 16:22 ` Sean Hefty
[not found] ` <0D5487526204477AA2ABED06E46768E2-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 16:38 ` Steve Wise
[not found] ` <4B6C498F.3060708-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-05 16:52 ` Sean Hefty
[not found] ` <F6DF49B759AD49EEB44BECD99FE26DCF-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-05 17:08 ` Steve Wise
2010-02-07 21:44 ` [ewg] " Tziporet Koren
[not found] ` <4B6F3451.2070304-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-02-08 5:38 ` Steve Wise
2010-02-05 20:09 ` Sean Hefty
[not found] ` <38B735478FE94F40BBA3E8BFD794B10F-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-06 16:31 ` Steve Wise
[not found] ` <4B6D9948.6040007-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-06 16:45 ` Steve Wise
2010-02-07 0:12 ` Sean Hefty
[not found] ` <B41CA82E76BB439B892B4874D38EA652-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-07 1:22 ` Steve Wise
[not found] ` <4B6E15C4.9020703-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-07 11:56 ` [ewg] " Tziporet Koren
[not found] ` <4B6EAA5F.1000208-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-02-07 16:39 ` Steve Wise
[not found] ` <4B6EECBE.6020509-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-07 16:48 ` Roland Dreier
[not found] ` <ada4oltxa8j.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-07 17:42 ` Steve Wise
2010-02-08 5:27 ` [ewg] " Sean Hefty
2010-02-08 11:52 ` Tziporet Koren
[not found] ` <4B6FFB07.1070701-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-02-08 14:29 ` Steve Wise
2010-02-08 6:02 ` [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices Sean Hefty
[not found] ` <79BAA34231304F1E84C5A5A53C50A207-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-08 11:52 ` [ewg] " Tziporet Koren
[not found] ` <4B6FFB1B.40905-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
2010-02-08 14:29 ` Steve Wise
[not found] ` <4B701FE6.60302-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-08 16:52 ` [ewg] " Roland Dreier
[not found] ` <adawrynwtz9.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-08 19:19 ` Jason Gunthorpe
[not found] ` <20100208191927.GU16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-02-08 20:02 ` Steve Wise
[not found] ` <4B706DED.9080403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-08 20:33 ` Sean Hefty
[not found] ` <C8A2C57AD5FA4141860DBFF60BFDE2DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-08 21:16 ` Steve Wise
[not found] ` <4B707F2D.3030508-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-08 21:56 ` [ewg] " Jeff Squyres
[not found] ` <41CC15C4-0200-4C9E-9E10-3D2A9B76D16B-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-08 22:09 ` Jason Gunthorpe
[not found] ` <20100208220903.GW16490-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-02-08 22:11 ` Jeff Squyres
2010-02-08 22:13 ` Sean Hefty
[not found] ` <7CC17592BE414EFCA00A1CB6033D047A-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-08 22:17 ` [ewg] " Jeff Squyres
[not found] ` <44864D85-03D1-412E-906C-D6FF904157C8-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-08 22:26 ` Sean Hefty
[not found] ` <F533284C543140B0994C54C83C4AFF2B-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-08 22:28 ` Steve Wise
[not found] ` <4B709008.9020902-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-08 23:48 ` Sean Hefty
[not found] ` <1966FBDAD40C4EAC8611372D2B15AE84-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-09 0:28 ` Jeff Squyres
2010-02-09 0:30 ` Pradeep Satyanarayana
[not found] ` <4B70ACB6.5070008-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-02-09 0:45 ` Jeff Squyres
[not found] ` <FE273021-D385-45EE-9376-6479A92211AF-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-09 0:50 ` Pradeep Satyanarayana
[not found] ` <4B70B152.4080308-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-02-09 1:02 ` Jeff Squyres
2010-02-09 0:41 ` [PATCH] [for-2.6.33] rdma/cm: revert associating an RDMA device when binding to loopback Sean Hefty
[not found] ` <421D3D6710E847C5B7CAC00EB73117C4-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-02-09 15:29 ` Steve Wise
[not found] ` <4B717F5D.8020403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-09 16:15 ` Pradeep Satyanarayana
[not found] ` <4B718A2C.2030602-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-02-09 16:18 ` Steve Wise
[not found] ` <4B718ADB.5020602-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-02-09 16:23 ` Sean Hefty
2010-02-09 22:01 ` Jeff Squyres
[not found] ` <4FA7F42E-308A-4A4D-82D8-87794CB8C4DE-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-02-09 22:17 ` [ewg] " Jason Gunthorpe
2010-02-09 22:20 ` Sean Hefty
2010-02-10 18:10 ` [PATCH] [for-2.6.33] " Roland Dreier
[not found] ` <ada6365hsgm.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-02-10 18:18 ` Steve Wise
2010-02-10 19:13 ` Sean Hefty
2010-02-09 16:32 ` [PATCH] [for-2.6.33] rdma/cm: disallow loopback address for iwarp devices Sean Hefty