* [PATCH] link-local address fix for rdma_resolve_addr
@ 2009-10-19 22:47 David J. Wilder
[not found] ` <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-19 22:47 UTC (permalink / raw)
To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5
Sean, Roland
Here is the updated patch that Jason and I discussed last week.
rdma_resolve_addr() returns an error when attempting to resolve ipv6
link-local address. This patch fixes the handling of link-local address.
The patch was tested using rping run as such:
Link-local with scope:
# /usr/bin/rping -c -a <remote-system-link-local>%ib0
Link-local w/out scope: (expect failure)
# /usr/bin/rping -c -a <remote-system-link-local>
rdma_resolve_addr error -1
Own interface link local:
# /usr/bin/rping -c -a <remote-system-link-local>%ib0
Other ipv6 address:
# /usr/bin/rping -c -a 2001:db8:1234::2
(server side started with rping -s -P -v -a ::0)
Tested against ofed build OFED-1.5-20091019-0811 and kernel 2.6.30.
Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
------------------------------------------------------
drivers/infiniband/core/addr.c | 25 ++++++++++++++++++++++---
1 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..3442256 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
+ if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+ if (!src_in->sin6_scope_id)
+ return -EINVAL;
+ fl.oif = src_in->sin6_scope_id;
+ }
+ if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+ if (dst_in->sin6_scope_id) {
+ if (fl.oif && fl.oif != dst_in->sin6_scope_id)
+ return -EINVAL;
+ fl.oif = dst_in->sin6_scope_id;
+ }
+ if (!fl.oif)
+ return -EINVAL;
+ }
+
dst = ip6_route_output(&init_net, NULL, &fl);
if (!dst)
return ret;
@@ -390,14 +405,16 @@ static int addr_resolve_local(struct sockaddr *src_in,
case AF_INET6:
{
struct in6_addr *a;
+ int found = 0;
for_each_netdev(&init_net, dev)
if (ipv6_chk_addr(&init_net,
&((struct sockaddr_in6 *) dst_in)->sin6_addr,
- dev, 1))
+ dev, 1)) {
+ found = 1;
break;
-
- if (!dev)
+ }
+ if (!found)
return -EADDRNOTAVAIL;
a = &((struct sockaddr_in6 *) src_in)->sin6_addr;
@@ -406,6 +423,8 @@ static int addr_resolve_local(struct sockaddr *src_in,
src_in->sa_family = dst_in->sa_family;
((struct sockaddr_in6 *) src_in)->sin6_addr =
((struct sockaddr_in6 *) dst_in)->sin6_addr;
+ ((struct sockaddr_in6 *) src_in)->sin6_scope_id =
+ ((struct sockaddr_in6 *) dst_in)->sin6_scope_id;
ret = rdma_copy_addr(addr, dev, dev->dev_addr);
} else if (ipv6_addr_loopback(a)) {
ret = rdma_translate_ip(dst_in, addr);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 44+ messages in thread[parent not found: <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-19 23:43 ` Jason Gunthorpe [not found] ` <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-19 23:43 UTC (permalink / raw) To: David J. Wilder Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Mon, Oct 19, 2009 at 03:47:09PM -0700, David J. Wilder wrote: > +++ b/drivers/infiniband/core/addr.c > @@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in, > fl.nl_u.ip6_u.daddr = dst_in->sin6_addr; > fl.nl_u.ip6_u.saddr = src_in->sin6_addr; > > + if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { > + if (!src_in->sin6_scope_id) > + return -EINVAL; > + fl.oif = src_in->sin6_scope_id; > + } Seeing it all together like this make it clear this test needs to move up the call chain and test the sockaddr passed from userspace, not the one created by addr_resolve_local. Probably somewhere along the rdma_resolve_addr -> cma_bind_addr -> rmda_bind_addr -> rdma_translate_ip path. Maybe rdma_translate_ip should use and check the scope as a temporary hack? BTW, while researching the above comment, I'm not certain your last patch is at all correct: commit 85f20b39fd44310a163a9b33708fea57f08a4e40 RDMA/addr: Fix resolution of local IPv6 addresses This patch allows a local IPv6 address to be resolved by rdma_cm. To reproduce the problem: $ rping -s -v -a ::0 & $ rping -c -v -a <IPv6 address local to this system> --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in, for_each_netdev(&init_net, dev) if (ipv6_chk_addr(&init_net, - &((struct sockaddr_in6 *) addr)->sin6_addr, + &((struct sockaddr_in6 *) dst_in)->sin6_addr, dev, 1)) break; I can believe it fixes the case you describe (ie loopback) but matching the *dest* IP against the local interface's IP list cannot possibly be right. The primary problem is that for_each_netdev/ipv6_chk_addr is NOT the same as ip_dev_find. ip_dev_find is a routing lookup, ipv6_chk_addr compares the local address list. Not at all the same. I don't see a route lookup helper for ipv6, so you have to code full flowi lookup. With your change I expect ipv6 is 100% broken now for non loop cases? Regards, Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-19 23:47 ` Sean Hefty [not found] ` <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-19 23:47 UTC (permalink / raw) To: 'Jason Gunthorpe', David J. Wilder Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in, > > for_each_netdev(&init_net, dev) > if (ipv6_chk_addr(&init_net, >- &((struct sockaddr_in6 *) addr)- >>sin6_addr, >+ &((struct sockaddr_in6 *) dst_in)- >>sin6_addr, > dev, 1)) > break; > >I can believe it fixes the case you describe (ie loopback) but >matching the *dest* IP against the local interface's IP list cannot >possibly be right. The intent is to see if the destination address is local. A source address may not be given. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-20 0:33 ` Jason Gunthorpe [not found] ` <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-20 0:33 UTC (permalink / raw) To: Sean Hefty Cc: David J. Wilder, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Mon, Oct 19, 2009 at 04:47:59PM -0700, Sean Hefty wrote: > >@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in, > > > > for_each_netdev(&init_net, dev) > > if (ipv6_chk_addr(&init_net, > >- &((struct sockaddr_in6 *) addr)- > >>sin6_addr, > >+ &((struct sockaddr_in6 *) dst_in)- > >>sin6_addr, > > dev, 1)) > > break; > > > >I can believe it fixes the case you describe (ie loopback) but > >matching the *dest* IP against the local interface's IP list cannot > >possibly be right. > > The intent is to see if the destination address is local. A source > address may not be given. Well, that makes more sense, but it still pretty strange to match the IP list like that, the proper thing is to query RT6_TABLE_LOCAL, like the IPv4 case does. Anyhow, couldn't the whole addr_resolve_local routine be replaced with something like this in addr_resolve_remote: if (rt->idev == init_net->loopback_dev) rdma_translate_ip(rt->rt_src, dev_addr, NULL); for IPv4 and similar for IPv6? That does query the proper RT_TABLEs to determine if the IP is local and then we get the searching and ip_dev_find only for the case where the address is definitely looped back. Much closer to how the IP stack works normally. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-21 22:30 ` David J. Wilder [not found] ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-21 22:30 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma, Sean Hefty, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Mon, 2009-10-19 at 18:33 -0600, Jason Gunthorpe wrote: > Well, that makes more sense, but it still pretty strange to match the > IP list like that, the proper thing is to query RT6_TABLE_LOCAL, like > the IPv4 case does. > > Anyhow, couldn't the whole addr_resolve_local routine be replaced with > something like this in addr_resolve_remote: > if (rt->idev == init_net->loopback_dev) > rdma_translate_ip(rt->rt_src, dev_addr, NULL); > > for IPv4 and similar for IPv6? > > That does query the proper RT_TABLEs to determine if the IP is local > and then we get the searching and ip_dev_find only for the case where > the address is definitely looped back. Much closer to how the IP stack > works normally. > > Jason Jason, Sean I have separated the patches into three to make them simpler to discuss. One patch fixes the handling of link-local address in addr6_resolve_remote(). The second patch fixes the bugs in addr_resolve_local(). The 3d patch I am posting now for discussion. This patch, as Jason's suggested, moves the function of addr_resolve_local() into addr4_resolve_remote() and addr6_resolve_remote(). It eliminates the need for addr_resolve_local(). drivers/infiniband/core/addr.c | 99 +++++++-------------------------------- 1 files changed, 18 insertions(+), 81 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index bd07803..f7a5861 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -233,6 +233,14 @@ static int addr4_resolve_remote(struct sockaddr_in *src_in, if (ret) goto out; + if (rt->idev->dev == init_net.loopback_dev){ + ret = rdma_translate_ip((struct sockaddr *)dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, addr->src_dev_addr, + MAX_ADDR_LEN); + goto put; + } + /* If the device does ARP internally, return 'done' */ if (rt->idev->dev->flags & IFF_NOARP) { rdma_copy_addr(addr, rt->idev->dev, NULL); @@ -282,6 +290,14 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in, if (!dst) return ret; + if (dst->dev == init_net.loopback_dev) { + ret=rdma_translate_ip((struct sockaddr *)dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, addr->src_dev_addr, + MAX_ADDR_LEN); + goto release; + } + if (dst->dev->flags & IFF_NOARP) { ret = rdma_copy_addr(addr, dst->dev, NULL); } else { @@ -289,7 +305,7 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in, if (neigh && (neigh->nud_state & NUD_VALID)) ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); } - +release: dst_release(dst); return ret; } @@ -352,82 +368,6 @@ static void process_req(struct work_struct *work) } } -static int addr_resolve_local(struct sockaddr *src_in, - struct sockaddr *dst_in, - struct rdma_dev_addr *addr) -{ - struct net_device *dev; - int ret; - - switch (dst_in->sa_family) { - case AF_INET: - { - __be32 src_ip = ((struct sockaddr_in *) src_in)->sin_addr.s_addr; - __be32 dst_ip = ((struct sockaddr_in *) dst_in)->sin_addr.s_addr; - - dev = ip_dev_find(&init_net, dst_ip); - if (!dev) - return -EADDRNOTAVAIL; - - if (ipv4_is_zeronet(src_ip)) { - src_in->sa_family = dst_in->sa_family; - ((struct sockaddr_in *) src_in)->sin_addr.s_addr = dst_ip; - ret = rdma_copy_addr(addr, dev, dev->dev_addr); - } else if (ipv4_is_loopback(src_ip)) { - ret = rdma_translate_ip(dst_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); - } else { - ret = rdma_translate_ip(src_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); - } - dev_put(dev); - break; - } - -#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) - case AF_INET6: - { - struct in6_addr *a; - - for_each_netdev(&init_net, dev) - if (ipv6_chk_addr(&init_net, - &((struct sockaddr_in6 *) dst_in)->sin6_addr, - dev, 1)) - break; - - if (!dev) - return -EADDRNOTAVAIL; - - a = &((struct sockaddr_in6 *) src_in)->sin6_addr; - - if (ipv6_addr_any(a)) { - src_in->sa_family = dst_in->sa_family; - ((struct sockaddr_in6 *) src_in)->sin6_addr = - ((struct sockaddr_in6 *) dst_in)->sin6_addr; - ret = rdma_copy_addr(addr, dev, dev->dev_addr); - } else if (ipv6_addr_loopback(a)) { - ret = rdma_translate_ip(dst_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); - } else { - ret = rdma_translate_ip(src_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); - } - break; - } -#endif - - default: - ret = -EADDRNOTAVAIL; - break; - } - - return ret; -} - int rdma_resolve_ip(struct rdma_addr_client *client, struct sockaddr *src_addr, struct sockaddr *dst_addr, struct rdma_dev_addr *addr, int timeout_ms, @@ -455,10 +395,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client, src_in = (struct sockaddr *) &req->src_addr; dst_in = (struct sockaddr *) &req->dst_addr; - req->status = addr_resolve_local(src_in, dst_in, addr); - if (req->status == -EADDRNOTAVAIL) - req->status = addr_resolve_remote(src_in, dst_in, addr); - + req->status = addr_resolve_remote(src_in, dst_in, addr); switch (req->status) { case 0: req->timeout = jiffies; ^ permalink raw reply related [flat|nested] 44+ messages in thread
[parent not found: <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-21 23:08 ` Jason Gunthorpe [not found] ` <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2009-10-21 23:17 ` [PATCH] link-local address fix for rdma_resolve_addr Sean Hefty 1 sibling, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-21 23:08 UTC (permalink / raw) To: David J. Wilder Cc: linux-rdma, Sean Hefty, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 21, 2009 at 03:30:29PM -0700, David J. Wilder wrote: > addr6_resolve_remote(). The second patch fixes the bugs in > addr_resolve_local(). The 3d patch I am posting now for discussion. > > This patch, as Jason's suggested, moves the function of > addr_resolve_local() into addr4_resolve_remote() and > addr6_resolve_remote(). It eliminates the need for > addr_resolve_local(). This looks exactly like what I was thinking of - have you tested this? If it is OK, I'd make it the first in the series. There were two things I was not sure about in my example. 1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or is it something like dev_net(rt->idev->dev)->loopback_dev ? I'm sensing it may be the latter, but can't investigate right now Donno much about this new namespace stuff really 2) Was rt->idev->dev the right choice for the ipv4 case? Or is it rt->u.dst.dev ? The TCP case kinda looks like int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) tmp = ip_route_connect(&rt, nexthop, inet->saddr, RT_CONN_FLAGS(sk), sk->sk_bound_dev_if, IPPROTO_TCP, inet->sport, usin->sin_port, sk, 1); sk_setup_caps(sk, &rt->u.dst); void sk_setup_caps(struct sock *sk, struct dst_entry *dst) __sk_dst_set(sk, dst); And all later things key off the sk_get_dst. So I'm thinking that u.dst.dev might be correct. I have no idea what the difference is though (can't look too hard right now) The main other fixup I see is to remove ret = cma_bind_addr(id, src_addr, dst_addr); >From rdma_resolve_addr and rely on the routing lookup in addr_resolve_remote called by addr_resolve_ip to setup the bind device from the routing lookup. (This is what I mentioned in my last email) Which then lets you fixup the checking and handling of the sin6_scopeid on the source address - and fixes the main other routing difference against the TCP stack. Thanks for working on this! Jason ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-22 21:12 ` David J. Wilder [not found] ` <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-22 21:12 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sean Hefty, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, 2009-10-21 at 17:08 -0600, Jason Gunthorpe wrote: > This looks exactly like what I was thinking of - have you tested this? Yes I did do some testing, but that brings up a good question. I am not sure I know what all should be tested? I am running rping with different destination address (and scoping). On the ipv4 side: rping -c -a <ip-of-my-ib0-interface> rping -c -a <ip-of-remote-nodes-ib0-interface> For ipv6 I ran what I described previously. What I do need to do is add the option to rping to specify a source address and run it with various address. Any help you can give defining what exactly needs to be tested would be appreciated. > > If it is OK, I'd make it the first in the series. > > There were two things I was not sure about in my example. > 1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or > is it something like dev_net(rt->idev->dev)->loopback_dev ? > > I'm sensing it may be the latter, but can't investigate right now > Donno much about this new namespace stuff really I think you may be correct I will look at that closer. I did explicitly verify the test worked in both cases. > > 2) Was rt->idev->dev the right choice for the ipv4 case? Or is it > rt->u.dst.dev ? > > The TCP case kinda looks like > int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) > tmp = ip_route_connect(&rt, nexthop, inet->saddr, > RT_CONN_FLAGS(sk), sk->sk_bound_dev_if, > IPPROTO_TCP, > inet->sport, usin->sin_port, sk, 1); > sk_setup_caps(sk, &rt->u.dst); > > void sk_setup_caps(struct sock *sk, struct dst_entry *dst) > __sk_dst_set(sk, dst); > > And all later things key off the sk_get_dst. So I'm thinking > that u.dst.dev might be correct. > > I have no idea what the difference is though (can't look too hard > right now) > > The main other fixup I see is to remove > ret = cma_bind_addr(id, src_addr, dst_addr); > > From rdma_resolve_addr and rely on the routing lookup in > addr_resolve_remote called by addr_resolve_ip to setup the bind device > from the routing lookup. (This is what I mentioned in my last email) > > Which then lets you fixup the checking and handling of the > sin6_scopeid on the source address - and fixes the main other routing > difference against the TCP stack. > > Thanks for working on this! > > Jason Lots of discussion :) I will go through the mails, address the comments and post the entire series of patches. Thanks for all your input. Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-22 22:02 ` Sean Hefty [not found] ` <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-22 22:02 UTC (permalink / raw) To: 'David J. Wilder', Jason Gunthorpe Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >For ipv6 I ran what I described previously. What I do need to do is add >the option to rping to specify a source address and run it with various >address. Any help you can give defining what exactly needs to be tested >would be appreciated. You can also test with ucmatose to verify ipv4 still works. Use the -b option to bind to a specific address. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-27 18:22 ` David J. Wilder [not found] ` <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-27 18:22 UTC (permalink / raw) To: Sean Hefty Cc: Jason Gunthorpe, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Thu, 2009-10-22 at 15:02 -0700, Sean Hefty wrote: > >For ipv6 I ran what I described previously. What I do need to do is add > >the option to rping to specify a source address and run it with various > >address. Any help you can give defining what exactly needs to be tested > >would be appreciated. > > You can also test with ucmatose to verify ipv4 still works. Use the -b option > to bind to a specific address. > > - Sean > Sean This patch adds ipv6 support to ucmatose. Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ------------------------------------------------------ examples/cmatose.c | 31 +++++++++++++++++++------------ 1 files changed, 19 insertions(+), 12 deletions(-) diff --git a/examples/cmatose.c b/examples/cmatose.c index 8c12347..481a6d0 100644 --- a/examples/cmatose.c +++ b/examples/cmatose.c @@ -516,15 +516,15 @@ static int get_addr(char *dst, struct sockaddr_in *addr) return ret; } - if (res->ai_family != PF_INET) { + if (res->ai_family == PF_INET) + memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in)); + else if (res->ai_family == PF_INET6) + memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in6)); + else ret = -1; - goto out; - } - *addr = *(struct sockaddr_in *) res->ai_addr; -out: - freeaddrinfo(res); - return ret; + freeaddrinfo(res); + return ret; } static int run_server(void) @@ -543,11 +543,18 @@ static int run_server(void) ret = get_addr(src_addr, &test.src_in); if (ret) goto out; - } else + if(test.src_in.sin_family == AF_INET) + ((struct sockaddr_in *) &test.src_in)->sin_port = port; + else + ((struct sockaddr_in6 *) &test.src_in)->sin6_port=port; + + } else { test.src_in.sin_family = PF_INET; + test.src_in.sin_port = port; + } + + ret = rdma_bind_addr(listen_id, (struct sockaddr *)&test.src_in); - test.src_in.sin_port = port; - ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { perror("cmatose: bind address failed"); goto out; @@ -628,8 +635,8 @@ static int run_client(void) printf("cmatose: connecting\n"); for (i = 0; i < connections; i++) { ret = rdma_resolve_addr(test.nodes[i].cma_id, - src_addr ? test.src_addr : NULL, - test.dst_addr, 2000); + src_addr ? (struct sockaddr *)&test.src_in : NULL, + (struct sockaddr *)&test.dst_in, 2000); if (ret) { perror("cmatose: failure getting addr"); connect_error(); -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 44+ messages in thread
[parent not found: <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* re: [PATCH] librdmacm/cmatose: add support for ipv6 [not found] ` <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-27 18:58 ` Sean Hefty [not found] ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-27 18:58 UTC (permalink / raw) To: 'David J. Wilder' Cc: Jason Gunthorpe, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >This patch adds ipv6 support to ucmatose. > >Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Thanks. I pulled this patch into my local git tree with just a couple of minor cleanups. What other patches, if any, did you use to test with it? - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* RE: [PATCH] librdmacm/cmatose: add support for ipv6 [not found] ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-27 19:42 ` Sean Hefty 2009-10-27 21:50 ` David J. Wilder 1 sibling, 0 replies; 44+ messages in thread From: Sean Hefty @ 2009-10-27 19:42 UTC (permalink / raw) To: Hefty, Sean, 'David J. Wilder' Cc: Jason Gunthorpe, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 From: David J. Wilder <[dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org]> Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> --- We need to update struct cmatest to allow storing an IPv6 address, or we can overrun the buffer. Running with this patch, the client causes a kernel bug on 2.6.31 in addr_resolve_local, so we'll still fixes for that. It also meant that I couldn't test this patch. examples/cmatose.c | 53 ++++++++++++++++++++++++++++------------------------ 1 files changed, 29 insertions(+), 24 deletions(-) diff --git a/examples/cmatose.c b/examples/cmatose.c index 8c12347..84831ec 100644 --- a/examples/cmatose.c +++ b/examples/cmatose.c @@ -75,10 +75,7 @@ struct cmatest { int connects_left; int disconnects_left; - struct sockaddr_in dst_in; - struct sockaddr *dst_addr; - struct sockaddr_in src_in; - struct sockaddr *src_addr; + struct rdma_addr addr; }; static struct cmatest test; @@ -505,7 +502,7 @@ static int migrate_channel(struct rdma_cm_id *listen_id) return ret; } -static int get_addr(char *dst, struct sockaddr_in *addr) +static int get_addr(char *dst, struct sockaddr *addr) { struct addrinfo *res; int ret; @@ -516,15 +513,15 @@ static int get_addr(char *dst, struct sockaddr_in *addr) return ret; } - if (res->ai_family != PF_INET) { + if (res->ai_family == PF_INET) + memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in)); + else if (res->ai_family == PF_INET6) + memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in6)); + else ret = -1; - goto out; - } - *addr = *(struct sockaddr_in *) res->ai_addr; -out: - freeaddrinfo(res); - return ret; + freeaddrinfo(res); + return ret; } static int run_server(void) @@ -540,14 +537,21 @@ static int run_server(void) } if (src_addr) { - ret = get_addr(src_addr, &test.src_in); + ret = get_addr(src_addr, &test.addr.src_addr); if (ret) goto out; - } else - test.src_in.sin_family = PF_INET; + if (test.addr.src_addr.sa_family == AF_INET) + ((struct sockaddr_in *) &test.addr.src_addr)->sin_port = port; + else + ((struct sockaddr_in6 *) &test.addr.src_addr)->sin6_port = port; + + } else { + test.addr.src_addr.sa_family = PF_INET; + ((struct sockaddr_in *) &test.addr.src_addr)->sin_port = port; + } + + ret = rdma_bind_addr(listen_id, &test.addr.src_addr); - test.src_in.sin_port = port; - ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { perror("cmatose: bind address failed"); goto out; @@ -614,22 +618,25 @@ static int run_client(void) printf("cmatose: starting client\n"); if (src_addr) { - ret = get_addr(src_addr, &test.src_in); + ret = get_addr(src_addr, &test.addr.src_addr); if (ret) return ret; } - ret = get_addr(dst_addr, &test.dst_in); + ret = get_addr(dst_addr, &test.addr.dst_addr); if (ret) return ret; - test.dst_in.sin_port = port; + if (test.addr.dst_addr.sa_family == AF_INET) + ((struct sockaddr_in *) &test.addr.dst_addr)->sin_port = port; + else + ((struct sockaddr_in6 *) &test.addr.dst_addr)->sin6_port = port; printf("cmatose: connecting\n"); for (i = 0; i < connections; i++) { ret = rdma_resolve_addr(test.nodes[i].cma_id, - src_addr ? test.src_addr : NULL, - test.dst_addr, 2000); + src_addr ? &test.addr.src_addr : NULL, + &test.addr.dst_addr, 2000); if (ret) { perror("cmatose: failure getting addr"); connect_error(); @@ -717,8 +724,6 @@ int main(int argc, char **argv) } } - test.dst_addr = (struct sockaddr *) &test.dst_in; - test.src_addr = (struct sockaddr *) &test.src_in; test.connects_left = connections; test.disconnects_left = connections; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 44+ messages in thread
* re: [PATCH] librdmacm/cmatose: add support for ipv6 [not found] ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 2009-10-27 19:42 ` Sean Hefty @ 2009-10-27 21:50 ` David J. Wilder 1 sibling, 0 replies; 44+ messages in thread From: David J. Wilder @ 2009-10-27 21:50 UTC (permalink / raw) To: Sean Hefty Cc: Jason Gunthorpe, linux-rdma, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Tue, 2009-10-27 at 11:58 -0700, Sean Hefty wrote: > >This patch adds ipv6 support to ucmatose. > > > >Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> > > Thanks. > > I pulled this patch into my local git tree with just a couple of minor cleanups. > What other patches, if any, did you use to test with it? > > - Sean > I have been testing ucmatose against today's ofed 1.5 build, with the patch I posted earlier with fixes to resolve_local() and ipv6_resolve_remote(). This patch is still missing the binding changes Jason suggested it just fixes the existing code. I did not see the panic you saw on 2.6.32, but I have run on that kernel yet. Dave.. Patch (posted earlier) rdma_resolve_addr() returns an error when attempting to resolve ipv6 link-local address. This patch fixes the handling of link-local address. Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ------------------------------------------------------ drivers/infiniband/core/addr.c | 25 ++++++++++++++++++++++--- 1 files changed, 22 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index bd07803..3442256 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in, fl.nl_u.ip6_u.daddr = dst_in->sin6_addr; fl.nl_u.ip6_u.saddr = src_in->sin6_addr; + if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { + if (!src_in->sin6_scope_id) + return -EINVAL; + fl.oif = src_in->sin6_scope_id; + } + if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { + if (dst_in->sin6_scope_id) { + if (fl.oif && fl.oif != dst_in->sin6_scope_id) + return -EINVAL; + fl.oif = dst_in->sin6_scope_id; + } + if (!fl.oif) + return -EINVAL; + } + dst = ip6_route_output(&init_net, NULL, &fl); if (!dst) return ret; @@ -390,14 +405,16 @@ static int addr_resolve_local(struct sockaddr *src_in, case AF_INET6: { struct in6_addr *a; + int found = 0; for_each_netdev(&init_net, dev) if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) dst_in)->sin6_addr, - dev, 1)) + dev, 1)) { + found = 1; break; - - if (!dev) + } + if (!found) return -EADDRNOTAVAIL; a = &((struct sockaddr_in6 *) src_in)->sin6_addr; @@ -406,6 +423,8 @@ static int addr_resolve_local(struct sockaddr *src_in, src_in->sa_family = dst_in->sa_family; ((struct sockaddr_in6 *) src_in)->sin6_addr = ((struct sockaddr_in6 *) dst_in)->sin6_addr; + ((struct sockaddr_in6 *) src_in)->sin6_scope_id = + ((struct sockaddr_in6 *) dst_in)->sin6_scope_id; ret = rdma_copy_addr(addr, dev, dev->dev_addr); } else if (ipv6_addr_loopback(a)) { ret = rdma_translate_ip(dst_in, addr); ^ permalink raw reply related [flat|nested] 44+ messages in thread
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 2009-10-21 23:08 ` Jason Gunthorpe @ 2009-10-21 23:17 ` Sean Hefty [not found] ` <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 1 sibling, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-21 23:17 UTC (permalink / raw) To: 'David J. Wilder', Jason Gunthorpe Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >This patch, as Jason's suggested, moves the function of addr_resolve_local() >into addr4_resolve_remote() >and addr6_resolve_remote(). It eliminates the need for addr_resolve_local(). One quick comment, remove '_remote' from function names: addr4_resolve_remote, addr6_resolve_remote, and addr_resolve_remote > drivers/infiniband/core/addr.c | 99 +++++++-------------------------------- > 1 files changed, 18 insertions(+), 81 deletions(-) > >diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c >index bd07803..f7a5861 100644 >--- a/drivers/infiniband/core/addr.c >+++ b/drivers/infiniband/core/addr.c >@@ -233,6 +233,14 @@ static int addr4_resolve_remote(struct sockaddr_in >*src_in, > if (ret) > goto out; > >+ if (rt->idev->dev == init_net.loopback_dev){ >+ ret = rdma_translate_ip((struct sockaddr *)dst_in, addr); >+ if (!ret) >+ memcpy(addr->dst_dev_addr, addr->src_dev_addr, >+ MAX_ADDR_LEN); >+ goto put; >+ } This doesn't end up doing the same thing as what resolve_local did. It only matches up with the 'else if' portion below: >-static int addr_resolve_local(struct sockaddr *src_in, >- struct sockaddr *dst_in, >- struct rdma_dev_addr *addr) >-{ >- struct net_device *dev; >- int ret; >- >- switch (dst_in->sa_family) { >- case AF_INET: >- { >- __be32 src_ip = ((struct sockaddr_in *) src_in)->sin_addr.s_addr; >- __be32 dst_ip = ((struct sockaddr_in *) dst_in)->sin_addr.s_addr; >- >- dev = ip_dev_find(&init_net, dst_ip); >- if (!dev) >- return -EADDRNOTAVAIL; >- >- if (ipv4_is_zeronet(src_ip)) { >- src_in->sa_family = dst_in->sa_family; >- ((struct sockaddr_in *) src_in)->sin_addr.s_addr = dst_ip; >- ret = rdma_copy_addr(addr, dev, dev->dev_addr); >- } else if (ipv4_is_loopback(src_ip)) { >- ret = rdma_translate_ip(dst_in, addr); >- if (!ret) >- memcpy(addr->dst_dev_addr, dev->dev_addr, >MAX_ADDR_LEN); >- } else { >- ret = rdma_translate_ip(src_in, addr); >- if (!ret) >- memcpy(addr->dst_dev_addr, dev->dev_addr, >MAX_ADDR_LEN); >- } We need to handle the case where the source address is not given and provide one. We also need to handle the case where the source address is given, but may not match the destination address, or even use the same RDMA device. I didn't look at the ipv6 changes yet. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-21 23:36 ` Jason Gunthorpe [not found] ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-21 23:36 UTC (permalink / raw) To: Sean Hefty Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 21, 2009 at 04:17:01PM -0700, Sean Hefty wrote: > This doesn't end up doing the same thing as what resolve_local did. It only > matches up with the 'else if' portion below: It does all the behaviors, it passes the input into a routing lookup - though it is true the source needs to be passed into the routing lookup, which I think David had in another patch. Dave; I guess you need to include that here now. So, first case: > >- if (ipv4_is_zeronet(src_ip)) { Looks like: $ ip route get 10.0.0.11 local 10.0.0.11 dev lo src 10.0.0.11 cache <local> mtu 16436 advmss 16396 hoplimit 64 trips through the if == loopback and does 'rdma_translate_ip(10.0.0.11)' Same as the old code (little different path, but seems to give the same result) Second case: > >- } else if (ipv4_is_loopback(src_ip)) { Looks like: $ ip route get 10.0.0.11 from 127.0.0.1 local 10.0.0.11 from 127.0.0.1 dev lo cache <local> mtu 16436 advmss 16396 hoplimit 64 so trips through the if == loopback and does rdma_translate_ip(10.0.0.11) Same as the old code Third case: > >- } else { $ ip route get 10.0.0.11 from 10.0.0.11 local 10.0.0.11 from 10.0.0.11 dev lo cache <local> mtu 16436 advmss 16396 hoplimit 64 Again, does rdma_translate_ip(10.0.0.11) And the weird case is different: $ ip route get 10.0.0.11 from 192.168.122.1 local 10.0.0.11 from 192.168.122.1 dev lo cache <local> mtu 16436 advmss 16396 hoplimit 64 (192.168.122.1 is bound to a different device on my system than 10.0.0.11) The new case trips the if == loopback and does rdma_translate_ip(10.0.0.11) The old case does rdma_translate_ip(192.168.122.1) I don't think this is a significant difference, both behaviors are reasonable choices and the code/complexity savings are worth it, IMHO. [Alternatively, I suppose one could call rdma_translate_ip(rt->rt_src) and if that fails then do rdma_translate_ip(dst_in) but why bother?] Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-21 23:46 ` Jason Gunthorpe 2009-10-21 23:58 ` Sean Hefty 1 sibling, 0 replies; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-21 23:46 UTC (permalink / raw) To: Sean Hefty Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 21, 2009 at 05:36:39PM -0600, Jason Gunthorpe wrote: > > >- if (ipv4_is_zeronet(src_ip)) { > > Looks like: > $ ip route get 10.0.0.11 > local 10.0.0.11 dev lo src 10.0.0.11 > cache <local> mtu 16436 advmss 16396 hoplimit 64 > > trips through the if == loopback and does > 'rdma_translate_ip(10.0.0.11)' > > Same as the old code (little different path, but seems to give the > same result) Oops, there is a little woopsie here: Dave: + if (rt->idev->dev == init_net.loopback_dev){ + ret = rdma_translate_ip((struct sockaddr *)dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, addr->src_dev_addr, + MAX_ADDR_LEN); + goto put; + } + The 'goto put' will skip over the source address assignment step. Maybe move this: if (!src_ip) { src_in->sin_family = dst_in->sin_family; src_in->sin_addr.s_addr = rt->rt_src; } Up above your if (rt->idev) And similarly for v6. Also add a src_in = rt->rt_src assignment for v6. I'd also remove the test for 0 address, just do it unconditionally (IIRC routing table always returns src if src is not 0) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2009-10-21 23:46 ` Jason Gunthorpe @ 2009-10-21 23:58 ` Sean Hefty [not found] ` <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 1 sibling, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-21 23:58 UTC (permalink / raw) To: 'Jason Gunthorpe' Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >$ ip route get 10.0.0.11 from 192.168.122.1 >local 10.0.0.11 from 192.168.122.1 dev lo > cache <local> mtu 16436 advmss 16396 hoplimit 64 > >(192.168.122.1 is bound to a different device on my system than >10.0.0.11) > >The new case trips the if == loopback and does >rdma_translate_ip(10.0.0.11) > >The old case does rdma_translate_ip(192.168.122.1) > >I don't think this is a significant difference, both behaviors are >reasonable choices and the code/complexity savings are worth it, IMHO. >[Alternatively, I suppose one could call rdma_translate_ip(rt->rt_src) >and if that fails then do rdma_translate_ip(dst_in) but why bother?] I agree this usage case is weird, but I'm still not sure about the patch. If the destination service is listening on 10.0.0.11, then the listen is bound to a different gid than the source gid that we're trying to connect from. The src_dev_addr and dst_dev_addr need to reflect this, so that a PR gives us a path that routes the CM REQ correctly. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-22 0:28 ` Jason Gunthorpe [not found] ` <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-22 0:28 UTC (permalink / raw) To: Sean Hefty Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 21, 2009 at 04:58:02PM -0700, Sean Hefty wrote: > >$ ip route get 10.0.0.11 from 192.168.122.1 > >local 10.0.0.11 from 192.168.122.1 dev lo > > cache <local> mtu 16436 advmss 16396 hoplimit 64 > > > >(192.168.122.1 is bound to a different device on my system than > >10.0.0.11) > > > >The new case trips the if == loopback and does > >rdma_translate_ip(10.0.0.11) > > > >The old case does rdma_translate_ip(192.168.122.1) > > > >I don't think this is a significant difference, both behaviors are > >reasonable choices and the code/complexity savings are worth it, IMHO. > >[Alternatively, I suppose one could call rdma_translate_ip(rt->rt_src) > >and if that fails then do rdma_translate_ip(dst_in) but why bother?] > > I agree this usage case is weird, but I'm still not sure about the > patch. If the destination service is listening on 10.0.0.11, then > the listen is bound to a different gid than the source gid that > we're trying to connect from. The src_dev_addr and dst_dev_addr > need to reflect this, so that a PR gives us a path that routes the > CM REQ correctly. You've lost me.. The local loopback case uses PRs? Even so, it still seems OK to me: Path: addr4_resolve_remote $ ip route get 10.0.0.11 from 192.168.122.1 local 10.0.0.11 from 192.168.122.1 dev lo srcIP = 192.168.122.1 rdma_translate_ip(dst_ip = 10.0.0.11) rdma_copy_addr("eth0"); src_dev_addr = eth0.dev_addr (ie GID of 10.0.0.11) memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11) So everthing is bound to the GID of 10.0.0.11 which matches the listen of 10.0.0.11, which seems OK. The CM REP is sent to GID of 10.0.0.11 from GID of 10.0.0.11, with an private data that indicates a source IP of 192.168.122.1 and a dest IP of 10.0.0.11. This seems optimal to me? By what I understand you said I would think the prior version would be the one that is in trouble, if I am listening on 10.0.0.11 and you make a connection sourced from 192.168.122.1, which is looped back, it seems reasonable to me that this will succeed and the outgoing connection would necessarily have be bound to the destination address since the listening process is attached there. Am I missing something? Remember, there is still one more missing part there, the cma_bind I was talking about has to go away. We want to bind the source device based on the routing lookup, with loopback binding to the destination device. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-22 0:40 ` Sean Hefty [not found] ` <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-22 0:40 UTC (permalink / raw) To: 'Jason Gunthorpe' Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >The local loopback case uses PRs? Yes - the rdma cm makes no distinction when resolve route is called. It does a PR query. >Even so, it still seems OK to me: > >Path: > addr4_resolve_remote > $ ip route get 10.0.0.11 from 192.168.122.1 > local 10.0.0.11 from 192.168.122.1 dev lo > srcIP = 192.168.122.1 > rdma_translate_ip(dst_ip = 10.0.0.11) > rdma_copy_addr("eth0"); > src_dev_addr = eth0.dev_addr (ie GID of 10.0.0.11) > memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11) > >So everthing is bound to the GID of 10.0.0.11 which matches the listen >of 10.0.0.11, which seems OK. The source could have called rdma_bind_addr(192.168.122.1) prior to calling rdma_resolve_addr(). (DAPL does this.) This would have returned a different RDMA device than binding to 10.0.0.11. The client app could have allocated resources on that device, but the CM REQ will carry the gid/lid of the other device. The endpoints won't be able to communicate. Yes, it's weird, and may not be optimal, but if a source address is explicitly given, then its mapping to a specific RDMA device should be honored. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-22 1:19 ` Jason Gunthorpe [not found] ` <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-22 1:19 UTC (permalink / raw) To: Sean Hefty Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 21, 2009 at 05:40:30PM -0700, Sean Hefty wrote: > >Even so, it still seems OK to me: > > > >Path: > > addr4_resolve_remote > > $ ip route get 10.0.0.11 from 192.168.122.1 > > local 10.0.0.11 from 192.168.122.1 dev lo > > srcIP = 192.168.122.1 > > rdma_translate_ip(dst_ip = 10.0.0.11) > > rdma_copy_addr("eth0"); > > src_dev_addr = eth0.dev_addr (ie GID of 10.0.0.11) > > memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11) > > > >So everthing is bound to the GID of 10.0.0.11 which matches the listen > >of 10.0.0.11, which seems OK. > > The source could have called rdma_bind_addr(192.168.122.1) prior to calling > rdma_resolve_addr(). (DAPL does this.) This would have returned a different > RDMA device than binding to 10.0.0.11. The client app could have allocated > resources on that device, but the CM REQ will carry the gid/lid of the other > device. The endpoints won't be able to communicate. That is very difficult to fit into the semantics the IP routing model uses :( And it looks like an API problem in DAPL :( So, I see now, you are proposing that in this case the connection attempt to be routed through the network and not looped back.. I actually have a big problem with that, ignoring a 'lo' entry in a routing table is very much not IP like and not a good idea. That should be respected.. I guess I'd much rather see that one situation return EHOSTUNREACH or something. But, I suppose you are going to tell me that Intel MPI uses DAPL to loopback connect to other processes on the same node, and relies on this? :( :( :( Sigh. Anyhow, lets not get side tracked. It seems to me, the easy way out for David's approach is to simply check if the device is already bound via rdma_bind() and if so force it to that device no matter what the routing table lookup returns. Can you suggest a reliable way to make that check? [What happens now if I do this: rdma_bind(10.0.0.11) rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11) Does the cma_bind path check that it is already bound and give out an error? too late for me to check] Once the cma_bind for rdma_resolve_addr is moved into the addr_resolve_remote function then people using the API without calling bind on the client path will get sane IP-like behavior. > Yes, it's weird, and may not be optimal, but if a source address is > explicitly given, then its mapping to a specific RDMA device should > be honored. Remember, on Linux the IP is *not* attached to a device, it is part of the host itself. So the idea that a source address somehow specifies a RDMA device does not fit into the Linux IP networking model. Unfortunately the definition of rdma_bind kinda bakes this mismatched model into the API :( Truth be told, to fit the Linux IP model, the RDMA CM should have provided exactly only two ways to bind a cm_id to a specific device - rdma_accept and rdma_resolve_addr. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-22 5:24 ` Sean Hefty [not found] ` <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-22 5:24 UTC (permalink / raw) To: 'Jason Gunthorpe' Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 >That is very difficult to fit into the semantics the IP routing >model uses :( And it looks like an API problem in DAPL :( This depends on if your view is that the rdma cm is trying to match IP routing, trying to use IP addresses as convenient names for RDMA ports, or something in between that may lean more one way or the other depending on the device type. IMO, the primary reason for using IP addressing over IB is more for convenience, than compliance. >So, I see now, you are proposing that in this case the connection >attempt to be routed through the network and not looped back.. I >actually have a big problem with that, ignoring a 'lo' entry in a >routing table is very much not IP like and not a good idea. That >should be respected.. I hesitate comparing RDMA versus IP too closely. >Sigh. Anyhow, lets not get side tracked. It seems to me, the easy way >out for David's approach is to simply check if the device is already >bound via rdma_bind() and if so force it to that device no matter what >the routing table lookup returns. Can you suggest a reliable way to >make that check? I'm not entirely sure of the best way to test this. Checking the src_dev_addr is my first thought. >[What happens now if I do this: > rdma_bind(10.0.0.11) > rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11) >Does the cma_bind path check that it is already bound and give out an >error? too late for me to check] rdma_resolve_addr only calls bind if the rdma id is not already bound. The src_addr simply gets ignored in this case, and the bound address is used instead. >Once the cma_bind for rdma_resolve_addr is moved into the >addr_resolve_remote function then people using the API without calling >bind on the client path will get sane IP-like behavior. I like the approach, I'm just concerned about the case where the app has requested a specific source address, which today means that a specific RDMA device should be used. >Remember, on Linux the IP is *not* attached to a device, it is part of >the host itself. So the idea that a source address somehow specifies a >RDMA device does not fit into the Linux IP networking model. I would say we're extending the linux networking model to incorporate this concept, while staying in our imposed little RDMA sand box world. Even iWarp has these issues. >Truth be told, to fit the Linux IP model, the RDMA CM should have >provided exactly only two ways to bind a cm_id to a specific device - >rdma_accept and rdma_resolve_addr. I think this is more restrictive than things need to be. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-22 6:25 ` Jason Gunthorpe [not found] ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-22 6:25 UTC (permalink / raw) To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma On Wed, Oct 21, 2009 at 10:24:29PM -0700, Sean Hefty wrote: > >That is very difficult to fit into the semantics the IP routing > >model uses :( And it looks like an API problem in DAPL :( > > This depends on if your view is that the rdma cm is trying to match IP routing, > trying to use IP addresses as convenient names for RDMA ports, or something in > between that may lean more one way or the other depending on the device type. > IMO, the primary reason for using IP addressing over IB is more for convenience, > than compliance. I more or less agree for IB, but iWarp really should match exactly, and TBH I'm disappointed that no iWarp folks ever reviewed this stuff for consistency with the IP stack, it goes to show that netdev wasn't far off their criticisms I guess :( > >[What happens now if I do this: > > rdma_bind(10.0.0.11) > > rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11) > >Does the cma_bind path check that it is already bound and give out an > >error? too late for me to check] > > rdma_resolve_addr only calls bind if the rdma id is not already bound. The > src_addr simply gets ignored in this case, and the bound address is used > instead. This should get updated to return -EINVAL for this case I think, since it is nonsense. Either src = null or src == bind addr > >Truth be told, to fit the Linux IP model, the RDMA CM should have > >provided exactly only two ways to bind a cm_id to a specific device - > >rdma_accept and rdma_resolve_addr. > > I think this is more restrictive than things need to be. You get somethings and you loose somethings.. The Linux IP model is not choosen arbitarily, there are solid reasons why it works like that. Mainly for RDMA what you get is more kernel flexability, IP like capabilities, better bonding, and better support of IB APM semantics: - bind() + listen() actually works properly if more than one interface is bound to the same IP - the cm_id returned by accept is bound to the hca and port that accepted the connection [ This is a L3 form of bonding Linux supports ] This is actually something of a mandatory notion to implement the full generality of the IB CM protocol which allows the CM REP to contain a port GUID of another port on the same node (multi-port APM is an IB feature). So you never know what port the accept() result will get bound to. BTW: I suppose ideally AF_IB would have a way to say 'CM accept REPs on any port on this node' Hmm, reserved GID prefix perhaps? Hmm. When used with bonding this would also afford the kernel with the ability to accept incoming connections across all the redundant RDMA devices - and still have correct bound-to-IP semantics. - rdma_resolve_addr more or less as the inverse of all the above * multiple interfaces with same IP case works, kernel and routing table can distribute outgoing connections * multi-port APM works, kernel and user space can choose primary and backup port for the IP addy * bonding works, kernel can balance outgoing connections across the bond slaves. These are all useful features. Could we have had the above and still had pretty much enough flexability? Yes, I think so. The main stumbling block, IMHO, seems to have been the tying of PDs and a few other non QP verbs objects to a ibv_context, it should have been the other way around - a PD, comp channel, etc are fist class kernel objects that exist across all ports and devices. When a QP is connected to them then the kernel would setup the HW to match the PD, MRs, etc. CQs and SRQs would still have to be created after an accpt/route_addr, but that seems pretty mild <shrug> But that is of course not how things worked out, it would be possible to get there from here, but I doubt there is any interest.. It is late, I'm just musing what could have been :) Still, it must be a total nightmare to write a daemon that listens on INADDR_ANY, works correctly with multiple devices plus hotplug, and dosn't leak PDs or something silly - and that is a shame. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-22 6:49 ` Sean Hefty [not found] ` <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 2009-10-25 11:52 ` Or Gerlitz 1 sibling, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-22 6:49 UTC (permalink / raw) To: 'Jason Gunthorpe'; +Cc: 'David J. Wilder', linux-rdma > This is actually something of a mandatory notion to implement the > full generality of the IB CM protocol which allows the CM REP to > contain a port GUID of another port on the same node (multi-port > APM is an IB feature). So you never know what port the accept() > result will get bound to. You can redirect (REJ) a REQ to a different port, but you can't accept (REP) a connection on a different port. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-22 16:47 ` Jason Gunthorpe [not found] ` <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-22 16:47 UTC (permalink / raw) To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma On Wed, Oct 21, 2009 at 11:49:52PM -0700, Sean Hefty wrote: > > This is actually something of a mandatory notion to implement the > > full generality of the IB CM protocol which allows the CM REP to > > contain a port GUID of another port on the same node (multi-port > > APM is an IB feature). So you never know what port the accept() > > result will get bound to. > > You can redirect (REJ) a REQ to a different port, but you can't accept (REP) a > connection on a different port. Do you have a compliance statment for that? :) Spec says: The initiator is responsible for proposing the Port Addresses (Primary and optional Alternate) that the target (REQ recipient) is to use for the channel. Based on the path defined by those port addresses, the initiator provides timeout information and the Service Level to be used by the target for any messages that it initiates. The SL from initiator to target need not be the same as from target to initiator, but the SL that is contained within the REQ message is the one that the initiator would prefer the target use. Path in- formation is available from Subnet Administration (see section 15.2.5.16 PathRecord). 'Proposing the Port Addresses' == Any Port On the Node, IMHO. Someone went to alot of trouble to include enough unnecessary information to make that possible, and it is necessary for multi port APM. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-22 17:59 ` Sean Hefty [not found] ` <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-22 17:59 UTC (permalink / raw) To: 'Jason Gunthorpe'; +Cc: 'David J. Wilder', linux-rdma >> You can redirect (REJ) a REQ to a different port, but you can't accept (REP) >a >> connection on a different port. > >Do you have a compliance statment for that? :) The REP doesn't carry port (i.e. GID) information. >Spec says: > > The initiator is responsible for proposing the Port Addresses > (Primary and optional Alternate) that the target (REQ recipient) is to > use for the channel. Based on the path defined by those port > addresses, the initiator provides timeout information and the Service > Level to be used by the target for any messages that it initiates. The > SL from initiator to target need not be the same as from target to > initiator, but the SL that is contained within the REQ message is the > one that the initiator would prefer the target use. Path in- formation > is available from Subnet Administration (see section 15.2.5.16 > PathRecord). > >'Proposing the Port Addresses' == Any Port On the Node, IMHO. Under reject reasons: 12 Primary Remote Port GID rejected The recipient of the REQ message could not (or would not) accept the Primary Remote Port GID GID of acceptable port. 13 Primary Remote Port LID rejected The recipient of the REQ message could not (or would not) accept the Primary Remote Port LID LID of acceptable port. I interpret this as the REQ carries the proposal. If the passive side aggress with the proposal, it accepts the connection, otherwise it rejects. - Sean -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-22 18:22 ` Jason Gunthorpe [not found] ` <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-22 18:22 UTC (permalink / raw) To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma On Thu, Oct 22, 2009 at 10:59:37AM -0700, Sean Hefty wrote: > >> You can redirect (REJ) a REQ to a different port, but you can't accept (REP) > >a > >> connection on a different port. > > > >Do you have a compliance statment for that? :) > > The REP doesn't carry port (i.e. GID) information. Why would it need to? The REQ contains all the port information necessary, the active side doesn't get to pick the port, it just gets to approve it. > Under reject reasons: > > 12 Primary Remote Port GID rejected > The recipient of the REQ message could not (or would not) accept the Primary > Remote Port GID > GID of acceptable port. > > 13 Primary Remote Port LID rejected > The recipient of the REQ message could not (or would not) accept the Primary > Remote Port LID > LID of acceptable port. > > I interpret this as the REQ carries the proposal. If the passive side aggress > with the proposal, it accepts the connection, otherwise it rejects. I agree. But the proposal contains a complete description of the data carrying port (GUID, GID, LID) and there is *nothing* in the spec that says that description must match the port the GMP is received on. Don't get confused by GMP redirection, that is yet again something else :) So, IBA allows the active side to propose a data connection with primary path on Port 1, alternate path on Port 2 and send the GMP on Port 2. The passive CM then looks at this and decides if it is within local policy constraints, if not it returns 12 or 13. Thus, you cannot know what port the data QP will be bound to until you process the REQ - just like IP. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-22 18:31 ` Sean Hefty [not found] ` <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Sean Hefty @ 2009-10-22 18:31 UTC (permalink / raw) To: 'Jason Gunthorpe'; +Cc: 'David J. Wilder', linux-rdma >Why would it need to? The REQ contains all the port information >necessary, the active side doesn't get to pick the port, it just gets >to approve it. I'm not following you at all now. You said: This is actually something of a mandatory notion to implement the full generality of the IB CM protocol which allows the CM REP to contain a port GUID of another port on the same node (multi-port APM is an IB feature). So you never know what port the accept() result will get bound to. The REP does not contain a port GUID, so what exactly were you trying to say here? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org> @ 2009-10-22 18:40 ` Jason Gunthorpe 0 siblings, 0 replies; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-22 18:40 UTC (permalink / raw) To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma On Thu, Oct 22, 2009 at 11:31:07AM -0700, Sean Hefty wrote: > The REP does not contain a port GUID, so what exactly were you trying to say > here? Sorry, typo, it was too late. I ment REQ. The point was, that you don't know what port the QP will be bound to until you get the REQ (ie the RDMA CM user does rdma_accept()). So early binding limits capabilities. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2009-10-22 6:49 ` Sean Hefty @ 2009-10-25 11:52 ` Or Gerlitz [not found] ` <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org> 1 sibling, 1 reply; 44+ messages in thread From: Or Gerlitz @ 2009-10-25 11:52 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: Sean Hefty, 'David J. Wilder', linux-rdma Jason Gunthorpe wrote: > Mainly for RDMA what you get is more kernel flexability, IP like > capabilities, better bonding, and better support of IB APM semantics: > - bind() + listen() actually works properly if more than one > interface is bound to the same IP - the cm_id returned by accept is > bound to the hca and port that accepted the connection > [ This is a L3 form of bonding Linux supports ] > > This is actually something of a mandatory notion to implement the > full generality of the IB CM protocol which allows the CM REP to > contain a port GUID of another port on the same node (multi-port > APM is an IB feature). So you never know what port the accept() > result will get bound to. > > BTW: I suppose ideally AF_IB would have a way to say 'CM accept REPs on > any port on this node' Hmm, reserved GID prefix perhaps? Hmm. > > When used with bonding this would also afford the kernel with the > ability to accept incoming connections across all the redundant > RDMA devices - and still have correct bound-to-IP semantics. > > - rdma_resolve_addr more or less as the inverse of all the above > * multiple interfaces with same IP case works, kernel and routing > table can distribute outgoing connections > * multi-port APM works, kernel and user space can choose primary > and backup port for the IP addy > * bonding works, kernel can balance outgoing connections across the > bond slaves. > > These are all useful features. Jason, Have you even looked into or tested any of the bonding load-balancing modes with ipoib? some/most of them are not applicable to IPoIB and I don't think that the ones which may be such were ever tested. Next, multiple interfaces with the same ip address isn't something I see very useful for production environment (but I'd be happy to get educated what L3 bonding is and where it can play), next, multi-port APM isn't something I ever heard to be required by customers and more important, from comments made by Sean in the past, I don't think it fits the rdma-cm spirit. All in all, someone comes here and suggests some fixes to the rdma-cm address resolution code to have IPv6 work. I don't think Dave should carry on his back/patch all your proposed future enhancements. Let him fix things and following that you can work on the patches to support all these nice/nitch features starting with IPv4 and then IPv6. Or -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org> @ 2009-10-25 20:03 ` Jason Gunthorpe [not found] ` <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-25 20:03 UTC (permalink / raw) To: Or Gerlitz; +Cc: Sean Hefty, 'David J. Wilder', linux-rdma On Sun, Oct 25, 2009 at 01:52:13PM +0200, Or Gerlitz wrote: > Jason, Have you even looked into or tested any of the bonding > load-balancing modes with ipoib? some/most of them are not applicable to > IPoIB and I don't think that the ones which may be such were ever > tested. I was saying that point in the rdmacm where the rdma_cm_id is bound to a local RDMA device should have only been rdma_resolve_addr and rdma_accept. Overloading rdma_bind_addr to both bind to an IP and bind to an RDMA device was a bad API choice. Sean is right, there may be special cases that require an early binding, but a seperate API - like IP's SO_BINDTODEVICE - would has been better - and users are forewarned that calling it restricts the environments their app will support :( As it stands we have several impossible situations. Sean, Dave, and I were disucssing the trades offs of what this means relative to IP route resolution - but it affects bonding too. If you rdma_bind_addr to the IP of a bonding device, the stack must pick one of the local RDMA ports immediately. If you then call rdma_listen there is a problem: incoming connections may target either RDMA device, but you are only bound to one of them. An app cannot say 'I want to listen on this IP, any RDMA device' with the current API, as you can in IP, and that is a shame. > Next, multiple interfaces with the same ip address isn't > something I see very useful for production environment (but I'd be happy > to get educated what L3 bonding is and where it can play), next, Traditionally with ethernet the L2 bonding is really only used for link aggregation, L1 failure, and a simple multi-switch HA scheme. It is not deployed if you have multiple ethernet domains. Some people prefer to have dual, independent ethernet fabrics, and in that case you rely on routing features to get the multipath, and HA features of bonding. Go back on the list and look up the posts from Leo who first discovered this, what he was trying to do is kinda the L3 bonding approach. > and more important, from comments made by Sean in the past, I don't > think it fits the rdma-cm spirit. I think multi-port APM can be fit into the API we have, but yes, there are some unfortunate design choices in verbs that make it a little awkward. > All in all, someone comes here and suggests some fixes to the rdma-cm > address resolution code to have IPv6 work. I don't think Dave should > carry on his back/patch all your proposed future enhancements. Let him > fix things and following that you can work on the patches to support all > these nice/nitch features starting with IPv4 and then IPv6. David has been doing a good job and I am glad he is working on the IPv6 support. My comments are only intended to clarify how this is all supposed to work and why the IP flow is actually still relevant to RDMA connections. This (now quite long) discussion about RDMA device binding I really think has clarified what everyone expects from the API, and I've certainly found Sean's comments about early device binding quite interesting. Unfortunately to make sin6_scope_id, and the loopback cases work properly does kinda require confronting these issues. The IPv6 notion of scope is new and it conflicts with the RDMA CM notion of 'scope' created by the local RDMA device binding. This is very different from how IP works. Reconciling the two ideas in a sane way is a big source of trouble. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-27 13:01 ` Or Gerlitz [not found] ` <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Or Gerlitz @ 2009-10-27 13:01 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: Sean Hefty, 'David J. Wilder', linux-rdma Jason Gunthorpe wrote: > I was saying that point in the rdmacm where the rdma_cm_id is bound to a local RDMA device should have only been rdma_resolve_addr and rdma_accept. Overloading rdma_bind_addr to both bind to an IP and bind to an RDMA device was a bad API choice. As you wrote, for the most case, binding comes into play only for users calling rdma_resolve_address or rdma_accept, for users the need explicit binding the rdma-cm provides rdma_bind and it binds to both IP and Device, if you can do better, send a patch, binding can't be removed from the API since it has users and it makes sense from users to require it. > Sean is right, there may be special cases that require an early binding, but a seperate API - like IP's SO_BINDTODEVICE - would has been better - and users are forewarned that calling it restricts the environments their app will support its just naming, will $ sed s/rdma_bind/rdma_set_opt(RDMA_BINDTODEVICE)/g make you happier? why? > As it stands we have several impossible situations. Sean, Dave, and I were disucssing the trades offs of what this means relative to IP route resolution Don't tell me that Dave's patches are blocked b/c you discovered the rdma_bind design and now you don't like it, as I wrote you, Dave sent patch to fix the IPv6 support, during the discussion on his patches you come and bring up more and more issues you consider as problems (but I don't) and block the patch set, I don't think this is appropriate. Let the patches go and send your patches to fix the problems you see. Why anyone touching some code piece has to fix problems you see in that piece?! > - but it affects bonding too. If you rdma_bind_addr to the IP of a bonding device, the stack must pick one of the local RDMA ports immediately. If you then call rdma_listen there is a problem: incoming connections may target either RDMA device, but you are only bound to one of them. An app cannot say 'I want to listen on this IP, any RDMA device' with the current API, as you can in IP, and that is a shame An app can say, I want to listen on that IP and the RDMA device which is associated with this IP now. When bonding does fail-over it generated a netevent, the rdma-cm catches this event and generates address change event, apps can redo their bind/listen at this point. For the time being, we never got a user report on a problem, people are doing listen on all IPs probably which works perfect with bonding. Currently the HA mode of bonding will respond on ARP only on one of the devices and as such connection requests will not target any rdma device but rather only the active one. If this is such a shame, send fix, spraying mud on the maintainer and/or someone sending another patch is a shame, isn't it? > Traditionally with ethernet the L2 bonding is really only used for link aggregation, L1 failure, and a simple multi-switch HA scheme. It is not deployed if you have multiple ethernet domains. Some people prefer to have dual, independent ethernet fabrics, and in that case you rely on routing features to get the multipath, and HA features of bonding. okay, thanks for the crash course. > Go back on the list and look up the posts from Leo who first discovered this, what he was trying to do is kinda the L3 bonding approach. if Loe has a problem and you want to help him, bring it on the list, debate, send patches, jumping into someone's else patch isn't the constructive way to go. > David has been doing a good job and I am glad he is working on the IPv6 support. My comments are only intended to clarify how this is all supposed to work and why the IP flow is actually still relevant to RDMA connections. As I see it, your comments block the the patches sent by Dave, Sean? Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org> @ 2009-10-27 17:14 ` Jason Gunthorpe [not found] ` <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-27 17:14 UTC (permalink / raw) To: Or Gerlitz; +Cc: Sean Hefty, linux-rdma On Tue, Oct 27, 2009 at 03:01:56PM +0200, Or Gerlitz wrote: > Don't tell me that Dave's patches are blocked b/c you discovered the > rdma_bind design and now you don't like it, as I wrote you, Dave sent > patch to fix the IPv6 support, during the discussion on his patches you > come and bring up more and more issues you consider as problems (but I > don't) and block the patch set, I don't think this is appropriate. Let > the patches go and send your patches to fix the problems you see. Why > anyone touching some code piece has to fix problems you see in that piece?! Wow, seriously? You do understand the purpose of review, right? Documentation/development-process/7.AdvancedTopics: Different developers will review code from different points of view. Some are mostly concerned with coding style and whether code lines have trailing white space. Others will focus primarily on whether the change implemented by the patch as a whole is a good thing for the kernel or not. Yet others will check for problematic locking, excessive stack usage, possible security issues, duplication of code found elsewhere, adequate documentation, adverse effects on performance, user-space ABI changes, etc. All types of review, if they lead to better code going into the kernel, are welcome and worthwhile. And yes, actually, accounting for how rdma_bind() is different from bind() when doing route resolution is pretty much the main remaining problem. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-28 14:33 ` Or Gerlitz 0 siblings, 0 replies; 44+ messages in thread From: Or Gerlitz @ 2009-10-28 14:33 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma Jason Gunthorpe wrote: > Wow, seriously? You do understand the purpose of review, right? I think I do, maybe not to the depth you and your arguments are, but again, repeating myself: my kind of simple argument is that your review is way beyond the --change-- suggested by a patch but rather of a whole logic, and you block a patch b/c you don't like the logic this patch integrates with. To some extent such practice is excepted, but you took it to way beyond acceptable limit. I don't accept your assertion that the whole logic is broken and it makes sense to me to have a patch from Dave to fix the IPv6 part of it. Next or in parallel you are welcome to sent a patch fixing/re-writing the whole bind logic or even the whole rdma stack or the whole kernel. > And yes, actually, accounting for how rdma_bind() is different from bind() when doing route resolution is pretty much the main remaining problem go and fix that Or. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH] link-local address fix for rdma_resolve_addr
@ 2009-10-13 22:09 David J. Wilder
[not found] ` <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-13 22:09 UTC (permalink / raw)
To: ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, linux-rdma,
pradeep-r/Jw6+rmf7HQT0dZR+AlfA, wilder-r/Jw6+rmf7HQT0dZR+AlfA
Here is a patch to addr6_resolve_remote() to correctly handle link-local address.
It should cover all the conditions Jason described.
With this patch rping works as expected.
Link-local with scope:
# /usr/bin/rping -c -a fe80::202:c903:1:1925%ib0
Link-local w/out scope:
# /usr/bin/rping -c -a fe80::202:c903:1:1925
rdma_resolve_addr error -1
Other ipv6 address:
# /usr/bin/rping -c -a 2001:db8:1234::2
(server side started with rping -s -P -v -a ::0)
Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
-----------------------------------------------------------------
--- drivers/infiniband/core/addr.c.1759 2009-10-13 15:57:48.000000000 -0500
+++ drivers/infiniband/core/addr.c 2009-10-13 16:11:02.000000000 -0500
@@ -278,6 +278,15 @@ static int addr6_resolve_remote(struct s
fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
+ if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
+ if (!dst_in->sin6_scope_id)
+ return -EINVAL;
+ if ( src_in->sin6_scope_id &&
+ (src_in->sin6_scope_id != dst_in->sin6_scope_id))
+ return -EINVAL;
+ fl.oif = dst_in->sin6_scope_id;
+ }
+
dst = ip6_route_output(&init_net, NULL, &fl);
if (!dst)
return ret;
^ permalink raw reply [flat|nested] 44+ messages in thread[parent not found: <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-13 23:12 ` Jason Gunthorpe [not found] ` <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-13 23:12 UTC (permalink / raw) To: David J. Wilder Cc: linux-rdma, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Tue, Oct 13, 2009 at 03:09:40PM -0700, David J. Wilder wrote: > Here is a patch to addr6_resolve_remote() to correctly handle link-local address. > It should cover all the conditions Jason described. Looks pretty good to me, definitely on the right track. Hmm.. Actually, upon comparing to tcp_ipv6.c, I'd say one more behavior is necessary. The code in tcp_ipv6 allows the destination to not specify a scope id if an interface has already been set. Looks like the two ways to set an interface ID are to use bind() or SO_BINDTODEVICE.. Specifying a source address to RDMA CM is similar to bind(), so if the source address is link local it must specify a sin6_scope_id and the dest address can specify 0, or the same value. How af_inet6 handles bind(): if (addr_type & IPV6_ADDR_LINKLOCAL) { if (addr_len >= sizeof(struct sockaddr_in6) && addr->sin6_scope_id) { /* Override any existing binding, if another one * is supplied by user. */ sk->sk_bound_dev_if = addr->sin6_scope_id; } /* Binding to link-local address requires an interface */ if (!sk->sk_bound_dev_if) { err = -EINVAL; goto out; } dev = dev_get_by_index(net, sk->sk_bound_dev_if); if (!dev) { err = -ENODEV; goto out; } } How tcp_ipv6 checks the destination address: if (addr_type&IPV6_ADDR_LINKLOCAL) { if (addr_len >= sizeof(struct sockaddr_in6) && usin->sin6_scope_id) { /* If interface is set while binding, indices * must coincide. */ if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != usin->sin6_scope_id) return -EINVAL; sk->sk_bound_dev_if = usin->sin6_scope_id; } /* Connect to link-local address requires an interface */ if (!sk->sk_bound_dev_if) return -EINVAL; } Jason ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-14 16:23 ` David J. Wilder [not found] ` <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-14 16:23 UTC (permalink / raw) To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/ Cc: linux-rdma, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 Sean- This patch should fix the behavior of rdma_resolve_addr when using link-local addressing. On Tue, 2009-10-13 at 17:12 -0600, Jason Gunthorpe wrote: > On Tue, Oct 13, 2009 at 03:09:40PM -0700, David J. Wilder wrote: > > Here is a patch to addr6_resolve_remote() to correctly handle link-local address. > > It should cover all the conditions Jason described. > > Looks pretty good to me, definitely on the right track. > > Hmm.. > > Actually, upon comparing to tcp_ipv6.c, I'd say one more behavior is > necessary. The code in tcp_ipv6 allows the destination to not specify > a scope id if an interface has already been set. Looks like the two > ways to set an interface ID are to use bind() or SO_BINDTODEVICE.. > > Specifying a source address to RDMA CM is similar to bind(), so if the > source address is link local it must specify a sin6_scope_id and the > dest address can specify 0, or the same value. Jason - This new patch should closely emulate tcp_ipv6.c. when both source and destination scope_ids are given with link-local address. With this patch rping works as expected. Link-local with scope: # /usr/bin/rping -c -a fe80::202:c903:1:1925%ib0 Link-local w/out scope: # /usr/bin/rping -c -a fe80::202:c903:1:1925 rdma_resolve_addr error -1 Other ipv6 address: # /usr/bin/rping -c -a 2001:db8:1234::2 (server side started with rping -s -P -v -a ::0) Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ----------------------------------------------------------------- --- drivers/infiniband/core/addr.c.1759 2009-10-13 15:57:48.000000000 -0500 +++ drivers/infiniband/core/addr.c 2009-10-14 10:47:56.000000000 -0500 @@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct s fl.nl_u.ip6_u.daddr = dst_in->sin6_addr; fl.nl_u.ip6_u.saddr = src_in->sin6_addr; + if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){ + // Link-local address require an interface to be specified. + if (!(dst_in->sin6_scope_id||src_in->sin6_scope_id)) + return -EINVAL; + + // If src and dst interfaces are supplied they must match. + if ( (dst_in->sin6_scope_id && src_in->sin6_scope_id) && + (src_in->sin6_scope_id != dst_in->sin6_scope_id) ) + return -EINVAL; + if ( dst_in->sin6_scope_id ) + fl.oif = dst_in->sin6_scope_id; + else + fl.oif = src_in->sin6_scope_id; + } + dst = ip6_route_output(&init_net, NULL, &fl); if (!dst) return ret; ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-14 17:01 ` Jason Gunthorpe [not found] ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-14 17:01 UTC (permalink / raw) To: David J. Wilder Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote: > This new patch should closely emulate tcp_ipv6.c. when both source and > destination scope_ids are given with link-local address. Maybe like this: fl.oif = 0; if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { if (!src_in->sin6_scope_id) return -EINVAL; fl.oif = src_in->sin6scope_id; } if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){ if (dst_in.sin6_scope_id) { if (fl.oif && fl.oif != dst_in.sin6_scope_id) return -EINVAL; fl.oif = dst_in.sin6_scope_id; } if (!fl.oif) return -EINVAL; } src and dest have to be tested seperately, the TCPv6 versions don't require them to be both link local. Don't forget to run checkpatch ;) Regards, ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-14 17:30 ` David J. Wilder [not found] ` <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 2009-10-14 19:33 ` David J. Wilder 1 sibling, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-14 17:30 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote: > On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote: > > > This new patch should closely emulate tcp_ipv6.c. when both source and > > destination scope_ids are given with link-local address. > > Maybe like this: > > fl.oif = 0; > > if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { > if (!src_in->sin6_scope_id) > return -EINVAL; > fl.oif = src_in->sin6scope_id; > } > if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){ > if (dst_in.sin6_scope_id) { > if (fl.oif && fl.oif != dst_in.sin6_scope_id) > return -EINVAL; > fl.oif = dst_in.sin6_scope_id; > } > if (!fl.oif) > return -EINVAL; > } > > > src and dest have to be tested seperately, the TCPv6 versions don't > require them to be both link local. > > Don't forget to run checkpatch ;) This looks good. Once concern, it may be obtuse, but if both the src and dst are link-local addresses should only one need to be scoped? This patch will required the src to always be scoped when using link local. Dave... ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-14 17:40 ` Jason Gunthorpe [not found] ` <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-14 17:40 UTC (permalink / raw) To: David J. Wilder Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 14, 2009 at 10:30:05AM -0700, David J. Wilder wrote: > This looks good. Once concern, it may be obtuse, but if both the src and > dst are link-local addresses should only one need to be scoped? This > patch will required the src to always be scoped when using link local. The TCPv6 code requires the src to be scoped (or SO_BINDTODEVICE to be used prior). If src is scoped then it is optional to scope dest, but if dest is scoped then it must match. I intended to capture that logic.. Can you give a quick test and send a git format-patch/checkpatch'd patch to Roland, with your signed off line, etc. You can put a Reviewed-by line from me as well if you like. Thanks, Jason ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-15 19:27 ` David J. Wilder [not found] ` <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-15 19:27 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, 2009-10-14 at 13:09 Jason Gunthorpe wrote: > So, it tries to match the source addr to the addrs bound to the > device, which is wrong - that isn't how the ip stack works. > You can patch this up a little bit by fixing up addr_resolve_local to > set sin6_scope_ip. I found the bug in addr_resolve_local(). (more comments below) --- addr.c.1759 2009-10-13 15:57:48.000000000 -0500 +++ addr.c.ip_local 2009-10-15 14:03:50.000000000 -0500 @@ -390,14 +390,17 @@ static int addr_resolve_local(struct soc case AF_INET6: { struct in6_addr *a; + int found = 0; for_each_netdev(&init_net, dev) if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) dst_in)->sin6_addr, - dev, 1)) + dev, 1)){ + found = 1; break; + } - if (!dev) + if (!found) return -EADDRNOTAVAIL; a = &((struct sockaddr_in6 *) src_in)->sin6_addr; @@ -406,6 +409,8 @@ static int addr_resolve_local(struct soc src_in->sa_family = dst_in->sa_family; ((struct sockaddr_in6 *) src_in)->sin6_addr = ((struct sockaddr_in6 *) dst_in)->sin6_addr; + ((struct sockaddr_in6 *) src_in)->sin6_scope_id = + ((struct sockaddr_in6 *) dst_in)->sin6_scope_id; ret = rdma_copy_addr(addr, dev, dev->dev_addr); } else if (ipv6_addr_loopback(a)) { ret = rdma_translate_ip(dst_in, addr); > But really the correct thing to do is to remove addr_resolve_local and > place the source address into the struct flowi and use the result of > the route lookup to bind to the source device, and set the source > address if it is unset. Sorry I don't get it.. Are you saying that ip6_route_output() will resolve the address even if it is a link-local address bound to my own interface? Therefor addr_resolve_local() is not needed. ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-15 21:32 ` Jason Gunthorpe [not found] ` <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-15 21:32 UTC (permalink / raw) To: David J. Wilder Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Thu, Oct 15, 2009 at 12:27:21PM -0700, David J. Wilder wrote: > On Wed, 2009-10-14 at 13:09 Jason Gunthorpe wrote: > > > So, it tries to match the source addr to the addrs bound to the > > device, which is wrong - that isn't how the ip stack works. > > > You can patch this up a little bit by fixing up addr_resolve_local to > > set sin6_scope_ip. > > I found the bug in addr_resolve_local(). (more comments below) Yes, that is the hacky work around I was mentioning.. > > But really the correct thing to do is to remove addr_resolve_local and > > place the source address into the struct flowi and use the result of > > the route lookup to bind to the source device, and set the source > > address if it is unset. > > Sorry I don't get it.. > Are you saying that ip6_route_output() will resolve the address even if > it is a link-local address bound to my own interface? Therefor > addr_resolve_local() is not needed. Yes, and more. In Linux the routing table takes as input the source (optional), device (optional) and destination address and returns as output the device to use. To determine the device to bind to you ask the routing table what device to use for all the route information you have. For example: $ ip route get fe80::c2d from fe80::213:72ff:fe29:e65d oif eth0 fe80::c2d via fe80::c2d dev eth0 src fe80::213:72ff:fe29:e65d metric 0 cache mtu 1500 advmss 1440 hoplimit 4294967295 $ ip route get fe80::c2d oif eth0 fe80::c2d via fe80::c2d dev eth0 src fe80::213:72ff:fe29:e65d metric 0 cache mtu 1500 advmss 1440 hoplimit 4294967295 You can see in both cases the routing table returns a 'src' entry. 'src' is the address to bind to if no bind address was specified. When doing link local addresess the sin6_scope_id should sets the 'oif' key in the routing lookup, which will result in the correct src address and output device being selected by the routing algorithm. For instance on my machine here, I have two interfaces: $ ip route get fe80::c2d oif virbr0 fe80::c2d via fe80::c2d dev virbr0 src fe80::2c5d:c4ff:feb8:1ce5 metric 0 cache mtu 1500 advmss 1440 hoplimit 4294967295 As you can see it is returning the link local address for virbr0 as the source. So the algorithm in RDMA CM should look like this: - If src is specified then set the bind local address to src [if src is link local then it must specify sin6_scope_id, and sin6_scope_id becomes the oif input to the route lookup] - If dst is link local then its sin6_scope_id is the oif to the route lookup (and must match src, as we did last go round) - Src (or 'any'), dst and device (or 'any') are passed to the route lookup - The RDMA CM ID is bound to the device returned by the route lookup - If the src address was not specified then the connection source IP is set to the 'src' value from the route lookup. This is why addr_resolve_local/rdma_translate_ip is not needed, that entire entire function is done by the routing table code. You can see why this becomes important when it is combined with policy routing, for instance consider this example: $ ip rule 32765: from 10.0.0.4 lookup dnat $ ip route show table dnat default via 10.0.0.1 dev eth1 $ ip route get 10.0.0.100 10.0.0.100 dev eth0 src 10.0.0.2 $ ip route get 10.0.0.100 from 10.0.0.4 10.0.0.100 from 10.0.0.4 via 10.0.0.1 dev eth1 cache mtu 1500 advmss 1460 hoplimit 64 The two results are radically different and dependant on the source address. (10.0.0.4 could be attached to eth0, and eth1!) The actual fixing to the code is not hard, remove rdma_translate_ip, addr_resolve_local, split addr_resolve_remote into a part to resolve the route and a part that does the arp/nd. Make the route resolve part work almost exactly like addr4_resolve_remote (noting that the v6 version is wrong, since is doesn't respect unset source addres, another bug). Call rdma_copy_addr based on the rt->idev->dev (or should it be odev??). Do the ARP. The pain is in retesting everything :| Jason ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2009-10-16 18:54 ` David J. Wilder [not found] ` <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-16 18:54 UTC (permalink / raw) To: Jason Gunthorpe Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, wilder-r/Jw6+rmf7HQT0dZR+AlfA On Thu, 2009-10-15 at 15:32 -0600, Jason Gunthorpe wrote: > > The actual fixing to the code is not hard, remove rdma_translate_ip, > addr_resolve_local, split addr_resolve_remote into a part to resolve > the route and a part that does the arp/nd. Make the route resolve part > work almost exactly like addr4_resolve_remote (noting that the v6 > version is wrong, since is doesn't respect unset source addres, > another bug). Call rdma_copy_addr based on the rt->idev->dev (or > should it be odev??). Do the ARP. > > The pain is in retesting everything :| > > Jason There is the wrinkle in this plan. If you pass ip6_route_output() your own link-local address (scoped or not) it returns a neighbor entry bound to the loop back device. $ ip route get fe80::202:c903:1:28ed oif ib0 local fe80::202:c903:1:28ed from :: via :: dev lo table local proto none src fe80::202:c903:1:28ed metric 0 mtu 16436 advmss 16376 hoplimit 4294967295 Thats not going to work as lo has no RDMA support. I suspect this is why addr_resolve_local() was added in the first place :) I will post a patch with our first solution and include the fix to addr_resolve_local(). Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-16 19:28 ` Jason Gunthorpe 0 siblings, 0 replies; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-16 19:28 UTC (permalink / raw) To: David J. Wilder Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA, wilder-r/Jw6+rmf7HQT0dZR+AlfA On Fri, Oct 16, 2009 at 11:54:39AM -0700, David J. Wilder wrote: > There is the wrinkle in this plan. If you pass ip6_route_output() your > own link-local address (scoped or not) it returns a neighbor entry bound > to the loop back device. Well, that really is correct. The RDMA CM could try to do something smart if lo is returned from the route lookup - ie choose the first RDMA device bound to that IP or something. There are other problems with the loop back case, like the Linux stack doesn't generate neighbor entries for its own IP addresses. The loopback case is far less important than the other policy routing features anyhow.. > I will post a patch with our first solution and include the fix to > addr_resolve_local(). May as well, a little bit of progress is better than none at all. Also note that I observed that the IPv6 code path never sets the source address from the routing table if IN6_ADDR_ANY is used as source, this is certainly wrong and could cause other problems down the line, probably more so for iwarp than IB though.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2009-10-14 17:30 ` David J. Wilder @ 2009-10-14 19:33 ` David J. Wilder [not found] ` <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> 1 sibling, 1 reply; 44+ messages in thread From: David J. Wilder @ 2009-10-14 19:33 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote: > On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote: > > > This new patch should closely emulate tcp_ipv6.c. when both source and > > destination scope_ids are given with link-local address. > > Maybe like this: > > fl.oif = 0; > > if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { > if (!src_in->sin6_scope_id) > return -EINVAL; > fl.oif = src_in->sin6scope_id; > } > if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){ > if (dst_in.sin6_scope_id) { > if (fl.oif && fl.oif != dst_in.sin6_scope_id) > return -EINVAL; > fl.oif = dst_in.sin6_scope_id; > } > if (!fl.oif) > return -EINVAL; > } > > > src and dest have to be tested seperately, the TCPv6 versions don't > require them to be both link local. Hum.. this change is not working with rping. The src address has been filled in already with a link-local address and no scope_id so it is returning -EINVAL. Another bug I guess.... I am doing some more debug. ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>]
* Re: [PATCH] link-local address fix for rdma_resolve_addr [not found] ` <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org> @ 2009-10-14 20:09 ` Jason Gunthorpe 0 siblings, 0 replies; 44+ messages in thread From: Jason Gunthorpe @ 2009-10-14 20:09 UTC (permalink / raw) To: David J. Wilder Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w, wilder-r/Jw6+rmf7HQT0dZR+AlfA, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5 On Wed, Oct 14, 2009 at 12:33:17PM -0700, David J. Wilder wrote: > > On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote: > > On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote: > > > > > This new patch should closely emulate tcp_ipv6.c. when both source and > > > destination scope_ids are given with link-local address. > > > > Maybe like this: > > > > fl.oif = 0; > > > > if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) { > > if (!src_in->sin6_scope_id) > > return -EINVAL; > > fl.oif = src_in->sin6scope_id; > > } > > if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){ > > if (dst_in.sin6_scope_id) { > > if (fl.oif && fl.oif != dst_in.sin6_scope_id) > > return -EINVAL; > > fl.oif = dst_in.sin6_scope_id; > > } > > if (!fl.oif) > > return -EINVAL; > > } > > > > > > src and dest have to be tested seperately, the TCPv6 versions don't > > require them to be both link local. > > Hum.. this change is not working with rping. The src address has been > filled in already with a link-local address and no scope_id so it is > returning -EINVAL. Another bug I guess.... I am doing some more > debug. Ah feh, you are running into that other bug.. http://lists.openfabrics.org/pipermail/general/2009-July/060612.html Basically it does this: int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms) ret = rdma_resolve_ip(&addr_client, (struct sockaddr *) &id->route.addr.src_addr, dst_addr, &id->route.addr.dev_addr, timeout_ms, addr_handler, id_priv); int rdma_resolve_ip(struct rdma_addr_client *client, struct sockaddr *src_addr, struct sockaddr *dst_addr, struct rdma_dev_addr *addr, int timeout_ms, void (*callback)(int status, struct sockaddr *src_addr, struct rdma_dev_addr *addr, void *context), void *context) req->status = addr_resolve_local(src_in, dst_in, addr); static int addr_resolve_local(struct sockaddr *src_in, struct sockaddr *dst_in, struct rdma_dev_addr *addr) dev = ip_dev_find(&init_net, dst_ip); [..] for_each_netdev(&init_net, dev) if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) addr)->sin6_addr, dev, 1)) break; So, it tries to match the source addr to the addrs bound to the device, which is wrong - that isn't how the ip stack works. You can patch this up a little bit by fixing up addr_resolve_local to set sin6_scope_ip. But really the correct thing to do is to remove addr_resolve_local and place the source address into the struct flowi and use the result of the route lookup to bind to the source device, and set the source address if it is unset. Jason ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2009-10-28 14:33 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-19 22:47 [PATCH] link-local address fix for rdma_resolve_addr David J. Wilder
[not found] ` <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-19 23:43 ` Jason Gunthorpe
[not found] ` <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-19 23:47 ` Sean Hefty
[not found] ` <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-20 0:33 ` Jason Gunthorpe
[not found] ` <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-21 22:30 ` David J. Wilder
[not found] ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-21 23:08 ` Jason Gunthorpe
[not found] ` <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 21:12 ` David J. Wilder
[not found] ` <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-22 22:02 ` Sean Hefty
[not found] ` <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-27 18:22 ` David J. Wilder
[not found] ` <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-27 18:58 ` [PATCH] librdmacm/cmatose: add support for ipv6 Sean Hefty
[not found] ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-27 19:42 ` Sean Hefty
2009-10-27 21:50 ` David J. Wilder
2009-10-21 23:17 ` [PATCH] link-local address fix for rdma_resolve_addr Sean Hefty
[not found] ` <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-21 23:36 ` Jason Gunthorpe
[not found] ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-21 23:46 ` Jason Gunthorpe
2009-10-21 23:58 ` Sean Hefty
[not found] ` <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 0:28 ` Jason Gunthorpe
[not found] ` <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 0:40 ` Sean Hefty
[not found] ` <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 1:19 ` Jason Gunthorpe
[not found] ` <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 5:24 ` Sean Hefty
[not found] ` <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 6:25 ` Jason Gunthorpe
[not found] ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 6:49 ` Sean Hefty
[not found] ` <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 16:47 ` Jason Gunthorpe
[not found] ` <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 17:59 ` Sean Hefty
[not found] ` <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 18:22 ` Jason Gunthorpe
[not found] ` <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 18:31 ` Sean Hefty
[not found] ` <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 18:40 ` Jason Gunthorpe
2009-10-25 11:52 ` Or Gerlitz
[not found] ` <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-10-25 20:03 ` Jason Gunthorpe
[not found] ` <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-27 13:01 ` Or Gerlitz
[not found] ` <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-10-27 17:14 ` Jason Gunthorpe
[not found] ` <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-28 14:33 ` Or Gerlitz
-- strict thread matches above, loose matches on Subject: below --
2009-10-13 22:09 David J. Wilder
[not found] ` <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-13 23:12 ` Jason Gunthorpe
[not found] ` <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-14 16:23 ` David J. Wilder
[not found] ` <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-14 17:01 ` Jason Gunthorpe
[not found] ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-14 17:30 ` David J. Wilder
[not found] ` <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-14 17:40 ` Jason Gunthorpe
[not found] ` <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-15 19:27 ` David J. Wilder
[not found] ` <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-15 21:32 ` Jason Gunthorpe
[not found] ` <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-16 18:54 ` David J. Wilder
[not found] ` <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-16 19:28 ` Jason Gunthorpe
2009-10-14 19:33 ` David J. Wilder
[not found] ` <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-14 20:09 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox