public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH]  link-local address fix for rdma_resolve_addr
@ 2009-10-13 22:09 David J. Wilder
       [not found] ` <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-13 22:09 UTC (permalink / raw)
  To: ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA, wilder-r/Jw6+rmf7HQT0dZR+AlfA

Here is a patch to addr6_resolve_remote() to correctly handle link-local address.
It should cover all the conditions Jason described. 

With this patch rping works as expected.

Link-local with scope:
# /usr/bin/rping -c -a fe80::202:c903:1:1925%ib0
 
Link-local w/out scope:
# /usr/bin/rping -c  -a fe80::202:c903:1:1925
rdma_resolve_addr error -1

Other ipv6 address:
# /usr/bin/rping -c -a 2001:db8:1234::2
 
(server side started with rping -s -P -v -a ::0)

Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
-----------------------------------------------------------------

--- drivers/infiniband/core/addr.c.1759	2009-10-13 15:57:48.000000000 -0500
+++ drivers/infiniband/core/addr.c	2009-10-13 16:11:02.000000000 -0500
@@ -278,6 +278,15 @@ static int addr6_resolve_remote(struct s
 	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
 	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
 
+	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
+		if (!dst_in->sin6_scope_id)
+			return -EINVAL;
+		if ( src_in->sin6_scope_id &&
+			(src_in->sin6_scope_id != dst_in->sin6_scope_id))
+			return -EINVAL;
+		fl.oif = dst_in->sin6_scope_id;
+	}
+
 	dst = ip6_route_output(&init_net, NULL, &fl);
 	if (!dst)
 		return ret;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found] ` <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-13 23:12   ` Jason Gunthorpe
       [not found]     ` <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-13 23:12 UTC (permalink / raw)
  To: David J. Wilder
  Cc: linux-rdma, wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Tue, Oct 13, 2009 at 03:09:40PM -0700, David J. Wilder wrote:
> Here is a patch to addr6_resolve_remote() to correctly handle link-local address.
> It should cover all the conditions Jason described. 

Looks pretty good to me, definitely on the right track.

Hmm..

Actually, upon comparing to tcp_ipv6.c, I'd say one more behavior is
necessary. The code in tcp_ipv6 allows the destination to not specify
a scope id if an interface has already been set. Looks like the two
ways to set an interface ID are to use bind() or SO_BINDTODEVICE..

Specifying a source address to RDMA CM is similar to bind(), so if the
source address is link local it must specify a sin6_scope_id and the
dest address can specify 0, or the same value.

How af_inet6 handles bind():

                        if (addr_type & IPV6_ADDR_LINKLOCAL) {
                                if (addr_len >= sizeof(struct sockaddr_in6) &&
                                    addr->sin6_scope_id) {
                                        /* Override any existing binding, if another one
                                         * is supplied by user.
                                         */
                                        sk->sk_bound_dev_if = addr->sin6_scope_id;
                                }

                                /* Binding to link-local address requires an interface */
                                if (!sk->sk_bound_dev_if) {
                                        err = -EINVAL;
                                        goto out;
                                }
                                dev = dev_get_by_index(net, sk->sk_bound_dev_if);
                                if (!dev) {
                                        err = -ENODEV;
                                        goto out;
                                }
                        }

How tcp_ipv6 checks the destination address:
        if (addr_type&IPV6_ADDR_LINKLOCAL) {
                if (addr_len >= sizeof(struct sockaddr_in6) &&
                    usin->sin6_scope_id) {
                        /* If interface is set while binding, indices
                         * must coincide.
                         */
                        if (sk->sk_bound_dev_if &&
                            sk->sk_bound_dev_if != usin->sin6_scope_id)
                                return -EINVAL;

                        sk->sk_bound_dev_if = usin->sin6_scope_id;
                }

                /* Connect to link-local address requires an interface */
                if (!sk->sk_bound_dev_if)
                        return -EINVAL;
        }

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]     ` <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-14 16:23       ` David J. Wilder
       [not found]         ` <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-14 16:23 UTC (permalink / raw)
  To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/
  Cc: linux-rdma, wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Sean-
This patch should fix the behavior of rdma_resolve_addr  when using
link-local addressing.
  
On Tue, 2009-10-13 at 17:12 -0600, Jason Gunthorpe wrote:
> On Tue, Oct 13, 2009 at 03:09:40PM -0700, David J. Wilder wrote:
> > Here is a patch to addr6_resolve_remote() to correctly handle link-local address.
> > It should cover all the conditions Jason described. 
> 
> Looks pretty good to me, definitely on the right track.
> 
> Hmm..
> 
> Actually, upon comparing to tcp_ipv6.c, I'd say one more behavior is
> necessary. The code in tcp_ipv6 allows the destination to not specify
> a scope id if an interface has already been set. Looks like the two
> ways to set an interface ID are to use bind() or SO_BINDTODEVICE..
> 
> Specifying a source address to RDMA CM is similar to bind(), so if the
> source address is link local it must specify a sin6_scope_id and the
> dest address can specify 0, or the same value.

Jason - 
This new patch should closely emulate tcp_ipv6.c. when both source and
destination scope_ids are given with link-local address. 

With this patch rping works as expected.

Link-local with scope:
# /usr/bin/rping -c -a fe80::202:c903:1:1925%ib0

Link-local w/out scope:
# /usr/bin/rping -c  -a fe80::202:c903:1:1925
rdma_resolve_addr error -1

Other ipv6 address:
# /usr/bin/rping -c -a 2001:db8:1234::2

(server side started with rping -s -P -v -a ::0)

Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
-----------------------------------------------------------------
--- drivers/infiniband/core/addr.c.1759	2009-10-13 15:57:48.000000000 -0500
+++ drivers/infiniband/core/addr.c	2009-10-14 10:47:56.000000000 -0500
@@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct s
 	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
 	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
 
+	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
+		// Link-local address require an interface to be specified.
+		if (!(dst_in->sin6_scope_id||src_in->sin6_scope_id))
+			return -EINVAL;
+
+		// If src and dst interfaces are supplied they must match.
+		if ( (dst_in->sin6_scope_id && src_in->sin6_scope_id) &&
+			(src_in->sin6_scope_id != dst_in->sin6_scope_id) )
+				return -EINVAL;
+		if ( dst_in->sin6_scope_id )
+			fl.oif = dst_in->sin6_scope_id;
+		else
+			fl.oif = src_in->sin6_scope_id;
+	}
+
 	dst = ip6_route_output(&init_net, NULL, &fl);
 	if (!dst)
 		return ret;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]         ` <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-14 17:01           ` Jason Gunthorpe
       [not found]             ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-14 17:01 UTC (permalink / raw)
  To: David J. Wilder
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:

> This new patch should closely emulate tcp_ipv6.c. when both source and
> destination scope_ids are given with link-local address. 

Maybe like this:

        fl.oif = 0;

	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
	    if (!src_in->sin6_scope_id)
                return -EINVAL;
            fl.oif = src_in->sin6scope_id;
        }
	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
                if (dst_in.sin6_scope_id) {
                        if (fl.oif && fl.oif != dst_in.sin6_scope_id)
			        return -EINVAL;
                        fl.oif = dst_in.sin6_scope_id;
                }
                if (!fl.oif)
			return -EINVAL;
	}


src and dest have to be tested seperately, the TCPv6 versions don't
require them to be both link local.

Don't forget to run checkpatch ;)

Regards,

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]             ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-14 17:30               ` David J. Wilder
       [not found]                 ` <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  2009-10-14 19:33               ` David J. Wilder
  1 sibling, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-14 17:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5


On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote:
> On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:
> 
> > This new patch should closely emulate tcp_ipv6.c. when both source and
> > destination scope_ids are given with link-local address. 
> 
> Maybe like this:
> 
>         fl.oif = 0;
> 
> 	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
> 	    if (!src_in->sin6_scope_id)
>                 return -EINVAL;
>             fl.oif = src_in->sin6scope_id;
>         }
> 	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
>                 if (dst_in.sin6_scope_id) {
>                         if (fl.oif && fl.oif != dst_in.sin6_scope_id)
> 			        return -EINVAL;
>                         fl.oif = dst_in.sin6_scope_id;
>                 }
>                 if (!fl.oif)
> 			return -EINVAL;
> 	}
> 
> 
> src and dest have to be tested seperately, the TCPv6 versions don't
> require them to be both link local.
> 
> Don't forget to run checkpatch ;)

This looks good. Once concern, it may be obtuse, but if both the src and
dst are link-local addresses should only one need to be scoped?  This
patch will required the src to always be scoped when using link local.  

Dave...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]                 ` <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-14 17:40                   ` Jason Gunthorpe
       [not found]                     ` <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-14 17:40 UTC (permalink / raw)
  To: David J. Wilder
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 14, 2009 at 10:30:05AM -0700, David J. Wilder wrote:

> This looks good. Once concern, it may be obtuse, but if both the src and
> dst are link-local addresses should only one need to be scoped?  This
> patch will required the src to always be scoped when using link local.  

The TCPv6 code requires the src to be scoped (or SO_BINDTODEVICE to be
used prior). If src is scoped then it is optional to scope dest, but
if dest is scoped then it must match. I intended to capture that logic..

Can you give a quick test and send a git format-patch/checkpatch'd
patch to Roland, with your signed off line, etc. You can put a
Reviewed-by line from me as well if you like.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]             ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-14 17:30               ` David J. Wilder
@ 2009-10-14 19:33               ` David J. Wilder
       [not found]                 ` <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  1 sibling, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-14 19:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5


On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote:
> On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:
> 
> > This new patch should closely emulate tcp_ipv6.c. when both source and
> > destination scope_ids are given with link-local address. 
> 
> Maybe like this:
> 
>         fl.oif = 0;
> 
> 	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
> 	    if (!src_in->sin6_scope_id)
>                 return -EINVAL;
>             fl.oif = src_in->sin6scope_id;
>         }
> 	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
>                 if (dst_in.sin6_scope_id) {
>                         if (fl.oif && fl.oif != dst_in.sin6_scope_id)
> 			        return -EINVAL;
>                         fl.oif = dst_in.sin6_scope_id;
>                 }
>                 if (!fl.oif)
> 			return -EINVAL;
> 	}
> 
> 
> src and dest have to be tested seperately, the TCPv6 versions don't
> require them to be both link local.

Hum.. this change is not working with rping. The src address has been
filled in already with a link-local address and no scope_id so it is
returning -EINVAL.  Another bug I guess.... I am doing some more
debug.   

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]                 ` <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-14 20:09                   ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-14 20:09 UTC (permalink / raw)
  To: David J. Wilder
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 14, 2009 at 12:33:17PM -0700, David J. Wilder wrote:
> 
> On Wed, 2009-10-14 at 11:01 -0600, Jason Gunthorpe wrote:
> > On Wed, Oct 14, 2009 at 09:23:57AM -0700, David J. Wilder wrote:
> > 
> > > This new patch should closely emulate tcp_ipv6.c. when both source and
> > > destination scope_ids are given with link-local address. 
> > 
> > Maybe like this:
> > 
> >         fl.oif = 0;
> > 
> > 	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
> > 	    if (!src_in->sin6_scope_id)
> >                 return -EINVAL;
> >             fl.oif = src_in->sin6scope_id;
> >         }
> > 	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL){
> >                 if (dst_in.sin6_scope_id) {
> >                         if (fl.oif && fl.oif != dst_in.sin6_scope_id)
> > 			        return -EINVAL;
> >                         fl.oif = dst_in.sin6_scope_id;
> >                 }
> >                 if (!fl.oif)
> > 			return -EINVAL;
> > 	}
> > 
> > 
> > src and dest have to be tested seperately, the TCPv6 versions don't
> > require them to be both link local.
> 
> Hum.. this change is not working with rping. The src address has been
> filled in already with a link-local address and no scope_id so it is
> returning -EINVAL.  Another bug I guess.... I am doing some more
> debug.   

Ah feh, you are running into that other bug..

http://lists.openfabrics.org/pipermail/general/2009-July/060612.html

Basically it does this:

int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
                      struct sockaddr *dst_addr, int timeout_ms)
                ret = rdma_resolve_ip(&addr_client, (struct sockaddr *) &id->route.addr.src_addr,
                                      dst_addr, &id->route.addr.dev_addr,
                                      timeout_ms, addr_handler,
				      id_priv);
int rdma_resolve_ip(struct rdma_addr_client *client,
                    struct sockaddr *src_addr, struct sockaddr *dst_addr,
                    struct rdma_dev_addr *addr, int timeout_ms,
                    void (*callback)(int status, struct sockaddr *src_addr,
                                     struct rdma_dev_addr *addr, void *context),
                    void *context)
        req->status = addr_resolve_local(src_in, dst_in, addr);

static int addr_resolve_local(struct sockaddr *src_in,
                              struct sockaddr *dst_in,
                              struct rdma_dev_addr *addr)
                dev = ip_dev_find(&init_net, dst_ip);
[..]
       for_each_netdev(&init_net, dev)
                        if (ipv6_chk_addr(&init_net,
                                          &((struct sockaddr_in6 *) addr)->sin6_addr,
                                          dev, 1))
                                break;

So, it tries to match the source addr to the addrs bound to the
device, which is wrong - that isn't how the ip stack works.

You can patch this up a little bit by fixing up addr_resolve_local to
set sin6_scope_ip.

But really the correct thing to do is to remove addr_resolve_local and
place the source address into the struct flowi and use the result of
the route lookup to bind to the source device, and set the source
address if it is unset.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]                     ` <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-15 19:27                       ` David J. Wilder
       [not found]                         ` <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-15 19:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, 2009-10-14 at 13:09 Jason Gunthorpe wrote:

> So, it tries to match the source addr to the addrs bound to the
> device, which is wrong - that isn't how the ip stack works.

> You can patch this up a little bit by fixing up addr_resolve_local to
> set sin6_scope_ip.

I found the bug in addr_resolve_local().  (more comments below)

--- addr.c.1759	2009-10-13 15:57:48.000000000 -0500
+++ addr.c.ip_local	2009-10-15 14:03:50.000000000 -0500
@@ -390,14 +390,17 @@ static int addr_resolve_local(struct soc
 	case AF_INET6:
 	{
 		struct in6_addr *a;
+		int found = 0;
 
 		for_each_netdev(&init_net, dev)
 			if (ipv6_chk_addr(&init_net,
 					  &((struct sockaddr_in6 *) dst_in)->sin6_addr,
-					  dev, 1))
+					  dev, 1)){
+				found = 1;
 				break;
+			}
 
-		if (!dev)
+		if (!found)
 			return -EADDRNOTAVAIL;
 
 		a = &((struct sockaddr_in6 *) src_in)->sin6_addr;
@@ -406,6 +409,8 @@ static int addr_resolve_local(struct soc
 			src_in->sa_family = dst_in->sa_family;
 			((struct sockaddr_in6 *) src_in)->sin6_addr =
 				((struct sockaddr_in6 *) dst_in)->sin6_addr;
+			((struct sockaddr_in6 *) src_in)->sin6_scope_id =
+                                ((struct sockaddr_in6 *) dst_in)->sin6_scope_id;
 			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
 		} else if (ipv6_addr_loopback(a)) {
 			ret = rdma_translate_ip(dst_in, addr);


> But really the correct thing to do is to remove addr_resolve_local and
> place the source address into the struct flowi and use the result of
> the route lookup to bind to the source device, and set the source
> address if it is unset.

Sorry I don't get it..
Are you saying that ip6_route_output() will resolve the address even if
it is a link-local address bound to my own interface? Therefor
addr_resolve_local() is not needed.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]                         ` <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-15 21:32                           ` Jason Gunthorpe
       [not found]                             ` <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-15 21:32 UTC (permalink / raw)
  To: David J. Wilder
  Cc: linux-rdma, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	wilder-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Thu, Oct 15, 2009 at 12:27:21PM -0700, David J. Wilder wrote:
> On Wed, 2009-10-14 at 13:09 Jason Gunthorpe wrote:
> 
> > So, it tries to match the source addr to the addrs bound to the
> > device, which is wrong - that isn't how the ip stack works.
> 
> > You can patch this up a little bit by fixing up addr_resolve_local to
> > set sin6_scope_ip.
> 
> I found the bug in addr_resolve_local().  (more comments below)

Yes, that is the hacky work around I was mentioning..

> > But really the correct thing to do is to remove addr_resolve_local and
> > place the source address into the struct flowi and use the result of
> > the route lookup to bind to the source device, and set the source
> > address if it is unset.
> 
> Sorry I don't get it..
> Are you saying that ip6_route_output() will resolve the address even if
> it is a link-local address bound to my own interface? Therefor
> addr_resolve_local() is not needed.

Yes, and more. In Linux the routing table takes as input the source
(optional), device (optional) and destination address and returns as
output the device to use.

To determine the device to bind to you ask the routing table what
device to use for all the route information you have.

For example:

$ ip route get fe80::c2d from fe80::213:72ff:fe29:e65d oif eth0
fe80::c2d via fe80::c2d dev eth0  src fe80::213:72ff:fe29:e65d  metric 0 
    cache  mtu 1500 advmss 1440 hoplimit 4294967295

$ ip route get fe80::c2d oif eth0
fe80::c2d via fe80::c2d dev eth0  src fe80::213:72ff:fe29:e65d  metric 0 
    cache  mtu 1500 advmss 1440 hoplimit 4294967295

You can see in both cases the routing table returns a 'src'
entry. 'src' is the address to bind to if no bind address was specified.

When doing link local addresess the sin6_scope_id should sets the
'oif' key in the routing lookup, which will result in the correct src
address and output device being selected by the routing algorithm. For
instance on my machine here, I have two interfaces:

$ ip route get fe80::c2d oif virbr0 
fe80::c2d via fe80::c2d dev virbr0  src fe80::2c5d:c4ff:feb8:1ce5  metric 0 
    cache  mtu 1500 advmss 1440 hoplimit 4294967295

As you can see it is returning the link local address for virbr0 as
the source.

So the algorithm in RDMA CM should look like this:
 - If src is specified then set the bind local address to src
   [if src is link local then it must specify sin6_scope_id, and
   sin6_scope_id becomes the oif input to the route lookup]
 - If dst is link local then its sin6_scope_id is the oif to the route
   lookup (and must match src, as we did last go round)
 - Src (or 'any'), dst and device (or 'any') are passed to the route
   lookup
 - The RDMA CM ID is bound to the device returned by the route lookup
 - If the src address was not specified then the connection source IP
   is set to the 'src' value from the route lookup.

This is why addr_resolve_local/rdma_translate_ip is not needed, that
entire entire function is done by the routing table code.

You can see why this becomes important when it is combined with policy
routing, for instance consider this example:

$ ip rule
32765:  from 10.0.0.4 lookup dnat
$ ip route show table dnat
default via 10.0.0.1 dev eth1
$ ip route get 10.0.0.100
10.0.0.100 dev eth0  src 10.0.0.2 
$ ip route get 10.0.0.100 from 10.0.0.4
10.0.0.100 from 10.0.0.4 via 10.0.0.1 dev eth1
    cache  mtu 1500 advmss 1460 hoplimit 64

The two results are radically different and dependant on the source
address. (10.0.0.4 could be attached to eth0, and eth1!)

The actual fixing to the code is not hard, remove rdma_translate_ip,
addr_resolve_local, split addr_resolve_remote into a part to resolve
the route and a part that does the arp/nd. Make the route resolve part
work almost exactly like addr4_resolve_remote (noting that the v6
version is wrong, since is doesn't respect unset source addres,
another bug). Call rdma_copy_addr based on the rt->idev->dev (or
should it be odev??). Do the ARP.

The pain is in retesting everything :|

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]                             ` <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-16 18:54                               ` David J. Wilder
       [not found]                                 ` <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-16 18:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA, wilder-r/Jw6+rmf7HQT0dZR+AlfA


On Thu, 2009-10-15 at 15:32 -0600, Jason Gunthorpe wrote:

> 
> The actual fixing to the code is not hard, remove rdma_translate_ip,
> addr_resolve_local, split addr_resolve_remote into a part to resolve
> the route and a part that does the arp/nd. Make the route resolve part
> work almost exactly like addr4_resolve_remote (noting that the v6
> version is wrong, since is doesn't respect unset source addres,
> another bug). Call rdma_copy_addr based on the rt->idev->dev (or
> should it be odev??). Do the ARP.
> 
> The pain is in retesting everything :|
> 
> Jason

There is the wrinkle in this plan.  If you pass ip6_route_output() your
own link-local address (scoped or not) it returns a neighbor entry bound
to the loop back device.

$ ip route get fe80::202:c903:1:28ed oif ib0
local fe80::202:c903:1:28ed from :: via :: dev lo  table local  proto
none  src fe80::202:c903:1:28ed  metric 0  mtu 16436 advmss 16376
hoplimit 4294967295

Thats not going to work as lo has no RDMA support.  I suspect this is
why addr_resolve_local() was added in the first place :)

I will post a patch with our first solution and include the fix to
addr_resolve_local(). 

 Dave.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH]  link-local address fix for rdma_resolve_addr
       [not found]                                 ` <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-16 19:28                                   ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-16 19:28 UTC (permalink / raw)
  To: David J. Wilder
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA, wilder-r/Jw6+rmf7HQT0dZR+AlfA

On Fri, Oct 16, 2009 at 11:54:39AM -0700, David J. Wilder wrote:

> There is the wrinkle in this plan.  If you pass ip6_route_output() your
> own link-local address (scoped or not) it returns a neighbor entry bound
> to the loop back device.

Well, that really is correct. The RDMA CM could try to do something
smart if lo is returned from the route lookup - ie choose the first
RDMA device bound to that IP or something.

There are other problems with the loop back case, like the Linux stack
doesn't generate neighbor entries for its own IP addresses.

The loopback case is far less important than the other policy routing
features anyhow..

> I will post a patch with our first solution and include the fix to
> addr_resolve_local(). 

May as well, a little bit of progress is better than none at all. 

Also note that I observed that the IPv6 code path never sets the
source address from the routing table if IN6_ADDR_ANY is used as
source, this is certainly wrong and could cause other problems down
the line, probably more so for iwarp than IB though..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH] link-local address fix for rdma_resolve_addr
@ 2009-10-19 22:47 David J. Wilder
       [not found] ` <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-19 22:47 UTC (permalink / raw)
  To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

Sean, Roland

Here is the updated patch that Jason and I discussed last week.

rdma_resolve_addr() returns an error when attempting to resolve ipv6
link-local address.  This patch fixes the handling of link-local address.

The patch was tested using rping run as such:

Link-local with scope:
# /usr/bin/rping -c -a <remote-system-link-local>%ib0

Link-local w/out scope: (expect failure)
# /usr/bin/rping -c -a <remote-system-link-local>
rdma_resolve_addr error -1

Own interface link local:
# /usr/bin/rping -c -a <remote-system-link-local>%ib0

Other ipv6 address:
# /usr/bin/rping -c -a 2001:db8:1234::2

(server side started with rping -s -P -v -a ::0)

Tested against ofed build OFED-1.5-20091019-0811 and kernel 2.6.30.

Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

------------------------------------------------------
 drivers/infiniband/core/addr.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..3442256 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
 	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
 
+	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+		if (!src_in->sin6_scope_id)
+			return -EINVAL;
+		fl.oif = src_in->sin6_scope_id;
+	}
+	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+		if (dst_in->sin6_scope_id) {
+			if (fl.oif && fl.oif != dst_in->sin6_scope_id)
+				return -EINVAL;
+			fl.oif = dst_in->sin6_scope_id;
+		}
+		if (!fl.oif)
+			return -EINVAL;
+	}
+
 	dst = ip6_route_output(&init_net, NULL, &fl);
 	if (!dst)
 		return ret;
@@ -390,14 +405,16 @@ static int addr_resolve_local(struct sockaddr *src_in,
 	case AF_INET6:
 	{
 		struct in6_addr *a;
+		int found = 0;
 
 		for_each_netdev(&init_net, dev)
 			if (ipv6_chk_addr(&init_net,
 					  &((struct sockaddr_in6 *) dst_in)->sin6_addr,
-					  dev, 1))
+					  dev, 1)) {
+				found = 1;
 				break;
-
-		if (!dev)
+			}
+		if (!found)
 			return -EADDRNOTAVAIL;
 
 		a = &((struct sockaddr_in6 *) src_in)->sin6_addr;
@@ -406,6 +423,8 @@ static int addr_resolve_local(struct sockaddr *src_in,
 			src_in->sa_family = dst_in->sa_family;
 			((struct sockaddr_in6 *) src_in)->sin6_addr =
 				((struct sockaddr_in6 *) dst_in)->sin6_addr;
+			((struct sockaddr_in6 *) src_in)->sin6_scope_id =
+				((struct sockaddr_in6 *) dst_in)->sin6_scope_id;
 			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
 		} else if (ipv6_addr_loopback(a)) {
 			ret = rdma_translate_ip(dst_in, addr);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found] ` <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-19 23:43   ` Jason Gunthorpe
       [not found]     ` <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-19 23:43 UTC (permalink / raw)
  To: David J. Wilder
  Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Mon, Oct 19, 2009 at 03:47:09PM -0700, David J. Wilder wrote:

> +++ b/drivers/infiniband/core/addr.c
> @@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
>  	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
>  	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
>  
> +	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
> +		if (!src_in->sin6_scope_id)
> +			return -EINVAL;
> +		fl.oif = src_in->sin6_scope_id;
> +	}

Seeing it all together like this make it clear this test needs to move
up the call chain and test the sockaddr passed from userspace, not
the one created by addr_resolve_local. Probably somewhere along the
rdma_resolve_addr -> cma_bind_addr -> rmda_bind_addr ->
rdma_translate_ip path. Maybe rdma_translate_ip should use and check
the scope as a temporary hack?

BTW, while researching the above comment, I'm not certain your last
patch is at all correct:

commit 85f20b39fd44310a163a9b33708fea57f08a4e40
    RDMA/addr: Fix resolution of local IPv6 addresses
    
    This patch allows a local IPv6 address to be resolved by rdma_cm.
    
    To reproduce the problem:
    
     $ rping -s -v -a ::0  &
     $ rping -c -v -a <IPv6 address local to this system>
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in,
 
                for_each_netdev(&init_net, dev)
                        if (ipv6_chk_addr(&init_net,
-                                         &((struct sockaddr_in6 *) addr)->sin6_addr,
+                                         &((struct sockaddr_in6 *) dst_in)->sin6_addr,
                                          dev, 1))
                                break;

I can believe it fixes the case you describe (ie loopback) but
matching the *dest* IP against the local interface's IP list cannot
possibly be right.

The primary problem is that for_each_netdev/ipv6_chk_addr is NOT the
same as ip_dev_find. ip_dev_find is a routing lookup, ipv6_chk_addr
compares the local address list. Not at all the same. I don't see a
route lookup helper for ipv6, so you have to code full flowi lookup.

With your change I expect ipv6 is 100% broken now for non loop cases?

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]     ` <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-19 23:47       ` Sean Hefty
       [not found]         ` <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-19 23:47 UTC (permalink / raw)
  To: 'Jason Gunthorpe', David J. Wilder
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in,
>
>                for_each_netdev(&init_net, dev)
>                        if (ipv6_chk_addr(&init_net,
>-                                         &((struct sockaddr_in6 *) addr)-
>>sin6_addr,
>+                                         &((struct sockaddr_in6 *) dst_in)-
>>sin6_addr,
>                                          dev, 1))
>                                break;
>
>I can believe it fixes the case you describe (ie loopback) but
>matching the *dest* IP against the local interface's IP list cannot
>possibly be right.

The intent is to see if the destination address is local.  A source address may
not be given.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]         ` <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-20  0:33           ` Jason Gunthorpe
       [not found]             ` <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-20  0:33 UTC (permalink / raw)
  To: Sean Hefty
  Cc: David J. Wilder, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Mon, Oct 19, 2009 at 04:47:59PM -0700, Sean Hefty wrote:
> >@@ -393,7 +393,7 @@ static int addr_resolve_local(struct sockaddr *src_in,
> >
> >                for_each_netdev(&init_net, dev)
> >                        if (ipv6_chk_addr(&init_net,
> >-                                         &((struct sockaddr_in6 *) addr)-
> >>sin6_addr,
> >+                                         &((struct sockaddr_in6 *) dst_in)-
> >>sin6_addr,
> >                                          dev, 1))
> >                                break;
> >
> >I can believe it fixes the case you describe (ie loopback) but
> >matching the *dest* IP against the local interface's IP list cannot
> >possibly be right.
> 
> The intent is to see if the destination address is local.  A source
> address may not be given.

Well, that makes more sense, but it still pretty strange to match the
IP list like that, the proper thing is to query RT6_TABLE_LOCAL, like
the IPv4 case does.

Anyhow, couldn't the whole addr_resolve_local routine be replaced with
something like this in addr_resolve_remote:
 if (rt->idev == init_net->loopback_dev)
    rdma_translate_ip(rt->rt_src, dev_addr, NULL);

for IPv4 and similar for IPv6?

That does query the proper RT_TABLEs to determine if the IP is local
and then we get the searching and ip_dev_find only for the case where
the address is definitely looped back. Much closer to how the IP stack
works normally.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]             ` <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-21 22:30               ` David J. Wilder
       [not found]                 ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-21 22:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, Sean Hefty, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5


On Mon, 2009-10-19 at 18:33 -0600, Jason Gunthorpe wrote:

> Well, that makes more sense, but it still pretty strange to match the
> IP list like that, the proper thing is to query RT6_TABLE_LOCAL, like
> the IPv4 case does.
> 
> Anyhow, couldn't the whole addr_resolve_local routine be replaced with
> something like this in addr_resolve_remote:
>  if (rt->idev == init_net->loopback_dev)
>     rdma_translate_ip(rt->rt_src, dev_addr, NULL);
> 
> for IPv4 and similar for IPv6?
> 
> That does query the proper RT_TABLEs to determine if the IP is local
> and then we get the searching and ip_dev_find only for the case where
> the address is definitely looped back. Much closer to how the IP stack
> works normally.
> 
> Jason

Jason, Sean
I have separated the patches into three to make them simpler to discuss.
One patch fixes the handling of link-local address in
addr6_resolve_remote(). The second patch fixes the bugs in
addr_resolve_local().  The 3d patch I am posting now for discussion.

This patch,  as Jason's suggested,  moves the function of addr_resolve_local() into addr4_resolve_remote()
and addr6_resolve_remote().  It eliminates the need for addr_resolve_local(). 

 drivers/infiniband/core/addr.c |   99 +++++++--------------------------------
 1 files changed, 18 insertions(+), 81 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..f7a5861 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -233,6 +233,14 @@ static int addr4_resolve_remote(struct sockaddr_in *src_in,
 	if (ret)
 		goto out;
 
+	if (rt->idev->dev == init_net.loopback_dev){
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+				MAX_ADDR_LEN);
+		goto put;
+	}
+
 	/* If the device does ARP internally, return 'done' */
 	if (rt->idev->dev->flags & IFF_NOARP) {
 		rdma_copy_addr(addr, rt->idev->dev, NULL);
@@ -282,6 +290,14 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 	if (!dst)
 		return ret;
 
+	if (dst->dev == init_net.loopback_dev) {
+		ret=rdma_translate_ip((struct sockaddr *)dst_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+				MAX_ADDR_LEN);
+		goto release;
+	}
+
 	if (dst->dev->flags & IFF_NOARP) {
 		ret = rdma_copy_addr(addr, dst->dev, NULL);
 	} else {
@@ -289,7 +305,7 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 		if (neigh && (neigh->nud_state & NUD_VALID))
 			ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
 	}
-
+release:
 	dst_release(dst);
 	return ret;
 }
@@ -352,82 +368,6 @@ static void process_req(struct work_struct *work)
 	}
 }
 
-static int addr_resolve_local(struct sockaddr *src_in,
-			      struct sockaddr *dst_in,
-			      struct rdma_dev_addr *addr)
-{
-	struct net_device *dev;
-	int ret;
-
-	switch (dst_in->sa_family) {
-	case AF_INET:
-	{
-		__be32 src_ip = ((struct sockaddr_in *) src_in)->sin_addr.s_addr;
-		__be32 dst_ip = ((struct sockaddr_in *) dst_in)->sin_addr.s_addr;
-
-		dev = ip_dev_find(&init_net, dst_ip);
-		if (!dev)
-			return -EADDRNOTAVAIL;
-
-		if (ipv4_is_zeronet(src_ip)) {
-			src_in->sa_family = dst_in->sa_family;
-			((struct sockaddr_in *) src_in)->sin_addr.s_addr = dst_ip;
-			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
-		} else if (ipv4_is_loopback(src_ip)) {
-			ret = rdma_translate_ip(dst_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		} else {
-			ret = rdma_translate_ip(src_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		}
-		dev_put(dev);
-		break;
-	}
-
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
-	case AF_INET6:
-	{
-		struct in6_addr *a;
-
-		for_each_netdev(&init_net, dev)
-			if (ipv6_chk_addr(&init_net,
-					  &((struct sockaddr_in6 *) dst_in)->sin6_addr,
-					  dev, 1))
-				break;
-
-		if (!dev)
-			return -EADDRNOTAVAIL;
-
-		a = &((struct sockaddr_in6 *) src_in)->sin6_addr;
-
-		if (ipv6_addr_any(a)) {
-			src_in->sa_family = dst_in->sa_family;
-			((struct sockaddr_in6 *) src_in)->sin6_addr =
-				((struct sockaddr_in6 *) dst_in)->sin6_addr;
-			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
-		} else if (ipv6_addr_loopback(a)) {
-			ret = rdma_translate_ip(dst_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		} else  {
-			ret = rdma_translate_ip(src_in, addr);
-			if (!ret)
-				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
-		}
-		break;
-	}
-#endif
-
-	default:
-		ret = -EADDRNOTAVAIL;
-		break;
-	}
-
-	return ret;
-}
-
 int rdma_resolve_ip(struct rdma_addr_client *client,
 		    struct sockaddr *src_addr, struct sockaddr *dst_addr,
 		    struct rdma_dev_addr *addr, int timeout_ms,
@@ -455,10 +395,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 	src_in = (struct sockaddr *) &req->src_addr;
 	dst_in = (struct sockaddr *) &req->dst_addr;
 
-	req->status = addr_resolve_local(src_in, dst_in, addr);
-	if (req->status == -EADDRNOTAVAIL)
-		req->status = addr_resolve_remote(src_in, dst_in, addr);
-
+	req->status = addr_resolve_remote(src_in, dst_in, addr);
 	switch (req->status) {
 	case 0:
 		req->timeout = jiffies;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                 ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-21 23:08                   ` Jason Gunthorpe
       [not found]                     ` <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-21 23:17                   ` [PATCH] link-local address fix for rdma_resolve_addr Sean Hefty
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-21 23:08 UTC (permalink / raw)
  To: David J. Wilder
  Cc: linux-rdma, Sean Hefty, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 21, 2009 at 03:30:29PM -0700, David J. Wilder wrote:

> addr6_resolve_remote(). The second patch fixes the bugs in
> addr_resolve_local().  The 3d patch I am posting now for discussion.
> 
> This patch, as Jason's suggested, moves the function of
> addr_resolve_local() into addr4_resolve_remote() and
> addr6_resolve_remote().  It eliminates the need for
> addr_resolve_local().

This looks exactly like what I was thinking of - have you tested this?

If it is OK, I'd make it the first in the series.

There were two things I was not sure about in my example.
 1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or
    is it something like dev_net(rt->idev->dev)->loopback_dev ?

    I'm sensing it may be the latter, but can't investigate right now
    Donno much about this new namespace stuff really

 2) Was rt->idev->dev the right choice for the ipv4 case? Or is it
    rt->u.dst.dev ?

    The TCP case kinda looks like
int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
        tmp = ip_route_connect(&rt, nexthop, inet->saddr,
                               RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
                               IPPROTO_TCP,
                               inet->sport, usin->sin_port, sk, 1);
        sk_setup_caps(sk, &rt->u.dst);

void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
        __sk_dst_set(sk, dst);

    And all later things key off the sk_get_dst. So I'm thinking
    that u.dst.dev might be correct.

    I have no idea what the difference is though (can't look too hard
    right now)

The main other fixup I see is to remove
                ret = cma_bind_addr(id, src_addr, dst_addr);

>From rdma_resolve_addr and rely on the routing lookup in
addr_resolve_remote called by addr_resolve_ip to setup the bind device
from the routing lookup. (This is what I mentioned in my last email)

Which then lets you fixup the checking and handling of the
sin6_scopeid on the source address - and fixes the main other routing
difference against the TCP stack.

Thanks for working on this!

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                 ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  2009-10-21 23:08                   ` Jason Gunthorpe
@ 2009-10-21 23:17                   ` Sean Hefty
       [not found]                     ` <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  1 sibling, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-21 23:17 UTC (permalink / raw)
  To: 'David J. Wilder', Jason Gunthorpe
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>This patch,  as Jason's suggested,  moves the function of addr_resolve_local()
>into addr4_resolve_remote()
>and addr6_resolve_remote().  It eliminates the need for addr_resolve_local().

One quick comment, remove '_remote' from function names:
addr4_resolve_remote, addr6_resolve_remote, and addr_resolve_remote

> drivers/infiniband/core/addr.c |   99 +++++++--------------------------------
> 1 files changed, 18 insertions(+), 81 deletions(-)
>
>diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
>index bd07803..f7a5861 100644
>--- a/drivers/infiniband/core/addr.c
>+++ b/drivers/infiniband/core/addr.c
>@@ -233,6 +233,14 @@ static int addr4_resolve_remote(struct sockaddr_in
>*src_in,
> 	if (ret)
> 		goto out;
>
>+	if (rt->idev->dev == init_net.loopback_dev){
>+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
>+		if (!ret)
>+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
>+				MAX_ADDR_LEN);
>+		goto put;
>+	}

This doesn't end up doing the same thing as what resolve_local did.  It only
matches up with the 'else if' portion below: 
 
>-static int addr_resolve_local(struct sockaddr *src_in,
>-			      struct sockaddr *dst_in,
>-			      struct rdma_dev_addr *addr)
>-{
>-	struct net_device *dev;
>-	int ret;
>-
>-	switch (dst_in->sa_family) {
>-	case AF_INET:
>-	{
>-		__be32 src_ip = ((struct sockaddr_in *)
src_in)->sin_addr.s_addr;
>-		__be32 dst_ip = ((struct sockaddr_in *)
dst_in)->sin_addr.s_addr;
>-
>-		dev = ip_dev_find(&init_net, dst_ip);
>-		if (!dev)
>-			return -EADDRNOTAVAIL;
>-
>-		if (ipv4_is_zeronet(src_ip)) {
>-			src_in->sa_family = dst_in->sa_family;
>-			((struct sockaddr_in *) src_in)->sin_addr.s_addr =
dst_ip;
>-			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
>-		} else if (ipv4_is_loopback(src_ip)) {
>-			ret = rdma_translate_ip(dst_in, addr);
>-			if (!ret)
>-				memcpy(addr->dst_dev_addr, dev->dev_addr,
>MAX_ADDR_LEN);
>-		} else {
>-			ret = rdma_translate_ip(src_in, addr);
>-			if (!ret)
>-				memcpy(addr->dst_dev_addr, dev->dev_addr,
>MAX_ADDR_LEN);
>-		}

We need to handle the case where the source address is not given and provide
one.  We also need to handle the case where the source address is given, but may
not match the destination address, or even use the same RDMA device.

I didn't look at the ipv6 changes yet.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                     ` <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-21 23:36                       ` Jason Gunthorpe
       [not found]                         ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-21 23:36 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 21, 2009 at 04:17:01PM -0700, Sean Hefty wrote:

> This doesn't end up doing the same thing as what resolve_local did.  It only
> matches up with the 'else if' portion below: 

It does all the behaviors, it passes the input into a routing lookup
- though it is true the source needs to be passed into the routing
lookup, which I think David had in another patch. Dave; I guess you
need to include that here now.

So, first case:

> >-		if (ipv4_is_zeronet(src_ip)) {

Looks like:
$ ip route get 10.0.0.11
local 10.0.0.11 dev lo  src 10.0.0.11 
    cache <local>  mtu 16436 advmss 16396 hoplimit 64

trips through the if == loopback and does
'rdma_translate_ip(10.0.0.11)'

Same as the old code (little different path, but seems to give the
same result)

Second case:

> >-		} else if (ipv4_is_loopback(src_ip)) {

Looks like:
$ ip route get 10.0.0.11 from 127.0.0.1
local 10.0.0.11 from 127.0.0.1 dev lo 
    cache <local>  mtu 16436 advmss 16396 hoplimit 64

so trips through the if == loopback and does rdma_translate_ip(10.0.0.11)

Same as the old code

Third case:
> >-		} else {

$ ip route get 10.0.0.11 from 10.0.0.11
local 10.0.0.11 from 10.0.0.11 dev lo 
    cache <local>  mtu 16436 advmss 16396 hoplimit 64

Again, does rdma_translate_ip(10.0.0.11)

And the weird case is different:

$ ip route get 10.0.0.11 from 192.168.122.1
local 10.0.0.11 from 192.168.122.1 dev lo 
    cache <local>  mtu 16436 advmss 16396 hoplimit 64

(192.168.122.1 is bound to a different device on my system than
10.0.0.11)

The new case trips the if == loopback and does
rdma_translate_ip(10.0.0.11)

The old case does rdma_translate_ip(192.168.122.1)

I don't think this is a significant difference, both behaviors are
reasonable choices and the code/complexity savings are worth it, IMHO.
[Alternatively, I suppose one could call rdma_translate_ip(rt->rt_src)
and if that fails then do rdma_translate_ip(dst_in) but why bother?]

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                         ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-21 23:46                           ` Jason Gunthorpe
  2009-10-21 23:58                           ` Sean Hefty
  1 sibling, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-21 23:46 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 21, 2009 at 05:36:39PM -0600, Jason Gunthorpe wrote:

> > >-		if (ipv4_is_zeronet(src_ip)) {
> 
> Looks like:
> $ ip route get 10.0.0.11
> local 10.0.0.11 dev lo  src 10.0.0.11 
>     cache <local>  mtu 16436 advmss 16396 hoplimit 64
> 
> trips through the if == loopback and does
> 'rdma_translate_ip(10.0.0.11)'
> 
> Same as the old code (little different path, but seems to give the
> same result)

Oops, there is a little woopsie here:

Dave:

+	if (rt->idev->dev == init_net.loopback_dev){
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+				MAX_ADDR_LEN);
+		goto put;
+	}
+

The 'goto put' will skip over the source address assignment step.

Maybe move this:

        if (!src_ip) {
                src_in->sin_family = dst_in->sin_family;
                src_in->sin_addr.s_addr = rt->rt_src;
        }

Up above your if (rt->idev)

And similarly for v6.

Also add a src_in = rt->rt_src assignment for v6.

I'd also remove the test for 0 address, just do it unconditionally
(IIRC routing table always returns src if src is not 0)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                         ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-21 23:46                           ` Jason Gunthorpe
@ 2009-10-21 23:58                           ` Sean Hefty
       [not found]                             ` <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  1 sibling, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-21 23:58 UTC (permalink / raw)
  To: 'Jason Gunthorpe'
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>$ ip route get 10.0.0.11 from 192.168.122.1
>local 10.0.0.11 from 192.168.122.1 dev lo
>    cache <local>  mtu 16436 advmss 16396 hoplimit 64
>
>(192.168.122.1 is bound to a different device on my system than
>10.0.0.11)
>
>The new case trips the if == loopback and does
>rdma_translate_ip(10.0.0.11)
>
>The old case does rdma_translate_ip(192.168.122.1)
>
>I don't think this is a significant difference, both behaviors are
>reasonable choices and the code/complexity savings are worth it, IMHO.
>[Alternatively, I suppose one could call rdma_translate_ip(rt->rt_src)
>and if that fails then do rdma_translate_ip(dst_in) but why bother?]

I agree this usage case is weird, but I'm still not sure about the patch.  If
the destination service is listening on 10.0.0.11, then the listen is bound to a
different gid than the source gid that we're trying to connect from.  The
src_dev_addr and dst_dev_addr need to reflect this, so that a PR gives us a path
that routes the CM REQ correctly.

- Sean 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                             ` <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-22  0:28                               ` Jason Gunthorpe
       [not found]                                 ` <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-22  0:28 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 21, 2009 at 04:58:02PM -0700, Sean Hefty wrote:
> >$ ip route get 10.0.0.11 from 192.168.122.1
> >local 10.0.0.11 from 192.168.122.1 dev lo
> >    cache <local>  mtu 16436 advmss 16396 hoplimit 64
> >
> >(192.168.122.1 is bound to a different device on my system than
> >10.0.0.11)
> >
> >The new case trips the if == loopback and does
> >rdma_translate_ip(10.0.0.11)
> >
> >The old case does rdma_translate_ip(192.168.122.1)
> >
> >I don't think this is a significant difference, both behaviors are
> >reasonable choices and the code/complexity savings are worth it, IMHO.
> >[Alternatively, I suppose one could call rdma_translate_ip(rt->rt_src)
> >and if that fails then do rdma_translate_ip(dst_in) but why bother?]
> 
> I agree this usage case is weird, but I'm still not sure about the
> patch.  If the destination service is listening on 10.0.0.11, then
> the listen is bound to a different gid than the source gid that
> we're trying to connect from.  The src_dev_addr and dst_dev_addr
> need to reflect this, so that a PR gives us a path that routes the
> CM REQ correctly.

You've lost me.. 

The local loopback case uses PRs?

Even so, it still seems OK to me:

Path:
 addr4_resolve_remote
  $ ip route get 10.0.0.11 from 192.168.122.1
    local 10.0.0.11 from 192.168.122.1 dev lo
  srcIP = 192.168.122.1
  rdma_translate_ip(dst_ip = 10.0.0.11)
   rdma_copy_addr("eth0");
    src_dev_addr = eth0.dev_addr  (ie GID of 10.0.0.11)
  memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11)

So everthing is bound to the GID of 10.0.0.11 which matches the listen
of 10.0.0.11, which seems OK.

The CM REP is sent to GID of 10.0.0.11 from GID of 10.0.0.11, with an
private data that indicates a source IP of 192.168.122.1 and a dest IP
of 10.0.0.11. This seems optimal to me?

By what I understand you said I would think the prior version would be
the one that is in trouble, if I am listening on 10.0.0.11 and you
make a connection sourced from 192.168.122.1, which is looped back, it
seems reasonable to me that this will succeed and the outgoing
connection would necessarily have be bound to the destination address
since the listening process is attached there.

Am I missing something?

Remember, there is still one more missing part there, the cma_bind I
was talking about has to go away. We want to bind the source device
based on the routing lookup, with loopback binding to the destination
device.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                 ` <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22  0:40                                   ` Sean Hefty
       [not found]                                     ` <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-22  0:40 UTC (permalink / raw)
  To: 'Jason Gunthorpe'
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>The local loopback case uses PRs?

Yes - the rdma cm makes no distinction when resolve route is called.  It does a
PR query.

>Even so, it still seems OK to me:
>
>Path:
> addr4_resolve_remote
>  $ ip route get 10.0.0.11 from 192.168.122.1
>    local 10.0.0.11 from 192.168.122.1 dev lo
>  srcIP = 192.168.122.1
>  rdma_translate_ip(dst_ip = 10.0.0.11)
>   rdma_copy_addr("eth0");
>    src_dev_addr = eth0.dev_addr  (ie GID of 10.0.0.11)
>  memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11)
>
>So everthing is bound to the GID of 10.0.0.11 which matches the listen
>of 10.0.0.11, which seems OK.

The source could have called rdma_bind_addr(192.168.122.1) prior to calling
rdma_resolve_addr().  (DAPL does this.)  This would have returned a different
RDMA device than binding to 10.0.0.11.  The client app could have allocated
resources on that device, but the CM REQ will carry the gid/lid of the other
device.  The endpoints won't be able to communicate.

Yes, it's weird, and may not be optimal, but if a source address is explicitly
given, then its mapping to a specific RDMA device should be honored.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                     ` <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-22  1:19                                       ` Jason Gunthorpe
       [not found]                                         ` <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-22  1:19 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

On Wed, Oct 21, 2009 at 05:40:30PM -0700, Sean Hefty wrote:

> >Even so, it still seems OK to me:
> >
> >Path:
> > addr4_resolve_remote
> >  $ ip route get 10.0.0.11 from 192.168.122.1
> >    local 10.0.0.11 from 192.168.122.1 dev lo
> >  srcIP = 192.168.122.1
> >  rdma_translate_ip(dst_ip = 10.0.0.11)
> >   rdma_copy_addr("eth0");
> >    src_dev_addr = eth0.dev_addr  (ie GID of 10.0.0.11)
> >  memcpy(dst_dev_addr = src_dev_addr) (ie GID of 10.0.0.11)
> >
> >So everthing is bound to the GID of 10.0.0.11 which matches the listen
> >of 10.0.0.11, which seems OK.
> 
> The source could have called rdma_bind_addr(192.168.122.1) prior to calling
> rdma_resolve_addr().  (DAPL does this.)  This would have returned a different
> RDMA device than binding to 10.0.0.11.  The client app could have allocated
> resources on that device, but the CM REQ will carry the gid/lid of the other
> device.  The endpoints won't be able to communicate.

That is very difficult to fit into the semantics the IP routing
model uses :( And it looks like an API problem in DAPL :(

So, I see now, you are proposing that in this case the connection
attempt to be routed through the network and not looped back..  I
actually have a big problem with that, ignoring a 'lo' entry in a
routing table is very much not IP like and not a good idea. That
should be respected..

I guess I'd much rather see that one situation return EHOSTUNREACH or
something.

But, I suppose you are going to tell me that Intel MPI uses DAPL to
loopback connect to other processes on the same node, and relies on
this? :( :( :(

Sigh. Anyhow, lets not get side tracked. It seems to me, the easy way
out for David's approach is to simply check if the device is already
bound via rdma_bind() and if so force it to that device no matter what
the routing table lookup returns. Can you suggest a reliable way to
make that check?

[What happens now if I do this:
 rdma_bind(10.0.0.11)
 rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11)
Does the cma_bind path check that it is already bound and give out an
error? too late for me to check]

Once the cma_bind for rdma_resolve_addr is moved into the
addr_resolve_remote function then people using the API without calling
bind on the client path will get sane IP-like behavior.

> Yes, it's weird, and may not be optimal, but if a source address is
> explicitly given, then its mapping to a specific RDMA device should
> be honored.

Remember, on Linux the IP is *not* attached to a device, it is part of
the host itself. So the idea that a source address somehow specifies a
RDMA device does not fit into the Linux IP networking model.

Unfortunately the definition of rdma_bind kinda bakes this mismatched
model into the API :(

Truth be told, to fit the Linux IP model, the RDMA CM should have
provided exactly only two ways to bind a cm_id to a specific device -
rdma_accept and rdma_resolve_addr.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                         ` <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22  5:24                                           ` Sean Hefty
       [not found]                                             ` <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-22  5:24 UTC (permalink / raw)
  To: 'Jason Gunthorpe'
  Cc: 'David J. Wilder', rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma, pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>That is very difficult to fit into the semantics the IP routing
>model uses :( And it looks like an API problem in DAPL :(

This depends on if your view is that the rdma cm is trying to match IP routing,
trying to use IP addresses as convenient names for RDMA ports, or something in
between that may lean more one way or the other depending on the device type.
IMO, the primary reason for using IP addressing over IB is more for convenience,
than compliance.

>So, I see now, you are proposing that in this case the connection
>attempt to be routed through the network and not looped back..  I
>actually have a big problem with that, ignoring a 'lo' entry in a
>routing table is very much not IP like and not a good idea. That
>should be respected..

I hesitate comparing RDMA versus IP too closely.

>Sigh. Anyhow, lets not get side tracked. It seems to me, the easy way
>out for David's approach is to simply check if the device is already
>bound via rdma_bind() and if so force it to that device no matter what
>the routing table lookup returns. Can you suggest a reliable way to
>make that check?

I'm not entirely sure of the best way to test this.  Checking the src_dev_addr
is my first thought.

>[What happens now if I do this:
> rdma_bind(10.0.0.11)
> rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11)
>Does the cma_bind path check that it is already bound and give out an
>error? too late for me to check]

rdma_resolve_addr only calls bind if the rdma id is not already bound.  The
src_addr simply gets ignored in this case, and the bound address is used
instead.

>Once the cma_bind for rdma_resolve_addr is moved into the
>addr_resolve_remote function then people using the API without calling
>bind on the client path will get sane IP-like behavior.

I like the approach, I'm just concerned about the case where the app has
requested a specific source address, which today means that a specific RDMA
device should be used.

>Remember, on Linux the IP is *not* attached to a device, it is part of
>the host itself. So the idea that a source address somehow specifies a
>RDMA device does not fit into the Linux IP networking model.

I would say we're extending the linux networking model to incorporate this
concept, while staying in our imposed little RDMA sand box world.  Even iWarp
has these issues.

>Truth be told, to fit the Linux IP model, the RDMA CM should have
>provided exactly only two ways to bind a cm_id to a specific device -
>rdma_accept and rdma_resolve_addr.

I think this is more restrictive than things need to be.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                             ` <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-22  6:25                                               ` Jason Gunthorpe
       [not found]                                                 ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-22  6:25 UTC (permalink / raw)
  To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma

On Wed, Oct 21, 2009 at 10:24:29PM -0700, Sean Hefty wrote:
> >That is very difficult to fit into the semantics the IP routing
> >model uses :( And it looks like an API problem in DAPL :(
> 
> This depends on if your view is that the rdma cm is trying to match IP routing,
> trying to use IP addresses as convenient names for RDMA ports, or something in
> between that may lean more one way or the other depending on the device type.
> IMO, the primary reason for using IP addressing over IB is more for convenience,
> than compliance.

I more or less agree for IB, but iWarp really should match exactly,
and TBH I'm disappointed that no iWarp folks ever reviewed this stuff
for consistency with the IP stack, it goes to show that netdev wasn't
far off their criticisms I guess :(

> >[What happens now if I do this:
> > rdma_bind(10.0.0.11)
> > rdma_resolve_addr(src = 192.168.122.1 dst = 10.0.0.11)
> >Does the cma_bind path check that it is already bound and give out an
> >error? too late for me to check]
> 
> rdma_resolve_addr only calls bind if the rdma id is not already bound.  The
> src_addr simply gets ignored in this case, and the bound address is used
> instead.

This should get updated to return -EINVAL for this case I think, since
it is nonsense. Either src = null or src == bind addr
 
> >Truth be told, to fit the Linux IP model, the RDMA CM should have
> >provided exactly only two ways to bind a cm_id to a specific device -
> >rdma_accept and rdma_resolve_addr.
> 
> I think this is more restrictive than things need to be.

You get somethings and you loose somethings.. The Linux IP model is
not choosen arbitarily, there are solid reasons why it works like that.

Mainly for RDMA what you get is more kernel flexability, IP like
capabilities, better bonding, and better support of IB APM semantics:
 - bind() + listen() actually works properly if more than one
   interface is bound to the same IP - the cm_id returned by accept is
   bound to the hca and port that accepted the connection
   [ This is a L3 form of bonding Linux supports ]

   This is actually something of a mandatory notion to implement the
   full generality of the IB CM protocol which allows the CM REP to
   contain a port GUID of another port on the same node (multi-port
   APM is an IB feature). So you never know what port the accept()
   result will get bound to.

   BTW: I suppose ideally AF_IB would have a way to say 'CM accept REPs on
   any port on this node' Hmm, reserved GID prefix perhaps? Hmm.

   When used with bonding this would also afford the kernel with the
   ability to accept incoming connections across all the redundant
   RDMA devices - and still have correct bound-to-IP semantics.

 - rdma_resolve_addr more or less as the inverse of all the above
   * multiple interfaces with same IP case works, kernel and routing
     table can distribute outgoing connections
   * multi-port APM works, kernel and user space can choose primary
     and backup port for the IP addy
   * bonding works, kernel can balance outgoing connections across the
     bond slaves.

These are all useful features.

Could we have had the above and still had pretty much enough
flexability? Yes, I think so. The main stumbling block, IMHO, seems to
have been the tying of PDs and a few other non QP verbs objects to a
ibv_context, it should have been the other way around - a PD, comp
channel, etc are fist class kernel objects that exist across all
ports and devices. When a QP is connected to them then the kernel
would setup the HW to match the PD, MRs, etc.

CQs and SRQs would still have to be created after an accpt/route_addr,
but that seems pretty mild <shrug>

But that is of course not how things worked out, it would be possible
to get there from here, but I doubt there is any interest..
It is late, I'm just musing what could have been :)

Still, it must be a total nightmare to write a daemon that listens on
INADDR_ANY, works correctly with multiple devices plus hotplug, and
dosn't leak PDs or something silly - and that is a shame.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                 ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22  6:49                                                   ` Sean Hefty
       [not found]                                                     ` <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2009-10-25 11:52                                                   ` Or Gerlitz
  1 sibling, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-22  6:49 UTC (permalink / raw)
  To: 'Jason Gunthorpe'; +Cc: 'David J. Wilder', linux-rdma

>   This is actually something of a mandatory notion to implement the
>   full generality of the IB CM protocol which allows the CM REP to
>   contain a port GUID of another port on the same node (multi-port
>   APM is an IB feature). So you never know what port the accept()
>   result will get bound to.

You can redirect (REJ) a REQ to a different port, but you can't accept (REP) a
connection on a different port.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                     ` <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-22 16:47                                                       ` Jason Gunthorpe
       [not found]                                                         ` <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-22 16:47 UTC (permalink / raw)
  To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma

On Wed, Oct 21, 2009 at 11:49:52PM -0700, Sean Hefty wrote:
> >   This is actually something of a mandatory notion to implement the
> >   full generality of the IB CM protocol which allows the CM REP to
> >   contain a port GUID of another port on the same node (multi-port
> >   APM is an IB feature). So you never know what port the accept()
> >   result will get bound to.
> 
> You can redirect (REJ) a REQ to a different port, but you can't accept (REP) a
> connection on a different port.

Do you have a compliance statment for that? :)

Spec says:
 
 The initiator is responsible for proposing the Port Addresses
 (Primary and optional Alternate) that the target (REQ recipient) is to
 use for the channel.  Based on the path defined by those port
 addresses, the initiator provides timeout information and the Service
 Level to be used by the target for any messages that it initiates. The
 SL from initiator to target need not be the same as from target to
 initiator, but the SL that is contained within the REQ message is the
 one that the initiator would prefer the target use. Path in- formation
 is available from Subnet Administration (see section 15.2.5.16
 PathRecord).

'Proposing the Port Addresses' == Any Port On the Node, IMHO.

Someone went to alot of trouble to include enough unnecessary
information to make that possible, and it is necessary for multi port
APM.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                         ` <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22 17:59                                                           ` Sean Hefty
       [not found]                                                             ` <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-22 17:59 UTC (permalink / raw)
  To: 'Jason Gunthorpe'; +Cc: 'David J. Wilder', linux-rdma

>> You can redirect (REJ) a REQ to a different port, but you can't accept (REP)
>a
>> connection on a different port.
>
>Do you have a compliance statment for that? :)

The REP doesn't carry port (i.e. GID) information.

>Spec says:
>
> The initiator is responsible for proposing the Port Addresses
> (Primary and optional Alternate) that the target (REQ recipient) is to
> use for the channel.  Based on the path defined by those port
> addresses, the initiator provides timeout information and the Service
> Level to be used by the target for any messages that it initiates. The
> SL from initiator to target need not be the same as from target to
> initiator, but the SL that is contained within the REQ message is the
> one that the initiator would prefer the target use. Path in- formation
> is available from Subnet Administration (see section 15.2.5.16
> PathRecord).
>
>'Proposing the Port Addresses' == Any Port On the Node, IMHO.

Under reject reasons:

12 Primary Remote Port GID rejected
The recipient of the REQ message could not (or would not) accept the Primary
Remote Port GID
GID of acceptable port.

13 Primary Remote Port LID rejected
The recipient of the REQ message could not (or would not) accept the Primary
Remote Port LID
LID of acceptable port.

I interpret this as the REQ carries the proposal.  If the passive side aggress
with the proposal, it accepts the connection, otherwise it rejects.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                             ` <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-22 18:22                                                               ` Jason Gunthorpe
       [not found]                                                                 ` <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-22 18:22 UTC (permalink / raw)
  To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma

On Thu, Oct 22, 2009 at 10:59:37AM -0700, Sean Hefty wrote:
> >> You can redirect (REJ) a REQ to a different port, but you can't accept (REP)
> >a
> >> connection on a different port.
> >
> >Do you have a compliance statment for that? :)
> 
> The REP doesn't carry port (i.e. GID) information.

Why would it need to? The REQ contains all the port information
necessary, the active side doesn't get to pick the port, it just gets
to approve it.

> Under reject reasons:
> 
> 12 Primary Remote Port GID rejected
> The recipient of the REQ message could not (or would not) accept the Primary
> Remote Port GID
> GID of acceptable port.
> 
> 13 Primary Remote Port LID rejected
> The recipient of the REQ message could not (or would not) accept the Primary
> Remote Port LID
> LID of acceptable port.
> 
> I interpret this as the REQ carries the proposal.  If the passive side aggress
> with the proposal, it accepts the connection, otherwise it rejects.

I agree.

But the proposal contains a complete description of the data carrying
port (GUID, GID, LID) and there is *nothing* in the spec that says
that description must match the port the GMP is received on.

Don't get confused by GMP redirection, that is yet again something
else :)

So, IBA allows the active side to propose a data connection with
primary path on Port 1, alternate path on Port 2 and send the GMP on
Port 2.

The passive CM then looks at this and decides if it is within local
policy constraints, if not it returns 12 or 13.

Thus, you cannot know what port the data QP will be bound to until you
process the REQ - just like IP.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                                 ` <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22 18:31                                                                   ` Sean Hefty
       [not found]                                                                     ` <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-22 18:31 UTC (permalink / raw)
  To: 'Jason Gunthorpe'; +Cc: 'David J. Wilder', linux-rdma

>Why would it need to? The REQ contains all the port information
>necessary, the active side doesn't get to pick the port, it just gets
>to approve it.

I'm not following you at all now.  You said:

This is actually something of a mandatory notion to implement the
full generality of the IB CM protocol which allows the CM REP to
contain a port GUID of another port on the same node (multi-port
APM is an IB feature). So you never know what port the accept()
result will get bound to.

The REP does not contain a port GUID, so what exactly were you trying to say
here?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                                     ` <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-22 18:40                                                                       ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-22 18:40 UTC (permalink / raw)
  To: Sean Hefty; +Cc: 'David J. Wilder', linux-rdma

On Thu, Oct 22, 2009 at 11:31:07AM -0700, Sean Hefty wrote:
> The REP does not contain a port GUID, so what exactly were you trying to say
> here?

Sorry, typo, it was too late. I ment REQ.

The point was, that you don't know what port the QP will be bound to
until you get the REQ (ie the RDMA CM user does rdma_accept()). So
early binding limits capabilities.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                     ` <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22 21:12                       ` David J. Wilder
       [not found]                         ` <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-22 21:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sean Hefty, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5


On Wed, 2009-10-21 at 17:08 -0600, Jason Gunthorpe wrote:


> This looks exactly like what I was thinking of - have you tested this?

Yes I did do some testing, but that brings up a good question.  I am not
sure I know what all should be tested?  I am running rping with
different destination address (and scoping).  On the ipv4 side:
rping -c -a <ip-of-my-ib0-interface>
rping -c -a <ip-of-remote-nodes-ib0-interface>

For ipv6 I ran what I described previously.  What I do need to do is add
the option to rping to specify a source address and run it with various
address.  Any help you can give defining what exactly needs to be tested
would be appreciated. 
 
> 
> If it is OK, I'd make it the first in the series.
> 
> There were two things I was not sure about in my example.
>  1) Is 'init_net.loopback_dev' the correct reference for the loop device? Or
>     is it something like dev_net(rt->idev->dev)->loopback_dev ?
> 
>     I'm sensing it may be the latter, but can't investigate right now
>     Donno much about this new namespace stuff really

I think you may be correct I will look at that closer.  I did explicitly
verify the test worked in both cases.
 
> 
>  2) Was rt->idev->dev the right choice for the ipv4 case? Or is it
>     rt->u.dst.dev ?
> 
>     The TCP case kinda looks like
> int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>         tmp = ip_route_connect(&rt, nexthop, inet->saddr,
>                                RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
>                                IPPROTO_TCP,
>                                inet->sport, usin->sin_port, sk, 1);
>         sk_setup_caps(sk, &rt->u.dst);
> 
> void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
>         __sk_dst_set(sk, dst);
> 
>     And all later things key off the sk_get_dst. So I'm thinking
>     that u.dst.dev might be correct.
> 
>     I have no idea what the difference is though (can't look too hard
>     right now)
> 
> The main other fixup I see is to remove
>                 ret = cma_bind_addr(id, src_addr, dst_addr);
> 
> From rdma_resolve_addr and rely on the routing lookup in
> addr_resolve_remote called by addr_resolve_ip to setup the bind device
> from the routing lookup. (This is what I mentioned in my last email)
> 
> Which then lets you fixup the checking and handling of the
> sin6_scopeid on the source address - and fixes the main other routing
> difference against the TCP stack.
> 
> Thanks for working on this!
> 
> Jason

Lots of discussion :) I will go through the mails, address the comments
and post the entire series of patches.

Thanks for all your input.

Dave. 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                         ` <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-22 22:02                           ` Sean Hefty
       [not found]                             ` <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-22 22:02 UTC (permalink / raw)
  To: 'David J. Wilder', Jason Gunthorpe
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>For ipv6 I ran what I described previously.  What I do need to do is add
>the option to rping to specify a source address and run it with various
>address.  Any help you can give defining what exactly needs to be tested
>would be appreciated.

You can also test with ucmatose to verify ipv4 still works.  Use the -b option
to bind to a specific address.

- Sean 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                 ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-22  6:49                                                   ` Sean Hefty
@ 2009-10-25 11:52                                                   ` Or Gerlitz
       [not found]                                                     ` <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 44+ messages in thread
From: Or Gerlitz @ 2009-10-25 11:52 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Sean Hefty, 'David J. Wilder', linux-rdma

Jason Gunthorpe wrote:
> Mainly for RDMA what you get is more kernel flexability, IP like
> capabilities, better bonding, and better support of IB APM semantics:
>  - bind() + listen() actually works properly if more than one
>    interface is bound to the same IP - the cm_id returned by accept is
>    bound to the hca and port that accepted the connection
>    [ This is a L3 form of bonding Linux supports ]
>
>    This is actually something of a mandatory notion to implement the
>    full generality of the IB CM protocol which allows the CM REP to
>    contain a port GUID of another port on the same node (multi-port
>    APM is an IB feature). So you never know what port the accept()
>    result will get bound to.
>
>    BTW: I suppose ideally AF_IB would have a way to say 'CM accept REPs on
>    any port on this node' Hmm, reserved GID prefix perhaps? Hmm.
>
>    When used with bonding this would also afford the kernel with the
>    ability to accept incoming connections across all the redundant
>    RDMA devices - and still have correct bound-to-IP semantics.
>
>  - rdma_resolve_addr more or less as the inverse of all the above
>    * multiple interfaces with same IP case works, kernel and routing
>      table can distribute outgoing connections
>    * multi-port APM works, kernel and user space can choose primary
>      and backup port for the IP addy
>    * bonding works, kernel can balance outgoing connections across the
>      bond slaves.
>
> These are all useful features.

Jason,  Have you even looked into or tested any of the bonding 
load-balancing modes with ipoib? some/most of them are not applicable to 
IPoIB and I don't think that the ones which may be such were ever 
tested. Next, multiple interfaces with the same ip address isn't 
something I see very useful for production environment (but I'd be happy 
to get educated what L3 bonding is and where it can play), next, 
multi-port APM isn't something I ever heard to be required by customers 
and more important, from comments made by Sean in the past, I don't 
think it fits the rdma-cm spirit.

All in all, someone comes here and suggests some fixes to the rdma-cm 
address resolution code to have IPv6 work. I don't think Dave should 
carry on his back/patch all your proposed future enhancements.  Let him 
fix things and following that you can work on the patches to support all 
these nice/nitch features starting with IPv4 and then IPv6.

Or

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                     ` <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2009-10-25 20:03                                                       ` Jason Gunthorpe
       [not found]                                                         ` <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-25 20:03 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Sean Hefty, 'David J. Wilder', linux-rdma

On Sun, Oct 25, 2009 at 01:52:13PM +0200, Or Gerlitz wrote:

> Jason,  Have you even looked into or tested any of the bonding 
> load-balancing modes with ipoib? some/most of them are not applicable to 
> IPoIB and I don't think that the ones which may be such were ever 
> tested.

I was saying that point in the rdmacm where the rdma_cm_id is bound to
a local RDMA device should have only been rdma_resolve_addr and
rdma_accept.

Overloading rdma_bind_addr to both bind to an IP and bind to an RDMA
device was a bad API choice.

Sean is right, there may be special cases that require an early
binding, but a seperate API - like IP's SO_BINDTODEVICE - would has
been better - and users are forewarned that calling it restricts the
environments their app will support :(

As it stands we have several impossible situations. Sean, Dave, and I
were disucssing the trades offs of what this means relative to IP
route resolution - but it affects bonding too.

If you rdma_bind_addr to the IP of a bonding device, the stack must
pick one of the local RDMA ports immediately. If you then call
rdma_listen there is a problem: incoming connections may target either
RDMA device, but you are only bound to one of them.

An app cannot say 'I want to listen on this IP, any RDMA device'
with the current API, as you can in IP, and that is a shame.

> Next, multiple interfaces with the same ip address isn't
> something I see very useful for production environment (but I'd be happy 
> to get educated what L3 bonding is and where it can play), next,

Traditionally with ethernet the L2 bonding is really only used for
link aggregation, L1 failure, and a simple multi-switch HA scheme. It
is not deployed if you have multiple ethernet domains. Some people
prefer to have dual, independent ethernet fabrics, and in that case
you rely on routing features to get the multipath, and HA features of
bonding.

Go back on the list and look up the posts from Leo who first
discovered this, what he was trying to do is kinda the L3 bonding
approach.

> and more important, from comments made by Sean in the past, I don't 
> think it fits the rdma-cm spirit.

I think multi-port APM can be fit into the API we have, but yes, there
are some unfortunate design choices in verbs that make it a little
awkward.

> All in all, someone comes here and suggests some fixes to the rdma-cm 
> address resolution code to have IPv6 work. I don't think Dave should 
> carry on his back/patch all your proposed future enhancements.  Let him 
> fix things and following that you can work on the patches to support all 
> these nice/nitch features starting with IPv4 and then IPv6.

David has been doing a good job and I am glad he is working on the
IPv6 support. My comments are only intended to clarify how this is all
supposed to work and why the IP flow is actually still relevant
to RDMA connections.

This (now quite long) discussion about RDMA device binding I really
think has clarified what everyone expects from the API, and I've
certainly found Sean's comments about early device binding quite
interesting.

Unfortunately to make sin6_scope_id, and the loopback cases work
properly does kinda require confronting these issues. The IPv6 notion
of scope is new and it conflicts with the RDMA CM notion of 'scope'
created by the local RDMA device binding. This is very different from
how IP works. Reconciling the two ideas in a sane way is a big source
of trouble.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                         ` <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-27 13:01                                                           ` Or Gerlitz
       [not found]                                                             ` <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Or Gerlitz @ 2009-10-27 13:01 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Sean Hefty, 'David J. Wilder', linux-rdma

Jason Gunthorpe wrote:
> I was saying that point in the rdmacm where the rdma_cm_id is bound to a local RDMA device should have only been rdma_resolve_addr and rdma_accept. Overloading rdma_bind_addr to both bind to an IP and bind to an RDMA device was a bad API choice.
As you wrote, for the most case, binding comes into play only for users 
calling rdma_resolve_address or rdma_accept, for users the need explicit 
binding the rdma-cm provides rdma_bind and it binds to both IP and 
Device, if you can do better, send a patch, binding can't be removed 
from the API since it has users and it makes sense  from users to 
require it.
> Sean is right, there may be special cases that require an early binding, but a seperate API - like IP's SO_BINDTODEVICE - would has been better - and users are forewarned that calling it restricts the environments their app will support
its just naming, will $ sed s/rdma_bind/rdma_set_opt(RDMA_BINDTODEVICE)/g
make you happier? why?

> As it stands we have several impossible situations. Sean, Dave, and I were disucssing the trades offs of what this means relative to IP route resolution 
Don't tell me that Dave's patches are blocked b/c you discovered the 
rdma_bind design and now you don't like it, as I wrote you, Dave sent 
patch to fix the IPv6 support, during the discussion on his patches you 
come and bring up more and more issues you consider as problems (but I 
don't) and block the patch set, I don't think this is appropriate. Let 
the patches go and send your patches to fix the problems you see. Why 
anyone touching some code piece has to fix problems you see in that piece?!


> - but it affects bonding too. If you rdma_bind_addr to the IP of a bonding device, the stack must pick one of the local RDMA ports immediately. If you then call rdma_listen there is a problem: incoming connections may target either RDMA device, but you are only bound to one of them. An app cannot say 'I want to listen on this IP, any RDMA device' with the current API, as you can in IP, and that is a shame
An app can say, I want to listen on that IP and the RDMA device which is 
associated with this IP now. When bonding does fail-over it generated a 
netevent, the rdma-cm catches this event and generates address change 
event, apps can redo their bind/listen at this point. For the time 
being, we never got a user report on a problem, people are doing listen 
on all IPs probably which works perfect with bonding. Currently the HA 
mode of bonding will respond on ARP only on one of the devices and as 
such connection requests will not target any rdma device but rather only 
the active one. If this is such a shame, send fix, spraying mud on the 
maintainer and/or someone sending another patch is a shame, isn't it?


> Traditionally with ethernet the L2 bonding is really only used for link aggregation, L1 failure, and a simple multi-switch HA scheme. It is not deployed if you have multiple ethernet domains. Some people prefer to have dual, independent ethernet fabrics, and in that case you rely on routing features to get the multipath, and HA features of bonding.
okay, thanks for the crash course.

> Go back on the list and look up the posts from Leo who first discovered this, what he was trying to do is kinda the L3 bonding approach.
if Loe has a problem and you want to help him, bring it on the list, 
debate, send patches, jumping into someone's else patch isn't the 
constructive way to go.

> David has been doing a good job and I am glad he is working on the IPv6 support. My comments are only intended to clarify how this is all supposed to work and why the IP flow is actually still relevant to RDMA connections.
As I see it, your comments block the the patches sent by Dave, Sean?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                             ` <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2009-10-27 17:14                                                               ` Jason Gunthorpe
       [not found]                                                                 ` <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2009-10-27 17:14 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Sean Hefty, linux-rdma

On Tue, Oct 27, 2009 at 03:01:56PM +0200, Or Gerlitz wrote:

> Don't tell me that Dave's patches are blocked b/c you discovered the 
> rdma_bind design and now you don't like it, as I wrote you, Dave sent 
> patch to fix the IPv6 support, during the discussion on his patches you 
> come and bring up more and more issues you consider as problems (but I 
> don't) and block the patch set, I don't think this is appropriate. Let 
> the patches go and send your patches to fix the problems you see. Why 
> anyone touching some code piece has to fix problems you see in that piece?!

Wow, seriously? You do understand the purpose of review, right?

Documentation/development-process/7.AdvancedTopics:
 Different developers will review code from different points of view.
 Some are mostly concerned with coding style and whether code lines
 have trailing white space.  Others will focus primarily on whether the
 change implemented by the patch as a whole is a good thing for the
 kernel or not.  Yet others will check for problematic locking,
 excessive stack usage, possible security issues, duplication of code
 found elsewhere, adequate documentation, adverse effects on
 performance, user-space ABI changes, etc.  All types of review, if
 they lead to better code going into the kernel, are welcome and
 worthwhile.

And yes, actually, accounting for how rdma_bind() is different from
bind() when doing route resolution is pretty much the main remaining
problem.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                             ` <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-27 18:22                               ` David J. Wilder
       [not found]                                 ` <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: David J. Wilder @ 2009-10-27 18:22 UTC (permalink / raw)
  To: Sean Hefty
  Cc: Jason Gunthorpe, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5


On Thu, 2009-10-22 at 15:02 -0700, Sean Hefty wrote:
> >For ipv6 I ran what I described previously.  What I do need to do is add
> >the option to rping to specify a source address and run it with various
> >address.  Any help you can give defining what exactly needs to be tested
> >would be appreciated.
> 
> You can also test with ucmatose to verify ipv4 still works.  Use the -b option
> to bind to a specific address.
> 
> - Sean 
> 

Sean
This patch adds ipv6 support to ucmatose.

Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

------------------------------------------------------
 examples/cmatose.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/examples/cmatose.c b/examples/cmatose.c
index 8c12347..481a6d0 100644
--- a/examples/cmatose.c
+++ b/examples/cmatose.c
@@ -516,15 +516,15 @@ static int get_addr(char *dst, struct sockaddr_in *addr)
 		return ret;
 	}
 
-	if (res->ai_family != PF_INET) {
+	if (res->ai_family == PF_INET)
+		memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in));
+	else if (res->ai_family == PF_INET6)
+		memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in6));
+	else
 		ret = -1;
-		goto out;
-	}
 
-	*addr = *(struct sockaddr_in *) res->ai_addr;
-out:
-	freeaddrinfo(res);
-	return ret;
+        freeaddrinfo(res);
+        return ret;
 }
 
 static int run_server(void)
@@ -543,11 +543,18 @@ static int run_server(void)
 		ret = get_addr(src_addr, &test.src_in);
 		if (ret)
 			goto out;
-	} else
+		if(test.src_in.sin_family == AF_INET)
+			((struct sockaddr_in *) &test.src_in)->sin_port = port;
+		else
+			((struct sockaddr_in6 *) &test.src_in)->sin6_port=port;
+		
+	} else {
 		test.src_in.sin_family = PF_INET;
+		test.src_in.sin_port = port;
+	}
+
+	ret = rdma_bind_addr(listen_id, (struct sockaddr *)&test.src_in);
 
-	test.src_in.sin_port = port;
-	ret = rdma_bind_addr(listen_id, test.src_addr);
 	if (ret) {
 		perror("cmatose: bind address failed");
 		goto out;
@@ -628,8 +635,8 @@ static int run_client(void)
 	printf("cmatose: connecting\n");
 	for (i = 0; i < connections; i++) {
 		ret = rdma_resolve_addr(test.nodes[i].cma_id,
-					src_addr ? test.src_addr : NULL,
-					test.dst_addr, 2000);
+			src_addr ? (struct sockaddr *)&test.src_in : NULL,
+			(struct sockaddr *)&test.dst_in, 2000);
 		if (ret) {
 			perror("cmatose: failure getting addr");
 			connect_error();


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* re: [PATCH] librdmacm/cmatose: add support for ipv6
       [not found]                                 ` <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
@ 2009-10-27 18:58                                   ` Sean Hefty
       [not found]                                     ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Sean Hefty @ 2009-10-27 18:58 UTC (permalink / raw)
  To: 'David J. Wilder'
  Cc: Jason Gunthorpe, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

>This patch adds ipv6 support to ucmatose.
>
>Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Thanks.

I pulled this patch into my local git tree with just a couple of minor cleanups.
What other patches, if any, did you use to test with it?

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] librdmacm/cmatose: add support for ipv6
       [not found]                                     ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2009-10-27 19:42                                       ` Sean Hefty
  2009-10-27 21:50                                       ` David J. Wilder
  1 sibling, 0 replies; 44+ messages in thread
From: Sean Hefty @ 2009-10-27 19:42 UTC (permalink / raw)
  To: Hefty, Sean, 'David J. Wilder'
  Cc: Jason Gunthorpe, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma,
	pradeep-r/Jw6+rmf7HQT0dZR+AlfA,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5

From: David J. Wilder <[dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org]>

Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
We need to update struct cmatest to allow storing an IPv6 address,
or we can overrun the buffer.

Running with this patch, the client causes a kernel bug on 2.6.31 in
addr_resolve_local, so we'll still fixes for that.  It also meant that
I couldn't test this patch.

 examples/cmatose.c |   53 ++++++++++++++++++++++++++++------------------------
 1 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/examples/cmatose.c b/examples/cmatose.c
index 8c12347..84831ec 100644
--- a/examples/cmatose.c
+++ b/examples/cmatose.c
@@ -75,10 +75,7 @@ struct cmatest {
 	int			connects_left;
 	int			disconnects_left;
 
-	struct sockaddr_in	dst_in;
-	struct sockaddr		*dst_addr;
-	struct sockaddr_in	src_in;
-	struct sockaddr		*src_addr;
+	struct rdma_addr	addr;
 };
 
 static struct cmatest test;
@@ -505,7 +502,7 @@ static int migrate_channel(struct rdma_cm_id *listen_id)
 	return ret;
 }
 
-static int get_addr(char *dst, struct sockaddr_in *addr)
+static int get_addr(char *dst, struct sockaddr *addr)
 {
 	struct addrinfo *res;
 	int ret;
@@ -516,15 +513,15 @@ static int get_addr(char *dst, struct sockaddr_in *addr)
 		return ret;
 	}
 
-	if (res->ai_family != PF_INET) {
+	if (res->ai_family == PF_INET)
+		memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in));
+	else if (res->ai_family == PF_INET6)
+		memcpy(addr, res->ai_addr, sizeof(struct sockaddr_in6));
+	else
 		ret = -1;
-		goto out;
-	}
 
-	*addr = *(struct sockaddr_in *) res->ai_addr;
-out:
-	freeaddrinfo(res);
-	return ret;
+        freeaddrinfo(res);
+        return ret;
 }
 
 static int run_server(void)
@@ -540,14 +537,21 @@ static int run_server(void)
 	}
 
 	if (src_addr) {
-		ret = get_addr(src_addr, &test.src_in);
+		ret = get_addr(src_addr, &test.addr.src_addr);
 		if (ret)
 			goto out;
-	} else
-		test.src_in.sin_family = PF_INET;
+		if (test.addr.src_addr.sa_family == AF_INET)
+			((struct sockaddr_in *) &test.addr.src_addr)->sin_port = port;
+		else
+			((struct sockaddr_in6 *) &test.addr.src_addr)->sin6_port = port;
+		
+	} else {
+		test.addr.src_addr.sa_family = PF_INET;
+		((struct sockaddr_in *) &test.addr.src_addr)->sin_port = port;
+	}
+
+	ret = rdma_bind_addr(listen_id, &test.addr.src_addr);
 
-	test.src_in.sin_port = port;
-	ret = rdma_bind_addr(listen_id, test.src_addr);
 	if (ret) {
 		perror("cmatose: bind address failed");
 		goto out;
@@ -614,22 +618,25 @@ static int run_client(void)
 
 	printf("cmatose: starting client\n");
 	if (src_addr) {
-		ret = get_addr(src_addr, &test.src_in);
+		ret = get_addr(src_addr, &test.addr.src_addr);
 		if (ret)
 			return ret;
 	}
 
-	ret = get_addr(dst_addr, &test.dst_in);
+	ret = get_addr(dst_addr, &test.addr.dst_addr);
 	if (ret)
 		return ret;
 
-	test.dst_in.sin_port = port;
+	if (test.addr.dst_addr.sa_family == AF_INET)
+		((struct sockaddr_in *) &test.addr.dst_addr)->sin_port = port;
+	else
+		((struct sockaddr_in6 *) &test.addr.dst_addr)->sin6_port = port;
 
 	printf("cmatose: connecting\n");
 	for (i = 0; i < connections; i++) {
 		ret = rdma_resolve_addr(test.nodes[i].cma_id,
-					src_addr ? test.src_addr : NULL,
-					test.dst_addr, 2000);
+					src_addr ? &test.addr.src_addr : NULL,
+					&test.addr.dst_addr, 2000);
 		if (ret) {
 			perror("cmatose: failure getting addr");
 			connect_error();
@@ -717,8 +724,6 @@ int main(int argc, char **argv)
 		}
 	}
 
-	test.dst_addr = (struct sockaddr *) &test.dst_in;
-	test.src_addr = (struct sockaddr *) &test.src_in;
 	test.connects_left = connections;
 	test.disconnects_left = connections;
 



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* re: [PATCH] librdmacm/cmatose: add support for ipv6
       [not found]                                     ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2009-10-27 19:42                                       ` Sean Hefty
@ 2009-10-27 21:50                                       ` David J. Wilder
  1 sibling, 0 replies; 44+ messages in thread
From: David J. Wilder @ 2009-10-27 21:50 UTC (permalink / raw)
  To: Sean Hefty
  Cc: Jason Gunthorpe, linux-rdma, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5


On Tue, 2009-10-27 at 11:58 -0700, Sean Hefty wrote:
> >This patch adds ipv6 support to ucmatose.
> >
> >Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> 
> Thanks.
> 
> I pulled this patch into my local git tree with just a couple of minor cleanups.
> What other patches, if any, did you use to test with it?
> 
> - Sean
> 
I have been testing ucmatose against today's ofed 1.5 build, with the
patch I posted earlier with fixes to resolve_local() and
ipv6_resolve_remote().

This patch is still missing the binding changes Jason suggested it just
fixes the existing code.  I did not see the panic you saw on 2.6.32, but
I have run on that kernel yet.

Dave..

Patch (posted earlier)

rdma_resolve_addr() returns an error when attempting to resolve ipv6
link-local address.  This patch fixes the handling of link-local address.

Signed-off-by: David Wilder <dwilder-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

------------------------------------------------------
 drivers/infiniband/core/addr.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..3442256 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -278,6 +278,21 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
 	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
 
+	if (ipv6_addr_type(&src_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+		if (!src_in->sin6_scope_id)
+			return -EINVAL;
+		fl.oif = src_in->sin6_scope_id;
+	}
+	if (ipv6_addr_type(&dst_in->sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+		if (dst_in->sin6_scope_id) {
+			if (fl.oif && fl.oif != dst_in->sin6_scope_id)
+				return -EINVAL;
+			fl.oif = dst_in->sin6_scope_id;
+		}
+		if (!fl.oif)
+			return -EINVAL;
+	}
+
 	dst = ip6_route_output(&init_net, NULL, &fl);
 	if (!dst)
 		return ret;
@@ -390,14 +405,16 @@ static int addr_resolve_local(struct sockaddr *src_in,
 	case AF_INET6:
 	{
 		struct in6_addr *a;
+		int found = 0;
 
 		for_each_netdev(&init_net, dev)
 			if (ipv6_chk_addr(&init_net,
 					  &((struct sockaddr_in6 *) dst_in)->sin6_addr,
-					  dev, 1))
+					  dev, 1)) {
+				found = 1;
 				break;
-
-		if (!dev)
+			}
+		if (!found)
 			return -EADDRNOTAVAIL;
 
 		a = &((struct sockaddr_in6 *) src_in)->sin6_addr;
@@ -406,6 +423,8 @@ static int addr_resolve_local(struct sockaddr *src_in,
 			src_in->sa_family = dst_in->sa_family;
 			((struct sockaddr_in6 *) src_in)->sin6_addr =
 				((struct sockaddr_in6 *) dst_in)->sin6_addr;
+			((struct sockaddr_in6 *) src_in)->sin6_scope_id =
+				((struct sockaddr_in6 *) dst_in)->sin6_scope_id;
 			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
 		} else if (ipv6_addr_loopback(a)) {
 			ret = rdma_translate_ip(dst_in, addr);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] link-local address fix for rdma_resolve_addr
       [not found]                                                                 ` <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-28 14:33                                                                   ` Or Gerlitz
  0 siblings, 0 replies; 44+ messages in thread
From: Or Gerlitz @ 2009-10-28 14:33 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma

Jason Gunthorpe wrote:
> Wow, seriously? You do understand the purpose of review, right?
I think I do, maybe not to the depth you and your arguments are, but 
again, repeating myself: my kind of simple argument is that your review 
is way beyond the --change-- suggested by a patch but rather of a whole 
logic, and you block a patch b/c you don't like the logic this patch 
integrates with. To some extent such practice is excepted, but you took 
it to way beyond acceptable limit. I don't accept your assertion that 
the whole logic is broken and it makes sense to me to have a patch from 
Dave to fix the IPv6 part of it. Next or in parallel you are welcome to 
sent a patch fixing/re-writing the whole bind logic or even the whole 
rdma stack or the whole kernel.

> And yes, actually, accounting for how rdma_bind() is different from bind() when doing route resolution is pretty much the main remaining problem
go and fix that

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2009-10-28 14:33 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-19 22:47 [PATCH] link-local address fix for rdma_resolve_addr David J. Wilder
     [not found] ` <1255992430.12075.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-19 23:43   ` Jason Gunthorpe
     [not found]     ` <20091019234329.GC9643-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-19 23:47       ` Sean Hefty
     [not found]         ` <676AB781CD644CC28E1AD4951EA4EEF8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-20  0:33           ` Jason Gunthorpe
     [not found]             ` <20091020003344.GA14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-21 22:30               ` David J. Wilder
     [not found]                 ` <1256164230.12075.31.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-21 23:08                   ` Jason Gunthorpe
     [not found]                     ` <20091021230845.GR14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 21:12                       ` David J. Wilder
     [not found]                         ` <1256245942.12075.46.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-22 22:02                           ` Sean Hefty
     [not found]                             ` <660D538F30E647F3AE1E5E6C1ACBE882-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-27 18:22                               ` David J. Wilder
     [not found]                                 ` <1256667738.16192.1.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-27 18:58                                   ` [PATCH] librdmacm/cmatose: add support for ipv6 Sean Hefty
     [not found]                                     ` <EE6710D62B854E4FBA82524DEB84E8DC-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-27 19:42                                       ` Sean Hefty
2009-10-27 21:50                                       ` David J. Wilder
2009-10-21 23:17                   ` [PATCH] link-local address fix for rdma_resolve_addr Sean Hefty
     [not found]                     ` <9D257695083141E79685CB2B260D7D7C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-21 23:36                       ` Jason Gunthorpe
     [not found]                         ` <20091021233639.GS14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-21 23:46                           ` Jason Gunthorpe
2009-10-21 23:58                           ` Sean Hefty
     [not found]                             ` <BFF5CD71FFA74BF09A2EED5F3319F370-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22  0:28                               ` Jason Gunthorpe
     [not found]                                 ` <20091022002846.GU14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22  0:40                                   ` Sean Hefty
     [not found]                                     ` <4803112A5B7A4953B62ABAFD1BBD9881-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22  1:19                                       ` Jason Gunthorpe
     [not found]                                         ` <20091022011939.GW14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22  5:24                                           ` Sean Hefty
     [not found]                                             ` <60E9443821EE4FFA90448DF5CC1368A8-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22  6:25                                               ` Jason Gunthorpe
     [not found]                                                 ` <20091022062531.GB26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22  6:49                                                   ` Sean Hefty
     [not found]                                                     ` <B1D32A650C2F400E92AFC4909D8880A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 16:47                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20091022164717.GG26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 17:59                                                           ` Sean Hefty
     [not found]                                                             ` <A03920DE5BAE4309BAFC1B6471AE928C-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 18:22                                                               ` Jason Gunthorpe
     [not found]                                                                 ` <20091022182221.GZ14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-22 18:31                                                                   ` Sean Hefty
     [not found]                                                                     ` <60293EE44E7243B59C41498F20D558A0-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2009-10-22 18:40                                                                       ` Jason Gunthorpe
2009-10-25 11:52                                                   ` Or Gerlitz
     [not found]                                                     ` <4AE43BED.3090405-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-10-25 20:03                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20091025200332.GN26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-27 13:01                                                           ` Or Gerlitz
     [not found]                                                             ` <4AE6EF44.5040004-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2009-10-27 17:14                                                               ` Jason Gunthorpe
     [not found]                                                                 ` <20091027171403.GP26003-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-28 14:33                                                                   ` Or Gerlitz
  -- strict thread matches above, loose matches on Subject: below --
2009-10-13 22:09 David J. Wilder
     [not found] ` <1255471781.14513.7.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-13 23:12   ` Jason Gunthorpe
     [not found]     ` <20091013231234.GK5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-14 16:23       ` David J. Wilder
     [not found]         ` <1255537437.14513.28.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-14 17:01           ` Jason Gunthorpe
     [not found]             ` <20091014170155.GL5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-14 17:30               ` David J. Wilder
     [not found]                 ` <1255541405.5111.14.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-14 17:40                   ` Jason Gunthorpe
     [not found]                     ` <20091014174017.GM5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-15 19:27                       ` David J. Wilder
     [not found]                         ` <1255634841.5111.40.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-15 21:32                           ` Jason Gunthorpe
     [not found]                             ` <20091015213205.GV5191-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2009-10-16 18:54                               ` David J. Wilder
     [not found]                                 ` <1255719280.14675.30.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-16 19:28                                   ` Jason Gunthorpe
2009-10-14 19:33               ` David J. Wilder
     [not found]                 ` <1255548798.5111.24.camel-XfwDJb4SXxnMbYB6QlFGEg@public.gmane.org>
2009-10-14 20:09                   ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox