public inbox for linux-kernel@vger.kernel.org
From: Steve Wise <swise@opengridcomputing.com>
To: Roland Dreier <rdreier@cisco.com>,
	"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Sean Hefty <sean.hefty@intel.com>,
	OpenFabrics General <general@lists.openfabrics.org>
Subject: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Date: Tue, 07 Aug 2007 09:37:41 -0500
Message-ID: <46B883B5.8040702@opengridcomputing.com>

Networking experts,

I'd like input on the patch below, and help in solving this bug 
properly.  iWARP devices that support both native stack TCP and iWARP 
(aka RDMA over TCP/IP/Ethernet) connections on the same interface need 
the fix below or some similar fix to the RDMA connection manager.

This is a BUG in the Linux RDMA-CMA code as it stands today.

Here is the issue:

Consider an MPI cluster running mvapich2, where the cluster runs
MPI/Sockets jobs concurrently with MPI/RDMA jobs.  Without the patch
below, it is possible for MPI/Sockets processes to mistakenly get
incoming RDMA connections, and vice versa.  The way mvapich2 works is
that each rank binds to and listens on a random port (retrying with a
new random port if the bind fails with "in use").  Once a rank gets a
free port and is listening, it advertises that port number to its peers
for connection setup.  Currently, without the patch below, the MPI/RDMA
processes can end up bound to and listening on the _same_ port number
as the MPI/Sockets processes running over the native TCP stack.  This
is due to duplicate port spaces: native stack TCP and the RDMA CM's
RDMA_PS_TCP port space are allocated independently, so a bind in one
does not see ports already held in the other.  If this happens, then
connections can end up at the wrong process.
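The bind/retry scheme described above can be sketched in user-space C
roughly as follows.  This is a minimal illustration assuming IPv4 and
plain BSD sockets; listen_on_random_port() is a hypothetical helper, not
code taken from mvapich2.  Note that the EADDRINUSE check only detects
collisions inside the host TCP port space, which is why, per the
description above, a number already held in the separate RDMA_PS_TCP
space is not reported as "in use".

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a TCP socket to a random port on addr_be
 * (network byte order), retrying only on EADDRINUSE, then listen.
 * Returns the listening fd and stores the port (host byte order) in
 * *port_out, or returns -1 on failure.
 */
static int listen_on_random_port(in_addr_t addr_be, int max_tries,
                                 unsigned short *port_out)
{
    assert(max_tries > 0 && port_out != NULL);

    for (int i = 0; i < max_tries; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Candidate in 1024..65535; rand() is good enough for a sketch. */
        unsigned short port =
            (unsigned short)(1024 + rand() % (65536 - 1024));

        struct sockaddr_in sin;
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = addr_be;
        sin.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0) {
            if (listen(fd, 16) == 0) {
                *port_out = port;  /* caller advertises this to peers */
                return fd;
            }
            close(fd);
            return -1;
        }

        int err = errno;
        close(fd);
        /* Only "in use" is retried; anything else is a real failure. */
        if (err != EADDRINUSE)
            return -1;
    }
    return -1;
}
```

A caller would advertise the returned port to its peers and keep the
descriptor open for as long as it wants to accept connections.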

The correct solution, in my mind, is to use the host stack's TCP port
space for _all_ RDMA_PS_TCP port allocations.  The patch below is a
minimal delta that unifies the port spaces by using the kernel stack to
bind ports: it allocates a kernel socket and binds it to the appropriate
local addr/port.  It also lets the kernel stack pick ephemeral ports,
simply by passing port 0 to the kernel bind operation.
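The "pass port 0 and let the stack choose" behavior has a direct
user-space analog, sketched below under the assumption of IPv4 and BSD
sockets; bind_ephemeral() is a hypothetical name.  In user space the
chosen port is read back with getsockname(), and the socket must stay
open to keep the port reserved, much as the patch keeps id_priv->sock
until cma_release_port().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a TCP socket to port 0 on addr_be so the
 * kernel picks an ephemeral port from the host TCP port space.
 * Returns that port (host byte order) and the still-open fd via
 * *fd_out, or returns 0 on error.
 */
static unsigned short bind_ephemeral(in_addr_t addr_be, int *fd_out)
{
    assert(fd_out != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = addr_be;
    sin.sin_port = htons(0);        /* 0 = let the kernel choose */

    socklen_t len = sizeof(sin);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) != 0 ||
        getsockname(fd, (struct sockaddr *)&sin, &len) != 0) {
        close(fd);
        return 0;
    }

    *fd_out = fd;   /* closing this fd would release the port */
    return ntohs(sin.sin_port);
}
```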

There has already been a discussion on the RDMA list, if anyone is
interested:

http://www.mail-archive.com/general@lists.openfabrics.org/msg05162.html


Thanks,

Steve.


---

RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

This is needed for iWARP providers that support both native TCP and
RDMA connections over the same interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/core/cma.c |   27 ++++++++++++++++++++++++++-
 1 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9e0ab04..e4d2d7f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -111,6 +111,7 @@ struct rdma_id_private {
 	struct rdma_cm_id	id;

 	struct rdma_bind_list	*bind_list;
+	struct socket		*sock;
 	struct hlist_node	node;
 	struct list_head	list;
 	struct list_head	listen_list;
@@ -695,6 +696,8 @@ static void cma_release_port(struct rdma
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
+	if (id_priv->sock)
+		sock_release(id_priv->sock);
 }

 void rdma_destroy_id(struct rdma_cm_id *id)
@@ -1790,6 +1793,25 @@ static int cma_use_port(struct idr *ps,
 	return 0;
 }

+static int cma_get_tcp_port(struct rdma_id_private *id_priv)
+{
+	int ret;
+	struct socket *sock;
+
+	ret = sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
+	if (ret)
+		return ret;
+	ret = sock->ops->bind(sock,
+			  (struct sockaddr *)&id_priv->id.route.addr.src_addr,
+			  ip_addr_size(&id_priv->id.route.addr.src_addr));
+	if (ret) {
+		sock_release(sock);
+		return ret;
+	}
+	id_priv->sock = sock;
+	return 0;	
+}
+
 static int cma_get_port(struct rdma_id_private *id_priv)
 {
 	struct idr *ps;
@@ -1801,6 +1823,9 @@ static int cma_get_port(struct rdma_id_p
 		break;
 	case RDMA_PS_TCP:
 		ps = &tcp_ps;
+		ret = cma_get_tcp_port(id_priv); /* Synch with native stack */
+		if (ret)
+			goto out;
 		break;
 	case RDMA_PS_UDP:
 		ps = &udp_ps;
@@ -1815,7 +1840,7 @@ static int cma_get_port(struct rdma_id_p
 	else
 		ret = cma_use_port(ps, id_priv);
 	mutex_unlock(&lock);
-
+out:
 	return ret;
 }



