From: Steve Wise <swise@opengridcomputing.com>
To: Roland Dreier <rdreier@cisco.com>,
"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org,
linux-kernel <linux-kernel@vger.kernel.org>,
Sean Hefty <sean.hefty@intel.com>,
OpenFabrics General <general@lists.openfabrics.org>
Subject: Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Date: Thu, 09 Aug 2007 13:49:52 -0500
Message-ID: <46BB61D0.4090101@opengridcomputing.com>
In-Reply-To: <46B883B5.8040702@opengridcomputing.com>

Any more comments?

Steve Wise wrote:
> Networking experts,
>
> I'd like input on the patch below, and help in solving this bug
> properly. iWARP devices that support both native stack TCP and iWARP
> (aka RDMA over TCP/IP/Ethernet) connections on the same interface need
> the fix below or some similar fix to the RDMA connection manager.
>
> This is a BUG in the Linux RDMA-CMA code as it stands today.
>
> Here is the issue:
>
> Consider an MPI cluster running mvapich2 that runs MPI/Sockets jobs
> concurrently with MPI/RDMA jobs. It is possible, without the patch
> below, for MPI/Sockets processes to mistakenly get incoming RDMA
> connections and vice versa. The way mvapich2 works is that the ranks
> all bind and listen to a random port (retrying new random ports if the
> bind fails with "in use"). Once they get a free port and bind/listen,
> they advertise that port number to the peers to do connection setup.
> Currently, without the patch below, the MPI/RDMA processes can end up
> binding/listening to the _same_ port number as the MPI/Sockets
> processes running over the native TCP stack. This is due to duplicate
> port spaces for native stack TCP and the RDMA CM's RDMA_PS_TCP port
> space. If this happens, a connection request can be delivered to the
> wrong kind of listener.
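
For readers who have not seen this pattern, here is a rough user-space sketch of the "bind to a random port, retry on in-use" loop described above. It is not mvapich2 code: the helper name listen_on_random_port, the loopback address, the backlog of 16 and the 1024-65535 range are all assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from mvapich2): bind and listen on a
 * random loopback TCP port, retrying with a new random port while the
 * stack reports EADDRINUSE.  Returns the listening fd and stores the
 * chosen port (host byte order) in *port_out, or returns -1 on failure.
 */
int listen_on_random_port(unsigned short *port_out, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		struct sockaddr_in addr;
		/* Assumed range: any unprivileged port. */
		unsigned short port =
			(unsigned short)(1024 + rand() % (65536 - 1024));
		int fd = socket(AF_INET, SOCK_STREAM, 0);
		int saved;

		if (fd < 0)
			return -1;

		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons(port);

		if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
		    listen(fd, 16) == 0) {
			*port_out = port;	/* this is what gets advertised */
			return fd;
		}

		saved = errno;
		close(fd);
		if (saved != EADDRINUSE)
			return -1;	/* unrelated error: give up */
		/* port taken in this port space: try another one */
	}
	return -1;
}
```

The "in use" check only sees ports held in the port space the bind goes through. A listener in a separate port space with the same number never triggers the retry, which is the collision described above.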
>
> The correct solution in my mind is to use the host stack's TCP port
> space for _all_ RDMA_PS_TCP port allocations. The patch below is a
> minimal delta to unify the port spaces by using the kernel stack to bind
> ports. This is done by allocating a kernel socket and binding to the
> appropriate local addr/port. It also allows the kernel stack to pick
> ephemeral ports by virtue of just passing in port 0 on the kernel bind
> operation.
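
A user-space analogue of that idea, as a minimal sketch only: the patch does this in the kernel with sock_create_kern() and sock->ops->bind(), while the code below uses the ordinary sockets API. The helper names bind_tcp_loopback and bound_port are made up for this example, and the loopback address is an assumption.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a TCP socket to 127.0.0.1:port.
 * Passing port 0 asks the stack to pick an ephemeral port.
 * Returns the fd, or -1 with errno set by the failing call.
 */
int bind_tcp_loopback(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;	/* keep the bind() error for the caller */
		return -1;
	}
	return fd;
}

/* Return the local port a bound socket holds (host order), 0 on error. */
unsigned short bound_port(int fd)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		return 0;
	return ntohs(addr.sin_port);
}
```

Calling bind_tcp_loopback(0) and then bound_port() on the result gives the port the stack chose. While that socket stays open, a second bind_tcp_loopback() to the same port fails with EADDRINUSE. Holding a bound socket is therefore enough to reserve the port in the host TCP port space.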
>
> There has been a discussion already on the RDMA list if anyone is
> interested:
>
> http://www.mail-archive.com/general@lists.openfabrics.org/msg05162.html
>
>
> Thanks,
>
> Steve.
>
>
> ---
>
> RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
>
> This is needed for iwarp providers that support native and rdma
> connections over the same interface.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>
> drivers/infiniband/core/cma.c | 27 ++++++++++++++++++++++++++-
> 1 files changed, 26 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 9e0ab04..e4d2d7f 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -111,6 +111,7 @@ struct rdma_id_private {
>  	struct rdma_cm_id id;
>
>  	struct rdma_bind_list *bind_list;
> +	struct socket *sock;
>  	struct hlist_node node;
>  	struct list_head list;
>  	struct list_head listen_list;
> @@ -695,6 +696,8 @@ static void cma_release_port(struct rdma
>  		kfree(bind_list);
>  	}
>  	mutex_unlock(&lock);
> +	if (id_priv->sock)
> +		sock_release(id_priv->sock);
>  }
>
>  void rdma_destroy_id(struct rdma_cm_id *id)
> @@ -1790,6 +1793,25 @@ static int cma_use_port(struct idr *ps,
>  	return 0;
>  }
>
> +static int cma_get_tcp_port(struct rdma_id_private *id_priv)
> +{
> +	int ret;
> +	struct socket *sock;
> +
> +	ret = sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
> +	if (ret)
> +		return ret;
> +	ret = sock->ops->bind(sock,
> +			(struct sockaddr *)&id_priv->id.route.addr.src_addr,
> +			ip_addr_size(&id_priv->id.route.addr.src_addr));
> +	if (ret) {
> +		sock_release(sock);
> +		return ret;
> +	}
> +	id_priv->sock = sock;
> +	return 0;
> +}
> +
>  static int cma_get_port(struct rdma_id_private *id_priv)
>  {
>  	struct idr *ps;
> @@ -1801,6 +1823,9 @@ static int cma_get_port(struct rdma_id_p
>  		break;
>  	case RDMA_PS_TCP:
>  		ps = &tcp_ps;
> +		ret = cma_get_tcp_port(id_priv); /* Synch with native stack */
> +		if (ret)
> +			goto out;
>  		break;
>  	case RDMA_PS_UDP:
>  		ps = &udp_ps;
> @@ -1815,7 +1840,7 @@ static int cma_get_port(struct rdma_id_p
>  	else
>  		ret = cma_use_port(ps, id_priv);
>  	mutex_unlock(&lock);
> -
> +out:
>  	return ret;
>  }
>
>