From: Steve Wise <swise@opengridcomputing.com>
To: Roland Dreier <rdreier@cisco.com>,
	"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Sean Hefty <sean.hefty@intel.com>,
	OpenFabrics General <general@lists.openfabrics.org>
Subject: Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
Date: Thu, 09 Aug 2007 13:49:52 -0500	[thread overview]
Message-ID: <46BB61D0.4090101@opengridcomputing.com> (raw)
In-Reply-To: <46B883B5.8040702@opengridcomputing.com>

Any more comments?


Steve Wise wrote:
> Networking experts,
> 
> I'd like input on the patch below, and help in solving this bug 
> properly.  iWARP devices that support both native stack TCP and iWARP 
> (aka RDMA over TCP/IP/Ethernet) connections on the same interface need 
> the fix below or some similar fix to the RDMA connection manager.
> 
> This is a BUG in the Linux RDMA-CMA code as it stands today.
> 
> Here is the issue:
> 
> Consider an MPI cluster running mvapich2 that runs MPI/Sockets jobs
> concurrently with MPI/RDMA jobs.  Without the patch below, MPI/Sockets
> processes can mistakenly receive incoming RDMA connections and vice
> versa.  In mvapich2, each rank binds and listens on a random port,
> retrying with a new random port if the bind fails with "in use".  Once
> a rank has a free port and is listening on it, it advertises that port
> number to its peers for connection setup.  Currently, without the patch
> below, the MPI/RDMA processes can end up binding and listening on the
> _same_ port number as the MPI/Sockets processes running over the native
> TCP stack, because native stack TCP and the rdma cm's RDMA_PS_TCP use
> separate port spaces.  If this happens, an incoming connection can be
> delivered to the wrong process.
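
The bind-and-retry pattern described above can be sketched in userspace C
as follows.  This is only an illustration of the general technique, not
mvapich2's actual code: the function name listen_on_random_port, the
loopback address, the port range, and the backlog of 16 are all
assumptions made for the example.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try up to max_tries random unprivileged ports on
 * 127.0.0.1 until bind() and listen() both succeed.  Returns the listening
 * fd and stores the port (host byte order) in *port_out, or returns -1
 * with errno set.
 */
static int listen_on_random_port(int max_tries, unsigned short *port_out)
{
    for (int i = 0; i < max_tries; i++) {
        /* Pick a candidate in 1024..65535. */
        unsigned short port =
            (unsigned short)(1024 + rand() % (65536 - 1024));
        struct sockaddr_in addr;
        int saved;
        int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            listen(fd, 16) == 0) {
            *port_out = port;   /* this is the port a rank would advertise */
            return fd;
        }

        saved = errno;
        close(fd);
        if (saved != EADDRINUSE) {  /* only "in use" is worth a retry */
            errno = saved;
            return -1;
        }
    }
    errno = EADDRINUSE;
    return -1;
}
```

The retry only protects against collisions that bind() can see.  If a
second, independent port space hands out the same number, neither side
gets "in use", which is the situation described in the quoted text.
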
> 
> The correct solution in my mind is to use the host stack's TCP port 
> space for _all_ RDMA_PS_TCP port allocations.   The patch below is a 
> minimal delta to unify the port spaces by using the kernel stack to bind 
> ports.  This is done by allocating a kernel socket and binding to the 
> appropriate local addr/port.  It also allows the kernel stack to pick 
> ephemeral ports by virtue of just passing in port 0 on the kernel bind 
> operation.
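
As a rough userspace analogue of the approach described above (the patch
itself does this in the kernel with sock_create_kern() and
sock->ops->bind()), the sketch below reserves a port in the host TCP port
space by binding a socket, and lets the stack choose an ephemeral port
when port 0 is passed.  The helper name reserve_tcp_port and the use of
the loopback address are assumptions for illustration only.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a TCP socket to 127.0.0.1:port.  Passing
 * port 0 asks the stack to pick an ephemeral port.  On success, returns
 * the fd (which holds the reservation until it is closed) and stores the
 * bound port, in host byte order, in *bound_port.  On failure, returns -1
 * with errno set.
 */
static int reserve_tcp_port(unsigned short port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* bind() claims the port; getsockname() reports what was chosen. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }

    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

While the returned descriptor stays open, a second reserve_tcp_port()
call for the same port is expected to fail with EADDRINUSE.  That
exclusion is the property a unified port space would give RDMA_PS_TCP
and native TCP users.
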
> 
> There has been a discussion already on the RDMA list if anyone is 
> interested:
> 
> http://www.mail-archive.com/general@lists.openfabrics.org/msg05162.html
> 
> 
> Thanks,
> 
> Steve.
> 
> 
> ---
> 
> RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
> 
> This is needed for iwarp providers that support native and rdma
> connections over the same interface.
> 
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> 
> drivers/infiniband/core/cma.c |   27 ++++++++++++++++++++++++++-
> 1 files changed, 26 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 9e0ab04..e4d2d7f 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -111,6 +111,7 @@ struct rdma_id_private {
>     struct rdma_cm_id    id;
> 
>     struct rdma_bind_list    *bind_list;
> +    struct socket        *sock;
>     struct hlist_node    node;
>     struct list_head    list;
>     struct list_head    listen_list;
> @@ -695,6 +696,8 @@ static void cma_release_port(struct rdma
>         kfree(bind_list);
>     }
>     mutex_unlock(&lock);
> +    if (id_priv->sock)
> +        sock_release(id_priv->sock);
> }
> 
> void rdma_destroy_id(struct rdma_cm_id *id)
> @@ -1790,6 +1793,25 @@ static int cma_use_port(struct idr *ps,
>     return 0;
> }
> 
> +static int cma_get_tcp_port(struct rdma_id_private *id_priv)
> +{
> +    int ret;
> +    struct socket *sock;
> +
> +    ret = sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
> +    if (ret)
> +        return ret;
> +    ret = sock->ops->bind(sock,
> +              (struct sockaddr *)&id_priv->id.route.addr.src_addr,
> +              ip_addr_size(&id_priv->id.route.addr.src_addr));
> +    if (ret) {
> +        sock_release(sock);
> +        return ret;
> +    }
> +    id_priv->sock = sock;
> +    return 0;
> +}
> +
> static int cma_get_port(struct rdma_id_private *id_priv)
> {
>     struct idr *ps;
> @@ -1801,6 +1823,9 @@ static int cma_get_port(struct rdma_id_p
>         break;
>     case RDMA_PS_TCP:
>         ps = &tcp_ps;
> +        ret = cma_get_tcp_port(id_priv); /* Synch with native stack */
> +        if (ret)
> +            goto out;
>         break;
>     case RDMA_PS_UDP:
>         ps = &udp_ps;
> @@ -1815,7 +1840,7 @@ static int cma_get_port(struct rdma_id_p
>     else
>         ret = cma_use_port(ps, id_priv);
>     mutex_unlock(&lock);
> -
> +out:
>     return ret;
> }
> 
> 

