* [PATCH net-next 0/9] RDS:TCP data structure changes for multipath support
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The second installment of changes to enable multipath support in
RDS-TCP. This series implements the changes in rds-tcp so that the
rds_conn_path has a pointer to the rds_tcp_connection in cp_transport_data.
Struct rds_tcp_connection keeps track of the inet_sk per path in
t_sock. The ->sk_user_data in turn is a pointer to the rds_conn_path.
With this set of changes, rds_tcp has the needed plumbing to handle
multiple paths (sockets) per rds_connection.
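
The pointer relationships described above can be sketched in userspace C.
This is only an illustration under stated assumptions: the structs are
stripped-down stand-ins that keep just the fields named in this series, the
value of RDS_MPATH_WORKERS is illustrative, and sketch_attach() and
sketch_conn_from_sk() are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>

#define RDS_MPATH_WORKERS 8	/* illustrative value for this sketch */

struct rds_connection;

/* Simplified stand-in for struct sock: only the field used here. */
struct sock {
	void *sk_user_data;
};

/* Simplified stand-in for struct socket. */
struct socket {
	struct sock *sk;
};

struct rds_conn_path {
	struct rds_connection *cp_conn;	/* owning connection */
	void *cp_transport_data;	/* -> struct rds_tcp_connection */
};

struct rds_tcp_connection {
	struct rds_conn_path *t_cpath;	/* back-pointer to the path */
	struct socket *t_sock;		/* TCP socket for this path */
};

struct rds_connection {
	struct rds_conn_path c_path[RDS_MPATH_WORKERS];
};

/* Hypothetical helper: wire path i to its transport data and socket. */
static void sketch_attach(struct rds_connection *conn, int i,
			  struct rds_tcp_connection *tc, struct socket *sock)
{
	struct rds_conn_path *cp = &conn->c_path[i];

	cp->cp_conn = conn;
	cp->cp_transport_data = tc;
	tc->t_cpath = cp;
	tc->t_sock = sock;
	sock->sk->sk_user_data = cp;
}

/* Hypothetical helper: how a socket callback can find its connection. */
static struct rds_connection *sketch_conn_from_sk(struct sock *sk)
{
	struct rds_conn_path *cp = sk->sk_user_data;

	return cp ? cp->cp_conn : NULL;
}
```

Starting from a socket, sk_user_data yields the path, cp_transport_data
yields the per-path TCP state, and cp_conn yields the connection, so no
lookup is ever keyed on the connection alone.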
Sowmini Varadhan (9):
RDS: Rework path specific indirections
RDS: TCP: Remove dead logic around c_passive in rds-tcp
RDS: TCP: Make rds_tcp_connection track the rds_conn_path
RDS: TCP: Refactor connection destruction to handle multiple paths
RDS: TCP: make ->sk_user_data point to a rds_conn_path
RDS: TCP: make receive path use the rds_conn_path
RDS: TCP: Hooks to set up a single connection path
RDS: TCP: Simplify reconnect to avoid duelling reconnnect attempts
RDS: Do not send a pong to an incoming ping with 0 src port
net/rds/connection.c | 39 ++++++--------
net/rds/ib.c | 8 ++--
net/rds/ib.h | 8 ++--
net/rds/ib_cm.c | 6 ++-
net/rds/ib_recv.c | 3 +-
net/rds/ib_send.c | 3 +-
net/rds/loop.c | 14 +++---
net/rds/rds.h | 7 +--
net/rds/recv.c | 4 ++
net/rds/send.c | 16 ++-----
net/rds/tcp.c | 130 +++++++++++++++++++++++++++++++------------------
net/rds/tcp.h | 22 ++++----
net/rds/tcp_connect.c | 38 ++++++++-------
net/rds/tcp_listen.c | 16 +++---
net/rds/tcp_recv.c | 39 ++++++++-------
net/rds/tcp_send.c | 20 ++++----
net/rds/threads.c | 12 +++-
17 files changed, 211 insertions(+), 174 deletions(-)
* [PATCH net-next 1/9] RDS: Rework path specific indirections
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.
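
As a rough userspace sketch of the resulting calling convention (assuming
simplified stand-in structs; the sketch_* functions and the two counters are
hypothetical and exist only for illustration):

```c
#include <stddef.h>

struct rds_connection;

struct rds_conn_path {
	struct rds_connection *cp_conn;
	void *cp_transport_data;
};

struct rds_transport {
	/* one hook for every transport, single-path or mp-capable */
	void (*conn_path_shutdown)(struct rds_conn_path *cp);
};

struct rds_connection {
	struct rds_transport *c_trans;
	void *c_transport_data;
	struct rds_conn_path c_path[1];
};

static int single_path_shutdowns;
static int per_path_shutdowns;

/* Single-path style: derive the connection, use per-connection data. */
static void sketch_single_path_shutdown(struct rds_conn_path *cp)
{
	struct rds_connection *conn = cp->cp_conn;

	if (conn->c_transport_data)
		single_path_shutdowns++;
}

/* Multipath style: use the per-path data directly. */
static void sketch_per_path_shutdown(struct rds_conn_path *cp)
{
	if (cp->cp_transport_data)
		per_path_shutdowns++;
}

/* Core caller: no branch on the transport's multipath capability. */
static void sketch_conn_shutdown(struct rds_conn_path *cp)
{
	cp->cp_conn->c_trans->conn_path_shutdown(cp);
}
```

The caller always passes the path; a transport that only cares about the
connection pays one extra dereference through ->cp_conn.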
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/connection.c | 5 +----
net/rds/ib.c | 4 ++--
net/rds/ib.h | 4 ++--
net/rds/ib_cm.c | 3 ++-
net/rds/ib_send.c | 3 ++-
net/rds/loop.c | 4 ++--
net/rds/rds.h | 3 ---
net/rds/send.c | 16 ++++------------
net/rds/tcp.c | 6 +++---
net/rds/tcp.h | 6 +++---
net/rds/tcp_connect.c | 7 ++++---
net/rds/tcp_send.c | 8 ++++----
12 files changed, 29 insertions(+), 40 deletions(-)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index a4b07c8..17c2f25 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -326,10 +326,7 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
wait_event(cp->cp_waitq,
!test_bit(RDS_RECV_REFILL, &cp->cp_flags));
- if (!conn->c_trans->t_mp_capable)
- conn->c_trans->conn_shutdown(conn);
- else
- conn->c_trans->conn_path_shutdown(cp);
+ conn->c_trans->conn_path_shutdown(cp);
rds_conn_path_reset(cp);
if (!rds_conn_path_transition(cp, RDS_CONN_DISCONNECTING,
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 44946a6..1b29ec9 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -381,7 +381,7 @@ void rds_ib_exit(void)
struct rds_transport rds_ib_transport = {
.laddr_check = rds_ib_laddr_check,
- .xmit_complete = rds_ib_xmit_complete,
+ .xmit_path_complete = rds_ib_xmit_path_complete,
.xmit = rds_ib_xmit,
.xmit_rdma = rds_ib_xmit_rdma,
.xmit_atomic = rds_ib_xmit_atomic,
@@ -389,7 +389,7 @@ struct rds_transport rds_ib_transport = {
.conn_alloc = rds_ib_conn_alloc,
.conn_free = rds_ib_conn_free,
.conn_connect = rds_ib_conn_connect,
- .conn_shutdown = rds_ib_conn_shutdown,
+ .conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user = rds_ib_inc_copy_to_user,
.inc_free = rds_ib_inc_free,
.cm_initiate_connect = rds_ib_cm_initiate_connect,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 627fb79..2051f4b 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -329,7 +329,7 @@ extern struct list_head ib_nodev_conns;
int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
void rds_ib_conn_free(void *arg);
int rds_ib_conn_connect(struct rds_connection *conn);
-void rds_ib_conn_shutdown(struct rds_connection *conn);
+void rds_ib_conn_path_shutdown(struct rds_conn_path *cp);
void rds_ib_state_change(struct sock *sk);
int rds_ib_listen_init(void);
void rds_ib_listen_stop(void);
@@ -384,7 +384,7 @@ u32 rds_ib_ring_completed(struct rds_ib_work_ring *ring, u32 wr_id, u32 oldest);
extern wait_queue_head_t rds_ib_ring_empty_wait;
/* ib_send.c */
-void rds_ib_xmit_complete(struct rds_connection *conn);
+void rds_ib_xmit_path_complete(struct rds_conn_path *cp);
int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
unsigned int hdr_off, unsigned int sg, unsigned int off);
void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index e48bb1b..e34ea0b 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -731,8 +731,9 @@ int rds_ib_conn_connect(struct rds_connection *conn)
* so that it can be called at any point during startup. In fact it
* can be called multiple times for a given connection.
*/
-void rds_ib_conn_shutdown(struct rds_connection *conn)
+void rds_ib_conn_path_shutdown(struct rds_conn_path *cp)
{
+ struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
int err = 0;
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 6e4110a..84d90c9 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -980,8 +980,9 @@ int rds_ib_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
return ret;
}
-void rds_ib_xmit_complete(struct rds_connection *conn)
+void rds_ib_xmit_path_complete(struct rds_conn_path *cp)
{
+ struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
/* We may have a pending ACK or window update we were unable
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 15f83db..318c21d 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -156,7 +156,7 @@ static int rds_loop_conn_connect(struct rds_connection *conn)
return 0;
}
-static void rds_loop_conn_shutdown(struct rds_connection *conn)
+static void rds_loop_conn_path_shutdown(struct rds_conn_path *cp)
{
}
@@ -189,7 +189,7 @@ struct rds_transport rds_loop_transport = {
.conn_alloc = rds_loop_conn_alloc,
.conn_free = rds_loop_conn_free,
.conn_connect = rds_loop_conn_connect,
- .conn_shutdown = rds_loop_conn_shutdown,
+ .conn_path_shutdown = rds_loop_conn_path_shutdown,
.inc_copy_to_user = rds_message_inc_copy_to_user,
.inc_free = rds_loop_inc_free,
.t_name = "loopback",
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 2e35b73..5bbad08 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -455,11 +455,8 @@ struct rds_transport {
int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp);
void (*conn_free)(void *data);
int (*conn_connect)(struct rds_connection *conn);
- void (*conn_shutdown)(struct rds_connection *conn);
void (*conn_path_shutdown)(struct rds_conn_path *conn);
- void (*xmit_prepare)(struct rds_connection *conn);
void (*xmit_path_prepare)(struct rds_conn_path *cp);
- void (*xmit_complete)(struct rds_connection *conn);
void (*xmit_path_complete)(struct rds_conn_path *cp);
int (*xmit)(struct rds_connection *conn, struct rds_message *rm,
unsigned int hdr_off, unsigned int sg, unsigned int off);
diff --git a/net/rds/send.c b/net/rds/send.c
index ee43d6b..5a9caf1 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -183,12 +183,8 @@ int rds_send_xmit(struct rds_conn_path *cp)
goto out;
}
- if (conn->c_trans->t_mp_capable) {
- if (conn->c_trans->xmit_path_prepare)
- conn->c_trans->xmit_path_prepare(cp);
- } else if (conn->c_trans->xmit_prepare) {
- conn->c_trans->xmit_prepare(conn);
- }
+ if (conn->c_trans->xmit_path_prepare)
+ conn->c_trans->xmit_path_prepare(cp);
/*
* spin trying to push headers and data down the connection until
@@ -403,12 +399,8 @@ int rds_send_xmit(struct rds_conn_path *cp)
}
over_batch:
- if (conn->c_trans->t_mp_capable) {
- if (conn->c_trans->xmit_path_complete)
- conn->c_trans->xmit_path_complete(cp);
- } else if (conn->c_trans->xmit_complete) {
- conn->c_trans->xmit_complete(conn);
- }
+ if (conn->c_trans->xmit_path_complete)
+ conn->c_trans->xmit_path_complete(cp);
release_in_xmit(cp);
/* Nuke any messages we decided not to retransmit. */
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 5217d49..b139630 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -340,14 +340,14 @@ static void rds_tcp_exit(void);
struct rds_transport rds_tcp_transport = {
.laddr_check = rds_tcp_laddr_check,
- .xmit_prepare = rds_tcp_xmit_prepare,
- .xmit_complete = rds_tcp_xmit_complete,
+ .xmit_path_prepare = rds_tcp_xmit_path_prepare,
+ .xmit_path_complete = rds_tcp_xmit_path_complete,
.xmit = rds_tcp_xmit,
.recv = rds_tcp_recv,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free = rds_tcp_conn_free,
.conn_connect = rds_tcp_conn_connect,
- .conn_shutdown = rds_tcp_conn_shutdown,
+ .conn_path_shutdown = rds_tcp_conn_path_shutdown,
.inc_copy_to_user = rds_tcp_inc_copy_to_user,
.inc_free = rds_tcp_inc_free,
.stats_info_copy = rds_tcp_stats_info_copy,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 7940bab..728abe2 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -61,7 +61,7 @@ void rds_tcp_accept_work(struct sock *sk);
/* tcp_connect.c */
int rds_tcp_conn_connect(struct rds_connection *conn);
-void rds_tcp_conn_shutdown(struct rds_connection *conn);
+void rds_tcp_conn_path_shutdown(struct rds_conn_path *conn);
void rds_tcp_state_change(struct sock *sk);
/* tcp_listen.c */
@@ -80,8 +80,8 @@ void rds_tcp_inc_free(struct rds_incoming *inc);
int rds_tcp_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
/* tcp_send.c */
-void rds_tcp_xmit_prepare(struct rds_connection *conn);
-void rds_tcp_xmit_complete(struct rds_connection *conn);
+void rds_tcp_xmit_path_prepare(struct rds_conn_path *cp);
+void rds_tcp_xmit_path_complete(struct rds_conn_path *cp);
int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
unsigned int hdr_off, unsigned int sg, unsigned int off);
void rds_tcp_write_space(struct sock *sk);
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 96c2c4d..aa65c16 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -144,12 +144,13 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
* callbacks to those set by TCP. Our callbacks won't execute again once we
* hold the sock lock.
*/
-void rds_tcp_conn_shutdown(struct rds_connection *conn)
+void rds_tcp_conn_path_shutdown(struct rds_conn_path *cp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *sock = tc->t_sock;
- rdsdebug("shutting down conn %p tc %p sock %p\n", conn, tc, sock);
+ rdsdebug("shutting down conn %p tc %p sock %p\n",
+ cp->cp_conn, tc, sock);
if (sock) {
sock->ops->shutdown(sock, RCV_SHUTDOWN | SEND_SHUTDOWN);
diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 710f1aa..52cda94 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -49,16 +49,16 @@ static void rds_tcp_cork(struct socket *sock, int val)
set_fs(oldfs);
}
-void rds_tcp_xmit_prepare(struct rds_connection *conn)
+void rds_tcp_xmit_path_prepare(struct rds_conn_path *cp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
rds_tcp_cork(tc->t_sock, 1);
}
-void rds_tcp_xmit_complete(struct rds_connection *conn)
+void rds_tcp_xmit_path_complete(struct rds_conn_path *cp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
rds_tcp_cork(tc->t_sock, 0);
}
--
1.7.1
* [PATCH net-next 2/9] RDS: TCP: Remove dead logic around c_passive in rds-tcp
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The c_passive bit is only intended for the IB transport and will
never be encountered in rds-tcp, so remove the dead logic that
predicates on this bit.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp.c | 7 +------
1 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index b139630..c56fff2 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -329,11 +329,8 @@ static void rds_tcp_destroy_conns(void)
INIT_LIST_HEAD(&rds_tcp_conn_list);
spin_unlock_irq(&rds_tcp_conn_lock);
- list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
- if (tc->conn->c_passive)
- rds_conn_destroy(tc->conn->c_passive);
+ list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
rds_conn_destroy(tc->conn);
- }
}
static void rds_tcp_exit(void);
@@ -512,8 +509,6 @@ static void rds_tcp_kill_sock(struct net *net)
sk = tc->t_sock->sk;
sk->sk_prot->disconnect(sk, 0);
tcp_done(sk);
- if (tc->conn->c_passive)
- rds_conn_destroy(tc->conn->c_passive);
rds_conn_destroy(tc->conn);
}
}
--
1.7.1
* [PATCH net-next 3/9] RDS: TCP: Make rds_tcp_connection track the rds_conn_path
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The struct rds_tcp_connection is the transport-specific private
data structure that tracks TCP information per rds_conn_path.
Modify this structure to have a back-pointer to the rds_conn_path
for which it is the ->cp_transport_data.
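
A minimal userspace sketch of per-path allocation with the back-pointer,
assuming stand-in structs, calloc()/free() in place of the slab cache, and an
illustrative RDS_MPATH_WORKERS value. The sketch_* names are hypothetical,
cp_conn is set here only to keep the sketch self-contained, and unwinding
partial allocations on failure is a choice made for this sketch.

```c
#include <errno.h>
#include <stdlib.h>

#define RDS_MPATH_WORKERS 8	/* illustrative value for this sketch */

struct rds_connection;

struct rds_conn_path {
	struct rds_connection *cp_conn;
	void *cp_transport_data;
};

struct rds_tcp_connection {
	struct rds_conn_path *t_cpath;	/* back-pointer to the owning path */
	void *t_sock;			/* no socket attached yet */
};

struct rds_connection {
	struct rds_conn_path c_path[RDS_MPATH_WORKERS];
};

/* Release whatever per-path state is currently attached. */
static void sketch_conn_free(struct rds_connection *conn)
{
	int i;

	for (i = 0; i < RDS_MPATH_WORKERS; i++) {
		free(conn->c_path[i].cp_transport_data);
		conn->c_path[i].cp_transport_data = NULL;
	}
}

/* Allocate one rds_tcp_connection per path and link both directions. */
static int sketch_conn_alloc(struct rds_connection *conn)
{
	int i;

	for (i = 0; i < RDS_MPATH_WORKERS; i++) {
		struct rds_tcp_connection *tc = calloc(1, sizeof(*tc));

		if (!tc) {
			sketch_conn_free(conn);	/* unwind earlier paths */
			return -ENOMEM;
		}
		conn->c_path[i].cp_conn = conn;
		conn->c_path[i].cp_transport_data = tc;
		tc->t_cpath = &conn->c_path[i];
	}
	return 0;
}
```

With both links in place, code holding either the path or the TCP state can
reach the other without consulting the connection.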
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/connection.c | 30 +++++++++++++++---------------
net/rds/tcp.c | 44 +++++++++++++++++++++++++-------------------
net/rds/tcp.h | 6 +++---
net/rds/tcp_connect.c | 6 +++---
net/rds/tcp_listen.c | 4 ++--
5 files changed, 48 insertions(+), 42 deletions(-)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 17c2f25..1b0c2a7 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -253,9 +253,12 @@ static struct rds_connection *__rds_conn_create(struct net *net,
for (i = 0; i < RDS_MPATH_WORKERS; i++) {
cp = &conn->c_path[i];
- trans->conn_free(cp->cp_transport_data);
- if (!trans->t_mp_capable)
- break;
+ /* The ->conn_alloc invocation may have
+ * allocated resource for all paths, so all
+ * of them may have to be freed here.
+ */
+ if (cp->cp_transport_data)
+ trans->conn_free(cp->cp_transport_data);
}
kmem_cache_free(rds_conn_slab, conn);
conn = found;
@@ -367,6 +370,9 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp)
{
struct rds_message *rm, *rtmp;
+ if (!cp->cp_transport_data)
+ return;
+
rds_conn_path_drop(cp);
flush_work(&cp->cp_down_w);
@@ -398,6 +404,8 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp)
void rds_conn_destroy(struct rds_connection *conn)
{
unsigned long flags;
+ int i;
+ struct rds_conn_path *cp;
rdsdebug("freeing conn %p for %pI4 -> "
"%pI4\n", conn, &conn->c_laddr,
@@ -410,18 +418,10 @@ void rds_conn_destroy(struct rds_connection *conn)
synchronize_rcu();
/* shut the connection down */
- if (!conn->c_trans->t_mp_capable) {
- rds_conn_path_destroy(&conn->c_path[0]);
- BUG_ON(!list_empty(&conn->c_path[0].cp_retrans));
- } else {
- int i;
- struct rds_conn_path *cp;
-
- for (i = 0; i < RDS_MPATH_WORKERS; i++) {
- cp = &conn->c_path[i];
- rds_conn_path_destroy(cp);
- BUG_ON(!list_empty(&cp->cp_retrans));
- }
+ for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+ cp = &conn->c_path[i];
+ rds_conn_path_destroy(cp);
+ BUG_ON(!list_empty(&cp->cp_retrans));
}
/*
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c56fff2..c6b47f6 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -221,7 +221,7 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
sock->sk->sk_data_ready = sock->sk->sk_user_data;
tc->t_sock = sock;
- tc->conn = conn;
+ tc->t_cpath = &conn->c_path[0];
tc->t_orig_data_ready = sock->sk->sk_data_ready;
tc->t_orig_write_space = sock->sk->sk_write_space;
tc->t_orig_state_change = sock->sk->sk_state_change;
@@ -284,24 +284,29 @@ static int rds_tcp_laddr_check(struct net *net, __be32 addr)
static int rds_tcp_conn_alloc(struct rds_connection *conn, gfp_t gfp)
{
struct rds_tcp_connection *tc;
+ int i;
- tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp);
- if (!tc)
- return -ENOMEM;
+ for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+ tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp);
+ if (!tc)
+ return -ENOMEM;
- mutex_init(&tc->t_conn_lock);
- tc->t_sock = NULL;
- tc->t_tinc = NULL;
- tc->t_tinc_hdr_rem = sizeof(struct rds_header);
- tc->t_tinc_data_rem = 0;
+ mutex_init(&tc->t_conn_path_lock);
+ tc->t_sock = NULL;
+ tc->t_tinc = NULL;
+ tc->t_tinc_hdr_rem = sizeof(struct rds_header);
+ tc->t_tinc_data_rem = 0;
- conn->c_transport_data = tc;
+ conn->c_path[i].cp_transport_data = tc;
+ tc->t_cpath = &conn->c_path[i];
- spin_lock_irq(&rds_tcp_conn_lock);
- list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
- spin_unlock_irq(&rds_tcp_conn_lock);
+ spin_lock_irq(&rds_tcp_conn_lock);
+ list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
+ spin_unlock_irq(&rds_tcp_conn_lock);
+ rdsdebug("rds_conn_path [%d] tc %p\n", i,
+ conn->c_path[i].cp_transport_data);
+ }
- rdsdebug("alloced tc %p\n", conn->c_transport_data);
return 0;
}
@@ -330,7 +335,7 @@ static void rds_tcp_destroy_conns(void)
spin_unlock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
- rds_conn_destroy(tc->conn);
+ rds_conn_destroy(tc->t_cpath->cp_conn);
}
static void rds_tcp_exit(void);
@@ -498,7 +503,7 @@ static void rds_tcp_kill_sock(struct net *net)
flush_work(&rtn->rds_tcp_accept_w);
spin_lock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
- struct net *c_net = read_pnet(&tc->conn->c_net);
+ struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
if (net != c_net || !tc->t_sock)
continue;
@@ -509,7 +514,7 @@ static void rds_tcp_kill_sock(struct net *net)
sk = tc->t_sock->sk;
sk->sk_prot->disconnect(sk, 0);
tcp_done(sk);
- rds_conn_destroy(tc->conn);
+ rds_conn_destroy(tc->t_cpath->cp_conn);
}
}
@@ -547,12 +552,13 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_lock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
- struct net *c_net = read_pnet(&tc->conn->c_net);
+ struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
if (net != c_net || !tc->t_sock)
continue;
- rds_conn_drop(tc->conn); /* reconnect with new parameters */
+ /* reconnect with new parameters */
+ rds_conn_path_drop(tc->t_cpath);
}
spin_unlock_irq(&rds_tcp_conn_lock);
}
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 728abe2..e1ff169 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -11,11 +11,11 @@ struct rds_tcp_incoming {
struct rds_tcp_connection {
struct list_head t_tcp_node;
- struct rds_connection *conn;
- /* t_conn_lock synchronizes the connection establishment between
+ struct rds_conn_path *t_cpath;
+ /* t_conn_path_lock synchronizes the connection establishment between
* rds_tcp_accept_one and rds_tcp_conn_connect
*/
- struct mutex t_conn_lock;
+ struct mutex t_conn_path_lock;
struct socket *t_sock;
void *t_orig_write_space;
void *t_orig_data_ready;
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index aa65c16..146692c 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -82,10 +82,10 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
int ret;
struct rds_tcp_connection *tc = conn->c_transport_data;
- mutex_lock(&tc->t_conn_lock);
+ mutex_lock(&tc->t_conn_path_lock);
if (rds_conn_up(conn)) {
- mutex_unlock(&tc->t_conn_lock);
+ mutex_unlock(&tc->t_conn_path_lock);
return 0;
}
ret = sock_create_kern(rds_conn_net(conn), PF_INET,
@@ -129,7 +129,7 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
}
out:
- mutex_unlock(&tc->t_conn_lock);
+ mutex_unlock(&tc->t_conn_path_lock);
if (sock)
sock_release(sock);
return ret;
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index f9cc945..d893346 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -121,7 +121,7 @@ int rds_tcp_accept_one(struct socket *sock)
*/
rs_tcp = (struct rds_tcp_connection *)conn->c_transport_data;
rds_conn_transition(conn, RDS_CONN_DOWN, RDS_CONN_CONNECTING);
- mutex_lock(&rs_tcp->t_conn_lock);
+ mutex_lock(&rs_tcp->t_conn_path_lock);
conn_state = rds_conn_state(conn);
if (conn_state != RDS_CONN_CONNECTING && conn_state != RDS_CONN_UP)
goto rst_nsk;
@@ -156,7 +156,7 @@ int rds_tcp_accept_one(struct socket *sock)
ret = 0;
out:
if (rs_tcp)
- mutex_unlock(&rs_tcp->t_conn_lock);
+ mutex_unlock(&rs_tcp->t_conn_path_lock);
if (new_sock)
sock_release(new_sock);
return ret;
--
1.7.1
* [PATCH net-next 4/9] RDS: TCP: Refactor connection destruction to handle multiple paths
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
A single rds_connection may have multiple rds_conn_paths that have
to be carefully and correctly destroyed, for both rmmod and
netns-delete cases.
For both cases, we extract a single rds_tcp_connection for
each conn into a temporary list, and then invoke rds_conn_destroy()
which iteratively dismantles every path in the rds_connection.
For the netns deletion case, we additionally have to make sure
that we do not leave a socket in TIME_WAIT state, as this will
hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy()
to reset state quickly.
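
The "one rds_tcp_connection per conn" extraction can be sketched in
userspace as follows. This assumes plain arrays in place of the kernel's
list_head, stand-in structs, and hypothetical sketch_* helpers;
sketch_has_conn() plays the role that list_has_conn() plays in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct rds_connection {
	int c_id;	/* placeholder field for this sketch */
};

struct rds_conn_path {
	struct rds_connection *cp_conn;
};

struct rds_tcp_connection {
	struct rds_conn_path *t_cpath;
};

/* True if some already-picked entry belongs to conn. */
static bool sketch_has_conn(struct rds_tcp_connection **picked, size_t n,
			    const struct rds_connection *conn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (picked[i]->t_cpath->cp_conn == conn)
			return true;
	}
	return false;
}

/* Copy the first entry seen for each connection into picked[].
 * picked[] must have room for n_all entries. Returns the number picked.
 */
static size_t sketch_pick_one_per_conn(struct rds_tcp_connection **all,
				       size_t n_all,
				       struct rds_tcp_connection **picked)
{
	size_t i, n = 0;

	for (i = 0; i < n_all; i++) {
		if (!sketch_has_conn(picked, n, all[i]->t_cpath->cp_conn))
			picked[n++] = all[i];
	}
	return n;
}
```

Each picked entry then stands for its whole connection, so destroying the
connection once per picked entry covers every path exactly once. The scan is
quadratic in the worst case, which is a cost this sketch accepts for
simplicity.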
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp.c | 46 +++++++++++++++++++++++++++++++++++++++-------
1 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c6b47f6..b327727 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -323,6 +323,17 @@ static void rds_tcp_conn_free(void *arg)
kmem_cache_free(rds_tcp_conn_slab, tc);
}
+static bool list_has_conn(struct list_head *list, struct rds_connection *conn)
+{
+ struct rds_tcp_connection *tc, *_tc;
+
+ list_for_each_entry_safe(tc, _tc, list, t_tcp_node) {
+ if (tc->t_cpath->cp_conn == conn)
+ return true;
+ }
+ return false;
+}
+
static void rds_tcp_destroy_conns(void)
{
struct rds_tcp_connection *tc, *_tc;
@@ -330,8 +341,10 @@ static void rds_tcp_destroy_conns(void)
/* avoid calling conn_destroy with irqs off */
spin_lock_irq(&rds_tcp_conn_lock);
- list_splice(&rds_tcp_conn_list, &tmp_list);
- INIT_LIST_HEAD(&rds_tcp_conn_list);
+ list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
+ if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+ list_move_tail(&tc->t_tcp_node, &tmp_list);
+ }
spin_unlock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
@@ -491,10 +504,30 @@ static struct pernet_operations rds_tcp_net_ops = {
.size = sizeof(struct rds_tcp_net),
};
+/* explicitly send a RST on each socket, thereby releasing any socket refcnts
+ * that may otherwise hold up netns deletion.
+ */
+static void rds_tcp_conn_paths_destroy(struct rds_connection *conn)
+{
+ struct rds_conn_path *cp;
+ struct rds_tcp_connection *tc;
+ int i;
+ struct sock *sk;
+
+ for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+ cp = &conn->c_path[i];
+ tc = cp->cp_transport_data;
+ if (!tc->t_sock)
+ continue;
+ sk = tc->t_sock->sk;
+ sk->sk_prot->disconnect(sk, 0);
+ tcp_done(sk);
+ }
+}
+
static void rds_tcp_kill_sock(struct net *net)
{
struct rds_tcp_connection *tc, *_tc;
- struct sock *sk;
LIST_HEAD(tmp_list);
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
@@ -507,13 +540,12 @@ static void rds_tcp_kill_sock(struct net *net)
if (net != c_net || !tc->t_sock)
continue;
- list_move_tail(&tc->t_tcp_node, &tmp_list);
+ if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+ list_move_tail(&tc->t_tcp_node, &tmp_list);
}
spin_unlock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
- sk = tc->t_sock->sk;
- sk->sk_prot->disconnect(sk, 0);
- tcp_done(sk);
+ rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
rds_conn_destroy(tc->t_cpath->cp_conn);
}
}
--
1.7.1
* [PATCH net-next 5/9] RDS: TCP: make ->sk_user_data point to a rds_conn_path
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The socket callbacks should all operate on a struct rds_conn_path,
in preparation for an MP-capable RDS-TCP.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp.c | 25 +++++++++++++------------
net/rds/tcp.h | 4 ++--
net/rds/tcp_connect.c | 16 ++++++++--------
net/rds/tcp_listen.c | 12 ++++++------
net/rds/tcp_recv.c | 12 ++++++------
net/rds/tcp_send.c | 12 ++++++------
6 files changed, 41 insertions(+), 40 deletions(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index b327727..5658f3e 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -136,9 +136,9 @@ void rds_tcp_restore_callbacks(struct socket *sock,
* from being called while it isn't set.
*/
void rds_tcp_reset_callbacks(struct socket *sock,
- struct rds_connection *conn)
+ struct rds_conn_path *cp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *osock = tc->t_sock;
if (!osock)
@@ -148,8 +148,8 @@ void rds_tcp_reset_callbacks(struct socket *sock,
* We have an outstanding SYN to this peer, which may
* potentially have transitioned to the RDS_CONN_UP state,
* so we must quiesce any send threads before resetting
- * c_transport_data. We quiesce these threads by setting
- * c_state to something other than RDS_CONN_UP, and then
+ * cp_transport_data. We quiesce these threads by setting
+ * cp_state to something other than RDS_CONN_UP, and then
* waiting for any existing threads in rds_send_xmit to
* complete release_in_xmit(). (Subsequent threads entering
* rds_send_xmit() will bail on !rds_conn_up().
@@ -164,8 +164,8 @@ void rds_tcp_reset_callbacks(struct socket *sock,
* RDS_CONN_RESETTTING, to ensure that rds_tcp_state_change
* cannot mark rds_conn_path_up() in the window before lock_sock()
*/
- atomic_set(&conn->c_state, RDS_CONN_RESETTING);
- wait_event(conn->c_waitq, !test_bit(RDS_IN_XMIT, &conn->c_flags));
+ atomic_set(&cp->cp_state, RDS_CONN_RESETTING);
+ wait_event(cp->cp_waitq, !test_bit(RDS_IN_XMIT, &cp->cp_flags));
lock_sock(osock->sk);
/* reset receive side state for rds_tcp_data_recv() for osock */
if (tc->t_tinc) {
@@ -186,11 +186,12 @@ void rds_tcp_reset_callbacks(struct socket *sock,
release_sock(osock->sk);
sock_release(osock);
newsock:
- rds_send_path_reset(&conn->c_path[0]);
+ rds_send_path_reset(cp);
lock_sock(sock->sk);
write_lock_bh(&sock->sk->sk_callback_lock);
tc->t_sock = sock;
- sock->sk->sk_user_data = conn;
+ tc->t_cpath = cp;
+ sock->sk->sk_user_data = cp;
sock->sk->sk_data_ready = rds_tcp_data_ready;
sock->sk->sk_write_space = rds_tcp_write_space;
sock->sk->sk_state_change = rds_tcp_state_change;
@@ -203,9 +204,9 @@ void rds_tcp_reset_callbacks(struct socket *sock,
* above rds_tcp_reset_callbacks for notes about synchronization
* with data path
*/
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
+void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
rdsdebug("setting sock %p callbacks to tc %p\n", sock, tc);
write_lock_bh(&sock->sk->sk_callback_lock);
@@ -221,12 +222,12 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
sock->sk->sk_data_ready = sock->sk->sk_user_data;
tc->t_sock = sock;
- tc->t_cpath = &conn->c_path[0];
+ tc->t_cpath = cp;
tc->t_orig_data_ready = sock->sk->sk_data_ready;
tc->t_orig_write_space = sock->sk->sk_write_space;
tc->t_orig_state_change = sock->sk->sk_state_change;
- sock->sk->sk_user_data = conn;
+ sock->sk->sk_user_data = cp;
sock->sk->sk_data_ready = rds_tcp_data_ready;
sock->sk->sk_write_space = rds_tcp_write_space;
sock->sk->sk_state_change = rds_tcp_state_change;
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index e1ff169..151b09d 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -49,8 +49,8 @@ struct rds_tcp_statistics {
/* tcp.c */
void rds_tcp_tune(struct socket *sock);
void rds_tcp_nonagle(struct socket *sock);
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn);
-void rds_tcp_reset_callbacks(struct socket *sock, struct rds_connection *conn);
+void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
+void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
void rds_tcp_restore_callbacks(struct socket *sock,
struct rds_tcp_connection *tc);
u32 rds_tcp_snd_nxt(struct rds_tcp_connection *tc);
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 146692c..7eddce5 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -41,16 +41,16 @@
void rds_tcp_state_change(struct sock *sk)
{
void (*state_change)(struct sock *sk);
- struct rds_connection *conn;
+ struct rds_conn_path *cp;
struct rds_tcp_connection *tc;
read_lock_bh(&sk->sk_callback_lock);
- conn = sk->sk_user_data;
- if (!conn) {
+ cp = sk->sk_user_data;
+ if (!cp) {
state_change = sk->sk_state_change;
goto out;
}
- tc = conn->c_transport_data;
+ tc = cp->cp_transport_data;
state_change = tc->t_orig_state_change;
rdsdebug("sock %p state_change to %d\n", tc->t_sock, sk->sk_state);
@@ -61,12 +61,11 @@ void rds_tcp_state_change(struct sock *sk)
case TCP_SYN_RECV:
break;
case TCP_ESTABLISHED:
- rds_connect_path_complete(&conn->c_path[0],
- RDS_CONN_CONNECTING);
+ rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
break;
case TCP_CLOSE_WAIT:
case TCP_CLOSE:
- rds_conn_drop(conn);
+ rds_conn_path_drop(cp);
default:
break;
}
@@ -81,6 +80,7 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
struct sockaddr_in src, dest;
int ret;
struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_conn_path *cp = &conn->c_path[0];
mutex_lock(&tc->t_conn_path_lock);
@@ -114,7 +114,7 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
* once we call connect() we can start getting callbacks and they
* own the socket
*/
- rds_tcp_set_callbacks(sock, conn);
+ rds_tcp_set_callbacks(sock, cp);
ret = sock->ops->connect(sock, (struct sockaddr *)&dest, sizeof(dest),
O_NONBLOCK);
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index d893346..ca975a2 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -79,6 +79,7 @@ int rds_tcp_accept_one(struct socket *sock)
struct inet_sock *inet;
struct rds_tcp_connection *rs_tcp = NULL;
int conn_state;
+ struct rds_conn_path *cp;
if (!sock) /* module unload or netns delete in progress */
return -ENETUNREACH;
@@ -120,6 +121,7 @@ int rds_tcp_accept_one(struct socket *sock)
* rds_tcp_state_change() will do that cleanup
*/
rs_tcp = (struct rds_tcp_connection *)conn->c_transport_data;
+ cp = &conn->c_path[0];
rds_conn_transition(conn, RDS_CONN_DOWN, RDS_CONN_CONNECTING);
mutex_lock(&rs_tcp->t_conn_path_lock);
conn_state = rds_conn_state(conn);
@@ -136,16 +138,14 @@ int rds_tcp_accept_one(struct socket *sock)
!conn->c_path[0].cp_outgoing) {
goto rst_nsk;
} else {
- rds_tcp_reset_callbacks(new_sock, conn);
+ rds_tcp_reset_callbacks(new_sock, cp);
conn->c_path[0].cp_outgoing = 0;
/* rds_connect_path_complete() marks RDS_CONN_UP */
- rds_connect_path_complete(&conn->c_path[0],
- RDS_CONN_RESETTING);
+ rds_connect_path_complete(cp, RDS_CONN_RESETTING);
}
} else {
- rds_tcp_set_callbacks(new_sock, conn);
- rds_connect_path_complete(&conn->c_path[0],
- RDS_CONN_CONNECTING);
+ rds_tcp_set_callbacks(new_sock, cp);
+ rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
}
new_sock = NULL;
ret = 0;
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index 4a87d9e..aa7a79a 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -297,24 +297,24 @@ int rds_tcp_recv(struct rds_connection *conn)
void rds_tcp_data_ready(struct sock *sk)
{
void (*ready)(struct sock *sk);
- struct rds_connection *conn;
+ struct rds_conn_path *cp;
struct rds_tcp_connection *tc;
rdsdebug("data ready sk %p\n", sk);
read_lock_bh(&sk->sk_callback_lock);
- conn = sk->sk_user_data;
- if (!conn) { /* check for teardown race */
+ cp = sk->sk_user_data;
+ if (!cp) { /* check for teardown race */
ready = sk->sk_data_ready;
goto out;
}
- tc = conn->c_transport_data;
+ tc = cp->cp_transport_data;
ready = tc->t_orig_data_ready;
rds_tcp_stats_inc(s_tcp_data_ready_calls);
- if (rds_tcp_read_sock(conn, GFP_ATOMIC) == -ENOMEM)
- queue_delayed_work(rds_wq, &conn->c_recv_w, 0);
+ if (rds_tcp_read_sock(cp->cp_conn, GFP_ATOMIC) == -ENOMEM)
+ queue_delayed_work(rds_wq, &cp->cp_recv_w, 0);
out:
read_unlock_bh(&sk->sk_callback_lock);
ready(sk);
diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 52cda94..57e0f58 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -178,27 +178,27 @@ static int rds_tcp_is_acked(struct rds_message *rm, uint64_t ack)
void rds_tcp_write_space(struct sock *sk)
{
void (*write_space)(struct sock *sk);
- struct rds_connection *conn;
+ struct rds_conn_path *cp;
struct rds_tcp_connection *tc;
read_lock_bh(&sk->sk_callback_lock);
- conn = sk->sk_user_data;
- if (!conn) {
+ cp = sk->sk_user_data;
+ if (!cp) {
write_space = sk->sk_write_space;
goto out;
}
- tc = conn->c_transport_data;
+ tc = cp->cp_transport_data;
rdsdebug("write_space for tc %p\n", tc);
write_space = tc->t_orig_write_space;
rds_tcp_stats_inc(s_tcp_write_space_calls);
rdsdebug("tcp una %u\n", rds_tcp_snd_una(tc));
tc->t_last_seen_una = rds_tcp_snd_una(tc);
- rds_send_drop_acked(conn, rds_tcp_snd_una(tc), rds_tcp_is_acked);
+ rds_send_path_drop_acked(cp, rds_tcp_snd_una(tc), rds_tcp_is_acked);
if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf)
- queue_delayed_work(rds_wq, &conn->c_send_w, 0);
+ queue_delayed_work(rds_wq, &cp->cp_send_w, 0);
out:
read_unlock_bh(&sk->sk_callback_lock);
--
1.7.1
* [PATCH net-next 6/9] RDS: TCP: make receive path use the rds_conn_path
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The ->sk_user_data contains a pointer to the rds_conn_path
for the socket. Use this consistently in the rds_tcp_data_ready
callbacks to get the rds_conn_path for rds_recv_incoming.
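The lookup described above can be sketched in plain C. This is only an illustration with simplified stand-in types, not the kernel structures: every `rx_*` name is hypothetical, and locking (sk_callback_lock) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct rx_tcp_conn {
	int rx_calls;              /* counts simulated receive passes */
};

struct rx_conn_path {
	int index;                 /* plays the role of cp_index */
	void *transport_data;      /* plays the role of cp_transport_data */
};

struct rx_sock {
	void *user_data;           /* plays the role of sk_user_data */
};

/*
 * Model of a data-ready callback: recover the path from the socket,
 * then the per-path transport state from the path. Returns the index
 * of the path that was serviced, or -1 when user_data has already
 * been cleared (the teardown race the real callback checks for).
 */
static int rx_data_ready(struct rx_sock *sk)
{
	struct rx_conn_path *cp = sk->user_data;
	struct rx_tcp_conn *tc;

	if (cp == NULL)
		return -1;

	tc = cp->transport_data;
	assert(tc != NULL);        /* a live path always has transport data here */
	tc->rx_calls++;
	return cp->index;
}
```

The point of the sketch is that the callback never needs the parent connection to find its state: the socket identifies the path, and the path identifies the per-path TCP state.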
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/ib.c | 2 +-
net/rds/ib.h | 2 +-
net/rds/ib_recv.c | 3 ++-
net/rds/loop.c | 4 ++--
net/rds/rds.h | 2 +-
net/rds/tcp.c | 2 +-
net/rds/tcp.h | 2 +-
net/rds/tcp_recv.c | 29 ++++++++++++++++-------------
net/rds/threads.c | 2 +-
9 files changed, 26 insertions(+), 22 deletions(-)
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 1b29ec9..e6ba856 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -385,7 +385,7 @@ struct rds_transport rds_ib_transport = {
.xmit = rds_ib_xmit,
.xmit_rdma = rds_ib_xmit_rdma,
.xmit_atomic = rds_ib_xmit_atomic,
- .recv = rds_ib_recv,
+ .recv_path = rds_ib_recv_path,
.conn_alloc = rds_ib_conn_alloc,
.conn_free = rds_ib_conn_free,
.conn_connect = rds_ib_conn_connect,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 2051f4b..579de7e 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -354,7 +354,7 @@ void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
/* ib_recv.c */
int rds_ib_recv_init(void);
void rds_ib_recv_exit(void);
-int rds_ib_recv(struct rds_connection *conn);
+int rds_ib_recv_path(struct rds_conn_path *conn);
int rds_ib_recv_alloc_caches(struct rds_ib_connection *ic);
void rds_ib_recv_free_caches(struct rds_ib_connection *ic);
void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 4ea8cb1..606a11f 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -1009,8 +1009,9 @@ void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic,
rds_ib_recv_refill(conn, 0, GFP_NOWAIT);
}
-int rds_ib_recv(struct rds_connection *conn)
+int rds_ib_recv_path(struct rds_conn_path *cp)
{
+ struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
int ret = 0;
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 318c21d..20284a4 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -102,7 +102,7 @@ static void rds_loop_inc_free(struct rds_incoming *inc)
}
/* we need to at least give the thread something to succeed */
-static int rds_loop_recv(struct rds_connection *conn)
+static int rds_loop_recv_path(struct rds_conn_path *cp)
{
return 0;
}
@@ -185,7 +185,7 @@ void rds_loop_exit(void)
*/
struct rds_transport rds_loop_transport = {
.xmit = rds_loop_xmit,
- .recv = rds_loop_recv,
+ .recv_path = rds_loop_recv_path,
.conn_alloc = rds_loop_conn_alloc,
.conn_free = rds_loop_conn_free,
.conn_connect = rds_loop_conn_connect,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 5bbad08..0faca30 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -462,7 +462,7 @@ struct rds_transport {
unsigned int hdr_off, unsigned int sg, unsigned int off);
int (*xmit_rdma)(struct rds_connection *conn, struct rm_rdma_op *op);
int (*xmit_atomic)(struct rds_connection *conn, struct rm_atomic_op *op);
- int (*recv)(struct rds_connection *conn);
+ int (*recv_path)(struct rds_conn_path *cp);
int (*inc_copy_to_user)(struct rds_incoming *inc, struct iov_iter *to);
void (*inc_free)(struct rds_incoming *inc);
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 5658f3e..7bc136c 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -359,7 +359,7 @@ struct rds_transport rds_tcp_transport = {
.xmit_path_prepare = rds_tcp_xmit_path_prepare,
.xmit_path_complete = rds_tcp_xmit_path_complete,
.xmit = rds_tcp_xmit,
- .recv = rds_tcp_recv,
+ .recv_path = rds_tcp_recv_path,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free = rds_tcp_conn_free,
.conn_connect = rds_tcp_conn_connect,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 151b09d..5a5f91a 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -75,7 +75,7 @@ int rds_tcp_keepalive(struct socket *sock);
int rds_tcp_recv_init(void);
void rds_tcp_recv_exit(void);
void rds_tcp_data_ready(struct sock *sk);
-int rds_tcp_recv(struct rds_connection *conn);
+int rds_tcp_recv_path(struct rds_conn_path *cp);
void rds_tcp_inc_free(struct rds_incoming *inc);
int rds_tcp_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index aa7a79a..ad4892e 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -34,7 +34,6 @@
#include <linux/slab.h>
#include <net/tcp.h>
-#include "rds_single_path.h"
#include "rds.h"
#include "tcp.h"
@@ -148,7 +147,7 @@ static void rds_tcp_cong_recv(struct rds_connection *conn,
}
struct rds_tcp_desc_arg {
- struct rds_connection *conn;
+ struct rds_conn_path *conn_path;
gfp_t gfp;
};
@@ -156,8 +155,8 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
unsigned int offset, size_t len)
{
struct rds_tcp_desc_arg *arg = desc->arg.data;
- struct rds_connection *conn = arg->conn;
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_conn_path *cp = arg->conn_path;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
struct rds_tcp_incoming *tinc = tc->t_tinc;
struct sk_buff *clone;
size_t left = len, to_copy;
@@ -179,7 +178,8 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
}
tc->t_tinc = tinc;
rdsdebug("alloced tinc %p\n", tinc);
- rds_inc_init(&tinc->ti_inc, conn, conn->c_faddr);
+ rds_inc_path_init(&tinc->ti_inc, cp,
+ cp->cp_conn->c_faddr);
/*
* XXX * we might be able to use the __ variants when
* we've already serialized at a higher level.
@@ -229,6 +229,8 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
}
if (tc->t_tinc_hdr_rem == 0 && tc->t_tinc_data_rem == 0) {
+ struct rds_connection *conn = cp->cp_conn;
+
if (tinc->ti_inc.i_hdr.h_flags == RDS_FLAG_CONG_BITMAP)
rds_tcp_cong_recv(conn, tinc);
else
@@ -251,15 +253,15 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
}
/* the caller has to hold the sock lock */
-static int rds_tcp_read_sock(struct rds_connection *conn, gfp_t gfp)
+static int rds_tcp_read_sock(struct rds_conn_path *cp, gfp_t gfp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *sock = tc->t_sock;
read_descriptor_t desc;
struct rds_tcp_desc_arg arg;
/* It's like glib in the kernel! */
- arg.conn = conn;
+ arg.conn_path = cp;
arg.gfp = gfp;
desc.arg.data = &arg;
desc.error = 0;
@@ -279,16 +281,17 @@ static int rds_tcp_read_sock(struct rds_connection *conn, gfp_t gfp)
* if we fail to allocate we're in trouble.. blindly wait some time before
* trying again to see if the VM can free up something for us.
*/
-int rds_tcp_recv(struct rds_connection *conn)
+int rds_tcp_recv_path(struct rds_conn_path *cp)
{
- struct rds_tcp_connection *tc = conn->c_transport_data;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *sock = tc->t_sock;
int ret = 0;
- rdsdebug("recv worker conn %p tc %p sock %p\n", conn, tc, sock);
+ rdsdebug("recv worker path [%d] tc %p sock %p\n",
+ cp->cp_index, tc, sock);
lock_sock(sock->sk);
- ret = rds_tcp_read_sock(conn, GFP_KERNEL);
+ ret = rds_tcp_read_sock(cp, GFP_KERNEL);
release_sock(sock->sk);
return ret;
@@ -313,7 +316,7 @@ void rds_tcp_data_ready(struct sock *sk)
ready = tc->t_orig_data_ready;
rds_tcp_stats_inc(s_tcp_data_ready_calls);
- if (rds_tcp_read_sock(cp->cp_conn, GFP_ATOMIC) == -ENOMEM)
+ if (rds_tcp_read_sock(cp, GFP_ATOMIC) == -ENOMEM)
queue_delayed_work(rds_wq, &cp->cp_recv_w, 0);
out:
read_unlock_bh(&sk->sk_callback_lock);
diff --git a/net/rds/threads.c b/net/rds/threads.c
index 9fbe95b..f717b69 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -203,7 +203,7 @@ void rds_recv_worker(struct work_struct *work)
int ret;
if (rds_conn_path_state(cp) == RDS_CONN_UP) {
- ret = cp->cp_conn->c_trans->recv(cp->cp_conn);
+ ret = cp->cp_conn->c_trans->recv_path(cp);
rdsdebug("conn %p ret %d\n", cp->cp_conn, ret);
switch (ret) {
case -EAGAIN:
--
1.7.1
* [PATCH net-next 7/9] RDS: TCP: Hooks to set up a single connection path
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
This patch adds ->conn_path_connect callbacks in the rds_transport
that are used to set up a single connection path.
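As a rough illustration of a per-path connect hook, the sketch below models a transport ops table and a connect worker in userspace C. The `hook_*` names and the three-state enum are assumptions made for the example; the kernel uses atomic state transitions and a workqueue, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

enum hook_state { HOOK_DOWN, HOOK_CONNECTING, HOOK_UP };

struct hook_path;

/* Hypothetical transport ops table with a per-path connect hook. */
struct hook_transport {
	int (*conn_path_connect)(struct hook_path *cp);
};

struct hook_path {
	enum hook_state state;
	int connect_calls;                  /* how often the hook ran */
	const struct hook_transport *trans;
};

/* Example hook: a real transport would start a non-blocking connect here. */
static int hook_tcp_path_connect(struct hook_path *cp)
{
	cp->connect_calls++;
	return 0;
}

static const struct hook_transport hook_tcp_transport = {
	.conn_path_connect = hook_tcp_path_connect,
};

/*
 * Model of the connect worker: only a path that moves from DOWN to
 * CONNECTING has its hook invoked. Returns the hook's result, or -1
 * when the path was not DOWN and the hook was skipped.
 */
static int hook_connect_worker(struct hook_path *cp)
{
	assert(cp->trans != NULL && cp->trans->conn_path_connect != NULL);

	if (cp->state != HOOK_DOWN)
		return -1;
	cp->state = HOOK_CONNECTING;
	return cp->trans->conn_path_connect(cp);
}
```

Because the hook takes the path rather than the connection, each path of a connection can be brought up independently of the others.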
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/ib.c | 2 +-
net/rds/ib.h | 2 +-
net/rds/ib_cm.c | 3 ++-
net/rds/loop.c | 6 +++---
net/rds/rds.h | 2 +-
net/rds/tcp.c | 2 +-
net/rds/tcp.h | 4 ++--
net/rds/tcp_connect.c | 11 ++++++-----
net/rds/threads.c | 5 +++--
9 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/net/rds/ib.c b/net/rds/ib.c
index e6ba856..7eaf887 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -388,7 +388,7 @@ struct rds_transport rds_ib_transport = {
.recv_path = rds_ib_recv_path,
.conn_alloc = rds_ib_conn_alloc,
.conn_free = rds_ib_conn_free,
- .conn_connect = rds_ib_conn_connect,
+ .conn_path_connect = rds_ib_conn_path_connect,
.conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user = rds_ib_inc_copy_to_user,
.inc_free = rds_ib_inc_free,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 579de7e..046f750 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -328,7 +328,7 @@ extern struct list_head ib_nodev_conns;
/* ib_cm.c */
int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
void rds_ib_conn_free(void *arg);
-int rds_ib_conn_connect(struct rds_connection *conn);
+int rds_ib_conn_path_connect(struct rds_conn_path *cp);
void rds_ib_conn_path_shutdown(struct rds_conn_path *cp);
void rds_ib_state_change(struct sock *sk);
int rds_ib_listen_init(void);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index e34ea0b..5b2ab95 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -685,8 +685,9 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id)
return ret;
}
-int rds_ib_conn_connect(struct rds_connection *conn)
+int rds_ib_conn_path_connect(struct rds_conn_path *cp)
{
+ struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
struct sockaddr_in src, dest;
int ret;
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 20284a4..f2bf78d 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -150,9 +150,9 @@ static void rds_loop_conn_free(void *arg)
kfree(lc);
}
-static int rds_loop_conn_connect(struct rds_connection *conn)
+static int rds_loop_conn_path_connect(struct rds_conn_path *cp)
{
- rds_connect_complete(conn);
+ rds_connect_complete(cp->cp_conn);
return 0;
}
@@ -188,7 +188,7 @@ struct rds_transport rds_loop_transport = {
.recv_path = rds_loop_recv_path,
.conn_alloc = rds_loop_conn_alloc,
.conn_free = rds_loop_conn_free,
- .conn_connect = rds_loop_conn_connect,
+ .conn_path_connect = rds_loop_conn_path_connect,
.conn_path_shutdown = rds_loop_conn_path_shutdown,
.inc_copy_to_user = rds_message_inc_copy_to_user,
.inc_free = rds_loop_inc_free,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 0faca30..6ef07bd 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -454,7 +454,7 @@ struct rds_transport {
int (*laddr_check)(struct net *net, __be32 addr);
int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp);
void (*conn_free)(void *data);
- int (*conn_connect)(struct rds_connection *conn);
+ int (*conn_path_connect)(struct rds_conn_path *cp);
void (*conn_path_shutdown)(struct rds_conn_path *conn);
void (*xmit_path_prepare)(struct rds_conn_path *cp);
void (*xmit_path_complete)(struct rds_conn_path *cp);
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 7bc136c..d278432 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -362,7 +362,7 @@ struct rds_transport rds_tcp_transport = {
.recv_path = rds_tcp_recv_path,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free = rds_tcp_conn_free,
- .conn_connect = rds_tcp_conn_connect,
+ .conn_path_connect = rds_tcp_conn_path_connect,
.conn_path_shutdown = rds_tcp_conn_path_shutdown,
.inc_copy_to_user = rds_tcp_inc_copy_to_user,
.inc_free = rds_tcp_inc_free,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 5a5f91a..1c3160f 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -13,7 +13,7 @@ struct rds_tcp_connection {
struct list_head t_tcp_node;
struct rds_conn_path *t_cpath;
/* t_conn_path_lock synchronizes the connection establishment between
- * rds_tcp_accept_one and rds_tcp_conn_connect
+ * rds_tcp_accept_one and rds_tcp_conn_path_connect
*/
struct mutex t_conn_path_lock;
struct socket *t_sock;
@@ -60,7 +60,7 @@ extern struct rds_transport rds_tcp_transport;
void rds_tcp_accept_work(struct sock *sk);
/* tcp_connect.c */
-int rds_tcp_conn_connect(struct rds_connection *conn);
+int rds_tcp_conn_path_connect(struct rds_conn_path *cp);
void rds_tcp_conn_path_shutdown(struct rds_conn_path *conn);
void rds_tcp_state_change(struct sock *sk);
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 7eddce5..c916715 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -74,17 +74,17 @@ void rds_tcp_state_change(struct sock *sk)
state_change(sk);
}
-int rds_tcp_conn_connect(struct rds_connection *conn)
+int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
{
struct socket *sock = NULL;
struct sockaddr_in src, dest;
int ret;
- struct rds_tcp_connection *tc = conn->c_transport_data;
- struct rds_conn_path *cp = &conn->c_path[0];
+ struct rds_connection *conn = cp->cp_conn;
+ struct rds_tcp_connection *tc = cp->cp_transport_data;
mutex_lock(&tc->t_conn_path_lock);
- if (rds_conn_up(conn)) {
+ if (rds_conn_path_up(cp)) {
mutex_unlock(&tc->t_conn_path_lock);
return 0;
}
@@ -118,6 +118,7 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
ret = sock->ops->connect(sock, (struct sockaddr *)&dest, sizeof(dest),
O_NONBLOCK);
+ cp->cp_outgoing = 1;
rdsdebug("connect to address %pI4 returned %d\n", &conn->c_faddr, ret);
if (ret == -EINPROGRESS)
ret = 0;
@@ -125,7 +126,7 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
rds_tcp_keepalive(sock);
sock = NULL;
} else {
- rds_tcp_restore_callbacks(sock, conn->c_transport_data);
+ rds_tcp_restore_callbacks(sock, cp->cp_transport_data);
}
out:
diff --git a/net/rds/threads.c b/net/rds/threads.c
index f717b69..e8f0941 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -152,8 +152,9 @@ void rds_connect_worker(struct work_struct *work)
int ret;
clear_bit(RDS_RECONNECT_PENDING, &cp->cp_flags);
- if (rds_conn_path_transition(cp, RDS_CONN_DOWN, RDS_CONN_CONNECTING)) {
- ret = conn->c_trans->conn_connect(conn);
+ ret = rds_conn_path_transition(cp, RDS_CONN_DOWN, RDS_CONN_CONNECTING);
+ if (ret) {
+ ret = conn->c_trans->conn_path_connect(cp);
rdsdebug("conn %p for %pI4 to %pI4 dispatched, ret %d\n",
conn, &conn->c_laddr, &conn->c_faddr, ret);
--
1.7.1
* [PATCH net-next 8/9] RDS: TCP: Simplify reconnect to avoid duelling reconnnect attempts
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
When reconnecting, the peer with the smaller IP address will initiate
the reconnect, to avoid needless duelling SYN issues.
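The tie-break rule can be sketched as a small predicate. This is a simplified model, assuming the two addresses are plain integers that both peers compare in the same way; `recon_should_queue` and `recon_example` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether this end should queue a reconnect. For TCP, the end
 * with the larger local address stays passive and waits for the peer;
 * other transports always queue the reconnect.
 */
static bool recon_should_queue(bool is_tcp, uint32_t laddr, uint32_t faddr)
{
	if (is_tcp && laddr > faddr)
		return false;
	return true;
}

/* Usage: for two distinct addresses, exactly one TCP peer initiates. */
static void recon_example(void)
{
	uint32_t a = 0x0a000001;   /* illustrative value only */
	uint32_t b = 0x0a000002;   /* illustrative value only */

	assert(recon_should_queue(true, a, b));    /* smaller address initiates */
	assert(!recon_should_queue(true, b, a));   /* larger address waits */
}
```

With this rule the two ends never both send a SYN for the same reconnect, which is the duel the commit message refers to.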
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/connection.c | 4 +---
net/rds/threads.c | 5 +++++
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 1b0c2a7..19a4fee 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -355,9 +355,7 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
rcu_read_lock();
if (!hlist_unhashed(&conn->c_hash_node)) {
rcu_read_unlock();
- if (conn->c_trans->t_type != RDS_TRANS_TCP ||
- cp->cp_outgoing == 1)
- rds_queue_reconnect(cp);
+ rds_queue_reconnect(cp);
} else {
rcu_read_unlock();
}
diff --git a/net/rds/threads.c b/net/rds/threads.c
index e8f0941..bc97d67 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -125,6 +125,11 @@ void rds_queue_reconnect(struct rds_conn_path *cp)
conn, &conn->c_laddr, &conn->c_faddr,
cp->cp_reconnect_jiffies);
+ /* let peer with smaller addr initiate reconnect, to avoid duels */
+ if (conn->c_trans->t_type == RDS_TRANS_TCP &&
+ conn->c_laddr > conn->c_faddr)
+ return;
+
set_bit(RDS_RECONNECT_PENDING, &cp->cp_flags);
if (cp->cp_reconnect_jiffies == 0) {
cp->cp_reconnect_jiffies = rds_sysctl_reconnect_min_jiffies;
--
1.7.1
* [PATCH net-next 9/9] RDS: Do not send a pong to an incoming ping with 0 src port
From: Sowmini Varadhan @ 2016-06-30 23:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
RDS ping messages are sent with a non-zero src port to a zero
dst port, so that the rds pong messages can be sent back to the
originator's src port. However, if a confused/malicious sender
sends a ping with a 0 src port, we'd have an infinite ping-pong
loop. To avoid this, the receiver should ignore ping messages
with a 0 src port.
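A minimal sketch of the receive-side decision, assuming a reduced header that carries only the two port fields. The `ping_*` names are hypothetical and the ports are treated as plain host-order integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced header: only the fields the check needs. */
struct ping_hdr {
	uint16_t sport;
	uint16_t dport;
};

enum ping_action { PING_DELIVER, PING_SEND_PONG, PING_IGNORE };

/*
 * A message to dport 0 is a ping when pings are enabled. It is
 * answered with a pong to its sport, unless sport is also 0, in
 * which case it is dropped so that the reply cannot itself be
 * taken for a ping by the other side.
 */
static enum ping_action ping_classify(const struct ping_hdr *h,
				      bool ping_enabled)
{
	assert(h != NULL);

	if (ping_enabled && h->dport == 0) {
		if (h->sport == 0)
			return PING_IGNORE;
		return PING_SEND_PONG;
	}
	return PING_DELIVER;
}
```

Ordinary traffic (non-zero dport) is unaffected by the new check; only the degenerate ping is dropped.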
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/recv.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/net/rds/recv.c b/net/rds/recv.c
index b58f505..fed53a6 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -226,6 +226,10 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
cp->cp_next_rx_seq = be64_to_cpu(inc->i_hdr.h_sequence) + 1;
if (rds_sysctl_ping_enable && inc->i_hdr.h_dport == 0) {
+ if (inc->i_hdr.h_sport == 0) {
+ rdsdebug("ignore ping with 0 sport from 0x%x\n", saddr);
+ goto out;
+ }
rds_stats_inc(s_recv_ping);
rds_send_pong(cp, inc->i_hdr.h_sport);
goto out;
--
1.7.1
* Re: [PATCH net-next 0/9] RDS:TCP data structure changes for multipath support
From: David Miller @ 2016-07-01 20:46 UTC (permalink / raw)
To: sowmini.varadhan; +Cc: netdev, rds-devel, santosh.shilimkar
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 30 Jun 2016 16:11:09 -0700
> The second installment of changes to enable multipath support in
> RDS-TCP. This series implements the changes in rds-tcp so that the
> rds_conn_path has a pointer to the rds_tcp_connection in cp_transport_data.
> Struct rds_tcp_connection keeps track of the inet_sk per path in
> t_sock. The ->sk_user_data in turn is a pointer to the rds_conn_path.
> With this set of changes, rds_tcp has the needed plumbing to handle
> multiple paths(socket) per rds_connection.
Series applied, thanks.