* [PATCH net-next 0/3] rds-tcp netns delete related fixes
@ 2017-11-30 19:11 Sowmini Varadhan
2017-11-30 19:11 ` [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() Sowmini Varadhan
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Sowmini Varadhan @ 2017-11-30 19:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
Patchset contains cleanup and bug fixes. Patch 1 is the removal
of some redundant code/functions. Patch 2 and 3 are fixes for
corner cases identified by syzkaller. I've not been able to
reproduce the actual use-after-free race flagged in the syzkaller
reports, thus these fixes are based on code inspection plus
manual testing to make sure the modified code paths are executed
without problems in the commonly encountered timing cases.
Sowmini Varadhan (3):
rds: tcp: remove redundant function rds_tcp_conn_paths_destroy()
rds: tcp: correctly sequence cleanup on netns deletion.
rds: tcp: atomically purge entries from rds_tcp_conn_list during
netns delete
net/rds/connection.c | 3 ++-
net/rds/rds.h | 6 +++---
net/rds/tcp.c | 38 ++++++++++----------------------------
net/rds/tcp.h | 1 +
4 files changed, 16 insertions(+), 32 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy()
2017-11-30 19:11 [PATCH net-next 0/3] rds-tcp netns delete related fixes Sowmini Varadhan
@ 2017-11-30 19:11 ` Sowmini Varadhan
2017-11-30 20:20 ` Santosh Shilimkar
2017-11-30 19:11 ` [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion Sowmini Varadhan
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Sowmini Varadhan @ 2017-11-30 19:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
A side-effect of Commit c14b0366813a ("rds: tcp: set linger to 1
when unloading a rds-tcp") is that we always send a RST on the tcp
connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy()
is not needed any more and is removed in this patch.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp.c | 25 +------------------------
1 files changed, 1 insertions(+), 24 deletions(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 6b7ee71..222cc53 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -495,27 +495,6 @@ static void __net_exit rds_tcp_exit_net(struct net *net)
.size = sizeof(struct rds_tcp_net),
};
-/* explicitly send a RST on each socket, thereby releasing any socket refcnts
- * that may otherwise hold up netns deletion.
- */
-static void rds_tcp_conn_paths_destroy(struct rds_connection *conn)
-{
- struct rds_conn_path *cp;
- struct rds_tcp_connection *tc;
- int i;
- struct sock *sk;
-
- for (i = 0; i < RDS_MPATH_WORKERS; i++) {
- cp = &conn->c_path[i];
- tc = cp->cp_transport_data;
- if (!tc->t_sock)
- continue;
- sk = tc->t_sock->sk;
- sk->sk_prot->disconnect(sk, 0);
- tcp_done(sk);
- }
-}
-
static void rds_tcp_kill_sock(struct net *net)
{
struct rds_tcp_connection *tc, *_tc;
@@ -535,10 +514,8 @@ static void rds_tcp_kill_sock(struct net *net)
list_move_tail(&tc->t_tcp_node, &tmp_list);
}
spin_unlock_irq(&rds_tcp_conn_lock);
- list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
- rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
+ list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
rds_conn_destroy(tc->t_cpath->cp_conn);
- }
}
void *rds_tcp_listen_sock_def_readable(struct net *net)
--
1.7.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion.
2017-11-30 19:11 [PATCH net-next 0/3] rds-tcp netns delete related fixes Sowmini Varadhan
2017-11-30 19:11 ` [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() Sowmini Varadhan
@ 2017-11-30 19:11 ` Sowmini Varadhan
2017-11-30 20:28 ` Santosh Shilimkar
2017-11-30 19:11 ` [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete Sowmini Varadhan
2017-12-01 20:25 ` [PATCH net-next 0/3] rds-tcp netns delete related fixes David Miller
3 siblings, 1 reply; 8+ messages in thread
From: Sowmini Varadhan @ 2017-11-30 19:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
(and thus rds_tcp_dev_event notification) is only called from put_net()
when all netns refcounts go to 0, but this cannot happen if the
rds_connection itself is holding a c_net ref that it expects to
release in rds_tcp_kill_sock.
Instead, the rds_tcp_kill_sock callback should make sure to
tear down state carefully, ensuring that the socket teardown
is only done after all data-structures and workqs that depend
on it are quiesced.
The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
refcounts on struct net") was to resolve a race condition reported by
syzkaller where workqs for tx/rx/connect were triggered after the
namespace was deleted. Those worker threads should have been
cancelled/flushed before socket tear-down and indeed,
rds_conn_path_destroy() does try to sequence this by doing
/* cancel cp_send_w */
/* cancel cp_recv_w */
/* flush cp_down_w */
/* free data structures */
Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
we ought to have satisfied the requirement that "socket-close is
done after all other dependent state is quiesced". However,
rds_conn_shutdown has a bug in that it *always* triggers the reconnect
workq (and if connection is successful, we always restart tx/rx
workqs so with the right timing, we risk the race conditions reported
by syzkaller).
Netns deletion is like module teardown- no need to restart a
reconnect in this case. We can use the c_destroy_in_prog bit
to avoid restarting the reconnect.
Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/connection.c | 3 ++-
net/rds/rds.h | 6 +++---
net/rds/tcp.c | 4 ++--
3 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 7ee2d5d..9efc82c 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
* to the conn hash, so we never trigger a reconnect on this
* conn - the reconnect is always triggered by the active peer. */
cancel_delayed_work_sync(&cp->cp_conn_w);
+ if (conn->c_destroy_in_prog)
+ return;
rcu_read_lock();
if (!hlist_unhashed(&conn->c_hash_node)) {
rcu_read_unlock();
@@ -445,7 +447,6 @@ void rds_conn_destroy(struct rds_connection *conn)
*/
rds_cong_remove_conn(conn);
- put_net(conn->c_net);
kfree(conn->c_path);
kmem_cache_free(rds_conn_slab, conn);
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 2e0315b..d11301b 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -149,7 +149,7 @@ struct rds_connection {
/* Protocol version */
unsigned int c_version;
- struct net *c_net;
+ possible_net_t c_net;
struct list_head c_map_item;
unsigned long c_map_queued;
@@ -164,13 +164,13 @@ struct rds_connection {
static inline
struct net *rds_conn_net(struct rds_connection *conn)
{
- return conn->c_net;
+ return read_pnet(&conn->c_net);
}
static inline
void rds_conn_net_set(struct rds_connection *conn, struct net *net)
{
- conn->c_net = get_net(net);
+ write_pnet(&conn->c_net, net);
}
#define RDS_FLAG_CONG_BITMAP 0x01
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 222cc53..f580f72 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -506,7 +506,7 @@ static void rds_tcp_kill_sock(struct net *net)
rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
spin_lock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
- struct net *c_net = tc->t_cpath->cp_conn->c_net;
+ struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
if (net != c_net || !tc->t_sock)
continue;
@@ -563,7 +563,7 @@ static void rds_tcp_sysctl_reset(struct net *net)
spin_lock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
- struct net *c_net = tc->t_cpath->cp_conn->c_net;
+ struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);
if (net != c_net || !tc->t_sock)
continue;
--
1.7.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete
2017-11-30 19:11 [PATCH net-next 0/3] rds-tcp netns delete related fixes Sowmini Varadhan
2017-11-30 19:11 ` [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() Sowmini Varadhan
2017-11-30 19:11 ` [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion Sowmini Varadhan
@ 2017-11-30 19:11 ` Sowmini Varadhan
2017-11-30 20:38 ` Santosh Shilimkar
2017-12-01 20:25 ` [PATCH net-next 0/3] rds-tcp netns delete related fixes David Miller
3 siblings, 1 reply; 8+ messages in thread
From: Sowmini Varadhan @ 2017-11-30 19:11 UTC (permalink / raw)
To: netdev; +Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
to find the rds_connection entries marked for deletion as part
of the netns deletion under the protection of the rds_tcp_conn_lock.
Since the rds_tcp_conn_list tracks rds_tcp_connections (which
have a 1:1 mapping with rds_conn_path), multiple tc entries in
the rds_tcp_conn_list will map to a single rds_connection, and will
be deleted as part of the rds_conn_destroy() operation that is
done outside the rds_tcp_conn_lock.
The rds_tcp_conn_list traversal done under the protection of
rds_tcp_conn_lock should not leave any doomed tc entries in
the list after the rds_tcp_conn_lock is released, else another
concurrently executiong netns delete (for a differnt netns) thread
may trip on these entries.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp.c | 9 +++++++--
net/rds/tcp.h | 1 +
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index f580f72..39f502d 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -306,7 +306,8 @@ static void rds_tcp_conn_free(void *arg)
rdsdebug("freeing tc %p\n", tc);
spin_lock_irqsave(&rds_tcp_conn_lock, flags);
- list_del(&tc->t_tcp_node);
+ if (!tc->t_tcp_node_detached)
+ list_del(&tc->t_tcp_node);
spin_unlock_irqrestore(&rds_tcp_conn_lock, flags);
kmem_cache_free(rds_tcp_conn_slab, tc);
@@ -510,8 +511,12 @@ static void rds_tcp_kill_sock(struct net *net)
if (net != c_net || !tc->t_sock)
continue;
- if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+ if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn)) {
list_move_tail(&tc->t_tcp_node, &tmp_list);
+ } else {
+ list_del(&tc->t_tcp_node);
+ tc->t_tcp_node_detached = true;
+ }
}
spin_unlock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index f8800b7..8775349 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -11,6 +11,7 @@ struct rds_tcp_incoming {
struct rds_tcp_connection {
struct list_head t_tcp_node;
+ bool t_tcp_node_detached;
struct rds_conn_path *t_cpath;
/* t_conn_path_lock synchronizes the connection establishment between
* rds_tcp_accept_one and rds_tcp_conn_path_connect
--
1.7.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy()
2017-11-30 19:11 ` [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() Sowmini Varadhan
@ 2017-11-30 20:20 ` Santosh Shilimkar
0 siblings, 0 replies; 8+ messages in thread
From: Santosh Shilimkar @ 2017-11-30 20:20 UTC (permalink / raw)
To: Sowmini Varadhan, netdev; +Cc: davem, rds-devel
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote:
> A side-effect of Commit c14b0366813a ("rds: tcp: set linger to 1
> when unloading a rds-tcp") is that we always send a RST on the tcp
> connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy()
> is not needed any more and is removed in this patch.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
Looks good.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion.
2017-11-30 19:11 ` [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion Sowmini Varadhan
@ 2017-11-30 20:28 ` Santosh Shilimkar
0 siblings, 0 replies; 8+ messages in thread
From: Santosh Shilimkar @ 2017-11-30 20:28 UTC (permalink / raw)
To: Sowmini Varadhan, netdev; +Cc: davem, rds-devel
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote:
> Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
> introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
> (and thus rds_tcp_dev_event notification) is only called from put_net()
> when all netns refcounts go to 0, but this cannot happen if the
> rds_connection itself is holding a c_net ref that it expects to
> release in rds_tcp_kill_sock.
>
> Instead, the rds_tcp_kill_sock callback should make sure to
> tear down state carefully, ensuring that the socket teardown
> is only done after all data-structures and workqs that depend
> on it are quiesced.
>
> The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
> refcounts on struct net") was to resolve a race condition reported by
> syzkaller where workqs for tx/rx/connect were triggered after the
> namespace was deleted. Those worker threads should have been
> cancelled/flushed before socket tear-down and indeed,
> rds_conn_path_destroy() does try to sequence this by doing
> /* cancel cp_send_w */
> /* cancel cp_recv_w */
> /* flush cp_down_w */
> /* free data structures */
> Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
> invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
> we ought to have satisfied the requirement that "socket-close is
> done after all other dependent state is quiesced". However,
> rds_conn_shutdown has a bug in that it *always* triggers the reconnect
> workq (and if connection is successful, we always restart tx/rx
> workqs so with the right timing, we risk the race conditions reported
> by syzkaller).
>
> Netns deletion is like module teardown- no need to restart a
> reconnect in this case. We can use the c_destroy_in_prog bit
> to avoid restarting the reconnect.
>
> Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
> net/rds/connection.c | 3 ++-
> net/rds/rds.h | 6 +++---
> net/rds/tcp.c | 4 ++--
> 3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 7ee2d5d..9efc82c 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
> * to the conn hash, so we never trigger a reconnect on this
> * conn - the reconnect is always triggered by the active peer. */
> cancel_delayed_work_sync(&cp->cp_conn_w);
> + if (conn->c_destroy_in_prog)
> + return;
Not related to this patch but it will be more safe to use
cp_flags or if needed add flag and conn level for bundle
and use bit wise to avoid possible races to set c_destroy_in_prog.
Something similar to RDS_DESTROY_PENDING etc.
The patch itself looks good to me in terms of netns ref counting.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete
2017-11-30 19:11 ` [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete Sowmini Varadhan
@ 2017-11-30 20:38 ` Santosh Shilimkar
0 siblings, 0 replies; 8+ messages in thread
From: Santosh Shilimkar @ 2017-11-30 20:38 UTC (permalink / raw)
To: Sowmini Varadhan, netdev; +Cc: davem, rds-devel
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote:
> The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
> to find the rds_connection entries marked for deletion as part
> of the netns deletion under the protection of the rds_tcp_conn_lock.
> Since the rds_tcp_conn_list tracks rds_tcp_connections (which
> have a 1:1 mapping with rds_conn_path), multiple tc entries in
> the rds_tcp_conn_list will map to a single rds_connection, and will
> be deleted as part of the rds_conn_destroy() operation that is
> done outside the rds_tcp_conn_lock.
>
> The rds_tcp_conn_list traversal done under the protection of
> rds_tcp_conn_lock should not leave any doomed tc entries in
> the list after the rds_tcp_conn_lock is released, else another
> concurrently executiong netns delete (for a differnt netns) thread
> may trip on these entries.
>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next 0/3] rds-tcp netns delete related fixes
2017-11-30 19:11 [PATCH net-next 0/3] rds-tcp netns delete related fixes Sowmini Varadhan
` (2 preceding siblings ...)
2017-11-30 19:11 ` [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete Sowmini Varadhan
@ 2017-12-01 20:25 ` David Miller
3 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2017-12-01 20:25 UTC (permalink / raw)
To: sowmini.varadhan; +Cc: netdev, rds-devel, santosh.shilimkar
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 30 Nov 2017 11:11:26 -0800
> Patchset contains cleanup and bug fixes. Patch 1 is the removal
> of some redundant code/functions. Patch 2 and 3 are fixes for
> corner cases identified by syzkaller. I've not been able to
> reproduce the actual use-after-free race flagged in the syzkaller
> reports, thus these fixes are based on code inspection plus
> manual testing to make sure the modified code paths are executed
> without problems in the commonly encountered timing cases.
Series applied, thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-12-01 20:25 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-30 19:11 [PATCH net-next 0/3] rds-tcp netns delete related fixes Sowmini Varadhan
2017-11-30 19:11 ` [PATCH net-next 1/3] rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() Sowmini Varadhan
2017-11-30 20:20 ` Santosh Shilimkar
2017-11-30 19:11 ` [PATCH net-next 2/3] rds: tcp: correctly sequence cleanup on netns deletion Sowmini Varadhan
2017-11-30 20:28 ` Santosh Shilimkar
2017-11-30 19:11 ` [PATCH net-next 3/3] rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete Sowmini Varadhan
2017-11-30 20:38 ` Santosh Shilimkar
2017-12-01 20:25 ` [PATCH net-next 0/3] rds-tcp netns delete related fixes David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).