netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/2] net: defer cgroups init to accept()
@ 2017-10-09  4:44 Eric Dumazet
  2017-10-09  4:44 ` [PATCH net-next 1/2] net: memcontrol: defer call to mem_cgroup_sk_alloc() Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Dumazet @ 2017-10-09  4:44 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Johannes Weiner, Tejun Heo,
	John Sperbeck

After the TCP three-way handshake (3WHS) became lockless, we should not
attempt cgroup games from sk_clone_lock(), since the listener or its
cgroup might already be gone.

Move this business to inet_csk_accept(), where we have
the guarantee that both parent and child exist.

Many thanks to John Sperbeck for spotting these issues.
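A minimal userspace model of the lifetime problem described above. Everything here is hypothetical: `struct model_cgroup`, `model_put()`, `clone_time_get()` and `accept_time_get()` are invented for illustration and are not kernel APIs, and a `freed` flag stands in for memory that has really been released, so the model reports the hazard instead of triggering it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted cgroup; not a kernel type. */
struct model_cgroup {
	int refcnt;
	bool freed;		/* models "memory already released" */
};

/* Drop a reference; the last put "frees" the object. */
static void model_put(struct model_cgroup *cg)
{
	if (--cg->refcnt == 0)
		cg->freed = true;
}

/*
 * Old scheme (modelled): the child inherits the listener's cgroup pointer
 * in sk_clone_lock() and takes an unconditional reference on it. If the
 * listener and its cgroup are already gone, the kernel would touch freed
 * memory; the model returns false instead.
 */
static bool clone_time_get(struct model_cgroup *copied_from_listener)
{
	if (copied_from_listener->freed)
		return false;	/* would be a use-after-free in the kernel */
	copied_from_listener->refcnt++;
	return true;
}

/*
 * New scheme (modelled): the reference is taken in accept(), in process
 * context, from a cgroup that the accepting task itself still holds.
 */
static bool accept_time_get(struct model_cgroup *task_cgroup)
{
	assert(!task_cgroup->freed && task_cgroup->refcnt > 0);
	task_cgroup->refcnt++;
	return true;
}
```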

Eric Dumazet (2):
  net: memcontrol: defer call to mem_cgroup_sk_alloc()
  net: defer call to cgroup_sk_alloc()

 kernel/cgroup/cgroup.c          | 11 -----------
 mm/memcontrol.c                 | 15 ---------------
 net/core/sock.c                 |  8 +++++---
 net/ipv4/inet_connection_sock.c |  6 ++++++
 4 files changed, 11 insertions(+), 29 deletions(-)

-- 
2.14.2.920.gcf0c67979c-goog

* [PATCH net-next 1/2] net: memcontrol: defer call to mem_cgroup_sk_alloc()
  2017-10-09  4:44 [PATCH net-next 0/2] net: defer cgroups init to accept() Eric Dumazet
@ 2017-10-09  4:44 ` Eric Dumazet
  2017-10-09  4:44 ` [PATCH net-next 2/2] net: defer call to cgroup_sk_alloc() Eric Dumazet
  2017-10-09  4:47 ` [PATCH net-next 0/2] net: defer cgroups init to accept() Eric Dumazet
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2017-10-09  4:44 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Johannes Weiner, Tejun Heo,
	John Sperbeck

Instead of calling mem_cgroup_sk_alloc() from BH context,
it is better to call it from inet_csk_accept() in process context.

Not only does this remove code from mem_cgroup_sk_alloc(), it also
fixes a bug: the listener might already have been dismantled, in which
case css_get() on the memcg inherited from it could be a use-after-free.
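The sketch below models the post-patch mem_cgroup_sk_alloc() in userspace, assuming the remaining body looks up the memcg of `current`, skips the root memcg, and stores the pointer only if it can take a reference on a memcg that is still online (css_tryget_online() in the kernel). The types and `model_` helpers are hypothetical stand-ins, and any further checks the real function performs are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct mem_cgroup / struct sock. */
struct model_memcg {
	bool is_root;
	bool online;
	int refcnt;
};

struct model_sock {
	struct model_memcg *sk_memcg;
};

/* Models css_tryget_online(): fail if the memcg is being torn down. */
static bool model_tryget_online(struct model_memcg *memcg)
{
	if (!memcg->online)
		return false;
	memcg->refcnt++;
	return true;
}

/*
 * Models the post-patch mem_cgroup_sk_alloc(), called from accept() in
 * process context. sk_clone_lock() has already set sk_memcg to NULL, so
 * there is no "inherit from the listener" branch any more.
 */
static void model_mem_cgroup_sk_alloc(struct model_sock *sk,
				      struct model_memcg *task_memcg,
				      bool sockets_enabled)
{
	if (!sockets_enabled)
		return;
	if (task_memcg->is_root)	/* root memcg: no socket accounting */
		return;
	if (model_tryget_online(task_memcg))
		sk->sk_memcg = task_memcg;
}
```

Taking the reference with a try-get on the accepting task's memcg, rather than an unconditional get on a pointer inherited from the listener, means a memcg that is being torn down is simply not attached.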

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
---
 mm/memcontrol.c                 | 15 ---------------
 net/core/sock.c                 |  5 ++++-
 net/ipv4/inet_connection_sock.c |  1 +
 3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d5f3a62887cf958f6b657c0f542f0cf2c3e86e8d..661f046ad3181f65eccfd9bf3832e395e27aa226 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5828,21 +5828,6 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 	if (!mem_cgroup_sockets_enabled)
 		return;
 
-	/*
-	 * Socket cloning can throw us here with sk_memcg already
-	 * filled. It won't however, necessarily happen from
-	 * process context. So the test for root memcg given
-	 * the current task's memcg won't help us in this case.
-	 *
-	 * Respecting the original socket's memcg is a better
-	 * decision in this case.
-	 */
-	if (sk->sk_memcg) {
-		BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
-		css_get(&sk->sk_memcg->css);
-		return;
-	}
-
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(current);
 	if (memcg == root_mem_cgroup)
diff --git a/net/core/sock.c b/net/core/sock.c
index 23953b741a41fbcf4a6ffb0dd5bf05bd5266b99d..70c6ccbdf49f2f8a5a0f7c41c7849ea01459be50 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1677,6 +1677,10 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_dst_pending_confirm = 0;
 		newsk->sk_wmem_queued	= 0;
 		newsk->sk_forward_alloc = 0;
+
+		/* sk->sk_memcg will be populated at accept() time */
+		newsk->sk_memcg = NULL;
+
 		atomic_set(&newsk->sk_drops, 0);
 		newsk->sk_send_head	= NULL;
 		newsk->sk_userlocks	= sk->sk_userlocks & ~SOCK_BINDPORT_LOCK;
@@ -1714,7 +1718,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
-		mem_cgroup_sk_alloc(newsk);
 		cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
 		/*
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index c039c937ba90c7aec39ba2687bceb8253ead70aa..67aec7a106860b26c929fea1624d652c87972f04 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -475,6 +475,7 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
 		}
 		spin_unlock_bh(&queue->fastopenq.lock);
 	}
+	mem_cgroup_sk_alloc(newsk);
 out:
 	release_sock(sk);
 	if (req)
-- 
2.14.2.920.gcf0c67979c-goog

* [PATCH net-next 2/2] net: defer call to cgroup_sk_alloc()
  2017-10-09  4:44 [PATCH net-next 0/2] net: defer cgroups init to accept() Eric Dumazet
  2017-10-09  4:44 ` [PATCH net-next 1/2] net: memcontrol: defer call to mem_cgroup_sk_alloc() Eric Dumazet
@ 2017-10-09  4:44 ` Eric Dumazet
  2017-10-09  4:47 ` [PATCH net-next 0/2] net: defer cgroups init to accept() Eric Dumazet
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2017-10-09  4:44 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Johannes Weiner, Tejun Heo,
	John Sperbeck

sk_clone_lock() might run after the TCP/DCCP listener has already vanished.

To prevent a use-after-free, it is better to defer cgroup_sk_alloc()
to the point where we know both parent and child exist, and to run it
from process context.
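The clone-time/accept-time split introduced by this patch can be sketched as a hypothetical userspace model. `struct model_cgrp_data` flattens the kernel's sock_cgroup_data into three plain fields (in the kernel the cgroup pointer and the classid/prioidx values share storage), and `struct model_task` stands in for what cgroup_sk_alloc(), sock_update_classid() and sock_update_netprioidx() read from `current`. Reference counting on the cgroup is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct sock_cgroup_data. */
struct model_cgrp_data {
	const void *cgroup;	/* cgroup v2 pointer (refcount not modelled) */
	uint32_t classid;	/* net_cls class id */
	uint16_t prioidx;	/* net_prio index */
};

/* Hypothetical view of what the accepting task ("current") provides. */
struct model_task {
	const void *dfl_cgroup;
	uint32_t classid;
	uint16_t prioidx;
};

struct model_sock {
	struct model_cgrp_data sk_cgrp_data;
};

/* Clone time (softirq): start from a copy of the listener, then wipe
 * the cgroup data so nothing from the listener is trusted. */
static void model_clone(struct model_sock *newsk, const struct model_sock *sk)
{
	*newsk = *sk;
	memset(&newsk->sk_cgrp_data, 0, sizeof(newsk->sk_cgrp_data));
}

/* Accept time (process context): populate from the accepting task. */
static void model_accept(struct model_sock *newsk,
			 const struct model_task *current_task)
{
	newsk->sk_cgrp_data.cgroup = current_task->dfl_cgroup;	/* cgroup_sk_alloc() */
	newsk->sk_cgrp_data.classid = current_task->classid;	/* sock_update_classid() */
	newsk->sk_cgrp_data.prioidx = current_task->prioidx;	/* sock_update_netprioidx() */
}
```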

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup.c          | 11 -----------
 net/core/sock.c                 |  3 +--
 net/ipv4/inet_connection_sock.c |  5 +++++
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 44857278eb8aa6a2bbf27b7eb12137ef42628170..3380a3e49af501e457991b2823020494cf32af80 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5709,17 +5709,6 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 	if (cgroup_sk_alloc_disabled)
 		return;
 
-	/* Socket clone path */
-	if (skcd->val) {
-		/*
-		 * We might be cloning a socket which is left in an empty
-		 * cgroup and the cgroup might have already been rmdir'd.
-		 * Don't use cgroup_get_live().
-		 */
-		cgroup_get(sock_cgroup_ptr(skcd));
-		return;
-	}
-
 	rcu_read_lock();
 
 	while (true) {
diff --git a/net/core/sock.c b/net/core/sock.c
index 70c6ccbdf49f2f8a5a0f7c41c7849ea01459be50..4499e31538132ed59a16d92e6f6b923e776df84e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1680,6 +1680,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 		/* sk->sk_memcg will be populated at accept() time */
 		newsk->sk_memcg = NULL;
+		memset(&newsk->sk_cgrp_data, 0, sizeof(newsk->sk_cgrp_data));
 
 		atomic_set(&newsk->sk_drops, 0);
 		newsk->sk_send_head	= NULL;
@@ -1718,8 +1719,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
-		cgroup_sk_alloc(&newsk->sk_cgrp_data);
-
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
 		 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 67aec7a106860b26c929fea1624d652c87972f04..d32c74507314cc4b91d040de8e877e4bd8204106 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -26,6 +26,8 @@
 #include <net/tcp.h>
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
+#include <net/cls_cgroup.h>
+#include <net/netprio_cgroup.h>
 
 #ifdef INET_CSK_DEBUG
 const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
@@ -476,6 +478,9 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
 		spin_unlock_bh(&queue->fastopenq.lock);
 	}
 	mem_cgroup_sk_alloc(newsk);
+	cgroup_sk_alloc(&newsk->sk_cgrp_data);
+	sock_update_classid(&newsk->sk_cgrp_data);
+	sock_update_netprioidx(&newsk->sk_cgrp_data);
 out:
 	release_sock(sk);
 	if (req)
-- 
2.14.2.920.gcf0c67979c-goog

* Re: [PATCH net-next 0/2] net: defer cgroups init to accept()
  2017-10-09  4:44 [PATCH net-next 0/2] net: defer cgroups init to accept() Eric Dumazet
  2017-10-09  4:44 ` [PATCH net-next 1/2] net: memcontrol: defer call to mem_cgroup_sk_alloc() Eric Dumazet
  2017-10-09  4:44 ` [PATCH net-next 2/2] net: defer call to cgroup_sk_alloc() Eric Dumazet
@ 2017-10-09  4:47 ` Eric Dumazet
  2017-10-10  3:55   ` David Miller
  2 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2017-10-09  4:47 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Johannes Weiner, Tejun Heo,
	John Sperbeck

On Sun, Oct 8, 2017 at 9:44 PM, Eric Dumazet <edumazet@google.com> wrote:
> After the TCP three-way handshake (3WHS) became lockless, we should not
> attempt cgroup games from sk_clone_lock(), since the listener or its
> cgroup might already be gone.
>
> Move this business to inet_csk_accept(), where we have
> the guarantee that both parent and child exist.
>
> Many thanks to John Sperbeck for spotting these issues.
>
> Eric Dumazet (2):
>   net: memcontrol: defer call to mem_cgroup_sk_alloc()
>   net: defer call to cgroup_sk_alloc()

This was based on the net tree, but I used the wrong script, so it
carries the [PATCH net-next] tag.

Sorry for the confusion; I guess this can also be applied to
net-next, since this is not a recent regression.

* Re: [PATCH net-next 0/2] net: defer cgroups init to accept()
  2017-10-09  4:47 ` [PATCH net-next 0/2] net: defer cgroups init to accept() Eric Dumazet
@ 2017-10-10  3:55   ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2017-10-10  3:55 UTC
  To: edumazet; +Cc: netdev, eric.dumazet, hannes, tj, jsperbeck

From: Eric Dumazet <edumazet@google.com>
Date: Sun, 8 Oct 2017 21:47:49 -0700

> On Sun, Oct 8, 2017 at 9:44 PM, Eric Dumazet <edumazet@google.com> wrote:
>> After the TCP three-way handshake (3WHS) became lockless, we should not
>> attempt cgroup games from sk_clone_lock(), since the listener or its
>> cgroup might already be gone.
>>
>> Move this business to inet_csk_accept(), where we have
>> the guarantee that both parent and child exist.
>>
>> Many thanks to John Sperbeck for spotting these issues.
>>
>> Eric Dumazet (2):
>>   net: memcontrol: defer call to mem_cgroup_sk_alloc()
>>   net: defer call to cgroup_sk_alloc()
> 
> This was based on the net tree, but I used the wrong script, so it
> carries the [PATCH net-next] tag.
> 
> Sorry for the confusion; I guess this can also be applied to
> net-next, since this is not a recent regression.

Series applied to 'net', thanks.
