public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf 0/5] BPF sockmap and ulp fixes
@ 2018-08-16 19:49 Daniel Borkmann
  2018-08-16 19:49 ` [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules Daniel Borkmann
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Daniel Borkmann @ 2018-08-16 19:49 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann

Batch of various fixes related to BPF sockmap and ULP: adding a module
alias to ULP modules in order to restrict module requests, and fixing
races and memory leaks in the sockmap code. For details please refer
to the individual patches. Thanks!

Daniel Borkmann (5):
  tcp, ulp: add alias for all ulp modules
  tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
  bpf, sockmap: fix leakage of smap_psock_map_entry
  bpf, sockmap: fix map elem deletion race with smap_stop_sock
  bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist

 include/net/tcp.h    |   4 ++
 kernel/bpf/sockmap.c | 120 +++++++++++++++++++++++++++++----------------------
 net/ipv4/tcp_ulp.c   |   4 +-
 net/tls/tls_main.c   |   1 +
 4 files changed, 76 insertions(+), 53 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules
  2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
@ 2018-08-16 19:49 ` Daniel Borkmann
  2018-08-16 21:25   ` Song Liu
  2018-08-16 19:49 ` [PATCH bpf 2/5] tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach Daniel Borkmann
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2018-08-16 19:49 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann

Let's not turn the TCP ULP lookup into an arbitrary module loader, as
we only intend to load ULP modules through this mechanism, not other
unrelated kernel modules:

  [root@bar]# cat foo.c
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <linux/tcp.h>
  #include <linux/in.h>

  int main(void)
  {
      int sock = socket(PF_INET, SOCK_STREAM, 0);
      setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
      return 0;
  }

  [root@bar]# gcc foo.c -O2 -Wall
  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  sctp                 1077248  4
  libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
  [root@bar]#

Fix it by adding a module alias to TCP ULP modules, so that module
probing via request_module() is limited to tcp-ulp-[name]. Existing
modules like kTLS will load fine given the tcp-ulp-tls alias, but
others will fail to load:

  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  [root@bar]#

Sockmap is not affected by this since it's either built-in or not.
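
To illustrate the namespacing in userspace C (a sketch with made-up
helper names, not the kernel code), the string handed to
request_module() is now prefixed, so only modules declaring a matching
tcp-ulp-<name> alias can be pulled in:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the fixed __tcp_ulp_find_autoload():
 * the request is namespaced, so only modules that declare a
 * "tcp-ulp-<name>" alias via MODULE_ALIAS_TCP_ULP() can be loaded. */
static void ulp_module_request(const char *name, char *buf, size_t len)
{
	snprintf(buf, len, "tcp-ulp-%s", name);
}

static int ulp_alias_self_test(void)
{
	char buf[64];

	/* kTLS declares MODULE_ALIAS_TCP_ULP("tls"), so this resolves. */
	ulp_module_request("tls", buf, sizeof(buf));
	if (strcmp(buf, "tcp-ulp-tls"))
		return -1;
	/* sctp declares no tcp-ulp-* alias, so "tcp-ulp-sctp" matches
	 * nothing and the module request fails. */
	ulp_module_request("sctp", buf, sizeof(buf));
	return strcmp(buf, "tcp-ulp-sctp") ? -1 : 0;
}
```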

Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/tcp.h  | 4 ++++
 net/ipv4/tcp_ulp.c | 2 +-
 net/tls/tls_main.c | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d196901..770917d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2065,6 +2065,10 @@ int tcp_set_ulp_id(struct sock *sk, const int ulp);
 void tcp_get_available_ulp(char *buf, size_t len);
 void tcp_cleanup_ulp(struct sock *sk);
 
+#define MODULE_ALIAS_TCP_ULP(name)				\
+	__MODULE_INFO(alias, alias_userspace, name);		\
+	__MODULE_INFO(alias, alias_tcp_ulp, "tcp-ulp-" name)
+
 /* Call BPF_SOCK_OPS program that returns an int. If the return value
  * is < 0, then the BPF op failed (for example if the loaded BPF
  * program does not support the chosen operation or there is no BPF
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 622caa4..7dd44b6 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -51,7 +51,7 @@ static const struct tcp_ulp_ops *__tcp_ulp_find_autoload(const char *name)
 #ifdef CONFIG_MODULES
 	if (!ulp && capable(CAP_NET_ADMIN)) {
 		rcu_read_unlock();
-		request_module("%s", name);
+		request_module("tcp-ulp-%s", name);
 		rcu_read_lock();
 		ulp = tcp_ulp_find(name);
 	}
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index b09867c..93c0c22 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -45,6 +45,7 @@
 MODULE_AUTHOR("Mellanox Technologies");
 MODULE_DESCRIPTION("Transport Layer Security Support");
 MODULE_LICENSE("Dual BSD/GPL");
+MODULE_ALIAS_TCP_ULP("tls");
 
 enum {
 	TLSV4,
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf 2/5] tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
  2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
  2018-08-16 19:49 ` [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules Daniel Borkmann
@ 2018-08-16 19:49 ` Daniel Borkmann
  2018-08-16 21:26   ` Song Liu
  2018-08-16 19:49 ` [PATCH bpf 3/5] bpf, sockmap: fix leakage of smap_psock_map_entry Daniel Borkmann
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2018-08-16 19:49 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann

I found that in BPF sockmap programs, once we either delete a socket
from the map or update a map slot such that the old socket is purged
from the map, these sockets can never get reattached into a map even
though their related psock has been dropped entirely at that point.

The reason is that tcp_cleanup_ulp() leaves the old icsk->icsk_ulp_ops
intact, so that on the next tcp_set_ulp_id() the kernel returns
-EEXIST, thinking there is still some active ULP attached.

BPF sockmap is the only user with this issue, as the other user, kTLS,
only calls tcp_cleanup_ulp() from tcp_v4_destroy_sock(), whereas
sockmap semantics allow dropping the socket from the map with all
related psock state being cleaned up.
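
The failure mode can be modeled in a few lines of userspace C (names
like set_ulp()/cleanup_ulp() are illustrative stand-ins, not the
kernel API):

```c
#include <errno.h>
#include <stddef.h>

struct ulp_ops { int dummy; };
struct icsk { const struct ulp_ops *icsk_ulp_ops; };

/* Stand-in for tcp_set_ulp_id(): bails if ops look attached. */
static int set_ulp(struct icsk *sk, const struct ulp_ops *ops)
{
	if (sk->icsk_ulp_ops)
		return -EEXIST;
	sk->icsk_ulp_ops = ops;
	return 0;
}

/* Stand-in for tcp_cleanup_ulp() with the fix applied: the leftover
 * pointer is cleared so a later attach does not see stale ops. */
static void cleanup_ulp(struct icsk *sk)
{
	if (!sk->icsk_ulp_ops)
		return;
	/* ...release callback and module_put() would run here... */
	sk->icsk_ulp_ops = NULL;
}

static int reattach_self_test(void)
{
	static const struct ulp_ops ops;
	struct icsk sk = { .icsk_ulp_ops = NULL };

	if (set_ulp(&sk, &ops))
		return -1;
	cleanup_ulp(&sk);
	/* Without the NULL assignment this would fail with -EEXIST. */
	return set_ulp(&sk, &ops);
}
```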

Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_ulp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 7dd44b6..a5995bb 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -129,6 +129,8 @@ void tcp_cleanup_ulp(struct sock *sk)
 	if (icsk->icsk_ulp_ops->release)
 		icsk->icsk_ulp_ops->release(sk);
 	module_put(icsk->icsk_ulp_ops->owner);
+
+	icsk->icsk_ulp_ops = NULL;
 }
 
 /* Change upper layer protocol for socket */
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf 3/5] bpf, sockmap: fix leakage of smap_psock_map_entry
  2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
  2018-08-16 19:49 ` [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules Daniel Borkmann
  2018-08-16 19:49 ` [PATCH bpf 2/5] tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach Daniel Borkmann
@ 2018-08-16 19:49 ` Daniel Borkmann
  2018-08-16 21:27   ` Song Liu
  2018-08-16 19:49 ` [PATCH bpf 4/5] bpf, sockmap: fix map elem deletion race with smap_stop_sock Daniel Borkmann
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2018-08-16 19:49 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann

While working on sockmap I noticed that we do not always kfree the
struct smap_psock_map_entry list elements which track psocks attached
to maps. In the case of sock_hash_ctx_update_elem(), these map entries
are allocated outside of __sock_map_ctx_update_elem() with their
linkage to the socket hash table filled. In the case of sock array,
the map entries are allocated inside of __sock_map_ctx_update_elem()
and added with their linkage to psock->maps. Both additions are made
under psock->maps_lock.

Now, we drop these elements from their psock->maps list on a few
occasions: i) in sock array via smap_list_map_remove() when an entry
is either deleted from the map by user space, or updated via user
space or a BPF program where we drop the old socket at that map slot,
or when the sock array is freed via sock_map_free() and drops all its
elements; ii) for sock hash via smap_list_hash_remove() on exactly
the same occasions as just described for sock array; iii) in
bpf_tcp_close() where we remove the elements from the list via
psock_map_pop() and iterate over them, dropping them from either sock
array or sock hash; and last but not least iv) once again in
smap_gc_work(), which is a callback for deferring the work once the
psock refcount hits zero and thus the socket is being destroyed.

The problem is that the only case where we kfree() the list entry is
case iv), at which point the list should be empty in normal cases. So
in cases i) to iii) we unlink the elements without freeing them, and
they go out of reach from us. Hence the fix is to properly kfree()
them as well to stop the leakage. Given these are all handled under
psock->maps_lock, there is no need for deferred RCU freeing.
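
A userspace sketch of the fixed removal path (illustrative only; a
plain singly linked list stands in for the psock->maps list and
remove_entry() for smap_list_map_remove()):

```c
#include <stdlib.h>

struct entry { struct entry *next; };

static int entries_freed;

/* Unlink *and* free matching entries; before the fix only the unlink
 * (list_del()) happened and the entry leaked out of reach. */
static struct entry *remove_entry(struct entry *head, struct entry *victim)
{
	struct entry **pp = &head;

	while (*pp) {
		if (*pp == victim) {
			*pp = victim->next;	/* list_del() equivalent */
			free(victim);		/* the added kfree() */
			entries_freed++;
		} else {
			pp = &(*pp)->next;
		}
	}
	return head;
}

static int remove_self_test(void)
{
	struct entry *a = calloc(1, sizeof(*a));
	struct entry *b = calloc(1, sizeof(*b));
	struct entry *e;
	int len = 0;

	a->next = b;
	a = remove_entry(a, b);
	for (e = a; e; e = e->next)
		len++;
	free(a);
	return (entries_freed == 1 && len == 1) ? 0 : -1;
}
```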

I later also ran with the kmemleak detector, which confirmed the
finding: before the fix the object goes unreferenced, while after the
patch no kmemleak report related to BPF showed up.

  [...]
  unreferenced object 0xffff880378eadae0 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
      00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
      50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
    backtrace:
      [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
      [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
      [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
      [<000000002ef89e83>] 0xffffffffffffffff
  unreferenced object 0xffff880378ead240 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
      00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
      00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
    backtrace:
      [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
      [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
      [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
      [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
      [<0000000000763660>] do_syscall_64+0x9a/0x300
      [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [<000000002ef89e83>] 0xffffffffffffffff
  [...]

Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 0c1a696..94a324b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -370,6 +370,7 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 			}
 			raw_spin_unlock_bh(&b->lock);
 		}
+		kfree(e);
 		e = psock_map_pop(sk, psock);
 	}
 	rcu_read_unlock();
@@ -1675,8 +1676,10 @@ static void smap_list_map_remove(struct smap_psock *psock,
 
 	spin_lock_bh(&psock->maps_lock);
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
-		if (e->entry == entry)
+		if (e->entry == entry) {
 			list_del(&e->list);
+			kfree(e);
+		}
 	}
 	spin_unlock_bh(&psock->maps_lock);
 }
@@ -1690,8 +1693,10 @@ static void smap_list_hash_remove(struct smap_psock *psock,
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
 		struct htab_elem *c = rcu_dereference(e->hash_link);
 
-		if (c == hash_link)
+		if (c == hash_link) {
 			list_del(&e->list);
+			kfree(e);
+		}
 	}
 	spin_unlock_bh(&psock->maps_lock);
 }
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf 4/5] bpf, sockmap: fix map elem deletion race with smap_stop_sock
  2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
                   ` (2 preceding siblings ...)
  2018-08-16 19:49 ` [PATCH bpf 3/5] bpf, sockmap: fix leakage of smap_psock_map_entry Daniel Borkmann
@ 2018-08-16 19:49 ` Daniel Borkmann
  2018-08-16 21:30   ` Song Liu
  2018-08-16 19:49 ` [PATCH bpf 5/5] bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist Daniel Borkmann
  2018-08-16 22:06 ` [PATCH bpf 0/5] BPF sockmap and ulp fixes Alexei Starovoitov
  5 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2018-08-16 19:49 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann

smap_start_sock() and smap_stop_sock() are each protected under the
sock->sk_callback_lock from their call-sites, except in the case of
sock_map_delete_elem() where we drop the old socket from the map
slot. This is racy because the same sock could be part of multiple
sock maps, so we could run smap_stop_sock() in parallel; given that
at that point psock->strp_enabled might be true on both CPUs, we
might, for example, wrongly restore sk->sk_data_ready /
sk->sk_write_space.
Therefore, hold the sock->sk_callback_lock as well on delete. Looks
like 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add
multi-map support") had this right, but later on e9db4ef6bf4c ("bpf:
sockhash fix omitted bucket lock in sock_close") removed it again
from delete leaving this smap_stop_sock() instance unprotected.
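
Sequentially, the fixed path behaves like the sketch below
(illustrative userspace C; in the kernel it is holding
sock->sk_callback_lock that makes the test-and-clear atomic against a
parallel deleter on another CPU, which a plain function cannot show):

```c
static int strp_enabled = 1;	/* psock->strp_enabled stand-in */
static int cb_restores;		/* times saved callbacks were restored */

/* Stand-in for smap_stop_sock(): with the lock held around the check
 * and the clear, only one of two racing deleters restores the saved
 * sk_data_ready / sk_write_space callbacks. */
static void stop_sock(void)
{
	/* write_lock_bh(&sk->sk_callback_lock); */
	if (strp_enabled) {
		strp_enabled = 0;
		cb_restores++;
	}
	/* write_unlock_bh(&sk->sk_callback_lock); */
}

static int stop_sock_self_test(void)
{
	stop_sock();	/* first deleter restores the callbacks */
	stop_sock();	/* second deleter sees strp_enabled == 0 */
	return cb_restores == 1 ? 0 : -1;
}
```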

Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 94a324b..921cb6b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1786,8 +1786,11 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 	if (!psock)
 		goto out;
 
-	if (psock->bpf_parse)
+	if (psock->bpf_parse) {
+		write_lock_bh(&sock->sk_callback_lock);
 		smap_stop_sock(psock, sock);
+		write_unlock_bh(&sock->sk_callback_lock);
+	}
 	smap_list_map_remove(psock, &stab->sock_map[k]);
 	smap_release_sock(psock, sock);
 out:
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH bpf 5/5] bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
  2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
                   ` (3 preceding siblings ...)
  2018-08-16 19:49 ` [PATCH bpf 4/5] bpf, sockmap: fix map elem deletion race with smap_stop_sock Daniel Borkmann
@ 2018-08-16 19:49 ` Daniel Borkmann
  2018-08-16 21:51   ` Song Liu
  2018-08-16 22:06 ` [PATCH bpf 0/5] BPF sockmap and ulp fixes Alexei Starovoitov
  5 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2018-08-16 19:49 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann

The current code in sock_map_ctx_update_elem() allows for the
BPF_EXIST and BPF_NOEXIST map update flags. While this approach is
rather uncommon on array-like maps (e.g. bpf_fd_array_map_update_elem()
and others enforce the map update flag to be BPF_ANY such that xchg()
can be used directly), the current implementation in sock map does not
guarantee that such an operation with BPF_EXIST / BPF_NOEXIST is
atomic.

The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the
socket from the slot, which is then tested for NULL / non-NULL.
However, the actual update is done later, after
__sock_map_ctx_update_elem(), through osock =
xchg(&stab->sock_map[i], sock). The problem is that in the meantime a
different CPU could have updated or deleted a socket on that specific
slot, and thus the flag constraints won't hold anymore.

I've been considering whether it would be best to just break UAPI and
enforce BPF_ANY to see whether anyone actually complains. However, the
trouble is that the BPF kselftests already use BPF_NOEXIST for the map
update, and it might therefore have been copied into applications
already. The fix that keeps the current behavior intact is to add a
map lock, similar to the sock hash bucket lock, only that it covers
the whole map.
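
The locked flag semantics can be sketched in userspace C (illustrative
only; the real update additionally moves the psock map-entry linkage
and the old-socket release under the same lock):

```c
#include <errno.h>
#include <stddef.h>

enum { BPF_ANY, BPF_NOEXIST, BPF_EXIST };	/* uapi flag values */

/* Stand-in for the locked part of sock_map_ctx_update_elem(): the
 * flag check and the slot store happen in one critical section, so
 * no other CPU can change the slot in between. */
static int map_update(void **slot, void *sock, int flags)
{
	void *osock;

	/* raw_spin_lock_bh(&stab->lock); */
	osock = *slot;
	if (osock && flags == BPF_NOEXIST)
		return -EEXIST;
	if (!osock && flags == BPF_EXIST)
		return -ENOENT;
	*slot = sock;
	/* raw_spin_unlock_bh(&stab->lock); */
	return 0;
}

static int map_update_self_test(void)
{
	void *slot = NULL;
	int sock;

	if (map_update(&slot, &sock, BPF_EXIST) != -ENOENT)
		return -1;
	if (map_update(&slot, &sock, BPF_NOEXIST) != 0)
		return -1;
	if (map_update(&slot, &sock, BPF_NOEXIST) != -EEXIST)
		return -1;
	return map_update(&slot, &sock, BPF_ANY);
}
```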

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c | 106 +++++++++++++++++++++++++++------------------------
 1 file changed, 57 insertions(+), 49 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 921cb6b..98e621a 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -58,6 +58,7 @@ struct bpf_stab {
 	struct bpf_map map;
 	struct sock **sock_map;
 	struct bpf_sock_progs progs;
+	raw_spinlock_t lock;
 };
 
 struct bucket {
@@ -89,9 +90,9 @@ enum smap_psock_state {
 
 struct smap_psock_map_entry {
 	struct list_head list;
+	struct bpf_map *map;
 	struct sock **entry;
 	struct htab_elem __rcu *hash_link;
-	struct bpf_htab __rcu *htab;
 };
 
 struct smap_psock {
@@ -343,13 +344,18 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 	e = psock_map_pop(sk, psock);
 	while (e) {
 		if (e->entry) {
-			osk = cmpxchg(e->entry, sk, NULL);
+			struct bpf_stab *stab = container_of(e->map, struct bpf_stab, map);
+
+			raw_spin_lock_bh(&stab->lock);
+			osk = *e->entry;
 			if (osk == sk) {
+				*e->entry = NULL;
 				smap_release_sock(psock, sk);
 			}
+			raw_spin_unlock_bh(&stab->lock);
 		} else {
 			struct htab_elem *link = rcu_dereference(e->hash_link);
-			struct bpf_htab *htab = rcu_dereference(e->htab);
+			struct bpf_htab *htab = container_of(e->map, struct bpf_htab, map);
 			struct hlist_head *head;
 			struct htab_elem *l;
 			struct bucket *b;
@@ -1642,6 +1648,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 		return ERR_PTR(-ENOMEM);
 
 	bpf_map_init_from_attr(&stab->map, attr);
+	raw_spin_lock_init(&stab->lock);
 
 	/* make sure page count doesn't overflow */
 	cost = (u64) stab->map.max_entries * sizeof(struct sock *);
@@ -1716,14 +1723,15 @@ static void sock_map_free(struct bpf_map *map)
 	 * and a grace period expire to ensure psock is really safe to remove.
 	 */
 	rcu_read_lock();
+	raw_spin_lock_bh(&stab->lock);
 	for (i = 0; i < stab->map.max_entries; i++) {
 		struct smap_psock *psock;
 		struct sock *sock;
 
-		sock = xchg(&stab->sock_map[i], NULL);
+		sock = stab->sock_map[i];
 		if (!sock)
 			continue;
-
+		stab->sock_map[i] = NULL;
 		psock = smap_psock_sk(sock);
 		/* This check handles a racing sock event that can get the
 		 * sk_callback_lock before this case but after xchg happens
@@ -1735,6 +1743,7 @@ static void sock_map_free(struct bpf_map *map)
 			smap_release_sock(psock, sock);
 		}
 	}
+	raw_spin_unlock_bh(&stab->lock);
 	rcu_read_unlock();
 
 	sock_map_remove_complete(stab);
@@ -1778,14 +1787,16 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 	if (k >= map->max_entries)
 		return -EINVAL;
 
-	sock = xchg(&stab->sock_map[k], NULL);
+	raw_spin_lock_bh(&stab->lock);
+	sock = stab->sock_map[k];
+	stab->sock_map[k] = NULL;
+	raw_spin_unlock_bh(&stab->lock);
 	if (!sock)
 		return -EINVAL;
 
 	psock = smap_psock_sk(sock);
 	if (!psock)
-		goto out;
-
+		return 0;
 	if (psock->bpf_parse) {
 		write_lock_bh(&sock->sk_callback_lock);
 		smap_stop_sock(psock, sock);
@@ -1793,7 +1804,6 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 	}
 	smap_list_map_remove(psock, &stab->sock_map[k]);
 	smap_release_sock(psock, sock);
-out:
 	return 0;
 }
 
@@ -1829,11 +1839,9 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 static int __sock_map_ctx_update_elem(struct bpf_map *map,
 				      struct bpf_sock_progs *progs,
 				      struct sock *sock,
-				      struct sock **map_link,
 				      void *key)
 {
 	struct bpf_prog *verdict, *parse, *tx_msg;
-	struct smap_psock_map_entry *e = NULL;
 	struct smap_psock *psock;
 	bool new = false;
 	int err = 0;
@@ -1906,14 +1914,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 		new = true;
 	}
 
-	if (map_link) {
-		e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
-		if (!e) {
-			err = -ENOMEM;
-			goto out_free;
-		}
-	}
-
 	/* 3. At this point we have a reference to a valid psock that is
 	 * running. Attach any BPF programs needed.
 	 */
@@ -1935,17 +1935,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 		write_unlock_bh(&sock->sk_callback_lock);
 	}
 
-	/* 4. Place psock in sockmap for use and stop any programs on
-	 * the old sock assuming its not the same sock we are replacing
-	 * it with. Because we can only have a single set of programs if
-	 * old_sock has a strp we can stop it.
-	 */
-	if (map_link) {
-		e->entry = map_link;
-		spin_lock_bh(&psock->maps_lock);
-		list_add_tail(&e->list, &psock->maps);
-		spin_unlock_bh(&psock->maps_lock);
-	}
 	return err;
 out_free:
 	smap_release_sock(psock, sock);
@@ -1956,7 +1945,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 	}
 	if (tx_msg)
 		bpf_prog_put(tx_msg);
-	kfree(e);
 	return err;
 }
 
@@ -1966,36 +1954,57 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 {
 	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
 	struct bpf_sock_progs *progs = &stab->progs;
-	struct sock *osock, *sock;
+	struct sock *osock, *sock = skops->sk;
+	struct smap_psock_map_entry *e;
+	struct smap_psock *psock;
 	u32 i = *(u32 *)key;
 	int err;
 
 	if (unlikely(flags > BPF_EXIST))
 		return -EINVAL;
-
 	if (unlikely(i >= stab->map.max_entries))
 		return -E2BIG;
 
-	sock = READ_ONCE(stab->sock_map[i]);
-	if (flags == BPF_EXIST && !sock)
-		return -ENOENT;
-	else if (flags == BPF_NOEXIST && sock)
-		return -EEXIST;
+	e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
+	if (!e)
+		return -ENOMEM;
 
-	sock = skops->sk;
-	err = __sock_map_ctx_update_elem(map, progs, sock, &stab->sock_map[i],
-					 key);
+	err = __sock_map_ctx_update_elem(map, progs, sock, key);
 	if (err)
 		goto out;
 
-	osock = xchg(&stab->sock_map[i], sock);
-	if (osock) {
-		struct smap_psock *opsock = smap_psock_sk(osock);
+	/* psock guaranteed to be present. */
+	psock = smap_psock_sk(sock);
+	raw_spin_lock_bh(&stab->lock);
+	osock = stab->sock_map[i];
+	if (osock && flags == BPF_NOEXIST) {
+		err = -EEXIST;
+		goto out_unlock;
+	}
+	if (!osock && flags == BPF_EXIST) {
+		err = -ENOENT;
+		goto out_unlock;
+	}
+
+	e->entry = &stab->sock_map[i];
+	e->map = map;
+	spin_lock_bh(&psock->maps_lock);
+	list_add_tail(&e->list, &psock->maps);
+	spin_unlock_bh(&psock->maps_lock);
 
-		smap_list_map_remove(opsock, &stab->sock_map[i]);
-		smap_release_sock(opsock, osock);
+	stab->sock_map[i] = sock;
+	if (osock) {
+		psock = smap_psock_sk(osock);
+		smap_list_map_remove(psock, &stab->sock_map[i]);
+		smap_release_sock(psock, osock);
 	}
+	raw_spin_unlock_bh(&stab->lock);
+	return 0;
+out_unlock:
+	smap_release_sock(psock, sock);
+	raw_spin_unlock_bh(&stab->lock);
 out:
+	kfree(e);
 	return err;
 }
 
@@ -2358,7 +2367,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
-	err = __sock_map_ctx_update_elem(map, progs, sock, NULL, key);
+	err = __sock_map_ctx_update_elem(map, progs, sock, key);
 	if (err)
 		goto err;
 
@@ -2384,8 +2393,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	}
 
 	rcu_assign_pointer(e->hash_link, l_new);
-	rcu_assign_pointer(e->htab,
-			   container_of(map, struct bpf_htab, map));
+	e->map = map;
 	spin_lock_bh(&psock->maps_lock);
 	list_add_tail(&e->list, &psock->maps);
 	spin_unlock_bh(&psock->maps_lock);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules
  2018-08-16 19:49 ` [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules Daniel Borkmann
@ 2018-08-16 21:25   ` Song Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2018-08-16 21:25 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, John Fastabend, Networking

On Thu, Aug 16, 2018 at 12:49 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Let's not turn the TCP ULP lookup into an arbitrary module loader, as
> we only intend to load ULP modules through this mechanism, not other
> unrelated kernel modules:
>
>   [root@bar]# cat foo.c
>   #include <sys/types.h>
>   #include <sys/socket.h>
>   #include <linux/tcp.h>
>   #include <linux/in.h>
>
>   int main(void)
>   {
>       int sock = socket(PF_INET, SOCK_STREAM, 0);
>       setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
>       return 0;
>   }
>
>   [root@bar]# gcc foo.c -O2 -Wall
>   [root@bar]# lsmod | grep sctp
>   [root@bar]# ./a.out
>   [root@bar]# lsmod | grep sctp
>   sctp                 1077248  4
>   libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
>   [root@bar]#
>
> Fix it by adding a module alias to TCP ULP modules, so that module
> probing via request_module() is limited to tcp-ulp-[name]. Existing
> modules like kTLS will load fine given the tcp-ulp-tls alias, but
> others will fail to load:
>
>   [root@bar]# lsmod | grep sctp
>   [root@bar]# ./a.out
>   [root@bar]# lsmod | grep sctp
>   [root@bar]#
>
> Sockmap is not affected by this since it's either built-in or not.
>
> Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>
> ---
>  include/net/tcp.h  | 4 ++++
>  net/ipv4/tcp_ulp.c | 2 +-
>  net/tls/tls_main.c | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index d196901..770917d 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2065,6 +2065,10 @@ int tcp_set_ulp_id(struct sock *sk, const int ulp);
>  void tcp_get_available_ulp(char *buf, size_t len);
>  void tcp_cleanup_ulp(struct sock *sk);
>
> +#define MODULE_ALIAS_TCP_ULP(name)                             \
> +       __MODULE_INFO(alias, alias_userspace, name);            \
> +       __MODULE_INFO(alias, alias_tcp_ulp, "tcp-ulp-" name)
> +
>  /* Call BPF_SOCK_OPS program that returns an int. If the return value
>   * is < 0, then the BPF op failed (for example if the loaded BPF
>   * program does not support the chosen operation or there is no BPF
> diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
> index 622caa4..7dd44b6 100644
> --- a/net/ipv4/tcp_ulp.c
> +++ b/net/ipv4/tcp_ulp.c
> @@ -51,7 +51,7 @@ static const struct tcp_ulp_ops *__tcp_ulp_find_autoload(const char *name)
>  #ifdef CONFIG_MODULES
>         if (!ulp && capable(CAP_NET_ADMIN)) {
>                 rcu_read_unlock();
> -               request_module("%s", name);
> +               request_module("tcp-ulp-%s", name);
>                 rcu_read_lock();
>                 ulp = tcp_ulp_find(name);
>         }
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index b09867c..93c0c22 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -45,6 +45,7 @@
>  MODULE_AUTHOR("Mellanox Technologies");
>  MODULE_DESCRIPTION("Transport Layer Security Support");
>  MODULE_LICENSE("Dual BSD/GPL");
> +MODULE_ALIAS_TCP_ULP("tls");
>
>  enum {
>         TLSV4,
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf 2/5] tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
  2018-08-16 19:49 ` [PATCH bpf 2/5] tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach Daniel Borkmann
@ 2018-08-16 21:26   ` Song Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2018-08-16 21:26 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, John Fastabend, Networking

On Thu, Aug 16, 2018 at 12:49 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> I found that in BPF sockmap programs, once we either delete a socket
> from the map or update a map slot such that the old socket is purged
> from the map, these sockets can never get reattached into a map even
> though their related psock has been dropped entirely at that point.
>
> The reason is that tcp_cleanup_ulp() leaves the old icsk->icsk_ulp_ops
> intact, so that on the next tcp_set_ulp_id() the kernel returns
> -EEXIST, thinking there is still some active ULP attached.
>
> BPF sockmap is the only user with this issue, as the other user, kTLS,
> only calls tcp_cleanup_ulp() from tcp_v4_destroy_sock(), whereas
> sockmap semantics allow dropping the socket from the map with all
> related psock state being cleaned up.
>
> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  net/ipv4/tcp_ulp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
> index 7dd44b6..a5995bb 100644
> --- a/net/ipv4/tcp_ulp.c
> +++ b/net/ipv4/tcp_ulp.c
> @@ -129,6 +129,8 @@ void tcp_cleanup_ulp(struct sock *sk)
>         if (icsk->icsk_ulp_ops->release)
>                 icsk->icsk_ulp_ops->release(sk);
>         module_put(icsk->icsk_ulp_ops->owner);
> +
> +       icsk->icsk_ulp_ops = NULL;
>  }
>
>  /* Change upper layer protocol for socket */
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf 3/5] bpf, sockmap: fix leakage of smap_psock_map_entry
  2018-08-16 19:49 ` [PATCH bpf 3/5] bpf, sockmap: fix leakage of smap_psock_map_entry Daniel Borkmann
@ 2018-08-16 21:27   ` Song Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2018-08-16 21:27 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, John Fastabend, Networking

On Thu, Aug 16, 2018 at 12:49 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> While working on sockmap I noticed that we do not always kfree the
> struct smap_psock_map_entry list elements which track psocks attached
> to maps. In the case of sock_hash_ctx_update_elem(), these map entries
> are allocated outside of __sock_map_ctx_update_elem() with their
> linkage to the socket hash table filled. In the case of sock array,
> the map entries are allocated inside of __sock_map_ctx_update_elem()
> and added with their linkage to psock->maps. Both additions are made
> under psock->maps_lock.
>
> Now, we drop these elements from their psock->maps list on a few
> occasions: i) in sock array via smap_list_map_remove() when an entry
> is either deleted from the map by user space, or updated via user
> space or a BPF program where we drop the old socket at that map slot,
> or when the sock array is freed via sock_map_free() and drops all its
> elements; ii) for sock hash via smap_list_hash_remove() on exactly
> the same occasions as just described for sock array; iii) in
> bpf_tcp_close() where we remove the elements from the list via
> psock_map_pop() and iterate over them, dropping them from either sock
> array or sock hash; and last but not least iv) once again in
> smap_gc_work(), which is a callback for deferring the work once the
> psock refcount hits zero and thus the socket is being destroyed.
>
> The problem is that the only case where we kfree() the list entry is
> case iv), at which point the list should be empty in normal cases. So
> in cases i) to iii) we unlink the elements without freeing them, and
> they go out of reach from us. Hence the fix is to properly kfree()
> them as well to stop the leakage. Given these are all handled under
> psock->maps_lock, there is no need for deferred RCU freeing.
>
> I later also ran with the kmemleak detector, which confirmed the
> finding: before the fix the objects go unreferenced, while after the
> patch no kmemleak report related to BPF shows up.
>
>   [...]
>   unreferenced object 0xffff880378eadae0 (size 64):
>     comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
>     hex dump (first 32 bytes):
>       00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
>       50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
>     backtrace:
>       [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
>       [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
>       [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
>       [<000000002ef89e83>] 0xffffffffffffffff
>   unreferenced object 0xffff880378ead240 (size 64):
>     comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
>     hex dump (first 32 bytes):
>       00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
>       00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
>     backtrace:
>       [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
>       [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
>       [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
>       [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
>       [<0000000000763660>] do_syscall_64+0x9a/0x300
>       [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>       [<000000002ef89e83>] 0xffffffffffffffff
>   [...]
>
> Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
> Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
> Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/bpf/sockmap.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 0c1a696..94a324b 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -370,6 +370,7 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
>                         }
>                         raw_spin_unlock_bh(&b->lock);
>                 }
> +               kfree(e);
>                 e = psock_map_pop(sk, psock);
>         }
>         rcu_read_unlock();
> @@ -1675,8 +1676,10 @@ static void smap_list_map_remove(struct smap_psock *psock,
>
>         spin_lock_bh(&psock->maps_lock);
>         list_for_each_entry_safe(e, tmp, &psock->maps, list) {
> -               if (e->entry == entry)
> +               if (e->entry == entry) {
>                         list_del(&e->list);
> +                       kfree(e);
> +               }
>         }
>         spin_unlock_bh(&psock->maps_lock);
>  }
> @@ -1690,8 +1693,10 @@ static void smap_list_hash_remove(struct smap_psock *psock,
>         list_for_each_entry_safe(e, tmp, &psock->maps, list) {
>                 struct htab_elem *c = rcu_dereference(e->hash_link);
>
> -               if (c == hash_link)
> +               if (c == hash_link) {
>                         list_del(&e->list);
> +                       kfree(e);
> +               }
>         }
>         spin_unlock_bh(&psock->maps_lock);
>  }
> --
> 2.9.5
>

^ permalink raw reply	[flat|nested] 12+ messages in thread
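The unlink-without-free bug described in the commit message above reduces to a small, general C pattern. Below is a userspace sketch, not the kernel code: names such as `map_entry` and `entry_remove` are hypothetical stand-ins for `smap_psock_map_entry` and `smap_list_map_remove()`, and a plain singly-linked list replaces the kernel list API. The point is that removal from the tracking list must also free the element, otherwise it leaks exactly as kmemleak reported.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for smap_psock_map_entry: one node per map
 * slot a socket is linked into. */
struct map_entry {
	int slot;
	struct map_entry *next;
};

static struct map_entry *entry_add(struct map_entry *head, int slot)
{
	struct map_entry *e = malloc(sizeof(*e));

	assert(e);
	e->slot = slot;
	e->next = head;
	return e;
}

/* The buggy path only unlinked the matching entries; the fixed path
 * frees each one as part of the removal, closing the leak. */
static struct map_entry *entry_remove(struct map_entry *head, int slot)
{
	struct map_entry **pp = &head;

	while (*pp) {
		struct map_entry *e = *pp;

		if (e->slot == slot) {
			*pp = e->next;	/* unlink ... */
			free(e);	/* ... and free */
		} else {
			pp = &e->next;
		}
	}
	return head;
}

static int entry_count(const struct map_entry *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Dropping the `free(e)` line reproduces the original bug: the entries become unreachable but still allocated, which is the unreferenced-object state kmemleak flagged.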

* Re: [PATCH bpf 4/5] bpf, sockmap: fix map elem deletion race with smap_stop_sock
  2018-08-16 19:49 ` [PATCH bpf 4/5] bpf, sockmap: fix map elem deletion race with smap_stop_sock Daniel Borkmann
@ 2018-08-16 21:30   ` Song Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2018-08-16 21:30 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, John Fastabend, Networking

On Thu, Aug 16, 2018 at 12:49 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> The smap_start_sock() and smap_stop_sock() are each protected under
> the sock->sk_callback_lock from their call-sites, except in the case
> of sock_map_delete_elem() where we drop the old socket from the map
> slot. This is racy because the same sock could be part of multiple
> sock maps, so we could run smap_stop_sock() in parallel; given that
> psock->strp_enabled might be true on both CPUs at that point, we
> might, for example, wrongly restore the sk->sk_data_ready /
> sk->sk_write_space callbacks.
> Therefore, hold the sock->sk_callback_lock as well on delete. Looks
> like 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add
> multi-map support") had this right, but later on e9db4ef6bf4c ("bpf:
> sockhash fix omitted bucket lock in sock_close") removed it again
> from delete leaving this smap_stop_sock() instance unprotected.
>
> Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/bpf/sockmap.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 94a324b..921cb6b 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -1786,8 +1786,11 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
>         if (!psock)
>                 goto out;
>
> -       if (psock->bpf_parse)
> +       if (psock->bpf_parse) {
> +               write_lock_bh(&sock->sk_callback_lock);
>                 smap_stop_sock(psock, sock);
> +               write_unlock_bh(&sock->sk_callback_lock);
> +       }
>         smap_list_map_remove(psock, &stab->sock_map[k]);
>         smap_release_sock(psock, sock);
>  out:
> --
> 2.9.5
>


* Re: [PATCH bpf 5/5] bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
  2018-08-16 19:49 ` [PATCH bpf 5/5] bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist Daniel Borkmann
@ 2018-08-16 21:51   ` Song Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2018-08-16 21:51 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, John Fastabend, Networking

On Thu, Aug 16, 2018 at 12:49 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> The current code in sock_map_ctx_update_elem() allows for BPF_EXIST
> and BPF_NOEXIST map update flags. While this approach is rather
> uncommon on array-like maps, e.g. bpf_fd_array_map_update_elem() and
> others enforce the map update flags to be BPF_ANY such that xchg()
> can be used directly, the current implementation in sock map does not
> guarantee that such an operation with BPF_EXIST / BPF_NOEXIST is
> atomic.
>
> The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the
> socket from the slot which is then tested for NULL / non-NULL. However
> later after __sock_map_ctx_update_elem(), the actual update is done
> through osock = xchg(&stab->sock_map[i], sock). The problem is that
> in the meantime a different CPU could have updated / deleted a socket
> in that specific slot, and thus the flag constraints won't hold
> anymore.
>
> I've been wondering whether it would be best to just break UAPI and
> enforce BPF_ANY to check whether anyone actually complains; however,
> the trouble is that the BPF kselftests already use BPF_NOEXIST for
> the map update, so it might have been copied into applications
> already. The fix that keeps the current behavior intact is to add a
> map lock, similar to the sock hash bucket lock, that covers the
> whole map.
>
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  kernel/bpf/sockmap.c | 106 +++++++++++++++++++++++++++------------------------
>  1 file changed, 57 insertions(+), 49 deletions(-)
>
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 921cb6b..98e621a 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -58,6 +58,7 @@ struct bpf_stab {
>         struct bpf_map map;
>         struct sock **sock_map;
>         struct bpf_sock_progs progs;
> +       raw_spinlock_t lock;
>  };
>
>  struct bucket {
> @@ -89,9 +90,9 @@ enum smap_psock_state {
>
>  struct smap_psock_map_entry {
>         struct list_head list;
> +       struct bpf_map *map;
>         struct sock **entry;
>         struct htab_elem __rcu *hash_link;
> -       struct bpf_htab __rcu *htab;
>  };
>
>  struct smap_psock {
> @@ -343,13 +344,18 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
>         e = psock_map_pop(sk, psock);
>         while (e) {
>                 if (e->entry) {
> -                       osk = cmpxchg(e->entry, sk, NULL);
> +                       struct bpf_stab *stab = container_of(e->map, struct bpf_stab, map);
> +
> +                       raw_spin_lock_bh(&stab->lock);
> +                       osk = *e->entry;
>                         if (osk == sk) {
> +                               *e->entry = NULL;
>                                 smap_release_sock(psock, sk);
>                         }
> +                       raw_spin_unlock_bh(&stab->lock);
>                 } else {
>                         struct htab_elem *link = rcu_dereference(e->hash_link);
> -                       struct bpf_htab *htab = rcu_dereference(e->htab);
> +                       struct bpf_htab *htab = container_of(e->map, struct bpf_htab, map);
>                         struct hlist_head *head;
>                         struct htab_elem *l;
>                         struct bucket *b;
> @@ -1642,6 +1648,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
>                 return ERR_PTR(-ENOMEM);
>
>         bpf_map_init_from_attr(&stab->map, attr);
> +       raw_spin_lock_init(&stab->lock);
>
>         /* make sure page count doesn't overflow */
>         cost = (u64) stab->map.max_entries * sizeof(struct sock *);
> @@ -1716,14 +1723,15 @@ static void sock_map_free(struct bpf_map *map)
>          * and a grace period expire to ensure psock is really safe to remove.
>          */
>         rcu_read_lock();
> +       raw_spin_lock_bh(&stab->lock);
>         for (i = 0; i < stab->map.max_entries; i++) {
>                 struct smap_psock *psock;
>                 struct sock *sock;
>
> -               sock = xchg(&stab->sock_map[i], NULL);
> +               sock = stab->sock_map[i];
>                 if (!sock)
>                         continue;
> -
> +               stab->sock_map[i] = NULL;
>                 psock = smap_psock_sk(sock);
>                 /* This check handles a racing sock event that can get the
>                  * sk_callback_lock before this case but after xchg happens
> @@ -1735,6 +1743,7 @@ static void sock_map_free(struct bpf_map *map)
>                         smap_release_sock(psock, sock);
>                 }
>         }
> +       raw_spin_unlock_bh(&stab->lock);
>         rcu_read_unlock();
>
>         sock_map_remove_complete(stab);
> @@ -1778,14 +1787,16 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
>         if (k >= map->max_entries)
>                 return -EINVAL;
>
> -       sock = xchg(&stab->sock_map[k], NULL);
> +       raw_spin_lock_bh(&stab->lock);
> +       sock = stab->sock_map[k];
> +       stab->sock_map[k] = NULL;
> +       raw_spin_unlock_bh(&stab->lock);
>         if (!sock)
>                 return -EINVAL;
>
>         psock = smap_psock_sk(sock);
>         if (!psock)
> -               goto out;
> -
> +               return 0;
>         if (psock->bpf_parse) {
>                 write_lock_bh(&sock->sk_callback_lock);
>                 smap_stop_sock(psock, sock);
> @@ -1793,7 +1804,6 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
>         }
>         smap_list_map_remove(psock, &stab->sock_map[k]);
>         smap_release_sock(psock, sock);
> -out:
>         return 0;
>  }
>
> @@ -1829,11 +1839,9 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
>  static int __sock_map_ctx_update_elem(struct bpf_map *map,
>                                       struct bpf_sock_progs *progs,
>                                       struct sock *sock,
> -                                     struct sock **map_link,
>                                       void *key)
>  {
>         struct bpf_prog *verdict, *parse, *tx_msg;
> -       struct smap_psock_map_entry *e = NULL;
>         struct smap_psock *psock;
>         bool new = false;
>         int err = 0;
> @@ -1906,14 +1914,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
>                 new = true;
>         }
>
> -       if (map_link) {
> -               e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
> -               if (!e) {
> -                       err = -ENOMEM;
> -                       goto out_free;
> -               }
> -       }
> -
>         /* 3. At this point we have a reference to a valid psock that is
>          * running. Attach any BPF programs needed.
>          */
> @@ -1935,17 +1935,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
>                 write_unlock_bh(&sock->sk_callback_lock);
>         }
>
> -       /* 4. Place psock in sockmap for use and stop any programs on
> -        * the old sock assuming its not the same sock we are replacing
> -        * it with. Because we can only have a single set of programs if
> -        * old_sock has a strp we can stop it.
> -        */
> -       if (map_link) {
> -               e->entry = map_link;
> -               spin_lock_bh(&psock->maps_lock);
> -               list_add_tail(&e->list, &psock->maps);
> -               spin_unlock_bh(&psock->maps_lock);
> -       }
>         return err;
>  out_free:
>         smap_release_sock(psock, sock);
> @@ -1956,7 +1945,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
>         }
>         if (tx_msg)
>                 bpf_prog_put(tx_msg);
> -       kfree(e);
>         return err;
>  }
>
> @@ -1966,36 +1954,57 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
>  {
>         struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
>         struct bpf_sock_progs *progs = &stab->progs;
> -       struct sock *osock, *sock;
> +       struct sock *osock, *sock = skops->sk;
> +       struct smap_psock_map_entry *e;
> +       struct smap_psock *psock;
>         u32 i = *(u32 *)key;
>         int err;
>
>         if (unlikely(flags > BPF_EXIST))
>                 return -EINVAL;
> -
>         if (unlikely(i >= stab->map.max_entries))
>                 return -E2BIG;
>
> -       sock = READ_ONCE(stab->sock_map[i]);
> -       if (flags == BPF_EXIST && !sock)
> -               return -ENOENT;
> -       else if (flags == BPF_NOEXIST && sock)
> -               return -EEXIST;
> +       e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
> +       if (!e)
> +               return -ENOMEM;
>
> -       sock = skops->sk;
> -       err = __sock_map_ctx_update_elem(map, progs, sock, &stab->sock_map[i],
> -                                        key);
> +       err = __sock_map_ctx_update_elem(map, progs, sock, key);
>         if (err)
>                 goto out;
>
> -       osock = xchg(&stab->sock_map[i], sock);
> -       if (osock) {
> -               struct smap_psock *opsock = smap_psock_sk(osock);
> +       /* psock guaranteed to be present. */
> +       psock = smap_psock_sk(sock);
> +       raw_spin_lock_bh(&stab->lock);
> +       osock = stab->sock_map[i];
> +       if (osock && flags == BPF_NOEXIST) {
> +               err = -EEXIST;
> +               goto out_unlock;
> +       }
> +       if (!osock && flags == BPF_EXIST) {
> +               err = -ENOENT;
> +               goto out_unlock;
> +       }
> +
> +       e->entry = &stab->sock_map[i];
> +       e->map = map;
> +       spin_lock_bh(&psock->maps_lock);
> +       list_add_tail(&e->list, &psock->maps);
> +       spin_unlock_bh(&psock->maps_lock);
>
> -               smap_list_map_remove(opsock, &stab->sock_map[i]);
> -               smap_release_sock(opsock, osock);
> +       stab->sock_map[i] = sock;
> +       if (osock) {
> +               psock = smap_psock_sk(osock);
> +               smap_list_map_remove(psock, &stab->sock_map[i]);
> +               smap_release_sock(psock, osock);
>         }
> +       raw_spin_unlock_bh(&stab->lock);
> +       return 0;
> +out_unlock:
> +       smap_release_sock(psock, sock);
> +       raw_spin_unlock_bh(&stab->lock);
>  out:
> +       kfree(e);
>         return err;
>  }
>
> @@ -2358,7 +2367,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
>         b = __select_bucket(htab, hash);
>         head = &b->head;
>
> -       err = __sock_map_ctx_update_elem(map, progs, sock, NULL, key);
> +       err = __sock_map_ctx_update_elem(map, progs, sock, key);
>         if (err)
>                 goto err;
>
> @@ -2384,8 +2393,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
>         }
>
>         rcu_assign_pointer(e->hash_link, l_new);
> -       rcu_assign_pointer(e->htab,
> -                          container_of(map, struct bpf_htab, map));
> +       e->map = map;
>         spin_lock_bh(&psock->maps_lock);
>         list_add_tail(&e->list, &psock->maps);
>         spin_unlock_bh(&psock->maps_lock);
> --
> 2.9.5
>

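The check-then-act problem this patch closes, and its lock-based fix, can be condensed into a userspace sketch with hypothetical names: a pthread mutex stands in for the new stab->lock, opaque pointers stand in for the sockets, and the flag names are simplified. The NOEXIST/EXIST check and the slot write happen inside one critical section, so no other thread can change the slot in between.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins for BPF_ANY / BPF_NOEXIST / BPF_EXIST. */
enum { ANY, NOEXIST, EXIST };

#define MAX_SLOTS 8

struct slot_map {
	pthread_mutex_t lock;	/* stands in for stab->lock */
	void *slots[MAX_SLOTS];	/* stands in for stab->sock_map */
};

/* The racy version did: READ_ONCE, check flags, then a later xchg().
 * Holding the map lock across check and write makes the flag
 * semantics atomic, as in the patched sock_map_ctx_update_elem(). */
static int slot_update(struct slot_map *m, unsigned int i,
		       void *val, int flags)
{
	void *old;

	if (i >= MAX_SLOTS)
		return -E2BIG;
	pthread_mutex_lock(&m->lock);
	old = m->slots[i];
	if (old && flags == NOEXIST) {
		pthread_mutex_unlock(&m->lock);
		return -EEXIST;
	}
	if (!old && flags == EXIST) {
		pthread_mutex_unlock(&m->lock);
		return -ENOENT;
	}
	m->slots[i] = val;	/* check and write are one atomic unit */
	pthread_mutex_unlock(&m->lock);
	return 0;
}
```

The trade-off mirrors the patch: a plain lock replaces the lock-free xchg(), which is slightly slower on updates but preserves the existing BPF_EXIST / BPF_NOEXIST UAPI semantics without races.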

* Re: [PATCH bpf 0/5] BPF sockmap and ulp fixes
  2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
                   ` (4 preceding siblings ...)
  2018-08-16 19:49 ` [PATCH bpf 5/5] bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist Daniel Borkmann
@ 2018-08-16 22:06 ` Alexei Starovoitov
  5 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2018-08-16 22:06 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: john.fastabend, netdev

On Thu, Aug 16, 2018 at 09:49:05PM +0200, Daniel Borkmann wrote:
> Batch of various fixes related to BPF sockmap and ULP, including
> adding module alias to restrict module requests, races and memory
> leaks in sockmap code. For details please refer to the individual
> patches. Thanks!

Applied to bpf tree, Thanks everyone


end of thread, other threads:[~2018-08-17  1:07 UTC | newest]

Thread overview: 12+ messages
2018-08-16 19:49 [PATCH bpf 0/5] BPF sockmap and ulp fixes Daniel Borkmann
2018-08-16 19:49 ` [PATCH bpf 1/5] tcp, ulp: add alias for all ulp modules Daniel Borkmann
2018-08-16 21:25   ` Song Liu
2018-08-16 19:49 ` [PATCH bpf 2/5] tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach Daniel Borkmann
2018-08-16 21:26   ` Song Liu
2018-08-16 19:49 ` [PATCH bpf 3/5] bpf, sockmap: fix leakage of smap_psock_map_entry Daniel Borkmann
2018-08-16 21:27   ` Song Liu
2018-08-16 19:49 ` [PATCH bpf 4/5] bpf, sockmap: fix map elem deletion race with smap_stop_sock Daniel Borkmann
2018-08-16 21:30   ` Song Liu
2018-08-16 19:49 ` [PATCH bpf 5/5] bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist Daniel Borkmann
2018-08-16 21:51   ` Song Liu
2018-08-16 22:06 ` [PATCH bpf 0/5] BPF sockmap and ulp fixes Alexei Starovoitov
