netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/4] net: fix and tighten rcu dereference checks
@ 2016-03-31 23:29 Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 1/4] tun: add socket locking around sk_{attach,detach}_filter Hannes Frederic Sowa
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

Only the first patch is really applicable for stable. It adds appropriate
socket locks so lockdep doesn't complain if tuntap's ioctls modify the
filters on the socket.

Rest of the patches tighten the rcu dereference socket lock checks.

Last patch fixes missing rcu_read_locks which were discovered by this
change. Certainly there are more to come.

Hannes Frederic Sowa (4):
  tun: add socket locking around sk_{attach,detach}_filter
  net: proper check if we hold the socket lock during dereference
  sock: use lockdep_sock_is_held were appropriate
  tcp: various missing rcu_read_lock around __sk_dst_get

 drivers/net/tun.c        | 12 ++++++++++++
 include/net/sock.h       | 10 ++++++++--
 include/net/tcp.h        |  5 ++++-
 net/core/filter.c        |  6 +++---
 net/core/sock.c          |  7 +++++--
 net/dccp/ipv4.c          |  2 +-
 net/dccp/ipv6.c          |  2 +-
 net/ipv4/af_inet.c       |  2 +-
 net/ipv4/cipso_ipv4.c    |  3 ++-
 net/ipv4/ip_sockglue.c   |  4 ++--
 net/ipv4/tcp_input.c     | 18 ++++++++++++++----
 net/ipv4/tcp_ipv4.c      |  8 +++-----
 net/ipv4/tcp_metrics.c   | 12 +++++-------
 net/ipv4/tcp_output.c    | 22 ++++++++++++++++++++--
 net/ipv6/ipv6_sockglue.c |  6 ++++--
 net/ipv6/tcp_ipv6.c      |  2 +-
 net/socket.c             |  2 +-
 17 files changed, 87 insertions(+), 36 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH net 1/4] tun: add socket locking around sk_{attach,detach}_filter
  2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
@ 2016-03-31 23:29 ` Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 2/4] net: proper check if we hold the socket lock during dereference Hannes Frederic Sowa
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

[Changelog stolen from Daniel Borkmann:]

Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:

  [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
  [   52.765688] other info that might help us debug this:
  [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
  [   52.765701] 1 lock held by a.out/1525:
  [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
  [   52.765721] stack backtrace:
  [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
  [...]
  [   52.765768] Call Trace:
  [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
  [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
  [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
  [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
  [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
  [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
  [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
  [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
  [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
  [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
  [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
  [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25

Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.

Since the fix in commit f91ff5b9ff52 ("net: sk_{detach|attach}_filter()
rcu fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.

Since its introduction in commit 994051625981 ("tun: socket
filter support"), tap filters are managed under RTNL lock from
__tun_chr_ioctl(). Thus the sock_owned_by_user(sk) doesn't apply in this
specific case and therefore triggers the false positive.

Simply holding the locks during the filter updates will fix this problem.

Fixes: 994051625981 ("tun: socket filter support")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 drivers/net/tun.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950617c36e..dccbbacbbc7f02 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -622,7 +622,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
 
 	/* Re-attach the filter to persist device */
 	if (!skip_filter && (tun->filter_attached == true)) {
+		bool slow;
+
+		slow = lock_sock_fast(tfile->socket.sk);
 		err = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+		unlock_sock_fast(tfile->socket.sk, slow);
 		if (!err)
 			goto out;
 	}
@@ -1821,8 +1825,12 @@ static void tun_detach_filter(struct tun_struct *tun, int n)
 	struct tun_file *tfile;
 
 	for (i = 0; i < n; i++) {
+		bool slow;
+
 		tfile = rtnl_dereference(tun->tfiles[i]);
+		slow = lock_sock_fast(tfile->socket.sk);
 		sk_detach_filter(tfile->socket.sk);
+		unlock_sock_fast(tfile->socket.sk, slow);
 	}
 
 	tun->filter_attached = false;
@@ -1834,8 +1842,12 @@ static int tun_attach_filter(struct tun_struct *tun)
 	struct tun_file *tfile;
 
 	for (i = 0; i < tun->numqueues; i++) {
+		bool slow;
+
 		tfile = rtnl_dereference(tun->tfiles[i]);
+		slow = lock_sock_fast(tfile->socket.sk);
 		ret = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+		unlock_sock_fast(tfile->socket.sk, slow);
 		if (ret) {
 			tun_detach_filter(tun, i);
 			return ret;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH net 2/4] net: proper check if we hold the socket lock during dereference
  2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 1/4] tun: add socket locking around sk_{attach,detach}_filter Hannes Frederic Sowa
@ 2016-03-31 23:29 ` Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 3/4] sock: use lockdep_sock_is_held were appropriate Hannes Frederic Sowa
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

lockdep_sock_is_held makes sure that we currently own the
lock. sock_owned_by_user simply checks if a user holds the socket. This
could lead to non deterministic lock checks.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/sock.h | 5 +++++
 net/core/filter.c  | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e03727b73..30f9b5ad0a82ef 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1382,6 +1382,11 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
 		spin_unlock_bh(&sk->sk_lock.slock);
 }
 
+static bool lockdep_sock_is_held(struct sock *sk)
+{
+	return lockdep_is_held(&sk->sk_lock) ||
+	       lockdep_is_held(&sk->sk_lock.slock);
+}
 
 struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		      struct proto *prot, int kern);
diff --git a/net/core/filter.c b/net/core/filter.c
index 4b81b71171b4ce..e8486ba601eae7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1166,7 +1166,7 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
 	}
 
 	old_fp = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+					   lockdep_sock_is_held(sk));
 	rcu_assign_pointer(sk->sk_filter, fp);
 
 	if (old_fp)
@@ -2259,7 +2259,7 @@ int sk_detach_filter(struct sock *sk)
 		return -EPERM;
 
 	filter = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+					   lockdep_sock_is_held(sk));
 	if (filter) {
 		RCU_INIT_POINTER(sk->sk_filter, NULL);
 		sk_filter_uncharge(sk, filter);
@@ -2279,7 +2279,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 
 	lock_sock(sk);
 	filter = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+					   lockdep_sock_is_held(sk));
 	if (!filter)
 		goto out;
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH net 3/4] sock: use lockdep_sock_is_held were appropriate
  2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 1/4] tun: add socket locking around sk_{attach,detach}_filter Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 2/4] net: proper check if we hold the socket lock during dereference Hannes Frederic Sowa
@ 2016-03-31 23:29 ` Hannes Frederic Sowa
  2016-03-31 23:29 ` [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get Hannes Frederic Sowa
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

Also make lockdep_sock_is_held accept const arguments, so we don't need to
modify too many callers.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/sock.h       | 7 ++++---
 net/dccp/ipv4.c          | 2 +-
 net/dccp/ipv6.c          | 2 +-
 net/ipv4/af_inet.c       | 2 +-
 net/ipv4/cipso_ipv4.c    | 3 ++-
 net/ipv4/ip_sockglue.c   | 4 ++--
 net/ipv4/tcp_ipv4.c      | 8 +++-----
 net/ipv6/ipv6_sockglue.c | 6 ++++--
 net/ipv6/tcp_ipv6.c      | 2 +-
 net/socket.c             | 2 +-
 10 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 30f9b5ad0a82ef..bbea02fdaaa0fd 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1382,8 +1382,9 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
 		spin_unlock_bh(&sk->sk_lock.slock);
 }
 
-static bool lockdep_sock_is_held(struct sock *sk)
+static bool lockdep_sock_is_held(const struct sock *csk)
 {
+	struct sock *sk = (struct sock *)csk;
 	return lockdep_is_held(&sk->sk_lock) ||
 	       lockdep_is_held(&sk->sk_lock.slock);
 }
@@ -1589,8 +1590,8 @@ static inline void sk_rethink_txhash(struct sock *sk)
 static inline struct dst_entry *
 __sk_dst_get(struct sock *sk)
 {
-	return rcu_dereference_check(sk->sk_dst_cache, sock_owned_by_user(sk) ||
-						       lockdep_is_held(&sk->sk_lock.slock));
+	return rcu_dereference_check(sk->sk_dst_cache,
+				     lockdep_sock_is_held(sk));
 }
 
 static inline struct dst_entry *
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 9c67a961ba5382..0ea298d849383f 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -62,7 +62,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	nexthop = daddr = usin->sin_addr.s_addr;
 
 	inet_opt = rcu_dereference_protected(inet->inet_opt,
-					     sock_owned_by_user(sk));
+					     lockdep_sock_is_held(sk));
 	if (inet_opt != NULL && inet_opt->opt.srr) {
 		if (daddr == 0)
 			return -EINVAL;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 4663a01d503991..6ea214fa5499c5 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -865,7 +865,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	fl6.fl6_sport = inet->inet_sport;
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+	opt = rcu_dereference_protected(np->opt, lockdep_sock_is_held(sk));
 	final_p = fl6_update_dst(&fl6, opt, &final);
 
 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9e481992dbaef2..7e37ebb5af396e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1106,7 +1106,7 @@ static int inet_sk_reselect_saddr(struct sock *sk)
 	struct ip_options_rcu *inet_opt;
 
 	inet_opt = rcu_dereference_protected(inet->inet_opt,
-					     sock_owned_by_user(sk));
+					     lockdep_sock_is_held(sk));
 	if (inet_opt && inet_opt->opt.srr)
 		daddr = inet_opt->opt.faddr;
 
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index bdb2a07ec363b7..40d6b87713a132 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1933,7 +1933,8 @@ int cipso_v4_sock_setattr(struct sock *sk,
 
 	sk_inet = inet_sk(sk);
 
-	old = rcu_dereference_protected(sk_inet->inet_opt, sock_owned_by_user(sk));
+	old = rcu_dereference_protected(sk_inet->inet_opt,
+					lockdep_sock_is_held(sk));
 	if (sk_inet->is_icsk) {
 		sk_conn = inet_csk(sk);
 		if (old)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 035ad645a8d9d8..cf073059192d99 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -635,7 +635,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		if (err)
 			break;
 		old = rcu_dereference_protected(inet->inet_opt,
-						sock_owned_by_user(sk));
+						lockdep_sock_is_held(sk));
 		if (inet->is_icsk) {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1295,7 +1295,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 		struct ip_options_rcu *inet_opt;
 
 		inet_opt = rcu_dereference_protected(inet->inet_opt,
-						     sock_owned_by_user(sk));
+						     lockdep_sock_is_held(sk));
 		opt->optlen = 0;
 		if (inet_opt)
 			memcpy(optbuf, &inet_opt->opt,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ad450509029bce..17cc1840337756 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -157,7 +157,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
 	nexthop = daddr = usin->sin_addr.s_addr;
 	inet_opt = rcu_dereference_protected(inet->inet_opt,
-					     sock_owned_by_user(sk));
+					     lockdep_sock_is_held(sk));
 	if (inet_opt && inet_opt->opt.srr) {
 		if (!daddr)
 			return -EINVAL;
@@ -882,8 +882,7 @@ struct tcp_md5sig_key *tcp_md5_do_lookup(const struct sock *sk,
 
 	/* caller either holds rcu_read_lock() or socket lock */
 	md5sig = rcu_dereference_check(tp->md5sig_info,
-				       sock_owned_by_user(sk) ||
-				       lockdep_is_held((spinlock_t *)&sk->sk_lock.slock));
+				       lockdep_sock_is_held(sk));
 	if (!md5sig)
 		return NULL;
 #if IS_ENABLED(CONFIG_IPV6)
@@ -928,8 +927,7 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
 	}
 
 	md5sig = rcu_dereference_protected(tp->md5sig_info,
-					   sock_owned_by_user(sk) ||
-					   lockdep_is_held(&sk->sk_lock.slock));
+					   lockdep_sock_is_held(sk));
 	if (!md5sig) {
 		md5sig = kmalloc(sizeof(*md5sig), gfp);
 		if (!md5sig)
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4449ad1f81147c..516b6a31c30f7b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -407,7 +407,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
 			break;
 
-		opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+		opt = rcu_dereference_protected(np->opt,
+						lockdep_sock_is_held(sk));
 		opt = ipv6_renew_options(sk, opt, optname,
 					 (struct ipv6_opt_hdr __user *)optval,
 					 optlen);
@@ -1123,7 +1124,8 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		struct ipv6_txoptions *opt;
 
 		lock_sock(sk);
-		opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+		opt = rcu_dereference_protected(np->opt,
+						lockdep_sock_is_held(sk));
 		len = ipv6_getsockopt_sticky(sk, opt, optname, optval, len);
 		release_sock(sk);
 		/* check if ipv6_getsockopt_sticky() returns err code */
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 711d209f912473..bd16dc4b6ba71b 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -234,7 +234,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	fl6.fl6_dport = usin->sin6_port;
 	fl6.fl6_sport = inet->inet_sport;
 
-	opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
+	opt = rcu_dereference_protected(np->opt, lockdep_sock_is_held(sk));
 	final_p = fl6_update_dst(&fl6, opt, &final);
 
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
diff --git a/net/socket.c b/net/socket.c
index 5f77a8e93830bd..e3299cdfe9db39 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1046,7 +1046,7 @@ static int sock_fasync(int fd, struct file *filp, int on)
 		return -EINVAL;
 
 	lock_sock(sk);
-	wq = rcu_dereference_protected(sock->wq, sock_owned_by_user(sk));
+	wq = rcu_dereference_protected(sock->wq, lockdep_sock_is_held(sk));
 	fasync_helper(fd, filp, on, &wq->fasync_list);
 
 	if (!wq->fasync_list)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
                   ` (2 preceding siblings ...)
  2016-03-31 23:29 ` [PATCH net 3/4] sock: use lockdep_sock_is_held were appropriate Hannes Frederic Sowa
@ 2016-03-31 23:29 ` Hannes Frederic Sowa
  2016-03-31 23:39   ` Eric Dumazet
  2016-03-31 23:48 ` [PATCH net 5/4] tcp: fix rcu usage around __sk_dst_get in tcp_update_metrics Hannes Frederic Sowa
  2016-03-31 23:56 ` [PATCH net 6/4] tcp: fix __sk_dst_get usage in tcp_current_mss Hannes Frederic Sowa
  5 siblings, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

Various fixes which were discovered by this changeset. More probably
to come...

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/tcp.h      |  5 ++++-
 net/core/sock.c        |  7 +++++--
 net/ipv4/tcp_input.c   | 18 ++++++++++++++----
 net/ipv4/tcp_metrics.c | 12 +++++-------
 net/ipv4/tcp_output.c  | 22 ++++++++++++++++++++--
 5 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index b91370f61be64a..541c99bf633d4b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -647,11 +647,14 @@ static inline void tcp_fast_path_check(struct sock *sk)
 /* Compute the actual rto_min value */
 static inline u32 tcp_rto_min(struct sock *sk)
 {
-	const struct dst_entry *dst = __sk_dst_get(sk);
+	const struct dst_entry *dst;
 	u32 rto_min = TCP_RTO_MIN;
 
+	rcu_read_lock();
+	dst = __sk_dst_get(sk);
 	if (dst && dst_metric_locked(dst, RTAX_RTO_MIN))
 		rto_min = dst_metric_rtt(dst, RTAX_RTO_MIN);
+	rcu_read_unlock();
 	return rto_min;
 }
 
diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9aedb230f9..963d0ba7aa4232 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -486,14 +486,17 @@ EXPORT_SYMBOL(sk_receive_skb);
 
 struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 {
-	struct dst_entry *dst = __sk_dst_get(sk);
+	struct dst_entry *dst;
 
+	rcu_read_lock();
+	dst = __sk_dst_get(sk);
 	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
 		sk_tx_queue_clear(sk);
 		RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
 		dst_release(dst);
-		return NULL;
+		dst = NULL;
 	}
+	rcu_read_unlock();
 
 	return dst;
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e6e65f79ade821..8489e9e45f906c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -203,10 +203,16 @@ static void tcp_enter_quickack_mode(struct sock *sk)
 static bool tcp_in_quickack_mode(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	const struct dst_entry *dst = __sk_dst_get(sk);
+	const struct dst_entry *dst;
+	bool ret;
 
-	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
-		(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
+	rcu_read_lock();
+	dst = __sk_dst_get(sk);
+	ret = (dst && dst_metric(dst, RTAX_QUICKACK)) ||
+	      (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
+	rcu_read_unlock();
+
+	return ret;
 }
 
 static void tcp_ecn_queue_cwr(struct tcp_sock *tp)
@@ -3674,9 +3680,13 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		tcp_process_tlp_ack(sk, ack, flag);
 
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
+		struct dst_entry *dst;
+
+		rcu_read_lock();
+		dst = __sk_dst_get(sk);
 		if (dst)
 			dst_confirm(dst);
+		rcu_read_unlock();
 	}
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 7b7eec43990692..33a36648423e8b 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -488,22 +488,20 @@ out_unlock:
 
 void tcp_init_metrics(struct sock *sk)
 {
-	struct dst_entry *dst = __sk_dst_get(sk);
+	struct dst_entry *dst;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_metrics_block *tm;
 	u32 val, crtt = 0; /* cached RTT scaled by 8 */
 
+	rcu_read_lock();
+	dst = __sk_dst_get(sk);
 	if (!dst)
 		goto reset;
-
 	dst_confirm(dst);
 
-	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, true);
-	if (!tm) {
-		rcu_read_unlock();
+	if (!tm)
 		goto reset;
-	}
 
 	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
 		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
@@ -527,7 +525,6 @@ void tcp_init_metrics(struct sock *sk)
 	}
 
 	crtt = tcp_metric_get(tm, TCP_METRIC_RTT);
-	rcu_read_unlock();
 reset:
 	/* The initial RTT measurement from the SYN/SYN-ACK is not ideal
 	 * to seed the RTO for later data packets because SYN packets are
@@ -575,6 +572,7 @@ reset:
 	else
 		tp->snd_cwnd = tcp_init_cwnd(tp, dst);
 	tp->snd_cwnd_stamp = tcp_time_stamp;
+	rcu_read_unlock();
 }
 
 bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7d2dc015cd19a6..ba3621834e7bfa 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -548,6 +548,7 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	struct tcp_fastopen_request *fastopen = tp->fastopen_req;
 
 #ifdef CONFIG_TCP_MD5SIG
+	rcu_read_lock();
 	*md5 = tp->af_specific->md5_lookup(sk, sk);
 	if (*md5) {
 		opts->options |= OPTION_MD5;
@@ -601,6 +602,10 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+#ifdef CONFIG_TCP_MD5SIG
+	rcu_read_unlock();
+#endif
+
 	return MAX_TCP_OPTION_SPACE - remaining;
 }
 
@@ -928,6 +933,10 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	tcb = TCP_SKB_CB(skb);
 	memset(&opts, 0, sizeof(opts));
 
+#ifdef CONFIG_TCP_MD5SIG
+	rcu_read_lock();
+#endif
+
 	if (unlikely(tcb->tcp_flags & TCPHDR_SYN))
 		tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
 	else
@@ -996,6 +1005,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		tp->af_specific->calc_md5_hash(opts.hash_location,
 					       md5, sk, skb);
 	}
+	rcu_read_unlock();
 #endif
 
 	icsk->icsk_af_ops->send_check(sk, skb);
@@ -1294,10 +1304,13 @@ static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
 
 	/* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
 	if (icsk->icsk_af_ops->net_frag_header_len) {
-		const struct dst_entry *dst = __sk_dst_get(sk);
+		const struct dst_entry *dst;
 
+		rcu_read_lock();
+		dst = __sk_dst_get(sk);
 		if (dst && dst_allfrag(dst))
 			mss_now -= icsk->icsk_af_ops->net_frag_header_len;
+		rcu_read_unlock();
 	}
 
 	/* Clamp it (mss_clamp does not include tcp options) */
@@ -1335,10 +1348,13 @@ int tcp_mss_to_mtu(struct sock *sk, int mss)
 
 	/* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
 	if (icsk->icsk_af_ops->net_frag_header_len) {
-		const struct dst_entry *dst = __sk_dst_get(sk);
+		const struct dst_entry *dst;
 
+		rcu_read_lock();
+		dst = __sk_dst_get(sk);
 		if (dst && dst_allfrag(dst))
 			mtu += icsk->icsk_af_ops->net_frag_header_len;
+		rcu_read_unlock();
 	}
 	return mtu;
 }
@@ -1424,8 +1440,10 @@ unsigned int tcp_current_mss(struct sock *sk)
 			mss_now = tcp_sync_mss(sk, mtu);
 	}
 
+	rcu_read_lock();
 	header_len = tcp_established_options(sk, NULL, &opts, &md5) +
 		     sizeof(struct tcphdr);
+	rcu_read_unlock();
 	/* The mss_cache is sized based on tp->tcp_header_len, which assumes
 	 * some common options. If this is an odd packet (because we have SACK
 	 * blocks etc) then our calculated header_len will be different, and
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-03-31 23:29 ` [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get Hannes Frederic Sowa
@ 2016-03-31 23:39   ` Eric Dumazet
  2016-04-01  0:01     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-03-31 23:39 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, 2016-04-01 at 01:29 +0200, Hannes Frederic Sowa wrote:
> Various fixes which were discovered by this changeset. More probably
> to come...


Really ?

Again, TCP stack locks the socket most of the time.

The fact that lockdep does not understand this is not a reason to add
this overhead.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH net 5/4] tcp: fix rcu usage around __sk_dst_get in tcp_update_metrics
  2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
                   ` (3 preceding siblings ...)
  2016-03-31 23:29 ` [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get Hannes Frederic Sowa
@ 2016-03-31 23:48 ` Hannes Frederic Sowa
  2016-03-31 23:56 ` [PATCH net 6/4] tcp: fix __sk_dst_get usage in tcp_current_mss Hannes Frederic Sowa
  5 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

Already another one fix I overlooked.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/tcp_metrics.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 33a36648423e8b..196de79902819a 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -367,7 +367,7 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
 void tcp_update_metrics(struct sock *sk)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	struct dst_entry *dst = __sk_dst_get(sk);
+	struct dst_entry *dst;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
 	struct tcp_metrics_block *tm;
@@ -375,13 +375,14 @@ void tcp_update_metrics(struct sock *sk)
 	u32 val;
 	int m;
 
+	rcu_read_lock();
+	dst = __sk_dst_get(sk);
 	if (sysctl_tcp_nometrics_save || !dst)
-		return;
+		goto out_unlock;
 
 	if (dst->flags & DST_HOST)
 		dst_confirm(dst);
 
-	rcu_read_lock();
 	if (icsk->icsk_backoff || !tp->srtt_us) {
 		/* This session failed to estimate rtt. Why?
 		 * Probably, no packets returned in time.  Reset our
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH net 6/4] tcp: fix __sk_dst_get usage in tcp_current_mss
  2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
                   ` (4 preceding siblings ...)
  2016-03-31 23:48 ` [PATCH net 5/4] tcp: fix rcu usage around __sk_dst_get in tcp_update_metrics Hannes Frederic Sowa
@ 2016-03-31 23:56 ` Hannes Frederic Sowa
  5 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-03-31 23:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/tcp_output.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ba3621834e7bfa..3f70582578ada0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1426,21 +1426,21 @@ EXPORT_SYMBOL(tcp_sync_mss);
 unsigned int tcp_current_mss(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
-	const struct dst_entry *dst = __sk_dst_get(sk);
+	const struct dst_entry *dst;
 	u32 mss_now;
 	unsigned int header_len;
 	struct tcp_out_options opts;
 	struct tcp_md5sig_key *md5;
 
+	rcu_read_lock();
 	mss_now = tp->mss_cache;
-
+	dst = __sk_dst_get(sk);
 	if (dst) {
 		u32 mtu = dst_mtu(dst);
 		if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
 			mss_now = tcp_sync_mss(sk, mtu);
 	}
 
-	rcu_read_lock();
 	header_len = tcp_established_options(sk, NULL, &opts, &md5) +
 		     sizeof(struct tcphdr);
 	rcu_read_unlock();
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-03-31 23:39   ` Eric Dumazet
@ 2016-04-01  0:01     ` Hannes Frederic Sowa
  2016-04-01  0:12       ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  0:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

Hi Eric,

On 01.04.2016 01:39, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 01:29 +0200, Hannes Frederic Sowa wrote:
>> Various fixes which were discovered by this changeset. More probably
>> to come...
>
>
> Really ?
>
> Again, TCP stack locks the socket most of the time.
>
> The fact that lockdep does not understand this is not a reason to add
> this overhead.

I don't see how lockdep does not understand this? I think I added the 
appropriate helper to exactly verify if we have the socket lock with 
lockdep, please have a look at lockdep_sock_is_held in patch #2.

Some of the patches also just reorder the rcu_read_lock, which is anyway 
mostly free.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  0:01     ` Hannes Frederic Sowa
@ 2016-04-01  0:12       ` Eric Dumazet
  2016-04-01  0:21         ` Hannes Frederic Sowa
  2016-04-01  0:30         ` Hannes Frederic Sowa
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  0:12 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, 2016-04-01 at 02:01 +0200, Hannes Frederic Sowa wrote:
> Hi Eric,
> 
> On 01.04.2016 01:39, Eric Dumazet wrote:
> > On Fri, 2016-04-01 at 01:29 +0200, Hannes Frederic Sowa wrote:
> >> Various fixes which were discovered by this changeset. More probably
> >> to come...
> >
> >
> > Really ?
> >
> > Again, TCP stack locks the socket most of the time.
> >
> > The fact that lockdep does not understand this is not a reason to add
> > this overhead.
> 
> I don't see how lockdep does not understand this? I think I added the 
> appropriate helper to exactly verify if we have the socket lock with 
> lockdep, please have a look at lockdep_sock_is_held in patch #2.
> 
> Some of the patches also just reorder the rcu_read_lock, which is anyway 
> mostly free.

I strongly disagree. Adding rcu_read_lock() everywhere as you did gives
a sense of 'security' by merely shutting down maybe good signals.

Your changelog explains nothing, and your patch makes absolutely no
sense to me.

Lets take following example :

 struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 {
-       struct dst_entry *dst = __sk_dst_get(sk);
+       struct dst_entry *dst;
 
+       rcu_read_lock();
+       dst = __sk_dst_get(sk);
        if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
                sk_tx_queue_clear(sk);
                RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
                dst_release(dst);
-               return NULL;
+               dst = NULL;
        }
+       rcu_read_unlock();
 
        return dst;
 }


Since this function does not take a reference on the dst, how could it
be safe ?

As soon as you exit the function, dst would possibly point to something
obsolete/freed.

This works because the caller owns the socket.

If you believe one path needs to be fixed, tell me which one it is.

Please ?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  0:12       ` Eric Dumazet
@ 2016-04-01  0:21         ` Hannes Frederic Sowa
  2016-04-01  1:19           ` Eric Dumazet
  2016-04-01  0:30         ` Hannes Frederic Sowa
  1 sibling, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  0:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On 01.04.2016 02:12, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 02:01 +0200, Hannes Frederic Sowa wrote:
>> Hi Eric,
>>
>> On 01.04.2016 01:39, Eric Dumazet wrote:
>>> On Fri, 2016-04-01 at 01:29 +0200, Hannes Frederic Sowa wrote:
>>>> Various fixes which were discovered by this changeset. More probably
>>>> to come...
>>>
>>>
>>> Really ?
>>>
>>> Again, TCP stack locks the socket most of the time.
>>>
>>> The fact that lockdep does not understand this is not a reason to add
>>> this overhead.
>>
>> I don't see how lockdep does not understand this? I think I added the
>> appropriate helper to exactly verify if we have the socket lock with
>> lockdep, please have a look at lockdep_sock_is_held in patch #2.
>>
>> Some of the patches also just reorder the rcu_read_lock, which is anyway
>> mostly free.
>
> I strongly disagree. Adding rcu_read_lock() everywhere as you did gives
> a sense of 'security' by merely shutting down maybe good signals.

I did those patches by adding the first patches and running tcp tests, 
so I have splats for every single one of those. I just didn't bother 
them into the changelog. I certainly can do so.

> Your changelog explains nothing, and your patch makes absolutely no
> sense to me.
>
> Lets take following example :
>
>   struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
>   {
> -       struct dst_entry *dst = __sk_dst_get(sk);
> +       struct dst_entry *dst;
>
> +       rcu_read_lock();
> +       dst = __sk_dst_get(sk);
>          if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
>                  sk_tx_queue_clear(sk);
>                  RCU_INIT_POINTER(sk->sk_dst_cache, NULL);
>                  dst_release(dst);
> -               return NULL;
> +               dst = NULL;
>          }
> +       rcu_read_unlock();
>
>          return dst;
>   }
>
>
> Since this function does not take a reference on the dst, how could it
> be safe ?

This is stupid by me, I am sorry, thanks for pointing this out. I will
fix this. Just looking too long at all those lockdep reports.

I will find another fix for this.

> As soon as you exit the function, dst would possibly point to something
> obsolete/freed.
>
> This works because the caller owns the socket.
>
> If you believe one path needs to be fixed, tell me which one it is.
>
> Please ?

[   31.064029] ===============================
[   31.064030] [ INFO: suspicious RCU usage. ]
[   31.064032] 4.5.0+ #13 Not tainted
[   31.064033] -------------------------------
[   31.064034] include/net/sock.h:1594 suspicious 
rcu_dereference_check() usage!
[   31.064035]
                other info that might help us debug this:

[   31.064041]
                rcu_scheduler_active = 1, debug_locks = 1
[   31.064042] no locks held by ssh/817.
[   31.064043]
                stack backtrace:
[   31.064045] CPU: 0 PID: 817 Comm: ssh Not tainted 4.5.0+ #13
[   31.064046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.8.2-20150714_191134- 04/01/2014
[   31.064047]  0000000000000286 000000006730b46b ffff8800badf7bd0 
ffffffff81442b33
[   31.064050]  ffff8800b8c78000 0000000000000001 ffff8800badf7c00 
ffffffff8110ae75
[   31.064052]  ffff880035ea2f00 ffff8800b8e28000 0000000000000003 
00000000000004c4
[   31.064054] Call Trace:
[   31.064058]  [<ffffffff81442b33>] dump_stack+0x85/0xc2
[   31.064062]  [<ffffffff8110ae75>] lockdep_rcu_suspicious+0xc5/0x100
[   31.064064]  [<ffffffff8173bf57>] __sk_dst_check+0x77/0xb0
[   31.064066]  [<ffffffff8182e502>] inet6_sk_rebuild_header+0x52/0x300
[   31.064068]  [<ffffffff813bb61e>] ? selinux_skb_peerlbl_sid+0x5e/0xa0
[   31.064070]  [<ffffffff813bb69e>] ? 
selinux_inet_conn_established+0x3e/0x40
[   31.064072]  [<ffffffff817c2bad>] tcp_finish_connect+0x4d/0x270
[   31.064074]  [<ffffffff817c33f7>] tcp_rcv_state_process+0x627/0xe40
[   31.064076]  [<ffffffff81866584>] tcp_v6_do_rcv+0xd4/0x410
[   31.064078]  [<ffffffff8173bc65>] release_sock+0x85/0x1c0
[   31.064079]  [<ffffffff817e9983>] __inet_stream_connect+0x1c3/0x340
[   31.064081]  [<ffffffff8173b089>] ? lock_sock_nested+0x49/0xb0
[   31.064083]  [<ffffffff81100270>] ? abort_exclusive_wait+0xb0/0xb0
[   31.064084]  [<ffffffff817e9b38>] inet_stream_connect+0x38/0x50
[   31.064086]  [<ffffffff8173794f>] SYSC_connect+0xcf/0xf0
[   31.064088]  [<ffffffff8110d069>] ? trace_hardirqs_on_caller+0x129/0x1b0
[   31.064090]  [<ffffffff8100301b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
[   31.064091]  [<ffffffff8173854e>] SyS_connect+0xe/0x10
[   31.064094]  [<ffffffff818a0e7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd

Bye,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  0:12       ` Eric Dumazet
  2016-04-01  0:21         ` Hannes Frederic Sowa
@ 2016-04-01  0:30         ` Hannes Frederic Sowa
  2016-04-01  1:23           ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  0:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On 01.04.2016 02:12, Eric Dumazet wrote:
> Since this function does not take a reference on the dst, how could it
> be safe ?
>
> As soon as you exit the function, dst would possibly point to something
> obsolete/freed.
>
> This works because the caller owns the socket.
>
> If you believe one path needs to be fixed, tell me which one it is.
>
> Please ?

I fixed this one, I wait with review a bit and collapse some of the 
newer fixes into one and check and repost again tomorrow.

Thanks for reviewing!

Bye,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  0:21         ` Hannes Frederic Sowa
@ 2016-04-01  1:19           ` Eric Dumazet
  2016-04-01  1:36             ` Hannes Frederic Sowa
  2016-04-01  1:45             ` Alexei Starovoitov
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  1:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, 2016-04-01 at 02:21 +0200, Hannes Frederic Sowa wrote:

> 
> [   31.064029] ===============================
> [   31.064030] [ INFO: suspicious RCU usage. ]
> [   31.064032] 4.5.0+ #13 Not tainted
> [   31.064033] -------------------------------
> [   31.064034] include/net/sock.h:1594 suspicious 
> rcu_dereference_check() usage!
> [   31.064035]
>                 other info that might help us debug this:
> 
> [   31.064041]
>                 rcu_scheduler_active = 1, debug_locks = 1
> [   31.064042] no locks held by ssh/817.
> [   31.064043]
>                 stack backtrace:
> [   31.064045] CPU: 0 PID: 817 Comm: ssh Not tainted 4.5.0+ #13
> [   31.064046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> BIOS 1.8.2-20150714_191134- 04/01/2014
> [   31.064047]  0000000000000286 000000006730b46b ffff8800badf7bd0 
> ffffffff81442b33
> [   31.064050]  ffff8800b8c78000 0000000000000001 ffff8800badf7c00 
> ffffffff8110ae75
> [   31.064052]  ffff880035ea2f00 ffff8800b8e28000 0000000000000003 
> 00000000000004c4
> [   31.064054] Call Trace:
> [   31.064058]  [<ffffffff81442b33>] dump_stack+0x85/0xc2
> [   31.064062]  [<ffffffff8110ae75>] lockdep_rcu_suspicious+0xc5/0x100
> [   31.064064]  [<ffffffff8173bf57>] __sk_dst_check+0x77/0xb0
> [   31.064066]  [<ffffffff8182e502>] inet6_sk_rebuild_header+0x52/0x300
> [   31.064068]  [<ffffffff813bb61e>] ? selinux_skb_peerlbl_sid+0x5e/0xa0
> [   31.064070]  [<ffffffff813bb69e>] ? 
> selinux_inet_conn_established+0x3e/0x40
> [   31.064072]  [<ffffffff817c2bad>] tcp_finish_connect+0x4d/0x270
> [   31.064074]  [<ffffffff817c33f7>] tcp_rcv_state_process+0x627/0xe40
> [   31.064076]  [<ffffffff81866584>] tcp_v6_do_rcv+0xd4/0x410
> [   31.064078]  [<ffffffff8173bc65>] release_sock+0x85/0x1c0
> [   31.064079]  [<ffffffff817e9983>] __inet_stream_connect+0x1c3/0x340
> [   31.064081]  [<ffffffff8173b089>] ? lock_sock_nested+0x49/0xb0
> [   31.064083]  [<ffffffff81100270>] ? abort_exclusive_wait+0xb0/0xb0
> [   31.064084]  [<ffffffff817e9b38>] inet_stream_connect+0x38/0x50
> [   31.064086]  [<ffffffff8173794f>] SYSC_connect+0xcf/0xf0
> [   31.064088]  [<ffffffff8110d069>] ? trace_hardirqs_on_caller+0x129/0x1b0
> [   31.064090]  [<ffffffff8100301b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
> [   31.064091]  [<ffffffff8173854e>] SyS_connect+0xe/0x10
> [   31.064094]  [<ffffffff818a0e7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
> 
> Bye,
> Hannes

Thanks.

As you can see, release_sock() messes badly lockdep (once your other
patches are in )

Once we properly fix release_sock() and/or __release_sock(), all these
false positives disappear.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  0:30         ` Hannes Frederic Sowa
@ 2016-04-01  1:23           ` Eric Dumazet
  2016-04-01  1:37             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  1:23 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, 2016-04-01 at 02:30 +0200, Hannes Frederic Sowa wrote:

> I fixed this one, I wait with review a bit and collapse some of the 
> newer fixes into one and check and repost again tomorrow.

No change should be needed in TCP, once net/core/sock.c is fixed.

When someone fixes a lockdep issue, it should be mandatory to include in
the changelog the lockdep splat, so that we can double check.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:19           ` Eric Dumazet
@ 2016-04-01  1:36             ` Hannes Frederic Sowa
  2016-04-01  1:39               ` Eric Dumazet
  2016-04-01  1:45             ` Alexei Starovoitov
  1 sibling, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  1:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, Apr 1, 2016, at 03:19, Eric Dumazet wrote:
> Thanks.
> 
> As you can see, release_sock() messes badly lockdep (once your other
> patches are in )
> 
> Once we properly fix release_sock() and/or __release_sock(), all these
> false positives disappear.

This was a loopback connection. I need to study release_sock and
__release_sock more as I cannot currently see an issue with the lockdep
handling.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:23           ` Eric Dumazet
@ 2016-04-01  1:37             ` Hannes Frederic Sowa
  0 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  1:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek



On Fri, Apr 1, 2016, at 03:23, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 02:30 +0200, Hannes Frederic Sowa wrote:
> 
> > I fixed this one, I wait with review a bit and collapse some of the 
> > newer fixes into one and check and repost again tomorrow.
> 
> No change should be needed in TCP, once net/core/sock.c is fixed.
> 
> When someone fixes a lockdep issue, it should be mandatory to include in
> the changelog the lockdep splat, so that we can double check.

Yup, will do that. Every change will be a single patch with lockdep
report.

Thank you,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:36             ` Hannes Frederic Sowa
@ 2016-04-01  1:39               ` Eric Dumazet
  2016-04-01  1:45                 ` Eric Dumazet
  2016-04-01  1:58                 ` Hannes Frederic Sowa
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  1:39 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, 2016-04-01 at 03:36 +0200, Hannes Frederic Sowa wrote:
> On Fri, Apr 1, 2016, at 03:19, Eric Dumazet wrote:
> > Thanks.
> > 
> > As you can see, release_sock() messes badly lockdep (once your other
> > patches are in )
> > 
> > Once we properly fix release_sock() and/or __release_sock(), all these
> > false positives disappear.
> 
> This was a loopback connection. I need to study release_sock and
> __release_sock more as I cannot currently see an issue with the lockdep
> handling.

Okay, please try :

diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9aedb230..570dcd91d64e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2429,10 +2429,6 @@ EXPORT_SYMBOL(lock_sock_nested);
 
 void release_sock(struct sock *sk)
 {
-	/*
-	 * The sk_lock has mutex_unlock() semantics:
-	 */
-	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
 
 	spin_lock_bh(&sk->sk_lock.slock);
 	if (sk->sk_backlog.tail)
@@ -2445,6 +2441,10 @@ void release_sock(struct sock *sk)
 		sk->sk_prot->release_cb(sk);
 
 	sock_release_ownership(sk);
+	/*
+	 * The sk_lock has mutex_unlock() semantics:
+	 */
+	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
 	if (waitqueue_active(&sk->sk_lock.wq))
 		wake_up(&sk->sk_lock.wq);
 	spin_unlock_bh(&sk->sk_lock.slock);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:19           ` Eric Dumazet
  2016-04-01  1:36             ` Hannes Frederic Sowa
@ 2016-04-01  1:45             ` Alexei Starovoitov
  2016-04-01  3:03               ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2016-04-01  1:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, davem, netdev, sasha.levin, daniel,
	mkubecek

On Thu, Mar 31, 2016 at 06:19:52PM -0700, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 02:21 +0200, Hannes Frederic Sowa wrote:
> 
> > 
> > [   31.064029] ===============================
> > [   31.064030] [ INFO: suspicious RCU usage. ]
> > [   31.064032] 4.5.0+ #13 Not tainted
> > [   31.064033] -------------------------------
> > [   31.064034] include/net/sock.h:1594 suspicious 
> > rcu_dereference_check() usage!
> > [   31.064035]
> >                 other info that might help us debug this:
> > 
> > [   31.064041]
> >                 rcu_scheduler_active = 1, debug_locks = 1
> > [   31.064042] no locks held by ssh/817.
> > [   31.064043]
> >                 stack backtrace:
> > [   31.064045] CPU: 0 PID: 817 Comm: ssh Not tainted 4.5.0+ #13
> > [   31.064046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > BIOS 1.8.2-20150714_191134- 04/01/2014
> > [   31.064047]  0000000000000286 000000006730b46b ffff8800badf7bd0 
> > ffffffff81442b33
> > [   31.064050]  ffff8800b8c78000 0000000000000001 ffff8800badf7c00 
> > ffffffff8110ae75
> > [   31.064052]  ffff880035ea2f00 ffff8800b8e28000 0000000000000003 
> > 00000000000004c4
> > [   31.064054] Call Trace:
> > [   31.064058]  [<ffffffff81442b33>] dump_stack+0x85/0xc2
> > [   31.064062]  [<ffffffff8110ae75>] lockdep_rcu_suspicious+0xc5/0x100
> > [   31.064064]  [<ffffffff8173bf57>] __sk_dst_check+0x77/0xb0
> > [   31.064066]  [<ffffffff8182e502>] inet6_sk_rebuild_header+0x52/0x300
> > [   31.064068]  [<ffffffff813bb61e>] ? selinux_skb_peerlbl_sid+0x5e/0xa0
> > [   31.064070]  [<ffffffff813bb69e>] ? 
> > selinux_inet_conn_established+0x3e/0x40
> > [   31.064072]  [<ffffffff817c2bad>] tcp_finish_connect+0x4d/0x270
> > [   31.064074]  [<ffffffff817c33f7>] tcp_rcv_state_process+0x627/0xe40
> > [   31.064076]  [<ffffffff81866584>] tcp_v6_do_rcv+0xd4/0x410
> > [   31.064078]  [<ffffffff8173bc65>] release_sock+0x85/0x1c0
> > [   31.064079]  [<ffffffff817e9983>] __inet_stream_connect+0x1c3/0x340
> > [   31.064081]  [<ffffffff8173b089>] ? lock_sock_nested+0x49/0xb0
> > [   31.064083]  [<ffffffff81100270>] ? abort_exclusive_wait+0xb0/0xb0
> > [   31.064084]  [<ffffffff817e9b38>] inet_stream_connect+0x38/0x50
> > [   31.064086]  [<ffffffff8173794f>] SYSC_connect+0xcf/0xf0
> > [   31.064088]  [<ffffffff8110d069>] ? trace_hardirqs_on_caller+0x129/0x1b0
> > [   31.064090]  [<ffffffff8100301b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
> > [   31.064091]  [<ffffffff8173854e>] SyS_connect+0xe/0x10
> > [   31.064094]  [<ffffffff818a0e7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
> > 
> > Bye,
> > Hannes
> 
> Thanks.
> 
> As you can see, release_sock() messes badly lockdep (once your other
> patches are in )
> 
> Once we properly fix release_sock() and/or __release_sock(), all these
> false positives disappear.

+1. Nice catch.

Eric, what's your take on Hannes's patch 2 ?
Is it more accurate to ask lockdep to check for actual lock
or lockdep can rely on owned flag?
Potentially there could be races between setting the flag and
actual lock... but that code is contained, so unlikely.
Will we find the real issues with this 'stronger' check or
just spend a ton of time adapting to new model like your other
patch for release_sock and whatever may need to come next...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:39               ` Eric Dumazet
@ 2016-04-01  1:45                 ` Eric Dumazet
  2016-04-01  2:01                   ` Hannes Frederic Sowa
  2016-04-01  1:58                 ` Hannes Frederic Sowa
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  1:45 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Thu, 2016-03-31 at 18:39 -0700, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 03:36 +0200, Hannes Frederic Sowa wrote:
> > On Fri, Apr 1, 2016, at 03:19, Eric Dumazet wrote:
> > > Thanks.
> > > 
> > > As you can see, release_sock() messes badly lockdep (once your other
> > > patches are in )
> > > 
> > > Once we properly fix release_sock() and/or __release_sock(), all these
> > > false positives disappear.
> > 
> > This was a loopback connection. I need to study release_sock and
> > __release_sock more as I cannot currently see an issue with the lockdep
> > handling.
> 
> Okay, please try :
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b67b9aedb230..570dcd91d64e 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2429,10 +2429,6 @@ EXPORT_SYMBOL(lock_sock_nested);
>  
>  void release_sock(struct sock *sk)
>  {
> -	/*
> -	 * The sk_lock has mutex_unlock() semantics:
> -	 */
> -	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
>  
>  	spin_lock_bh(&sk->sk_lock.slock);
>  	if (sk->sk_backlog.tail)
> @@ -2445,6 +2441,10 @@ void release_sock(struct sock *sk)
>  		sk->sk_prot->release_cb(sk);
>  
>  	sock_release_ownership(sk);
> +	/*
> +	 * The sk_lock has mutex_unlock() semantics:
> +	 */
> +	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
>  	if (waitqueue_active(&sk->sk_lock.wq))
>  		wake_up(&sk->sk_lock.wq);
>  	spin_unlock_bh(&sk->sk_lock.slock);

Also take a look at commit c3f9b01849ef3bc69024990092b9f42e20df7797

We might need to include the mutex_release() in sock_release_ownership()

Thanks !

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:39               ` Eric Dumazet
  2016-04-01  1:45                 ` Eric Dumazet
@ 2016-04-01  1:58                 ` Hannes Frederic Sowa
  1 sibling, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  1:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On 01.04.2016 03:39, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 03:36 +0200, Hannes Frederic Sowa wrote:
>> On Fri, Apr 1, 2016, at 03:19, Eric Dumazet wrote:
>>> Thanks.
>>>
>>> As you can see, release_sock() messes badly lockdep (once your other
>>> patches are in )
>>>
>>> Once we properly fix release_sock() and/or __release_sock(), all these
>>> false positives disappear.
>>
>> This was a loopback connection. I need to study release_sock and
>> __release_sock more as I cannot currently see an issue with the lockdep
>> handling.
>
> Okay, please try :
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b67b9aedb230..570dcd91d64e 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2429,10 +2429,6 @@ EXPORT_SYMBOL(lock_sock_nested);
>
>   void release_sock(struct sock *sk)
>   {
> -	/*
> -	 * The sk_lock has mutex_unlock() semantics:
> -	 */
> -	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
>
>   	spin_lock_bh(&sk->sk_lock.slock);
>   	if (sk->sk_backlog.tail)
> @@ -2445,6 +2441,10 @@ void release_sock(struct sock *sk)
>   		sk->sk_prot->release_cb(sk);
>
>   	sock_release_ownership(sk);
> +	/*
> +	 * The sk_lock has mutex_unlock() semantics:
> +	 */
> +	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
>   	if (waitqueue_active(&sk->sk_lock.wq))
>   		wake_up(&sk->sk_lock.wq);
>   	spin_unlock_bh(&sk->sk_lock.slock);


Looks much better with your patch already. I slowly begin to understand, 
this is really tricky... :)

Bye,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:45                 ` Eric Dumazet
@ 2016-04-01  2:01                   ` Hannes Frederic Sowa
  2016-04-01  3:13                     ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  2:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On 01.04.2016 03:45, Eric Dumazet wrote:
> On Thu, 2016-03-31 at 18:39 -0700, Eric Dumazet wrote:
>> On Fri, 2016-04-01 at 03:36 +0200, Hannes Frederic Sowa wrote:
>>> On Fri, Apr 1, 2016, at 03:19, Eric Dumazet wrote:
>>>> Thanks.
>>>>
>>>> As you can see, release_sock() messes badly lockdep (once your other
>>>> patches are in )
>>>>
>>>> Once we properly fix release_sock() and/or __release_sock(), all these
>>>> false positives disappear.
>>>
>>> This was a loopback connection. I need to study release_sock and
>>> __release_sock more as I cannot currently see an issue with the lockdep
>>> handling.
>>
>> Okay, please try :
>>
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index b67b9aedb230..570dcd91d64e 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2429,10 +2429,6 @@ EXPORT_SYMBOL(lock_sock_nested);
>>
>>   void release_sock(struct sock *sk)
>>   {
>> -	/*
>> -	 * The sk_lock has mutex_unlock() semantics:
>> -	 */
>> -	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
>>
>>   	spin_lock_bh(&sk->sk_lock.slock);
>>   	if (sk->sk_backlog.tail)
>> @@ -2445,6 +2441,10 @@ void release_sock(struct sock *sk)
>>   		sk->sk_prot->release_cb(sk);
>>
>>   	sock_release_ownership(sk);
>> +	/*
>> +	 * The sk_lock has mutex_unlock() semantics:
>> +	 */
>> +	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
>>   	if (waitqueue_active(&sk->sk_lock.wq))
>>   		wake_up(&sk->sk_lock.wq);
>>   	spin_unlock_bh(&sk->sk_lock.slock);
>
> Also take a look at commit c3f9b01849ef3bc69024990092b9f42e20df7797
>
> We might need to include the mutex_release() in sock_release_ownership()

I thought so first, as well. But given the double check for the 
spin_lock and the "mutex" we end up with the same result for the 
lockdep_sock_is_held check.

Do you see other consequences?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  1:45             ` Alexei Starovoitov
@ 2016-04-01  3:03               ` Eric Dumazet
  2016-04-01  3:06                 ` Hannes Frederic Sowa
  2016-04-01  4:04                 ` Alexei Starovoitov
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  3:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, davem, netdev, sasha.levin, daniel,
	mkubecek

On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:

> Eric, what's your take on Hannes's patch 2 ?
> Is it more accurate to ask lockdep to check for actual lock
> or lockdep can rely on owned flag?
> Potentially there could be races between setting the flag and
> actual lock... but that code is contained, so unlikely.
> Will we find the real issues with this 'stronger' check or
> just spend a ton of time adapting to new model like your other
> patch for release_sock and whatever may need to come next...

More precise lockdep checks are certainly good, I only objected to 4/4
trying to work around another bug.

But why do we rush for 'net' tree ?

This looks net-next material to me.

Locking changes are often subtle, lets take the time to do them
properly.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  3:03               ` Eric Dumazet
@ 2016-04-01  3:06                 ` Hannes Frederic Sowa
  2016-04-01  4:04                 ` Alexei Starovoitov
  1 sibling, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  3:06 UTC (permalink / raw)
  To: Eric Dumazet, Alexei Starovoitov
  Cc: davem, netdev, sasha.levin, daniel, mkubecek

On Fri, Apr 1, 2016, at 05:03, Eric Dumazet wrote:
> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
> 
> > Eric, what's your take on Hannes's patch 2 ?
> > Is it more accurate to ask lockdep to check for actual lock
> > or lockdep can rely on owned flag?
> > Potentially there could be races between setting the flag and
> > actual lock... but that code is contained, so unlikely.
> > Will we find the real issues with this 'stronger' check or
> > just spend a ton of time adapting to new model like your other
> > patch for release_sock and whatever may need to come next...
> 
> More precise lockdep checks are certainly good, I only objected to 4/4
> trying to work around another bug.
> 
> But why do we rush for 'net' tree ?
> 
> This looks net-next material to me.
> 
> Locking changes are often subtle, lets take the time to do them
> properly.

I certainly can see my mistake now trying to paper over the splats. Do
you object if I send the first patches to fix up the reported lockdep?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  2:01                   ` Hannes Frederic Sowa
@ 2016-04-01  3:13                     ` Eric Dumazet
  2016-04-01  3:31                       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-04-01  3:13 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek

On Fri, 2016-04-01 at 04:01 +0200, Hannes Frederic Sowa wrote:

> I thought so first, as well. But given the double check for the 
> spin_lock and the "mutex" we end up with the same result for the 
> lockdep_sock_is_held check.
> 
> Do you see other consequences?

Well, we release the spinlock in __release_sock()

So another thread could come and acquire the socket, then call
mutex_acquire() while the first thread did not call yet mutex_release()

So maybe lockdep will complain (but I do not know lockdep enough to
tell)

So maybe the following would be better :

(Absolutely untested, really I need to take a break)

diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e03727b..7d5dfa7e1918 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1327,7 +1327,13 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 
 static inline void sock_release_ownership(struct sock *sk)
 {
-	sk->sk_lock.owned = 0;
+	if (sk->sk_lock.owned) {
+		/*
+		 * The sk_lock has mutex_unlock() semantics:
+		 */
+		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
+		sk->sk_lock.owned = 0;
+	}
 }
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9aedb230..c7ab98e72346 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2429,10 +2429,6 @@ EXPORT_SYMBOL(lock_sock_nested);
 
 void release_sock(struct sock *sk)
 {
-	/*
-	 * The sk_lock has mutex_unlock() semantics:
-	 */
-	mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
 
 	spin_lock_bh(&sk->sk_lock.slock);
 	if (sk->sk_backlog.tail)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  3:13                     ` Eric Dumazet
@ 2016-04-01  3:31                       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  3:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, alexei.starovoitov, mkubecek



On Fri, Apr 1, 2016, at 05:13, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 04:01 +0200, Hannes Frederic Sowa wrote:
> 
> > I thought so first, as well. But given the double check for the 
> > spin_lock and the "mutex" we end up with the same result for the 
> > lockdep_sock_is_held check.
> > 
> > Do you see other consequences?
> 
> Well, we release the spinlock in __release_sock()
> 
> So another thread could come and acquire the socket, then call
> mutex_acquire() while the first thread did not call yet mutex_release()
> 
> So maybe lockdep will complain (but I do not know lockdep enough to
> tell)
> 
> So maybe the following would be better :
> 
> (Absolutely untested, really I need to take a break)

I quickly tested the patch and my scripts didn't show any splats so far.
This patch seems more consistent albeit I don't think it is relevant for
lockdep_sock_is_held as we only flip owned while holding slock. But this
definitely needs more review.

Thanks a lot!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  3:03               ` Eric Dumazet
  2016-04-01  3:06                 ` Hannes Frederic Sowa
@ 2016-04-01  4:04                 ` Alexei Starovoitov
  2016-04-01  4:12                   ` Hannes Frederic Sowa
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2016-04-01  4:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, davem, netdev, sasha.levin, daniel,
	mkubecek

On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
> 
> > Eric, what's your take on Hannes's patch 2 ?
> > Is it more accurate to ask lockdep to check for actual lock
> > or lockdep can rely on owned flag?
> > Potentially there could be races between setting the flag and
> > actual lock... but that code is contained, so unlikely.
> > Will we find the real issues with this 'stronger' check or
> > just spend a ton of time adapting to new model like your other
> > patch for release_sock and whatever may need to come next...
> 
> More precise lockdep checks are certainly good, I only objected to 4/4
> trying to work around another bug.
> 
> But why do we rush for 'net' tree ?
> 
> This looks net-next material to me.
> 
> Locking changes are often subtle, lets take the time to do them
> properly.

completely agree. I think only first patch belongs in net.
Everything else is net-next material.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  4:04                 ` Alexei Starovoitov
@ 2016-04-01  4:12                   ` Hannes Frederic Sowa
  2016-04-01  4:26                     ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  4:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Dumazet
  Cc: davem, netdev, sasha.levin, daniel, mkubecek

On 01.04.2016 06:04, Alexei Starovoitov wrote:
> On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
>> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
>>
>>> Eric, what's your take on Hannes's patch 2 ?
>>> Is it more accurate to ask lockdep to check for actual lock
>>> or lockdep can rely on owned flag?
>>> Potentially there could be races between setting the flag and
>>> actual lock... but that code is contained, so unlikely.
>>> Will we find the real issues with this 'stronger' check or
>>> just spend a ton of time adapting to new model like your other
>>> patch for release_sock and whatever may need to come next...
>>
>> More precise lockdep checks are certainly good, I only objected to 4/4
>> trying to work around another bug.
>>
>> But why do we rush for 'net' tree ?
>>
>> This looks net-next material to me.
>>
>> Locking changes are often subtle, lets take the time to do them
>> properly.
>
> completely agree. I think only first patch belongs in net.
> Everything else is net-next material.

Problem with first patch is that it uses lock_sock_fast, thus the 
current sock_owned_by_user check doesn't get rid the lockdep warning. :/

Thus we would need to go with the two first patches. Do you think it is 
acceptable? I actually didn't see a problem and testing showed no 
problems so far.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  4:12                   ` Hannes Frederic Sowa
@ 2016-04-01  4:26                     ` Alexei Starovoitov
  2016-04-01  4:33                       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2016-04-01  4:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, davem, netdev, sasha.levin, daniel, mkubecek

On Fri, Apr 01, 2016 at 06:12:49AM +0200, Hannes Frederic Sowa wrote:
> On 01.04.2016 06:04, Alexei Starovoitov wrote:
> >On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
> >>On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
> >>
> >>>Eric, what's your take on Hannes's patch 2 ?
> >>>Is it more accurate to ask lockdep to check for actual lock
> >>>or lockdep can rely on owned flag?
> >>>Potentially there could be races between setting the flag and
> >>>actual lock... but that code is contained, so unlikely.
> >>>Will we find the real issues with this 'stronger' check or
> >>>just spend a ton of time adapting to new model like your other
> >>>patch for release_sock and whatever may need to come next...
> >>
> >>More precise lockdep checks are certainly good, I only objected to 4/4
> >>trying to work around another bug.
> >>
> >>But why do we rush for 'net' tree ?
> >>
> >>This looks net-next material to me.
> >>
> >>Locking changes are often subtle, lets take the time to do them
> >>properly.
> >
> >completely agree. I think only first patch belongs in net.
> >Everything else is net-next material.
> 
> Problem with first patch is that it uses lock_sock_fast, thus the current
> sock_owned_by_user check doesn't get rid the lockdep warning. :/
> 
> Thus we would need to go with the two first patches. Do you think it is
> acceptable? I actually didn't see a problem and testing showed no problems
> so far.

I see. right. the patch 1 only makes sense when coupled with 2.
but now I'm not so sure that lockdep_is_held(&sk->sk_lock.slock)
is a valid check, since current sock_owned_by_user() is equivalent
to lockdep_is_held(&sk->sk_lock) only.
I would go with Daniel's approach. Much simpler to reason about.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  4:26                     ` Alexei Starovoitov
@ 2016-04-01  4:33                       ` Hannes Frederic Sowa
  2016-04-01  8:10                         ` Daniel Borkmann
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-01  4:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, davem, netdev, sasha.levin, daniel, mkubecek

On 01.04.2016 06:26, Alexei Starovoitov wrote:
> On Fri, Apr 01, 2016 at 06:12:49AM +0200, Hannes Frederic Sowa wrote:
>> On 01.04.2016 06:04, Alexei Starovoitov wrote:
>>> On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
>>>> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
>>>>
>>>>> Eric, what's your take on Hannes's patch 2 ?
>>>>> Is it more accurate to ask lockdep to check for actual lock
>>>>> or lockdep can rely on owned flag?
>>>>> Potentially there could be races between setting the flag and
>>>>> actual lock... but that code is contained, so unlikely.
>>>>> Will we find the real issues with this 'stronger' check or
>>>>> just spend a ton of time adapting to new model like your other
>>>>> patch for release_sock and whatever may need to come next...
>>>>
>>>> More precise lockdep checks are certainly good, I only objected to 4/4
>>>> trying to work around another bug.
>>>>
>>>> But why do we rush for 'net' tree ?
>>>>
>>>> This looks net-next material to me.
>>>>
>>>> Locking changes are often subtle, lets take the time to do them
>>>> properly.
>>>
>>> completely agree. I think only first patch belongs in net.
>>> Everything else is net-next material.
>>
>> Problem with first patch is that it uses lock_sock_fast, thus the current
>> sock_owned_by_user check doesn't get rid the lockdep warning. :/
>>
>> Thus we would need to go with the two first patches. Do you think it is
>> acceptable? I actually didn't see a problem and testing showed no problems
>> so far.
>
> I see. right. the patch 1 only makes sense when coupled with 2.
> but now I'm not so sure that lockdep_is_held(&sk->sk_lock.slock)
> is a valid check, since current sock_owned_by_user() is equivalent
> to lockdep_is_held(&sk->sk_lock) only.
> I would go with Daniel's approach. Much simpler to reason about.

IMHO we should treat sk_lock and sk_lock.slock the same as they are 
encapsulated by socket lock api.

I was rather afraid that we call those changed functions from within 
release_sock and thus would have the same problem again, where we get 
splats because of the time where we actually have user ownership but not 
the mark in the lockdep data structures. But this seems not to be the 
case as the functions are only directly called on behalf of user space.

Daniel, what do you think? I would be fine with your patch for net and 
we clean this up a bit in net-next then.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  4:33                       ` Hannes Frederic Sowa
@ 2016-04-01  8:10                         ` Daniel Borkmann
  2016-04-01 18:33                           ` David Miller
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel Borkmann @ 2016-04-01  8:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Eric Dumazet, davem, netdev, sasha.levin, mkubecek

On 04/01/2016 06:33 AM, Hannes Frederic Sowa wrote:
> On 01.04.2016 06:26, Alexei Starovoitov wrote:
>> On Fri, Apr 01, 2016 at 06:12:49AM +0200, Hannes Frederic Sowa wrote:
>>> On 01.04.2016 06:04, Alexei Starovoitov wrote:
>>>> On Thu, Mar 31, 2016 at 08:03:38PM -0700, Eric Dumazet wrote:
>>>>> On Thu, 2016-03-31 at 18:45 -0700, Alexei Starovoitov wrote:
>>>>>
>>>>>> Eric, what's your take on Hannes's patch 2 ?
>>>>>> Is it more accurate to ask lockdep to check for actual lock
>>>>>> or lockdep can rely on owned flag?
>>>>>> Potentially there could be races between setting the flag and
>>>>>> actual lock... but that code is contained, so unlikely.
>>>>>> Will we find the real issues with this 'stronger' check or
>>>>>> just spend a ton of time adapting to new model like your other
>>>>>> patch for release_sock and whatever may need to come next...
>>>>>
>>>>> More precise lockdep checks are certainly good, I only objected to 4/4
>>>>> trying to work around another bug.
>>>>>
>>>>> But why do we rush for 'net' tree ?
>>>>>
>>>>> This looks net-next material to me.
>>>>>
>>>>> Locking changes are often subtle, lets take the time to do them
>>>>> properly.
>>>>
>>>> completely agree. I think only first patch belongs in net.
>>>> Everything else is net-next material.
>>>
>>> Problem with first patch is that it uses lock_sock_fast, thus the current
>>> sock_owned_by_user check doesn't get rid the lockdep warning. :/
>>>
>>> Thus we would need to go with the two first patches. Do you think it is
>>> acceptable? I actually didn't see a problem and testing showed no problems
>>> so far.
>>
>> I see. right. the patch 1 only makes sense when coupled with 2.
>> but now I'm not so sure that lockdep_is_held(&sk->sk_lock.slock)
>> is a valid check, since current sock_owned_by_user() is equivalent
>> to lockdep_is_held(&sk->sk_lock) only.
>> I would go with Daniel's approach. Much simpler to reason about.
>
> IMHO we should treat sk_lock and sk_lock.slock the same as they are encapsulated by socket lock api.
>
> I was rather afraid that we call those changed functions from within release_sock and thus would have the same problem again, where we get splats because of the time where we actually have user ownership but not the mark in the lockdep data structures. But this seems not to be the case as the functions are only directly called on behalf of user space.
>
> Daniel, what do you think? I would be fine with your patch for net and we clean this up a bit in net-next then.

Okay, that's fine by me.

Dave, do you need me to resubmit this one w/o changes: http://patchwork.ozlabs.org/patch/603903/ ?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01  8:10                         ` Daniel Borkmann
@ 2016-04-01 18:33                           ` David Miller
  2016-04-01 18:36                             ` Daniel Borkmann
  0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2016-04-01 18:33 UTC (permalink / raw)
  To: daniel
  Cc: hannes, alexei.starovoitov, eric.dumazet, netdev, sasha.levin,
	mkubecek

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Fri, 01 Apr 2016 10:10:11 +0200

> Dave, do you need me to resubmit this one w/o changes:
> http://patchwork.ozlabs.org/patch/603903/ ?

I'll apply this and queue it up for -stable, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get
  2016-04-01 18:33                           ` David Miller
@ 2016-04-01 18:36                             ` Daniel Borkmann
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Borkmann @ 2016-04-01 18:36 UTC (permalink / raw)
  To: David Miller
  Cc: hannes, alexei.starovoitov, eric.dumazet, netdev, sasha.levin,
	mkubecek

On 04/01/2016 08:33 PM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Fri, 01 Apr 2016 10:10:11 +0200
>
>> Dave, do you need me to resubmit this one w/o changes:
>> http://patchwork.ozlabs.org/patch/603903/ ?
>
> I'll apply this and queue it up for -stable, thanks.

Ok, thanks!

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-04-01 18:37 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-31 23:29 [PATCH net 0/4] net: fix and tighten rcu dereference checks Hannes Frederic Sowa
2016-03-31 23:29 ` [PATCH net 1/4] tun: add socket locking around sk_{attach,detach}_filter Hannes Frederic Sowa
2016-03-31 23:29 ` [PATCH net 2/4] net: proper check if we hold the socket lock during dereference Hannes Frederic Sowa
2016-03-31 23:29 ` [PATCH net 3/4] sock: use lockdep_sock_is_held were appropriate Hannes Frederic Sowa
2016-03-31 23:29 ` [PATCH net 4/4] tcp: various missing rcu_read_lock around __sk_dst_get Hannes Frederic Sowa
2016-03-31 23:39   ` Eric Dumazet
2016-04-01  0:01     ` Hannes Frederic Sowa
2016-04-01  0:12       ` Eric Dumazet
2016-04-01  0:21         ` Hannes Frederic Sowa
2016-04-01  1:19           ` Eric Dumazet
2016-04-01  1:36             ` Hannes Frederic Sowa
2016-04-01  1:39               ` Eric Dumazet
2016-04-01  1:45                 ` Eric Dumazet
2016-04-01  2:01                   ` Hannes Frederic Sowa
2016-04-01  3:13                     ` Eric Dumazet
2016-04-01  3:31                       ` Hannes Frederic Sowa
2016-04-01  1:58                 ` Hannes Frederic Sowa
2016-04-01  1:45             ` Alexei Starovoitov
2016-04-01  3:03               ` Eric Dumazet
2016-04-01  3:06                 ` Hannes Frederic Sowa
2016-04-01  4:04                 ` Alexei Starovoitov
2016-04-01  4:12                   ` Hannes Frederic Sowa
2016-04-01  4:26                     ` Alexei Starovoitov
2016-04-01  4:33                       ` Hannes Frederic Sowa
2016-04-01  8:10                         ` Daniel Borkmann
2016-04-01 18:33                           ` David Miller
2016-04-01 18:36                             ` Daniel Borkmann
2016-04-01  0:30         ` Hannes Frederic Sowa
2016-04-01  1:23           ` Eric Dumazet
2016-04-01  1:37             ` Hannes Frederic Sowa
2016-03-31 23:48 ` [PATCH net 5/4] tcp: fix rcu usage around __sk_dst_get in tcp_update_metrics Hannes Frederic Sowa
2016-03-31 23:56 ` [PATCH net 6/4] tcp: fix __sk_dst_get usage in tcp_current_mss Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).