[added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind()

stable.vger.kernel.org archive mirror

* [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind()
@ 2016-03-02 20:23 Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] net: dp83640: Fix tx timestamp overflow handling Sasha Levin
                   ` (32 more replies)
  0 siblings, 33 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits; +Cc: Ursula Braun, David S. Miller, Sasha Levin

From: Ursula Braun <ursula.braun@de.ibm.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 52a82e23b9f2a9e1d429c5207f8575784290d008 ]

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Evgeny Cherkashin <Eugene.Crosser@ru.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/iucv/af_iucv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 6daa52a..123f6f9 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -709,6 +709,9 @@ static int iucv_sock_bind(struct socket *sock, struct sockaddr *addr,
 	if (!addr || addr->sa_family != AF_IUCV)
 		return -EINVAL;
 
+	if (addr_len < sizeof(struct sockaddr_iucv))
+		return -EINVAL;
+
 	lock_sock(sk);
 	if (sk->sk_state != IUCV_OPEN) {
 		err = -EBADFD;
-- 
2.5.0
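
For readers who want to see why the length check matters, here is a minimal user-space sketch of the same pattern. `struct demo_sockaddr`, `DEMO_FAMILY`, and `demo_iucv_bind()` are hypothetical names for illustration only and do not reproduce the kernel's `struct sockaddr_iucv`. Unlike the patch, which tests the family first, the sketch tests the length before touching any field because it reads from a raw buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_FAMILY 32 /* hypothetical address family value */

/* Hypothetical address layout; not the real struct sockaddr_iucv. */
struct demo_sockaddr {
	unsigned short family;
	char user_id[8];
	char name[8];
};

/*
 * Copy the name out of a caller-supplied address buffer.
 * Returns 0 on success, or -EINVAL if the buffer is missing, shorter
 * than a full struct demo_sockaddr, or carries the wrong family.
 */
int demo_iucv_bind(const void *addr, size_t addr_len, char out_name[8])
{
	struct demo_sockaddr sa;

	/* Reject short buffers before reading any field from them. */
	if (!addr || addr_len < sizeof(sa))
		return -EINVAL;

	memcpy(&sa, addr, sizeof(sa));
	if (sa.family != DEMO_FAMILY)
		return -EINVAL;

	memcpy(out_name, sa.name, sizeof(sa.name));
	return 0;
}
```

Without the length test, a caller passing only the two-byte family field would cause the function to read 16 bytes that it never supplied.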



* [added to the 4.1 stable tree] net: dp83640: Fix tx timestamp overflow handling.
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] tcp: fix NULL deref in tcp_v4_send_ack() Sasha Levin
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Manfred Rudigier, David S. Miller, Sasha Levin

From: Manfred Rudigier <Manfred.Rudigier@omicron.at>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 81e8f2e930fe76b9814c71b9d87c30760b5eb705 ]

PHY status frames are not reliable: the PHY may not be able to send them
during heavy receive traffic. This overflow condition is signaled by the
PHY in the next status frame, but the driver did not make use of it.
Instead, it always reported wrong tx timestamps to user space after an
overflow happened because it assigned newly received tx timestamps to old
packets in the queue.

This commit fixes this issue by clearing the tx timestamp queue every time
an overflow happens, so that no timestamps are delivered for overflow
packets. This way time stamping will continue correctly after an overflow.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/phy/dp83640.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 00cb41e..c56cf0b 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -833,6 +833,11 @@ static void decode_rxts(struct dp83640_private *dp83640,
 	struct skb_shared_hwtstamps *shhwtstamps = NULL;
 	struct sk_buff *skb;
 	unsigned long flags;
+	u8 overflow;
+
+	overflow = (phy_rxts->ns_hi >> 14) & 0x3;
+	if (overflow)
+		pr_debug("rx timestamp queue overflow, count %d\n", overflow);
 
 	spin_lock_irqsave(&dp83640->rx_lock, flags);
 
@@ -875,6 +880,7 @@ static void decode_txts(struct dp83640_private *dp83640,
 	struct skb_shared_hwtstamps shhwtstamps;
 	struct sk_buff *skb;
 	u64 ns;
+	u8 overflow;
 
 	/* We must already have the skb that triggered this. */
 
@@ -884,6 +890,17 @@ static void decode_txts(struct dp83640_private *dp83640,
 		pr_debug("have timestamp but tx_queue empty\n");
 		return;
 	}
+
+	overflow = (phy_txts->ns_hi >> 14) & 0x3;
+	if (overflow) {
+		pr_debug("tx timestamp queue overflow, count %d\n", overflow);
+		while (skb) {
+			skb_complete_tx_timestamp(skb, NULL);
+			skb = skb_dequeue(&dp83640->tx_queue);
+		}
+		return;
+	}
+
 	ns = phy2txts(phy_txts);
 	memset(&shhwtstamps, 0, sizeof(shhwtstamps));
 	shhwtstamps.hwtstamp = ns_to_ktime(ns);
-- 
2.5.0
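
The overflow handling in the patch above can be sketched outside the kernel as follows. The bit extraction `(ns_hi >> 14) & 0x3` is taken from the diff; everything else (`struct demo_queue`, `demo_dequeue()`, `demo_handle_txts()`, and the use of non-negative integer ids in place of skbs) is a hypothetical stand-in, so treat this as an illustration of the idea rather than driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_QLEN 8

/* Bits 15:14 of the ns_hi word carry the overflow count, as in the patch. */
static inline unsigned int demo_overflow_count(uint16_t ns_hi)
{
	return (ns_hi >> 14) & 0x3;
}

/* Hypothetical fixed-size FIFO of pending packet ids (all ids >= 0). */
struct demo_queue {
	int ids[DEMO_QLEN];
	size_t head;
	size_t len;
};

/* Remove and return the oldest id, or -1 if the queue is empty. */
int demo_dequeue(struct demo_queue *q)
{
	int id;

	if (q->len == 0)
		return -1;
	id = q->ids[q->head];
	q->head = (q->head + 1) % DEMO_QLEN;
	q->len--;
	return id;
}

/*
 * Handle one tx timestamp status word.
 * Returns the id of the packet that receives the timestamp, or -1 if
 * no packet is stamped: either the queue was empty, or an overflow was
 * signaled and every pending packet was dropped unstamped.
 */
int demo_handle_txts(struct demo_queue *q, uint16_t ns_hi)
{
	int id = demo_dequeue(q);

	if (id < 0)
		return -1;

	if (demo_overflow_count(ns_hi)) {
		/* Flush the queue so stale packets never get a new timestamp. */
		while (id >= 0)
			id = demo_dequeue(q);
		return -1;
	}
	return id;
}
```

Flushing on overflow trades a few missing timestamps for the guarantee that later timestamps are matched to the right packets.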



* [added to the 4.1 stable tree] tcp: fix NULL deref in tcp_v4_send_ack()
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] net: dp83640: Fix tx timestamp overflow handling Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] af_unix: fix struct pid memory leak Sasha Levin
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Eric Dumazet, Jerry Chu, Yuchung Cheng, David S. Miller,
	Sasha Levin

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit e62a123b8ef7c5dc4db2c16383d506860ad21b47 ]

Neal reported crashes with this stack trace:

 RIP: 0010:[<ffffffff8c57231b>] tcp_v4_send_ack+0x41/0x20f
...
 CR2: 0000000000000018 CR3: 000000044005c000 CR4: 00000000001427e0
...
  [<ffffffff8c57258e>] tcp_v4_reqsk_send_ack+0xa5/0xb4
  [<ffffffff8c1a7caa>] tcp_check_req+0x2ea/0x3e0
  [<ffffffff8c19e420>] tcp_rcv_state_process+0x850/0x2500
  [<ffffffff8c1a6d21>] tcp_v4_do_rcv+0x141/0x330
  [<ffffffff8c56cdb2>] sk_backlog_rcv+0x21/0x30
  [<ffffffff8c098bbd>] tcp_recvmsg+0x75d/0xf90
  [<ffffffff8c0a8700>] inet_recvmsg+0x80/0xa0
  [<ffffffff8c17623e>] sock_aio_read+0xee/0x110
  [<ffffffff8c066fcf>] do_sync_read+0x6f/0xa0
  [<ffffffff8c0673a1>] SyS_read+0x1e1/0x290
  [<ffffffff8c5ca262>] system_call_fastpath+0x16/0x1b

The problem here is that the skb we provide to tcp_v4_send_ack() had to
be parked in the backlog of a new TCP fastopen child because this child
was owned by the user at the time an out-of-window packet arrived.

Before queuing a packet, TCP has to set skb->dev to NULL, as the device
could disappear before the packet is removed from the queue.

Fix this issue by using the net pointer provided by the socket (being a
timewait or a request socket).

IPv6 is immune to the bug: tcp_v6_send_response() already gets the net
pointer from the socket if provided.

Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv4/tcp_ipv4.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cd18c3d..13b92d5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -705,7 +705,8 @@ release_sk1:
    outside socket context is ugly, certainly. What can I do?
  */
 
-static void tcp_v4_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
+static void tcp_v4_send_ack(struct net *net,
+			    struct sk_buff *skb, u32 seq, u32 ack,
 			    u32 win, u32 tsval, u32 tsecr, int oif,
 			    struct tcp_md5sig_key *key,
 			    int reply_flags, u8 tos)
@@ -720,7 +721,6 @@ static void tcp_v4_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
 			];
 	} rep;
 	struct ip_reply_arg arg;
-	struct net *net = dev_net(skb_dst(skb)->dev);
 
 	memset(&rep.th, 0, sizeof(struct tcphdr));
 	memset(&arg, 0, sizeof(arg));
@@ -782,7 +782,8 @@ static void tcp_v4_timewait_ack(struct sock *sk, struct sk_buff *skb)
 	struct inet_timewait_sock *tw = inet_twsk(sk);
 	struct tcp_timewait_sock *tcptw = tcp_twsk(sk);
 
-	tcp_v4_send_ack(skb, tcptw->tw_snd_nxt, tcptw->tw_rcv_nxt,
+	tcp_v4_send_ack(sock_net(sk), skb,
+			tcptw->tw_snd_nxt, tcptw->tw_rcv_nxt,
 			tcptw->tw_rcv_wnd >> tw->tw_rcv_wscale,
 			tcp_time_stamp + tcptw->tw_ts_offset,
 			tcptw->tw_ts_recent,
@@ -801,8 +802,10 @@ static void tcp_v4_reqsk_send_ack(struct sock *sk, struct sk_buff *skb,
 	/* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV
 	 * sk->sk_state == TCP_SYN_RECV -> for Fast Open.
 	 */
-	tcp_v4_send_ack(skb, (sk->sk_state == TCP_LISTEN) ?
-			tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
+	u32 seq = (sk->sk_state == TCP_LISTEN) ? tcp_rsk(req)->snt_isn + 1 :
+					     tcp_sk(sk)->snd_nxt;
+
+	tcp_v4_send_ack(sock_net(sk), skb, seq,
 			tcp_rsk(req)->rcv_nxt, req->rcv_wnd,
 			tcp_time_stamp,
 			req->ts_recent,
-- 
2.5.0



* [added to the 4.1 stable tree] af_unix: fix struct pid memory leak
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] net: dp83640: Fix tx timestamp overflow handling Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] tcp: fix NULL deref in tcp_v4_send_ack() Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] pptp: fix illegal memory access caused by multiple bind()s Sasha Levin
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Eric Dumazet, Rainer Weikusat, David S. Miller, Sasha Levin

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit fa0dc04df259ba2df3ce1920e9690c7842f8fa4b ]

Dmitry reported a struct pid leak detected by a syzkaller program.

The bug happens in unix_stream_recvmsg() when we break out of the loop
because a signal is pending, without properly releasing scm.

Fixes: b3ca9b02b007 ("net: fix multithreaded signal handling in unix recv routines")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/unix/af_unix.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index cb3a01a..c741d83 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2131,6 +2131,7 @@ again:
 
 			if (signal_pending(current)) {
 				err = sock_intr_errno(timeo);
+				scm_destroy(&scm);
 				goto out;
 			}
 
-- 
2.5.0



* [added to the 4.1 stable tree] pptp: fix illegal memory access caused by multiple bind()s
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (2 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] af_unix: fix struct pid memory leak Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] sctp: allow setting SCTP_SACK_IMMEDIATELY by the application Sasha Levin
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Hannes Frederic Sowa, Dmitry Kozlov, Sasha Levin, Dmitry Vyukov,
	Dave Jones, David S. Miller

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 9a368aff9cb370298fa02feeffa861f2db497c18 ]

Several times already this has been reported as kasan reports caused by
syzkaller and trinity, and people always looked at RCU races, but it is
much simpler. :)

In case we bind a pptp socket multiple times, we simply add it to
the callid_sock list but don't remove the old binding. Thus the old
socket stays in the bucket with unused call_id indexes and doesn't get
cleaned up. This causes various forms of kasan reports which were hard
to pinpoint.

Simply don't allow multiple binds and correct error handling in
pptp_bind. Also keep sk_state bits in place in pptp_connect.

Fixes: 00959ade36acad ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)")
Cc: Dmitry Kozlov <xeb@mail.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ppp/pptp.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 0bacabf..b35199c 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -131,24 +131,27 @@ static int lookup_chan_dst(u16 call_id, __be32 d_addr)
 	return i < MAX_CALLID;
 }
 
-static int add_chan(struct pppox_sock *sock)
+static int add_chan(struct pppox_sock *sock,
+		    struct pptp_addr *sa)
 {
 	static int call_id;
 
 	spin_lock(&chan_lock);
-	if (!sock->proto.pptp.src_addr.call_id)	{
+	if (!sa->call_id)	{
 		call_id = find_next_zero_bit(callid_bitmap, MAX_CALLID, call_id + 1);
 		if (call_id == MAX_CALLID) {
 			call_id = find_next_zero_bit(callid_bitmap, MAX_CALLID, 1);
 			if (call_id == MAX_CALLID)
 				goto out_err;
 		}
-		sock->proto.pptp.src_addr.call_id = call_id;
-	} else if (test_bit(sock->proto.pptp.src_addr.call_id, callid_bitmap))
+		sa->call_id = call_id;
+	} else if (test_bit(sa->call_id, callid_bitmap)) {
 		goto out_err;
+	}
 
-	set_bit(sock->proto.pptp.src_addr.call_id, callid_bitmap);
-	rcu_assign_pointer(callid_sock[sock->proto.pptp.src_addr.call_id], sock);
+	sock->proto.pptp.src_addr = *sa;
+	set_bit(sa->call_id, callid_bitmap);
+	rcu_assign_pointer(callid_sock[sa->call_id], sock);
 	spin_unlock(&chan_lock);
 
 	return 0;
@@ -417,7 +420,6 @@ static int pptp_bind(struct socket *sock, struct sockaddr *uservaddr,
 	struct sock *sk = sock->sk;
 	struct sockaddr_pppox *sp = (struct sockaddr_pppox *) uservaddr;
 	struct pppox_sock *po = pppox_sk(sk);
-	struct pptp_opt *opt = &po->proto.pptp;
 	int error = 0;
 
 	if (sockaddr_len < sizeof(struct sockaddr_pppox))
@@ -425,10 +427,22 @@ static int pptp_bind(struct socket *sock, struct sockaddr *uservaddr,
 
 	lock_sock(sk);
 
-	opt->src_addr = sp->sa_addr.pptp;
-	if (add_chan(po))
+	if (sk->sk_state & PPPOX_DEAD) {
+		error = -EALREADY;
+		goto out;
+	}
+
+	if (sk->sk_state & PPPOX_BOUND) {
 		error = -EBUSY;
+		goto out;
+	}
+
+	if (add_chan(po, &sp->sa_addr.pptp))
+		error = -EBUSY;
+	else
+		sk->sk_state |= PPPOX_BOUND;
 
+out:
 	release_sock(sk);
 	return error;
 }
@@ -499,7 +513,7 @@ static int pptp_connect(struct socket *sock, struct sockaddr *uservaddr,
 	}
 
 	opt->dst_addr = sp->sa_addr.pptp;
-	sk->sk_state = PPPOX_CONNECTED;
+	sk->sk_state |= PPPOX_CONNECTED;
 
  end:
 	release_sock(sk);
-- 
2.5.0
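
The state handling introduced by the patch above (reject a second bind, and OR in the connected bit instead of overwriting the state) can be sketched as follows. The `DEMO_*` bit values, `struct demo_sock`, and both functions are hypothetical; the kernel's `PPPOX_*` constants and locking are deliberately not reproduced.

```c
#include <errno.h>

/* Hypothetical state bits; not the kernel's PPPOX_* values. */
enum {
	DEMO_CONNECTED = 1 << 0,
	DEMO_BOUND     = 1 << 1,
	DEMO_DEAD      = 1 << 2,
};

struct demo_sock {
	unsigned int state;
	int call_id;
};

/* Bind at most once; a dead socket or a second bind is rejected. */
int demo_pptp_bind(struct demo_sock *sk, int call_id)
{
	if (sk->state & DEMO_DEAD)
		return -EALREADY;
	if (sk->state & DEMO_BOUND)
		return -EBUSY;

	sk->call_id = call_id;
	sk->state |= DEMO_BOUND;
	return 0;
}

/* OR in the connected bit so that the bound bit survives connect. */
void demo_pptp_connect(struct demo_sock *sk)
{
	sk->state |= DEMO_CONNECTED;
}
```

Had `demo_pptp_connect()` assigned the state with `=`, the bound bit would be lost and a later bind would be accepted again, which is the situation the patch avoids.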



* [added to the 4.1 stable tree] sctp: allow setting SCTP_SACK_IMMEDIATELY by the application
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (3 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] pptp: fix illegal memory access caused by multiple bind()s Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] tipc: fix connection abort during subscription cancel Sasha Levin
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Marcelo Ricardo Leitner, David S. Miller, Sasha Levin

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 27f7ed2b11d42ab6d796e96533c2076ec220affc ]

This patch extends commit b93d6471748d ("sctp: implement the sender side
for SACK-IMMEDIATELY extension"), which didn't whitelist
SCTP_SACK_IMMEDIATELY in sctp_msghdr_parse(), so the flag was treated
as invalid and -EINVAL was returned to the application.

Note that the actual handling of the flag is already there in
sctp_datamsg_from_user().

https://tools.ietf.org/html/rfc7053#section-7

Fixes: b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/sctp/socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76e6ec6..1b80f20 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6653,6 +6653,7 @@ static int sctp_msghdr_parse(const struct msghdr *msg, sctp_cmsgs_t *cmsgs)
 
 			if (cmsgs->srinfo->sinfo_flags &
 			    ~(SCTP_UNORDERED | SCTP_ADDR_OVER |
+			      SCTP_SACK_IMMEDIATELY |
 			      SCTP_ABORT | SCTP_EOF))
 				return -EINVAL;
 			break;
@@ -6676,6 +6677,7 @@ static int sctp_msghdr_parse(const struct msghdr *msg, sctp_cmsgs_t *cmsgs)
 
 			if (cmsgs->sinfo->snd_flags &
 			    ~(SCTP_UNORDERED | SCTP_ADDR_OVER |
+			      SCTP_SACK_IMMEDIATELY |
 			      SCTP_ABORT | SCTP_EOF))
 				return -EINVAL;
 			break;
-- 
2.5.0
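
The flag check touched by the patch above is an allow-list mask: any bit outside the mask makes the request invalid. A minimal sketch of that pattern follows. The `DEMO_*` values and `demo_check_flags()` are hypothetical and are not the kernel's `SCTP_*` constants.

```c
#include <errno.h>

/* Hypothetical flag values for illustration only. */
enum {
	DEMO_UNORDERED        = 1 << 0,
	DEMO_ADDR_OVER        = 1 << 1,
	DEMO_ABORT            = 1 << 2,
	DEMO_SACK_IMMEDIATELY = 1 << 3,
	DEMO_EOF              = 1 << 4,
};

#define DEMO_ALLOWED_FLAGS \
	(DEMO_UNORDERED | DEMO_ADDR_OVER | DEMO_SACK_IMMEDIATELY | \
	 DEMO_ABORT | DEMO_EOF)

/* Return 0 if only allowed bits are set, -EINVAL otherwise. */
int demo_check_flags(unsigned int flags)
{
	if (flags & ~DEMO_ALLOWED_FLAGS)
		return -EINVAL;
	return 0;
}
```

Leaving a supported flag out of the mask, as happened with SCTP_SACK_IMMEDIATELY, rejects it even though the code that acts on the flag already exists.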



* [added to the 4.1 stable tree] tipc: fix connection abort during subscription cancel
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (4 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] sctp: allow setting SCTP_SACK_IMMEDIATELY by the application Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications Sasha Levin
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Parthasarathy Bhuvaragan, David S. Miller, Sasha Levin

From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 4d5cfcba2f6ec494d8810b9e3c0a7b06255c8067 ]

In commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing
to events"), we terminate the connection if the subscription
creation fails.
In the same commit, the subscription creation result was based on
the value of the subscription pointer (set in the function) instead
of the return code.

Unfortunately, the same function tipc_subscrp_create() also handles
subscription cancel requests. For a subscription cancellation request,
the subscription pointer cannot be set. Thus if a subscriber has
several subscriptions and cancels any of them, the connection is
terminated.

In this commit, we terminate the connection based on the return value
of tipc_subscrp_create().

Fixes: 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing to events")
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/tipc/subscr.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index 1c147c8..948f316 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -302,11 +302,10 @@ static void subscr_conn_msg_event(struct net *net, int conid,
 	struct tipc_net *tn = net_generic(net, tipc_net_id);
 
 	spin_lock_bh(&subscriber->lock);
-	subscr_subscribe(net, (struct tipc_subscr *)buf, subscriber, &sub);
-	if (sub)
-		tipc_nametbl_subscribe(sub);
-	else
+	if (subscr_subscribe(net, (struct tipc_subscr *)buf, subscriber, &sub))
 		tipc_conn_terminate(tn->topsrv, subscriber->conid);
+	else
+		tipc_nametbl_subscribe(sub);
 	spin_unlock_bh(&subscriber->lock);
 }
 
-- 
2.5.0
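
The difference between deciding on the output pointer and deciding on the return code, as described in the commit message above, can be sketched like this. All names (`demo_handle()`, `demo_should_terminate_old()`, `demo_should_terminate()`, `enum demo_req`) are hypothetical and only model the control flow, not the TIPC data structures.

```c
#include <errno.h>
#include <stddef.h>

struct demo_sub { int id; };

enum demo_req { DEMO_SUBSCRIBE, DEMO_CANCEL, DEMO_BAD };

/*
 * Handle one request. On DEMO_SUBSCRIBE, *out points at the new
 * subscription. On DEMO_CANCEL the call succeeds but *out stays NULL,
 * so a NULL *out alone does not indicate failure.
 */
int demo_handle(enum demo_req req, struct demo_sub *slot,
		struct demo_sub **out)
{
	*out = NULL;
	switch (req) {
	case DEMO_SUBSCRIBE:
		slot->id = 1;
		*out = slot;
		return 0;
	case DEMO_CANCEL:
		return 0;
	default:
		return -EINVAL;
	}
}

/* Old check: treats a NULL subscription pointer as failure. */
int demo_should_terminate_old(enum demo_req req, struct demo_sub *slot)
{
	struct demo_sub *sub;

	demo_handle(req, slot, &sub);
	return sub == NULL;
}

/* New check: terminates only when the handler reports an error. */
int demo_should_terminate(enum demo_req req, struct demo_sub *slot)
{
	struct demo_sub *sub;

	return demo_handle(req, slot, &sub) != 0;
}
```

With the old check a successful cancel looks like a failure. With the new check it does not, although a caller that goes on to use the pointer after a successful call still has to allow for it being NULL.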



* [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (5 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] tipc: fix connection abort during subscription cancel Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-03  9:35   ` Ido Schimmel
  2016-03-02 20:23 ` [added to the 4.1 stable tree] tcp: beware of alignments in tcp_get_info() Sasha Levin
                   ` (25 subsequent siblings)
  32 siblings, 1 reply; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits
  Cc: Ido Schimmel, Jiri Pirko, David S. Miller, Sasha Levin

From: Ido Schimmel <idosch@mellanox.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 4f2c6ae5c64c353fb1b0425e4747e5603feadba1 ]

When switchdev drivers process FDB notifications from the underlying
device, they resolve the netdev to which the entry points and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held, there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existent netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c281234 ("switchdev: introduce switchdev notifier")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/rocker/rocker.c |  2 ++
 net/bridge/br.c                      |  3 +--
 net/switchdev/switchdev.c            | 15 ++++++++-------
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 73b6fc2..4fedf7f 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3384,12 +3384,14 @@ static void rocker_port_fdb_learn_work(struct work_struct *work)
 	info.addr = lw->addr;
 	info.vid = lw->vid;
 
+	rtnl_lock();
 	if (learned && removing)
 		call_netdev_switch_notifiers(NETDEV_SWITCH_FDB_DEL,
 					     lw->dev, &info.info);
 	else if (learned && !removing)
 		call_netdev_switch_notifiers(NETDEV_SWITCH_FDB_ADD,
 					     lw->dev, &info.info);
+	rtnl_unlock();
 
 	kfree(work);
 }
diff --git a/net/bridge/br.c b/net/bridge/br.c
index 02c24cf..c72e01c 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -121,6 +121,7 @@ static struct notifier_block br_device_notifier = {
 	.notifier_call = br_device_event
 };
 
+/* called with RTNL */
 static int br_netdev_switch_event(struct notifier_block *unused,
 				  unsigned long event, void *ptr)
 {
@@ -130,7 +131,6 @@ static int br_netdev_switch_event(struct notifier_block *unused,
 	struct netdev_switch_notifier_fdb_info *fdb_info;
 	int err = NOTIFY_DONE;
 
-	rtnl_lock();
 	p = br_port_get_rtnl(dev);
 	if (!p)
 		goto out;
@@ -155,7 +155,6 @@ static int br_netdev_switch_event(struct notifier_block *unused,
 	}
 
 out:
-	rtnl_unlock();
 	return err;
 }
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 055453d..a8dbe80 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -15,6 +15,7 @@
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
 #include <net/ip_fib.h>
 #include <net/switchdev.h>
 
@@ -64,7 +65,6 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
 }
 EXPORT_SYMBOL_GPL(netdev_switch_port_stp_update);
 
-static DEFINE_MUTEX(netdev_switch_mutex);
 static RAW_NOTIFIER_HEAD(netdev_switch_notif_chain);
 
 /**
@@ -79,9 +79,9 @@ int register_netdev_switch_notifier(struct notifier_block *nb)
 {
 	int err;
 
-	mutex_lock(&netdev_switch_mutex);
+	rtnl_lock();
 	err = raw_notifier_chain_register(&netdev_switch_notif_chain, nb);
-	mutex_unlock(&netdev_switch_mutex);
+	rtnl_unlock();
 	return err;
 }
 EXPORT_SYMBOL_GPL(register_netdev_switch_notifier);
@@ -97,9 +97,9 @@ int unregister_netdev_switch_notifier(struct notifier_block *nb)
 {
 	int err;
 
-	mutex_lock(&netdev_switch_mutex);
+	rtnl_lock();
 	err = raw_notifier_chain_unregister(&netdev_switch_notif_chain, nb);
-	mutex_unlock(&netdev_switch_mutex);
+	rtnl_unlock();
 	return err;
 }
 EXPORT_SYMBOL_GPL(unregister_netdev_switch_notifier);
@@ -113,16 +113,17 @@ EXPORT_SYMBOL_GPL(unregister_netdev_switch_notifier);
  *	Call all network notifier blocks. This should be called by driver
  *	when it needs to propagate hardware event.
  *	Return values are same as for atomic_notifier_call_chain().
+ *	rtnl_lock must be held.
  */
 int call_netdev_switch_notifiers(unsigned long val, struct net_device *dev,
 				 struct netdev_switch_notifier_info *info)
 {
 	int err;
 
+	ASSERT_RTNL();
+
 	info->dev = dev;
-	mutex_lock(&netdev_switch_mutex);
 	err = raw_notifier_call_chain(&netdev_switch_notif_chain, val, info);
-	mutex_unlock(&netdev_switch_mutex);
 	return err;
 }
 EXPORT_SYMBOL_GPL(call_netdev_switch_notifiers);
-- 
2.5.0



* [added to the 4.1 stable tree] tcp: beware of alignments in tcp_get_info()
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (6 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() Sasha Levin
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits; +Cc: Eric Dumazet, David S. Miller, Sasha Levin

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit ff5d749772018602c47509bdc0093ff72acd82ec ]

With some combinations of user-provided flags in a netlink command,
it is possible to call tcp_get_info() with a buffer that is not 8-byte
aligned.

It does matter on some arches, so we need to use put_unaligned() to
store the u64 fields.

The current iproute2 package does not trigger this particular issue.

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: 977cb0ecf82e ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv4/tcp.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bb2ce74..b5f4f5c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -279,6 +279,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
+#include <asm/unaligned.h>
 #include <net/busy_poll.h>
 
 int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
@@ -2603,6 +2604,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	u32 now = tcp_time_stamp;
 	unsigned int start;
+	u64 rate64;
 	u32 rate;
 
 	memset(info, 0, sizeof(*info));
@@ -2665,15 +2667,17 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_total_retrans = tp->total_retrans;
 
 	rate = READ_ONCE(sk->sk_pacing_rate);
-	info->tcpi_pacing_rate = rate != ~0U ? rate : ~0ULL;
+	rate64 = rate != ~0U ? rate : ~0ULL;
+	put_unaligned(rate64, &info->tcpi_pacing_rate);
 
 	rate = READ_ONCE(sk->sk_max_pacing_rate);
-	info->tcpi_max_pacing_rate = rate != ~0U ? rate : ~0ULL;
+	rate64 = rate != ~0U ? rate : ~0ULL;
+	put_unaligned(rate64, &info->tcpi_max_pacing_rate);
 
 	do {
 		start = u64_stats_fetch_begin_irq(&tp->syncp);
-		info->tcpi_bytes_acked = tp->bytes_acked;
-		info->tcpi_bytes_received = tp->bytes_received;
+		put_unaligned(tp->bytes_acked, &info->tcpi_bytes_acked);
+		put_unaligned(tp->bytes_received, &info->tcpi_bytes_received);
 	} while (u64_stats_fetch_retry_irq(&tp->syncp, start));
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
-- 
2.5.0
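
For readers outside the kernel, the effect of put_unaligned() on a u64 field can be approximated in portable C with memcpy(), which makes no alignment assumption about the destination. The helpers below are hypothetical user-space stand-ins, not the kernel implementation. `demo_widen_rate()` mirrors the `rate != ~0U ? rate : ~0ULL` expression from the diff.

```c
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value at an address that may not be 8-byte aligned. */
static inline void demo_put_unaligned_u64(uint64_t val, void *dst)
{
	memcpy(dst, &val, sizeof(val));
}

/* Load a 64-bit value from a possibly unaligned address. */
static inline uint64_t demo_get_unaligned_u64(const void *src)
{
	uint64_t val;

	memcpy(&val, src, sizeof(val));
	return val;
}

/* Widen a 32-bit rate, mapping the "unlimited" value ~0U to ~0ULL. */
static inline uint64_t demo_widen_rate(uint32_t rate)
{
	return rate != UINT32_MAX ? (uint64_t)rate : UINT64_MAX;
}
```

A direct store through a `uint64_t *` that points at an odd address is undefined behavior in C and can trap on strict-alignment architectures, which is why the byte-wise copy is used here.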



* [added to the 4.1 stable tree] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (7 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] tcp: beware of alignments in tcp_get_info() Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6/udp: use sticky pktinfo egress ifindex on connect() Sasha Levin
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC
  To: stable, stable-commits; +Cc: Paolo Abeni, David S. Miller, Sasha Levin

From: Paolo Abeni <pabeni@redhat.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 6f21c96a78b835259546d8f3fb4edff0f651d478 ]

The current implementation of ip6_dst_lookup_tail() basically
ignores the egress ifindex match. If the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"). If the saddr is 'any', the first route lookup
in ip6_dst_lookup_tail() fails, but upon failure a second lookup is
performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modifies
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() now becomes a static inline function built on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules now need a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/net/ip6_route.h | 12 ++++++++++--
 net/ipv6/ip6_output.c   |  6 +++++-
 net/ipv6/route.c        |  8 +++-----
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 5e19206..388dea4 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -64,8 +64,16 @@ static inline bool rt6_need_strict(const struct in6_addr *daddr)
 
 void ip6_route_input(struct sk_buff *skb);
 
-struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
-				   struct flowi6 *fl6);
+struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
+					 struct flowi6 *fl6, int flags);
+
+static inline struct dst_entry *ip6_route_output(struct net *net,
+						 const struct sock *sk,
+						 struct flowi6 *fl6)
+{
+	return ip6_route_output_flags(net, sk, fl6, 0);
+}
+
 struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
 				   int flags);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f50228b..36b9ac4 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -885,6 +885,7 @@ static int ip6_dst_lookup_tail(struct sock *sk,
 	struct rt6_info *rt;
 #endif
 	int err;
+	int flags = 0;
 
 	/* The correct way to handle this would be to do
 	 * ip6_route_get_saddr, and then ip6_route_output; however,
@@ -916,10 +917,13 @@ static int ip6_dst_lookup_tail(struct sock *sk,
 			dst_release(*dst);
 			*dst = NULL;
 		}
+
+		if (fl6->flowi6_oif)
+			flags |= RT6_LOOKUP_F_IFACE;
 	}
 
 	if (!*dst)
-		*dst = ip6_route_output(net, sk, fl6);
+		*dst = ip6_route_output_flags(net, sk, fl6, flags);
 
 	err = (*dst)->error;
 	if (err)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f371fef..fe70bd6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1030,11 +1030,9 @@ static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table
 	return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
 }
 
-struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
-				    struct flowi6 *fl6)
+struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk,
+					 struct flowi6 *fl6, int flags)
 {
-	int flags = 0;
-
 	fl6->flowi6_iif = LOOPBACK_IFINDEX;
 
 	if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr))
@@ -1047,7 +1045,7 @@ struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
 
 	return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_output);
 }
-EXPORT_SYMBOL(ip6_route_output);
+EXPORT_SYMBOL_GPL(ip6_route_output_flags);
 
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig)
 {
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] ipv6/udp: use sticky pktinfo egress ifindex on connect()
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (8 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] net/ipv6: add sysctl option accept_ra_min_hop_limit Sasha Levin
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Paolo Abeni, David S. Miller, Sasha Levin

From: Paolo Abeni <pabeni@redhat.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 1cdda91871470f15e79375991bd2eddc6e86ddb1 ]

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by properly initializing flowi6_oif in connect() before
performing the route lookup.
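
A small sketch of the resulting fallback order for the egress
interface, assuming the three sources visible in the hunk below (an
already-set oif, the sticky pktinfo ifindex, and the multicast oif).
toy_pick_oif() and its parameter names are hypothetical; this is an
illustration, not the kernel function.

```c
#include <stdbool.h>

/* Choose the egress ifindex: an explicitly set value wins, then the
 * sticky IPV6_PKTINFO ifindex, then (multicast destinations only) the
 * multicast interface. 0 means "unspecified".
 */
static int toy_pick_oif(int explicit_oif, int sticky_ifindex,
			int mcast_oif, bool daddr_is_multicast)
{
	int oif = explicit_oif;

	if (!oif)
		oif = sticky_ifindex;

	if (!oif && daddr_is_multicast)
		oif = mcast_oif;

	return oif;
}
```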

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv6/datagram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 13ca4cf..8e6cb3f 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -162,6 +162,9 @@ ipv4_connected:
 	fl6.fl6_dport = inet->inet_dport;
 	fl6.fl6_sport = inet->inet_sport;
 
+	if (!fl6.flowi6_oif)
+		fl6.flowi6_oif = np->sticky_pktinfo.ipi6_ifindex;
+
 	if (!fl6.flowi6_oif && (addr_type&IPV6_ADDR_MULTICAST))
 		fl6.flowi6_oif = np->mcast_oif;
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] net/ipv6: add sysctl option accept_ra_min_hop_limit
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (9 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6/udp: use sticky pktinfo egress ifindex on connect() Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: addrconf: Fix recursive spin lock call Sasha Levin
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Hangbin Liu, David S. Miller, Sasha Levin

From: Hangbin Liu <liuhangbin@gmail.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4 ]

Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
stopped accepting a hop limit from an RA if it is smaller than the current
hop limit, for security reasons. But this behavior conflicts with the RFC
definition.

RFC 4861, 6.3.4.  Processing Received Router Advertisements
   A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
   and Retrans Timer) may contain a value denoting that it is
   unspecified.  In such cases, the parameter should be ignored and the
   host should continue using whatever value it is already using.

   If the received Cur Hop Limit value is non-zero, the host SHOULD set
   its CurHopLimit variable to the received value.

So add the sysctl option accept_ra_min_hop_limit to let users choose the
minimum hop limit value they accept from an RA, and set the default to 1
to meet the RFC.
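
The acceptance rule can be sketched in user space as follows. The
conditions mirror the ndisc.c hunk below; toy_apply_ra_hop_limit() and
its parameter names are hypothetical, and the sketch leaves out the
route metric update.

```c
/* Return the hop limit the interface should use after seeing an RA.
 * cur_hop_limit: value currently configured on the interface
 * min_hop_limit: accept_ra_min_hop_limit (256 or more disables updates)
 * ra_hop_limit:  Cur Hop Limit field from the RA (0 == unspecified)
 */
static int toy_apply_ra_hop_limit(int cur_hop_limit, int min_hop_limit,
				  unsigned char ra_hop_limit)
{
	if (min_hop_limit < 256 && ra_hop_limit) {
		if (min_hop_limit <= ra_hop_limit)
			return ra_hop_limit;
		/* below the configured minimum: ignore the RA value */
	}
	return cur_hop_limit;
}
```

With the default minimum of 1, any non-zero advertised value is taken,
including one lower than the current hop limit.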

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 Documentation/networking/ip-sysctl.txt |  8 ++++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  2 ++
 net/ipv6/addrconf.c                    | 10 ++++++++++
 net/ipv6/ndisc.c                       | 16 +++++++---------
 5 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 071fb18..07fad3d 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1321,6 +1321,14 @@ accept_ra_from_local - BOOLEAN
 	   disabled if accept_ra_from_local is disabled
                on a specific interface.
 
+accept_ra_min_hop_limit - INTEGER
+	Minimum hop limit Information in Router Advertisement.
+
+	Hop limit Information in Router Advertisement less than this
+	variable shall be ignored.
+
+	Default: 1
+
 accept_ra_pinfo - BOOLEAN
 	Learn Prefix Information in Router Advertisement.
 
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index e4b4649..01c2592 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -29,6 +29,7 @@ struct ipv6_devconf {
 	__s32		max_desync_factor;
 	__s32		max_addresses;
 	__s32		accept_ra_defrtr;
+	__s32		accept_ra_min_hop_limit;
 	__s32		accept_ra_pinfo;
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	__s32		accept_ra_rtr_pref;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 5efa54a..80f3b74 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -171,6 +171,8 @@ enum {
 	DEVCONF_USE_OPTIMISTIC,
 	DEVCONF_ACCEPT_RA_MTU,
 	DEVCONF_STABLE_SECRET,
+	DEVCONF_USE_OIF_ADDRS_ONLY,
+	DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f4795b0..28c4bc5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -195,6 +195,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.max_addresses		= IPV6_MAX_ADDRESSES,
 	.accept_ra_defrtr	= 1,
 	.accept_ra_from_local	= 0,
+	.accept_ra_min_hop_limit= 1,
 	.accept_ra_pinfo	= 1,
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	.accept_ra_rtr_pref	= 1,
@@ -236,6 +237,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.max_addresses		= IPV6_MAX_ADDRESSES,
 	.accept_ra_defrtr	= 1,
 	.accept_ra_from_local	= 0,
+	.accept_ra_min_hop_limit= 1,
 	.accept_ra_pinfo	= 1,
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	.accept_ra_rtr_pref	= 1,
@@ -4565,6 +4567,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
 	array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
 	array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
+	array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
 	array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
@@ -5458,6 +5461,13 @@ static struct addrconf_sysctl_table
 			.proc_handler	= proc_dointvec,
 		},
 		{
+			.procname	= "accept_ra_min_hop_limit",
+			.data		= &ipv6_devconf.accept_ra_min_hop_limit,
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= proc_dointvec,
+		},
+		{
 			.procname	= "accept_ra_pinfo",
 			.data		= &ipv6_devconf.accept_ra_pinfo,
 			.maxlen		= sizeof(int),
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 96f153c..abb0bdd 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1225,18 +1225,16 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 
 	if (rt)
 		rt6_set_expires(rt, jiffies + (HZ * lifetime));
-	if (ra_msg->icmph.icmp6_hop_limit) {
-		/* Only set hop_limit on the interface if it is higher than
-		 * the current hop_limit.
-		 */
-		if (in6_dev->cnf.hop_limit < ra_msg->icmph.icmp6_hop_limit) {
+	if (in6_dev->cnf.accept_ra_min_hop_limit < 256 &&
+	    ra_msg->icmph.icmp6_hop_limit) {
+		if (in6_dev->cnf.accept_ra_min_hop_limit <= ra_msg->icmph.icmp6_hop_limit) {
 			in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
+			if (rt)
+				dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
+					       ra_msg->icmph.icmp6_hop_limit);
 		} else {
-			ND_PRINTK(2, warn, "RA: Got route advertisement with lower hop_limit than current\n");
+			ND_PRINTK(2, warn, "RA: Got route advertisement with lower hop_limit than minimum\n");
 		}
-		if (rt)
-			dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
-				       ra_msg->icmph.icmp6_hop_limit);
 	}
 
 skip_defrtr:
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] ipv6: addrconf: Fix recursive spin lock call
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (10 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] net/ipv6: add sysctl option accept_ra_min_hop_limit Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: fix a lockdep splat Sasha Levin
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: subashab@codeaurora.org, Eric Dumazet, Erik Kline,
	Hannes Frederic Sowa, David S. Miller, Sasha Levin

From: "subashab@codeaurora.org" <subashab@codeaurora.org>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 16186a82de1fdd868255448274e64ae2616e2640 ]

An RCU stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while already
holding the read_lock. Backtrace below:

INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
 g=3992 c=3991 q=4471)
<6> Task dump for CPU 1:
<2> kworker/1:0     R  running task    12168    15   2 0x00000002
<2> Workqueue: ipv6_addrconf addrconf_dad_work
<6> Call trace:
<2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
<2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
<2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
<2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
<2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
<2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
<2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
<2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
<2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
<2> [<ffffffc0000bb40c>] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric
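
The shape of the fix (remember that a notification is due, drop the
read lock, then notify) can be sketched with a toy lock model. All
names here are hypothetical, and the asserts stand in for the deadlock
that a real rwlock would produce.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
	int readers;	/* read-side holders of the device lock */
	int notified;	/* number of notifications delivered */
};

static void toy_read_lock(struct toy_dev *d)
{
	d->readers++;
}

static void toy_read_unlock(struct toy_dev *d)
{
	assert(d->readers > 0);
	d->readers--;
}

/* Takes the write side internally, so it must not be called while the
 * caller still holds the read side.
 */
static void toy_notify(struct toy_dev *d)
{
	assert(d->readers == 0);
	d->notified++;
}

static void toy_dad_begin(struct toy_dev *d, bool optimistic)
{
	bool notify = false;

	toy_read_lock(d);
	if (optimistic)
		notify = true;	/* defer instead of notifying here */
	toy_read_unlock(d);

	if (notify)
		toy_notify(d);
}
```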

Fixes: 7fd2561e4ebdd ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet <edumazet@google.com>
Cc: Erik Kline <ek@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv6/addrconf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 28c4bc5..fcfbd05 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3423,6 +3423,7 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 {
 	struct inet6_dev *idev = ifp->idev;
 	struct net_device *dev = idev->dev;
+	bool notify = false;
 
 	addrconf_join_solict(dev, &ifp->addr);
 
@@ -3468,7 +3469,7 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 			/* Because optimistic nodes can use this address,
 			 * notify listeners. If DAD fails, RTM_DELADDR is sent.
 			 */
-			ipv6_ifa_notify(RTM_NEWADDR, ifp);
+			notify = true;
 		}
 	}
 
@@ -3476,6 +3477,8 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 out:
 	spin_unlock(&ifp->lock);
 	read_unlock_bh(&idev->lock);
+	if (notify)
+		ipv6_ifa_notify(RTM_NEWADDR, ifp);
 }
 
 static void addrconf_dad_start(struct inet6_ifaddr *ifp)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] ipv6: fix a lockdep splat
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (11 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: addrconf: Fix recursive spin lock call Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:23 ` [added to the 4.1 stable tree] unix: correctly track in-flight fds in sending process user_struct Sasha Levin
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Eric Dumazet, David S. Miller, Sasha Levin

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 44c3d0c1c0a880354e9de5d94175742e2c7c9683 ]

Silence lockdep false positive about rcu_dereference() being
used in the wrong context.

The first one should use rcu_dereference_protected(), as we own the
spinlock.

The second one should be a normal assignment, as no barrier is needed.

Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv6/ip6_flowlabel.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index d491125..db939e4 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -540,12 +540,13 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		}
 		spin_lock_bh(&ip6_sk_fl_lock);
 		for (sflp = &np->ipv6_fl_list;
-		     (sfl = rcu_dereference(*sflp)) != NULL;
+		     (sfl = rcu_dereference_protected(*sflp,
+						      lockdep_is_held(&ip6_sk_fl_lock))) != NULL;
 		     sflp = &sfl->next) {
 			if (sfl->fl->label == freq.flr_label) {
 				if (freq.flr_label == (np->flow_label&IPV6_FLOWLABEL_MASK))
 					np->flow_label &= ~IPV6_FLOWLABEL_MASK;
-				*sflp = rcu_dereference(sfl->next);
+				*sflp = sfl->next;
 				spin_unlock_bh(&ip6_sk_fl_lock);
 				fl_release(sfl->fl);
 				kfree_rcu(sfl, rcu);
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] unix: correctly track in-flight fds in sending process user_struct
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (12 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: fix a lockdep splat Sasha Levin
@ 2016-03-02 20:23 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net:Add sysctl_max_skb_frags Sasha Levin
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:23 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Hannes Frederic Sowa, David Herrmann, Willy Tarreau,
	Linus Torvalds, David S. Miller, Sasha Levin

From: Hannes Frederic Sowa <hannes@stressinduktion.org>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 ]

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file descriptor. This allows another process to arbitrarily
deplete the original file opener's resource limit for the maximum number
of open files. Instead, the sending process and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.
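
A user-space sketch of the accounting change, assuming a simplified
model in which each fd list records the sending user. The toy_* types
and functions are hypothetical and omit reference counting and locking.

```c
struct toy_user {
	int unix_inflight;	/* fds this user currently has in flight */
};

struct toy_file {
	struct toy_user *opener;	/* who opened the file */
};

struct toy_fp_list {
	struct toy_user *user;		/* who is sending the fds */
	int count;
	struct toy_file *fp[4];
};

/* Charge every in-flight fd to the sender, not to file->opener. */
static void toy_attach_fds(struct toy_fp_list *fpl)
{
	int i;

	for (i = fpl->count - 1; i >= 0; i--)
		fpl->user->unix_inflight++;
}

/* Undo the charge against the same user when the fds are received. */
static void toy_detach_fds(struct toy_fp_list *fpl)
{
	int i;

	for (i = fpl->count - 1; i >= 0; i--)
		fpl->user->unix_inflight--;
}
```

Because the list carries the user, the charge and the later uncharge
hit the same counter even if the receiver is a different user.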

Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/net/af_unix.h | 4 ++--
 include/net/scm.h     | 1 +
 net/core/scm.c        | 7 +++++++
 net/unix/af_unix.c    | 4 ++--
 net/unix/garbage.c    | 8 ++++----
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index e830c3d..7bb69c9 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -6,8 +6,8 @@
 #include <linux/mutex.h>
 #include <net/sock.h>
 
-void unix_inflight(struct file *fp);
-void unix_notinflight(struct file *fp);
+void unix_inflight(struct user_struct *user, struct file *fp);
+void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_gc(void);
 void wait_for_unix_gc(void);
 struct sock *unix_get_socket(struct file *filp);
diff --git a/include/net/scm.h b/include/net/scm.h
index 262532d..59fa93c 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -21,6 +21,7 @@ struct scm_creds {
 struct scm_fp_list {
 	short			count;
 	short			max;
+	struct user_struct	*user;
 	struct file		*fp[SCM_MAX_FD];
 };
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 8a1741b..dce0acb 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -87,6 +87,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 		*fplp = fpl;
 		fpl->count = 0;
 		fpl->max = SCM_MAX_FD;
+		fpl->user = NULL;
 	}
 	fpp = &fpl->fp[fpl->count];
 
@@ -107,6 +108,10 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 		*fpp++ = file;
 		fpl->count++;
 	}
+
+	if (!fpl->user)
+		fpl->user = get_uid(current_user());
+
 	return num;
 }
 
@@ -119,6 +124,7 @@ void __scm_destroy(struct scm_cookie *scm)
 		scm->fp = NULL;
 		for (i=fpl->count-1; i>=0; i--)
 			fput(fpl->fp[i]);
+		free_uid(fpl->user);
 		kfree(fpl);
 	}
 }
@@ -336,6 +342,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 		for (i = 0; i < fpl->count; i++)
 			get_file(fpl->fp[i]);
 		new_fpl->max = new_fpl->count;
+		new_fpl->user = get_uid(fpl->user);
 	}
 	return new_fpl;
 }
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c741d83..d644042 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1464,7 +1464,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	UNIXCB(skb).fp = NULL;
 
 	for (i = scm->fp->count-1; i >= 0; i--)
-		unix_notinflight(scm->fp->fp[i]);
+		unix_notinflight(scm->fp->user, scm->fp->fp[i]);
 }
 
 static void unix_destruct_scm(struct sk_buff *skb)
@@ -1529,7 +1529,7 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 		return -ENOMEM;
 
 	for (i = scm->fp->count - 1; i >= 0; i--)
-		unix_inflight(scm->fp->fp[i]);
+		unix_inflight(scm->fp->user, scm->fp->fp[i]);
 	return max_level;
 }
 
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 8fcdc22..6a0d485 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -116,7 +116,7 @@ struct sock *unix_get_socket(struct file *filp)
  * descriptor if it is for an AF_UNIX socket.
  */
 
-void unix_inflight(struct file *fp)
+void unix_inflight(struct user_struct *user, struct file *fp)
 {
 	struct sock *s = unix_get_socket(fp);
 
@@ -133,11 +133,11 @@ void unix_inflight(struct file *fp)
 		}
 		unix_tot_inflight++;
 	}
-	fp->f_cred->user->unix_inflight++;
+	user->unix_inflight++;
 	spin_unlock(&unix_gc_lock);
 }
 
-void unix_notinflight(struct file *fp)
+void unix_notinflight(struct user_struct *user, struct file *fp)
 {
 	struct sock *s = unix_get_socket(fp);
 
@@ -152,7 +152,7 @@ void unix_notinflight(struct file *fp)
 			list_del_init(&u->link);
 		unix_tot_inflight--;
 	}
-	fp->f_cred->user->unix_inflight--;
+	user->unix_inflight--;
 	spin_unlock(&unix_gc_lock);
 }
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] net:Add sysctl_max_skb_frags
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (13 preceding siblings ...)
  2016-03-02 20:23 ` [added to the 4.1 stable tree] unix: correctly track in-flight fds in sending process user_struct Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs Sasha Levin
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Hans Westgaard Ry, David S. Miller, Sasha Levin

From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 5f74f82ea34c0da80ea0b49192bb5ea06e063593 ]

Devices may have limits on the number of fragments in an skb they support.
The current codebase uses a constant as the maximum number of fragments
one skb can hold and use.
When enabling scatter/gather and running traffic with many small messages,
the codebase uses the maximum number of fragments and may thereby violate
the maximum for certain devices.
The patch introduces a global variable as the maximum number of fragments.
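
A sketch of how such a tunable behaves, assuming a range check like the
one proc_dointvec_minmax applies with extra1 = 1 and extra2 =
MAX_SKB_FRAGS. TOY_MAX_SKB_FRAGS is a hypothetical compile-time limit
(17 corresponds to 65536/PAGE_SIZE + 1 with 4 KiB pages), and the
toy_* functions are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_SKB_FRAGS 17	/* assumed compile-time upper bound */

static int toy_sysctl_max_skb_frags = TOY_MAX_SKB_FRAGS;

/* Accept only values in [1, TOY_MAX_SKB_FRAGS]; reject the rest. */
static int toy_set_max_skb_frags(int val)
{
	if (val < 1 || val > TOY_MAX_SKB_FRAGS)
		return -EINVAL;
	toy_sysctl_max_skb_frags = val;
	return 0;
}

/* Mirrors the tcp.c test: start a new segment once the skb already
 * holds the configured number of fragments and cannot coalesce.
 */
static bool toy_need_new_segment(int nr_frags, bool can_coalesce)
{
	return !can_coalesce && nr_frags >= toy_sysctl_max_skb_frags;
}
```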

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/linux/skbuff.h     |  1 +
 net/core/skbuff.c          |  2 ++
 net/core/sysctl_net_core.c | 10 ++++++++++
 net/ipv4/tcp.c             |  4 ++--
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1f17abe..6633b0c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -203,6 +203,7 @@ struct sk_buff;
 #else
 #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
 #endif
+extern int sysctl_max_skb_frags;
 
 typedef struct skb_frag_struct skb_frag_t;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2e5fcda..c9793c6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -79,6 +79,8 @@
 
 struct kmem_cache *skbuff_head_cache __read_mostly;
 static struct kmem_cache *skbuff_fclone_cache __read_mostly;
+int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
+EXPORT_SYMBOL(sysctl_max_skb_frags);
 
 /**
  *	skb_panic - private function for out-of-line support
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 95b6139..a6beb7b 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -26,6 +26,7 @@ static int zero = 0;
 static int one = 1;
 static int min_sndbuf = SOCK_MIN_SNDBUF;
 static int min_rcvbuf = SOCK_MIN_RCVBUF;
+static int max_skb_frags = MAX_SKB_FRAGS;
 
 static int net_msg_warn;	/* Unused, but still a sysctl */
 
@@ -392,6 +393,15 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "max_skb_frags",
+		.data		= &sysctl_max_skb_frags,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &max_skb_frags,
+	},
 	{ }
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b5f4f5c..19d385a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -922,7 +922,7 @@ new_segment:
 
 		i = skb_shinfo(skb)->nr_frags;
 		can_coalesce = skb_can_coalesce(skb, i, page, offset);
-		if (!can_coalesce && i >= MAX_SKB_FRAGS) {
+		if (!can_coalesce && i >= sysctl_max_skb_frags) {
 			tcp_mark_push(tp, skb);
 			goto new_segment;
 		}
@@ -1188,7 +1188,7 @@ new_segment:
 
 			if (!skb_can_coalesce(skb, i, pfrag->page,
 					      pfrag->offset)) {
-				if (i == MAX_SKB_FRAGS || !sg) {
+				if (i == sysctl_max_skb_frags || !sg) {
 					tcp_mark_push(tp, skb);
 					goto new_segment;
 				}
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (14 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net:Add sysctl_max_skb_frags Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] sctp: translate network order to host order when users get a hmacid Sasha Levin
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Siva Reddy Kallam, Michael Chan, David S. Miller, Sasha Levin

From: Siva Reddy Kallam <siva.kallam@broadcom.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit b7d987295c74500b733a0ba07f9a9bcc4074fa83 ]

tg3_tso_bug() can hit a condition where the entire tx ring is not big
enough to segment the GSO packet. For example, if MSS is very small,
gso_segs can exceed the tx ring size. When we hit this condition, it
causes a tx timeout.

tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
For DMA bugs, we can still fall back to linearizing the SKB and let the
hardware transmit the TSO packet.

This patch adds a function tg3_tso_bug_gso_check() to check if there
are enough tx descriptors for GSO before calling tg3_tso_bug().
The caller will then handle the error appropriately: drop or
linearize the SKB.
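
The check itself is a single comparison; a user-space sketch follows.
toy_gso_fits() is a hypothetical name, the divisor 3 is taken from the
hunk below, and the ring size used in the usage note is only an example
value.

```c
#include <stdbool.h>

/* True if a GSO packet with gso_segs segments can ever fit in a tx
 * ring configured with tx_pending descriptors.
 */
static bool toy_gso_fits(unsigned int gso_segs, unsigned int tx_pending)
{
	return gso_segs < tx_pending / 3;
}
```

For example, with tx_pending = 511 the bound is 511 / 3 = 170 (integer
division), so a packet with 169 segments passes and one with 170 does
not.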

v2: Corrected patch description to avoid confusion.

Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 0d8af5b..d541520 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7833,6 +7833,14 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
 	return ret;
 }
 
+static bool tg3_tso_bug_gso_check(struct tg3_napi *tnapi, struct sk_buff *skb)
+{
+	/* Check if we will never have enough descriptors,
+	 * as gso_segs can be more than current ring size
+	 */
+	return skb_shinfo(skb)->gso_segs < tnapi->tx_pending / 3;
+}
+
 static netdev_tx_t tg3_start_xmit(struct sk_buff *, struct net_device *);
 
 /* Use GSO to workaround all TSO packets that meet HW bug conditions
@@ -7936,14 +7944,19 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		 * vlan encapsulated.
 		 */
 		if (skb->protocol == htons(ETH_P_8021Q) ||
-		    skb->protocol == htons(ETH_P_8021AD))
-			return tg3_tso_bug(tp, tnapi, txq, skb);
+		    skb->protocol == htons(ETH_P_8021AD)) {
+			if (tg3_tso_bug_gso_check(tnapi, skb))
+				return tg3_tso_bug(tp, tnapi, txq, skb);
+			goto drop;
+		}
 
 		if (!skb_is_gso_v6(skb)) {
 			if (unlikely((ETH_HLEN + hdr_len) > 80) &&
-			    tg3_flag(tp, TSO_BUG))
-				return tg3_tso_bug(tp, tnapi, txq, skb);
-
+			    tg3_flag(tp, TSO_BUG)) {
+				if (tg3_tso_bug_gso_check(tnapi, skb))
+					return tg3_tso_bug(tp, tnapi, txq, skb);
+				goto drop;
+			}
 			ip_csum = iph->check;
 			ip_tot_len = iph->tot_len;
 			iph->check = 0;
@@ -8075,7 +8088,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (would_hit_hwbug) {
 		tg3_tx_skb_unmap(tnapi, tnapi->tx_prod, i);
 
-		if (mss) {
+		if (mss && tg3_tso_bug_gso_check(tnapi, skb)) {
 			/* If it's a TSO packet, do GSO instead of
 			 * allocating and copying to a large linear SKB
 			 */
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [added to the 4.1 stable tree] sctp: translate network order to host order when users get a hmacid
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (15 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Sasha Levin
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Xin Long, David S. Miller, Sasha Levin

From: Xin Long <lucien.xin@gmail.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 7a84bd46647ff181eb2659fdc99590e6f16e501d ]

Commit ed5a377d87dc ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte order when setting a hmacid,
but the same issue also exists when getting a hmacid.

We fix it by changing hmacids to host order when users get them with
getsockopt.
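
As a rough userspace illustration of the conversion described above (this
is not the kernel code; the helper name hmacids_to_host is hypothetical),
each identifier stored in network byte order is converted individually
before it is handed back to the caller:

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n identifiers stored in network byte
 * order into a host-order buffer, one element at a time.
 */
static void hmacids_to_host(const uint16_t *net_ids, uint16_t *host_ids,
			    size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		host_ids[i] = ntohs(net_ids[i]);
}
```

A bulk copy of the stored array would hand out network-order values, which
is why the conversion has to happen per element.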

Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/sctp/socket.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1b80f20..3c58330 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5555,6 +5555,7 @@ static int sctp_getsockopt_hmac_ident(struct sock *sk, int len,
 	struct sctp_hmac_algo_param *hmacs;
 	__u16 data_len = 0;
 	u32 num_idents;
+	int i;
 
 	if (!ep->auth_enable)
 		return -EACCES;
@@ -5572,8 +5573,12 @@ static int sctp_getsockopt_hmac_ident(struct sock *sk, int len,
 		return -EFAULT;
 	if (put_user(num_idents, &p->shmac_num_idents))
 		return -EFAULT;
-	if (copy_to_user(p->shmac_idents, hmacs->hmac_ids, data_len))
-		return -EFAULT;
+	for (i = 0; i < num_idents; i++) {
+		__u16 hmacid = ntohs(hmacs->hmac_ids[i]);
+
+		if (copy_to_user(&p->shmac_idents[i], &hmacid, sizeof(__u16)))
+			return -EFAULT;
+	}
 	return 0;
 }
 
-- 
2.5.0



* [added to the 4.1 stable tree] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (16 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] sctp: translate network order to host order when users get a hmacid Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net: Copy inner L3 and L4 headers as unaligned on GRE TEB Sasha Levin
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Alexander Duyck, David S. Miller, Sasha Levin

From: Alexander Duyck <aduyck@mirantis.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 461547f3158978c180d74484d58e82be9b8e7357, since
  we lack the flow dissector flags in this release we guard the
  flow label access using a test on 'skb' being NULL ]

This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.
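
A minimal userspace sketch of the reordering idea, under stated
assumptions: the helper name flow_label_if_wanted and the want_label flag
are hypothetical, and the memcpy-based load is a generic alignment-safe
technique rather than what the patch itself does. The point is only that
the header word is not touched unless the label is actually wanted:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the 20-bit IPv6 flow label in host order,
 * or 0 when the caller does not want it. hdr may be unaligned.
 */
static uint32_t flow_label_if_wanted(const uint8_t *hdr, int want_label)
{
	uint32_t first_word;

	if (!want_label)
		return 0;	/* header bytes are never read on this path */

	/* alignment-safe load of the first 32 bits of the IPv6 header */
	memcpy(&first_word, hdr, sizeof(first_word));
	return ntohl(first_word) & 0x000FFFFFu;
}
```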

v2:  Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
     cause us to check for the flow label.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/core/flow_dissector.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2c35c02..f96d2ca 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -113,7 +113,6 @@ ip:
 	case htons(ETH_P_IPV6): {
 		const struct ipv6hdr *iph;
 		struct ipv6hdr _iph;
-		__be32 flow_label;
 
 ipv6:
 		iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
@@ -130,8 +129,9 @@ ipv6:
 		flow->src = (__force __be32)ipv6_addr_hash(&iph->saddr);
 		flow->dst = (__force __be32)ipv6_addr_hash(&iph->daddr);
 
-		flow_label = ip6_flowlabel(iph);
-		if (flow_label) {
+		if (skb && ip6_flowlabel(iph)) {
+			__be32 flow_label = ip6_flowlabel(iph);
+
 			/* Awesome, IPv6 packet has a flow label so we can
 			 * use that to represent the ports without any
 			 * further dissection.
-- 
2.5.0



* [added to the 4.1 stable tree] net: Copy inner L3 and L4 headers as unaligned on GRE TEB
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (17 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] bpf: fix branch offset adjustment on backjumps after patching ctx expansion Sasha Levin
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Alexander Duyck, David S. Miller, Sasha Levin

From: Alexander Duyck <aduyck@mirantis.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 78565208d73ca9b654fb9a6b142214d52eeedfd1 ]

This patch corrects the unaligned accesses seen on GRE TEB tunnels when
generating hash keys.  Specifically what this patch does is make it so that
we force the use of skb_copy_bits when the GRE inner headers will be
unaligned due to NET_IP_ALIGN being a non-zero value.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/core/flow_dissector.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f96d2ca..3556791 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -233,6 +233,13 @@ ipv6:
 					return false;
 				proto = eth->h_proto;
 				nhoff += sizeof(*eth);
+
+				/* Cap headers that we access via pointers at the
+				 * end of the Ethernet header as our maximum alignment
+				 * at that point is only 2 bytes.
+				 */
+				if (NET_IP_ALIGN)
+					hlen = nhoff;
 			}
 			goto again;
 		}
-- 
2.5.0



* [added to the 4.1 stable tree] bpf: fix branch offset adjustment on backjumps after patching ctx expansion
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (18 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net: Copy inner L3 and L4 headers as unaligned on GRE TEB Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] bonding: Fix ARP monitor validation Sasha Levin
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Daniel Borkmann, David S. Miller, Sasha Levin

From: Daniel Borkmann <daniel@iogearbox.net>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a1b14d27ed0965838350f1377ff97c93ee383492 ]

When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account for the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.

Analysis on what the check in adjust_branches() is currently doing:

  /* adjust offset of jmps if necessary */
  if (i < pos && i + insn->off + 1 > pos)
    insn->off += delta;
  else if (i > pos && i + insn->off + 1 < pos)
    insn->off -= delta;

First condition (forward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] <--- i/insn            insns[1] <--- i/insn
  insns[2] <--- pos               insns[P] <--- pos
  insns[3]                        insns[P]  `------| delta
  insns[4] <--- target_X          insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] <--- target_X
                                  insns[5]

First case is if we cross pos-boundary and the jump instruction was
before pos. This is handled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second half of that
check, i + insn->off + 1 == pos means we jump to that newly patched
instruction, so no offset adjustment is needed. That part is correct.

Second condition (backward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] <--- target_X          insns[1] <--- target_X
  insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
  insns[3]                        insns[P]  `------| delta
  insns[4] <--- i/insn            insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] <--- i/insn
                                  insns[5]

Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and indicate a bug somewhere in the patchlet, so the first
condition checking i > pos is okay by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted to use pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.
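
The analysis above can be replayed in a small userspace sketch. The struct
sketch_insn type and its is_jmp flag are hypothetical simplifications (the
real code inspects the eBPF opcode instead); only the two comparisons
mirror the corrected check:

```c
/* Hypothetical, simplified instruction: only what the check needs. */
struct sketch_insn {
	int is_jmp;	/* non-zero if this is a relative jump */
	int off;	/* jump offset, relative to the next instruction */
};

/* insns[] is the program after patching: the instruction at pos was
 * replaced by delta + 1 instructions, so everything behind it moved
 * down by delta slots.
 */
static void sketch_adjust_branches(struct sketch_insn *insns, int len,
				   int pos, int delta)
{
	int i;

	for (i = 0; i < len; i++) {
		struct sketch_insn *insn = &insns[i];

		if (!insn->is_jmp)
			continue;

		if (i < pos && i + insn->off + 1 > pos)
			insn->off += delta;	/* forward jump across pos */
		else if (i > pos + delta && i + insn->off + 1 <= pos + delta)
			insn->off -= delta;	/* backward jump across pos */
	}
}
```

With pos = 2 and delta = 2, a back jump that sat at index 4 with off = -3
(target_Y) is found at index 6 after patching and still points at index 4,
which is pos + delta. The corrected test matches it and moves the offset
to -5, so it lands on pos again.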

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 141d562..6582410 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1944,7 +1944,7 @@ static void adjust_branches(struct bpf_prog *prog, int pos, int delta)
 		/* adjust offset of jmps if necessary */
 		if (i < pos && i + insn->off + 1 > pos)
 			insn->off += delta;
-		else if (i > pos && i + insn->off + 1 < pos)
+		else if (i > pos + delta && i + insn->off + 1 <= pos + delta)
 			insn->off -= delta;
 	}
 }
-- 
2.5.0


* [added to the 4.1 stable tree] bonding: Fix ARP monitor validation
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (19 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] bpf: fix branch offset adjustment on backjumps after patching ctx expansion Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] ipv4: fix memory leaks in ip_cmsg_send() callers Sasha Levin
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, David S. Miller,
	Sasha Levin

From: Jay Vosburgh <jay.vosburgh@canonical.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 21a75f0915dde8674708b39abfcda113911c49b1 ]

The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.

	This logic can fail if there is no currently active slave.  In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave.  The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.

	This can arise if the bonding slaves and switch are working, but
the ARP target is not responding.  When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave.  This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) is met.

	Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.

	This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.
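
	The resulting decision order can be sketched in userspace as
follows.  All names here (struct arp_rx_state, classify_arp and the enum
values) are hypothetical, and the boolean inputs stand in for the checks
the driver performs on its own state; this only illustrates the ordering
of cases (a), (b) and the additional case described above:

```c
#include <stdbool.h>

enum arp_action {
	ARP_IGNORE,
	ARP_VALIDATE,		/* validate with sip/tip as received */
	ARP_VALIDATE_SWAPPED,	/* validate a broadcast request, tip/sip swapped */
};

/* Hypothetical snapshot of the state consulted for one received ARP. */
struct arp_rx_state {
	bool rx_slave_active;		/* receiving slave is "active" */
	bool have_active_slave;		/* a currently active slave exists */
	bool active_rx_since_link_up;	/* it has seen an ARP since link up */
	bool have_arp_slave;		/* a current ARP slave exists */
	bool is_reply;			/* the received ARP is a reply */
	bool arp_slave_sent_recently;	/* probe sent in the prior interval */
};

static enum arp_action classify_arp(const struct arp_rx_state *s)
{
	if (s->rx_slave_active)
		return ARP_VALIDATE;			/* case (a) */
	if (s->have_active_slave && s->active_rx_since_link_up)
		return ARP_VALIDATE_SWAPPED;		/* case (b) */
	if (s->have_arp_slave && s->is_reply && s->arp_slave_sent_recently)
		return ARP_VALIDATE;			/* additional case */
	return ARP_IGNORE;
}
```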

Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/bonding/bond_main.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 72ba774..bd744e3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -214,6 +214,8 @@ static void bond_uninit(struct net_device *bond_dev);
 static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
 						struct rtnl_link_stats64 *stats);
 static void bond_slave_arr_handler(struct work_struct *work);
+static bool bond_time_in_interval(struct bonding *bond, unsigned long last_act,
+				  int mod);
 
 /*---------------------------- General routines -----------------------------*/
 
@@ -2397,7 +2399,7 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		 struct slave *slave)
 {
 	struct arphdr *arp = (struct arphdr *)skb->data;
-	struct slave *curr_active_slave;
+	struct slave *curr_active_slave, *curr_arp_slave;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
 	int alen, is_arp = skb->protocol == __cpu_to_be16(ETH_P_ARP);
@@ -2444,26 +2446,41 @@ int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		     &sip, &tip);
 
 	curr_active_slave = rcu_dereference(bond->curr_active_slave);
+	curr_arp_slave = rcu_dereference(bond->current_arp_slave);
 
-	/* Backup slaves won't see the ARP reply, but do come through
-	 * here for each ARP probe (so we swap the sip/tip to validate
-	 * the probe).  In a "redundant switch, common router" type of
-	 * configuration, the ARP probe will (hopefully) travel from
-	 * the active, through one switch, the router, then the other
-	 * switch before reaching the backup.
+	/* We 'trust' the received ARP enough to validate it if:
+	 *
+	 * (a) the slave receiving the ARP is active (which includes the
+	 * current ARP slave, if any), or
+	 *
+	 * (b) the receiving slave isn't active, but there is a currently
+	 * active slave and it received valid arp reply(s) after it became
+	 * the currently active slave, or
+	 *
+	 * (c) there is an ARP slave that sent an ARP during the prior ARP
+	 * interval, and we receive an ARP reply on any slave.  We accept
+	 * these because switch FDB update delays may deliver the ARP
+	 * reply to a slave other than the sender of the ARP request.
 	 *
-	 * We 'trust' the arp requests if there is an active slave and
-	 * it received valid arp reply(s) after it became active. This
-	 * is done to avoid endless looping when we can't reach the
+	 * Note: for (b), backup slaves are receiving the broadcast ARP
+	 * request, not a reply.  This request passes from the sending
+	 * slave through the L2 switch(es) to the receiving slave.  Since
+	 * this is checking the request, sip/tip are swapped for
+	 * validation.
+	 *
+	 * This is done to avoid endless looping when we can't reach the
 	 * arp_ip_target and fool ourselves with our own arp requests.
 	 */
-
 	if (bond_is_active_slave(slave))
 		bond_validate_arp(bond, slave, sip, tip);
 	else if (curr_active_slave &&
 		 time_after(slave_last_rx(bond, curr_active_slave),
 			    curr_active_slave->last_link_up))
 		bond_validate_arp(bond, slave, tip, sip);
+	else if (curr_arp_slave && (arp->ar_op == htons(ARPOP_REPLY)) &&
+		 bond_time_in_interval(bond,
+				       dev_trans_start(curr_arp_slave->dev), 1))
+		bond_validate_arp(bond, slave, sip, tip);
 
 out_unlock:
 	if (arp != (struct arphdr *)skb->data)
-- 
2.5.0



* [added to the 4.1 stable tree] ipv4: fix memory leaks in ip_cmsg_send() callers
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (20 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] bonding: Fix ARP monitor validation Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] af_unix: Guard against other == sk in unix_dgram_sendmsg Sasha Levin
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Eric Dumazet, David S. Miller, Sasha Levin

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 919483096bfe75dda338e98d56da91a263746a0a ]

Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.

Callers are responsible for the freeing.
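
A minimal userspace sketch of that ownership rule, assuming hypothetical
names (struct cookie, parse_opts, send_with_opts) and plain malloc/free in
place of the kernel allocators: the callee may allocate into the cookie
and still fail, so the caller frees on the error path as well.

```c
#include <errno.h>
#include <stdlib.h>

struct cookie {
	void *opt;
};

/* Hypothetical callee: may allocate c->opt and still return an error. */
static int parse_opts(struct cookie *c, int fail_after_alloc)
{
	c->opt = malloc(40);
	if (!c->opt)
		return -ENOMEM;
	if (fail_after_alloc)
		return -EINVAL;	/* c->opt is left for the caller to free */
	return 0;
}

/* Hypothetical caller: owns c.opt on success and on failure. */
static int send_with_opts(int fail_after_alloc)
{
	struct cookie c = { .opt = NULL };
	int err = parse_opts(&c, fail_after_alloc);

	if (err) {
		free(c.opt);	/* free(NULL) is a no-op */
		return err;
	}

	/* ... the options would be used here ... */
	free(c.opt);
	return 0;
}
```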

Many thanks to Dmitry for the report and diagnostic.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv4/ip_sockglue.c | 2 ++
 net/ipv4/ping.c        | 4 +++-
 net/ipv4/raw.c         | 4 +++-
 net/ipv4/udp.c         | 4 +++-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 6ddde89..b6c7bde 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -249,6 +249,8 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc,
 		switch (cmsg->cmsg_type) {
 		case IP_RETOPTS:
 			err = cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr));
+
+			/* Our caller is responsible for freeing ipc->opt */
 			err = ip_options_get(net, &ipc->opt, CMSG_DATA(cmsg),
 					     err < 40 ? err : 40);
 			if (err)
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 05ff44b..f6ee0d5 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -745,8 +745,10 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	if (msg->msg_controllen) {
 		err = ip_cmsg_send(sock_net(sk), msg, &ipc, false);
-		if (err)
+		if (unlikely(err)) {
+			kfree(ipc.opt);
 			return err;
+		}
 		if (ipc.opt)
 			free = 1;
 	}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 561cd4b..c77aac7 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -543,8 +543,10 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	if (msg->msg_controllen) {
 		err = ip_cmsg_send(sock_net(sk), msg, &ipc, false);
-		if (err)
+		if (unlikely(err)) {
+			kfree(ipc.opt);
 			goto out;
+		}
 		if (ipc.opt)
 			free = 1;
 	}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1b8c5ba..a390174 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -963,8 +963,10 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (msg->msg_controllen) {
 		err = ip_cmsg_send(sock_net(sk), msg, &ipc,
 				   sk->sk_family == AF_INET6);
-		if (err)
+		if (unlikely(err)) {
+			kfree(ipc.opt);
 			return err;
+		}
 		if (ipc.opt)
 			free = 1;
 		connected = 0;
-- 
2.5.0



* [added to the 4.1 stable tree] af_unix: Guard against other == sk in unix_dgram_sendmsg
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (21 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] ipv4: fix memory leaks in ip_cmsg_send() callers Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] qmi_wwan: add "4G LTE usb-modem U901" Sasha Levin
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Rainer Weikusat, David S. Miller, Sasha Levin

From: Rainer Weikusat <rweikusat@mobileactivedefense.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a5527dda344fff0514b7989ef7a755729769daa1 ]

The unix_dgram_sendmsg routine uses the following test

if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {

to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct, as the
specified address could have been bound to the sending socket itself, or
this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk, which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add an other != sk
check to guard against this.
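
The guarded predicate can be sketched in userspace as below. The struct
sketch_sock type and must_wait_for_peer are hypothetical stand-ins for the
kernel's socket state and helpers; only the shape of the condition is
taken from the description above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified socket: just a peer pointer and a queue flag. */
struct sketch_sock {
	struct sketch_sock *peer;
	bool recvq_full;
};

/* True only for a real n:1 sender facing a full receive queue.  A socket
 * sending to itself never takes this path, whatever its peer pointer is.
 */
static bool must_wait_for_peer(const struct sketch_sock *sk,
			       const struct sketch_sock *other)
{
	return other != sk && other->peer != sk && other->recvq_full;
}
```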

Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/unix/af_unix.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d644042..535a642 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1714,7 +1714,12 @@ restart_locked:
 			goto out_unlock;
 	}
 
-	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+	/* other == sk && unix_peer(other) != sk if
+	 * - unix_peer(sk) == NULL, destination address bound to sk
+	 * - unix_peer(sk) == sk by time of get but disconnected before lock
+	 */
+	if (other != sk &&
+	    unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
 		if (timeo) {
 			timeo = unix_wait_for_peer(other, timeo);
 
-- 
2.5.0



* [added to the 4.1 stable tree] qmi_wwan: add "4G LTE usb-modem U901"
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (22 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] af_unix: Guard against other == sk in unix_dgram_sendmsg Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Count HW buffer overrun only once Sasha Levin
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Bjørn Mork, David S. Miller, Sasha Levin

From: Bjørn Mork <bjorn@mork.no>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit aac8d3c282e024c344c5b86dc1eab7af88bb9716 ]

Thomas reports:

T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=05c6 ProdID=6001 Rev=00.00
S:  Manufacturer=USB Modem
S:  Product=USB Modem
S:  SerialNumber=1234567890ABCDEF
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage

Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 71190dc..cffb252 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -542,6 +542,7 @@ static const struct usb_device_id products[] = {
 
 	/* 3. Combined interface devices matching on interface number */
 	{QMI_FIXED_INTF(0x0408, 0xea42, 4)},	/* Yota / Megafon M100-1 */
+	{QMI_FIXED_INTF(0x05c6, 0x6001, 3)},	/* 4G LTE usb-modem U901 */
 	{QMI_FIXED_INTF(0x05c6, 0x7000, 0)},
 	{QMI_FIXED_INTF(0x05c6, 0x7001, 1)},
 	{QMI_FIXED_INTF(0x05c6, 0x7002, 1)},
-- 
2.5.0



* [added to the 4.1 stable tree] net/mlx4_en: Count HW buffer overrun only once
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (23 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] qmi_wwan: add "4G LTE usb-modem U901" Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Choose time-stamping shift value according to HW frequency Sasha Levin
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Amir Vadai, Eugenia Emantayev, Or Gerlitz, David S. Miller,
	Sasha Levin

From: Amir Vadai <amir@vadai.me>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 281e8b2fdf8e4ef366b899453cae50e09b577ada ]

RdropOvflw counts overruns of the HW buffer and therefore should
be used for rx_fifo_errors only.

Currently the RdropOvflw counter is mistakenly also copied into
rx_missed_errors and rx_over_errors, which makes the device's
total dropped packets accounting show wrong results.

Fix that. Use it for rx_fifo_errors only.

Fixes: c27a02cd94d6 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_port.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 0a56f01..760a8b3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -223,11 +223,11 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 	stats->collisions = 0;
 	stats->rx_dropped = be32_to_cpu(mlx4_en_stats->RDROP);
 	stats->rx_length_errors = be32_to_cpu(mlx4_en_stats->RdropLength);
-	stats->rx_over_errors = be32_to_cpu(mlx4_en_stats->RdropOvflw);
+	stats->rx_over_errors = 0;
 	stats->rx_crc_errors = be32_to_cpu(mlx4_en_stats->RCRC);
 	stats->rx_frame_errors = 0;
 	stats->rx_fifo_errors = be32_to_cpu(mlx4_en_stats->RdropOvflw);
-	stats->rx_missed_errors = be32_to_cpu(mlx4_en_stats->RdropOvflw);
+	stats->rx_missed_errors = 0;
 	stats->tx_aborted_errors = 0;
 	stats->tx_carrier_errors = 0;
 	stats->tx_fifo_errors = 0;
-- 
2.5.0



* [added to the 4.1 stable tree] net/mlx4_en: Choose time-stamping shift value according to HW frequency
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (24 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Count HW buffer overrun only once Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Avoid changing dev->features directly in run-time Sasha Levin
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Eugenia Emantayev, Or Gerlitz, David S. Miller, Sasha Levin

From: Eugenia Emantayev <eugenia@mellanox.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 31c128b66e5b28f468076e4f3ca3025c35342041 ]

Previously, the shift value used for time-stamping was constant and didn't
depend on the HW chip frequency. Change that to take the frequency into account
and calculate the maximal value in cycles per wraparound of ten seconds. This
time slot was chosen since it gives a good accuracy in time synchronization.

Algorithm for shift value calculation:
 * Round up the maximal value in cycles to nearest power of two

 * Calculate maximal multiplier by division of all 64 bits set
   to above result

 * Then, invert the function clocksource_khz2mult() to get the shift from
   maximal mult value
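
The steps above can be tried out with a userspace sketch. The helpers
sketch_ilog2, sketch_is_pow2 and sketch_roundup_pow2 are hypothetical
stand-ins for the kernel's ilog2(), is_power_of_2() and
roundup_pow_of_two(), and 64-bit arithmetic is used throughout for
simplicity; frequencies are assumed to be given in MHz:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WRAP_AROUND_SEC 10ULL

/* floor(log2(v)); returns -1 for v == 0 */
static int sketch_ilog2(uint64_t v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

static bool sketch_is_pow2(uint64_t v)
{
	return v && !(v & (v - 1));
}

/* smallest power of two >= v (v is assumed to be well below 2^63) */
static uint64_t sketch_roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

static uint32_t sketch_freq_to_shift(uint16_t freq_mhz)
{
	uint64_t freq_khz = (uint64_t)freq_mhz * 1000;
	/* cycles elapsed in one wrap-around window */
	uint64_t max_cycles = freq_khz * 1000 * SKETCH_WRAP_AROUND_SEC;
	/* round up to an all-ones value, i.e. a power of two minus one */
	uint64_t rounded = sketch_is_pow2(max_cycles + 1) ?
		max_cycles : sketch_roundup_pow2(max_cycles) - 1;
	/* largest multiplier that keeps cycles * mult within 64 bits */
	uint64_t max_mul = UINT64_MAX / rounded;

	/* invert mult = (10^6 << shift) / freq_khz to recover the shift */
	return (uint32_t)sketch_ilog2(max_mul * freq_khz / 1000000);
}
```

For a 427 MHz clock the window holds 4.27e9 cycles, which rounds up to
2^32 - 1; the maximal multiplier is then 2^32 + 1, and the sketch returns
a shift of 30.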

Fixes: ec693d47010e ('net/mlx4_en: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 8a083d7..dae2ebb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -236,6 +236,24 @@ static const struct ptp_clock_info mlx4_en_ptp_clock_info = {
 	.enable		= mlx4_en_phc_enable,
 };
 
+#define MLX4_EN_WRAP_AROUND_SEC	10ULL
+
+/* This function calculates the max shift that enables the user range
+ * of MLX4_EN_WRAP_AROUND_SEC values in the cycles register.
+ */
+static u32 freq_to_shift(u16 freq)
+{
+	u32 freq_khz = freq * 1000;
+	u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
+	u64 max_val_cycles_rounded = is_power_of_2(max_val_cycles + 1) ?
+		max_val_cycles : roundup_pow_of_two(max_val_cycles) - 1;
+	/* calculate max possible multiplier in order to fit in 64bit */
+	u64 max_mul = div_u64(0xffffffffffffffffULL, max_val_cycles_rounded);
+
+	/* This comes from the reverse of clocksource_khz2mult */
+	return ilog2(div_u64(max_mul * freq_khz, 1000000));
+}
+
 void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
 {
 	struct mlx4_dev *dev = mdev->dev;
@@ -247,12 +265,7 @@ void mlx4_en_init_timestamp(struct mlx4_en_dev *mdev)
 	memset(&mdev->cycles, 0, sizeof(mdev->cycles));
 	mdev->cycles.read = mlx4_en_read_clock;
 	mdev->cycles.mask = CLOCKSOURCE_MASK(48);
-	/* Using shift to make calculation more accurate. Since current HW
-	 * clock frequency is 427 MHz, and cycles are given using a 48 bits
-	 * register, the biggest shift when calculating using u64, is 14
-	 * (max_cycles * multiplier < 2^64)
-	 */
-	mdev->cycles.shift = 14;
+	mdev->cycles.shift = freq_to_shift(dev->caps.hca_core_clock);
 	mdev->cycles.mult =
 		clocksource_khz2mult(1000 * dev->caps.hca_core_clock, mdev->cycles.shift);
 	mdev->nominal_c_mult = mdev->cycles.mult;
-- 
2.5.0



* [added to the 4.1 stable tree] net/mlx4_en: Avoid changing dev->features directly in run-time
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (25 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Choose time-stamping shift value according to HW frequency Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] l2tp: Fix error creating L2TP tunnels Sasha Levin
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Eugenia Emantayev, Or Gerlitz, David S. Miller, Sasha Levin

From: Eugenia Emantayev <eugenia@mellanox.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 925ab1aa9394bbaeac47ee5b65d3fdf0fb8135cf ]

It's forbidden to manually change dev->features in run-time. Currently, this is
done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when
VXLAN tunnel is set. However, since the stack actually does features intersection
with hw_enc_features, we can safely revert to advertizing features early when
registering the netdevice.

Fixes: f4a1edd56120 ('net/mlx4_en: Advertize encapsulation offloads [...]')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index a5a0b84..e918959 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2330,8 +2330,6 @@ out:
 	/* set offloads */
 	priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
 				      NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
-	priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
-	priv->dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
 }
 
 static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
@@ -2342,8 +2340,6 @@ static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
 	/* unset offloads */
 	priv->dev->hw_enc_features &= ~(NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
 				      NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL);
-	priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
-	priv->dev->features    &= ~NETIF_F_GSO_UDP_TUNNEL;
 
 	ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
 				  VXLAN_STEER_BY_OUTER_MAC, 0);
@@ -2940,6 +2936,11 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 		priv->rss_hash_fn = ETH_RSS_HASH_TOP;
 	}
 
+	if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
+		dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+		dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
+	}
+
 	mdev->pndev[port] = dev;
 	mdev->upper[port] = NULL;
 
-- 
2.5.0



* [added to the 4.1 stable tree] l2tp: Fix error creating L2TP tunnels
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (26 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Avoid changing dev->features directly in run-time Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] pppoe: fix reference counting in PPPoE proxy Sasha Levin
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Mark Tomlinson, David S. Miller, Sasha Levin

From: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 853effc55b0f975abd6d318cca486a9c1b67e10f ]

A previous commit (33f72e6) added notification via netlink for tunnels
when created/modified/deleted. If the notification returned an error,
this error was returned from the tunnel function. If there were no
listeners, the error code ESRCH was returned, even though having no
listeners is not an error. Other calls to this and other similar
notification functions either ignore the error code, or filter ESRCH.
This patch checks for ESRCH and does not flag this as an error.
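
[ Illustration only, not part of the upstream commit. The filtering described
above amounts to the pattern below; filter_no_listeners() is a hypothetical
helper name, and the patch itself open-codes the check after
genlmsg_multicast_allns(). ]

```c
#include <errno.h>

/* Hypothetical helper: treat "nobody is listening" as success and pass
 * every other return value through unchanged.
 */
static int filter_no_listeners(int ret)
{
	return ret == -ESRCH ? 0 : ret;
}
```

[ Real failures such as -ENOMEM still propagate to the caller; only the
no-listener case is mapped to 0. ]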

Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/l2tp/l2tp_netlink.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 9e13c2f..fe92a08 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -124,8 +124,13 @@ static int l2tp_tunnel_notify(struct genl_family *family,
 	ret = l2tp_nl_tunnel_send(msg, info->snd_portid, info->snd_seq,
 				  NLM_F_ACK, tunnel, cmd);
 
-	if (ret >= 0)
-		return genlmsg_multicast_allns(family, msg, 0,	0, GFP_ATOMIC);
+	if (ret >= 0) {
+		ret = genlmsg_multicast_allns(family, msg, 0, 0, GFP_ATOMIC);
+		/* We don't care if no one is listening */
+		if (ret == -ESRCH)
+			ret = 0;
+		return ret;
+	}
 
 	nlmsg_free(msg);
 
@@ -147,8 +152,13 @@ static int l2tp_session_notify(struct genl_family *family,
 	ret = l2tp_nl_session_send(msg, info->snd_portid, info->snd_seq,
 				   NLM_F_ACK, session, cmd);
 
-	if (ret >= 0)
-		return genlmsg_multicast_allns(family, msg, 0,	0, GFP_ATOMIC);
+	if (ret >= 0) {
+		ret = genlmsg_multicast_allns(family, msg, 0, 0, GFP_ATOMIC);
+		/* We don't care if no one is listening */
+		if (ret == -ESRCH)
+			ret = 0;
+		return ret;
+	}
 
 	nlmsg_free(msg);
 
-- 
2.5.0



* [added to the 4.1 stable tree] pppoe: fix reference counting in PPPoE proxy
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (27 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] l2tp: Fix error creating L2TP tunnels Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] route: check and remove route cache when we get route Sasha Levin
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Guillaume Nault, David S. Miller, Sasha Levin

From: Guillaume Nault <g.nault@alphalink.fr>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 29e73269aa4d36f92b35610c25f8b01c789b0dc8 ]

Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ppp/pppoe.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 9c8fabe..d1c4bc1 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -395,6 +395,8 @@ static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
 
 		if (!__pppoe_xmit(sk_pppox(relay_po), skb))
 			goto abort_put;
+
+		sock_put(sk_pppox(relay_po));
 	} else {
 		if (sock_queue_rcv_skb(sk, skb))
 			goto abort_kfree;
-- 
2.5.0



* [added to the 4.1 stable tree] route: check and remove route cache when we get route
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (28 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] pppoe: fix reference counting in PPPoE proxy Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] rtnl: RTM_GETNETCONF: fix wrong return value Sasha Levin
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Xin Long, David S. Miller, Sasha Levin

From: Xin Long <lucien.xin@gmail.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit deed49df7390d5239024199e249190328f1651e7 ]

Since the gc of ipv4 routes was removed, a cached route has no chance
of being removed, and even after it has timed out it can still be
used, because no code checks its expiry.

Fix this issue by checking and removing the route cache entry when we
get a route.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 include/net/ip_fib.h |  1 +
 net/ipv4/route.c     | 77 ++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..13f1a97 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -59,6 +59,7 @@ struct fib_nh_exception {
 	struct rtable __rcu		*fnhe_rth_input;
 	struct rtable __rcu		*fnhe_rth_output;
 	unsigned long			fnhe_stamp;
+	struct rcu_head			rcu;
 };
 
 struct fnhe_hash_bucket {
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f45f2a1..1d3cdb4d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -125,6 +125,7 @@ static int ip_rt_mtu_expires __read_mostly	= 10 * 60 * HZ;
 static int ip_rt_min_pmtu __read_mostly		= 512 + 20 + 20;
 static int ip_rt_min_advmss __read_mostly	= 256;
 
+static int ip_rt_gc_timeout __read_mostly	= RT_GC_TIMEOUT;
 /*
  *	Interface to generic destination cache.
  */
@@ -753,7 +754,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 				struct fib_nh *nh = &FIB_RES_NH(res);
 
 				update_or_create_fnhe(nh, fl4->daddr, new_gw,
-						      0, 0);
+						0, jiffies + ip_rt_gc_timeout);
 			}
 			if (kill_route)
 				rt->dst.obsolete = DST_OBSOLETE_KILL;
@@ -1538,6 +1539,36 @@ static void ip_handle_martian_source(struct net_device *dev,
 #endif
 }
 
+static void ip_del_fnhe(struct fib_nh *nh, __be32 daddr)
+{
+	struct fnhe_hash_bucket *hash;
+	struct fib_nh_exception *fnhe, __rcu **fnhe_p;
+	u32 hval = fnhe_hashfun(daddr);
+
+	spin_lock_bh(&fnhe_lock);
+
+	hash = rcu_dereference_protected(nh->nh_exceptions,
+					 lockdep_is_held(&fnhe_lock));
+	hash += hval;
+
+	fnhe_p = &hash->chain;
+	fnhe = rcu_dereference_protected(*fnhe_p, lockdep_is_held(&fnhe_lock));
+	while (fnhe) {
+		if (fnhe->fnhe_daddr == daddr) {
+			rcu_assign_pointer(*fnhe_p, rcu_dereference_protected(
+				fnhe->fnhe_next, lockdep_is_held(&fnhe_lock)));
+			fnhe_flush_routes(fnhe);
+			kfree_rcu(fnhe, rcu);
+			break;
+		}
+		fnhe_p = &fnhe->fnhe_next;
+		fnhe = rcu_dereference_protected(fnhe->fnhe_next,
+						 lockdep_is_held(&fnhe_lock));
+	}
+
+	spin_unlock_bh(&fnhe_lock);
+}
+
 /* called in rcu_read_lock() section */
 static int __mkroute_input(struct sk_buff *skb,
 			   const struct fib_result *res,
@@ -1592,11 +1623,20 @@ static int __mkroute_input(struct sk_buff *skb,
 
 	fnhe = find_exception(&FIB_RES_NH(*res), daddr);
 	if (do_cache) {
-		if (fnhe)
+		if (fnhe) {
 			rth = rcu_dereference(fnhe->fnhe_rth_input);
-		else
-			rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
+			if (rth && rth->dst.expires &&
+			    time_after(jiffies, rth->dst.expires)) {
+				ip_del_fnhe(&FIB_RES_NH(*res), daddr);
+				fnhe = NULL;
+			} else {
+				goto rt_cache;
+			}
+		}
+
+		rth = rcu_dereference(FIB_RES_NH(*res).nh_rth_input);
 
+rt_cache:
 		if (rt_cache_valid(rth)) {
 			skb_dst_set_noref(skb, &rth->dst);
 			goto out;
@@ -1945,19 +1985,29 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		struct fib_nh *nh = &FIB_RES_NH(*res);
 
 		fnhe = find_exception(nh, fl4->daddr);
-		if (fnhe)
+		if (fnhe) {
 			prth = &fnhe->fnhe_rth_output;
-		else {
-			if (unlikely(fl4->flowi4_flags &
-				     FLOWI_FLAG_KNOWN_NH &&
-				     !(nh->nh_gw &&
-				       nh->nh_scope == RT_SCOPE_LINK))) {
-				do_cache = false;
-				goto add;
+			rth = rcu_dereference(*prth);
+			if (rth && rth->dst.expires &&
+			    time_after(jiffies, rth->dst.expires)) {
+				ip_del_fnhe(nh, fl4->daddr);
+				fnhe = NULL;
+			} else {
+				goto rt_cache;
 			}
-			prth = raw_cpu_ptr(nh->nh_pcpu_rth_output);
 		}
+
+		if (unlikely(fl4->flowi4_flags &
+			     FLOWI_FLAG_KNOWN_NH &&
+			     !(nh->nh_gw &&
+			       nh->nh_scope == RT_SCOPE_LINK))) {
+			do_cache = false;
+			goto add;
+		}
+		prth = raw_cpu_ptr(nh->nh_pcpu_rth_output);
 		rth = rcu_dereference(*prth);
+
+rt_cache:
 		if (rt_cache_valid(rth)) {
 			dst_hold(&rth->dst);
 			return rth;
@@ -2504,7 +2554,6 @@ void ip_rt_multicast_event(struct in_device *in_dev)
 }
 
 #ifdef CONFIG_SYSCTL
-static int ip_rt_gc_timeout __read_mostly	= RT_GC_TIMEOUT;
 static int ip_rt_gc_interval __read_mostly  = 60 * HZ;
 static int ip_rt_gc_min_interval __read_mostly	= HZ / 2;
 static int ip_rt_gc_elasticity __read_mostly	= 8;
-- 
2.5.0



* [added to the 4.1 stable tree] rtnl: RTM_GETNETCONF: fix wrong return value
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (29 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] route: check and remove route cache when we get route Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] unix_diag: fix incorrect sign extension in unix_lookup_by_ino Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] sctp: Fix port hash table size computation Sasha Levin
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Anton Protopopov, David S. Miller, Sasha Levin

From: Anton Protopopov <a.s.protopopov@gmail.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a97eb33ff225f34a8124774b3373fd244f0e83ce ]

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/ipv4/devinet.c  | 2 +-
 net/ipv6/addrconf.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 419d23c..280d46f 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1839,7 +1839,7 @@ static int inet_netconf_get_devconf(struct sk_buff *in_skb,
 	if (err < 0)
 		goto errout;
 
-	err = EINVAL;
+	err = -EINVAL;
 	if (!tb[NETCONFA_IFINDEX])
 		goto errout;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index fcfbd05..f555f4f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -569,7 +569,7 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
 	if (err < 0)
 		goto errout;
 
-	err = EINVAL;
+	err = -EINVAL;
 	if (!tb[NETCONFA_IFINDEX])
 		goto errout;
 
-- 
2.5.0



* [added to the 4.1 stable tree] unix_diag: fix incorrect sign extension in unix_lookup_by_ino
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (30 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] rtnl: RTM_GETNETCONF: fix wrong return value Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  2016-03-02 20:24 ` [added to the 4.1 stable tree] sctp: Fix port hash table size computation Sasha Levin
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits; +Cc: Dmitry V. Levin, David S. Miller, Sasha Levin

From: "Dmitry V. Levin" <ldv@altlinux.org>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit b5f0549231ffb025337be5a625b0ff9f52b016f0 ]

The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with the sock_i_ino return value of type
unsigned long, ino is sign extended to signed long, and this results
in an incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.
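
[ Illustration only, not part of the upstream commit. The mismatch can be
reproduced in userspace as sketched below. Both function names are
hypothetical, uint64_t stands in for the 64-bit unsigned long returned by
sock_i_ino(), and the cast to int assumes the usual two's-complement
wrapping behaviour of common compilers. ]

```c
#include <stdbool.h>
#include <stdint.h>

/* Pre-fix shape: the inode number travels through a signed int. */
static bool ino_matches_signed(uint32_t requested, uint64_t sock_ino)
{
	int ino = (int)requested;	/* above INT_MAX this turns negative */

	/* Same conversion the compiler applies implicitly in the
	 * comparison: the negative value is sign extended to 64 bits.
	 */
	return (uint64_t)ino == sock_ino;
}

/* Post-fix shape: an unsigned parameter is zero extended instead. */
static bool ino_matches_unsigned(uint32_t requested, uint64_t sock_ino)
{
	unsigned int ino = requested;

	return (uint64_t)ino == sock_ino;
}
```

[ For inode 0x80000001 the signed variant compares 0xffffffff80000001 with
0x0000000080000001 and misses the socket, while the unsigned variant
matches. ]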

This bug was found by strace test suite.

Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/unix/diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/diag.c b/net/unix/diag.c
index c512f64..4d96797 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -220,7 +220,7 @@ done:
 	return skb->len;
 }
 
-static struct sock *unix_lookup_by_ino(int ino)
+static struct sock *unix_lookup_by_ino(unsigned int ino)
 {
 	int i;
 	struct sock *sk;
-- 
2.5.0



* [added to the 4.1 stable tree] sctp: Fix port hash table size computation
  2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
                   ` (31 preceding siblings ...)
  2016-03-02 20:24 ` [added to the 4.1 stable tree] unix_diag: fix incorrect sign extension in unix_lookup_by_ino Sasha Levin
@ 2016-03-02 20:24 ` Sasha Levin
  32 siblings, 0 replies; 37+ messages in thread
From: Sasha Levin @ 2016-03-02 20:24 UTC (permalink / raw)
  To: stable, stable-commits
  Cc: Neil Horman, Dmitry Vyukov, Vladislav Yasevich, David S. Miller,
	Sasha Levin

From: Neil Horman <nhorman@tuxdriver.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit d9749fb5942f51555dc9ce1ac0dbb1806960a975 ]

Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires.  The root cause of
the problem is that two values need to be computed (one, the allocation
order the storage requires, as passed to __get_free_pages, and two, the
number of entries for the hash table).  Both need to be powers of two, but
for different reasons, and the existing code is simply computing one order
value, and using it as the basis for both, which is wrong (i.e. it assumes
that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still a power of two when
it's not).

To fix this, we change the logic slightly.  We start by computing a goal
allocation order (which is limited by the maximum size hash table we want
to support).  Then we attempt to allocate that size table, decreasing the
order until a successful allocation is made.  Then, with the resultant
successful order we compute the number of buckets that hash table supports,
which we then round down to the nearest power of two, giving us the number
of entries the table actually supports.
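
[ Illustration only, not part of the upstream commit. The final step, turning
a successful allocation order into a power-of-two entry count, can be
sketched as below. rounddown_pow2() is a stand-in for the kernel's
rounddown_pow_of_two(), port_hash_entries() is a hypothetical name, and the
page and bucket sizes used in the note are assumed example values. ]

```c
#include <stddef.h>

/* Largest power of two <= n; returns 0 for n == 0. */
static unsigned long rounddown_pow2(unsigned long n)
{
	unsigned long p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Number of usable hash entries for an allocation of 2^order pages. */
static unsigned long port_hash_entries(unsigned int order,
				       unsigned long page_size,
				       size_t bucket_size)
{
	/* How many buckets physically fit in the allocation ... */
	unsigned long num_entries = (1UL << order) * page_size / bucket_size;

	/* ... rounded down so the hash function can mask with (size - 1). */
	return rounddown_pow2(num_entries);
}
```

[ With order 4, 4096-byte pages and an assumed 24-byte bucket, 2730 buckets
fit but only 2048 are used; with a 16-byte bucket all 4096 are used. ]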

I've tested this locally here, using non-debug and spinlock-debug kernels,
and the number of entries in the hashtable consistently works out to be
a power of two in all cases.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Dmitry Vyukov <dvyukov@google.com>
CC: Vladislav Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 net/sctp/protocol.c | 47 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index e13c3c3..9d134ab 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -60,6 +60,8 @@
 #include <net/inet_common.h>
 #include <net/inet_ecn.h>
 
+#define MAX_SCTP_PORT_HASH_ENTRIES (64 * 1024)
+
 /* Global data structures. */
 struct sctp_globals sctp_globals __read_mostly;
 
@@ -1332,6 +1334,8 @@ static __init int sctp_init(void)
 	unsigned long limit;
 	int max_share;
 	int order;
+	int num_entries;
+	int max_entry_order;
 
 	sock_skb_cb_check_size(sizeof(struct sctp_ulpevent));
 
@@ -1384,14 +1388,24 @@ static __init int sctp_init(void)
 
 	/* Size and allocate the association hash table.
 	 * The methodology is similar to that of the tcp hash tables.
+	 * Though not identical.  Start by getting a goal size
 	 */
 	if (totalram_pages >= (128 * 1024))
 		goal = totalram_pages >> (22 - PAGE_SHIFT);
 	else
 		goal = totalram_pages >> (24 - PAGE_SHIFT);
 
-	for (order = 0; (1UL << order) < goal; order++)
-		;
+	/* Then compute the page order for said goal */
+	order = get_order(goal);
+
+	/* Now compute the required page order for the maximum sized table we
+	 * want to create
+	 */
+	max_entry_order = get_order(MAX_SCTP_PORT_HASH_ENTRIES *
+				    sizeof(struct sctp_bind_hashbucket));
+
+	/* Limit the page order by that maximum hash table size */
+	order = min(order, max_entry_order);
 
 	do {
 		sctp_assoc_hashsize = (1UL << order) * PAGE_SIZE /
@@ -1425,27 +1439,42 @@ static __init int sctp_init(void)
 		INIT_HLIST_HEAD(&sctp_ep_hashtable[i].chain);
 	}
 
-	/* Allocate and initialize the SCTP port hash table.  */
+	/* Allocate and initialize the SCTP port hash table.
+	 * Note that order is initalized to start at the max sized
+	 * table we want to support.  If we can't get that many pages
+	 * reduce the order and try again
+	 */
 	do {
-		sctp_port_hashsize = (1UL << order) * PAGE_SIZE /
-					sizeof(struct sctp_bind_hashbucket);
-		if ((sctp_port_hashsize > (64 * 1024)) && order > 0)
-			continue;
 		sctp_port_hashtable = (struct sctp_bind_hashbucket *)
 			__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, order);
 	} while (!sctp_port_hashtable && --order > 0);
+
 	if (!sctp_port_hashtable) {
 		pr_err("Failed bind hash alloc\n");
 		status = -ENOMEM;
 		goto err_bhash_alloc;
 	}
+
+	/* Now compute the number of entries that will fit in the
+	 * port hash space we allocated
+	 */
+	num_entries = (1UL << order) * PAGE_SIZE /
+		      sizeof(struct sctp_bind_hashbucket);
+
+	/* And finish by rounding it down to the nearest power of two
+	 * this wastes some memory of course, but its needed because
+	 * the hash function operates based on the assumption that
+	 * that the number of entries is a power of two
+	 */
+	sctp_port_hashsize = rounddown_pow_of_two(num_entries);
+
 	for (i = 0; i < sctp_port_hashsize; i++) {
 		spin_lock_init(&sctp_port_hashtable[i].lock);
 		INIT_HLIST_HEAD(&sctp_port_hashtable[i].chain);
 	}
 
-	pr_info("Hash tables configured (established %d bind %d)\n",
-		sctp_assoc_hashsize, sctp_port_hashsize);
+	pr_info("Hash tables configured (established %d bind %d/%d)\n",
+		sctp_assoc_hashsize, sctp_port_hashsize, num_entries);
 
 	sctp_sysctl_register();
 
-- 
2.5.0



* Re: [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications
  2016-03-02 20:23 ` [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications Sasha Levin
@ 2016-03-03  9:35   ` Ido Schimmel
  2016-03-03 17:03     ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Ido Schimmel @ 2016-03-03  9:35 UTC (permalink / raw)
  To: Sasha Levin; +Cc: stable, stable-commits, Jiri Pirko, David S. Miller

Hi Sasha,

Wed, Mar 02, 2016 at 10:23:52PM IST, sasha.levin@oracle.com wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>
>This patch has been added to the 4.1 stable tree. If you have any
>objections, please let us know.
>
>===============
>
>[ Upstream commit 4f2c6ae5c64c353fb1b0425e4747e5603feadba1 ]
>
>When switchdev drivers process FDB notifications from the underlying
>device they resolve the netdev to which the entry points to and notify
>the bridge using the switchdev notifier.
>
>However, since the RTNL mutex is not held there is nothing preventing
>the netdev from disappearing in the middle, which will cause
>br_switchdev_event() to dereference a non-existing netdev.
>
>Make switchdev drivers hold the lock at the beginning of the
>notification processing session and release it once it ends, after
>notifying the bridge.
>
>Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
>when RTNL mutex is held.

You removed the fdb_lock bits from the commit below since they aren't
present in kernel 4.1. Can you remove it from the description as well?
Should probably be:

"Also, remove switchdev_mutex, as it's no longer needed when RTNL mutex
is held."

Thanks, Ido.

>
>Fixes: 03bf0c281234 ("switchdev: introduce switchdev notifier")
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>Signed-off-by: David S. Miller <davem@davemloft.net>
>Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
>---
> drivers/net/ethernet/rocker/rocker.c |  2 ++
> net/bridge/br.c                      |  3 +--
> net/switchdev/switchdev.c            | 15 ++++++++-------
> 3 files changed, 11 insertions(+), 9 deletions(-)
>
>diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>index 73b6fc2..4fedf7f 100644
>--- a/drivers/net/ethernet/rocker/rocker.c
>+++ b/drivers/net/ethernet/rocker/rocker.c
>@@ -3384,12 +3384,14 @@ static void rocker_port_fdb_learn_work(struct work_struct *work)
> 	info.addr = lw->addr;
> 	info.vid = lw->vid;
> 
>+	rtnl_lock();
> 	if (learned && removing)
> 		call_netdev_switch_notifiers(NETDEV_SWITCH_FDB_DEL,
> 					     lw->dev, &info.info);
> 	else if (learned && !removing)
> 		call_netdev_switch_notifiers(NETDEV_SWITCH_FDB_ADD,
> 					     lw->dev, &info.info);
>+	rtnl_unlock();
> 
> 	kfree(work);
> }
>diff --git a/net/bridge/br.c b/net/bridge/br.c
>index 02c24cf..c72e01c 100644
>--- a/net/bridge/br.c
>+++ b/net/bridge/br.c
>@@ -121,6 +121,7 @@ static struct notifier_block br_device_notifier = {
> 	.notifier_call = br_device_event
> };
> 
>+/* called with RTNL */
> static int br_netdev_switch_event(struct notifier_block *unused,
> 				  unsigned long event, void *ptr)
> {
>@@ -130,7 +131,6 @@ static int br_netdev_switch_event(struct notifier_block *unused,
> 	struct netdev_switch_notifier_fdb_info *fdb_info;
> 	int err = NOTIFY_DONE;
> 
>-	rtnl_lock();
> 	p = br_port_get_rtnl(dev);
> 	if (!p)
> 		goto out;
>@@ -155,7 +155,6 @@ static int br_netdev_switch_event(struct notifier_block *unused,
> 	}
> 
> out:
>-	rtnl_unlock();
> 	return err;
> }
> 
>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>index 055453d..a8dbe80 100644
>--- a/net/switchdev/switchdev.c
>+++ b/net/switchdev/switchdev.c
>@@ -15,6 +15,7 @@
> #include <linux/mutex.h>
> #include <linux/notifier.h>
> #include <linux/netdevice.h>
>+#include <linux/rtnetlink.h>
> #include <net/ip_fib.h>
> #include <net/switchdev.h>
> 
>@@ -64,7 +65,6 @@ int netdev_switch_port_stp_update(struct net_device *dev, u8 state)
> }
> EXPORT_SYMBOL_GPL(netdev_switch_port_stp_update);
> 
>-static DEFINE_MUTEX(netdev_switch_mutex);
> static RAW_NOTIFIER_HEAD(netdev_switch_notif_chain);
> 
> /**
>@@ -79,9 +79,9 @@ int register_netdev_switch_notifier(struct notifier_block *nb)
> {
> 	int err;
> 
>-	mutex_lock(&netdev_switch_mutex);
>+	rtnl_lock();
> 	err = raw_notifier_chain_register(&netdev_switch_notif_chain, nb);
>-	mutex_unlock(&netdev_switch_mutex);
>+	rtnl_unlock();
> 	return err;
> }
> EXPORT_SYMBOL_GPL(register_netdev_switch_notifier);
>@@ -97,9 +97,9 @@ int unregister_netdev_switch_notifier(struct notifier_block *nb)
> {
> 	int err;
> 
>-	mutex_lock(&netdev_switch_mutex);
>+	rtnl_lock();
> 	err = raw_notifier_chain_unregister(&netdev_switch_notif_chain, nb);
>-	mutex_unlock(&netdev_switch_mutex);
>+	rtnl_unlock();
> 	return err;
> }
> EXPORT_SYMBOL_GPL(unregister_netdev_switch_notifier);
>@@ -113,16 +113,17 @@ EXPORT_SYMBOL_GPL(unregister_netdev_switch_notifier);
>  *	Call all network notifier blocks. This should be called by driver
>  *	when it needs to propagate hardware event.
>  *	Return values are same as for atomic_notifier_call_chain().
>+ *	rtnl_lock must be held.
>  */
> int call_netdev_switch_notifiers(unsigned long val, struct net_device *dev,
> 				 struct netdev_switch_notifier_info *info)
> {
> 	int err;
> 
>+	ASSERT_RTNL();
>+
> 	info->dev = dev;
>-	mutex_lock(&netdev_switch_mutex);
> 	err = raw_notifier_call_chain(&netdev_switch_notif_chain, val, info);
>-	mutex_unlock(&netdev_switch_mutex);
> 	return err;
> }
> EXPORT_SYMBOL_GPL(call_netdev_switch_notifiers);
>-- 
>2.5.0
>


* Re: [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications
  2016-03-03  9:35   ` Ido Schimmel
@ 2016-03-03 17:03     ` David Miller
  2016-03-03 17:52       ` Ido Schimmel
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2016-03-03 17:03 UTC (permalink / raw)
  To: idosch; +Cc: sasha.levin, stable, stable-commits, jiri

From: Ido Schimmel <idosch@mellanox.com>
Date: Thu, 3 Mar 2016 11:35:31 +0200

> Hi Sasha,
> 
> Wed, Mar 02, 2016 at 10:23:52PM IST, sasha.levin@oracle.com wrote:
>>From: Ido Schimmel <idosch@mellanox.com>
>>
>>This patch has been added to the 4.1 stable tree. If you have any
>>objections, please let us know.
>>
>>===============
>>
>>[ Upstream commit 4f2c6ae5c64c353fb1b0425e4747e5603feadba1 ]
>>
>>When switchdev drivers process FDB notifications from the underlying
>>device they resolve the netdev to which the entry points to and notify
>>the bridge using the switchdev notifier.
>>
>>However, since the RTNL mutex is not held there is nothing preventing
>>the netdev from disappearing in the middle, which will cause
>>br_switchdev_event() to dereference a non-existing netdev.
>>
>>Make switchdev drivers hold the lock at the beginning of the
>>notification processing session and release it once it ends, after
>>notifying the bridge.
>>
>>Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
>>when RTNL mutex is held.
> 
> You removed the fdb_lock bits from the commit below since they aren't
> present in kernel 4.1. Can you remove it from the description as well?
> Should probably be:
> 
> "Also, remove switchdev_mutex, as it's no longer needed when RTNL mutex
> is held."

No, I did this, and I do not think editing the commit log message
contents are appropriate for stable backports ever!

You can add notes in a "[]" bracketed section, but that's it.


* Re: [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications
  2016-03-03 17:03     ` David Miller
@ 2016-03-03 17:52       ` Ido Schimmel
  0 siblings, 0 replies; 37+ messages in thread
From: Ido Schimmel @ 2016-03-03 17:52 UTC (permalink / raw)
  To: David Miller; +Cc: sasha.levin, stable, stable-commits, jiri

Thu, Mar 03, 2016 at 07:03:40PM IST, davem@davemloft.net wrote:
>From: Ido Schimmel <idosch@mellanox.com>
>Date: Thu, 3 Mar 2016 11:35:31 +0200
>
>> Hi Sasha,
>> 
>> Wed, Mar 02, 2016 at 10:23:52PM IST, sasha.levin@oracle.com wrote:
>>>From: Ido Schimmel <idosch@mellanox.com>
>>>
>>>This patch has been added to the 4.1 stable tree. If you have any
>>>objections, please let us know.
>>>
>>>===============
>>>
>>>[ Upstream commit 4f2c6ae5c64c353fb1b0425e4747e5603feadba1 ]
>>>
>>>When switchdev drivers process FDB notifications from the underlying
>>>device they resolve the netdev to which the entry points to and notify
>>>the bridge using the switchdev notifier.
>>>
>>>However, since the RTNL mutex is not held there is nothing preventing
>>>the netdev from disappearing in the middle, which will cause
>>>br_switchdev_event() to dereference a non-existing netdev.
>>>
>>>Make switchdev drivers hold the lock at the beginning of the
>>>notification processing session and release it once it ends, after
>>>notifying the bridge.
>>>
>>>Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
>>>when RTNL mutex is held.
>> 
>> You removed the fdb_lock bits from the commit below since they aren't
>> present in kernel 4.1. Can you remove it from the description as well?
>> Should probably be:
>> 
>> "Also, remove switchdev_mutex, as it's no longer needed when RTNL mutex
>> is held."
>
>No, I did this, and I do not think editing the commit log message
>contents are appropriate for stable backports ever!

OK, I wasn't aware of that. I'm good with the current patch as is in
that case.

Thank you both.

>
>You can add notes in a "[]" bracketed section, but that's it.


end of thread

Thread overview: 37+ messages
2016-03-02 20:23 [added to the 4.1 stable tree] af_iucv: Validate socket address length in iucv_sock_bind() Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] net: dp83640: Fix tx timestamp overflow handling Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] tcp: fix NULL deref in tcp_v4_send_ack() Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] af_unix: fix struct pid memory leak Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] pptp: fix illegal memory access caused by multiple bind()s Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] sctp: allow setting SCTP_SACK_IMMEDIATELY by the application Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] tipc: fix connection abort during subscription cancel Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] switchdev: Require RTNL mutex to be held when sending FDB notifications Sasha Levin
2016-03-03  9:35   ` Ido Schimmel
2016-03-03 17:03     ` David Miller
2016-03-03 17:52       ` Ido Schimmel
2016-03-02 20:23 ` [added to the 4.1 stable tree] tcp: beware of alignments in tcp_get_info() Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6/udp: use sticky pktinfo egress ifindex on connect() Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] net/ipv6: add sysctl option accept_ra_min_hop_limit Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: addrconf: Fix recursive spin lock call Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] ipv6: fix a lockdep splat Sasha Levin
2016-03-02 20:23 ` [added to the 4.1 stable tree] unix: correctly track in-flight fds in sending process user_struct Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] net:Add sysctl_max_skb_frags Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] sctp: translate network order to host order when users get a hmacid Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] net: Copy inner L3 and L4 headers as unaligned on GRE TEB Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] bpf: fix branch offset adjustment on backjumps after patching ctx expansion Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] bonding: Fix ARP monitor validation Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] ipv4: fix memory leaks in ip_cmsg_send() callers Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] af_unix: Guard against other == sk in unix_dgram_sendmsg Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] qmi_wwan: add "4G LTE usb-modem U901" Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Count HW buffer overrun only once Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Choose time-stamping shift value according to HW frequency Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] net/mlx4_en: Avoid changing dev->features directly in run-time Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] l2tp: Fix error creating L2TP tunnels Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] pppoe: fix reference counting in PPPoE proxy Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] route: check and remove route cache when we get route Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] rtnl: RTM_GETNETCONF: fix wrong return value Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] unix_diag: fix incorrect sign extension in unix_lookup_by_ino Sasha Levin
2016-03-02 20:24 ` [added to the 4.1 stable tree] sctp: Fix port hash table size computation Sasha Levin
