From: David Miller <davem@davemloft.net>
To: stable@vger.kernel.org
Subject: [PATCHES] Networking
Date: Fri, 03 Jul 2015 15:31:36 -0700 (PDT) [thread overview]
Message-ID: <20150703.153136.1941583418556562611.davem@davemloft.net> (raw)
[-- Attachment #1: Type: Text/Plain, Size: 113 bytes --]
Please queue up the following networking bug fixes for 3.14, 3.18,
4.0, and 4.1 -stable, respectively.
Thanks!
[-- Attachment #2: net_314.mbox --]
[-- Type: Application/Octet-Stream, Size: 33801 bytes --]
From ce0a14da13ebb05ca80440d618b78a7a9d046abb Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Tue, 9 Jun 2015 10:23:57 -0700
Subject: [PATCH 01/10] bridge: fix multicast router rlist endless loop
[ Upstream commit 1a040eaca1a22f8da8285ceda6b5e4a2cb704867 ]
Since the addition of sysfs multicast router support, if one sets
multicast_router to "2" more than once, the port is added to the hlist
every time and can end up linking to itself, thus causing an endless
loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some IGMP traffic flow; for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() that bails out
if the port is already linked.
The reason this didn't happen before the addition of the
multicast_router sysfs entries is that there's a !hlist_unhashed check
that prevents it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_multicast.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 11a2e6c..7bbc8fe 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1086,6 +1086,9 @@ static void br_multicast_add_router(struct net_bridge *br,
struct net_bridge_port *p;
struct hlist_node *slot = NULL;
+ if (!hlist_unhashed(&port->rlist))
+ return;
+
hlist_for_each_entry(p, &br->router_list, rlist) {
if ((unsigned long) port >= (unsigned long) p)
break;
@@ -1113,12 +1116,8 @@ static void br_multicast_mark_router(struct net_bridge *br,
if (port->multicast_router != 1)
return;
- if (!hlist_unhashed(&port->rlist))
- goto timer;
-
br_multicast_add_router(br, port);
-timer:
mod_timer(&port->multicast_router_timer,
now + br->multicast_querier_interval);
}
--
2.1.0
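To make the failure mode above concrete, here is a minimal userspace sketch (plain C with a simplified stand-in for the kernel hlist, not the real API) of how linking the same node twice makes it point to itself, so that any walker such as br_multicast_flood() spins forever; with the fix, the second write simply returns early because the port is already hashed:

#include <stdio.h>

struct node { struct node *next; int id; };

/* Simplified analogue of hlist_add_head(): link n in right after head. */
static void add_head(struct node *head, struct node *n)
{
        n->next = head->next;
        head->next = n;
}

int main(void)
{
        struct node head = { .next = NULL, .id = -1 };
        struct node port = { .next = NULL, .id = 2 };
        int steps = 0;

        add_head(&head, &port);  /* first  "echo 2 > multicast_router" */
        add_head(&head, &port);  /* second write: port.next now points to &port */

        for (struct node *p = head.next; p; p = p->next) {
                if (++steps > 5) {  /* a real walker never stops on this cycle */
                        printf("cycle: node %d reached after %d steps\n", p->id, steps);
                        break;
                }
        }
        return 0;
}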
From e24dddfe3b6c2aa6d7f3a17f13e140d7a01c8f35 Mon Sep 17 00:00:00 2001
From: Shaohua Li <shli@fb.com>
Date: Thu, 11 Jun 2015 16:50:48 -0700
Subject: [PATCH 02/10] net: don't wait for order-3 page allocation
[ Upstream commit fb05e7a89f500cfc06ae277bdc911b281928995d ]
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and adds latency. Commit 5640f7685831e0
introduced the order-3 allocation. According to its changelog, the order-3
allocation isn't a must-have, only a performance improvement. But direct
memory compaction has high overhead, and the benefit of the order-3
allocation can't compensate for that overhead.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the allocation will still succeed, so
we don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered and skb_page_frag_refill will
fall back to order-0 immediately, hence the direct memory compaction overhead
is avoided. In the allocation-failure case, kswapd is woken up to do
compaction, so chances are the allocation will succeed next time.
alloc_skb_with_frags is handled the same way.
The Mellanox driver does a similar thing; if this is accepted, we must fix
that driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/skbuff.c | 4 +++-
net/core/sock.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 69ec61a..8207f8d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -368,9 +368,11 @@ refill:
for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
gfp_t gfp = gfp_mask;
- if (order)
+ if (order) {
gfp |= __GFP_COMP | __GFP_NOWARN |
__GFP_NOMEMALLOC;
+ gfp &= ~__GFP_WAIT;
+ }
nc->frag.page = alloc_pages(gfp, order);
if (likely(nc->frag.page))
break;
diff --git a/net/core/sock.c b/net/core/sock.c
index 650dd58..8ebfa52 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1914,8 +1914,10 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
do {
gfp_t gfp = prio;
- if (order)
+ if (order) {
gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
+ gfp &= ~__GFP_WAIT;
+ }
pfrag->page = alloc_pages(gfp, order);
if (likely(pfrag->page)) {
pfrag->offset = 0;
--
2.1.0
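As a rough userspace illustration of the fallback the patch preserves (malloc() stands in for alloc_pages() and there is no real GFP machinery here), the point is that a failed high-order attempt must be cheap so the code can drop straight to order-0:

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
#define MAX_ORDER 3              /* 32 KiB chunk, like NETDEV_FRAG_PAGE_MAX_ORDER */

/* Try the largest chunk first and fall back one order at a time.
 * In the kernel the high-order attempts clear __GFP_WAIT so a failure
 * does not trigger direct compaction; malloc() here only shows the
 * shape of the loop, not the real allocator behaviour.
 */
static void *frag_refill(size_t *got)
{
        for (int order = MAX_ORDER; order >= 0; order--) {
                void *p = malloc((size_t)PAGE_SIZE << order);

                if (p) {
                        *got = (size_t)PAGE_SIZE << order;
                        return p;
                }
        }
        return NULL;
}

int main(void)
{
        size_t sz = 0;
        void *buf = frag_refill(&sz);

        if (buf) {
                printf("refilled with a %zu-byte chunk\n", sz);
                free(buf);
        }
        return 0;
}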
From 7a6b6feb69549c6a0e5ae8312de255829c490a4b Mon Sep 17 00:00:00 2001
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri, 12 Jun 2015 10:16:41 -0300
Subject: [PATCH 03/10] sctp: fix ASCONF list handling
[ Upstream commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 ]
->auto_asconf_splist is per namespace and is mangled by functions like
sctp_setsockopt_auto_asconf(), which don't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backing up
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using the ->addr_wq_lock
spinlock to protect the list, with special handling upon socket
creation and destruction. The error handling in sctp_init_sock() never
returns an error after asconf has been initialized, so
sctp_destroy_sock() can be called without addr_wq_lock. The lock is now
taken in sctp_close_sock(), before locking the socket, so we don't take
it in the inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock in sctp_sock_migrate() for copying and
restoring the list values, it's preferable to avoid overwriting them by
implementing sctp_copy_descendant().
The issue was found with a test application that kept flipping the
sysctl default_auto_asconf on and off, but one could also trigger it by
issuing simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. It is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/netns/sctp.h | 1 +
include/net/sctp/structs.h | 4 ++++
net/sctp/socket.c | 43 ++++++++++++++++++++++++++++++++-----------
3 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 3573a81..8ba379f 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -31,6 +31,7 @@ struct netns_sctp {
struct list_head addr_waitq;
struct timer_list addr_wq_timer;
struct list_head auto_asconf_splist;
+ /* Lock that protects both addr_waitq and auto_asconf_splist */
spinlock_t addr_wq_lock;
/* Lock that protects the local_addr_list writers */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 0dfcc92..2c2d388 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -219,6 +219,10 @@ struct sctp_sock {
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
struct sk_buff_head pd_lobby;
+
+ /* These must be the last fields, as they will skipped on copies,
+ * like on accept and peeloff operations
+ */
struct list_head auto_asconf_list;
int do_auto_asconf;
};
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 604a6ac..f940fdc 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1532,8 +1532,10 @@ static void sctp_close(struct sock *sk, long timeout)
/* Supposedly, no process has access to the socket, but
* the net layers still may.
+ * Also, sctp_destroy_sock() needs to be called with addr_wq_lock
+ * held and that should be grabbed before socket lock.
*/
- local_bh_disable();
+ spin_lock_bh(&net->sctp.addr_wq_lock);
bh_lock_sock(sk);
/* Hold the sock, since sk_common_release() will put sock_put()
@@ -1543,7 +1545,7 @@ static void sctp_close(struct sock *sk, long timeout)
sk_common_release(sk);
bh_unlock_sock(sk);
- local_bh_enable();
+ spin_unlock_bh(&net->sctp.addr_wq_lock);
sock_put(sk);
@@ -3511,6 +3513,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
if ((val && sp->do_auto_asconf) || (!val && !sp->do_auto_asconf))
return 0;
+ spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
if (val == 0 && sp->do_auto_asconf) {
list_del(&sp->auto_asconf_list);
sp->do_auto_asconf = 0;
@@ -3519,6 +3522,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
&sock_net(sk)->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
}
+ spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
return 0;
}
@@ -4009,18 +4013,28 @@ static int sctp_init_sock(struct sock *sk)
local_bh_disable();
percpu_counter_inc(&sctp_sockets_allocated);
sock_prot_inuse_add(net, sk->sk_prot, 1);
+
+ /* Nothing can fail after this block, otherwise
+ * sctp_destroy_sock() will be called without addr_wq_lock held
+ */
if (net->sctp.default_auto_asconf) {
+ spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
list_add_tail(&sp->auto_asconf_list,
&net->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
- } else
+ spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
+ } else {
sp->do_auto_asconf = 0;
+ }
+
local_bh_enable();
return 0;
}
-/* Cleanup any SCTP per socket resources. */
+/* Cleanup any SCTP per socket resources. Must be called with
+ * sock_net(sk)->sctp.addr_wq_lock held if sp->do_auto_asconf is true
+ */
static void sctp_destroy_sock(struct sock *sk)
{
struct sctp_sock *sp;
@@ -6973,6 +6987,19 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
newinet->mc_list = NULL;
}
+static inline void sctp_copy_descendant(struct sock *sk_to,
+ const struct sock *sk_from)
+{
+ int ancestor_size = sizeof(struct inet_sock) +
+ sizeof(struct sctp_sock) -
+ offsetof(struct sctp_sock, auto_asconf_list);
+
+ if (sk_from->sk_family == PF_INET6)
+ ancestor_size += sizeof(struct ipv6_pinfo);
+
+ __inet_sk_copy_descendant(sk_to, sk_from, ancestor_size);
+}
+
/* Populate the fields of the newsk from the oldsk and migrate the assoc
* and its messages to the newsk.
*/
@@ -6987,7 +7014,6 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
struct sk_buff *skb, *tmp;
struct sctp_ulpevent *event;
struct sctp_bind_hashbucket *head;
- struct list_head tmplist;
/* Migrate socket buffer sizes and all the socket level options to the
* new socket.
@@ -6995,12 +7021,7 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
newsk->sk_sndbuf = oldsk->sk_sndbuf;
newsk->sk_rcvbuf = oldsk->sk_rcvbuf;
/* Brute force copy old sctp opt. */
- if (oldsp->do_auto_asconf) {
- memcpy(&tmplist, &newsp->auto_asconf_list, sizeof(tmplist));
- inet_sk_copy_descendant(newsk, oldsk);
- memcpy(&newsp->auto_asconf_list, &tmplist, sizeof(tmplist));
- } else
- inet_sk_copy_descendant(newsk, oldsk);
+ sctp_copy_descendant(newsk, oldsk);
/* Restore the ep value that was overwritten with the above structure
* copy.
--
2.1.0
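The ordering rule described above (addr_wq_lock before the socket lock, matching sctp_addr_wq_timeout_handler()) is the usual ABBA-avoidance discipline; a minimal pthread sketch of the idea, with mutexes standing in for the kernel spinlock and socket lock (function names are illustrative only):

#include <pthread.h>
#include <stdio.h>

/* addr_wq stands in for net->sctp.addr_wq_lock, sock for the socket lock. */
static pthread_mutex_t addr_wq = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sock    = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of the close path: outer lock first, then the socket lock. */
static void close_path(void)
{
        pthread_mutex_lock(&addr_wq);
        pthread_mutex_lock(&sock);
        /* ... tear down auto_asconf state ... */
        pthread_mutex_unlock(&sock);
        pthread_mutex_unlock(&addr_wq);
}

/* Analogue of the timeout handler: same order, so the two paths can
 * never deadlock against each other.
 */
static void timer_path(void)
{
        pthread_mutex_lock(&addr_wq);
        pthread_mutex_lock(&sock);
        /* ... walk addr_waitq / auto_asconf_splist ... */
        pthread_mutex_unlock(&sock);
        pthread_mutex_unlock(&addr_wq);
}

int main(void)
{
        close_path();
        timer_path();
        printf("both paths take the locks in the same order\n");
        return 0;
}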
From c01ad8fca91c171e7a877e227272f92ec4a16559 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Mon, 15 Jun 2015 20:28:51 +0300
Subject: [PATCH 04/10] bridge: fix br_stp_set_bridge_priority race conditions
[ Upstream commit 2dab80a8b486f02222a69daca6859519e05781d9 ]
After the ->set() spinlocks were removed, br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with STP
enabled while setting the priority in a loop; crashes are easily
reproducible.
The spinlocks around the sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the Fixes tag
covers it. Just in case it's needed, the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_ioctl.c | 2 --
net/bridge/br_stp_if.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index a9a4a1b..8d423bc 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -247,9 +247,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- spin_lock_bh(&br->lock);
br_stp_set_bridge_priority(br, args[1]);
- spin_unlock_bh(&br->lock);
return 0;
case BRCTL_SET_PORT_PRIORITY:
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 189ba1e..9a0005a 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -243,12 +243,13 @@ bool br_stp_recalculate_bridge_id(struct net_bridge *br)
return true;
}
-/* called under bridge lock */
+/* Acquires and releases bridge lock */
void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
{
struct net_bridge_port *p;
int wasroot;
+ spin_lock_bh(&br->lock);
wasroot = br_is_root_bridge(br);
list_for_each_entry(p, &br->port_list, list) {
@@ -266,6 +267,7 @@ void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
br_port_state_selection(br);
if (br_is_root_bridge(br) && !wasroot)
br_become_root_bridge(br);
+ spin_unlock_bh(&br->lock);
}
/* called under bridge lock */
--
2.1.0
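The shape of the fix above is to move the locking into the function itself so that every caller (ioctl, sysfs and netlink) is covered; a small pthread-based sketch of that pattern (the struct and fields only loosely mirror the bridge code):

#include <pthread.h>
#include <stdio.h>

struct bridge {
        pthread_mutex_t lock;     /* stands in for br->lock */
        unsigned short priority;
};

/* Before the fix, callers were expected to hold br->lock; the sysfs
 * path no longer did, so the walk over the port list raced with port
 * add/del.  Taking the lock here covers every entry point alike.
 */
static void set_bridge_priority(struct bridge *br, unsigned short newprio)
{
        pthread_mutex_lock(&br->lock);
        /* ... walk port_list, recompute bridge id, reselect root ... */
        br->priority = newprio;
        pthread_mutex_unlock(&br->lock);
}

int main(void)
{
        struct bridge br = { .lock = PTHREAD_MUTEX_INITIALIZER, .priority = 32768 };

        set_bridge_priority(&br, 4096);
        printf("priority = %hu\n", br.priority);
        return 0;
}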
From 7eaeca482f0b10db9e87dcfe24b23c901c238508 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 16 Jun 2015 07:59:11 -0700
Subject: [PATCH 05/10] packet: read num_members once in packet_rcv_fanout()
[ Upstream commit f98f4514d07871da7a113dd9e3e330743fd70ae4 ]
We need to tell the compiler that it must not read f->num_members
multiple times. Otherwise the test that num is not zero is flaky, and
we could attempt an invalid divide by 0 in fanout_demux_cpu().
Note the bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb(), but since the final 3.1 release there has been
a single location, after commit 95ec3eb417115fb ("packet: Add 'cpu'
fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 48b1817..f238530 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1345,7 +1345,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
struct packet_fanout *f = pt->af_packet_priv;
- unsigned int num = f->num_members;
+ unsigned int num = ACCESS_ONCE(f->num_members);
struct packet_sock *po;
unsigned int idx;
--
2.1.0
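The essence of the one-line change above is that the zero test and the later modulo must observe the same value; a small userspace sketch (the struct is a stand-in for struct packet_fanout, and a single local copy plays the role of ACCESS_ONCE()):

#include <stdio.h>

struct fanout { volatile unsigned int num_members; };

static unsigned int demux_cpu(const struct fanout *f, unsigned int cpu)
{
        /* Racy version: two loads of a field another CPU may be clearing.
         *   if (f->num_members) return cpu % f->num_members;
         * If num_members drops to 0 between the loads, the modulo divides
         * by zero.  Copying the value once makes both uses consistent.
         */
        unsigned int num = f->num_members;   /* ACCESS_ONCE()/READ_ONCE() in the kernel */

        return num ? cpu % num : 0;
}

int main(void)
{
        struct fanout f = { .num_members = 4 };

        printf("cpu 7 -> member %u\n", demux_cpu(&f, 7));
        return 0;
}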
From 8ce7b05834926e77bd319904fc039e8ae05649e7 Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 17 Jun 2015 15:59:34 -0400
Subject: [PATCH 06/10] packet: avoid out of bounds read in round robin fanout
[ Upstream commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f238530..84a60b8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1264,16 +1264,6 @@ static void packet_sock_destruct(struct sock *sk)
sk_refcnt_debug_dec(sk);
}
-static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
-{
- int x = atomic_read(&f->rr_cur) + 1;
-
- if (x >= num)
- x = 0;
-
- return x;
-}
-
static unsigned int fanout_demux_hash(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
@@ -1285,13 +1275,9 @@ static unsigned int fanout_demux_lb(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
{
- int cur, old;
+ unsigned int val = atomic_inc_return(&f->rr_cur);
- cur = atomic_read(&f->rr_cur);
- while ((old = atomic_cmpxchg(&f->rr_cur, cur,
- fanout_rr_next(f, num))) != cur)
- cur = old;
- return cur;
+ return val % num;
}
static unsigned int fanout_demux_cpu(struct packet_fanout *f,
--
2.1.0
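For illustration, the replacement round-robin is easy to model in userspace with C11 atomics; atomic_fetch_add() + 1 plays the role of the kernel's atomic_inc_return(), and as in the real code the caller is assumed to have already checked that num is non-zero:

#include <stdatomic.h>
#include <stdio.h>

static atomic_uint rr_cur;

/* Unconditional increment, reduced modulo the member count read by the
 * caller, so the result is always < num even if num shrank since the
 * previous packet.
 */
static unsigned int demux_lb(unsigned int num)
{
        unsigned int val = atomic_fetch_add(&rr_cur, 1) + 1;

        return val % num;
}

int main(void)
{
        for (int i = 0; i < 8; i++)
                printf("%u ", demux_lb(3));
        printf("\n");
        return 0;
}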
From 60cbc8b9178271d4c9e671bae16003d825d401e8 Mon Sep 17 00:00:00 2001
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 16 Jun 2015 22:56:39 +0300
Subject: [PATCH 07/10] neigh: do not modify unlinked entries
[ Upstream commit 2c51a97f76d20ebf1f50fef908b986cb051fdff9 ]
The lockless lookups can return an entry that is unlinked.
Sometimes they get a reference before the last
neigh_cleanup_and_release, sometimes they do not need a reference.
Later, any modification attempts may result in the following problems:
1. The entry is not destroyed immediately because neigh_update
can start the timer for a dead entry, e.g. on a change to NUD_REACHABLE
state. As a result, the entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0, but if the timer is started and expires, refcnt can
reach 0 a second time, leading to a second neigh_destroy and a
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analysis
of the __neigh_event_send change.
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/neighbour.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7d95f69..0f062c6 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -976,6 +976,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
+ if (neigh->dead)
+ goto out_dead;
if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
@@ -1032,6 +1034,13 @@ out_unlock_bh:
write_unlock(&neigh->lock);
local_bh_enable();
return rc;
+
+out_dead:
+ if (neigh->nud_state & NUD_STALE)
+ goto out_unlock_bh;
+ write_unlock_bh(&neigh->lock);
+ kfree_skb(skb);
+ return 1;
}
EXPORT_SYMBOL(__neigh_event_send);
@@ -1095,6 +1104,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
(old & (NUD_NOARP | NUD_PERMANENT)))
goto out;
+ if (neigh->dead)
+ goto out;
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
@@ -1244,6 +1255,8 @@ EXPORT_SYMBOL(neigh_update);
*/
void __neigh_set_probe_once(struct neighbour *neigh)
{
+ if (neigh->dead)
+ return;
neigh->updated = jiffies;
if (!(neigh->nud_state & NUD_FAILED))
return;
--
2.1.0
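The pattern the patch adds is essentially "re-check the dead flag under the entry's lock before re-arming timers or changing state"; a compact userspace sketch of that idea (the struct and helper are simplified stand-ins, not the kernel neighbour API):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct neigh {
        pthread_mutex_t lock;   /* stands in for neigh->lock */
        bool dead;              /* set when the entry is unlinked */
        int state;
};

/* A lockless lookup may have handed us an entry that was unlinked in
 * the meantime, so bail out under the lock instead of reviving it.
 */
static bool neigh_try_update(struct neigh *n, int new_state)
{
        bool ok = false;

        pthread_mutex_lock(&n->lock);
        if (!n->dead) {
                n->state = new_state;   /* safe: entry is still linked */
                ok = true;
        }
        pthread_mutex_unlock(&n->lock);
        return ok;
}

int main(void)
{
        struct neigh n = { .lock = PTHREAD_MUTEX_INITIALIZER, .dead = true, .state = 0 };

        printf("update on dead entry accepted? %s\n",
               neigh_try_update(&n, 1) ? "yes" : "no (correct)");
        return 0;
}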
From fd4c16c4bd895d9ac6e0a75ab32ead1e4be6b0cd Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Thu, 18 Jun 2015 09:15:34 -0700
Subject: [PATCH 08/10] tcp: Do not call tcp_fastopen_reset_cipher from
interrupt context
[ Upstream commit dfea2aa654243f70dc53b8648d0bbdeec55a7df1 ]
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kinds of stuff with
GFP_KERNEL.
Thus, we might sleep when the key generation is triggered by an
incoming TFO cookie request, which happens in interrupt context,
as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
[ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[ 36.012494] Call Trace:
[ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90
[ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310
[ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO state, which always happens in
user context (either from the setsockopt, or implicitly during the
listen() call).
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/af_inet.c | 2 ++
net/ipv4/tcp.c | 7 +++++--
net/ipv4/tcp_fastopen.c | 2 --
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 07bd8ed..951fe55 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog)
err = 0;
if (err)
goto out;
+
+ tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 29d240b..dc45221 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2684,10 +2684,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
- TCPF_LISTEN)))
+ TCPF_LISTEN))) {
+ tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
- else
+ } else {
err = -EINVAL;
+ }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index f195d93..ee6518d 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -84,8 +84,6 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst,
__be32 path[4] = { src, dst, 0, 0 };
struct tcp_fastopen_context *ctx;
- tcp_fastopen_init_key_once(true);
-
rcu_read_lock();
ctx = rcu_dereference(tcp_fastopen_ctx);
if (ctx) {
--
2.1.0
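The fix above boils down to doing the one-time, possibly sleeping key setup when userspace configures the listener rather than in the packet-receive path; pthread_once() gives a rough userspace analogue of tcp_fastopen_init_key_once() (all names below are illustrative, not kernel API):

#include <pthread.h>
#include <stdio.h>

static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* May block (think GFP_KERNEL allocations, crypto setup), so it must
 * only ever run from process context.
 */
static void generate_fastopen_key(void)
{
        printf("generating TFO key (may sleep)\n");
}

/* Analogue of listen()/setsockopt(TCP_FASTOPEN): user context, so it
 * is safe to trigger the one-time setup here.
 */
static void enable_fastopen_listener(void)
{
        pthread_once(&key_once, generate_fastopen_key);
        printf("listener ready\n");
}

/* Analogue of the cookie-request path in softirq context: it must only
 * consume the already-generated key, never create it.
 */
static void handle_cookie_request(void)
{
        printf("using existing TFO key\n");
}

int main(void)
{
        enable_fastopen_listener();
        handle_cookie_request();
        return 0;
}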
From 699dad6e73ec6618dfd7379a087ea553bfe631fb Mon Sep 17 00:00:00 2001
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Thu, 25 Jun 2015 22:21:02 +0530
Subject: [PATCH 09/10] net: phy: fix phy link up when limiting speed via
device tree
[ Upstream commit eb686231fce3770299760f24fdcf5ad041f44153 ]
When limiting the PHY link speed to 100 Mbps or less via "max-speed" on
a gigabit PHY, the PHY never completes auto-negotiation and the PHY
state machine is held in PHY_AN. Fix this issue by comparing the
gigabit advertisement even though phydev->supported doesn't contain it
but the PHY has BMSR_ESTATEN set, so that auto-negotiation is restarted
when the old and new advertisements differ and the link comes up fine.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 25f7419..62c3fb9 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -765,10 +765,11 @@ static int genphy_config_advert(struct phy_device *phydev)
if (phydev->supported & (SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full)) {
adv |= ethtool_adv_to_mii_ctrl1000_t(advertise);
- if (adv != oldadv)
- changed = 1;
}
+ if (adv != oldadv)
+ changed = 1;
+
err = phy_write(phydev, MII_CTRL1000, adv);
if (err < 0)
return err;
--
2.1.0
From 5580bdbdf203d2feba885ac06193a780a1b7a146 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Date: Mon, 29 Jun 2015 10:41:03 +0200
Subject: [PATCH 10/10] sctp: Fix race between OOTB responce and route removal
[ Upstream commit 29c4afc4e98f4dc0ea9df22c631841f9c220b944 ]
A NULL pointer dereference is possible during the statistics update if the
route used for the OOTB response is removed at an unfortunate time. If the
route exists when we receive an OOTB packet and we finally jump into
sctp_packet_transmit() to send the ABORT, but in the meantime the route is
removed under our feet, we take the "no_route" path and try to update stats
with IP_INC_STATS(sock_net(asoc->base.sk), ...).
But sctp_ootb_pkt_new(), used to prepare the response packet, doesn't call
sctp_transport_set_owner(), and therefore there is no asoc associated with
this packet. A temporary asoc just for OOTB responses is probably overkill,
so just introduce a check like in all the other places in
sctp_packet_transmit() where "asoc" is dereferenced.
To reproduce this, one needs to:
0. ensure that the sctp module is loaded (otherwise the ABORT is not generated)
1. remove the default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (e.g. HB REQs) from another host to trigger the
ABORT response
On x86_64 the crash looks like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
<IRQ>
[<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
[<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
[<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
[<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
[<ffffffff810e0329>] ? update_process_times+0x59/0x60
[<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
[<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
[<ffffffff8101f599>] ? read_tsc+0x9/0x10
[<ffffffff8116d4b5>] ? put_page+0x55/0x60
[<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
[<ffffffff81462b68>] ? skb_free_head+0x58/0x80
[<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
[<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
[<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
[<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
[<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
[<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
[<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
[<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
[<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
[<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
[<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
[<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
[<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
[<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
[<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
[<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
[<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
[<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
[<ffffffff8147896a>] net_rx_action+0x21a/0x360
[<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
[<ffffffff8107912d>] irq_exit+0xad/0xb0
[<ffffffff8157d158>] do_IRQ+0x58/0xf0
[<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
[<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
[<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
[<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
[<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
[<ffffffff8156b365>] rest_init+0x85/0x90
[<ffffffff818eb035>] start_kernel+0x48b/0x4ac
[<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
[<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/sctp/output.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 740ca5f..e39e6d5 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -599,7 +599,9 @@ out:
return err;
no_route:
kfree_skb(nskb);
- IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
+
+ if (asoc)
+ IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
/* FIXME: Returning the 'err' will effect all the associations
* associated with a socket, although only one of the paths of the
--
2.1.0
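The guard added above is a plain NULL check on an association pointer that OOTB response packets legitimately lack; a tiny sketch of that shape (the struct names are stand-ins, not the real SCTP types):

#include <stdio.h>

struct asoc { int out_no_routes; };
struct packet { struct asoc *asoc; };   /* OOTB responses carry asoc == NULL */

/* Analogue of the no_route path: only bump the per-association
 * statistic when an association actually exists.
 */
static void account_no_route(struct packet *pkt)
{
        if (pkt->asoc)
                pkt->asoc->out_no_routes++;
}

int main(void)
{
        struct packet ootb = { .asoc = NULL };

        account_no_route(&ootb);   /* previously a NULL dereference */
        printf("no crash for the OOTB packet\n");
        return 0;
}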
[-- Attachment #3: net_318.mbox --]
[-- Type: Application/Octet-Stream, Size: 55616 bytes --]
From b1a1d95420dcd1175c9fafcb9d21f9d86bd585f6 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Tue, 9 Jun 2015 10:23:57 -0700
Subject: [PATCH 01/16] bridge: fix multicast router rlist endless loop
[ Upstream commit 1a040eaca1a22f8da8285ceda6b5e4a2cb704867 ]
Since the addition of sysfs multicast router support, if one sets
multicast_router to "2" more than once, the port is added to the hlist
every time and can end up linking to itself, thus causing an endless
loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some IGMP traffic flow; for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() that bails out
if the port is already linked.
The reason this didn't happen before the addition of the
multicast_router sysfs entries is that there's a !hlist_unhashed check
that prevents it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_multicast.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b0aee78..c08f510 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1166,6 +1166,9 @@ static void br_multicast_add_router(struct net_bridge *br,
struct net_bridge_port *p;
struct hlist_node *slot = NULL;
+ if (!hlist_unhashed(&port->rlist))
+ return;
+
hlist_for_each_entry(p, &br->router_list, rlist) {
if ((unsigned long) port >= (unsigned long) p)
break;
@@ -1193,12 +1196,8 @@ static void br_multicast_mark_router(struct net_bridge *br,
if (port->multicast_router != 1)
return;
- if (!hlist_unhashed(&port->rlist))
- goto timer;
-
br_multicast_add_router(br, port);
-timer:
mod_timer(&port->multicast_router_timer,
now + br->multicast_querier_interval);
}
--
2.1.0
From e9916309e726585b46f40c3f0c14484471d9a2b9 Mon Sep 17 00:00:00 2001
From: Shaohua Li <shli@fb.com>
Date: Thu, 11 Jun 2015 16:50:48 -0700
Subject: [PATCH 02/16] net: don't wait for order-3 page allocation
[ Upstream commit fb05e7a89f500cfc06ae277bdc911b281928995d ]
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and adds latency. Commit 5640f7685831e0
introduced the order-3 allocation. According to its changelog, the order-3
allocation isn't a must-have, only a performance improvement. But direct
memory compaction has high overhead, and the benefit of the order-3
allocation can't compensate for that overhead.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the allocation will still succeed, so
we don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered and skb_page_frag_refill will
fall back to order-0 immediately, hence the direct memory compaction overhead
is avoided. In the allocation-failure case, kswapd is woken up to do
compaction, so chances are the allocation will succeed next time.
alloc_skb_with_frags is handled the same way.
The Mellanox driver does a similar thing; if this is accepted, we must fix
that driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/skbuff.c | 2 +-
net/core/sock.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 02ebb71..72400a1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4199,7 +4199,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
while (order) {
if (npages >= 1 << order) {
- page = alloc_pages(gfp_mask |
+ page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
__GFP_COMP |
__GFP_NOWARN |
__GFP_NORETRY,
diff --git a/net/core/sock.c b/net/core/sock.c
index 852acbc..1e5130d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1854,7 +1854,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
pfrag->offset = 0;
if (SKB_FRAG_PAGE_ORDER) {
- pfrag->page = alloc_pages(gfp | __GFP_COMP |
+ pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
__GFP_NOWARN | __GFP_NORETRY,
SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
--
2.1.0
From e20043fa2981663fe62ab011a314405bae1b7b6c Mon Sep 17 00:00:00 2001
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri, 12 Jun 2015 10:16:41 -0300
Subject: [PATCH 03/16] sctp: fix ASCONF list handling
[ Upstream commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 ]
->auto_asconf_splist is per namespace and is mangled by functions like
sctp_setsockopt_auto_asconf(), which don't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backing up
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using the ->addr_wq_lock
spinlock to protect the list, with special handling upon socket
creation and destruction. The error handling in sctp_init_sock() never
returns an error after asconf has been initialized, so
sctp_destroy_sock() can be called without addr_wq_lock. The lock is now
taken in sctp_close_sock(), before locking the socket, so we don't take
it in the inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock in sctp_sock_migrate() for copying and
restoring the list values, it's preferable to avoid overwriting them by
implementing sctp_copy_descendant().
The issue was found with a test application that kept flipping the
sysctl default_auto_asconf on and off, but one could also trigger it by
issuing simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. It is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/netns/sctp.h | 1 +
include/net/sctp/structs.h | 4 ++++
net/sctp/socket.c | 43 ++++++++++++++++++++++++++++++++-----------
3 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 3573a81..8ba379f 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -31,6 +31,7 @@ struct netns_sctp {
struct list_head addr_waitq;
struct timer_list addr_wq_timer;
struct list_head auto_asconf_splist;
+ /* Lock that protects both addr_waitq and auto_asconf_splist */
spinlock_t addr_wq_lock;
/* Lock that protects the local_addr_list writers */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 4ff3f67..2ad3a7b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -223,6 +223,10 @@ struct sctp_sock {
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
struct sk_buff_head pd_lobby;
+
+ /* These must be the last fields, as they will skipped on copies,
+ * like on accept and peeloff operations
+ */
struct list_head auto_asconf_list;
int do_auto_asconf;
};
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 634a2ab..99e640c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1533,8 +1533,10 @@ static void sctp_close(struct sock *sk, long timeout)
/* Supposedly, no process has access to the socket, but
* the net layers still may.
+ * Also, sctp_destroy_sock() needs to be called with addr_wq_lock
+ * held and that should be grabbed before socket lock.
*/
- local_bh_disable();
+ spin_lock_bh(&net->sctp.addr_wq_lock);
bh_lock_sock(sk);
/* Hold the sock, since sk_common_release() will put sock_put()
@@ -1544,7 +1546,7 @@ static void sctp_close(struct sock *sk, long timeout)
sk_common_release(sk);
bh_unlock_sock(sk);
- local_bh_enable();
+ spin_unlock_bh(&net->sctp.addr_wq_lock);
sock_put(sk);
@@ -3581,6 +3583,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
if ((val && sp->do_auto_asconf) || (!val && !sp->do_auto_asconf))
return 0;
+ spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
if (val == 0 && sp->do_auto_asconf) {
list_del(&sp->auto_asconf_list);
sp->do_auto_asconf = 0;
@@ -3589,6 +3592,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
&sock_net(sk)->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
}
+ spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
return 0;
}
@@ -4122,18 +4126,28 @@ static int sctp_init_sock(struct sock *sk)
local_bh_disable();
percpu_counter_inc(&sctp_sockets_allocated);
sock_prot_inuse_add(net, sk->sk_prot, 1);
+
+ /* Nothing can fail after this block, otherwise
+ * sctp_destroy_sock() will be called without addr_wq_lock held
+ */
if (net->sctp.default_auto_asconf) {
+ spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
list_add_tail(&sp->auto_asconf_list,
&net->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
- } else
+ spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
+ } else {
sp->do_auto_asconf = 0;
+ }
+
local_bh_enable();
return 0;
}
-/* Cleanup any SCTP per socket resources. */
+/* Cleanup any SCTP per socket resources. Must be called with
+ * sock_net(sk)->sctp.addr_wq_lock held if sp->do_auto_asconf is true
+ */
static void sctp_destroy_sock(struct sock *sk)
{
struct sctp_sock *sp;
@@ -7201,6 +7215,19 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
newinet->mc_list = NULL;
}
+static inline void sctp_copy_descendant(struct sock *sk_to,
+ const struct sock *sk_from)
+{
+ int ancestor_size = sizeof(struct inet_sock) +
+ sizeof(struct sctp_sock) -
+ offsetof(struct sctp_sock, auto_asconf_list);
+
+ if (sk_from->sk_family == PF_INET6)
+ ancestor_size += sizeof(struct ipv6_pinfo);
+
+ __inet_sk_copy_descendant(sk_to, sk_from, ancestor_size);
+}
+
/* Populate the fields of the newsk from the oldsk and migrate the assoc
* and its messages to the newsk.
*/
@@ -7215,7 +7242,6 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
struct sk_buff *skb, *tmp;
struct sctp_ulpevent *event;
struct sctp_bind_hashbucket *head;
- struct list_head tmplist;
/* Migrate socket buffer sizes and all the socket level options to the
* new socket.
@@ -7223,12 +7249,7 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
newsk->sk_sndbuf = oldsk->sk_sndbuf;
newsk->sk_rcvbuf = oldsk->sk_rcvbuf;
/* Brute force copy old sctp opt. */
- if (oldsp->do_auto_asconf) {
- memcpy(&tmplist, &newsp->auto_asconf_list, sizeof(tmplist));
- inet_sk_copy_descendant(newsk, oldsk);
- memcpy(&newsp->auto_asconf_list, &tmplist, sizeof(tmplist));
- } else
- inet_sk_copy_descendant(newsk, oldsk);
+ sctp_copy_descendant(newsk, oldsk);
/* Restore the ep value that was overwritten with the above structure
* copy.
--
2.1.0
From a1ab008f2a33ea1be282cffcace3636a102107a8 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Mon, 15 Jun 2015 20:28:51 +0300
Subject: [PATCH 04/16] bridge: fix br_stp_set_bridge_priority race conditions
[ Upstream commit 2dab80a8b486f02222a69daca6859519e05781d9 ]
After the ->set() spinlocks were removed, br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with STP
enabled while setting the priority in a loop; crashes are easily
reproducible.
The spinlocks around the sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the Fixes tag
covers it. Just in case it's needed, the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_ioctl.c | 2 --
net/bridge/br_stp_if.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index a9a4a1b..8d423bc 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -247,9 +247,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- spin_lock_bh(&br->lock);
br_stp_set_bridge_priority(br, args[1]);
- spin_unlock_bh(&br->lock);
return 0;
case BRCTL_SET_PORT_PRIORITY:
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 4114687..7832d07 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -243,12 +243,13 @@ bool br_stp_recalculate_bridge_id(struct net_bridge *br)
return true;
}
-/* called under bridge lock */
+/* Acquires and releases bridge lock */
void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
{
struct net_bridge_port *p;
int wasroot;
+ spin_lock_bh(&br->lock);
wasroot = br_is_root_bridge(br);
list_for_each_entry(p, &br->port_list, list) {
@@ -266,6 +267,7 @@ void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
br_port_state_selection(br);
if (br_is_root_bridge(br) && !wasroot)
br_become_root_bridge(br);
+ spin_unlock_bh(&br->lock);
}
/* called under bridge lock */
--
2.1.0
From 15b8574426df8909a9c45c72ea2f55fa9e3fcca5 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 16 Jun 2015 07:59:11 -0700
Subject: [PATCH 05/16] packet: read num_members once in packet_rcv_fanout()
[ Upstream commit f98f4514d07871da7a113dd9e3e330743fd70ae4 ]
We need to tell the compiler that it must not read f->num_members
multiple times. Otherwise the test that num is not zero is flaky, and
we could attempt an invalid divide by 0 in fanout_demux_cpu().
Note the bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb(), but since the final 3.1 release there has been
a single location, after commit 95ec3eb417115fb ("packet: Add 'cpu'
fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 07c04a8..d465f9f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1336,7 +1336,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
struct packet_fanout *f = pt->af_packet_priv;
- unsigned int num = f->num_members;
+ unsigned int num = READ_ONCE(f->num_members);
struct packet_sock *po;
unsigned int idx;
--
2.1.0
From 7190bd7f56064234156b1c3268b003de9a299661 Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 17 Jun 2015 15:59:34 -0400
Subject: [PATCH 06/16] packet: avoid out of bounds read in round robin fanout
[ Upstream commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d465f9f..5dcfe05 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1255,16 +1255,6 @@ static void packet_sock_destruct(struct sock *sk)
sk_refcnt_debug_dec(sk);
}
-static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
-{
- int x = atomic_read(&f->rr_cur) + 1;
-
- if (x >= num)
- x = 0;
-
- return x;
-}
-
static unsigned int fanout_demux_hash(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
@@ -1276,13 +1266,9 @@ static unsigned int fanout_demux_lb(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
{
- int cur, old;
+ unsigned int val = atomic_inc_return(&f->rr_cur);
- cur = atomic_read(&f->rr_cur);
- while ((old = atomic_cmpxchg(&f->rr_cur, cur,
- fanout_rr_next(f, num))) != cur)
- cur = old;
- return cur;
+ return val % num;
}
static unsigned int fanout_demux_cpu(struct packet_fanout *f,
--
2.1.0
From c8950f973159741a39daaf031c12c812d8144f2d Mon Sep 17 00:00:00 2001
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 16 Jun 2015 22:56:39 +0300
Subject: [PATCH 07/16] neigh: do not modify unlinked entries
[ Upstream commit 2c51a97f76d20ebf1f50fef908b986cb051fdff9 ]
The lockless lookups can return an entry that is unlinked.
Sometimes they get a reference before the last
neigh_cleanup_and_release, sometimes they do not need a reference.
Later, any modification attempts may result in the following problems:
1. The entry is not destroyed immediately because neigh_update
can start the timer for a dead entry, e.g. on a change to NUD_REACHABLE
state. As a result, the entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0, but if the timer is started and expires, refcnt can
reach 0 a second time, leading to a second neigh_destroy and a
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analysis
of the __neigh_event_send change.
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/neighbour.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ef31fef..2b0d99d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -977,6 +977,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
+ if (neigh->dead)
+ goto out_dead;
if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
@@ -1033,6 +1035,13 @@ out_unlock_bh:
write_unlock(&neigh->lock);
local_bh_enable();
return rc;
+
+out_dead:
+ if (neigh->nud_state & NUD_STALE)
+ goto out_unlock_bh;
+ write_unlock_bh(&neigh->lock);
+ kfree_skb(skb);
+ return 1;
}
EXPORT_SYMBOL(__neigh_event_send);
@@ -1096,6 +1105,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
(old & (NUD_NOARP | NUD_PERMANENT)))
goto out;
+ if (neigh->dead)
+ goto out;
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
@@ -1245,6 +1256,8 @@ EXPORT_SYMBOL(neigh_update);
*/
void __neigh_set_probe_once(struct neighbour *neigh)
{
+ if (neigh->dead)
+ return;
neigh->updated = jiffies;
if (!(neigh->nud_state & NUD_FAILED))
return;
--
2.1.0
From 80f32b21d36ec29220643243f24ba3c548a00aa2 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Thu, 18 Jun 2015 09:15:34 -0700
Subject: [PATCH 08/16] tcp: Do not call tcp_fastopen_reset_cipher from
interrupt context
[ Upstream commit dfea2aa654243f70dc53b8648d0bbdeec55a7df1 ]
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kinds of stuff with
GFP_KERNEL.
Thus, we might sleep when the key generation is triggered by an
incoming TFO cookie request, which happens in interrupt context,
as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
[ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[ 36.012494] Call Trace:
[ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90
[ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310
[ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO state, which always happens in
user context (either from the setsockopt, or implicitly during the
listen() call).
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/af_inet.c | 2 ++
net/ipv4/tcp.c | 7 +++++--
net/ipv4/tcp_fastopen.c | 2 --
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e67da4e..9a17357 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog)
err = 0;
if (err)
goto out;
+
+ tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index de61954..32b25cc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2585,10 +2585,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
- TCPF_LISTEN)))
+ TCPF_LISTEN))) {
+ tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
- else
+ } else {
err = -EINVAL;
+ }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index c730772..b01d5bd 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -78,8 +78,6 @@ static bool __tcp_fastopen_cookie_gen(const void *path,
struct tcp_fastopen_context *ctx;
bool ok = false;
- tcp_fastopen_init_key_once(true);
-
rcu_read_lock();
ctx = rcu_dereference(tcp_fastopen_ctx);
if (ctx) {
--
2.1.0
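As a side note on the TFO fix above: both call sites it relies on (the
TCP_FASTOPEN setsockopt and listen()) are ordinary process-context socket
calls. The following minimal user-space sketch is not part of the patch;
the port number and TFO queue length are arbitrary example values.

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
                perror("socket");
                return 1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("bind");
                return 1;
        }

        /* The setsockopt path mentioned above: accept TFO cookies and
         * allow up to 16 pending TFO requests. */
        int qlen = 16;
        if (setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) < 0)
                perror("setsockopt(TCP_FASTOPEN)");

        /* The listen() path: with the fix, the TFO key is also set up
         * here rather than on the first incoming cookie request. */
        if (listen(fd, 128) < 0) {
                perror("listen");
                return 1;
        }

        close(fd);
        return 0;
}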
From 071f8a2f6e46582e718e54d31b147eb5e616838b Mon Sep 17 00:00:00 2001
From: Ido Shamay <idos@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:42 +0300
Subject: [PATCH 09/16] net/mlx4_en: Wake TX queues only when there's enough
room
[ Upstream commit 488a9b48e398b157703766e2cd91ea45ac6997c5 ]
Indication of a single completed packet, marked by txbbs_skipped
being bigger than zero, is not enough to wake up a
stopped TX queue. The completed packet may contain a single TXBB,
while the next packet to be sent (after the wake up) may have multiple
TXBBs (LSO/TSO packets for example), causing an overflow in the queue
followed by WQE corruption and a TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst-case (maximum-sized WQE) packet that we might need to handle after
the queue is opened again.
Also create a helper routine - mlx4_en_is_tx_ring_full - which checks
whether the current TX ring is full or not. It provides better code
readability and removes code duplication.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 19 +++++++++++--------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 142ddd5..5980d3f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -66,6 +66,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
ring->size = size;
ring->size_mask = size - 1;
ring->stride = stride;
+ ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
tmp = size * sizeof(struct mlx4_en_tx_info);
ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
@@ -223,6 +224,11 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
MLX4_QP_STATE_RST, NULL, 0, 0, &ring->qp);
}
+static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
+{
+ return ring->prod - ring->cons > ring->full_size;
+}
+
static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
struct mlx4_en_tx_ring *ring, int index,
u8 owner)
@@ -465,11 +471,10 @@ static bool mlx4_en_process_tx_cq(struct net_device *dev,
netdev_tx_completed_queue(ring->tx_queue, packets, bytes);
- /*
- * Wakeup Tx queue if this stopped, and at least 1 packet
- * was completed
+ /* Wakeup Tx queue if this stopped, and ring is not full.
*/
- if (netif_tx_queue_stopped(ring->tx_queue) && txbbs_skipped > 0) {
+ if (netif_tx_queue_stopped(ring->tx_queue) &&
+ !mlx4_en_is_tx_ring_full(ring)) {
netif_tx_wake_queue(ring->tx_queue);
ring->wake_queue++;
}
@@ -913,8 +918,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
skb_tx_timestamp(skb);
/* Check available TXBBs And 2K spare for prefetch */
- stop_queue = (int)(ring->prod - ring_cons) >
- ring->size - HEADROOM - MAX_DESC_TXBBS;
+ stop_queue = mlx4_en_is_tx_ring_full(ring);
if (unlikely(stop_queue)) {
netif_tx_stop_queue(ring->tx_queue);
ring->queue_stopped++;
@@ -983,8 +987,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
smp_rmb();
ring_cons = ACCESS_ONCE(ring->cons);
- if (unlikely(((int)(ring->prod - ring_cons)) <=
- ring->size - HEADROOM - MAX_DESC_TXBBS)) {
+ if (unlikely(!mlx4_en_is_tx_ring_full(ring))) {
netif_tx_wake_queue(ring->tx_queue);
ring->wake_queue++;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 692bd4e..4f90806 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
u32 size; /* number of TXBBs */
u32 size_mask;
u16 stride;
+ u32 full_size;
u16 cqn; /* index of port CQ associated with this ring */
u32 buf_size;
__be32 doorbell_qpn;
--
2.1.0
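The occupancy test above relies on free-running producer/consumer
counters. Below is a small standalone sketch, with hypothetical names and
placeholder constants (the 4/16 headroom values are not the driver's), of
why the unsigned subtraction keeps working even across counter wraparound;
it is illustrative only and not the driver code.

#include <stdint.h>
#include <stdio.h>

struct tx_ring {
        uint32_t prod;       /* TXBB units produced so far */
        uint32_t cons;       /* TXBB units completed so far */
        uint32_t full_size;  /* ring size minus headroom and one worst-case packet */
};

static int ring_full(const struct tx_ring *r)
{
        /* Stop while there is still room for one maximum-sized packet,
         * so the next xmit after a wake-up can never overflow the ring. */
        return r->prod - r->cons > r->full_size;
}

int main(void)
{
        struct tx_ring r = {
                .prod = 0xfffffff0u,          /* about to wrap */
                .cons = 0xffffffe0u,
                .full_size = 1024 - 4 - 16,   /* size - headroom - max TXBBs */
        };

        printf("in flight: %u, full: %d\n",
               (unsigned int)(r.prod - r.cons), ring_full(&r));
        return 0;
}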
From 41394531e05a2bbfd7627a24ae8dac0f7dc1fba9 Mon Sep 17 00:00:00 2001
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Fri, 26 Jun 2015 13:48:17 +0300
Subject: [PATCH 10/16] lib/rhashtable: fix race between
rhashtable_lookup_compare and hashtable resize
[ No applicable upstream commit ]
The hash value passed as an argument into rhashtable_lookup_compare could be
computed using a different hash table than the one rhashtable_lookup_compare sees.
This patch passes the key into rhashtable_lookup_compare() instead of the hash and
computes the hash value right in place, using the same table as for the lookup.
It also adds comments for rhashtable_hashfn and rhashtable_obj_hashfn:
the user must prevent concurrent inserts/removes, otherwise the returned hash
value could be invalid.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: e341694e3eb5 ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Link: http://lkml.kernel.org/r/20150514042151.GA5482@gondor.apana.org.au
Cc: Stable <stable@vger.kernel.org> (v3.17 .. v3.19)
Acked-by: Thomas Graf <tgraf@suug.ch>
---
include/linux/rhashtable.h | 2 +-
lib/rhashtable.c | 12 ++++++++----
net/netlink/af_netlink.c | 5 +----
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index fb298e9d..8405674 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -108,7 +108,7 @@ int rhashtable_expand(struct rhashtable *ht, gfp_t flags);
int rhashtable_shrink(struct rhashtable *ht, gfp_t flags);
void *rhashtable_lookup(const struct rhashtable *ht, const void *key);
-void *rhashtable_lookup_compare(const struct rhashtable *ht, u32 hash,
+void *rhashtable_lookup_compare(const struct rhashtable *ht, void *key,
bool (*compare)(void *, void *), void *arg);
void rhashtable_destroy(const struct rhashtable *ht);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 624a0b7..cb22073 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -61,6 +61,8 @@ static u32 __hashfn(const struct rhashtable *ht, const void *key,
* Computes the hash value using the hash function provided in the 'hashfn'
* of struct rhashtable_params. The returned value is guaranteed to be
* smaller than the number of buckets in the hash table.
+ *
+ * The caller must ensure that no concurrent table mutations occur.
*/
u32 rhashtable_hashfn(const struct rhashtable *ht, const void *key, u32 len)
{
@@ -92,6 +94,8 @@ static u32 obj_hashfn(const struct rhashtable *ht, const void *ptr, u32 hsize)
* 'obj_hashfn' depending on whether the hash table is set up to work with
* a fixed length key. The returned value is guaranteed to be smaller than
* the number of buckets in the hash table.
+ *
+ * The caller must ensure that no concurrent table mutations occur.
*/
u32 rhashtable_obj_hashfn(const struct rhashtable *ht, void *ptr)
{
@@ -474,7 +478,7 @@ EXPORT_SYMBOL_GPL(rhashtable_lookup);
/**
* rhashtable_lookup_compare - search hash table with compare function
* @ht: hash table
- * @hash: hash value of desired entry
+ * @key: pointer to key
* @compare: compare function, must return true on match
* @arg: argument passed on to compare function
*
@@ -486,14 +490,14 @@ EXPORT_SYMBOL_GPL(rhashtable_lookup);
*
* Returns the first entry on which the compare function returned true.
*/
-void *rhashtable_lookup_compare(const struct rhashtable *ht, u32 hash,
+void *rhashtable_lookup_compare(const struct rhashtable *ht, void *key,
bool (*compare)(void *, void *), void *arg)
{
const struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
struct rhash_head *he;
+ u32 hash;
- if (unlikely(hash >= tbl->size))
- return NULL;
+ hash = __hashfn(ht, key, ht->p.key_len, tbl->size);
rht_for_each_rcu(he, tbl->buckets[hash], ht) {
if (!compare(rht_obj(ht, he), arg))
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c0a4187..c82b2e3 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1003,11 +1003,8 @@ static struct sock *__netlink_lookup(struct netlink_table *table, u32 portid,
.net = net,
.portid = portid,
};
- u32 hash;
- hash = rhashtable_hashfn(&table->hash, &portid, sizeof(portid));
-
- return rhashtable_lookup_compare(&table->hash, hash,
+ return rhashtable_lookup_compare(&table->hash, &portid,
&netlink_compare, &arg);
}
--
2.1.0
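To make the race above concrete, here is a generic standalone illustration
(not the kernel rhashtable API): a bucket index derived from an old table
size is meaningless once the table has been resized, which is why the
lookup now hashes the key against the table it actually walks. The hash
value and sizes are made up for the example.

#include <stdio.h>

static unsigned int bucket_of(unsigned int key_hash, unsigned int nbuckets)
{
        return key_hash % nbuckets;
}

int main(void)
{
        unsigned int key_hash = 0x9e3779b9u;      /* pretend hash of some key */
        unsigned int old_size = 8, new_size = 16; /* the table grew in between */

        unsigned int stale = bucket_of(key_hash, old_size); /* computed by caller */
        unsigned int fresh = bucket_of(key_hash, new_size); /* computed at lookup */

        printf("stale bucket %u vs correct bucket %u\n", stale, fresh);
        /* Hashing the key inside the lookup, against the table actually
         * dereferenced under RCU, avoids ever using the stale index. */
        return 0;
}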
From 4155ee8c5d9be3e7074d68182ed2546dd8002f18 Mon Sep 17 00:00:00 2001
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Thu, 25 Jun 2015 22:21:02 +0530
Subject: [PATCH 11/16] net: phy: fix phy link up when limiting speed via
device tree
[ Upstream commit eb686231fce3770299760f24fdcf5ad041f44153 ]
When limiting the phy link speed to 100 Mbps or less via "max-speed" on a
gigabit phy, the phy never completes auto-negotiation and the phy state
machine is held in PHY_AN. Fix this issue by comparing the gigabit
advertisement even though phydev->supported doesn't include it, as long
as the phy has BMSR_ESTATEN set. That way auto-negotiation is restarted
whenever the old and new advertisements differ, and the link comes up fine.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 3fc91e8..70a0d88 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -782,10 +782,11 @@ static int genphy_config_advert(struct phy_device *phydev)
if (phydev->supported & (SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full)) {
adv |= ethtool_adv_to_mii_ctrl1000_t(advertise);
- if (adv != oldadv)
- changed = 1;
}
+ if (adv != oldadv)
+ changed = 1;
+
err = phy_write(phydev, MII_CTRL1000, adv);
if (err < 0)
return err;
--
2.1.0
From ed0c52f35c16dec4ea7a4c70d744a71d7914b61e Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 26 Jun 2015 07:32:29 +0200
Subject: [PATCH 12/16] bnx2x: fix lockdep splat
[ Upstream commit d53c66a5b80698620f7c9ba2372fff4017e987b8 ]
Michel reported following lockdep splat
[ 44.718117] INFO: trying to register non-static key.
[ 44.723081] the code is fine but needs lockdep annotation.
[ 44.728559] turning off the locking correctness validator.
[ 44.734036] CPU: 8 PID: 5483 Comm: ethtool Not tainted 4.1.0
[ 44.770289] Call Trace:
[ 44.772741] [<ffffffff816eb1cd>] dump_stack+0x4c/0x65
[ 44.777879] [<ffffffff8111d921>] ? console_unlock+0x1f1/0x510
[ 44.783708] [<ffffffff811121f5>] __lock_acquire+0x1d05/0x1f10
[ 44.789538] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[ 44.795276] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 44.801967] [<ffffffff8111390d>] ? trace_hardirqs_on+0xd/0x10
[ 44.807793] [<ffffffff811330fa>] ? hrtimer_try_to_cancel+0x4a/0x250
[ 44.814142] [<ffffffff81112ba6>] lock_acquire+0xb6/0x290
[ 44.819537] [<ffffffff810d6675>] ? flush_work+0x5/0x280
[ 44.824844] [<ffffffff810d66ad>] flush_work+0x3d/0x280
[ 44.830061] [<ffffffff810d6675>] ? flush_work+0x5/0x280
[ 44.835366] [<ffffffff816f3c43>] ? schedule_hrtimeout_range+0x13/0x20
[ 44.841889] [<ffffffff8112ec9b>] ? usleep_range+0x4b/0x50
[ 44.847365] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[ 44.853102] [<ffffffff810d8585>] ? __cancel_work_timer+0x105/0x1c0
[ 44.859359] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 44.866045] [<ffffffff810d851f>] __cancel_work_timer+0x9f/0x1c0
[ 44.872048] [<ffffffffa0010982>] ? bnx2x_func_stop+0x42/0x90 [bnx2x]
[ 44.878481] [<ffffffff810d8670>] cancel_work_sync+0x10/0x20
[ 44.884134] [<ffffffffa00259e5>] bnx2x_chip_cleanup+0x245/0x730 [bnx2x]
[ 44.890829] [<ffffffff8110ce02>] ? up+0x32/0x50
[ 44.895439] [<ffffffff811306b5>] ? del_timer_sync+0x5/0xd0
[ 44.901005] [<ffffffffa005596d>] bnx2x_nic_unload+0x20d/0x8e0 [bnx2x]
[ 44.907527] [<ffffffff811f1aef>] ? might_fault+0x5f/0xb0
[ 44.912921] [<ffffffffa005851c>] bnx2x_reload_if_running+0x2c/0x50 [bnx2x]
[ 44.919879] [<ffffffffa005a3c5>] bnx2x_set_ringparam+0x2b5/0x460 [bnx2x]
[ 44.926664] [<ffffffff815d498b>] dev_ethtool+0x55b/0x1c40
[ 44.932148] [<ffffffff815dfdc7>] ? rtnl_lock+0x17/0x20
[ 44.937364] [<ffffffff815e7f8b>] dev_ioctl+0x17b/0x630
[ 44.942582] [<ffffffff815abf8d>] sock_do_ioctl+0x5d/0x70
[ 44.947972] [<ffffffff815ac013>] sock_ioctl+0x73/0x280
[ 44.953192] [<ffffffff8124c1c8>] do_vfs_ioctl+0x88/0x5b0
[ 44.958587] [<ffffffff8110d0b3>] ? up_read+0x23/0x40
[ 44.963631] [<ffffffff812584cc>] ? __fget_light+0x6c/0xa0
[ 44.969105] [<ffffffff8124c781>] SyS_ioctl+0x91/0xb0
[ 44.974149] [<ffffffff816f4dd7>] system_call_fastpath+0x12/0x6f
As bnx2x_init_ptp() is only called if bp->flags contains PTP_SUPPORTED,
we also need to guard bnx2x_stop_ptp() with the same condition, otherwise
the ptp_task work item is not initialized and the kernel barfs on
cancel_work_sync().
Fixes: eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support")
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michal Kalderon <Michal.Kalderon@qlogic.com>
Cc: Ariel Elior <Ariel.Elior@qlogic.com>
Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>
Cc: David Decotigny <decot@google.com>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 710eb57..1217eaf 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9307,7 +9307,8 @@ unload_error:
* function stop ramrod is sent, since as part of this ramrod FW access
* PTP registers.
*/
- bnx2x_stop_ptp(bp);
+ if (bp->flags & PTP_SUPPORTED)
+ bnx2x_stop_ptp(bp);
/* Disable HW interrupts, NAPI */
bnx2x_netif_stop(bp, 1);
--
2.1.0
From feb1cd51f229894612d78e1a44d1ad6ffb980ab7 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Date: Mon, 29 Jun 2015 10:41:03 +0200
Subject: [PATCH 13/16] sctp: Fix race between OOTB response and route removal
[ Upstream commit 29c4afc4e98f4dc0ea9df22c631841f9c220b944 ]
A NULL pointer dereference is possible during the statistics update if the route
used for the OOTB response is removed at an unfortunate time. If the route exists
when we receive the OOTB packet and we finally jump into sctp_packet_transmit() to
send the ABORT, but in the meantime the route is removed under our feet, we take the
"no_route" path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
But sctp_ootb_pkt_new(), used to prepare the response packet, doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. A temporary asoc just for OOTB responses is probably overkill, so just
introduce a check like in all other places in sctp_packet_transmit() where
"asoc" is dereferenced.
To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (e.g. HB REQs) from another host to trigger an ABORT
response
On x86_64 the crash looks like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
<IRQ>
[<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
[<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
[<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
[<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
[<ffffffff810e0329>] ? update_process_times+0x59/0x60
[<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
[<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
[<ffffffff8101f599>] ? read_tsc+0x9/0x10
[<ffffffff8116d4b5>] ? put_page+0x55/0x60
[<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
[<ffffffff81462b68>] ? skb_free_head+0x58/0x80
[<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
[<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
[<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
[<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
[<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
[<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
[<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
[<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
[<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
[<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
[<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
[<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
[<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
[<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
[<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
[<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
[<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
[<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
[<ffffffff8147896a>] net_rx_action+0x21a/0x360
[<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
[<ffffffff8107912d>] irq_exit+0xad/0xb0
[<ffffffff8157d158>] do_IRQ+0x58/0xf0
[<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
[<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
[<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
[<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
[<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
[<ffffffff8156b365>] rest_init+0x85/0x90
[<ffffffff818eb035>] start_kernel+0x48b/0x4ac
[<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
[<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/sctp/output.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sctp/output.c b/net/sctp/output.c
index fc5e45b..abe7c2d 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -599,7 +599,9 @@ out:
return err;
no_route:
kfree_skb(nskb);
- IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
+
+ if (asoc)
+ IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
/* FIXME: Returning the 'err' will effect all the associations
* associated with a socket, although only one of the paths of the
--
2.1.0
From a1cdc2dc80589fb885fd1419dc0f5d4ecbae2d1a Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:20 +0200
Subject: [PATCH 14/16] net: mvneta: introduce compatible string "marvell,
armada-xp-neta"
[ Upstream commit f522a975a8101895a85354b9c143f41b8248e71a ]
The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
380 and 385 SoCs. Since at least one more hardware feature is available
for the Armada XP SoCs, a way to identify them is needed.
This patch introduces a new compatible string "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index 750d577..f5a8ca2 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -1,7 +1,7 @@
* Marvell Armada 370 / Armada XP Ethernet Controller (NETA)
Required properties:
-- compatible: should be "marvell,armada-370-neta".
+- compatible: "marvell,armada-370-neta" or "marvell,armada-xp-neta".
- reg: address and length of the register set for the device.
- interrupts: interrupt for the device
- phy: See ethernet.txt file in the same directory.
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 67a84cf..e62c951 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3101,6 +3101,7 @@ static int mvneta_remove(struct platform_device *pdev)
static const struct of_device_id mvneta_match[] = {
{ .compatible = "marvell,armada-370-neta" },
+ { .compatible = "marvell,armada-xp-neta" },
{ }
};
MODULE_DEVICE_TABLE(of, mvneta_match);
--
2.1.0
From 34edbd06c7e56f81878bd6d9169ad34b8a8be3f7 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:21 +0200
Subject: [PATCH 15/16] ARM: mvebu: update Ethernet compatible string for
Armada XP
[ Upstream commit ea3b55fe83b5fcede82d183164b9d6831b26e33b ]
This patch updates the Ethernet DT nodes for Armada XP SoCs with the
compatible string "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: 77916519cba3 ("arm: mvebu: Armada XP MV78230 has only three Ethernet interfaces")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
arch/arm/boot/dts/armada-370-xp.dtsi | 2 --
arch/arm/boot/dts/armada-370.dtsi | 8 ++++++++
arch/arm/boot/dts/armada-xp-mv78260.dtsi | 2 +-
arch/arm/boot/dts/armada-xp-mv78460.dtsi | 2 +-
arch/arm/boot/dts/armada-xp.dtsi | 10 +++++++++-
5 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 83286ec..84366cd 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -225,7 +225,6 @@
};
eth0: ethernet@70000 {
- compatible = "marvell,armada-370-neta";
reg = <0x70000 0x4000>;
interrupts = <8>;
clocks = <&gateclk 4>;
@@ -241,7 +240,6 @@
};
eth1: ethernet@74000 {
- compatible = "marvell,armada-370-neta";
reg = <0x74000 0x4000>;
interrupts = <10>;
clocks = <&gateclk 3>;
diff --git a/arch/arm/boot/dts/armada-370.dtsi b/arch/arm/boot/dts/armada-370.dtsi
index 7513410..b6e0268 100644
--- a/arch/arm/boot/dts/armada-370.dtsi
+++ b/arch/arm/boot/dts/armada-370.dtsi
@@ -302,6 +302,14 @@
dmacap,memset;
};
};
+
+ ethernet@70000 {
+ compatible = "marvell,armada-370-neta";
+ };
+
+ ethernet@74000 {
+ compatible = "marvell,armada-370-neta";
+ };
};
};
};
diff --git a/arch/arm/boot/dts/armada-xp-mv78260.dtsi b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
index 480e237..677160e 100644
--- a/arch/arm/boot/dts/armada-xp-mv78260.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
@@ -296,7 +296,7 @@
};
eth3: ethernet@34000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x34000 0x4000>;
interrupts = <14>;
clocks = <&gateclk 1>;
diff --git a/arch/arm/boot/dts/armada-xp-mv78460.dtsi b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
index 2c7b1fe..e143776 100644
--- a/arch/arm/boot/dts/armada-xp-mv78460.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
@@ -334,7 +334,7 @@
};
eth3: ethernet@34000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x34000 0x4000>;
interrupts = <14>;
clocks = <&gateclk 1>;
diff --git a/arch/arm/boot/dts/armada-xp.dtsi b/arch/arm/boot/dts/armada-xp.dtsi
index bff9f6c..66d25b7 100644
--- a/arch/arm/boot/dts/armada-xp.dtsi
+++ b/arch/arm/boot/dts/armada-xp.dtsi
@@ -125,7 +125,7 @@
};
eth2: ethernet@30000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x30000 0x4000>;
interrupts = <12>;
clocks = <&gateclk 2>;
@@ -168,6 +168,14 @@
};
};
+ ethernet@70000 {
+ compatible = "marvell,armada-xp-neta";
+ };
+
+ ethernet@74000 {
+ compatible = "marvell,armada-xp-neta";
+ };
+
xor@f0900 {
compatible = "marvell,orion-xor";
reg = <0xF0900 0x100
--
2.1.0
From 2e77bed3abb42a8db27d1c8d0970c0bc8651b871 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:22 +0200
Subject: [PATCH 16/16] net: mvneta: disable IP checksum with jumbo frames for
Armada 370
[ Upstream commit b65657fc240ae6c1d2a1e62db9a0e61ac9631d7a ]
The Ethernet controller found in the Armada 370, 380 and 385 SoCs doesn't
support TCP/IP checksumming with frame sizes larger than 1600 bytes.
This patch fixes the issue by disabling the NETIF_F_IP_CSUM and
NETIF_F_TSO features for the Armada 370 and compatible SoCs when the MTU
is set to a value greater than 1600 bytes.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/marvell/mvneta.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e62c951..fb34708 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -304,6 +304,7 @@ struct mvneta_port {
unsigned int link;
unsigned int duplex;
unsigned int speed;
+ unsigned int tx_csum_limit;
};
/* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -2441,8 +2442,10 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
dev->mtu = mtu;
- if (!netif_running(dev))
+ if (!netif_running(dev)) {
+ netdev_update_features(dev);
return 0;
+ }
/* The interface is running, so we have to force a
* reallocation of the queues
@@ -2471,9 +2474,26 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
mvneta_start_dev(pp);
mvneta_port_up(pp);
+ netdev_update_features(dev);
+
return 0;
}
+static netdev_features_t mvneta_fix_features(struct net_device *dev,
+ netdev_features_t features)
+{
+ struct mvneta_port *pp = netdev_priv(dev);
+
+ if (pp->tx_csum_limit && dev->mtu > pp->tx_csum_limit) {
+ features &= ~(NETIF_F_IP_CSUM | NETIF_F_TSO);
+ netdev_info(dev,
+ "Disable IP checksum for MTU greater than %dB\n",
+ pp->tx_csum_limit);
+ }
+
+ return features;
+}
+
/* Get mac address */
static void mvneta_get_mac_addr(struct mvneta_port *pp, unsigned char *addr)
{
@@ -2791,6 +2811,7 @@ static const struct net_device_ops mvneta_netdev_ops = {
.ndo_set_rx_mode = mvneta_set_rx_mode,
.ndo_set_mac_address = mvneta_set_mac_addr,
.ndo_change_mtu = mvneta_change_mtu,
+ .ndo_fix_features = mvneta_fix_features,
.ndo_get_stats64 = mvneta_get_stats64,
.ndo_do_ioctl = mvneta_ioctl,
};
@@ -3029,6 +3050,9 @@ static int mvneta_probe(struct platform_device *pdev)
}
}
+ if (of_device_is_compatible(dn, "marvell,armada-370-neta"))
+ pp->tx_csum_limit = 1600;
+
pp->tx_ring_size = MVNETA_MAX_TXD;
pp->rx_ring_size = MVNETA_MAX_RXD;
--
2.1.0
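The ndo_fix_features logic added above can be summarized by the following
standalone sketch; the flag names are local stand-ins for the NETIF_F_*
bits (not the kernel definitions), and the 1600-byte limit mirrors the
value set in the probe path.

#include <stdio.h>

#define F_IP_CSUM (1u << 0)   /* stand-in for NETIF_F_IP_CSUM */
#define F_TSO     (1u << 1)   /* stand-in for NETIF_F_TSO */

static unsigned int fix_features(unsigned int features, int mtu,
                                 int tx_csum_limit)
{
        /* The hardware cannot checksum frames above tx_csum_limit
         * (1600 bytes on Armada 370), so fall back to software
         * checksumming for larger MTUs. */
        if (tx_csum_limit && mtu > tx_csum_limit)
                features &= ~(F_IP_CSUM | F_TSO);
        return features;
}

int main(void)
{
        unsigned int f = F_IP_CSUM | F_TSO;

        printf("mtu 1500 -> 0x%x, mtu 9000 -> 0x%x\n",
               fix_features(f, 1500, 1600), fix_features(f, 9000, 1600));
        return 0;
}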
[-- Attachment #4: net_40.mbox --]
[-- Type: Application/Octet-Stream, Size: 64817 bytes --]
From 6641105f8ad113359438b915a76b6c78eddb3d83 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Tue, 9 Jun 2015 10:23:57 -0700
Subject: [PATCH 01/21] bridge: fix multicast router rlist endless loop
[ Upstream commit 1a040eaca1a22f8da8285ceda6b5e4a2cb704867 ]
Since the addition of sysfs multicast router support, if one sets
multicast_router to "2" more than once, the port would be added to
the hlist every time and could end up linking to itself, thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_multicast.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b0aee78..c08f510 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1166,6 +1166,9 @@ static void br_multicast_add_router(struct net_bridge *br,
struct net_bridge_port *p;
struct hlist_node *slot = NULL;
+ if (!hlist_unhashed(&port->rlist))
+ return;
+
hlist_for_each_entry(p, &br->router_list, rlist) {
if ((unsigned long) port >= (unsigned long) p)
break;
@@ -1193,12 +1196,8 @@ static void br_multicast_mark_router(struct net_bridge *br,
if (port->multicast_router != 1)
return;
- if (!hlist_unhashed(&port->rlist))
- goto timer;
-
br_multicast_add_router(br, port);
-timer:
mod_timer(&port->multicast_router_timer,
now + br->multicast_querier_interval);
}
--
2.1.0
From f36f6b7dc0a529ea475321e3008144ac03e5ec3b Mon Sep 17 00:00:00 2001
From: Richard Cochran <richardcochran@gmail.com>
Date: Thu, 11 Jun 2015 14:51:30 +0200
Subject: [PATCH 02/21] net: igb: fix the start time for periodic output
signals
[ Upstream commit 58c98be137830d34b79024cc5dc95ef54fcd7ffe ]
When programming the start of a periodic output, the code wrongly places
the seconds value into the "low" register and the nanoseconds into the
"high" register. Even though this is backwards, it slipped through my
testing, because the re-arming code in the interrupt service routine is
correct, and the signal does appear starting with the second edge.
This patch fixes the issue by programming the registers correctly.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/intel/igb/igb_ptp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index d20fc8e..c365765 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -540,8 +540,8 @@ static int igb_ptp_feature_enable_i210(struct ptp_clock_info *ptp,
igb->perout[i].start.tv_nsec = rq->perout.start.nsec;
igb->perout[i].period.tv_sec = ts.tv_sec;
igb->perout[i].period.tv_nsec = ts.tv_nsec;
- wr32(trgttiml, rq->perout.start.sec);
- wr32(trgttimh, rq->perout.start.nsec);
+ wr32(trgttimh, rq->perout.start.sec);
+ wr32(trgttiml, rq->perout.start.nsec);
tsauxc |= tsauxc_mask;
tsim |= tsim_mask;
} else {
--
2.1.0
From e8b570a1d27393398e19183305cea6d1e300c246 Mon Sep 17 00:00:00 2001
From: Shaohua Li <shli@fb.com>
Date: Thu, 11 Jun 2015 16:50:48 -0700
Subject: [PATCH 03/21] net: don't wait for order-3 page allocation
[ Upstream commit fb05e7a89f500cfc06ae277bdc911b281928995d ]
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and adds latency. Commit 5640f7685831e0
introduced the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but is there to improve performance. But direct
memory compaction has high overhead, and the benefit of the order-3 allocation
can't compensate for the overhead of direct memory compaction.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the allocation will still succeed, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered; skb_page_frag_refill will
fall back to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is woken up to do
compaction, so chances are the allocation will succeed next time.
alloc_skb_with_frags is changed in the same way.
The Mellanox driver does a similar thing; if this is accepted, we must fix
that driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/skbuff.c | 2 +-
net/core/sock.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e9f9a15..1e3abb8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4443,7 +4443,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
while (order) {
if (npages >= 1 << order) {
- page = alloc_pages(gfp_mask |
+ page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
__GFP_COMP |
__GFP_NOWARN |
__GFP_NORETRY,
diff --git a/net/core/sock.c b/net/core/sock.c
index 71e3e5f..c77d5d2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1895,7 +1895,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
pfrag->offset = 0;
if (SKB_FRAG_PAGE_ORDER) {
- pfrag->page = alloc_pages(gfp | __GFP_COMP |
+ pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
__GFP_NOWARN | __GFP_NORETRY,
SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
--
2.1.0
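The policy change above amounts to "try the big buffer cheaply, fall back
to the small one immediately". The user-space sketch below only
illustrates that fallback shape; malloc() obviously does not model GFP
flags, kswapd, or memory compaction, and the sizes are placeholder values.

#include <stdio.h>
#include <stdlib.h>

#define LARGE_SZ (32u * 1024)   /* "order-3": eight 4 KiB pages */
#define SMALL_SZ (4u * 1024)    /* "order-0": one page */

static void *frag_refill(size_t *got)
{
        void *p = malloc(LARGE_SZ);   /* one cheap attempt, no retries */

        if (p) {
                *got = LARGE_SZ;
                return p;
        }
        *got = SMALL_SZ;              /* immediate fallback */
        return malloc(SMALL_SZ);
}

int main(void)
{
        size_t got = 0;
        void *buf = frag_refill(&got);

        printf("refilled %zu bytes\n", buf ? got : 0);
        free(buf);
        return 0;
}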
From 8b52b28ac808de685397b2891d6cd294a775e5c6 Mon Sep 17 00:00:00 2001
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri, 12 Jun 2015 10:16:41 -0300
Subject: [PATCH 04/21] sctp: fix ASCONF list handling
[ Upstream commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 ]
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf(), which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backing up
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using the ->addr_wq_lock
spinlock to protect the list. Special handling is done upon socket
creation and destruction for that. Error handling in sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addr_wq_lock. The lock now
will be taken in sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock in sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewriting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/netns/sctp.h | 1 +
include/net/sctp/structs.h | 4 ++++
net/sctp/socket.c | 43 ++++++++++++++++++++++++++++++++-----------
3 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 3573a81..8ba379f 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -31,6 +31,7 @@ struct netns_sctp {
struct list_head addr_waitq;
struct timer_list addr_wq_timer;
struct list_head auto_asconf_splist;
+ /* Lock that protects both addr_waitq and auto_asconf_splist */
spinlock_t addr_wq_lock;
/* Lock that protects the local_addr_list writers */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2bb2fcf..495c87e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -223,6 +223,10 @@ struct sctp_sock {
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
struct sk_buff_head pd_lobby;
+
+ /* These must be the last fields, as they will skipped on copies,
+ * like on accept and peeloff operations
+ */
struct list_head auto_asconf_list;
int do_auto_asconf;
};
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index aafe94b..4e56571 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1533,8 +1533,10 @@ static void sctp_close(struct sock *sk, long timeout)
/* Supposedly, no process has access to the socket, but
* the net layers still may.
+ * Also, sctp_destroy_sock() needs to be called with addr_wq_lock
+ * held and that should be grabbed before socket lock.
*/
- local_bh_disable();
+ spin_lock_bh(&net->sctp.addr_wq_lock);
bh_lock_sock(sk);
/* Hold the sock, since sk_common_release() will put sock_put()
@@ -1544,7 +1546,7 @@ static void sctp_close(struct sock *sk, long timeout)
sk_common_release(sk);
bh_unlock_sock(sk);
- local_bh_enable();
+ spin_unlock_bh(&net->sctp.addr_wq_lock);
sock_put(sk);
@@ -3587,6 +3589,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
if ((val && sp->do_auto_asconf) || (!val && !sp->do_auto_asconf))
return 0;
+ spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
if (val == 0 && sp->do_auto_asconf) {
list_del(&sp->auto_asconf_list);
sp->do_auto_asconf = 0;
@@ -3595,6 +3598,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
&sock_net(sk)->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
}
+ spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
return 0;
}
@@ -4128,18 +4132,28 @@ static int sctp_init_sock(struct sock *sk)
local_bh_disable();
percpu_counter_inc(&sctp_sockets_allocated);
sock_prot_inuse_add(net, sk->sk_prot, 1);
+
+ /* Nothing can fail after this block, otherwise
+ * sctp_destroy_sock() will be called without addr_wq_lock held
+ */
if (net->sctp.default_auto_asconf) {
+ spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
list_add_tail(&sp->auto_asconf_list,
&net->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
- } else
+ spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
+ } else {
sp->do_auto_asconf = 0;
+ }
+
local_bh_enable();
return 0;
}
-/* Cleanup any SCTP per socket resources. */
+/* Cleanup any SCTP per socket resources. Must be called with
+ * sock_net(sk)->sctp.addr_wq_lock held if sp->do_auto_asconf is true
+ */
static void sctp_destroy_sock(struct sock *sk)
{
struct sctp_sock *sp;
@@ -7202,6 +7216,19 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
newinet->mc_list = NULL;
}
+static inline void sctp_copy_descendant(struct sock *sk_to,
+ const struct sock *sk_from)
+{
+ int ancestor_size = sizeof(struct inet_sock) +
+ sizeof(struct sctp_sock) -
+ offsetof(struct sctp_sock, auto_asconf_list);
+
+ if (sk_from->sk_family == PF_INET6)
+ ancestor_size += sizeof(struct ipv6_pinfo);
+
+ __inet_sk_copy_descendant(sk_to, sk_from, ancestor_size);
+}
+
/* Populate the fields of the newsk from the oldsk and migrate the assoc
* and its messages to the newsk.
*/
@@ -7216,7 +7243,6 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
struct sk_buff *skb, *tmp;
struct sctp_ulpevent *event;
struct sctp_bind_hashbucket *head;
- struct list_head tmplist;
/* Migrate socket buffer sizes and all the socket level options to the
* new socket.
@@ -7224,12 +7250,7 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
newsk->sk_sndbuf = oldsk->sk_sndbuf;
newsk->sk_rcvbuf = oldsk->sk_rcvbuf;
/* Brute force copy old sctp opt. */
- if (oldsp->do_auto_asconf) {
- memcpy(&tmplist, &newsp->auto_asconf_list, sizeof(tmplist));
- inet_sk_copy_descendant(newsk, oldsk);
- memcpy(&newsp->auto_asconf_list, &tmplist, sizeof(tmplist));
- } else
- inet_sk_copy_descendant(newsk, oldsk);
+ sctp_copy_descendant(newsk, oldsk);
/* Restore the ep value that was overwritten with the above structure
* copy.
--
2.1.0
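The sctp_copy_descendant() helper above relies on the trailing-fields
layout documented in the header change. Below is a standalone sketch of
the same offsetof()-based copy with a toy struct instead of struct
sctp_sock; the field names are illustrative only.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct opts {
        int a, b, c;                  /* options that should be copied */
        /* must stay last: per-socket state that must NOT be copied */
        struct opts *auto_asconf_list;
        int do_auto_asconf;
};

static void copy_descendant(struct opts *to, const struct opts *from)
{
        /* Copy everything up to, but not including, the trailing fields. */
        size_t ancestor_size = offsetof(struct opts, auto_asconf_list);

        memcpy(to, from, ancestor_size);
}

int main(void)
{
        struct opts oldsk = { 1, 2, 3, &oldsk, 1 };
        struct opts newsk = { 0 };

        copy_descendant(&newsk, &oldsk);
        printf("a=%d b=%d c=%d do_auto_asconf=%d list=%p\n",
               newsk.a, newsk.b, newsk.c, newsk.do_auto_asconf,
               (void *)newsk.auto_asconf_list);
        return 0;
}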
From 591ded7a930c5d14cc9f4fbb94e5e94c7952792f Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Mon, 15 Jun 2015 20:28:51 +0300
Subject: [PATCH 05/21] bridge: fix br_stp_set_bridge_priority race conditions
[ Upstream commit 2dab80a8b486f02222a69daca6859519e05781d9 ]
After the ->set() spinlocks were removed, br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting the priority in a loop; crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_ioctl.c | 2 --
net/bridge/br_stp_if.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index a9a4a1b..8d423bc 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -247,9 +247,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- spin_lock_bh(&br->lock);
br_stp_set_bridge_priority(br, args[1]);
- spin_unlock_bh(&br->lock);
return 0;
case BRCTL_SET_PORT_PRIORITY:
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 4114687..7832d07 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -243,12 +243,13 @@ bool br_stp_recalculate_bridge_id(struct net_bridge *br)
return true;
}
-/* called under bridge lock */
+/* Acquires and releases bridge lock */
void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
{
struct net_bridge_port *p;
int wasroot;
+ spin_lock_bh(&br->lock);
wasroot = br_is_root_bridge(br);
list_for_each_entry(p, &br->port_list, list) {
@@ -266,6 +267,7 @@ void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
br_port_state_selection(br);
if (br_is_root_bridge(br) && !wasroot)
br_become_root_bridge(br);
+ spin_unlock_bh(&br->lock);
}
/* called under bridge lock */
--
2.1.0
From 51f2798e99939352aafa83b1e81835872fe18cc4 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 16 Jun 2015 07:59:11 -0700
Subject: [PATCH 06/21] packet: read num_members once in packet_rcv_fanout()
[ Upstream commit f98f4514d07871da7a113dd9e3e330743fd70ae4 ]
We need to tell the compiler it must not read f->num_members multiple
times. Otherwise testing that num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu().
Note the bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb(), but final 3.1 left a single location to fix
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f8db706..b4dfdff 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1347,7 +1347,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
struct packet_fanout *f = pt->af_packet_priv;
- unsigned int num = f->num_members;
+ unsigned int num = READ_ONCE(f->num_members);
struct packet_sock *po;
unsigned int idx;
--
2.1.0
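The point of the one-line change above is that the zero test and the later
modulo must see the same value. A generic standalone illustration follows,
using C11 atomics rather than the kernel's READ_ONCE(); the names and
values are made up for the example.

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int num_members = 4;

static unsigned int pick_member(unsigned int hash)
{
        /* One snapshot of the member count, analogous to READ_ONCE(). */
        unsigned int num = atomic_load_explicit(&num_members,
                                                memory_order_relaxed);

        /* Both the zero test and the modulo use the same snapshot, so a
         * concurrent writer dropping the count to 0 cannot slip in
         * between them and cause a divide by zero. */
        if (!num)
                return 0;
        return hash % num;
}

int main(void)
{
        printf("picked member %u\n", pick_member(12345));
        return 0;
}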
From dc65c0b38936edfe476131df39d2a3ca12246007 Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 17 Jun 2015 15:59:34 -0400
Subject: [PATCH 07/21] packet: avoid out of bounds read in round robin fanout
[ Upstream commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b4dfdff..bfe5c69 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1266,16 +1266,6 @@ static void packet_sock_destruct(struct sock *sk)
sk_refcnt_debug_dec(sk);
}
-static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
-{
- int x = atomic_read(&f->rr_cur) + 1;
-
- if (x >= num)
- x = 0;
-
- return x;
-}
-
static unsigned int fanout_demux_hash(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
@@ -1287,13 +1277,9 @@ static unsigned int fanout_demux_lb(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
{
- int cur, old;
+ unsigned int val = atomic_inc_return(&f->rr_cur);
- cur = atomic_read(&f->rr_cur);
- while ((old = atomic_cmpxchg(&f->rr_cur, cur,
- fanout_rr_next(f, num))) != cur)
- cur = old;
- return cur;
+ return val % num;
}
static unsigned int fanout_demux_cpu(struct packet_fanout *f,
--
2.1.0
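The simplified round-robin above boils down to an unconditional atomic
increment with the modulo applied to the returned value, so the result is
always in range even if the member count changed meanwhile. A rough
standalone equivalent using C11 atomics is sketched below; it uses the
fetched old value rather than the kernel's atomic_inc_return(), which
preserves the same bounded round-robin property.

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int rr_cur;

static unsigned int demux_lb(unsigned int num)
{
        /* Unconditional increment; reduce modulo num only at the end so
         * the result stays in range even if num changed concurrently. */
        unsigned int val = atomic_fetch_add_explicit(&rr_cur, 1,
                                                     memory_order_relaxed);

        return val % num;
}

int main(void)
{
        for (int i = 0; i < 5; i++)
                printf("%u ", demux_lb(3));
        printf("\n");
        return 0;
}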
From 3e32111e0f4825d79313eaa1fdc22adb3441a520 Mon Sep 17 00:00:00 2001
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 16 Jun 2015 22:56:39 +0300
Subject: [PATCH 08/21] neigh: do not modify unlinked entries
[ Upstream commit 2c51a97f76d20ebf1f50fef908b986cb051fdff9 ]
The lockless lookups can return an entry that is unlinked.
Sometimes they get a reference before the last neigh_cleanup_and_release,
sometimes they do not need a reference. Later, any
modification attempts may result in the following problems:
1. the entry is not destroyed immediately because neigh_update
can start the timer for a dead entry, e.g. on a change to NUD_REACHABLE
state. As a result, the entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0, but if the timer is started and expires, refcnt can
reach 0 for a second time, leading to a second neigh_destroy and a
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analysis
of the __neigh_event_send change.
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/neighbour.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 70fe9e1..d0e5d66 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -971,6 +971,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
+ if (neigh->dead)
+ goto out_dead;
if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
@@ -1027,6 +1029,13 @@ out_unlock_bh:
write_unlock(&neigh->lock);
local_bh_enable();
return rc;
+
+out_dead:
+ if (neigh->nud_state & NUD_STALE)
+ goto out_unlock_bh;
+ write_unlock_bh(&neigh->lock);
+ kfree_skb(skb);
+ return 1;
}
EXPORT_SYMBOL(__neigh_event_send);
@@ -1090,6 +1099,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
(old & (NUD_NOARP | NUD_PERMANENT)))
goto out;
+ if (neigh->dead)
+ goto out;
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
@@ -1239,6 +1250,8 @@ EXPORT_SYMBOL(neigh_update);
*/
void __neigh_set_probe_once(struct neighbour *neigh)
{
+ if (neigh->dead)
+ return;
neigh->updated = jiffies;
if (!(neigh->nud_state & NUD_FAILED))
return;
--
2.1.0
From 4eeffe9068f66db69429a808e6ddf27b1a8f8d91 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Thu, 18 Jun 2015 09:15:34 -0700
Subject: [PATCH 09/21] tcp: Do not call tcp_fastopen_reset_cipher from
interrupt context
[ Upstream commit dfea2aa654243f70dc53b8648d0bbdeec55a7df1 ]
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kinds of things with
GFP_KERNEL.
Thus, we might sleep when the key generation is triggered by an
incoming TFO cookie request, which would then happen in interrupt
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
[ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[ 36.012494] Call Trace:
[ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90
[ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310
[ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO state, which always happens in
user context (either from the setsockopt() call, or implicitly during
the listen() call).
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/af_inet.c | 2 ++
net/ipv4/tcp.c | 7 +++++--
net/ipv4/tcp_fastopen.c | 2 --
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d2e49ba..61edc49 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog)
err = 0;
if (err)
goto out;
+
+ tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 995a225..d03a344 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2541,10 +2541,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
- TCPF_LISTEN)))
+ TCPF_LISTEN))) {
+ tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
- else
+ } else {
err = -EINVAL;
+ }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index ea82fd4..9c37181 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -78,8 +78,6 @@ static bool __tcp_fastopen_cookie_gen(const void *path,
struct tcp_fastopen_context *ctx;
bool ok = false;
- tcp_fastopen_init_key_once(true);
-
rcu_read_lock();
ctx = rcu_dereference(tcp_fastopen_ctx);
if (ctx) {
--
2.1.0
From 602553606746ad766f0d91a32890ee0b47eddefa Mon Sep 17 00:00:00 2001
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 23 Jun 2015 08:34:39 +0300
Subject: [PATCH 10/21] ip: report the original address of ICMP messages
[ Upstream commit 34b99df4e6256ddafb663c6de0711dceceddfe0e ]
ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0, and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add a function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.
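For illustration, a minimal sketch (not part of the patch; error handling
omitted) of an error-queue reader that benefits from this change: on a socket
with IP_RECVERR enabled, recvmsg(MSG_ERRQUEUE) fills msg_name with the
original destination for ICMP and local errors as well, not only when
serr->port is set.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

static void read_one_error(int fd)
{
	char cbuf[512], abuf[INET_ADDRSTRLEN];
	struct sockaddr_in orig_dst;
	struct msghdr msg = {
		.msg_name	= &orig_dst,
		.msg_namelen	= sizeof(orig_dst),
		.msg_control	= cbuf,
		.msg_controllen	= sizeof(cbuf),
	};

	/* Assumes setsockopt(fd, IPPROTO_IP, IP_RECVERR, ...) was done. */
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return;

	/* The extended error itself arrives in the control messages;
	 * the original destination address is returned in msg_name.
	 */
	printf("error for %s\n",
	       inet_ntop(AF_INET, &orig_dst.sin_addr, abuf, sizeof(abuf)));
}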
Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/ip_sockglue.c | 11 ++++++++++-
net/ipv6/datagram.c | 12 +++++++++++-
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5cd9927..d9e8ff3 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -432,6 +432,15 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
kfree_skb(skb);
}
+/* For some errors we have valid addr_offset even with zero payload and
+ * zero port. Also, addr_offset should be supported if port is set.
+ */
+static inline bool ipv4_datagram_support_addr(struct sock_exterr_skb *serr)
+{
+ return serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+ serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL || serr->port;
+}
+
/* IPv4 supports cmsg on all imcp errors and some timestamps
*
* Timestamp code paths do not initialize the fields expected by cmsg:
@@ -498,7 +507,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
serr = SKB_EXT_ERR(skb);
- if (sin && serr->port) {
+ if (sin && ipv4_datagram_support_addr(serr)) {
sin->sin_family = AF_INET;
sin->sin_addr.s_addr = *(__be32 *)(skb_network_header(skb) +
serr->addr_offset);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index ace8dac..d174b91 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -325,6 +325,16 @@ void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu)
kfree_skb(skb);
}
+/* For some errors we have valid addr_offset even with zero payload and
+ * zero port. Also, addr_offset should be supported if port is set.
+ */
+static inline bool ipv6_datagram_support_addr(struct sock_exterr_skb *serr)
+{
+ return serr->ee.ee_origin == SO_EE_ORIGIN_ICMP6 ||
+ serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+ serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL || serr->port;
+}
+
/* IPv6 supports cmsg on all origins aside from SO_EE_ORIGIN_LOCAL.
*
* At one point, excluding local errors was a quick test to identify icmp/icmp6
@@ -389,7 +399,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
serr = SKB_EXT_ERR(skb);
- if (sin && serr->port) {
+ if (sin && ipv6_datagram_support_addr(serr)) {
const unsigned char *nh = skb_network_header(skb);
sin->sin6_family = AF_INET6;
sin->sin6_flowinfo = 0;
--
2.1.0
From 4b0fd50391fd0ad685791ac85c3e6356425cac8c Mon Sep 17 00:00:00 2001
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:41 +0300
Subject: [PATCH 11/21] net/mlx4_en: Release TX QP when destroying TX ring
[ Upstream commit 0eb08514fdbdcd16fd6870680cd638f203662e9d ]
The TX ring QP wasn't released in mlx4_en_destroy_tx_ring(). Instead, the
whole QP range was released via the deprecated base_tx_qpn field. Move the
TX QP release into mlx4_en_destroy_tx_ring() and remove the base_tx_qpn field.
Fixes: ddae0349fdb7 ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 4 ----
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 1 +
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 -
3 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 2f1324b..f30c322 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1971,10 +1971,6 @@ void mlx4_en_free_resources(struct mlx4_en_priv *priv)
mlx4_en_destroy_cq(priv, &priv->rx_cq[i]);
}
- if (priv->base_tx_qpn) {
- mlx4_qp_release_range(priv->mdev->dev, priv->base_tx_qpn, priv->tx_ring_num);
- priv->base_tx_qpn = 0;
- }
}
int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 8c234ec..998615b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -180,6 +180,7 @@ void mlx4_en_destroy_tx_ring(struct mlx4_en_priv *priv,
mlx4_bf_free(mdev->dev, &ring->bf);
mlx4_qp_remove(mdev->dev, &ring->qp);
mlx4_qp_free(mdev->dev, &ring->qp);
+ mlx4_qp_release_range(priv->mdev->dev, ring->qpn, 1);
mlx4_en_unmap_buffer(&ring->wqres.buf);
mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
kfree(ring->bounce_buf);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 8687c8d..c3e4dfd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -601,7 +601,6 @@ struct mlx4_en_priv {
int vids[128];
bool wol;
struct device *ddev;
- int base_tx_qpn;
struct hlist_head mac_hash[MLX4_EN_MAC_HASH_SIZE];
struct hwtstamp_config hwtstamp_config;
--
2.1.0
From 781559a23a57776f0ac4c21fe3fee0670285861f Mon Sep 17 00:00:00 2001
From: Ido Shamay <idos@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:42 +0300
Subject: [PATCH 12/21] net/mlx4_en: Wake TX queues only when there's enough
room
[ Upstream commit 488a9b48e398b157703766e2cd91ea45ac6997c5 ]
An indication that a single packet has completed, marked by txbbs_skipped
being greater than zero, is not enough to wake up a stopped TX queue.
The completed packet may contain a single TXBB, while the next packet to
be sent (after the wake-up) may need multiple TXBBs (LSO/TSO packets, for
example), causing the queue to overflow, followed by WQE corruption and a
TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst-case (maximum-sized WQE) packet that may need to be handled after
the queue is opened again.
Also add a helper routine, mlx4_en_is_tx_ring_full(), which checks whether
the current TX ring is full. It improves code readability and removes code
duplication.
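The wake condition boils down to an occupancy check against that worst case;
a standalone illustration (the constants below are made-up examples, not the
driver's values) of why the free-running u32 producer/consumer counters give
the right answer even across wraparound:

#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE	1024u
#define HEADROOM	4u
#define MAX_DESC_TXBBS	16u	/* worst-case TXBBs for one packet */

/* prod and cons only ever increase; with unsigned arithmetic,
 * prod - cons is the number of in-flight TXBBs even after wraparound,
 * so the queue is woken only when a maximum-sized WQE is sure to fit.
 */
static bool ring_full(uint32_t prod, uint32_t cons)
{
	return prod - cons > RING_SIZE - HEADROOM - MAX_DESC_TXBBS;
}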
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 19 +++++++++++--------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 998615b..35dd887 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -66,6 +66,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
ring->size = size;
ring->size_mask = size - 1;
ring->stride = stride;
+ ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
tmp = size * sizeof(struct mlx4_en_tx_info);
ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
@@ -232,6 +233,11 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
MLX4_QP_STATE_RST, NULL, 0, 0, &ring->qp);
}
+static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
+{
+ return ring->prod - ring->cons > ring->full_size;
+}
+
static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
struct mlx4_en_tx_ring *ring, int index,
u8 owner)
@@ -474,11 +480,10 @@ static bool mlx4_en_process_tx_cq(struct net_device *dev,
netdev_tx_completed_queue(ring->tx_queue, packets, bytes);
- /*
- * Wakeup Tx queue if this stopped, and at least 1 packet
- * was completed
+ /* Wakeup Tx queue if this stopped, and ring is not full.
*/
- if (netif_tx_queue_stopped(ring->tx_queue) && txbbs_skipped > 0) {
+ if (netif_tx_queue_stopped(ring->tx_queue) &&
+ !mlx4_en_is_tx_ring_full(ring)) {
netif_tx_wake_queue(ring->tx_queue);
ring->wake_queue++;
}
@@ -922,8 +927,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
skb_tx_timestamp(skb);
/* Check available TXBBs And 2K spare for prefetch */
- stop_queue = (int)(ring->prod - ring_cons) >
- ring->size - HEADROOM - MAX_DESC_TXBBS;
+ stop_queue = mlx4_en_is_tx_ring_full(ring);
if (unlikely(stop_queue)) {
netif_tx_stop_queue(ring->tx_queue);
ring->queue_stopped++;
@@ -992,8 +996,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
smp_rmb();
ring_cons = ACCESS_ONCE(ring->cons);
- if (unlikely(((int)(ring->prod - ring_cons)) <=
- ring->size - HEADROOM - MAX_DESC_TXBBS)) {
+ if (unlikely(!mlx4_en_is_tx_ring_full(ring))) {
netif_tx_wake_queue(ring->tx_queue);
ring->wake_queue++;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index c3e4dfd..0bf0fdd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -280,6 +280,7 @@ struct mlx4_en_tx_ring {
u32 size; /* number of TXBBs */
u32 size_mask;
u16 stride;
+ u32 full_size;
u16 cqn; /* index of port CQ associated with this ring */
u32 buf_size;
__be32 doorbell_qpn;
--
2.1.0
From b2d1d0cb847f05e62bfe27024a5c881b89090b69 Mon Sep 17 00:00:00 2001
From: Ido Shamay <idos@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:43 +0300
Subject: [PATCH 13/21] net/mlx4_en: Fix wrong csum complete report when rxvlan
offload is disabled
[ Upstream commit 79a258526ce1051cb9684018c25a89d51ac21be8 ]
The check_csum() function relied on hwtstamp_rx_filter to know if rxvlan
offload is disabled. This is wrong since rxvlan offload can be switched
on/off regardless of hwtstamp_rx_filter.
Also move check_csum() to query the CQE information to identify VLAN packets,
and remove the check for IP packets, since that has already been validated
earlier.
Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 05ec5e1..3478c87 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -723,7 +723,7 @@ static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
}
#endif
static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
- int hwtstamp_rx_filter)
+ netdev_features_t dev_features)
{
__wsum hw_checksum = 0;
@@ -731,14 +731,8 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
- if (((struct ethhdr *)va)->h_proto == htons(ETH_P_8021Q) &&
- hwtstamp_rx_filter != HWTSTAMP_FILTER_NONE) {
- /* next protocol non IPv4 or IPv6 */
- if (((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
- != htons(ETH_P_IP) &&
- ((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
- != htons(ETH_P_IPV6))
- return -1;
+ if (cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK) &&
+ !(dev_features & NETIF_F_HW_VLAN_CTAG_RX)) {
hw_checksum = get_fixed_vlan_csum(hw_checksum, hdr);
hdr += sizeof(struct vlan_hdr);
}
@@ -901,7 +895,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (ip_summed == CHECKSUM_COMPLETE) {
void *va = skb_frag_address(skb_shinfo(gro_skb)->frags);
- if (check_csum(cqe, gro_skb, va, ring->hwtstamp_rx_filter)) {
+ if (check_csum(cqe, gro_skb, va,
+ dev->features)) {
ip_summed = CHECKSUM_NONE;
ring->csum_none++;
ring->csum_complete--;
@@ -956,7 +951,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
}
if (ip_summed == CHECKSUM_COMPLETE) {
- if (check_csum(cqe, skb, skb->data, ring->hwtstamp_rx_filter)) {
+ if (check_csum(cqe, skb, skb->data, dev->features)) {
ip_summed = CHECKSUM_NONE;
ring->csum_complete--;
ring->csum_none++;
--
2.1.0
From 763078fa436c9f6a43801499a72d8b5233b52919 Mon Sep 17 00:00:00 2001
From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:44 +0300
Subject: [PATCH 14/21] mlx4: Disable HA for SRIOV PF RoCE devices
[ Upstream commit 7254acffeeec3c0a75b9c5364c29a6eb00014930 ]
When in HA mode, the driver exposes an IB (RoCE) device instance with only
one port. Under SRIOV, the existing implementation doesn't fit well with
the PF RoCE driver's role of para-virtualizing special QPs, etc.
As such, disable HA for the mlx4 PF RoCE device in SRIOV mode.
Fixes: a57500903093 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/intf.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 6fce587..0d80aed 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -93,8 +93,14 @@ int mlx4_register_interface(struct mlx4_interface *intf)
mutex_lock(&intf_mutex);
list_add_tail(&intf->list, &intf_list);
- list_for_each_entry(priv, &dev_list, dev_list)
+ list_for_each_entry(priv, &dev_list, dev_list) {
+ if (mlx4_is_mfunc(&priv->dev) && (intf->flags & MLX4_INTFF_BONDING)) {
+ mlx4_dbg(&priv->dev,
+ "SRIOV, disabling HA mode for intf proto %d\n", intf->protocol);
+ intf->flags &= ~MLX4_INTFF_BONDING;
+ }
mlx4_add_device(intf, priv);
+ }
mutex_unlock(&intf_mutex);
--
2.1.0
From 82a914a21da6e910784eddae6f26a9b0f5ae9490 Mon Sep 17 00:00:00 2001
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Thu, 25 Jun 2015 22:21:02 +0530
Subject: [PATCH 15/21] net: phy: fix phy link up when limiting speed via
device tree
[ Upstream commit eb686231fce3770299760f24fdcf5ad041f44153 ]
When limiting the PHY link speed to 100 Mbps or less via the "max-speed"
device tree property on a gigabit PHY, the PHY never completes
auto-negotiation and the PHY state machine is stuck in PHY_AN. Fix this
issue by comparing the gigabit advertisement even though phydev->supported
doesn't include it, as long as the PHY has BMSR_ESTATEN set. Auto-negotiation
is then restarted because the old and new advertisements differ, and the
link comes up fine.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index bdfe51f..d551df6 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -796,10 +796,11 @@ static int genphy_config_advert(struct phy_device *phydev)
if (phydev->supported & (SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full)) {
adv |= ethtool_adv_to_mii_ctrl1000_t(advertise);
- if (adv != oldadv)
- changed = 1;
}
+ if (adv != oldadv)
+ changed = 1;
+
err = phy_write(phydev, MII_CTRL1000, adv);
if (err < 0)
return err;
--
2.1.0
From 931f226c83c6d836f4e98ae7b1c5209bea32c1f1 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 26 Jun 2015 07:32:29 +0200
Subject: [PATCH 16/21] bnx2x: fix lockdep splat
[ Upstream commit d53c66a5b80698620f7c9ba2372fff4017e987b8 ]
Michel reported the following lockdep splat:
[ 44.718117] INFO: trying to register non-static key.
[ 44.723081] the code is fine but needs lockdep annotation.
[ 44.728559] turning off the locking correctness validator.
[ 44.734036] CPU: 8 PID: 5483 Comm: ethtool Not tainted 4.1.0
[ 44.770289] Call Trace:
[ 44.772741] [<ffffffff816eb1cd>] dump_stack+0x4c/0x65
[ 44.777879] [<ffffffff8111d921>] ? console_unlock+0x1f1/0x510
[ 44.783708] [<ffffffff811121f5>] __lock_acquire+0x1d05/0x1f10
[ 44.789538] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[ 44.795276] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 44.801967] [<ffffffff8111390d>] ? trace_hardirqs_on+0xd/0x10
[ 44.807793] [<ffffffff811330fa>] ? hrtimer_try_to_cancel+0x4a/0x250
[ 44.814142] [<ffffffff81112ba6>] lock_acquire+0xb6/0x290
[ 44.819537] [<ffffffff810d6675>] ? flush_work+0x5/0x280
[ 44.824844] [<ffffffff810d66ad>] flush_work+0x3d/0x280
[ 44.830061] [<ffffffff810d6675>] ? flush_work+0x5/0x280
[ 44.835366] [<ffffffff816f3c43>] ? schedule_hrtimeout_range+0x13/0x20
[ 44.841889] [<ffffffff8112ec9b>] ? usleep_range+0x4b/0x50
[ 44.847365] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[ 44.853102] [<ffffffff810d8585>] ? __cancel_work_timer+0x105/0x1c0
[ 44.859359] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 44.866045] [<ffffffff810d851f>] __cancel_work_timer+0x9f/0x1c0
[ 44.872048] [<ffffffffa0010982>] ? bnx2x_func_stop+0x42/0x90 [bnx2x]
[ 44.878481] [<ffffffff810d8670>] cancel_work_sync+0x10/0x20
[ 44.884134] [<ffffffffa00259e5>] bnx2x_chip_cleanup+0x245/0x730 [bnx2x]
[ 44.890829] [<ffffffff8110ce02>] ? up+0x32/0x50
[ 44.895439] [<ffffffff811306b5>] ? del_timer_sync+0x5/0xd0
[ 44.901005] [<ffffffffa005596d>] bnx2x_nic_unload+0x20d/0x8e0 [bnx2x]
[ 44.907527] [<ffffffff811f1aef>] ? might_fault+0x5f/0xb0
[ 44.912921] [<ffffffffa005851c>] bnx2x_reload_if_running+0x2c/0x50 [bnx2x]
[ 44.919879] [<ffffffffa005a3c5>] bnx2x_set_ringparam+0x2b5/0x460 [bnx2x]
[ 44.926664] [<ffffffff815d498b>] dev_ethtool+0x55b/0x1c40
[ 44.932148] [<ffffffff815dfdc7>] ? rtnl_lock+0x17/0x20
[ 44.937364] [<ffffffff815e7f8b>] dev_ioctl+0x17b/0x630
[ 44.942582] [<ffffffff815abf8d>] sock_do_ioctl+0x5d/0x70
[ 44.947972] [<ffffffff815ac013>] sock_ioctl+0x73/0x280
[ 44.953192] [<ffffffff8124c1c8>] do_vfs_ioctl+0x88/0x5b0
[ 44.958587] [<ffffffff8110d0b3>] ? up_read+0x23/0x40
[ 44.963631] [<ffffffff812584cc>] ? __fget_light+0x6c/0xa0
[ 44.969105] [<ffffffff8124c781>] SyS_ioctl+0x91/0xb0
[ 44.974149] [<ffffffff816f4dd7>] system_call_fastpath+0x12/0x6f
As bnx2x_init_ptp() is only called if bp->flags contains PTP_SUPPORTED,
we also need to guard bnx2x_stop_ptp() with the same condition; otherwise
the ptp_task work is not initialized and the kernel barfs on
cancel_work_sync().
Fixes: eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support")
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michal Kalderon <Michal.Kalderon@qlogic.com>
Cc: Ariel Elior <Ariel.Elior@qlogic.com>
Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>
Cc: David Decotigny <decot@google.com>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 1ec635f..196474f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9323,7 +9323,8 @@ unload_error:
* function stop ramrod is sent, since as part of this ramrod FW access
* PTP registers.
*/
- bnx2x_stop_ptp(bp);
+ if (bp->flags & PTP_SUPPORTED)
+ bnx2x_stop_ptp(bp);
/* Disable HW interrupts, NAPI */
bnx2x_netif_stop(bp, 1);
--
2.1.0
From 6767cb2803624e27e65a40a6cd996736faca845f Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Date: Mon, 29 Jun 2015 10:41:03 +0200
Subject: [PATCH 17/21] sctp: Fix race between OOTB response and route removal
[ Upstream commit 29c4afc4e98f4dc0ea9df22c631841f9c220b944 ]
A NULL pointer dereference is possible during the statistics update if the route
used for the OOTB response is removed at an unfortunate time. If the route exists
when we receive the OOTB packet and we finally jump into sctp_packet_transmit()
to send the ABORT, but in the meantime the route is removed under our feet, we
take the "no_route" path and try to update stats with
IP_INC_STATS(sock_net(asoc->base.sk), ...).
But sctp_ootb_pkt_new(), used to prepare the response packet, doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. A temporary asoc just for OOTB responses is probably overkill, so just
introduce a check like in all the other places in sctp_packet_transmit() where
"asoc" is dereferenced.
To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger an ABORT
response
On x86_64 the crash looks like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
<IRQ>
[<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
[<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
[<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
[<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
[<ffffffff810e0329>] ? update_process_times+0x59/0x60
[<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
[<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
[<ffffffff8101f599>] ? read_tsc+0x9/0x10
[<ffffffff8116d4b5>] ? put_page+0x55/0x60
[<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
[<ffffffff81462b68>] ? skb_free_head+0x58/0x80
[<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
[<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
[<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
[<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
[<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
[<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
[<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
[<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
[<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
[<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
[<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
[<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
[<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
[<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
[<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
[<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
[<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
[<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
[<ffffffff8147896a>] net_rx_action+0x21a/0x360
[<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
[<ffffffff8107912d>] irq_exit+0xad/0xb0
[<ffffffff8157d158>] do_IRQ+0x58/0xf0
[<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
[<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
[<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
[<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
[<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
[<ffffffff8156b365>] rest_init+0x85/0x90
[<ffffffff818eb035>] start_kernel+0x48b/0x4ac
[<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
[<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/sctp/output.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sctp/output.c b/net/sctp/output.c
index fc5e45b..abe7c2d 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -599,7 +599,9 @@ out:
return err;
no_route:
kfree_skb(nskb);
- IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
+
+ if (asoc)
+ IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
/* FIXME: Returning the 'err' will effect all the associations
* associated with a socket, although only one of the paths of the
--
2.1.0
From 42ac34e940440a884922d731c82035a8d1c6397f Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Mon, 29 Jun 2015 11:22:12 -0500
Subject: [PATCH 18/21] amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer
allocation
[ Upstream commit 472cfe7127760d68b819cf35a26e5a1b44b30f4e ]
When allocating Rx related buffers, alloc_pages is called using an order
number that is decreased until successful. A system under stress can
experience failures during this allocation process resulting in a warning
being issued. This message can be of concern to end users even though the
failure is not fatal. Since the failure is not fatal and can occur
multiple times, the driver should include the __GFP_NOWARN flag to
prevent the warning message from being issued.
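The allocation strategy the flag applies to looks roughly like this (a sketch
of the pattern mirroring the hunk below, not the driver's exact code; the
function name is illustrative):

/* Higher orders are opportunistic, so their failures should not warn;
 * order 0 is the real fallback.
 */
static struct page *rx_alloc_pages(gfp_t gfp, int order)
{
	gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;

	while (order >= 0) {
		struct page *pages = alloc_pages(gfp, order);

		if (pages)
			return pages;
		order--;
	}

	return NULL;
}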
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/amd/xgbe/xgbe-desc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
index d81fc6b..5c92fb7 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
@@ -263,7 +263,7 @@ static int xgbe_alloc_pages(struct xgbe_prv_data *pdata,
int ret;
/* Try to obtain pages, decreasing order if necessary */
- gfp |= __GFP_COLD | __GFP_COMP;
+ gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
while (order >= 0) {
pages = alloc_pages(gfp, order);
if (pages)
--
2.1.0
From bb499410e6ac5ebb70e5a925618212f485fe9258 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:20 +0200
Subject: [PATCH 19/21] net: mvneta: introduce compatible string "marvell,
armada-xp-neta"
[ Upstream commit f522a975a8101895a85354b9c143f41b8248e71a ]
The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
380 and 385 SoCs. Since at least one more hardware feature is available
for the Armada XP SoCs, a way to identify them is needed.
This patch introduces a new compatible string, "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index 750d577..f5a8ca2 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -1,7 +1,7 @@
* Marvell Armada 370 / Armada XP Ethernet Controller (NETA)
Required properties:
-- compatible: should be "marvell,armada-370-neta".
+- compatible: "marvell,armada-370-neta" or "marvell,armada-xp-neta".
- reg: address and length of the register set for the device.
- interrupts: interrupt for the device
- phy: See ethernet.txt file in the same directory.
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 2db6532..387cedb 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3095,6 +3095,7 @@ static int mvneta_remove(struct platform_device *pdev)
static const struct of_device_id mvneta_match[] = {
{ .compatible = "marvell,armada-370-neta" },
+ { .compatible = "marvell,armada-xp-neta" },
{ }
};
MODULE_DEVICE_TABLE(of, mvneta_match);
--
2.1.0
From 9adfe45b22eb618e7ac403aa470fa3a69e1dad5a Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:21 +0200
Subject: [PATCH 20/21] ARM: mvebu: update Ethernet compatible string for
Armada XP
[ Upstream commit ea3b55fe83b5fcede82d183164b9d6831b26e33b ]
This patch updates the Ethernet DT nodes for Armada XP SoCs with the
compatible string "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: 77916519cba3 ("arm: mvebu: Armada XP MV78230 has only three Ethernet interfaces")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
arch/arm/boot/dts/armada-370-xp.dtsi | 2 --
arch/arm/boot/dts/armada-370.dtsi | 8 ++++++++
arch/arm/boot/dts/armada-xp-mv78260.dtsi | 2 +-
arch/arm/boot/dts/armada-xp-mv78460.dtsi | 2 +-
arch/arm/boot/dts/armada-xp.dtsi | 10 +++++++++-
5 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 8a322ad..a038c20 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -265,7 +265,6 @@
};
eth0: ethernet@70000 {
- compatible = "marvell,armada-370-neta";
reg = <0x70000 0x4000>;
interrupts = <8>;
clocks = <&gateclk 4>;
@@ -281,7 +280,6 @@
};
eth1: ethernet@74000 {
- compatible = "marvell,armada-370-neta";
reg = <0x74000 0x4000>;
interrupts = <10>;
clocks = <&gateclk 3>;
diff --git a/arch/arm/boot/dts/armada-370.dtsi b/arch/arm/boot/dts/armada-370.dtsi
index 27397f1..3773025 100644
--- a/arch/arm/boot/dts/armada-370.dtsi
+++ b/arch/arm/boot/dts/armada-370.dtsi
@@ -306,6 +306,14 @@
dmacap,memset;
};
};
+
+ ethernet@70000 {
+ compatible = "marvell,armada-370-neta";
+ };
+
+ ethernet@74000 {
+ compatible = "marvell,armada-370-neta";
+ };
};
};
};
diff --git a/arch/arm/boot/dts/armada-xp-mv78260.dtsi b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
index 4a7cbed..1676d30 100644
--- a/arch/arm/boot/dts/armada-xp-mv78260.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
@@ -319,7 +319,7 @@
};
eth3: ethernet@34000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x34000 0x4000>;
interrupts = <14>;
clocks = <&gateclk 1>;
diff --git a/arch/arm/boot/dts/armada-xp-mv78460.dtsi b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
index 36ce63a..d41fe88 100644
--- a/arch/arm/boot/dts/armada-xp-mv78460.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
@@ -357,7 +357,7 @@
};
eth3: ethernet@34000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x34000 0x4000>;
interrupts = <14>;
clocks = <&gateclk 1>;
diff --git a/arch/arm/boot/dts/armada-xp.dtsi b/arch/arm/boot/dts/armada-xp.dtsi
index 8291723..9ce7d5f 100644
--- a/arch/arm/boot/dts/armada-xp.dtsi
+++ b/arch/arm/boot/dts/armada-xp.dtsi
@@ -175,7 +175,7 @@
};
eth2: ethernet@30000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x30000 0x4000>;
interrupts = <12>;
clocks = <&gateclk 2>;
@@ -218,6 +218,14 @@
};
};
+ ethernet@70000 {
+ compatible = "marvell,armada-xp-neta";
+ };
+
+ ethernet@74000 {
+ compatible = "marvell,armada-xp-neta";
+ };
+
xor@f0900 {
compatible = "marvell,orion-xor";
reg = <0xF0900 0x100
--
2.1.0
From d9f65a9ac62784a5e2c70400c24d8f12cc1f82b9 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:22 +0200
Subject: [PATCH 21/21] net: mvneta: disable IP checksum with jumbo frames for
Armada 370
[ Upstream commit b65657fc240ae6c1d2a1e62db9a0e61ac9631d7a ]
The Ethernet controller found in the Armada 370, 380 and 385 SoCs doesn't
support TCP/IP checksumming with frame sizes larger than 1600 bytes.
This patch fixes the issue by disabling the NETIF_F_IP_CSUM and NETIF_F_TSO
features for the Armada 370 and compatible SoCs when the MTU is set to a
value greater than 1600 bytes.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/marvell/mvneta.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 387cedb..87c7f52c 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -304,6 +304,7 @@ struct mvneta_port {
unsigned int link;
unsigned int duplex;
unsigned int speed;
+ unsigned int tx_csum_limit;
};
/* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -2441,8 +2442,10 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
dev->mtu = mtu;
- if (!netif_running(dev))
+ if (!netif_running(dev)) {
+ netdev_update_features(dev);
return 0;
+ }
/* The interface is running, so we have to force a
* reallocation of the queues
@@ -2471,9 +2474,26 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
mvneta_start_dev(pp);
mvneta_port_up(pp);
+ netdev_update_features(dev);
+
return 0;
}
+static netdev_features_t mvneta_fix_features(struct net_device *dev,
+ netdev_features_t features)
+{
+ struct mvneta_port *pp = netdev_priv(dev);
+
+ if (pp->tx_csum_limit && dev->mtu > pp->tx_csum_limit) {
+ features &= ~(NETIF_F_IP_CSUM | NETIF_F_TSO);
+ netdev_info(dev,
+ "Disable IP checksum for MTU greater than %dB\n",
+ pp->tx_csum_limit);
+ }
+
+ return features;
+}
+
/* Get mac address */
static void mvneta_get_mac_addr(struct mvneta_port *pp, unsigned char *addr)
{
@@ -2785,6 +2805,7 @@ static const struct net_device_ops mvneta_netdev_ops = {
.ndo_set_rx_mode = mvneta_set_rx_mode,
.ndo_set_mac_address = mvneta_set_mac_addr,
.ndo_change_mtu = mvneta_change_mtu,
+ .ndo_fix_features = mvneta_fix_features,
.ndo_get_stats64 = mvneta_get_stats64,
.ndo_do_ioctl = mvneta_ioctl,
};
@@ -3023,6 +3044,9 @@ static int mvneta_probe(struct platform_device *pdev)
}
}
+ if (of_device_is_compatible(dn, "marvell,armada-370-neta"))
+ pp->tx_csum_limit = 1600;
+
pp->tx_ring_size = MVNETA_MAX_TXD;
pp->rx_ring_size = MVNETA_MAX_RXD;
--
2.1.0
[-- Attachment #5: net_41.mbox --]
[-- Type: Application/Octet-Stream, Size: 64778 bytes --]
From 88a177706968e0270a0cfa47beff6ba91105d37c Mon Sep 17 00:00:00 2001
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri, 12 Jun 2015 10:16:41 -0300
Subject: [PATCH 01/21] sctp: fix ASCONF list handling
[ Upstream commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 ]
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backing up
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between the two sockets.
This commit thus fixes the list handling by using the ->addr_wq_lock
spinlock to protect the list. Special handling is done upon socket
creation and destruction for that. Error handling in sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addr_wq_lock. The lock is now
taken in sctp_close(), before locking the socket, so we don't take it
in the inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock in sctp_sock_migrate() for copying and
restoring the list values, it's preferable to avoid rewriting it, by
implementing sctp_copy_descendant().
The issue was found with a test application that kept flipping the sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/net/netns/sctp.h | 1 +
include/net/sctp/structs.h | 4 ++++
net/sctp/socket.c | 43 ++++++++++++++++++++++++++++++++-----------
3 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 3573a81..8ba379f 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -31,6 +31,7 @@ struct netns_sctp {
struct list_head addr_waitq;
struct timer_list addr_wq_timer;
struct list_head auto_asconf_splist;
+ /* Lock that protects both addr_waitq and auto_asconf_splist */
spinlock_t addr_wq_lock;
/* Lock that protects the local_addr_list writers */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2bb2fcf..495c87e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -223,6 +223,10 @@ struct sctp_sock {
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
struct sk_buff_head pd_lobby;
+
+ /* These must be the last fields, as they will skipped on copies,
+ * like on accept and peeloff operations
+ */
struct list_head auto_asconf_list;
int do_auto_asconf;
};
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index f09de7f..5f6c4e6 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1528,8 +1528,10 @@ static void sctp_close(struct sock *sk, long timeout)
/* Supposedly, no process has access to the socket, but
* the net layers still may.
+ * Also, sctp_destroy_sock() needs to be called with addr_wq_lock
+ * held and that should be grabbed before socket lock.
*/
- local_bh_disable();
+ spin_lock_bh(&net->sctp.addr_wq_lock);
bh_lock_sock(sk);
/* Hold the sock, since sk_common_release() will put sock_put()
@@ -1539,7 +1541,7 @@ static void sctp_close(struct sock *sk, long timeout)
sk_common_release(sk);
bh_unlock_sock(sk);
- local_bh_enable();
+ spin_unlock_bh(&net->sctp.addr_wq_lock);
sock_put(sk);
@@ -3580,6 +3582,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
if ((val && sp->do_auto_asconf) || (!val && !sp->do_auto_asconf))
return 0;
+ spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
if (val == 0 && sp->do_auto_asconf) {
list_del(&sp->auto_asconf_list);
sp->do_auto_asconf = 0;
@@ -3588,6 +3591,7 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
&sock_net(sk)->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
}
+ spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
return 0;
}
@@ -4121,18 +4125,28 @@ static int sctp_init_sock(struct sock *sk)
local_bh_disable();
percpu_counter_inc(&sctp_sockets_allocated);
sock_prot_inuse_add(net, sk->sk_prot, 1);
+
+ /* Nothing can fail after this block, otherwise
+ * sctp_destroy_sock() will be called without addr_wq_lock held
+ */
if (net->sctp.default_auto_asconf) {
+ spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
list_add_tail(&sp->auto_asconf_list,
&net->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
- } else
+ spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
+ } else {
sp->do_auto_asconf = 0;
+ }
+
local_bh_enable();
return 0;
}
-/* Cleanup any SCTP per socket resources. */
+/* Cleanup any SCTP per socket resources. Must be called with
+ * sock_net(sk)->sctp.addr_wq_lock held if sp->do_auto_asconf is true
+ */
static void sctp_destroy_sock(struct sock *sk)
{
struct sctp_sock *sp;
@@ -7195,6 +7209,19 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
newinet->mc_list = NULL;
}
+static inline void sctp_copy_descendant(struct sock *sk_to,
+ const struct sock *sk_from)
+{
+ int ancestor_size = sizeof(struct inet_sock) +
+ sizeof(struct sctp_sock) -
+ offsetof(struct sctp_sock, auto_asconf_list);
+
+ if (sk_from->sk_family == PF_INET6)
+ ancestor_size += sizeof(struct ipv6_pinfo);
+
+ __inet_sk_copy_descendant(sk_to, sk_from, ancestor_size);
+}
+
/* Populate the fields of the newsk from the oldsk and migrate the assoc
* and its messages to the newsk.
*/
@@ -7209,7 +7236,6 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
struct sk_buff *skb, *tmp;
struct sctp_ulpevent *event;
struct sctp_bind_hashbucket *head;
- struct list_head tmplist;
/* Migrate socket buffer sizes and all the socket level options to the
* new socket.
@@ -7217,12 +7243,7 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
newsk->sk_sndbuf = oldsk->sk_sndbuf;
newsk->sk_rcvbuf = oldsk->sk_rcvbuf;
/* Brute force copy old sctp opt. */
- if (oldsp->do_auto_asconf) {
- memcpy(&tmplist, &newsp->auto_asconf_list, sizeof(tmplist));
- inet_sk_copy_descendant(newsk, oldsk);
- memcpy(&newsp->auto_asconf_list, &tmplist, sizeof(tmplist));
- } else
- inet_sk_copy_descendant(newsk, oldsk);
+ sctp_copy_descendant(newsk, oldsk);
/* Restore the ep value that was overwritten with the above structure
* copy.
--
2.1.0
From 1213924626ae9df5231a08d2b1e4d0887c504c68 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Mon, 15 Jun 2015 20:28:51 +0300
Subject: [PATCH 02/21] bridge: fix br_stp_set_bridge_priority race conditions
[ Upstream commit 2dab80a8b486f02222a69daca6859519e05781d9 ]
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with STP
enabled while setting the priority in a loop; crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_ioctl.c | 2 --
net/bridge/br_stp_if.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
index a9a4a1b..8d423bc 100644
--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -247,9 +247,7 @@ static int old_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
- spin_lock_bh(&br->lock);
br_stp_set_bridge_priority(br, args[1]);
- spin_unlock_bh(&br->lock);
return 0;
case BRCTL_SET_PORT_PRIORITY:
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 4114687..7832d07 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -243,12 +243,13 @@ bool br_stp_recalculate_bridge_id(struct net_bridge *br)
return true;
}
-/* called under bridge lock */
+/* Acquires and releases bridge lock */
void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
{
struct net_bridge_port *p;
int wasroot;
+ spin_lock_bh(&br->lock);
wasroot = br_is_root_bridge(br);
list_for_each_entry(p, &br->port_list, list) {
@@ -266,6 +267,7 @@ void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
br_port_state_selection(br);
if (br_is_root_bridge(br) && !wasroot)
br_become_root_bridge(br);
+ spin_unlock_bh(&br->lock);
}
/* called under bridge lock */
--
2.1.0
From fe7b5387d56d0599544316cd38c61bad46e3fae0 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 16 Jun 2015 07:59:11 -0700
Subject: [PATCH 03/21] packet: read num_members once in packet_rcv_fanout()
[ Upstream commit f98f4514d07871da7a113dd9e3e330743fd70ae4 ]
We need to tell the compiler it must not read f->num_members multiple
times. Otherwise testing whether num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu().
Note the bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb(), but final 3.1 had a simple location for it
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.").
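For context, a minimal userspace sketch (not part of the patch; requires
CAP_NET_RAW, and the group id and policy are arbitrary examples) of joining a
fanout group — the race fixed here is between this setsockopt changing
f->num_members and packets being demuxed concurrently:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <sys/socket.h>

static int join_fanout(void)
{
	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	int group = 7;					/* example group id */
	int val = group | (PACKET_FANOUT_CPU << 16);	/* id | (type << 16) */

	setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val));
	return fd;
}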
Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5989c6..131545a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1353,7 +1353,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
struct packet_fanout *f = pt->af_packet_priv;
- unsigned int num = f->num_members;
+ unsigned int num = READ_ONCE(f->num_members);
struct packet_sock *po;
unsigned int idx;
--
2.1.0
From e67d56e09c3565366d2232f6d90bd9152277043f Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 17 Jun 2015 15:59:34 -0400
Subject: [PATCH 04/21] packet: avoid out of bounds read in round robin fanout
[ Upstream commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/packet/af_packet.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 131545a..fe1610d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1272,16 +1272,6 @@ static void packet_sock_destruct(struct sock *sk)
sk_refcnt_debug_dec(sk);
}
-static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
-{
- int x = atomic_read(&f->rr_cur) + 1;
-
- if (x >= num)
- x = 0;
-
- return x;
-}
-
static unsigned int fanout_demux_hash(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
@@ -1293,13 +1283,9 @@ static unsigned int fanout_demux_lb(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
{
- int cur, old;
+ unsigned int val = atomic_inc_return(&f->rr_cur);
- cur = atomic_read(&f->rr_cur);
- while ((old = atomic_cmpxchg(&f->rr_cur, cur,
- fanout_rr_next(f, num))) != cur)
- cur = old;
- return cur;
+ return val % num;
}
static unsigned int fanout_demux_cpu(struct packet_fanout *f,
--
2.1.0
From 93f6561c13c1da2dab4bbc5a41a1304ee739455f Mon Sep 17 00:00:00 2001
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 16 Jun 2015 22:56:39 +0300
Subject: [PATCH 05/21] neigh: do not modify unlinked entries
[ Upstream commit 2c51a97f76d20ebf1f50fef908b986cb051fdff9 ]
The lockless lookups can return an entry that is unlinked.
Sometimes they get a reference before the last neigh_cleanup_and_release,
sometimes they do not need a reference. Later, any
modification attempts may result in the following problems:
1. The entry is not destroyed immediately because neigh_update
can start the timer for a dead entry, e.g. on a change to NUD_REACHABLE
state. As a result, the entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0, but if a timer is started and expires, refcnt can
reach 0 a second time, leading to a second neigh_destroy and a
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analysis
on the __neigh_event_send change.
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/core/neighbour.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 3de6542..2237c1b 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -957,6 +957,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
+ if (neigh->dead)
+ goto out_dead;
if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
@@ -1013,6 +1015,13 @@ out_unlock_bh:
write_unlock(&neigh->lock);
local_bh_enable();
return rc;
+
+out_dead:
+ if (neigh->nud_state & NUD_STALE)
+ goto out_unlock_bh;
+ write_unlock_bh(&neigh->lock);
+ kfree_skb(skb);
+ return 1;
}
EXPORT_SYMBOL(__neigh_event_send);
@@ -1076,6 +1085,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
(old & (NUD_NOARP | NUD_PERMANENT)))
goto out;
+ if (neigh->dead)
+ goto out;
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
@@ -1225,6 +1236,8 @@ EXPORT_SYMBOL(neigh_update);
*/
void __neigh_set_probe_once(struct neighbour *neigh)
{
+ if (neigh->dead)
+ return;
neigh->updated = jiffies;
if (!(neigh->nud_state & NUD_FAILED))
return;
--
2.1.0
From 0f2a5969f4c7f6da272592b2a400d1b55b2c70fa Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Wed, 17 Jun 2015 13:54:54 +0200
Subject: [PATCH 06/21] mac80211: fix locking in
update_vlan_tailroom_need_count()
[ Upstream commit 51f458d9612177f69c2e2c437034ae15f93078e7 ]
Unfortunately, Michal's change to fix AP_VLAN crypto tailroom
caused a locking issue that was reported by lockdep, but only
in a few cases - the issue was a classic ABBA deadlock caused
by taking the mtx after the key_mtx, where normally they're
taken the other way around.
As the key mutex protects the field in question (I'm adding a
few annotations to make that clear) only the iteration needs
to be protected, but we can also iterate the interface list
with just RCU protection while holding the key mutex.
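For readers unfamiliar with the term, the ABBA shape in the abstract (a
generic sketch, not mac80211 code): two code paths take the same pair of
locks in opposite order, and each can end up blocked on the lock the other
already holds.

#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* think: mtx */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* think: key_mtx */

static void *path_one(void *arg)	/* usual order: A then B */
{
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b);
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return arg;
}

static void *path_two(void *arg)	/* offending order: B then A */
{
	pthread_mutex_lock(&lock_b);
	pthread_mutex_lock(&lock_a);	/* can deadlock against path_one() */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return arg;
}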
Fixes: f9dca80b98ca ("mac80211: fix AP_VLAN crypto tailroom calculation")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/mac80211/key.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/mac80211/key.c b/net/mac80211/key.c
index a907f2d..81e9785 100644
--- a/net/mac80211/key.c
+++ b/net/mac80211/key.c
@@ -66,12 +66,15 @@ update_vlan_tailroom_need_count(struct ieee80211_sub_if_data *sdata, int delta)
if (sdata->vif.type != NL80211_IFTYPE_AP)
return;
- mutex_lock(&sdata->local->mtx);
+ /* crypto_tx_tailroom_needed_cnt is protected by this */
+ assert_key_lock(sdata->local);
+
+ rcu_read_lock();
- list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list)
+ list_for_each_entry_rcu(vlan, &sdata->u.ap.vlans, u.vlan.list)
vlan->crypto_tx_tailroom_needed_cnt += delta;
- mutex_unlock(&sdata->local->mtx);
+ rcu_read_unlock();
}
static void increment_tailroom_need_count(struct ieee80211_sub_if_data *sdata)
@@ -95,6 +98,8 @@ static void increment_tailroom_need_count(struct ieee80211_sub_if_data *sdata)
* http://mid.gmane.org/1308590980.4322.19.camel@jlt3.sipsolutions.net
*/
+ assert_key_lock(sdata->local);
+
update_vlan_tailroom_need_count(sdata, 1);
if (!sdata->crypto_tx_tailroom_needed_cnt++) {
@@ -109,6 +114,8 @@ static void increment_tailroom_need_count(struct ieee80211_sub_if_data *sdata)
static void decrease_tailroom_need_count(struct ieee80211_sub_if_data *sdata,
int delta)
{
+ assert_key_lock(sdata->local);
+
WARN_ON_ONCE(sdata->crypto_tx_tailroom_needed_cnt < delta);
update_vlan_tailroom_need_count(sdata, -delta);
--
2.1.0
From 4f03921ef978ca4b09403c52508d7d0a8ff6f1ec Mon Sep 17 00:00:00 2001
From: Stas Sergeev <stsp@list.ru>
Date: Thu, 18 Jun 2015 18:36:03 +0300
Subject: [PATCH 07/21] mvneta: add forgotten initialization of autonegotiation
bits
[ Upstream commit 538761b794c1542f1c6e31eadd9d7aae118889f7 ]
The commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state
signaling")
changed mvneta_adjust_link() so that it does not clear the auto-negotiation
bits in MVNETA_GMAC_AUTONEG_CONFIG register. This was necessary for
auto-negotiation mode to work.
Unfortunately I haven't checked if these bits are ever initialized.
It appears they are not.
This patch adds the missing initialization of the auto-negotiation bits
in the MVNETA_GMAC_AUTONEG_CONFIG register.
It fixes the following regression:
https://www.mail-archive.com/netdev@vger.kernel.org/msg67928.html
Since the patch was tested to fix a regression, it should be applied to
the stable tree.
Tested-by: Arnaud Ebalard <arno@natisbad.org>
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: stable@vger.kernel.org
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/marvell/mvneta.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ce5f7f9..74176ec 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1013,6 +1013,12 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
val = mvreg_read(pp, MVNETA_GMAC_CLOCK_DIVIDER);
val |= MVNETA_GMAC_1MS_CLOCK_ENABLE;
mvreg_write(pp, MVNETA_GMAC_CLOCK_DIVIDER, val);
+ } else {
+ val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
+ val &= ~(MVNETA_GMAC_INBAND_AN_ENABLE |
+ MVNETA_GMAC_AN_SPEED_EN |
+ MVNETA_GMAC_AN_DUPLEX_EN);
+ mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val);
}
mvneta_set_ucast_table(pp, -1);
--
2.1.0
From eac15f346e1fcffb8baff2ff8fb59fd1f6e91c12 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <cpaasch@apple.com>
Date: Thu, 18 Jun 2015 09:15:34 -0700
Subject: [PATCH 08/21] tcp: Do not call tcp_fastopen_reset_cipher from
interrupt context
[ Upstream commit dfea2aa654243f70dc53b8648d0bbdeec55a7df1 ]
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kinds of stuff with
GFP_KERNEL.
Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
[ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[ 36.012494] Call Trace:
[ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90
[ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310
[ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user context (either from the setsockopt, or implicitly during the
listen() call).
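For reference, a minimal userspace sketch (illustrative only, not part of
the patch) of those two process-context paths; the queue length and port
are arbitrary, error handling is omitted, and it assumes a libc that
defines TCP_FASTOPEN:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int make_tfo_listener(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int qlen = 16;	/* illustrative TFO request-queue length */
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port   = htons(8080),
	};

	/* Path 1: the TCP_FASTOPEN setsockopt (do_tcp_setsockopt). */
	setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));

	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* Path 2: listen() (inet_listen) can also seed the key when
	 * server-side fastopen is enabled via the tcp_fastopen sysctl. */
	listen(fd, 128);
	return fd;
}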
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/af_inet.c | 2 ++
net/ipv4/tcp.c | 7 +++++--
net/ipv4/tcp_fastopen.c | 2 --
3 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8b47a4d..a5aa54e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog)
err = 0;
if (err)
goto out;
+
+ tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f1377f2..bb2ce74 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2545,10 +2545,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
- TCPF_LISTEN)))
+ TCPF_LISTEN))) {
+ tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
- else
+ } else {
err = -EINVAL;
+ }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 46b087a..f9c0fb8 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -78,8 +78,6 @@ static bool __tcp_fastopen_cookie_gen(const void *path,
struct tcp_fastopen_context *ctx;
bool ok = false;
- tcp_fastopen_init_key_once(true);
-
rcu_read_lock();
ctx = rcu_dereference(tcp_fastopen_ctx);
if (ctx) {
--
2.1.0
From 593c8a4bd06b59ed964aa61ea8f2a8781683bc49 Mon Sep 17 00:00:00 2001
From: "Palik, Imre" <imrep@amazon.de>
Date: Fri, 19 Jun 2015 14:21:51 +0200
Subject: [PATCH 09/21] xen-netback: fix a BUG() during initialization
[ Upstream commit 12b322ac85208de564ecf23aa754d796a91de21f ]
Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime settable")
introduced the capability to change the bandwidth rate limit at runtime.
But it also introduced a possible crashing bug.
If netback receives two XenbusStateConnected events without the
hotplug-status watch firing in between, then it will try to register the
watches for the rate limiter again. But this triggers a BUG() in the watch
registration code.
The fix modifies connect() to remove the possibly existing packet-rate
watches before trying to install those watches. This behaviour is in line
with how connect() deals with the hotplug-status watch.
Signed-off-by: Imre Palik <imrep@amazon.de>
Cc: Matt Wilson <msw@amazon.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/xen-netback/xenbus.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 968787a..ec383b0 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -681,6 +681,9 @@ static int xen_register_watchers(struct xenbus_device *dev, struct xenvif *vif)
char *node;
unsigned maxlen = strlen(dev->nodename) + sizeof("/rate");
+ if (vif->credit_watch.node)
+ return -EADDRINUSE;
+
node = kmalloc(maxlen, GFP_KERNEL);
if (!node)
return -ENOMEM;
@@ -770,6 +773,7 @@ static void connect(struct backend_info *be)
}
xen_net_read_rate(dev, &credit_bytes, &credit_usec);
+ xen_unregister_watchers(be->vif);
xen_register_watchers(dev, be->vif);
read_xenbus_vif_flags(be);
--
2.1.0
From 0e1299b6fe880118bf01f84ad785541b338f067b Mon Sep 17 00:00:00 2001
From: Julian Anastasov <ja@ssi.bg>
Date: Tue, 23 Jun 2015 08:34:39 +0300
Subject: [PATCH 10/21] ip: report the original address of ICMP messages
[ Upstream commit 34b99df4e6256ddafb663c6de0711dceceddfe0e ]
ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add a function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.
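For context, a hedged userspace sketch (illustrative, not from the patch)
of an error-queue reader in the spirit of ping: with IP_RECVERR enabled on
the socket, recvmsg(MSG_ERRQUEUE) returns the original destination address
in msg_name, which is exactly what this change restores for ICMP-triggered
and local errors:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Read one message from the IP error queue of 'fd' (the socket must have
 * IP_RECVERR enabled) and print the original destination address. */
static void read_one_error(int fd)
{
	struct sockaddr_in target;
	char data[1500], cbuf[512];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_name       = &target,
		.msg_namelen    = sizeof(target),
		.msg_iov        = &iov,
		.msg_iovlen     = 1,
		.msg_control    = cbuf,
		.msg_controllen = sizeof(cbuf),
	};
	char addr[INET_ADDRSTRLEN];

	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0 || !msg.msg_namelen)
		return;

	inet_ntop(AF_INET, &target.sin_addr, addr, sizeof(addr));
	printf("error for original target %s\n", addr);
}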
Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/ipv4/ip_sockglue.c | 11 ++++++++++-
net/ipv6/datagram.c | 12 +++++++++++-
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 7cfb089..6ddde89 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -432,6 +432,15 @@ void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 inf
kfree_skb(skb);
}
+/* For some errors we have valid addr_offset even with zero payload and
+ * zero port. Also, addr_offset should be supported if port is set.
+ */
+static inline bool ipv4_datagram_support_addr(struct sock_exterr_skb *serr)
+{
+ return serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+ serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL || serr->port;
+}
+
/* IPv4 supports cmsg on all imcp errors and some timestamps
*
* Timestamp code paths do not initialize the fields expected by cmsg:
@@ -498,7 +507,7 @@ int ip_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
serr = SKB_EXT_ERR(skb);
- if (sin && serr->port) {
+ if (sin && ipv4_datagram_support_addr(serr)) {
sin->sin_family = AF_INET;
sin->sin_addr.s_addr = *(__be32 *)(skb_network_header(skb) +
serr->addr_offset);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 762a58c..62d908e 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -325,6 +325,16 @@ void ipv6_local_rxpmtu(struct sock *sk, struct flowi6 *fl6, u32 mtu)
kfree_skb(skb);
}
+/* For some errors we have valid addr_offset even with zero payload and
+ * zero port. Also, addr_offset should be supported if port is set.
+ */
+static inline bool ipv6_datagram_support_addr(struct sock_exterr_skb *serr)
+{
+ return serr->ee.ee_origin == SO_EE_ORIGIN_ICMP6 ||
+ serr->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
+ serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL || serr->port;
+}
+
/* IPv6 supports cmsg on all origins aside from SO_EE_ORIGIN_LOCAL.
*
* At one point, excluding local errors was a quick test to identify icmp/icmp6
@@ -389,7 +399,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
serr = SKB_EXT_ERR(skb);
- if (sin && serr->port) {
+ if (sin && ipv6_datagram_support_addr(serr)) {
const unsigned char *nh = skb_network_header(skb);
sin->sin6_family = AF_INET6;
sin->sin6_flowinfo = 0;
--
2.1.0
From 9cfd9897505bddf59db0f480ebcb2e956012f196 Mon Sep 17 00:00:00 2001
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:41 +0300
Subject: [PATCH 11/21] net/mlx4_en: Release TX QP when destroying TX ring
[ Upstream commit 0eb08514fdbdcd16fd6870680cd638f203662e9d ]
The TX ring QP wasn't released in mlx4_en_destroy_tx_ring(); instead, the
code used the deprecated base_tx_qpn field. Move the TX QP release to
mlx4_en_destroy_tx_ring() and remove the base_tx_qpn field.
Fixes: ddae0349fdb7 ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 4 ----
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 1 +
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 -
3 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index cf467a9..a5a0b84 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1973,10 +1973,6 @@ void mlx4_en_free_resources(struct mlx4_en_priv *priv)
mlx4_en_destroy_cq(priv, &priv->rx_cq[i]);
}
- if (priv->base_tx_qpn) {
- mlx4_qp_release_range(priv->mdev->dev, priv->base_tx_qpn, priv->tx_ring_num);
- priv->base_tx_qpn = 0;
- }
}
int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 7bed3a8..0ab298f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -180,6 +180,7 @@ void mlx4_en_destroy_tx_ring(struct mlx4_en_priv *priv,
mlx4_bf_free(mdev->dev, &ring->bf);
mlx4_qp_remove(mdev->dev, &ring->qp);
mlx4_qp_free(mdev->dev, &ring->qp);
+ mlx4_qp_release_range(priv->mdev->dev, ring->qpn, 1);
mlx4_en_unmap_buffer(&ring->wqres.buf);
mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size);
kfree(ring->bounce_buf);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d021f07..9a4b380 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -579,7 +579,6 @@ struct mlx4_en_priv {
int vids[128];
bool wol;
struct device *ddev;
- int base_tx_qpn;
struct hlist_head mac_hash[MLX4_EN_MAC_HASH_SIZE];
struct hwtstamp_config hwtstamp_config;
--
2.1.0
From 32334e974fb2c819ad4d0c7bbd4e2e3551002f99 Mon Sep 17 00:00:00 2001
From: Ido Shamay <idos@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:42 +0300
Subject: [PATCH 12/21] net/mlx4_en: Wake TX queues only when there's enough
room
[ Upstream commit 488a9b48e398b157703766e2cd91ea45ac6997c5 ]
Indication of a single completed packet, marked by txbbs_skipped
being bigger than zero, is not enough to wake up a stopped TX queue.
The completed packet may contain a single TXBB, while the next packet
to be sent (after the wake-up) may have multiple TXBBs (LSO/TSO
packets, for example), causing an overflow in the queue followed by
WQE corruption and a TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst-case (maximum sized WQE) packet that we may need to handle after
the queue is opened again.
Also create a helper routine - mlx4_en_is_tx_ring_full - which checks
whether the current TX ring is full. It provides better code readability
and removes code duplication.
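As a side note on the helper's arithmetic, a standalone sketch
(illustrative values, not driver code) shows why the prod/cons occupancy
test stays correct even after the free-running u32 counters wrap around:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t cons = 0xfffffff0u;     /* consumer just before wrapping   */
	uint32_t prod = 0x00000010u;     /* producer has already wrapped    */
	uint32_t full_size = 1024 - 64;  /* ring size minus worst-case room */

	/* Unsigned subtraction yields the outstanding TXBB count (0x20
	 * here) regardless of the wrap, so the "is the ring full?" test
	 * needs no special casing. */
	uint32_t in_flight = prod - cons;

	printf("in flight: %u, ring full: %s\n", in_flight,
	       in_flight > full_size ? "yes" : "no");
	return 0;
}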
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 19 +++++++++++--------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 +
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 0ab298f..c10d98f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -66,6 +66,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
ring->size = size;
ring->size_mask = size - 1;
ring->stride = stride;
+ ring->full_size = ring->size - HEADROOM - MAX_DESC_TXBBS;
tmp = size * sizeof(struct mlx4_en_tx_info);
ring->tx_info = kmalloc_node(tmp, GFP_KERNEL | __GFP_NOWARN, node);
@@ -232,6 +233,11 @@ void mlx4_en_deactivate_tx_ring(struct mlx4_en_priv *priv,
MLX4_QP_STATE_RST, NULL, 0, 0, &ring->qp);
}
+static inline bool mlx4_en_is_tx_ring_full(struct mlx4_en_tx_ring *ring)
+{
+ return ring->prod - ring->cons > ring->full_size;
+}
+
static void mlx4_en_stamp_wqe(struct mlx4_en_priv *priv,
struct mlx4_en_tx_ring *ring, int index,
u8 owner)
@@ -474,11 +480,10 @@ static bool mlx4_en_process_tx_cq(struct net_device *dev,
netdev_tx_completed_queue(ring->tx_queue, packets, bytes);
- /*
- * Wakeup Tx queue if this stopped, and at least 1 packet
- * was completed
+ /* Wakeup Tx queue if this stopped, and ring is not full.
*/
- if (netif_tx_queue_stopped(ring->tx_queue) && txbbs_skipped > 0) {
+ if (netif_tx_queue_stopped(ring->tx_queue) &&
+ !mlx4_en_is_tx_ring_full(ring)) {
netif_tx_wake_queue(ring->tx_queue);
ring->wake_queue++;
}
@@ -922,8 +927,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
skb_tx_timestamp(skb);
/* Check available TXBBs And 2K spare for prefetch */
- stop_queue = (int)(ring->prod - ring_cons) >
- ring->size - HEADROOM - MAX_DESC_TXBBS;
+ stop_queue = mlx4_en_is_tx_ring_full(ring);
if (unlikely(stop_queue)) {
netif_tx_stop_queue(ring->tx_queue);
ring->queue_stopped++;
@@ -992,8 +996,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
smp_rmb();
ring_cons = ACCESS_ONCE(ring->cons);
- if (unlikely(((int)(ring->prod - ring_cons)) <=
- ring->size - HEADROOM - MAX_DESC_TXBBS)) {
+ if (unlikely(!mlx4_en_is_tx_ring_full(ring))) {
netif_tx_wake_queue(ring->tx_queue);
ring->wake_queue++;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 9a4b380..909fcf8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -279,6 +279,7 @@ struct mlx4_en_tx_ring {
u32 size; /* number of TXBBs */
u32 size_mask;
u16 stride;
+ u32 full_size;
u16 cqn; /* index of port CQ associated with this ring */
u32 buf_size;
__be32 doorbell_qpn;
--
2.1.0
From 4d84400c01fa1ab6be6f974fb5eeae5280c9861f Mon Sep 17 00:00:00 2001
From: Ido Shamay <idos@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:43 +0300
Subject: [PATCH 13/21] net/mlx4_en: Fix wrong csum complete report when rxvlan
offload is disabled
[ Upstream commit 79a258526ce1051cb9684018c25a89d51ac21be8 ]
The check_csum() function relied on hwtstamp_rx_filter to know if rxvlan
offload is disabled. This is wrong since rxvlan offload can be switched
on/off regardless of hwtstamp_rx_filter.
Also move check_csum() to query the CQE information to identify VLAN
packets, and remove the check for IP packets, since that has already been
validated earlier.
Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2a77a6b..eab4e08 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -723,7 +723,7 @@ static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
}
#endif
static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
- int hwtstamp_rx_filter)
+ netdev_features_t dev_features)
{
__wsum hw_checksum = 0;
@@ -731,14 +731,8 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
- if (((struct ethhdr *)va)->h_proto == htons(ETH_P_8021Q) &&
- hwtstamp_rx_filter != HWTSTAMP_FILTER_NONE) {
- /* next protocol non IPv4 or IPv6 */
- if (((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
- != htons(ETH_P_IP) &&
- ((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
- != htons(ETH_P_IPV6))
- return -1;
+ if (cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK) &&
+ !(dev_features & NETIF_F_HW_VLAN_CTAG_RX)) {
hw_checksum = get_fixed_vlan_csum(hw_checksum, hdr);
hdr += sizeof(struct vlan_hdr);
}
@@ -901,7 +895,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (ip_summed == CHECKSUM_COMPLETE) {
void *va = skb_frag_address(skb_shinfo(gro_skb)->frags);
- if (check_csum(cqe, gro_skb, va, ring->hwtstamp_rx_filter)) {
+ if (check_csum(cqe, gro_skb, va,
+ dev->features)) {
ip_summed = CHECKSUM_NONE;
ring->csum_none++;
ring->csum_complete--;
@@ -956,7 +951,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
}
if (ip_summed == CHECKSUM_COMPLETE) {
- if (check_csum(cqe, skb, skb->data, ring->hwtstamp_rx_filter)) {
+ if (check_csum(cqe, skb, skb->data, dev->features)) {
ip_summed = CHECKSUM_NONE;
ring->csum_complete--;
ring->csum_none++;
--
2.1.0
From 92a58d5336918d43eb296382bf6106aa96a3d995 Mon Sep 17 00:00:00 2001
From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Thu, 25 Jun 2015 11:29:44 +0300
Subject: [PATCH 14/21] mlx4: Disable HA for SRIOV PF RoCE devices
[ Upstream commit 7254acffeeec3c0a75b9c5364c29a6eb00014930 ]
When in HA mode, the driver exposes an IB (RoCE) device instance with only
one port. Under SRIOV, the existing implementation doesn't go well with
the PF RoCE driver's role of Special QPs Para-Virtualization, etc.
As such, disable HA for the mlx4 PF RoCE device in SRIOV mode.
Fixes: a57500903093 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mellanox/mlx4/intf.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 6fce587..0d80aed 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -93,8 +93,14 @@ int mlx4_register_interface(struct mlx4_interface *intf)
mutex_lock(&intf_mutex);
list_add_tail(&intf->list, &intf_list);
- list_for_each_entry(priv, &dev_list, dev_list)
+ list_for_each_entry(priv, &dev_list, dev_list) {
+ if (mlx4_is_mfunc(&priv->dev) && (intf->flags & MLX4_INTFF_BONDING)) {
+ mlx4_dbg(&priv->dev,
+ "SRIOV, disabling HA mode for intf proto %d\n", intf->protocol);
+ intf->flags &= ~MLX4_INTFF_BONDING;
+ }
mlx4_add_device(intf, priv);
+ }
mutex_unlock(&intf_mutex);
--
2.1.0
From ad3cf1fa154d79993813828e73a05587606ca229 Mon Sep 17 00:00:00 2001
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Thu, 25 Jun 2015 22:21:02 +0530
Subject: [PATCH 15/21] net: phy: fix phy link up when limiting speed via
device tree
[ Upstream commit eb686231fce3770299760f24fdcf5ad041f44153 ]
When limiting the PHY link speed via "max-speed" to 100 Mbps or less on a
gigabit PHY, the PHY never completes auto-negotiation and the PHY state
machine is held in PHY_AN. Fix this issue by comparing the gigabit
advertisement even though phydev->supported doesn't include it but the PHY
has BMSR_ESTATEN set, so that auto-negotiation is restarted when the old
and new advertisements differ and the link comes up fine.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index bdfe51f..d551df6 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -796,10 +796,11 @@ static int genphy_config_advert(struct phy_device *phydev)
if (phydev->supported & (SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full)) {
adv |= ethtool_adv_to_mii_ctrl1000_t(advertise);
- if (adv != oldadv)
- changed = 1;
}
+ if (adv != oldadv)
+ changed = 1;
+
err = phy_write(phydev, MII_CTRL1000, adv);
if (err < 0)
return err;
--
2.1.0
From 1864f97a7de821d0d45d919c9db1a4acb4d1122a Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 26 Jun 2015 07:32:29 +0200
Subject: [PATCH 16/21] bnx2x: fix lockdep splat
[ Upstream commit d53c66a5b80698620f7c9ba2372fff4017e987b8 ]
Michel reported the following lockdep splat:
[ 44.718117] INFO: trying to register non-static key.
[ 44.723081] the code is fine but needs lockdep annotation.
[ 44.728559] turning off the locking correctness validator.
[ 44.734036] CPU: 8 PID: 5483 Comm: ethtool Not tainted 4.1.0
[ 44.770289] Call Trace:
[ 44.772741] [<ffffffff816eb1cd>] dump_stack+0x4c/0x65
[ 44.777879] [<ffffffff8111d921>] ? console_unlock+0x1f1/0x510
[ 44.783708] [<ffffffff811121f5>] __lock_acquire+0x1d05/0x1f10
[ 44.789538] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[ 44.795276] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 44.801967] [<ffffffff8111390d>] ? trace_hardirqs_on+0xd/0x10
[ 44.807793] [<ffffffff811330fa>] ? hrtimer_try_to_cancel+0x4a/0x250
[ 44.814142] [<ffffffff81112ba6>] lock_acquire+0xb6/0x290
[ 44.819537] [<ffffffff810d6675>] ? flush_work+0x5/0x280
[ 44.824844] [<ffffffff810d66ad>] flush_work+0x3d/0x280
[ 44.830061] [<ffffffff810d6675>] ? flush_work+0x5/0x280
[ 44.835366] [<ffffffff816f3c43>] ? schedule_hrtimeout_range+0x13/0x20
[ 44.841889] [<ffffffff8112ec9b>] ? usleep_range+0x4b/0x50
[ 44.847365] [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[ 44.853102] [<ffffffff810d8585>] ? __cancel_work_timer+0x105/0x1c0
[ 44.859359] [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 44.866045] [<ffffffff810d851f>] __cancel_work_timer+0x9f/0x1c0
[ 44.872048] [<ffffffffa0010982>] ? bnx2x_func_stop+0x42/0x90 [bnx2x]
[ 44.878481] [<ffffffff810d8670>] cancel_work_sync+0x10/0x20
[ 44.884134] [<ffffffffa00259e5>] bnx2x_chip_cleanup+0x245/0x730 [bnx2x]
[ 44.890829] [<ffffffff8110ce02>] ? up+0x32/0x50
[ 44.895439] [<ffffffff811306b5>] ? del_timer_sync+0x5/0xd0
[ 44.901005] [<ffffffffa005596d>] bnx2x_nic_unload+0x20d/0x8e0 [bnx2x]
[ 44.907527] [<ffffffff811f1aef>] ? might_fault+0x5f/0xb0
[ 44.912921] [<ffffffffa005851c>] bnx2x_reload_if_running+0x2c/0x50 [bnx2x]
[ 44.919879] [<ffffffffa005a3c5>] bnx2x_set_ringparam+0x2b5/0x460 [bnx2x]
[ 44.926664] [<ffffffff815d498b>] dev_ethtool+0x55b/0x1c40
[ 44.932148] [<ffffffff815dfdc7>] ? rtnl_lock+0x17/0x20
[ 44.937364] [<ffffffff815e7f8b>] dev_ioctl+0x17b/0x630
[ 44.942582] [<ffffffff815abf8d>] sock_do_ioctl+0x5d/0x70
[ 44.947972] [<ffffffff815ac013>] sock_ioctl+0x73/0x280
[ 44.953192] [<ffffffff8124c1c8>] do_vfs_ioctl+0x88/0x5b0
[ 44.958587] [<ffffffff8110d0b3>] ? up_read+0x23/0x40
[ 44.963631] [<ffffffff812584cc>] ? __fget_light+0x6c/0xa0
[ 44.969105] [<ffffffff8124c781>] SyS_ioctl+0x91/0xb0
[ 44.974149] [<ffffffff816f4dd7>] system_call_fastpath+0x12/0x6f
As bnx2x_init_ptp() is only called if bp->flags contains PTP_SUPPORTED,
we also need to guard bnx2x_stop_ptp() with the same condition, otherwise
the ptp_task work item is not initialized and the kernel barfs on
cancel_work_sync().
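In other words (a hedged restatement of the hunk below, with an
explanatory comment added): the teardown must be gated on the same flag
that gated the work initialization in bnx2x_init_ptp():

	/* Only cancel PTP work that was actually set up; bnx2x_init_ptp()
	 * ran only when PTP_SUPPORTED was set, so the bnx2x_stop_ptp()
	 * path -- which, per the trace above, ends up in
	 * cancel_work_sync() on the PTP work -- must use the same test. */
	if (bp->flags & PTP_SUPPORTED)
		bnx2x_stop_ptp(bp);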
Fixes: eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support")
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michal Kalderon <Michal.Kalderon@qlogic.com>
Cc: Ariel Elior <Ariel.Elior@qlogic.com>
Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>
Cc: David Decotigny <decot@google.com>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 33501bc..8a97d28 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9323,7 +9323,8 @@ unload_error:
* function stop ramrod is sent, since as part of this ramrod FW access
* PTP registers.
*/
- bnx2x_stop_ptp(bp);
+ if (bp->flags & PTP_SUPPORTED)
+ bnx2x_stop_ptp(bp);
/* Disable HW interrupts, NAPI */
bnx2x_netif_stop(bp, 1);
--
2.1.0
From 862302cda6c51cd5644e3314189c47ff8a1b0ff4 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Date: Mon, 29 Jun 2015 10:41:03 +0200
Subject: [PATCH 17/21] sctp: Fix race between OOTB responce and route removal
[ Upstream commit 29c4afc4e98f4dc0ea9df22c631841f9c220b944 ]
A NULL pointer dereference is possible during the statistics update if the
route used for the OOTB response is removed at an unfortunate time. If the
route exists when we receive an OOTB packet and we finally jump into
sctp_packet_transmit() to send the ABORT, but in the meantime the route is
removed under our feet, we take the "no_route" path and try to update stats
with IP_INC_STATS(sock_net(asoc->base.sk), ...).
But sctp_ootb_pkt_new(), used to prepare the response packet, doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with
this packet. A temporary asoc just for OOTB responses is probably overkill,
so just introduce a check like in all the other places in
sctp_packet_transmit() where "asoc" is dereferenced.
To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger an
ABORT response
On x86_64 the crash looks like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
<IRQ>
[<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
[<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
[<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
[<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
[<ffffffff810e0329>] ? update_process_times+0x59/0x60
[<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
[<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
[<ffffffff8101f599>] ? read_tsc+0x9/0x10
[<ffffffff8116d4b5>] ? put_page+0x55/0x60
[<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
[<ffffffff81462b68>] ? skb_free_head+0x58/0x80
[<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
[<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
[<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
[<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
[<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
[<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
[<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
[<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
[<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
[<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
[<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
[<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
[<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
[<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
[<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
[<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
[<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
[<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
[<ffffffff8147896a>] net_rx_action+0x21a/0x360
[<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
[<ffffffff8107912d>] irq_exit+0xad/0xb0
[<ffffffff8157d158>] do_IRQ+0x58/0xf0
[<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
[<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
[<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
[<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
[<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
[<ffffffff8156b365>] rest_init+0x85/0x90
[<ffffffff818eb035>] start_kernel+0x48b/0x4ac
[<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
[<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/sctp/output.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sctp/output.c b/net/sctp/output.c
index fc5e45b..abe7c2d 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -599,7 +599,9 @@ out:
return err;
no_route:
kfree_skb(nskb);
- IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
+
+ if (asoc)
+ IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
/* FIXME: Returning the 'err' will effect all the associations
* associated with a socket, although only one of the paths of the
--
2.1.0
From a4d501a32c7ffbc356a2c6677251a5613dd77434 Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Mon, 29 Jun 2015 11:22:12 -0500
Subject: [PATCH 18/21] amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer
allocation
[ Upstream commit 472cfe7127760d68b819cf35a26e5a1b44b30f4e ]
When allocating Rx related buffers, alloc_pages is called using an order
number that is decreased until successful. A system under stress can
experience failures during this allocation process resulting in a warning
being issued. This message can be of concern to end users even though the
failure is not fatal. Since the failure is not fatal and can occur
multiple times, the driver should include the __GFP_NOWARN flag to
suppress the warning message from being issued.
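Condensed into a hedged sketch (the shape follows xgbe_alloc_pages, with
surrounding details simplified), the decreasing-order retry described
above looks like this, and __GFP_NOWARN silences the expected high-order
failures:

static struct page *alloc_rx_pages(gfp_t gfp, int order)
{
	struct page *pages = NULL;

	/* Expected, recoverable failures at high order should not warn;
	 * simply step the order down and try again. */
	gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
	while (order >= 0) {
		pages = alloc_pages(gfp, order);
		if (pages)
			break;
		order--;
	}
	return pages;
}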
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/amd/xgbe/xgbe-desc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
index d81fc6b..5c92fb7 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
@@ -263,7 +263,7 @@ static int xgbe_alloc_pages(struct xgbe_prv_data *pdata,
int ret;
/* Try to obtain pages, decreasing order if necessary */
- gfp |= __GFP_COLD | __GFP_COMP;
+ gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
while (order >= 0) {
pages = alloc_pages(gfp, order);
if (pages)
--
2.1.0
From 2c5317116ecc9ab45da091fbaa957d20d993752a Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:20 +0200
Subject: [PATCH 19/21] net: mvneta: introduce compatible string "marvell,
armada-xp-neta"
[ Upstream commit f522a975a8101895a85354b9c143f41b8248e71a ]
The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
380 and 385 SoCs. Since at least one more hardware feature is available
for the Armada XP SoCs, a way to identify them is needed.
This patch introduces a new compatible string "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index 750d577..f5a8ca2 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -1,7 +1,7 @@
* Marvell Armada 370 / Armada XP Ethernet Controller (NETA)
Required properties:
-- compatible: should be "marvell,armada-370-neta".
+- compatible: "marvell,armada-370-neta" or "marvell,armada-xp-neta".
- reg: address and length of the register set for the device.
- interrupts: interrupt for the device
- phy: See ethernet.txt file in the same directory.
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 74176ec..4fb27ea 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3185,6 +3185,7 @@ static int mvneta_remove(struct platform_device *pdev)
static const struct of_device_id mvneta_match[] = {
{ .compatible = "marvell,armada-370-neta" },
+ { .compatible = "marvell,armada-xp-neta" },
{ }
};
MODULE_DEVICE_TABLE(of, mvneta_match);
--
2.1.0
From 267b31d1607df11b289559d837c5d1812c078bc6 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:21 +0200
Subject: [PATCH 20/21] ARM: mvebu: update Ethernet compatible string for
Armada XP
[ Upstream commit ea3b55fe83b5fcede82d183164b9d6831b26e33b ]
This patch updates the Ethernet DT nodes for Armada XP SoCs with the
compatible string "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: 77916519cba3 ("arm: mvebu: Armada XP MV78230 has only three Ethernet interfaces")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
arch/arm/boot/dts/armada-370-xp.dtsi | 2 --
arch/arm/boot/dts/armada-370.dtsi | 8 ++++++++
arch/arm/boot/dts/armada-xp-mv78260.dtsi | 2 +-
arch/arm/boot/dts/armada-xp-mv78460.dtsi | 2 +-
arch/arm/boot/dts/armada-xp.dtsi | 10 +++++++++-
5 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index ec96f0b..06a2f2a 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -270,7 +270,6 @@
};
eth0: ethernet@70000 {
- compatible = "marvell,armada-370-neta";
reg = <0x70000 0x4000>;
interrupts = <8>;
clocks = <&gateclk 4>;
@@ -286,7 +285,6 @@
};
eth1: ethernet@74000 {
- compatible = "marvell,armada-370-neta";
reg = <0x74000 0x4000>;
interrupts = <10>;
clocks = <&gateclk 3>;
diff --git a/arch/arm/boot/dts/armada-370.dtsi b/arch/arm/boot/dts/armada-370.dtsi
index 00b50db5..ca4257b 100644
--- a/arch/arm/boot/dts/armada-370.dtsi
+++ b/arch/arm/boot/dts/armada-370.dtsi
@@ -307,6 +307,14 @@
dmacap,memset;
};
};
+
+ ethernet@70000 {
+ compatible = "marvell,armada-370-neta";
+ };
+
+ ethernet@74000 {
+ compatible = "marvell,armada-370-neta";
+ };
};
};
};
diff --git a/arch/arm/boot/dts/armada-xp-mv78260.dtsi b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
index 8479fdc..c5fdc99 100644
--- a/arch/arm/boot/dts/armada-xp-mv78260.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78260.dtsi
@@ -318,7 +318,7 @@
};
eth3: ethernet@34000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x34000 0x4000>;
interrupts = <14>;
clocks = <&gateclk 1>;
diff --git a/arch/arm/boot/dts/armada-xp-mv78460.dtsi b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
index 661d54c..0e24f1a 100644
--- a/arch/arm/boot/dts/armada-xp-mv78460.dtsi
+++ b/arch/arm/boot/dts/armada-xp-mv78460.dtsi
@@ -356,7 +356,7 @@
};
eth3: ethernet@34000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x34000 0x4000>;
interrupts = <14>;
clocks = <&gateclk 1>;
diff --git a/arch/arm/boot/dts/armada-xp.dtsi b/arch/arm/boot/dts/armada-xp.dtsi
index 013d63f..8fdd6d7 100644
--- a/arch/arm/boot/dts/armada-xp.dtsi
+++ b/arch/arm/boot/dts/armada-xp.dtsi
@@ -177,7 +177,7 @@
};
eth2: ethernet@30000 {
- compatible = "marvell,armada-370-neta";
+ compatible = "marvell,armada-xp-neta";
reg = <0x30000 0x4000>;
interrupts = <12>;
clocks = <&gateclk 2>;
@@ -220,6 +220,14 @@
};
};
+ ethernet@70000 {
+ compatible = "marvell,armada-xp-neta";
+ };
+
+ ethernet@74000 {
+ compatible = "marvell,armada-xp-neta";
+ };
+
xor@f0900 {
compatible = "marvell,orion-xor";
reg = <0xF0900 0x100
--
2.1.0
From 69fa67f20ca03f9f524404db30ffe4897ad3da86 Mon Sep 17 00:00:00 2001
From: Simon Guinot <simon.guinot@sequanux.org>
Date: Tue, 30 Jun 2015 16:20:22 +0200
Subject: [PATCH 21/21] net: mvneta: disable IP checksum with jumbo frames for
Armada 370
[ Upstream commit b65657fc240ae6c1d2a1e62db9a0e61ac9631d7a ]
The Ethernet controller found in the Armada 370, 380 and 385 SoCs doesn't
support TCP/IP checksumming with frame sizes larger than 1600 bytes.
This patch fixes the issue by disabling the NETIF_F_IP_CSUM and NETIF_F_TSO
features for the Armada 370 and compatible SoCs when the MTU is set to a
value greater than 1600 bytes.
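Condensed from the hunks below into a hedged sketch (names simplified):
.ndo_fix_features masks the affected features whenever the MTU exceeds
the per-SoC limit, and mvneta_change_mtu() calls netdev_update_features()
so the core re-evaluates that mask on every MTU change:

static netdev_features_t sketch_fix_features(struct net_device *dev,
					     netdev_features_t features)
{
	struct mvneta_port *pp = netdev_priv(dev);

	/* tx_csum_limit is 1600 on Armada 370; 0 (no limit) otherwise. */
	if (pp->tx_csum_limit && dev->mtu > pp->tx_csum_limit)
		features &= ~(NETIF_F_IP_CSUM | NETIF_F_TSO);
	return features;
}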
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/marvell/mvneta.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 4fb27ea..74d0389 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -310,6 +310,7 @@ struct mvneta_port {
unsigned int link;
unsigned int duplex;
unsigned int speed;
+ unsigned int tx_csum_limit;
int use_inband_status:1;
};
@@ -2508,8 +2509,10 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
dev->mtu = mtu;
- if (!netif_running(dev))
+ if (!netif_running(dev)) {
+ netdev_update_features(dev);
return 0;
+ }
/* The interface is running, so we have to force a
* reallocation of the queues
@@ -2538,9 +2541,26 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
mvneta_start_dev(pp);
mvneta_port_up(pp);
+ netdev_update_features(dev);
+
return 0;
}
+static netdev_features_t mvneta_fix_features(struct net_device *dev,
+ netdev_features_t features)
+{
+ struct mvneta_port *pp = netdev_priv(dev);
+
+ if (pp->tx_csum_limit && dev->mtu > pp->tx_csum_limit) {
+ features &= ~(NETIF_F_IP_CSUM | NETIF_F_TSO);
+ netdev_info(dev,
+ "Disable IP checksum for MTU greater than %dB\n",
+ pp->tx_csum_limit);
+ }
+
+ return features;
+}
+
/* Get mac address */
static void mvneta_get_mac_addr(struct mvneta_port *pp, unsigned char *addr)
{
@@ -2862,6 +2882,7 @@ static const struct net_device_ops mvneta_netdev_ops = {
.ndo_set_rx_mode = mvneta_set_rx_mode,
.ndo_set_mac_address = mvneta_set_mac_addr,
.ndo_change_mtu = mvneta_change_mtu,
+ .ndo_fix_features = mvneta_fix_features,
.ndo_get_stats64 = mvneta_get_stats64,
.ndo_do_ioctl = mvneta_ioctl,
};
@@ -3107,6 +3128,9 @@ static int mvneta_probe(struct platform_device *pdev)
}
}
+ if (of_device_is_compatible(dn, "marvell,armada-370-neta"))
+ pp->tx_csum_limit = 1600;
+
pp->tx_ring_size = MVNETA_MAX_TXD;
pp->rx_ring_size = MVNETA_MAX_RXD;
--
2.1.0