Netdev List
* [PATCH] Fix: Dereference pointer-value of sk_prot->memory_pressure
From: Eric W. Biederman @ 2013-10-23 19:58 UTC (permalink / raw)
  To: David Miller
  Cc: Christoph Paasch, fengguang.wu, netdev, linux-kernel,
	Eric Dumazet
In-Reply-To: <1382533364.7572.15.camel@edumazet-glaptop.roam.corp.google.com>

From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Wed, 23 Oct 2013 12:49:21 -0700

2e685cad57 (tcp_memcontrol: Kill struct tcp_memcontrol) falsely modified
the access to memory_pressure of sk->sk_prot->memory_pressure. The patch
did modify the memory_pressure-field of struct cg_proto, but not the one
of struct proto.

So, the access to sk_prot->memory_pressure should not be changed.

Acked-by: Eric Dumazet <edumazet@google.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---

Resent because I fat fingered and deleted Dave by accident.

 include/net/sock.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c93542f92420..e3a18ff0c38b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1137,7 +1137,7 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
 	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
 		return !!sk->sk_cgrp->memory_pressure;
 
-	return !!sk->sk_prot->memory_pressure;
+	return !!*sk->sk_prot->memory_pressure;
 }
 
 static inline void sk_leave_memory_pressure(struct sock *sk)
-- 
1.7.5.4

^ permalink raw reply related
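
To illustrate the one-character fix above, here is a minimal user-space
sketch. It assumes, as the commit message implies, that the field in
struct proto is a pointer to a flag rather than the flag itself. The
struct and function names below are hypothetical stand-ins, not kernel
code, and the fixed form assumes the pointer is non-NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct proto: the field points at a flag. */
struct proto_sketch {
	int *memory_pressure;
};

/* Buggy form: tests the pointer, so it is true for any non-NULL pointer. */
static bool under_pressure_buggy(const struct proto_sketch *prot)
{
	return !!prot->memory_pressure;
}

/* Fixed form: dereferences the pointer and tests the flag it points to. */
static bool under_pressure_fixed(const struct proto_sketch *prot)
{
	return !!*prot->memory_pressure;
}
```

With a valid pointer to a flag that is 0, the buggy form still reports
pressure while the fixed form does not, which would be consistent with
the throughput regression reported against 2e685cad57.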

* Re: [PATCH] Fix: Dereference pointer-value of sk_prot->memory_pressure
From: David Miller @ 2013-10-23 20:15 UTC (permalink / raw)
  To: ebiederm; +Cc: eric.dumazet, christoph.paasch, fengguang.wu, netdev,
	linux-kernel
In-Reply-To: <87r4bbiwyh.fsf_-_@xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Wed, 23 Oct 2013 12:55:18 -0700

> From: Christoph Paasch <christoph.paasch@uclouvain.be>
> Date: Wed, 23 Oct 2013 12:49:21 -0700
> 
> 2e685cad57 (tcp_memcontrol: Kill struct tcp_memcontrol) falsely modified
> the access to memory_pressure of sk->sk_prot->memory_pressure. The patch
> did modify the memory_pressure-field of struct cg_proto, but not the one
> of struct proto.
> 
> So, the access to sk_prot->memory_pressure should not be changed.
> 
> Acked-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied, but I replaced "Fix: " with "net: " in the commit header line.

^ permalink raw reply

* Re: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: David Miller @ 2013-10-23 20:16 UTC (permalink / raw)
  To: antonio; +Cc: David.Laight, netdev
In-Reply-To: <20131023124401.GC1535@neomailbox.net>

From: Antonio Quartulli <antonio@meshcoding.com>
Date: Wed, 23 Oct 2013 14:44:01 +0200

> On Wed, Oct 23, 2013 at 12:18:32PM +0100, David Laight wrote:
>> > My idea is to use the following API:
>> > 
>> > rx_skb_hook(struct netpoll *np, int source, struct sk_buff *skb, int len);
>> > 
>> > Any suggestion or objection?
>> 
>> Don't you need to pass the offset of the udp data?
> 
> Yes, you are right. I just forgot it. Therefore we have:
> 
> rx_skb_hook(struct netpoll *np, int source, struct sk_buff *skb, int offset,
> 	    int len);
> 
> where offset is going to be = (udp_hdr + 1) - skb->data
> and len = skb->len - offset

This looks good to me.

^ permalink raw reply
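
The offset and len arithmetic quoted above can be sketched in user space
as follows. This is only an illustration under assumptions: the header
struct and helper names are hypothetical, and a plain byte buffer stands
in for skb->data and skb->len.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte UDP header. */
struct udp_hdr_sketch {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* offset = (udp_hdr + 1) - data: first payload byte, relative to data. */
static int udp_payload_offset(const unsigned char *data,
			      const struct udp_hdr_sketch *uh)
{
	return (int)((const unsigned char *)(uh + 1) - data);
}

/* len = total packet length - offset: bytes remaining after the header. */
static int udp_payload_len(int pkt_len, int offset)
{
	return pkt_len - offset;
}
```

For a 64-byte packet whose UDP header starts 20 bytes into the data, the
offset is 28 and the remaining length is 36. Passing the offset rather
than a raw pointer lets the hook decide how to access the data safely.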

* Re: [PATCH net-next] net: always inline net_secret_init
From: David Miller @ 2013-10-23 20:27 UTC (permalink / raw)
  To: hannes; +Cc: netdev
In-Reply-To: <20131023064450.GA26236@order.stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 23 Oct 2013 08:44:50 +0200

> Currently net_secret_init does not get inlined, so we always have a call
> to net_secret_init even in the fast path.
> 
> Let's specify net_secret_init as __always_inline so we have the nop in
> the fast-path without the call to net_secret_init and the unlikely path
> at the epilogue of the function.
> 
> jump_labels handle the inlining correctly.
> 
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied, thanks Hannes.

^ permalink raw reply

* Re: [PATCH] sh_eth: add/use RMCR.RNC bit
From: David Miller @ 2013-10-23 20:50 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: netdev, nobuhiro.iwamatsu.yj, linux-sh, horms
In-Reply-To: <5266F94E.9030406@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Wed, 23 Oct 2013 02:16:46 +0400

> Hello.
> 
> On 10/16/2013 02:29 AM, Sergei Shtylyov wrote:
> 
>> Declare 'enum EMCR_BIT' containing the single member for the RMCR.RNC
>> bit and
> 
>    Hm, looks like I typoed here, should have been RMCR_BIT. David, should
>    I resubmit or you can fix it while applying? Or simply not worth the
>    trouble?

Applied, with the typo fixed, thanks.

^ permalink raw reply

* Re: [PATCH 0/3] netfilter fixes for net
From: David Miller @ 2013-10-23 20:56 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 23 Oct 2013 11:15:21 +0200

> The following patchset contains three netfilter fixes for your net
> tree, they are:
> 
> * A couple of fixes to resolve info leak to userspace due to uninitialized
>   memory area in ulogd, from Mathias Krause.
> 
> * Fix instruction ordering issues that may lead to the access of
>   uninitialized data in x_tables. The problem involves the table update
>   (producer) and the main packet matching (consumer) routines. Detected in
>   SMP ARMv7, from Will Deacon.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

Pulled, thanks Pablo.

^ permalink raw reply

* Re: [PATCH net] net: sctp: fix ASCONF to allow non SCTP_ADDR_SRC addresses in ipv6
From: David Miller @ 2013-10-23 20:57 UTC (permalink / raw)
  To: dborkman; +Cc: netdev, linux-sctp, micchie
In-Reply-To: <1382459696-1732-1-git-send-email-dborkman@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Tue, 22 Oct 2013 18:34:56 +0200

> Commit 8a07eb0a50 ("sctp: Add ASCONF operation on the single-homed host")
> implemented possible use of IPv4 addresses with non SCTP_ADDR_SRC state
> as source address when sending ASCONF (ADD) packets, but IPv6 part for
> that was not implemented in 8a07eb0a50. Therefore, as this is not restricted
> to IPv4-only, fix this up to allow the same for IPv6 addresses in SCTP.
> 
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Michio Honda <micchie@sfc.wide.ad.jp>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/3] initialize fragment hash secrets with net_get_random_once
From: David Miller @ 2013-10-23 21:02 UTC (permalink / raw)
  To: hannes; +Cc: netdev, netfilter-devel
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 23 Oct 2013 11:06:54 +0200

> This series switches the inet_frag.rnd hash initialization to
> net_get_random_once.
> 
> Included patches:
>  ipv4: initialize ip4_frags hash secret as late
>  ipv6: split inet6_hash_frag for netfilter and
>  inet: remove old fragmentation hash initializing

Looks good, series applied, thanks Hannes.

^ permalink raw reply

* Re: [PATCH net-next] fix rtnl notification in atomic context
From: Stephen Hemminger @ 2013-10-23 21:03 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David S. Miller, Nicolas Dichtel, Cong Wang, netdev
In-Reply-To: <1382553161-3498-1-git-send-email-ast@plumgrid.com>

On Wed, 23 Oct 2013 11:32:41 -0700
Alexei Starovoitov <ast@plumgrid.com> wrote:

> +
> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
> +{
> +	__rtmsg_ifinfo(type, dev, change, GFP_KERNEL);
> +}
>  EXPORT_SYMBOL(rtmsg_ifinfo);
>  
>  static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
> -- 

Why add another wrapper function? I think it is cleaner to just change all the
callers to use the correct gfp flags.

^ permalink raw reply

* Re: [PATCH net-next] fix rtnl notification in atomic context
From: David Miller @ 2013-10-23 21:09 UTC (permalink / raw)
  To: stephen; +Cc: ast, nicolas.dichtel, amwang, netdev
In-Reply-To: <20131023140343.4604d80d@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 23 Oct 2013 14:03:43 -0700

> On Wed, 23 Oct 2013 11:32:41 -0700
> Alexei Starovoitov <ast@plumgrid.com> wrote:
> 
>> +
>> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
>> +{
>> +	__rtmsg_ifinfo(type, dev, change, GFP_KERNEL);
>> +}
>>  EXPORT_SYMBOL(rtmsg_ifinfo);
>>  
>>  static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
>> -- 
> 
> Why add another wrapper function? I think it is cleaner to just change all the
> callers to use the correct gfp flags.

Indeed, if this were targeted to "net" we'd have the argument of trying
to simplify the patch for -stable inclusion.

But since this is going into net-next, let's just put explicit GFP_* args
at the call site.

^ permalink raw reply
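
The suggestion above, passing the allocation mode explicitly at each call
site instead of hiding it in a wrapper, can be sketched in user space
like this. The enum and function names are hypothetical and only loosely
modelled on GFP_KERNEL and GFP_ATOMIC; nothing here is the kernel API.

```c
#include <stdbool.h>

/* Hypothetical allocation modes, loosely modelled on GFP_* flags. */
enum alloc_mode_sketch {
	ALLOC_MAY_SLEEP,	/* like GFP_KERNEL: the allocator may block */
	ALLOC_NO_SLEEP,		/* like GFP_ATOMIC: the allocator must not block */
};

/* True when the chosen mode is safe for the caller's context. */
static bool alloc_mode_is_safe(bool caller_is_atomic,
			       enum alloc_mode_sketch mode)
{
	return !(caller_is_atomic && mode == ALLOC_MAY_SLEEP);
}

/*
 * Hypothetical notifier taking an explicit mode. Returns 0 on success,
 * or -1 if the call would sleep in atomic context.
 */
static int notify_ifinfo_sketch(bool caller_is_atomic,
				enum alloc_mode_sketch mode)
{
	if (!alloc_mode_is_safe(caller_is_atomic, mode))
		return -1;
	return 0;
}
```

Because the mode is visible at every call site, a reviewer can check each
caller's context directly instead of tracing which wrapper was used.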

* Re: pull request: batman-adv 2013-10-23
From: David Miller @ 2013-10-23 21:13 UTC (permalink / raw)
  To: antonio; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <1382544303-2694-1-git-send-email-antonio@meshcoding.com>

From: Antonio Quartulli <antonio@meshcoding.com>
Date: Wed, 23 Oct 2013 18:04:47 +0200

> this is another set of changes intended for net-next/linux-3.13.
> (probably our last pull request for this cycle)
> 
> Patches 1 and 2 reshape two of our main data structures in a way that they can
> easily be extended in the future to accommodate new routing protocols.
> 
> Patches from 3 to 9 improve our routing protocol API and its users so that all
> the protocol-related code is not mixed up with the other components anymore.
> 
> Patch 10 limits the local Translation Table maximum size to a value such that it
> can be fully transferred over the air if needed. This value depends on
> fragmentation being enabled or not and on the mtu values.
> 
> Patch 11 makes batman-adv send a uevent in case of soft-interface destruction
> while a "bat-Gateway" was configured (this informs userspace about the GW not
> being available anymore).
> 
> Patches 13 and 14 enable the TT component to detect non-mesh client flag
> changes at runtime (till now those flags were set upon client detection and
> were not changed anymore).
> 
> Patch 16 is a generalisation of our user-to-kernel space communication (and
> vice versa) used to exchange ICMP packets sent to or received from the mesh
> network. Now it can easily accommodate new ICMP packet types without breaking
> the existing userspace API anymore.
> 
> Remaining patches are minor changes and cleanups.

Pulled, thanks Antonio.

^ permalink raw reply

* Re: [PATCH net-next] fix rtnl notification in atomic context
From: Alexei Starovoitov @ 2013-10-23 21:25 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, nicolas.dichtel, amwang, netdev
In-Reply-To: <20131023.170919.2254167416151180538.davem@davemloft.net>

On Wed, Oct 23, 2013 at 2:09 PM, David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <stephen@networkplumber.org>
> Date: Wed, 23 Oct 2013 14:03:43 -0700
>
>> On Wed, 23 Oct 2013 11:32:41 -0700
>> Alexei Starovoitov <ast@plumgrid.com> wrote:
>>
>>> +
>>> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
>>> +{
>>> +    __rtmsg_ifinfo(type, dev, change, GFP_KERNEL);
>>> +}
>>>  EXPORT_SYMBOL(rtmsg_ifinfo);
>>>
>>>  static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
>>> --
>>
>> Why add another wrapper function? I think it is cleaner to just change all the
>> callers to use the correct gfp flags.
>
> Indeed, if this were targeted to "net" we'd have the argument of trying
> to simplify the patch for -stable inclusion.
>
> But since this is going into net-next, let's just put explicit GFP_* args
> at the call site.

sure. Will respin.

^ permalink raw reply

* [PATCHv2 net] netpoll: fix rx_hook() interface by passing the skb
From: Antonio Quartulli @ 2013-10-23 21:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, David.Laight, Antonio Quartulli
In-Reply-To: <20131023.161603.1190144528425577653.davem@davemloft.net>

Right now skb->data is passed to rx_hook() even if the skb
has not been linearised and without giving rx_hook() a way
to linearise it.

Change the rx_hook() interface and make it accept the skb
and the offset to the UDP payload as arguments. rx_hook() is
also renamed to rx_skb_hook() to ensure that out-of-tree
users notice the API change.

In this way any rx_skb_hook() implementation can perform all
the needed operations to properly (and safely) access the
skb data.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 include/linux/netpoll.h |  5 +++--
 net/core/netpoll.c      | 31 ++++++++++++++++++-------------
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index f3c7c24..fbfdb9d 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -24,7 +24,8 @@ struct netpoll {
 	struct net_device *dev;
 	char dev_name[IFNAMSIZ];
 	const char *name;
-	void (*rx_hook)(struct netpoll *, int, char *, int);
+	void (*rx_skb_hook)(struct netpoll *np, int source, struct sk_buff *skb,
+			    int offset, int len);
 
 	union inet_addr local_ip, remote_ip;
 	bool ipv6;
@@ -41,7 +42,7 @@ struct netpoll_info {
 	unsigned long rx_flags;
 	spinlock_t rx_lock;
 	struct semaphore dev_lock;
-	struct list_head rx_np; /* netpolls that registered an rx_hook */
+	struct list_head rx_np; /* netpolls that registered an rx_skb_hook */
 
 	struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */
 	struct sk_buff_head txq;
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index fc75c9e..8f97199 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -636,8 +636,9 @@ static void netpoll_neigh_reply(struct sk_buff *skb, struct netpoll_info *npinfo
 
 			netpoll_send_skb(np, send_skb);
 
-			/* If there are several rx_hooks for the same address,
-			   we're fine by sending a single reply */
+			/* If there are several rx_skb_hooks for the same
+			 * address we're fine by sending a single reply
+			 */
 			break;
 		}
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
@@ -719,8 +720,9 @@ static void netpoll_neigh_reply(struct sk_buff *skb, struct netpoll_info *npinfo
 
 			netpoll_send_skb(np, send_skb);
 
-			/* If there are several rx_hooks for the same address,
-			   we're fine by sending a single reply */
+			/* If there are several rx_skb_hooks for the same
+			 * address, we're fine by sending a single reply
+			 */
 			break;
 		}
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
@@ -756,11 +758,12 @@ static bool pkt_is_ns(struct sk_buff *skb)
 
 int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 {
-	int proto, len, ulen;
-	int hits = 0;
+	int proto, len, ulen, data_len;
+	int hits = 0, offset;
 	const struct iphdr *iph;
 	struct udphdr *uh;
 	struct netpoll *np, *tmp;
+	uint16_t source;
 
 	if (list_empty(&npinfo->rx_np))
 		goto out;
@@ -820,7 +823,10 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 
 		len -= iph->ihl*4;
 		uh = (struct udphdr *)(((char *)iph) + iph->ihl*4);
+		offset = (unsigned char *)(uh + 1) - skb->data;
 		ulen = ntohs(uh->len);
+		data_len = skb->len - offset;
+		source = ntohs(uh->source);
 
 		if (ulen != len)
 			goto out;
@@ -834,9 +840,7 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 			if (np->local_port && np->local_port != ntohs(uh->dest))
 				continue;
 
-			np->rx_hook(np, ntohs(uh->source),
-				       (char *)(uh+1),
-				       ulen - sizeof(struct udphdr));
+			np->rx_skb_hook(np, source, skb, offset, data_len);
 			hits++;
 		}
 	} else {
@@ -859,7 +863,10 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 		if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 			goto out;
 		uh = udp_hdr(skb);
+		offset = (unsigned char *)(uh + 1) - skb->data;
 		ulen = ntohs(uh->len);
+		data_len = skb->len - offset;
+		source = ntohs(uh->source);
 		if (ulen != skb->len)
 			goto out;
 		if (udp6_csum_init(skb, uh, IPPROTO_UDP))
@@ -872,9 +879,7 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 			if (np->local_port && np->local_port != ntohs(uh->dest))
 				continue;
 
-			np->rx_hook(np, ntohs(uh->source),
-				       (char *)(uh+1),
-				       ulen - sizeof(struct udphdr));
+			np->rx_skb_hook(np, source, skb, offset, data_len);
 			hits++;
 		}
 #endif
@@ -1062,7 +1067,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp)
 
 	npinfo->netpoll = np;
 
-	if (np->rx_hook) {
+	if (np->rx_skb_hook) {
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
 		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
 		list_add_tail(&np->rx, &npinfo->rx_np);
-- 
1.8.4

^ permalink raw reply related

* Re: -27% netperf TCP_STREAM regression by "tcp_memcontrol: Kill struct tcp_memcontrol"
From: Fengguang Wu @ 2013-10-23 22:07 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Eric W. Biederman, David Miller, netdev, linux-kernel
In-Reply-To: <20131023122543.GH5132@cpaasch-mac>

> -       return !!sk->sk_prot->memory_pressure;
> +       return !!*sk->sk_prot->memory_pressure;

Good catch, Christoph! With no surprise, it restores the performance:

    a4fe34bf902b8f709c63      2e685cad57906e19add7      a235435d612680e595ea  
------------------------  ------------------------  ------------------------  
                  707.40       -41.0%       417.50        -8.8%       645.00  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
                 2775.60       -23.7%      2116.50        +2.1%      2834.00  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
                 3483.00       -27.2%      2534.00        -0.1%      3479.00  TOTAL netperf.Throughput_Mbps

It's a bit late, but

Tested-by: Fengguang Wu <fengguang.wu@intel.com>

Thanks,
Fengguang

^ permalink raw reply
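
The percentage columns in the table above can be reproduced with a short
calculation. This sketch assumes each percentage is the change relative
to the first column (a4fe34bf902b8f709c63); the helper name is
hypothetical.

```c
/* Relative change of 'value' against 'base', in percent. */
static double pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}
```

Under that assumption, pct_change(3483.00, 2534.00) is about -27.2 and
pct_change(3483.00, 3479.00) is about -0.1, in line with the TOTAL row.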

* Re: Big performance loss from 3.4.63 to 3.10.13 when routing ipv4
From: Wolfgang Walter @ 2013-10-23 22:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hannes, netdev, klassert
In-Reply-To: <1382547992.7572.31.camel@edumazet-glaptop.roam.corp.google.com>

On Wednesday 23 October 2013 10:06:32 Eric Dumazet wrote:
> On Wed, 2013-10-23 at 18:59 +0200, Wolfgang Walter wrote:
> > Ah, ok. I use SLUB, but SLABINFO=y.
> > 
> > Without much traffic it is:
> > 
> > # grep dst /proc/slabinfo
> > xfrm_dst_cache      4435   4608    448   36    4 : tunables    0    0    0 : slabdata    128    128      0
> > 
> > on the big one.
> > 
> > I can recompile the kernels with SLAB instead of SLUB if SLAB gives more
> > useful info.
> Not needed, because it seems we do not merge this SLUB cache with
> another one.

Ok. I can't see xfrm_dst_cache on 32bit-systems, though.

> 
> So please post this information, because I believe the default should be
> 65536, not 1024 or 4096
> 

Indeed I already saw higher values, at the moment I see:

# while true; do grep dst /proc/slabinfo ; sleep 1; done
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12636  12636    448   36    4 : tunables    0    0    0 : slabdata    351    351      0
xfrm_dst_cache     12708  12708    448   36    4 : tunables    0    0    0 : slabdata    353    353      0
xfrm_dst_cache     11529  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11599  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11633  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11633  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11633  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11700  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11763  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11798  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     11964  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     12139  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     12244  12276    448   36    4 : tunables    0    0    0 : slabdata    341    341      0
xfrm_dst_cache     12312  12312    448   36    4 : tunables    0    0    0 : slabdata    342    342      0
xfrm_dst_cache     12492  12492    448   36    4 : tunables    0    0    0 : slabdata    347    347      0



Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

^ permalink raw reply

* [PATCH v2 net-next] fix rtnl notification in atomic context
From: Alexei Starovoitov @ 2013-10-23 23:02 UTC (permalink / raw)
  To: David S. Miller; +Cc: Nicolas Dichtel, Cong Wang, Veaceslav Falico, netdev

commit 991fb3f74c "dev: always advertise rx_flags changes via netlink"
introduced rtnl notification from __dev_set_promiscuity(),
which can be called in atomic context.

Steps to reproduce:
ip tuntap add dev tap1 mode tap
ifconfig tap1 up
tcpdump -nei tap1 &
ip tuntap del dev tap1 mode tap

[  271.627994] device tap1 left promiscuous mode
[  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
[  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
[  271.677525] INFO: lockdep is turned off.
[  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
[  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
[  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
[  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
[  271.822332] Call Trace:
[  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
[  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
[  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
[  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
[  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
[  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
[  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
[  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
[  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
[  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
[  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
[  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
[  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
[  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
[  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
[  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
[  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
[  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
[  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
[  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
[  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
[  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
[  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
[  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 drivers/net/bonding/bond_main.c |    4 ++--
 include/linux/rtnetlink.h       |    2 +-
 net/core/dev.c                  |   16 ++++++++--------
 net/core/rtnetlink.c            |    9 +++++----
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2daa066..a141f40 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1213,7 +1213,7 @@ static int bond_master_upper_dev_link(struct net_device *bond_dev,
 	if (err)
 		return err;
 	slave_dev->flags |= IFF_SLAVE;
-	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE);
+	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE, GFP_KERNEL);
 	return 0;
 }
 
@@ -1222,7 +1222,7 @@ static void bond_upper_dev_unlink(struct net_device *bond_dev,
 {
 	netdev_upper_dev_unlink(slave_dev, bond_dev);
 	slave_dev->flags &= ~IFF_SLAVE;
-	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE);
+	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE, GFP_KERNEL);
 }
 
 /* enslave device <slave> to bond device <master> */
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index f28544b..939428a 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -15,7 +15,7 @@ extern int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics);
 extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 			      u32 id, long expires, u32 error);
 
-extern void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change);
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
 
 /* RTNL is used as a global lock for all changes to network configuration  */
 extern void rtnl_lock(void);
diff --git a/net/core/dev.c b/net/core/dev.c
index 0918aad..5d7e821 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1203,7 +1203,7 @@ void netdev_state_change(struct net_device *dev)
 {
 	if (dev->flags & IFF_UP) {
 		call_netdevice_notifiers(NETDEV_CHANGE, dev);
-		rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
+		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
 	}
 }
 EXPORT_SYMBOL(netdev_state_change);
@@ -1293,7 +1293,7 @@ int dev_open(struct net_device *dev)
 	if (ret < 0)
 		return ret;
 
-	rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
+	rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
 	call_netdevice_notifiers(NETDEV_UP, dev);
 
 	return ret;
@@ -1371,7 +1371,7 @@ static int dev_close_many(struct list_head *head)
 	__dev_close_many(head);
 
 	list_for_each_entry_safe(dev, tmp, head, close_list) {
-		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
+		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
 		call_netdevice_notifiers(NETDEV_DOWN, dev);
 		list_del_init(&dev->close_list);
 	}
@@ -5257,7 +5257,7 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
 	unsigned int changes = dev->flags ^ old_flags;
 
 	if (gchanges)
-		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges);
+		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
 
 	if (changes & IFF_UP) {
 		if (dev->flags & IFF_UP)
@@ -5489,7 +5489,7 @@ static void rollback_registered_many(struct list_head *head)
 
 		if (!dev->rtnl_link_ops ||
 		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-			rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
+			rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
 
 		/*
 		 *	Flush the unicast and multicast chains
@@ -5888,7 +5888,7 @@ int register_netdevice(struct net_device *dev)
 	 */
 	if (!dev->rtnl_link_ops ||
 	    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-		rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
+		rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
 
 out:
 	return ret;
@@ -6500,7 +6500,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
 	rcu_barrier();
 	call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
-	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
+	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
 
 	/*
 	 *	Flush the unicast and multicast chains
@@ -6539,7 +6539,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	 *	Prevent userspace races by waiting until the network
 	 *	device is fully setup before sending notifications.
 	 */
-	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
+	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
 
 	synchronize_net();
 	err = 0;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4aedf03..cf67144 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1984,14 +1984,15 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
+void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
+		  gfp_t flags)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 	size_t if_info_size;
 
-	skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), GFP_KERNEL);
+	skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), flags);
 	if (skb == NULL)
 		goto errout;
 
@@ -2002,7 +2003,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
 		kfree_skb(skb);
 		goto errout;
 	}
-	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
 	return;
 errout:
 	if (err < 0)
@@ -2716,7 +2717,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_JOIN:
 		break;
 	default:
-		rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
+		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
 		break;
 	}
 	return NOTIFY_DONE;
-- 
1.7.9.5

^ permalink raw reply related

* Re: [virtio-net] BUG: sleeping function called from invalid context at kernel/mutex.c:616
From: Fengguang Wu @ 2013-10-23 23:20 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <526638D4.1030403@redhat.com>

Hi Jason,

On Tue, Oct 22, 2013 at 04:35:32PM +0800, Jason Wang wrote:
> On 10/20/2013 10:34 AM, Fengguang Wu wrote:
> > Greetings,
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit 3ab098df35f8b98b6553edc2e40234af512ba877
> > Author: Jason Wang <jasowang@redhat.com>
> > Date:   Tue Oct 15 11:18:58 2013 +0800
> >
> >     virtio-net: don't respond to cpu hotplug notifier if we're not ready
> >     
> >     We're trying to re-configure the affinity unconditionally in cpu hotplug
> >     callback. This may lead the issue during resuming from s3/s4 since
> >     
> >     - virt queues haven't been allocated at that time.
> >     - it's unnecessary since thaw method will re-configure the affinity.
> >     
> >     Fix this issue by checking the config_enable and do nothing is we're not ready.
> >     
> >     The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
> >     (virtio-net: reset virtqueue affinity when doing cpu hotplug).
> >     
> >     Cc: Rusty Russell <rusty@rustcorp.com.au>
> >     Cc: Michael S. Tsirkin <mst@redhat.com>
> >     Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> >     Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >     Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> >     Signed-off-by: Jason Wang <jasowang@redhat.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > [  622.944441] CPU0 attaching NULL sched-domain.
> > [  622.944446] CPU1 attaching NULL sched-domain.
> > [  622.944485] CPU0 attaching NULL sched-domain.
> > [  622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616
> > [  622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1
> > [  622.950796] no locks held by migration/1/10.
> > [  622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317
> > [  622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [  622.950802]  0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70
> > [  622.950803]  ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed
> > [  622.950805]  0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000
> > [  622.950805] Call Trace:
> > [  622.950810]  [<ffffffff81a32f22>] dump_stack+0x54/0x74
> > [  622.950815]  [<ffffffff810edb02>] __might_sleep+0x112/0x114
> > [  622.950817]  [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6
> > [  622.950818]  [<ffffffff810e861d>] ? up+0x39/0x3e
> > [  622.950821]  [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d
> > [  622.950824]  [<ffffffff81565ed1>] ? acpi_ut_release_mutex+0x5e/0x62
> > [  622.950828]  [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87
> > [  622.950830]  [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e
> > [  622.950832]  [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10
> > [  622.950835]  [<ffffffff810c5556>] __cpu_notify+0x20/0x37
> > [  622.950836]  [<ffffffff810c5580>] cpu_notify+0x13/0x15
> > [  622.950838]  [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a
> > [  622.950841]  [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1
> > [  622.950842]  [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f
> > [  622.950844]  [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f
> > [  622.950847]  [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8
> > [  622.950848]  [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47
> > [  622.950850]  [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3
> > [  622.950852]  [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67
> > [  622.950854]  [<ffffffff810e36b7>] kthread+0xd8/0xe0
> > [  622.950857]  [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164
> > [  622.950859]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
> > [  622.950861]  [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0
> > [  622.950862]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
> > [  622.950876] smpboot: CPU 1 is now offline
> > [  623.194556] SMP alternatives: lockdep: fixing up alternatives
> > [  623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1
>  
> Thanks for the testing Fengguang, could you please try the attached
> patch to see if it works?

Yes it reduces the sleeping function bug:

/kernel/x86_64-lkp-CONFIG_SCSI_DEBUG/7c4ed2767afb813493b0a8fb18d666cd44550963

+------------------------------------------------------------------------------------+-----------+--------------+--------------+
|                                                                                    | v3.12-rc3 | 3ab098df35f8 | 7c4ed2767afb |
+------------------------------------------------------------------------------------+-----------+--------------+--------------+
| good_boots                                                                         | 30        | 0            | 93           |
| has_kernel_error_warning                                                           | 0         | 20           | 7            |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/mutex.c                | 0         | 20           |              |
| INFO:rcu_sched_self-detected_stall_on_CPU(t=jiffies_g=c=q=)                        | 0         | 0            | 1            |
| INFO:task_blocked_for_more_than_seconds                                            | 0         | 0            | 2            |
| INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 0         | 0            | 1            |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                                   | 0         | 0            | 1            |
| BUG:kernel_test_crashed                                                            | 0         | 0            | 3            |
| BUG:kernel_test_hang                                                               | 0         | 0            | 1            |
+------------------------------------------------------------------------------------+-----------+--------------+--------------+

However I'll need to increase tests on v3.12-rc3 to make sure it's not
the patch that added the other error messages.

Thanks,
Fengguang

> >From 01e6c3f71c202aa02e4feda169e7cc9fb24193f5 Mon Sep 17 00:00:00 2001
> From: Jason Wang <jasowang@redhat.com>
> Date: Mon, 21 Oct 2013 20:39:09 +0800
> Subject: [PATCH] virtio-net: fix
> 
> ---
>  drivers/net/virtio_net.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9fbdfcd..bbc9cb8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1118,11 +1118,6 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
>  {
>  	struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
>  
> -	mutex_lock(&vi->config_lock);
> -
> -	if (!vi->config_enable)
> -		goto done;
> -
>  	switch(action & ~CPU_TASKS_FROZEN) {
>  	case CPU_ONLINE:
>  	case CPU_DOWN_FAILED:
> @@ -1136,8 +1131,6 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
>  		break;
>  	}
>  
> -done:
> -	mutex_unlock(&vi->config_lock);
>  	return NOTIFY_OK;
>  }
>  
> @@ -1699,6 +1692,8 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  	struct virtnet_info *vi = vdev->priv;
>  	int i;
>  
> +	unregister_hotcpu_notifier(&vi->nb);
> +
>  	/* Prevent config work handler from accessing the device */
>  	mutex_lock(&vi->config_lock);
>  	vi->config_enable = false;
> @@ -1747,6 +1742,10 @@ static int virtnet_restore(struct virtio_device *vdev)
>  	virtnet_set_queues(vi, vi->curr_queue_pairs);
>  	rtnl_unlock();
>  
> +	err = register_hotcpu_notifier(&vi->nb);
> +	if (err)
> +		return err;
> +
>  	return 0;
>  }
>  #endif
> -- 
> 1.8.1.2
> 


* Deadlock in BPF JIT functions when running upowerd?
From: Darrick J. Wong @ 2013-10-24  1:17 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, darrick.wong; +Cc: netdev, linux-kernel

Hi,

I've been observing a softlockup with 3.11.6 and 3.12-rc6.  It looks like
there's a deadlock occurring on purge_lock in __purge_vmap_area_lazy().  In
short, the BPF JIT code has been changed[1] to call set_memory_r[ow]() when
compiling and freeing JIT bytecode memory.  It seems that it's possible for
upowerd to be compiling some BPF program and call __purge_vmap_area_lazy, then
the timer interrupt comes in (due to the IPI?) and a softirq calls
bpf_jit_free, which also calls __purge_vmap_area_lazy.
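
The suspected sequence can be modelled in userspace. This is only a sketch,
assuming purge_lock behaves like a plain non-recursive spinlock and that the
softirq runs on the same CPU that already holds it. All model_* names are
hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a non-recursive spinlock such as
 * purge_lock. */
static atomic_flag model_purge_lock = ATOMIC_FLAG_INIT;

static bool model_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it */
	return !atomic_flag_test_and_set(&model_purge_lock);
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_purge_lock);
}

/* Models the softirq path (bpf_jit_free -> set_memory_rw -> purge).
 * A real spin_lock() would spin here; the model only reports whether it
 * would spin forever because the interrupted context owns the lock. */
static bool model_softirq_would_spin_forever(void)
{
	if (model_trylock()) {
		model_unlock();
		return false;
	}
	return true;
}

/* Models process context (bpf_jit_compile -> set_memory_ro -> purge).
 * If the softirq runs on this CPU while the lock is held, the holder
 * never gets to run again to release it. */
static bool model_purge_interrupted(bool softirq_arrives_while_locked)
{
	bool deadlock = false;

	while (!model_trylock())
		;	/* spin, as spin_lock() would */

	if (softirq_arrives_while_locked)
		deadlock = model_softirq_would_spin_forever();

	model_unlock();

	/* Once the lock is dropped the softirq path can make progress. */
	assert(!model_softirq_would_spin_forever());
	return deadlock;
}
```

In this model, model_purge_interrupted(true) reports a deadlock and
model_purge_interrupted(false) does not, which matches the nested
__purge_vmap_area_lazy frames in the trace below.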

I'm not really sure who's at fault here--is this a BPF bug?

[1] 314beb9bcabfd6b4542ccbced2402af2c6f6142a
    "x86: bpf_jit_comp: secure bpf jit against spraying attacks"

--D

Here's what 3.11.6 spits out; the 3.12-rc6 message has the same traceback.

[   52.370437] BUG: soft lockup - CPU#3 stuck for 22s! [upowerd:8359]
[   52.370440] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 xt_conntrack xt_CHECKSUM iptable_mangle fuse tun microcode nfsd nfs_acl exportfs auth_rpcgss nfs lockd sunrpc af_packet xt_physdev xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_sctp xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables sch_fq_codel bridge stp llc lpc_ich mfd_core loop bcache dm_crypt zlib_deflate libcrc32c firewire_ohci firewire_core usb_storage mpt2sas scsi_transport_sas raid_class
[   52.370471] CPU: 3 PID: 8359 Comm: upowerd Not tainted 3.11.6-60-flax #1
[   52.370472] Hardware name: OEM OEM/131-GT-E767, BIOS 6.00 PG 08/25/2011
[   52.370474] task: ffff8806621f9700 ti: ffff88064b6a0000 task.ti: ffff88064b6a0000
[   52.370475] RIP: 0010:[<ffffffff816b5a22>]  [<ffffffff816b5a22>] _raw_spin_lock+0x32/0x40
[   52.370480] RSP: 0018:ffff88067fc63c10  EFLAGS: 00000297
[   52.370481] RAX: 0000000000000061 RBX: ffff88065a318600 RCX: 0000000000000000
[   52.370483] RDX: 0000000000000062 RSI: ffff88067fc63ce0 RDI: ffffffff81ea42bc
[   52.370484] RBP: ffff88067fc63c10 R08: ffffffff81cdd608 R09: 0000000000000000
[   52.370485] R10: ffff88067fc6d8e0 R11: 0000000000000000 R12: ffff88067fc63b88
[   52.370486] R13: ffffffff816b7a47 R14: ffff88067fc63c10 R15: ffff88067fc63cd8
[   52.370487] FS:  00007f55fff297c0(0000) GS:ffff88067fc60000(0000) knlGS:0000000000000000
[   52.370488] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   52.370489] CR2: 00007f55fff47000 CR3: 000000065dd10000 CR4: 00000000000007e0
[   52.370490] Stack:
[   52.370491]  ffff88067fc63cb0 ffffffff811955fd 0000000000000096 0000000000000347
[   52.370494]  00000000000003c1 0000000000000001 0000000000000000 0000000000000000
[   52.370496]  0000000000000033 ffff88067fc63c58 ffff88067fc63c58 0000000000000001
[   52.370499] Call Trace:
[   52.370500]  <IRQ> 
[   52.370501]  [<ffffffff811955fd>] __purge_vmap_area_lazy+0x12d/0x4c0
[   52.370507]  [<ffffffff8119612c>] vm_unmap_aliases+0x17c/0x190
[   52.370512]  [<ffffffff81079814>] change_page_attr_set_clr+0xb4/0x4a0
[   52.370516]  [<ffffffff810a927e>] ? irq_exit+0x7e/0xb0
[   52.370519]  [<ffffffff81048e44>] ? smp_irq_work_interrupt+0x34/0x40
[   52.370522]  [<ffffffff81079d8f>] set_memory_rw+0x2f/0x40
[   52.370525]  [<ffffffff810a0a7c>] bpf_jit_free+0x2c/0x40
[   52.370528]  [<ffffffff815f48aa>] sk_filter_release_rcu+0x1a/0x30
[   52.370532]  [<ffffffff811262d2>] rcu_process_callbacks+0x1e2/0x5b0
[   52.370535]  [<ffffffff810c9999>] ? enqueue_hrtimer+0x39/0xf0
[   52.370537]  [<ffffffff810a8f20>] __do_softirq+0xe0/0x2f0
[   52.370541]  [<ffffffff816b851c>] call_softirq+0x1c/0x30
[   52.370543]  [<ffffffff81046155>] do_softirq+0x55/0x90
[   52.370545]  [<ffffffff810a928e>] irq_exit+0x8e/0xb0
[   52.370547]  [<ffffffff816b8b0a>] smp_apic_timer_interrupt+0x4a/0x60
[   52.370549]  [<ffffffff816b7a47>] apic_timer_interrupt+0x67/0x70
[   52.370550]  <EOI> 
[   52.370552]  [<ffffffff8106eeb4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xe0
[   52.370559]  [<ffffffff81188af7>] ? handle_pte_fault+0x567/0x920
[   52.370561]  [<ffffffff8107cf30>] ? rbt_memtype_copy_nth_element+0xc0/0xc0
[   52.370563]  [<ffffffff81072057>] physflat_send_IPI_allbutself+0x17/0x20
[   52.370566]  [<ffffffff8106a992>] native_send_call_func_ipi+0x72/0x80
[   52.370568]  [<ffffffff8107cf30>] ? rbt_memtype_copy_nth_element+0xc0/0xc0
[   52.370570]  [<ffffffff81105834>] smp_call_function_many+0x1f4/0x290
[   52.370572]  [<ffffffff81105a8a>] smp_call_function+0x3a/0x60
[   52.370574]  [<ffffffff8107cf30>] ? rbt_memtype_copy_nth_element+0xc0/0xc0
[   52.370576]  [<ffffffff81105b18>] on_each_cpu+0x38/0x80
[   52.370578]  [<ffffffff8107d59d>] flush_tlb_kernel_range+0x6d/0x70
[   52.370581]  [<ffffffff81195916>] __purge_vmap_area_lazy+0x446/0x4c0
[   52.370584]  [<ffffffff81228e85>] ? ext4_file_open+0x75/0x1b0
[   52.370586]  [<ffffffff8119612c>] vm_unmap_aliases+0x17c/0x190
[   52.370590]  [<ffffffff81079814>] change_page_attr_set_clr+0xb4/0x4a0
[   52.370592]  [<ffffffff81196ac2>] ? map_vm_area+0x32/0x50
[   52.370595]  [<ffffffff81197761>] ? __vmalloc_node_range+0x121/0x1f0
[   52.370597]  [<ffffffff810a08ab>] ? bpf_jit_compile+0x105b/0x1200
[   52.370600]  [<ffffffff81079d4f>] set_memory_ro+0x2f/0x40
[   52.370602]  [<ffffffff810744ca>] ? module_alloc+0x5a/0x60
[   52.370604]  [<ffffffff810a081c>] bpf_jit_compile+0xfcc/0x1200
[   52.370607]  [<ffffffff811aa75b>] ? __kmalloc+0x18b/0x1f0
[   52.370610]  [<ffffffff811aa606>] ? __kmalloc+0x36/0x1f0
[   52.370612]  [<ffffffff815f4b43>] ? sk_chk_filter+0x283/0x390
[   52.370614]  [<ffffffff815f4d4b>] sk_attach_filter+0xfb/0x1b0
[   52.370617]  [<ffffffff815d071d>] sock_setsockopt+0x4fd/0x900
[   52.370620]  [<ffffffff811d2342>] ? fget_light+0x92/0x100
[   52.370623]  [<ffffffff815cbdd6>] SyS_setsockopt+0xc6/0xd0
[   52.370625]  [<ffffffff816b6dc6>] system_call_fastpath+0x1a/0x1f
[   52.370626] Code: 89 e5 65 48 8b 04 25 f0 b8 00 00 83 80 44 e0 ff ff 01 b8 00 01 00 00 f0 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 <0f> b6 07 38 d0 75 f7 5d c3 0f 1f 44 00 00 66 66 66 66 90 55 48 


* 16% regression on 10G caused by TCP small queues
From: Stephen Hemminger @ 2013-10-24  2:29 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, Dave Täht; +Cc: netdev

In the course of testing routing functionality, I discovered that the single flow TCP
throughput was much worse than expected. At first, it looked like a router problem,
or maybe because one end was a FreeBSD system (which has noticeably slower TCP performance).
But reducing it down to two systems directly connected over 10G (ixgbe) found the problem.

With a single TCP flow, in 3.5 kernel the performance with iperf is 9.41 Gbit/sec
which is at the link limit for TCP with timestamps etc. But in 3.6 and later the
throughput dropped to 7.9 Gbit/sec which is a regression of 16%.
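
For reference, the 16% figure follows directly from the two iperf numbers
above. A minimal sketch of the arithmetic (regression_percent is a
hypothetical helper name used only for illustration):

```c
#include <assert.h>

/* Relative throughput drop, in percent, from a baseline to a new value. */
static double regression_percent(double baseline_gbps, double new_gbps)
{
	assert(baseline_gbps > 0.0);
	return (baseline_gbps - new_gbps) / baseline_gbps * 100.0;
}
```

With baseline_gbps = 9.41 and new_gbps = 7.9 this gives (9.41 - 7.9) / 9.41,
which is just over 16%.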

Doing bisect shows that the commit causing this is:

  commit 46d3ceabd8d98ed0ad10f20c595ca784e34786c5
  Author: Eric Dumazet <eric.dumazet@gmail.com>
  Date:   Wed Jul 11 05:50:31 2012 +0000

    tcp: TCP Small Queues
    
    This introduce TSQ (TCP Small Queues)


There are several options at this point:
  0. Ignore it. Sorry, this is not acceptable.
     People do transfer files over 10G and expect line rate!
 
  1. Rip it out, which adds to the buffer bloat.
     This is a throughput vs latency tradeoff.

  2. Neuter it by making TCP small queues configurable and default off.
     Allows people who are willing to sacrifice performance to go ahead and
     enable it.
 
  3. Tweak it. Make the default queue value in kernel big enough that no loss is
     observable.
 
  4. Do something smarter like a dynamic TCP small queue that adapts.


* [PATCH net-next] SUNRPC: remove an unnecessary if statement
From: wangweidong @ 2013-10-24  2:35 UTC (permalink / raw)
  To: davem, Trond.Myklebust, bfields; +Cc: dingtianhong, netdev, linux-nfs

If allocating a req fails, just goto out_free; there is no need to check
'i < num_prealloc' afterwards. This is just a code simplification, with no
functional changes.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
---
 net/sunrpc/xprt.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 095363e..a8e20de 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1087,11 +1087,9 @@ struct rpc_xprt *xprt_alloc(struct net *net, size_t size,
 	for (i = 0; i < num_prealloc; i++) {
 		req = kzalloc(sizeof(struct rpc_rqst), GFP_KERNEL);
 		if (!req)
-			break;
+			goto out_free;
 		list_add(&req->rq_list, &xprt->free);
 	}
-	if (i < num_prealloc)
-		goto out_free;
 	if (max_alloc > num_prealloc)
 		xprt->max_reqs = max_alloc;
 	else
-- 1.7.12
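
The control flow after this change can be sketched in userspace as follows.
This is an illustration only: struct model_xprt, model_zalloc() and the
fault-injection counter are hypothetical stand-ins, not the SUNRPC types or
kzalloc(), and the sketch assumes the caller passes an empty list.

```c
#include <assert.h>
#include <stdlib.h>

struct model_req {
	struct model_req *next;
};

struct model_xprt {
	struct model_req *free_list;
	unsigned int nr_free;
};

/* Fault injection for the sketch: allocations succeed until this hits 0. */
static unsigned int model_allocs_left = ~0u;

static void *model_zalloc(size_t size)
{
	if (model_allocs_left == 0)
		return NULL;
	model_allocs_left--;
	return calloc(1, size);
}

/* Release every preallocated request and reset the counter. */
static void model_free_reqs(struct model_xprt *xprt)
{
	while (xprt->free_list) {
		struct model_req *req = xprt->free_list;

		xprt->free_list = req->next;
		free(req);
	}
	xprt->nr_free = 0;
}

/* Returns 0 on success, -1 if any allocation failed (nothing is leaked). */
static int model_prealloc(struct model_xprt *xprt, unsigned int num_prealloc)
{
	unsigned int i;

	for (i = 0; i < num_prealloc; i++) {
		struct model_req *req = model_zalloc(sizeof(*req));

		if (!req)
			goto out_free;	/* bail out directly, no post-loop check */
		req->next = xprt->free_list;
		xprt->free_list = req;
		xprt->nr_free++;
	}
	/* Reaching this point means every allocation succeeded. */
	assert(xprt->nr_free == num_prealloc);
	return 0;

out_free:
	model_free_reqs(xprt);
	return -1;
}
```

Jumping to the cleanup label from inside the loop makes the
"i < num_prealloc" test after the loop redundant, because the code after the
loop is only reached when all allocations succeeded.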


* Re: 16% regression on 10G caused by TCP small queues
From: Neal Cardwell @ 2013-10-24  2:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, Dave Täht, Netdev
In-Reply-To: <20131023192954.3dd9c784@nehalam.linuxnetplumber.net>

On Wed, Oct 23, 2013 at 10:29 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> In the course of testing routing functionality, I discovered that the single flow TCP
> throughput was much worse than expected. At first, it looked like a router problem,
> or maybe because one end was a FreeBSD system (which has noticeably slower TCP performance).
> But reducing it down to two systems directly connected over 10G (ixgbe) found the problem.
...
>   4. Do something smarter like a dynamic TCP small queue that adapts.

Yep, Eric made TSQ dynamic a few weeks ago, and mentioned that his
commit helps a single flow on 10Gbps link:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9eeec26e32e087359160406f96e0949b3cc6f10

Can you please check the performance in your setup on 3.12-rc4 or newer? :-)

Thanks!

neal

---

commit c9eeec26e32e087359160406f96e0949b3cc6f10
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Sep 27 03:28:54 2013 -0700

    tcp: TSQ can use a dynamic limit

    When TCP Small Queues was added, we used a sysctl to limit amount of
    packets queues on Qdisc/device queues for a given TCP flow.

    Problem is this limit is either too big for low rates, or too small
    for high rates.

    Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
    auto sizing, it can better control number of packets in Qdisc/device
    queues.

    New limit is two packets or at least 1 to 2 ms worth of packets.

    Low rates flows benefit from this patch by having even smaller
    number of packets in queues, allowing for faster recovery,
    better RTT estimations.

    High rates flows benefit from this patch by allowing more than 2 packets
    in flight as we had reports this was a limiting factor to reach line
    rate. [ In particular if TX completion is delayed because of coalescing
    parameters ]

    Example for a single flow on 10Gbp link controlled by FQ/pacing

    14 packets in flight instead of 2
    ...
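
To make the "two packets or about 1 ms worth of packets" rule concrete, here
is a rough userspace sketch. It assumes the limit is the larger of two
packets and the bytes sent in roughly 1 ms (rate / 1024) at the pacing rate;
the exact expression in the kernel may differ, and model_tsq_limit and its
parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Per-flow queue limit in bytes under the assumptions stated above. */
static uint64_t model_tsq_limit(uint64_t pacing_rate_bytes_per_sec,
				uint32_t pkt_bytes)
{
	uint64_t two_packets, about_1ms;

	assert(pkt_bytes > 0);
	two_packets = 2ULL * pkt_bytes;
	about_1ms = pacing_rate_bytes_per_sec >> 10;	/* rate / 1024 */

	return about_1ms > two_packets ? about_1ms : two_packets;
}
```

At 10 Gbit/s (1,250,000,000 bytes/s) the rate term dominates and the limit is
about 1.2 MB, while a flow paced at 100 kB/s falls back to the two-packet
floor.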


* [PATCH net-next v2 1/5] bonding: remove bond read lock for bond_mii_monitor()
From: Ding Tianhong @ 2013-10-24  3:09 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

The bond slave list may change while the monitor is running. The slave list is no longer
protected by bond->lock, only by rtnl_lock(), so there are three ways to fix this:
1. Call bond_master_upper_dev_link() and bond_upper_dev_unlink() under bond->lock, but it is
unsafe to call call_netdevice_notifiers() under the write lock.
2. Remove the unused bond->lock from the monitor function and rely only on the existing rtnl_lock().
3. Use rcu_read_lock() to protect it, which means converting bond_for_each_slave() to
bond_for_each_slave_rcu(). Performance is better, but that does not matter in the slow path.
So I remove bond->lock and extend the rtnl lock to protect the whole monitor function.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 44 +++++++++++------------------------------
 1 file changed, 12 insertions(+), 32 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d90734f..ba90f45 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2155,49 +2155,29 @@ void bond_mii_monitor(struct work_struct *work)
 	struct bonding *bond = container_of(work, struct bonding,
 					    mii_work.work);
 	bool should_notify_peers = false;
-	unsigned long delay;
 
-	read_lock(&bond->lock);
-
-	delay = msecs_to_jiffies(bond->params.miimon);
+	if (!rtnl_trylock())
+		goto re_arm;
 
-	if (!bond_has_slaves(bond))
+	if (!bond_has_slaves(bond)) {
+		rtnl_unlock();
 		goto re_arm;
+	}
 
 	should_notify_peers = bond_should_notify_peers(bond);
 
-	if (bond_miimon_inspect(bond)) {
-		read_unlock(&bond->lock);
-
-		/* Race avoidance with bond_close cancel of workqueue */
-		if (!rtnl_trylock()) {
-			read_lock(&bond->lock);
-			delay = 1;
-			should_notify_peers = false;
-			goto re_arm;
-		}
-
-		read_lock(&bond->lock);
-
+	if (bond_miimon_inspect(bond))
 		bond_miimon_commit(bond);
 
-		read_unlock(&bond->lock);
-		rtnl_unlock();	/* might sleep, hold no other locks */
-		read_lock(&bond->lock);
-	}
+	if (should_notify_peers)
+		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, bond->dev);
+
+	rtnl_unlock();
 
 re_arm:
 	if (bond->params.miimon)
-		queue_delayed_work(bond->wq, &bond->mii_work, delay);
-
-	read_unlock(&bond->lock);
-
-	if (should_notify_peers) {
-		if (!rtnl_trylock())
-			return;
-		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, bond->dev);
-		rtnl_unlock();
-	}
+		queue_delayed_work(bond->wq, &bond->mii_work,
+				msecs_to_jiffies(bond->params.miimon));
 }
 
 static bool bond_has_this_ip(struct bonding *bond, __be32 ip)
-- 
1.8.2.1


* Re: 16% regression on 10G caused by TCP small queues
From: Stephen Hemminger @ 2013-10-24  3:09 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Eric Dumazet, David Miller, Dave Täht, Netdev
In-Reply-To: <CADVnQymDr2K7z3yfKpW-H3R3W3NP+iuPQF2eMfeyS6dn-szdgA@mail.gmail.com>

On Wed, Oct 23, 2013 at 7:37 PM, Neal Cardwell <ncardwell@google.com> wrote:
> On Wed, Oct 23, 2013 at 10:29 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> In the course of testing routing functionality, I discovered that the single flow TCP
>> throughput was much worse than expected. At first, it looked like a router problem,
>> or maybe because one end was a FreeBSD system (which has noticeably slower TCP performance).
>> But reducing it down to two systems directly connected over 10G (ixgbe) found the problem.
> ...
>>   4. Do something smarter like a dynamic TCP small queue that adapts.
>
> Yep, Eric made TSQ dynamic a few weeks ago, and mentioned that his
> commit helps a single flow on 10Gbps link:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9eeec26e32e087359160406f96e0949b3cc6f10
>
> Can you please check the performance in your setup on 3.12-rc4 or newer? :-)
>
> Thanks!
>
> neal

I will check 3.12, but what about users on 3.10 which is the LTS
kernel used by most distros?


* [PATCH net-next v2 0/5] bonding: patchset for rcu use in bonding
From: Ding Tianhong @ 2013-10-24  3:08 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

Hi:

The slave list is modified by bond_master_upper_dev_link() and bond_upper_dev_unlink(),
which call call_netdevice_notifiers(). Even if it is safe to call that under the write bond lock now,
we can't be sure it will stay safe later, because other drivers may handle NETDEV_CHANGEUPPER
in a way that sleeps, so I decided against moving bond_upper_dev_unlink() under the write bond lock.

Now bond_for_each_slave() is only protected by rtnl_lock(). Using bond_for_each_slave_rcu() may be a good
way to protect the bond slave list, but there is no need to convert bond_for_each_slave()
to bond_for_each_slave_rcu() in the slow path. So in this patchset I remove the unused read bond lock
from the monitor functions. This may be the better way; I welcome any replies about it.

Thanks to Veaceslav Falico for his opinion.

v2: added and modified the commit messages for the patchset and each patch. This is the first step of the whole patchset.

Ding Tianhong (5):
  bonding: remove bond read lock for bond_mii_monitor()
  bonding: remove bond read lock for bond_alb_monitor()
  bonding: remove bond read lock for bond_loadbalance_arp_mon()
  bonding: remove bond read lock for bond_activebackup_arp_mon()
  bonding: remove bond read lock for bond_3ad_state_machine_handler()

 drivers/net/bonding/bond_3ad.c  |   9 ++--
 drivers/net/bonding/bond_alb.c  |  20 ++------
 drivers/net/bonding/bond_main.c | 100 +++++++++++++---------------------------
 3 files changed, 40 insertions(+), 89 deletions(-)

-- 
1.8.2.1


* [PATCH net-next v2 4/5] bonding: remove bond read lock for bond_activebackup_arp_mon()
From: Ding Tianhong @ 2013-10-24  3:09 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

The bond slave list may change while the monitor is running. The slave list is no longer
protected by bond->lock, only by rtnl_lock(), so there are three ways to fix this:
1. Call bond_master_upper_dev_link() and bond_upper_dev_unlink() under bond->lock, but it is
unsafe to call call_netdevice_notifiers() under the write lock.
2. Remove the unused bond->lock from the monitor function and rely only on the existing rtnl_lock().
3. Use rcu_read_lock() to protect it, which means converting bond_for_each_slave() to
bond_for_each_slave_rcu(). Performance is better, but that does not matter in the slow path.
So I remove bond->lock and extend the rtnl lock to protect the whole monitor function.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/bonding/bond_main.c | 46 ++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 149f4b9..f3df532 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2763,51 +2763,31 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 	struct bonding *bond = container_of(work, struct bonding,
 					    arp_work.work);
 	bool should_notify_peers = false;
-	int delta_in_ticks;
 
-	read_lock(&bond->lock);
-
-	delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
+	if (!rtnl_trylock())
+		goto re_arm;
 
-	if (!bond_has_slaves(bond))
+	if (!bond_has_slaves(bond)) {
+		rtnl_unlock();
 		goto re_arm;
+	}
 
 	should_notify_peers = bond_should_notify_peers(bond);
 
-	if (bond_ab_arp_inspect(bond)) {
-		read_unlock(&bond->lock);
-
-		/* Race avoidance with bond_close flush of workqueue */
-		if (!rtnl_trylock()) {
-			read_lock(&bond->lock);
-			delta_in_ticks = 1;
-			should_notify_peers = false;
-			goto re_arm;
-		}
-
-		read_lock(&bond->lock);
-
+	if (bond_ab_arp_inspect(bond))
 		bond_ab_arp_commit(bond);
 
-		read_unlock(&bond->lock);
-		rtnl_unlock();
-		read_lock(&bond->lock);
-	}
-
 	bond_ab_arp_probe(bond);
 
-re_arm:
-	if (bond->params.arp_interval)
-		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
+	if (should_notify_peers)
+		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, bond->dev);
 
-	read_unlock(&bond->lock);
+	rtnl_unlock();
 
-	if (should_notify_peers) {
-		if (!rtnl_trylock())
-			return;
-		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, bond->dev);
-		rtnl_unlock();
-	}
+re_arm:
+	if (bond->params.arp_interval)
+		queue_delayed_work(bond->wq, &bond->arp_work,
+				msecs_to_jiffies(bond->params.arp_interval));
 }
 
 /*-------------------------- netdev event handling --------------------------*/
-- 
1.8.2.1


