Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH] udp4: Don't take socket reference in receive path
From: Eric Dumazet @ 2014-02-06 20:58 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, edumazet
In-Reply-To: <alpine.DEB.2.02.1402061154580.17707@tomh.mtv.corp.google.com>

On Thu, 2014-02-06 at 11:58 -0800, Tom Herbert wrote:
> The reference counting in the UDP receive path is quite expensive for
> a socket that is share amoungst CPUs. This is probably true for normal
> sockets, but really is painful when just using the socket for
> receive encapsulation.
> 
> udp4_lib_lookup always takes a socket reference, and we also put back
> the reference after calling udp_queue_rcv_skb in the normal receive
> path, so the need for taking the reference seems to be to hold the
> socket after doing rcu_read_unlock. This patch modifies udp_lib_lookup
> to optionally take a reference and is always called with rcu_read_lock.
> In udp4_lib_rcv we call lib_lookup and udp_queue_rcv under the
> rcu_read_lock but without having taken the reference.
> 
> Requesting comments because I suspect there are nuances to this!
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---

Unfortunately this cant work.

When I did the RCU implementation for TCP/UDP, we chose to use
SLAB_DESTROY_BY_RCU.

This meant we have to take a reference, then check again the keys for
the lookup.

If we remove SLAB_DESTROY_BY_RCU, we kill performance for short lived
sessions, because of call_rcu() added latencies.

(One UDP socket is about 1024 bytes in memory, call_rcu() grace period
is throwing away 1024 bytes from cpu caches)

Sure, in your case you know your udp sessions are not short lived,
but many applications used UDP for DNS lookups, using few packets per
socket.

^ permalink raw reply

* Re: [PATCH net] netpoll: fix netconsole IPv6 setup
From: Sabrina Dubroca @ 2014-02-06 20:58 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev
In-Reply-To: <CAHA+R7OQYZ5RLn3v0nucKB=CVpDCZqp=HyNL=qgC9bViVKEuAQ@mail.gmail.com>

2014-02-06, 12:34:10 -0800, Cong Wang wrote:
> On Thu, Feb 6, 2014 at 9:34 AM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> >  net/core/netpoll.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> > index c03f3de..a664f78 100644
> > --- a/net/core/netpoll.c
> > +++ b/net/core/netpoll.c
> > @@ -948,6 +948,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
> >  {
> >         char *cur=opt, *delim;
> >         int ipv6;
> > +       bool ipversion_set = false;
> >
> 
> Or initialize 'ipv6' to -1 and then check if it is -1?

It's overwritten when we parse the remote address. And np->ipv6 is a
bool, so we can't store it there either.

-- 
Sabrina

^ permalink raw reply

* RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Thomas Glanzmann @ 2014-02-06 20:51 UTC (permalink / raw)
  To: Eric Dumazet, netdev, fubar, vfalico, andy, jiri

Hello,
this morning I checked out Linus tip and compiled it after booting my
dmesg is full of:

[    8.944991] RTNL: assertion failed at net/core/dev.c (4494)
[    8.950640] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
[    8.950642] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
[    8.950654] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
[    8.950658]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881020c88000
[    8.950664]  ffffffff812d3091 ffff881023961040 ffffffff812e3132 0000000000000246
[    8.950670]  0000000000000020 ffff881020ab1be8 0000000020ab1ba8 0000000000000000
[    8.950675] Call Trace:
[    8.950686]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
[    8.950694]  [<ffffffff812d3091>] ? netdev_master_upper_dev_get+0x2a/0x4d
[    8.950699]  [<ffffffff812e3132>] ? rtnl_fill_ifinfo+0x2c/0xac4
[    8.950707]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
[    8.950715]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
[    8.950721]  [<ffffffff81102040>] ? ksize+0x12/0x1e
[    8.950726]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
[    8.950731]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
[    8.950739]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
[    8.950747]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
[    8.950754]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
[    8.950761]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
[    8.950766]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
[    8.950770]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
[    8.950777]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
[    8.950782]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
[    8.950789]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
[    8.950794]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
[    8.950797] RTNL: assertion failed at net/core/rtnetlink.c (940)
[    8.956863] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
[    8.956871] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
[    8.956877] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
[    8.956879]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881023961040
[    8.956884]  ffffffff812e315e 0000000000000246 0000000000000020 ffff881020ab1be8
[    8.956890]  0000000020ab1ba8 0000000000000000 ffffffff817ef530 0000000000000008
[    8.956895] Call Trace:
[    8.956899]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
[    8.956903]  [<ffffffff812e315e>] ? rtnl_fill_ifinfo+0x58/0xac4
[    8.956909]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
[    8.956915]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
[    8.956920]  [<ffffffff81102040>] ? ksize+0x12/0x1e
[    8.956925]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
[    8.956929]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
[    8.956936]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
[    8.956942]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
[    8.956949]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
[    8.956954]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
[    8.956958]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
[    8.956962]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
[    8.956968]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
[    8.956973]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
[    8.956978]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
[    8.956983]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59

Full dmesg is at: http://pbot.rmdir.de/dJZsX0d71RFPaZAAjcTBWQ

I'm running Debian Wheezy and my Interface Konfiguration is:

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
       address 10.100.4.62
       netmask 255.255.0.0
       gateway 10.100.0.1
       slaves eth0 eth1
       bond-mode 802.3ad
       bond-miimon 100

auto bond0.101
iface bond0.101 inet static
       address 10.101.99.4
       netmask 255.255.0.0

auto bond0.102
iface bond0.102 inet static
       address 10.102.99.4
       netmask 255.255.0.0

auto bond0.103
iface bond0.103 inet static
       address 10.103.99.4
       netmask 255.255.0.0

auto bond1
iface bond1 inet static
       address 10.100.5.62
       netmask 255.255.0.0
       slaves eth2 eth3
       bond-mode 802.3ad
       bond-miimon 100

auto bond1.101
iface bond1.101 inet static
       address 10.101.99.5
       netmask 255.255.0.0

auto bond1.102
iface bond1.102 inet static
       address 10.102.99.5
       netmask 255.255.0.0

auto bond1.103
iface bond1.103 inet static
       address 10.103.99.5
       netmask 255.255.0.0

My IPv4 interface Konfiguration is:

(node-62) [~/work/linux-2.6] ip -4 a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.100.4.62/16 brd 10.100.255.255 scope global bond0
       valid_lft forever preferred_lft forever
7: bond0.101@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.101.99.4/16 brd 10.101.255.255 scope global bond0.101
       valid_lft forever preferred_lft forever
8: bond0.102@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.102.99.4/16 brd 10.102.255.255 scope global bond0.102
       valid_lft forever preferred_lft forever
9: bond0.103@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.103.99.4/16 brd 10.103.255.255 scope global bond0.103
       valid_lft forever preferred_lft forever
10: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.100.5.62/16 brd 10.100.255.255 scope global bond1
       valid_lft forever preferred_lft forever
11: bond1.101@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.101.99.5/16 brd 10.101.255.255 scope global bond1.101
       valid_lft forever preferred_lft forever
12: bond1.102@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.102.99.5/16 brd 10.102.255.255 scope global bond1.102
       valid_lft forever preferred_lft forever
13: bond1.103@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 10.103.99.5/16 brd 10.103.255.255 scope global bond1.103
       valid_lft forever preferred_lft forever

I also have IPv6 with link-local and global addresses configured.

Kernel Config is at: http://pbot.rmdir.de/VTAnhVv8ECP7a7SPaxMsFA

I'm available for testing fixes or providing ssh access to the system if you
want to do further tests. I have the same config running on 3.13.0 on the same
machine without any problems whatsoever. If I should bisect it, let me know.

Cheers,
        Thomas

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Eric Dumazet @ 2014-02-06 21:00 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, netdev, David Rientjes,
	linux-kernel@vger.kernel.org
In-Reply-To: <1391718270.15777.20.camel@joe-AO722>

On Thu, 2014-02-06 at 12:24 -0800, Joe Perches wrote:

> Perhaps add __GFP_THISNODE too ?

Why ?

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: David Rientjes @ 2014-02-06 21:03 UTC (permalink / raw)
  To: Joe Perches
  Cc: Eric Dumazet, David Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <1391718270.15777.20.camel@joe-AO722>

On Thu, 6 Feb 2014, Joe Perches wrote:

> On Thu, 2014-02-06 at 10:42 -0800, Eric Dumazet wrote:
> > sock_alloc_send_pskb() & sk_page_frag_refill()
> > have a loop trying high order allocations to prepare
> > skb with low number of fragments as this increases performance.
> > 
> > Problem is that under memory pressure/fragmentation, this can
> > trigger OOM while the intent was only to try the high order
> > allocations, then fallback to order-0 allocations.
> []
> > Call Trace:
> >  [<ffffffff8043766c>] dump_header+0xe1/0x23e
> >  [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
> >  [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
> >  [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
> >  [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
> >  [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
> >  [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
> []
> > diff --git a/net/core/sock.c b/net/core/sock.c
> []
> > @@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> >  			while (order) {
> >  				if (npages >= 1 << order) {
> >  					page = alloc_pages(sk->sk_allocation |
> > -							   __GFP_COMP | __GFP_NOWARN,
> > +							   __GFP_COMP |
> > +							   __GFP_NOWARN |
> > +							   __GFP_NORETRY,
> >  							   order);
> >  					if (page)
> >  						goto fill_page;
> > @@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
> >  		gfp_t gfp = prio;
> >  
> >  		if (order)
> > -			gfp |= __GFP_COMP | __GFP_NOWARN;
> > +			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
> 
> Perhaps add __GFP_THISNODE too ?
> 

How does __GFP_THISNODE have anything to do with avoiding oom killing due 
to high-order fragmentation?  If they absolutely require local memory to 
currnet's cpu node then that would make sense, but the fallback still 
allocates order-0 memory remotely and with __GFP_THISNODE on this attempt 
we wouldn't even attempt remote reclaim.

^ permalink raw reply

* Re: [PATCH] inet: defines IPPROTO_* needed for module alias generation
From: Carlos O'Donell @ 2014-02-06 21:33 UTC (permalink / raw)
  To: Jan Moskyto Matejka, David S. Miller; +Cc: netdev, linux-kernel
In-Reply-To: <1391685000-7346-1-git-send-email-mq@suse.cz>

On 02/06/2014 06:10 AM, Jan Moskyto Matejka wrote:
> Commit cfd280c91253 ("net: sync some IP headers with glibc") changed a set of
> define's to an enum (with no explanation why) which introduced a bug
> in module mip6 where aliases are generated using the IPPROTO_* defines;
> mip6 doesn't load if require_module called with the aliases from
> xfrm_get_type().

I wrote that code and I apologize for not giving a reason at
the time.

There are two reasons:

* It makes the debuginfo better and debugging easier via the enum.

* It harmonizes those headers with what is already in glibc.

Harmonizing this header with glibc makes it easier for userspace
to synchronize changes and perhaps eventually use the UAPI headers
directly.

> Reverting this change back to define's to fix the aliases.
> 
> modinfo mip6 (before this change)
> alias:          xfrm-type-10-IPPROTO_DSTOPTS
> alias:          xfrm-type-10-IPPROTO_ROUTING
> 
> modinfo mip6 (after this change)
> alias:          xfrm-type-10-43
> alias:          xfrm-type-10-60

Instead of reverting these changes I suggest someone fix
whatever is processing that information.

I do not condone the application of this patch for the
above two reasons. Though you might argue that I should
just make all debuggers and compilers better at dealing
with DW_at_macro_info/DW_MACINFO_* debug info... and 
you also would not be wrong.

I hope that answers your question.

> Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
> ---
>  include/uapi/linux/in6.h | 23 +++++++----------------
>  1 file changed, 7 insertions(+), 16 deletions(-)
> 
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index 633b93c..e9a1d2d97 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -128,22 +128,13 @@ struct in6_flowlabel_req {
>   *	IPV6 extension headers
>   */
>  #if __UAPI_DEF_IPPROTO_V6
> -enum {
> -  IPPROTO_HOPOPTS = 0,		/* IPv6 hop-by-hop options      */
> -#define IPPROTO_HOPOPTS		IPPROTO_HOPOPTS
> -  IPPROTO_ROUTING = 43,		/* IPv6 routing header          */
> -#define IPPROTO_ROUTING		IPPROTO_ROUTING
> -  IPPROTO_FRAGMENT = 44,	/* IPv6 fragmentation header    */
> -#define IPPROTO_FRAGMENT	IPPROTO_FRAGMENT
> -  IPPROTO_ICMPV6 = 58,		/* ICMPv6                       */
> -#define IPPROTO_ICMPV6		IPPROTO_ICMPV6
> -  IPPROTO_NONE = 59,		/* IPv6 no next header          */
> -#define IPPROTO_NONE		IPPROTO_NONE
> -  IPPROTO_DSTOPTS = 60,		/* IPv6 destination options     */
> -#define IPPROTO_DSTOPTS		IPPROTO_DSTOPTS
> -  IPPROTO_MH = 135,		/* IPv6 mobility header         */
> -#define IPPROTO_MH		IPPROTO_MH
> -};
> +#define IPPROTO_HOPOPTS		0	/* IPv6 hop-by-hop options	*/
> +#define IPPROTO_ROUTING		43	/* IPv6 routing header		*/
> +#define IPPROTO_FRAGMENT	44	/* IPv6 fragmentation header	*/
> +#define IPPROTO_ICMPV6		58	/* ICMPv6			*/
> +#define IPPROTO_NONE		59	/* IPv6 no next header		*/
> +#define IPPROTO_DSTOPTS		60	/* IPv6 destination options	*/
> +#define IPPROTO_MH		135	/* IPv6 mobility header		*/
>  #endif /* __UAPI_DEF_IPPROTO_V6 */
>  
>  /*
> 

Cheers,
Carlos.

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: Joe Perches @ 2014-02-06 21:34 UTC (permalink / raw)
  To: David Rientjes
  Cc: Eric Dumazet, David Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <alpine.DEB.2.02.1402061302020.9567@chino.kir.corp.google.com>

On Thu, 2014-02-06 at 13:03 -0800, David Rientjes wrote:
> On Thu, 6 Feb 2014, Joe Perches wrote:
> 
> > On Thu, 2014-02-06 at 10:42 -0800, Eric Dumazet wrote:
> > > sock_alloc_send_pskb() & sk_page_frag_refill()
> > > have a loop trying high order allocations to prepare
> > > skb with low number of fragments as this increases performance.
> > > 
> > > Problem is that under memory pressure/fragmentation, this can
> > > trigger OOM while the intent was only to try the high order
> > > allocations, then fallback to order-0 allocations.
> > []
> > > Call Trace:
> > >  [<ffffffff8043766c>] dump_header+0xe1/0x23e
> > >  [<ffffffff80437a02>] oom_kill_process+0x6a/0x323
> > >  [<ffffffff80438443>] out_of_memory+0x4b3/0x50d
> > >  [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
> > >  [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
> > >  [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
> > >  [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
> > []
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > []
> > > @@ -1775,7 +1775,9 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
> > >  			while (order) {
> > >  				if (npages >= 1 << order) {
> > >  					page = alloc_pages(sk->sk_allocation |
> > > -							   __GFP_COMP | __GFP_NOWARN,
> > > +							   __GFP_COMP |
> > > +							   __GFP_NOWARN |
> > > +							   __GFP_NORETRY,
> > >  							   order);
> > >  					if (page)
> > >  						goto fill_page;
> > > @@ -1845,7 +1847,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio)
> > >  		gfp_t gfp = prio;
> > >  
> > >  		if (order)
> > > -			gfp |= __GFP_COMP | __GFP_NOWARN;
> > > +			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
> > 
> > Perhaps add __GFP_THISNODE too ?
> > 
> 
> How does __GFP_THISNODE have anything to do with avoiding oom killing due 
> to high-order fragmentation?

I don't think it does.

> If they absolutely require local memory to 
> currnet's cpu node then that would make sense,

I presumed THISNODE would be used only with NORETRY

> but the fallback still 
> allocates order-0 memory remotely and with __GFP_THISNODE on this attempt 
> we wouldn't even attempt remote reclaim.
any other alloc attempt could work on other cpus.

It was just a thought, ignore it if it's a dumb thought.

^ permalink raw reply

* Re: [PATCH] net: use __GFP_NORETRY for high order allocations
From: David Rientjes @ 2014-02-06 21:39 UTC (permalink / raw)
  To: Joe Perches
  Cc: Eric Dumazet, David Miller, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <1391722444.15777.28.camel@joe-AO722>

On Thu, 6 Feb 2014, Joe Perches wrote:

> > If they absolutely require local memory to 
> > currnet's cpu node then that would make sense,
> 
> I presumed THISNODE would be used only with NORETRY
> 

Unfortunately, that would avoid attempting to defragment or reclaim 
remotely for the order-3 allocation and only allocate remotely for 
order-0.

So we have to make a decision here what is more important: allocating 
high-order memory or local memory to current's cpu at this given time?

The policy of transparent hugepages, for example, is always to allocate 
order-9 memory remotely instead of falling back to allocating a smallpage 
locally.

The original code never cared about local vs remote memory, so I don't 
think there's any restriction to allocating remotely.  I also don't know 
many allocators that do __GFP_THISNODE and care about local memory to 
current's cpu at this point in time, usually users of that flag are 
allocating memory that must be local to a specific node (for things like 
kmem_cache_alloc_node()).

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Cong Wang @ 2014-02-06 21:40 UTC (permalink / raw)
  To: Thomas Glanzmann
  Cc: Eric Dumazet, netdev, fubar, Veaceslav Falico, andy,
	Jiří Pírko
In-Reply-To: <20140206205106.GA10488@glanzmann.de>

On Thu, Feb 6, 2014 at 12:51 PM, Thomas Glanzmann <thomas@glanzmann.de> wrote:
> Hello,
> this morning I checked out Linus tip and compiled it after booting my
> dmesg is full of:
>
> [    8.944991] RTNL: assertion failed at net/core/dev.c (4494)
> [    8.950640] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
> [    8.950642] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
> [    8.950654] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
> [    8.950658]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881020c88000
> [    8.950664]  ffffffff812d3091 ffff881023961040 ffffffff812e3132 0000000000000246
> [    8.950670]  0000000000000020 ffff881020ab1be8 0000000020ab1ba8 0000000000000000
> [    8.950675] Call Trace:
> [    8.950686]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
> [    8.950694]  [<ffffffff812d3091>] ? netdev_master_upper_dev_get+0x2a/0x4d
> [    8.950699]  [<ffffffff812e3132>] ? rtnl_fill_ifinfo+0x2c/0xac4
> [    8.950707]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
> [    8.950715]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
> [    8.950721]  [<ffffffff81102040>] ? ksize+0x12/0x1e
> [    8.950726]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
> [    8.950731]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
> [    8.950739]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
> [    8.950747]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
> [    8.950754]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
> [    8.950761]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
> [    8.950766]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
> [    8.950770]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
> [    8.950777]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
> [    8.950782]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
> [    8.950789]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
> [    8.950794]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59


Hmm, rtmsg_ifinfo() should be called with rtnl lock, but
__enable_port() is called
with rcu_read_lock() which means we can't block inside it, therefore we probably
should take rtnl lock outside:

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index cce1f1b..3c09ffa 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2065,6 +2065,7 @@ void bond_3ad_state_machine_handler(struct
work_struct *work)
        struct slave *slave;
        struct port *port;

+       rtnl_lock();
        read_lock(&bond->lock);
        rcu_read_lock();

@@ -2123,6 +2124,7 @@ void bond_3ad_state_machine_handler(struct
work_struct *work)
 re_arm:
        rcu_read_unlock();
        read_unlock(&bond->lock);
+       rtnl_unlock();
        queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
 }

^ permalink raw reply related

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Jay Vosburgh @ 2014-02-06 21:45 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: Eric Dumazet, netdev, vfalico, andy, jiri
In-Reply-To: <20140206205106.GA10488@glanzmann.de>

Thomas Glanzmann <thomas@glanzmann.de> wrote:

>Hello,
>this morning I checked out Linus tip and compiled it after booting my
>dmesg is full of:
>
>[    8.944991] RTNL: assertion failed at net/core/dev.c (4494)
>[    8.950640] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
>[    8.950642] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
>[    8.950654] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
>[    8.950658]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881020c88000
>[    8.950664]  ffffffff812d3091 ffff881023961040 ffffffff812e3132 0000000000000246
>[    8.950670]  0000000000000020 ffff881020ab1be8 0000000020ab1ba8 0000000000000000
>[    8.950675] Call Trace:
>[    8.950686]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
>[    8.950694]  [<ffffffff812d3091>] ? netdev_master_upper_dev_get+0x2a/0x4d
>[    8.950699]  [<ffffffff812e3132>] ? rtnl_fill_ifinfo+0x2c/0xac4
>[    8.950707]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
>[    8.950715]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
>[    8.950721]  [<ffffffff81102040>] ? ksize+0x12/0x1e
>[    8.950726]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
>[    8.950731]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
>[    8.950739]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
>[    8.950747]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
>[    8.950754]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]

	This was apparently introduced by commit:

commit 1d3ee88ae0d605629bf369ab0b868dae8ca62a48
Author: sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com>
Date:   Thu Jan 16 22:57:56 2014 -0800

    bonding: add netlink attributes to slave link dev
    

	which adds an "rtmsg_ifinfo" call to bond_set_active_slave and
bond_set_backup_slave, both of which are invoked in several places in
the 802.3ad code without RTNL held (via __enable_port and
__disable_port).

	Still looking into a fix.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Jay Vosburgh @ 2014-02-06 21:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: Thomas Glanzmann, Eric Dumazet, netdev, Veaceslav Falico, andy,
	Jiří Pírko
In-Reply-To: <CAHA+R7OGS_KG1un36KBixbgs7LTPWGRYdJ+Vsn2iM0FzNV5FzA@mail.gmail.com>

Cong Wang <cwang@twopensource.com> wrote:

>On Thu, Feb 6, 2014 at 12:51 PM, Thomas Glanzmann <thomas@glanzmann.de> wrote:
>> Hello,
>> this morning I checked out Linus tip and compiled it after booting my
>> dmesg is full of:
>>
>> [    8.944991] RTNL: assertion failed at net/core/dev.c (4494)
>> [    8.950640] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
>> [    8.950642] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
>> [    8.950654] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
>> [    8.950658]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881020c88000
>> [    8.950664]  ffffffff812d3091 ffff881023961040 ffffffff812e3132 0000000000000246
>> [    8.950670]  0000000000000020 ffff881020ab1be8 0000000020ab1ba8 0000000000000000
>> [    8.950675] Call Trace:
>> [    8.950686]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
>> [    8.950694]  [<ffffffff812d3091>] ? netdev_master_upper_dev_get+0x2a/0x4d
>> [    8.950699]  [<ffffffff812e3132>] ? rtnl_fill_ifinfo+0x2c/0xac4
>> [    8.950707]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
>> [    8.950715]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
>> [    8.950721]  [<ffffffff81102040>] ? ksize+0x12/0x1e
>> [    8.950726]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
>> [    8.950731]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
>> [    8.950739]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
>> [    8.950747]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
>> [    8.950754]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
>> [    8.950761]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
>> [    8.950766]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
>> [    8.950770]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
>> [    8.950777]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
>> [    8.950782]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
>> [    8.950789]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
>> [    8.950794]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
>
>
>Hmm, rtmsg_ifinfo() should be called with rtnl lock, but
>__enable_port() is called
>with rcu_read_lock() which means we can't block inside it, therefore we probably
>should take rtnl lock outside:
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index cce1f1b..3c09ffa 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2065,6 +2065,7 @@ void bond_3ad_state_machine_handler(struct
>work_struct *work)
>        struct slave *slave;
>        struct port *port;
>
>+       rtnl_lock();
>        read_lock(&bond->lock);
>        rcu_read_lock();
>
>@@ -2123,6 +2124,7 @@ void bond_3ad_state_machine_handler(struct
>work_struct *work)
> re_arm:
>        rcu_read_unlock();
>        read_unlock(&bond->lock);
>+       rtnl_unlock();
>        queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
> }

	That would eliminate the warning, but is suboptimal.  Acquiring
RTNL is not necessary on the vast majority of state machine runs
(because no state changes take place, i.e., no ports are disabled or
enabled).  The above change would add 10 round trips per second to RTNL,
which seems excessive.

	Also, we cannot unconditionally acquire RTNL in this function,
as it would race with the call to cancel_delayed_work_sync from
bond_close (via bond_work_cancel_all).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH] inet: defines IPPROTO_* needed for module alias generation
From: Cong Wang @ 2014-02-06 21:51 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Jan Moskyto Matejka, David S. Miller, netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <52F3FFBC.1010505@redhat.com>

On Thu, Feb 6, 2014 at 1:33 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 02/06/2014 06:10 AM, Jan Moskyto Matejka wrote:
>> Reverting this change back to define's to fix the aliases.
>>
>> modinfo mip6 (before this change)
>> alias:          xfrm-type-10-IPPROTO_DSTOPTS
>> alias:          xfrm-type-10-IPPROTO_ROUTING
>>
>> modinfo mip6 (after this change)
>> alias:          xfrm-type-10-43
>> alias:          xfrm-type-10-60
>
> Instead of reverting these changes I suggest someone fix
> whatever is processing that information.

It is stringfy by c preprocessor. enum should be processed
after preprocessing, therefore fails to be converted to it
real value at preprocessing stage.

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Jay Vosburgh @ 2014-02-06 22:07 UTC (permalink / raw)
  To: Cong Wang
  Cc: Thomas Glanzmann, Eric Dumazet, netdev, Veaceslav Falico, andy,
	Jiří Pírko
In-Reply-To: <30988.1391723318@death.nxdomain>

Jay Vosburgh <fubar@us.ibm.com> wrote:

>Cong Wang <cwang@twopensource.com> wrote:
>
>>On Thu, Feb 6, 2014 at 12:51 PM, Thomas Glanzmann <thomas@glanzmann.de> wrote:
>>> Hello,
>>> this morning I checked out Linus tip and compiled it after booting my
>>> dmesg is full of:
>>>
>>> [    8.944991] RTNL: assertion failed at net/core/dev.c (4494)
>>> [    8.950640] CPU: 3 PID: 388 Comm: kworker/u24:4 Not tainted 3.14.0-rc1+ #3
>>> [    8.950642] Hardware name: Supermicro X9SRD-F/X9SRD-F, BIOS 1.0a 10/15/2012
>>> [    8.950654] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
>>> [    8.950658]  0000000000000000 ffff881020c88000 ffffffff8138e219 ffff881020c88000
>>> [    8.950664]  ffffffff812d3091 ffff881023961040 ffffffff812e3132 0000000000000246
>>> [    8.950670]  0000000000000020 ffff881020ab1be8 0000000020ab1ba8 0000000000000000
>>> [    8.950675] Call Trace:
>>> [    8.950686]  [<ffffffff8138e219>] ? dump_stack+0x41/0x51
>>> [    8.950694]  [<ffffffff812d3091>] ? netdev_master_upper_dev_get+0x2a/0x4d
>>> [    8.950699]  [<ffffffff812e3132>] ? rtnl_fill_ifinfo+0x2c/0xac4
>>> [    8.950707]  [<ffffffff81072211>] ? print_time.part.5+0x50/0x54
>>> [    8.950715]  [<ffffffff812caf94>] ? __kmalloc_reserve.isra.42+0x2a/0x6d
>>> [    8.950721]  [<ffffffff81102040>] ? ksize+0x12/0x1e
>>> [    8.950726]  [<ffffffff812cb2b7>] ? __alloc_skb+0xb5/0x1a9
>>> [    8.950731]  [<ffffffff812e4626>] ? rtmsg_ifinfo+0x6c/0xd6
>>> [    8.950739]  [<ffffffffa035f4f9>] ? __enable_port.isra.17+0x51/0x5a [bonding]
>>> [    8.950747]  [<ffffffffa0360463>] ? ad_agg_selection_logic+0x3d3/0x3ed [bonding]
>>> [    8.950754]  [<ffffffffa0360d40>] ? bond_3ad_state_machine_handler+0x555/0x918 [bonding]
>>> [    8.950761]  [<ffffffff8104db2d>] ? process_one_work+0x191/0x293
>>> [    8.950766]  [<ffffffff8104dfde>] ? worker_thread+0x121/0x1e7
>>> [    8.950770]  [<ffffffff8104debd>] ? rescuer_thread+0x269/0x269
>>> [    8.950777]  [<ffffffff810527b6>] ? kthread+0x99/0xa1
>>> [    8.950782]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
>>> [    8.950789]  [<ffffffff8139733c>] ? ret_from_fork+0x7c/0xb0
>>> [    8.950794]  [<ffffffff8105271d>] ? __kthread_parkme+0x59/0x59
>>
>>
>>Hmm, rtmsg_ifinfo() should be called with rtnl lock, but
>>__enable_port() is called
>>with rcu_read_lock() which means we can't block inside it, therefore we probably
>>should take rtnl lock outside:
>>
>>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>>index cce1f1b..3c09ffa 100644
>>--- a/drivers/net/bonding/bond_3ad.c
>>+++ b/drivers/net/bonding/bond_3ad.c
>>@@ -2065,6 +2065,7 @@ void bond_3ad_state_machine_handler(struct
>>work_struct *work)
>>        struct slave *slave;
>>        struct port *port;
>>
>>+       rtnl_lock();
>>        read_lock(&bond->lock);
>>        rcu_read_lock();
>>
>>@@ -2123,6 +2124,7 @@ void bond_3ad_state_machine_handler(struct
>>work_struct *work)
>> re_arm:
>>        rcu_read_unlock();
>>        read_unlock(&bond->lock);
>>+       rtnl_unlock();
>>        queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
>> }
>
>	That would eliminate the warning, but is suboptimal.  Acquiring
>RTNL is not necessary on the vast majority of state machine runs
>(because no state changes take place, i.e., no ports are disabled or
>enabled).  The above change would add 10 round trips per second to RTNL,
>which seems excessive.
>
>	Also, we cannot unconditionally acquire RTNL in this function,
>as it would race with the call to cancel_delayed_work_sync from
>bond_close (via bond_work_cancel_all).

	Thought of one more problem: we can't hold a regular lock while
calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
call has to be RTNL and nothing else.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net] netpoll: fix netconsole IPv6 setup
From: Cong Wang @ 2014-02-06 22:09 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: David Miller, netdev
In-Reply-To: <20140206205850.GA12704@kria>

On Thu, Feb 6, 2014 at 12:58 PM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> 2014-02-06, 12:34:10 -0800, Cong Wang wrote:
>> On Thu, Feb 6, 2014 at 9:34 AM, Sabrina Dubroca <sd@queasysnail.net> wrote:
>> >  net/core/netpoll.c | 4 +++-
>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/net/core/netpoll.c b/net/core/netpoll.c
>> > index c03f3de..a664f78 100644
>> > --- a/net/core/netpoll.c
>> > +++ b/net/core/netpoll.c
>> > @@ -948,6 +948,7 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
>> >  {
>> >         char *cur=opt, *delim;
>> >         int ipv6;
>> > +       bool ipversion_set = false;
>> >
>>
>> Or initialize 'ipv6' to -1 and then check if it is -1?
>
> It's overwritten when we parse the remote address. And np->ipv6 is a
> bool, so we can't store it there either.
>


Yeah, I misunderstood the problem. Your fix looks good.

Acked-by: Cong Wang <cwang@twopensource.com>

For net-next, I think we should change bool np->ipv6 to int np->ip_version,
so that we can tell if it is set or not.

Thanks!

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Cong Wang @ 2014-02-06 22:12 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Thomas Glanzmann, Eric Dumazet, netdev, Veaceslav Falico, andy,
	Jiří Pírko
In-Reply-To: <31272.1391724462@death.nxdomain>

On Thu, Feb 6, 2014 at 2:07 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Jay Vosburgh <fubar@us.ibm.com> wrote:
>
>>Cong Wang <cwang@twopensource.com> wrote:
>>
>>
>>       That would eliminate the warning, but is suboptimal.  Acquiring
>>RTNL is not necessary on the vast majority of state machine runs
>>(because no state changes take place, i.e., no ports are disabled or
>>enabled).  The above change would add 10 round trips per second to RTNL,
>>which seems excessive.
>>
>>       Also, we cannot unconditionally acquire RTNL in this function,
>>as it would race with the call to cancel_delayed_work_sync from
>>bond_close (via bond_work_cancel_all).

OK.

>
>         Thought of one more problem: we can't hold a regular lock while
> calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
> call has to be RTNL and nothing else.
>

s/GFP_KERNEL/GFP_ATOMIC/

^ permalink raw reply

* [PATCH net] tg3: Fix deadlock in tg3_change_mtu()
From: Nithin Nayak Sujir @ 2014-02-06 22:13 UTC (permalink / raw)
  To: davem; +Cc: netdev, david.vrabel, Nithin Nayak Sujir, stable, Michael Chan

Quoting David Vrabel -
"5780 cards cannot have jumbo frames and TSO enabled together.  When
jumbo frames are enabled by setting the MTU, the TSO feature must be
cleared.  This is done indirectly by calling netdev_update_features()
which will call tg3_fix_features() to actually clear the flags.

netdev_update_features() will also trigger a new netlink message for the
feature change event which will result in a call to tg3_get_stats64()
which deadlocks on the tg3 lock."

tg3_set_mtu() does not need to be under the tg3 lock since converting
the flags to use set_bit(). Move it out to after tg3_netif_stop().

Cc: <stable@vger.kernel.org>
Reported-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index e2ca03e..0bb79b8 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -14113,12 +14113,12 @@ static int tg3_change_mtu(struct net_device *dev, int new_mtu)
 
 	tg3_netif_stop(tp);
 
+	tg3_set_mtu(dev, tp, new_mtu);
+
 	tg3_full_lock(tp, 1);
 
 	tg3_halt(tp, RESET_KIND_SHUTDOWN, 1);
 
-	tg3_set_mtu(dev, tp, new_mtu);
-
 	/* Reset PHY, otherwise the read DMA engine will be in a mode that
 	 * breaks all requests to 256 bytes.
 	 */
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH iproute2 1/3] tcp_metrics: Rename addr to daddr and add local variable
From: Christoph Paasch @ 2014-02-06 22:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <1391724904-30504-1-git-send-email-christoph.paasch@uclouvain.be>

Renaming addr to daddr, because we will introduce saddr later.

The local variable is necessary to store RTA_PAYLOAD(a) temporarily.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 ip/tcp_metrics.c | 51 ++++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index c6be3c94415f..4b30771ae3a8 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -74,7 +74,7 @@ static struct
 	int flushp;
 	int flushe;
 	int cmd;
-	inet_prefix addr;
+	inet_prefix daddr;
 } f;
 
 static int flush_update(void)
@@ -95,8 +95,8 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	struct rtattr *attrs[TCP_METRICS_ATTR_MAX + 1], *a;
 	int len = n->nlmsg_len;
 	char abuf[256];
-	inet_prefix addr;
-	int family, i, atype;
+	inet_prefix daddr;
+	int family, i, atype, dlen = 0;
 
 	if (n->nlmsg_type != genl_family)
 		return -1;
@@ -114,35 +114,37 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 
 	a = attrs[TCP_METRICS_ATTR_ADDR_IPV4];
 	if (a) {
-		if (f.addr.family && f.addr.family != AF_INET)
+		if (f.daddr.family && f.daddr.family != AF_INET)
 			return 0;
-		memcpy(&addr.data, RTA_DATA(a), 4);
-		addr.bytelen = 4;
+		memcpy(&daddr.data, RTA_DATA(a), 4);
+		daddr.bytelen = 4;
 		family = AF_INET;
 		atype = TCP_METRICS_ATTR_ADDR_IPV4;
+		dlen = RTA_PAYLOAD(a);
 	} else {
 		a = attrs[TCP_METRICS_ATTR_ADDR_IPV6];
 		if (a) {
-			if (f.addr.family && f.addr.family != AF_INET6)
+			if (f.daddr.family && f.daddr.family != AF_INET6)
 				return 0;
-			memcpy(&addr.data, RTA_DATA(a), 16);
-			addr.bytelen = 16;
+			memcpy(&daddr.data, RTA_DATA(a), 16);
+			daddr.bytelen = 16;
 			family = AF_INET6;
 			atype = TCP_METRICS_ATTR_ADDR_IPV6;
+			dlen = RTA_PAYLOAD(a);
 		} else
 			return 0;
 	}
 
-	if (f.addr.family && f.addr.bitlen >= 0 &&
-	    inet_addr_match(&addr, &f.addr, f.addr.bitlen))
+	if (f.daddr.family && f.daddr.bitlen >= 0 &&
+	    inet_addr_match(&daddr, &f.daddr, f.daddr.bitlen))
 		return 0;
 
 	if (f.flushb) {
 		struct nlmsghdr *fn;
 		TCPM_REQUEST(req2, 128, TCP_METRICS_CMD_DEL, NLM_F_REQUEST);
 
-		addattr_l(&req2.n, sizeof(req2), atype, &addr.data,
-			  addr.bytelen);
+		addattr_l(&req2.n, sizeof(req2), atype, &daddr.data,
+			  daddr.bytelen);
 
 		if (NLMSG_ALIGN(f.flushp) + req2.n.nlmsg_len > f.flushe) {
 			if (flush_update())
@@ -161,8 +163,7 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 		fprintf(fp, "Deleted ");
 
 	fprintf(fp, "%s",
-		format_host(family, RTA_PAYLOAD(a), &addr.data,
-			    abuf, sizeof(abuf)));
+		format_host(family, dlen, &daddr.data, abuf, sizeof(abuf)));
 
 	a = attrs[TCP_METRICS_ATTR_AGE];
 	if (a) {
@@ -260,8 +261,8 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 	int ack;
 
 	memset(&f, 0, sizeof(f));
-	f.addr.bitlen = -1;
-	f.addr.family = preferred_family;
+	f.daddr.bitlen = -1;
+	f.daddr.family = preferred_family;
 
 	switch (preferred_family) {
 	case AF_UNSPEC:
@@ -283,14 +284,14 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 		}
 		if (matches(*argv, "help") == 0)
 			usage();
-		if (f.addr.bitlen >= 0)
+		if (f.daddr.bitlen >= 0)
 			duparg2(who, *argv);
 
-		get_prefix(&f.addr, *argv, preferred_family);
-		if (f.addr.bytelen && f.addr.bytelen * 8 == f.addr.bitlen) {
-			if (f.addr.family == AF_INET)
+		get_prefix(&f.daddr, *argv, preferred_family);
+		if (f.daddr.bytelen && f.daddr.bytelen * 8 == f.daddr.bitlen) {
+			if (f.daddr.family == AF_INET)
 				atype = TCP_METRICS_ATTR_ADDR_IPV4;
-			else if (f.addr.family == AF_INET6)
+			else if (f.daddr.family == AF_INET6)
 				atype = TCP_METRICS_ATTR_ADDR_IPV6;
 		}
 		if ((CMD_DEL & cmd) && atype < 0) {
@@ -310,7 +311,7 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 		cmd = CMD_DEL;
 
 	/* flush for all addresses ? Single del without address */
-	if (cmd == CMD_FLUSH && f.addr.bitlen <= 0 &&
+	if (cmd == CMD_FLUSH && f.daddr.bitlen <= 0 &&
 	    preferred_family == AF_UNSPEC) {
 		cmd = CMD_DEL;
 		req.g.cmd = TCP_METRICS_CMD_DEL;
@@ -338,8 +339,8 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 		if (ack)
 			req.n.nlmsg_flags |= NLM_F_ACK;
 		if (atype >= 0)
-			addattr_l(&req.n, sizeof(req), atype, &f.addr.data,
-				  f.addr.bytelen);
+			addattr_l(&req.n, sizeof(req), atype, &f.daddr.data,
+				  f.daddr.bytelen);
 	} else {
 		req.n.nlmsg_flags |= NLM_F_DUMP;
 	}
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH iproute2 0/3] Support for tcp-metrics source address
From: Christoph Paasch @ 2014-02-06 22:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

This patchset implements support for showing and deleting tcp-metrics
based on the source-address.

Christoph Paasch (3):
  tcp_metrics: Rename addr to daddr and add local variable
  tcp_metrics: Display source-address
  tcp_metrics: Allow removal based on the source-IP

 include/linux/tcp_metrics.h |   2 +
 ip/tcp_metrics.c            | 154 +++++++++++++++++++++++++++++++-------------
 2 files changed, 111 insertions(+), 45 deletions(-)

-- 
1.8.3.2

^ permalink raw reply

* [PATCH iproute2 2/3] tcp_metrics: Display source-address
From: Christoph Paasch @ 2014-02-06 22:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <1391724904-30504-1-git-send-email-christoph.paasch@uclouvain.be>

This patch allows to display the source-IP.
stype will be used in the next patch that allows to remove based on the
source-IP.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 include/linux/tcp_metrics.h |  2 ++
 ip/tcp_metrics.c            | 30 ++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp_metrics.h b/include/linux/tcp_metrics.h
index cb5157b55f32..54a37b13f2c4 100644
--- a/include/linux/tcp_metrics.h
+++ b/include/linux/tcp_metrics.h
@@ -35,6 +35,8 @@ enum {
 	TCP_METRICS_ATTR_FOPEN_SYN_DROPS,	/* u16, count of drops */
 	TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS,	/* msecs age */
 	TCP_METRICS_ATTR_FOPEN_COOKIE,		/* binary */
+	TCP_METRICS_ATTR_SADDR_IPV4,		/* u32 */
+	TCP_METRICS_ATTR_SADDR_IPV6,		/* binary */
 
 	__TCP_METRICS_ATTR_MAX,
 };
diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index 4b30771ae3a8..a7215c86379f 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -95,8 +95,8 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	struct rtattr *attrs[TCP_METRICS_ATTR_MAX + 1], *a;
 	int len = n->nlmsg_len;
 	char abuf[256];
-	inet_prefix daddr;
-	int family, i, atype, dlen = 0;
+	inet_prefix daddr, saddr;
+	int family, i, atype, stype, dlen = 0, slen = 0;
 
 	if (n->nlmsg_type != genl_family)
 		return -1;
@@ -135,6 +135,26 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 			return 0;
 	}
 
+	a = attrs[TCP_METRICS_ATTR_SADDR_IPV4];
+	if (a) {
+		if (f.saddr.family && f.saddr.family != AF_INET)
+			return 0;
+		memcpy(&saddr.data, RTA_DATA(a), 4);
+		saddr.bytelen = 4;
+		stype = TCP_METRICS_ATTR_SADDR_IPV4;
+		slen = RTA_PAYLOAD(a);
+	} else {
+		a = attrs[TCP_METRICS_ATTR_SADDR_IPV6];
+		if (a) {
+			if (f.saddr.family && f.saddr.family != AF_INET6)
+				return 0;
+			memcpy(&saddr.data, RTA_DATA(a), 16);
+			saddr.bytelen = 16;
+			stype = TCP_METRICS_ATTR_SADDR_IPV6;
+			slen = RTA_PAYLOAD(a);
+		}
+	}
+
 	if (f.daddr.family && f.daddr.bitlen >= 0 &&
 	    inet_addr_match(&daddr, &f.daddr, f.daddr.bitlen))
 		return 0;
@@ -248,6 +268,12 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 		fprintf(fp, " fo_cookie %s", cookie);
 	}
 
+	if (slen) {
+		fprintf(fp, " source %s",
+			format_host(family, slen, &saddr.data, abuf,
+				    sizeof(abuf)));
+	}
+
 	fprintf(fp, "\n");
 
 	fflush(fp);
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH iproute2 3/3] tcp_metrics: Allow removal based on the source-IP
From: Christoph Paasch @ 2014-02-06 22:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <1391724904-30504-1-git-send-email-christoph.paasch@uclouvain.be>

This patch allows adding the source-IP attribute to the netlink-command.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 ip/tcp_metrics.c | 87 ++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 62 insertions(+), 25 deletions(-)

diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index a7215c86379f..e0f03449e94b 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -75,6 +75,7 @@ static struct
 	int flushe;
 	int cmd;
 	inet_prefix daddr;
+	inet_prefix saddr;
 } f;
 
 static int flush_update(void)
@@ -157,6 +158,12 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 
 	if (f.daddr.family && f.daddr.bitlen >= 0 &&
 	    inet_addr_match(&daddr, &f.daddr, f.daddr.bitlen))
+	       return 0;
+	/* Only check for the source-address if the kernel supports it,
+	 * meaning slen != 0.
+	 */
+	if (slen && f.saddr.family && f.saddr.bitlen >= 0 &&
+	    inet_addr_match(&saddr, &f.saddr, f.saddr.bitlen))
 		return 0;
 
 	if (f.flushb) {
@@ -165,6 +172,9 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 
 		addattr_l(&req2.n, sizeof(req2), atype, &daddr.data,
 			  daddr.bytelen);
+		if (slen)
+			addattr_l(&req2.n, sizeof(req2), stype, &saddr.data,
+				  saddr.bytelen);
 
 		if (NLMSG_ALIGN(f.flushp) + req2.n.nlmsg_len > f.flushe) {
 			if (flush_update())
@@ -283,12 +293,14 @@ static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
 static int tcpm_do_cmd(int cmd, int argc, char **argv)
 {
 	TCPM_REQUEST(req, 1024, TCP_METRICS_CMD_GET, NLM_F_REQUEST);
-	int atype = -1;
+	int atype = -1, stype = -1;
 	int ack;
 
 	memset(&f, 0, sizeof(f));
 	f.daddr.bitlen = -1;
 	f.daddr.family = preferred_family;
+	f.saddr.bitlen = -1;
+	f.saddr.family = preferred_family;
 
 	switch (preferred_family) {
 	case AF_UNSPEC:
@@ -301,31 +313,53 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 	}
 
 	for (; argc > 0; argc--, argv++) {
-		char *who = "address";
-
-		if (strcmp(*argv, "addr") == 0 ||
-		    strcmp(*argv, "address") == 0) {
-			who = *argv;
+		if (strcmp(*argv, "src") == 0 ||
+		    strcmp(*argv, "source") == 0) {
+			char *who = *argv;
 			NEXT_ARG();
-		}
-		if (matches(*argv, "help") == 0)
-			usage();
-		if (f.daddr.bitlen >= 0)
-			duparg2(who, *argv);
-
-		get_prefix(&f.daddr, *argv, preferred_family);
-		if (f.daddr.bytelen && f.daddr.bytelen * 8 == f.daddr.bitlen) {
-			if (f.daddr.family == AF_INET)
-				atype = TCP_METRICS_ATTR_ADDR_IPV4;
-			else if (f.daddr.family == AF_INET6)
-				atype = TCP_METRICS_ATTR_ADDR_IPV6;
-		}
-		if ((CMD_DEL & cmd) && atype < 0) {
-			fprintf(stderr, "Error: a specific IP address is expected rather than \"%s\"\n",
-				*argv);
-			return -1;
-		}
+			if (matches(*argv, "help") == 0)
+				usage();
+			if (f.saddr.bitlen >= 0)
+				duparg2(who, *argv);
+
+			get_prefix(&f.saddr, *argv, preferred_family);
+			if (f.saddr.bytelen && f.saddr.bytelen * 8 == f.saddr.bitlen) {
+				if (f.saddr.family == AF_INET)
+					stype = TCP_METRICS_ATTR_SADDR_IPV4;
+				else if (f.saddr.family == AF_INET6)
+					stype = TCP_METRICS_ATTR_SADDR_IPV6;
+			}
 
+			if (stype < 0) {
+				fprintf(stderr, "Error: a specific IP address is expected rather than \"%s\"\n",
+					*argv);
+				return -1;
+			}
+		} else {
+			char *who = "address";
+			if (strcmp(*argv, "addr") == 0 ||
+			    strcmp(*argv, "address") == 0) {
+				who = *argv;
+				NEXT_ARG();
+			}
+			if (matches(*argv, "help") == 0)
+				usage();
+			if (f.daddr.bitlen >= 0)
+				duparg2(who, *argv);
+
+			get_prefix(&f.daddr, *argv, preferred_family);
+			if (f.daddr.bytelen && f.daddr.bytelen * 8 == f.daddr.bitlen) {
+				if (f.daddr.family == AF_INET)
+					atype = TCP_METRICS_ATTR_ADDR_IPV4;
+				else if (f.daddr.family == AF_INET6)
+					atype = TCP_METRICS_ATTR_ADDR_IPV6;
+			}
+			if ((CMD_DEL & cmd) && atype < 0) {
+				fprintf(stderr, "Error: a specific IP address is expected rather than \"%s\"\n",
+					*argv);
+				return -1;
+			}
+		}
 		argc--; argv++;
 	}
 
@@ -338,7 +372,7 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 
 	/* flush for all addresses ? Single del without address */
 	if (cmd == CMD_FLUSH && f.daddr.bitlen <= 0 &&
-	    preferred_family == AF_UNSPEC) {
+	    f.saddr.bitlen <= 0 && preferred_family == AF_UNSPEC) {
 		cmd = CMD_DEL;
 		req.g.cmd = TCP_METRICS_CMD_DEL;
 		ack = 1;
@@ -367,6 +401,9 @@ static int tcpm_do_cmd(int cmd, int argc, char **argv)
 		if (atype >= 0)
 			addattr_l(&req.n, sizeof(req), atype, &f.daddr.data,
 				  f.daddr.bytelen);
+		if (stype >= 0)
+			addattr_l(&req.n, sizeof(req), stype, &f.saddr.data,
+				  f.saddr.bytelen);
 	} else {
 		req.n.nlmsg_flags |= NLM_F_DUMP;
 	}
-- 
1.8.3.2

^ permalink raw reply related

* Re: Fw: [Bug 70071] New: Sending netconsole messages over a bridged network interface doesn't work anymore
From: Cong Wang @ 2014-02-06 22:18 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Stephen Hemminger, netdev
In-Reply-To: <52F34D41.8000704@lab.ntt.co.jp>

On Thu, Feb 6, 2014 at 12:52 AM, Toshiaki Makita
<makita.toshiaki@lab.ntt.co.jp> wrote:
> (2014/02/06 15:49), Cong Wang wrote:
>> On Wed, Feb 5, 2014 at 9:44 PM, Toshiaki Makita
>> <makita.toshiaki@lab.ntt.co.jp> wrote:
>>> Tested this patch with latest net-tree and netconsole works with it.
>>> But I thinks it is better to move that "if" to br_add_if() because if we
>>> don't have npinfo, we don't have to alloc p->np in br_add_if(), right?
>>>
>>
>> Hmm, we shouldn't handle netpoll-specific code inside br_add_if(),
>> we probably need the attached patch instead. Please give it
>> a try, or I will test it tomorrow, it's too late here.
>>
>
> I tested whether netconsole works and whether it can be built
> with/without CONFIG_NET_POLL_CONTROLLER, and couldn't find any problem.
> This looks good to me.
>

Excellent! I will send it formally.

Thanks for testing!

^ permalink raw reply

* Re: [GIT PULL] tree-wide: clean up no longer required #include <linux/init.h>
From: Stephen Rothwell @ 2014-02-06 22:25 UTC (permalink / raw)
  To: torvalds
  Cc: Paul Gortmaker, linux-alpha, linux-arch, linux-arm-kernel,
	linux-ia64, linux-m68k, linux-mips, linuxppc-dev, linux-s390,
	sparclinux, x86, netdev, kvm, rusty, gregkh, akpm
In-Reply-To: <1391547118-21967-1-git-send-email-paul.gortmaker@windriver.com>

[-- Attachment #1: Type: text/plain, Size: 587 bytes --]

Hi Linus,

On Tue, 4 Feb 2014 15:51:58 -0500 Paul Gortmaker <paul.gortmaker@windriver.com> wrote:
>
> We've had this in linux-next for 2+ weeks (thanks Stephen!) as a
> linux-stable like queue of patches, and as can be seen here:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
> 
> most of the changes in the last week have been trivial adding acks
> or dropping patches that maintainers decided to take themselves.

Any thoughts on merging this?  I can feel it bitrotting :-(

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: RTNL: assertion failed at net/core/dev.c (4494) and RTNL: assertion failed at net/core/rtnetlink.c (940)
From: Jay Vosburgh @ 2014-02-06 22:33 UTC (permalink / raw)
  To: Cong Wang
  Cc: Thomas Glanzmann, Eric Dumazet, netdev, Veaceslav Falico, andy,
	Jiří Pírko, sfeldma@cumulusnetworks.com
In-Reply-To: <CAHA+R7McuqYDshugsWcaeBbBsEg3QY59LiovYthcq6bH1fLTNg@mail.gmail.com>


Cong Wang <cwang@twopensource.com> wrote:

>On Thu, Feb 6, 2014 at 2:07 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> Jay Vosburgh <fubar@us.ibm.com> wrote:
>>
>>>Cong Wang <cwang@twopensource.com> wrote:
>>>
>>>
>>>       That would eliminate the warning, but is suboptimal.  Acquiring
>>>RTNL is not necessary on the vast majority of state machine runs
>>>(because no state changes take place, i.e., no ports are disabled or
>>>enabled).  The above change would add 10 round trips per second to RTNL,
>>>which seems excessive.
>>>
>>>       Also, we cannot unconditionally acquire RTNL in this function,
>>>as it would race with the call to cancel_delayed_work_sync from
>>>bond_close (via bond_work_cancel_all).
>
>OK.
>
>>
>>         Thought of one more problem: we can't hold a regular lock while
>> calling rtmsg_ifinfo, as it may sleep in alloc_skb.  The rtmsg_ifinfo
>> call has to be RTNL and nothing else.
>>
>
>s/GFP_KERNEL/GFP_ATOMIC/

	Yah, that would help with extra locks, but not totally solve
things.  I'm looking around, and seeing a number of other places that
will end up at one of these rtmsg_ifinfo calls with incorrect locking:

	bond_ab_arp_probe calls via bond_set_slave_active_flags and
bond_set_slave_inactive_flags without RTNL.

	bond_change_active_slave calls via bond_set_slave_inactive_flags
and bond_set_slave_active_flags with other locks held, and maybe without
RTNL; I'm not sure if bond_option_active_slave_set holds RTNL when it
calls bond_select_active_slave.

	bond_open calls via bond_set_slave_active_flags and
bond_set_slave_inactive_flags with RTNL, but also with other locks held.

	bond_loadbalance_arp_mon calls bond_set_active_slave and
bond_set_backup_slave without RTNL.

	This is in addition to the cases in the 802.3ad code from
__enable_port and __disable_port calls.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH] net: asix: fix bad header length bug
From: Emil Goode @ 2014-02-06 22:41 UTC (permalink / raw)
  To: David Laight
  Cc: 'Igor Gnatenko', David S. Miller, Ming Lei, Mark Brown,
	Jeff Kirsher, Glen Turner, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6B99F7@AcuExch.aculab.com>

On Thu, Feb 06, 2014 at 03:28:13PM +0000, David Laight wrote:
> From: Igor Gnatenko
> > On Thu, 2014-02-06 at 13:56 +0100, Emil Goode wrote:
> > > The AX88772B occasionally send rx packets that cross urb boundaries
> > > and the remaining partial packet is sent with no header.
> > > When the buffer with a partial packet is of less number of octets
> > > than the value of hard_header_len the buffer is discarded by the
> > > usbnet module. This is causing dropped packages and error messages
> > > in dmesg.
> > >
> > > This can be reproduced by using ping with a packet size
> > > between 1965-1976.
> > >
> > > The bug has been reported here:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=29082
> > >
> > > Signed-off-by: Emil Goode <emilgoode@gmail.com>
> > Reported-and-tested-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
> > > ---
> > >  drivers/net/usb/asix_devices.c |    1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
> > > index 9765a7d..120bb29 100644
> > > --- a/drivers/net/usb/asix_devices.c
> > > +++ b/drivers/net/usb/asix_devices.c
> > > @@ -455,6 +455,7 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
> > >  	dev->net->ethtool_ops = &ax88772_ethtool_ops;
> > >  	dev->net->needed_headroom = 4; /* cf asix_tx_fixup() */
> > >  	dev->net->needed_tailroom = 4; /* cf asix_tx_fixup() */
> > > +	dev->net->hard_header_len = 0; /* Partial packets have no header */
> 
> That is the wrong place for the fix.
> 
> It should only be done when rx_urb_size is set to a multiple of the usb
> packet size.
> That is only done for some of the supported devices.
> 
> In fact, if the rx_urb_size is a multiple of the usb frame size (or 1k)
> then maybe the usbnet code should assume that the driver is capable
> of processing ethernet frames that cross usb packet boundaries and
> not delete short packets at all - regardless of the hard_header_len.
> 
> 	David
>

I will do some more digging in the code, but the test of skb->len
against hard_header_len is done already in the completion callback
function passed to usb_fill_bulk_urb so it seems that buffers of less
than hard_header_len number of octets will be dropped regardless.

Best regards,

Emil Goode

^ permalink raw reply

* Re: [RFC PATCH] udp4: Don't take socket reference in receive path
From: Tom Herbert @ 2014-02-06 22:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Linux Netdev List, Eric Dumazet
In-Reply-To: <1391720338.10160.12.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, Feb 6, 2014 at 12:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-02-06 at 11:58 -0800, Tom Herbert wrote:
>> The reference counting in the UDP receive path is quite expensive for
>> a socket that is share amoungst CPUs. This is probably true for normal
>> sockets, but really is painful when just using the socket for
>> receive encapsulation.
>>
>> udp4_lib_lookup always takes a socket reference, and we also put back
>> the reference after calling udp_queue_rcv_skb in the normal receive
>> path, so the need for taking the reference seems to be to hold the
>> socket after doing rcu_read_unlock. This patch modifies udp_lib_lookup
>> to optionally take a reference and is always called with rcu_read_lock.
>> In udp4_lib_rcv we call lib_lookup and udp_queue_rcv under the
>> rcu_read_lock but without having taken the reference.
>>
>> Requesting comments because I suspect there are nuances to this!
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>
> Unfortunately this cant work.
>
> When I did the RCU implementation for TCP/UDP, we chose to use
> SLAB_DESTROY_BY_RCU.
>
> This meant we have to take a reference, then check again the keys for
> the lookup.
>
> If we remove SLAB_DESTROY_BY_RCU, we kill performance for short lived
> sessions, because of call_rcu() added latencies.
>
Thanks for the explanation.

> (One UDP socket is about 1024 bytes in memory, call_rcu() grace period
> is throwing away 1024 bytes from cpu caches)
>
> Sure, in your case you know your udp sessions are not short lived,
> but many applications used UDP for DNS lookups, using few packets per
> socket.
>
The rationale for SLAB_DESTROY_BY_RCU might be different for UDP than
TCP. For instance, in the DNS example small connected UDP flows are
more an issue on the client, the server (which is likely to have much
greater load) should be using unconnected sockets.

In any case, I am still looking for a way to address this. Like I said
in the commit log, this per packet cost for UDP processing is far too
high at least in encapsulation path. I thought about extending
SO__REUSEPORT to provide CPU affinity but that seems like overkill
with its own performance implications. Alternatively, we could have
fast path for the encapsulation using UDP offload model which bypass
sockets completely which seems unpleasant.

>
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox