Netdev List
 help / color / mirror / Atom feed
* Re: Linux 3.1-rc9
From: Simon Kirby @ 2011-11-03  0:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development
In-Reply-To: <1320279351.4793.60.camel@gandalf.stny.rr.com>

On Wed, Nov 02, 2011 at 08:15:51PM -0400, Steven Rostedt wrote:

> On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
> >  
> > [   49.032008] other info that might help us debug this:
> > [   49.032008] 
> > [   49.032008]  Possible unsafe locking scenario:
> > [   49.032008] 
> > [   49.032008]        CPU0                    CPU1
> > [   49.032008]        ----                    ----
> > [   49.032008]   lock(slock-AF_INET);
> > [   49.039565]                                lock(slock-AF_INET/1);
> > [   49.039565]                                lock(slock-AF_INET);
> > [   49.039565]   lock(slock-AF_INET/1);
> > [   49.039565] 
> > [   49.039565]  *** DEADLOCK ***
> > [   49.039565] 
> 
> > Did that help? I'm not sure if that's what you wanted to see...
> 
> 
> Yes, this looks much better than what you previously showed. The added
> "/1" makes a world of difference.
> 
> Thanks!
> 
> I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
> you are chasing, but instead fixed the output of what the bug
> produced ;)

Well, I was testing this without Eric's patch as I figured you wanted to
see the splat. :) Testing again with Eric's patch now.

Simon-

^ permalink raw reply

* Re: Linux 3.1-rc9
From: Steven Rostedt @ 2011-11-03  0:15 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development
In-Reply-To: <20111103000941.GJ5971@hostway.ca>

On Wed, 2011-11-02 at 17:09 -0700, Simon Kirby wrote:
>  
> [   49.032008] other info that might help us debug this:
> [   49.032008] 
> [   49.032008]  Possible unsafe locking scenario:
> [   49.032008] 
> [   49.032008]        CPU0                    CPU1
> [   49.032008]        ----                    ----
> [   49.032008]   lock(slock-AF_INET);
> [   49.039565]                                lock(slock-AF_INET/1);
> [   49.039565]                                lock(slock-AF_INET);
> [   49.039565]   lock(slock-AF_INET/1);
> [   49.039565] 
> [   49.039565]  *** DEADLOCK ***
> [   49.039565] 

> Did that help? I'm not sure if that's what you wanted to see...


Yes, this looks much better than what you previously showed. The added
"/1" makes a world of difference.

Thanks!

I'll add your "Tested-by". Seems rather strange as we didn't fix the bug
you are chasing, but instead fixed the output of what the bug
produced ;)

-- Steve

^ permalink raw reply

* Re: Linux 3.1-rc9
From: Simon Kirby @ 2011-11-03  0:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development
In-Reply-To: <20111102230009.GB27457@home.goodmis.org>

On Wed, Nov 02, 2011 at 07:00:10PM -0400, Steven Rostedt wrote:

> On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> > Thomas pointed me here.
> > 
> > On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > > [104661.244767] 
> > > [104661.244767]  Possible unsafe locking scenario:
> > > [104661.244767]        
> > > [104661.244767]        CPU0                    CPU1
> > > [104661.244767]        ----                    ----
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]                                lock(slock-AF_INET);
> > > [104661.244767]   lock(slock-AF_INET);
> > > [104661.244767] 
> > > [104661.244767]  *** DEADLOCK ***
> > > [104661.244767] 
> > 
> > Bah, I used the __print_lock_name() function to show the lock names in
> > the above, which leaves off the subclass number. I'll go write up a
> > patch that fixes that.
> > 
> 
> Simon,
> 
> If you are still triggering the bug. Could you do me a favor and apply
> the following patch. Just to make sure it fixes the confusing output
> from above.
> 
> Thanks,
> 
> -- Steve
> 
> 
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 91d67ce..d821ac9 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
>  	usage[i] = '\0';
>  }
>  
> -static int __print_lock_name(struct lock_class *class)
> +static void __print_lock_name(struct lock_class *class)
>  {
>  	char str[KSYM_NAME_LEN];
>  	const char *name;
>  
>  	name = class->name;
> -	if (!name)
> +	if (!name) {
>  		name = __get_key_name(class->key, str);
> -
> -	return printk("%s", name);
> +		printk("%s", name);
> +	} else {
> +		printk("%s", name);
> +		if (class->name_version > 1)
> +			printk("#%d", class->name_version);
> +		if (class->subclass)
> +			printk("/%d", class->subclass);
> +	}
>  }
>  
>  static void print_lock_name(struct lock_class *class)
> @@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
>  
>  	get_usage_chars(class, usage);
>  
> -	name = class->name;
> -	if (!name) {
> -		name = __get_key_name(class->key, str);
> -		printk(" (%s", name);
> -	} else {
> -		printk(" (%s", name);
> -		if (class->name_version > 1)
> -			printk("#%d", class->name_version);
> -		if (class->subclass)
> -			printk("/%d", class->subclass);
> -	}
> +	printk(" (");
> +	__print_lock_name(class);
>  	printk("){%s}", usage);
>  }

Hello!

I am now able to reproduce on demand by just starting an "ab" from
another box and "ip route add blackhole <other machine>" on the target
box while the ab is running. The first time I tried this without your
patch, and got the trace I had before. With your patch, I got this:

[  366.198866] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?
[  366.198981] 
[  366.198982] =================================
[  366.199118] [ INFO: inconsistent lock state ]
[  366.199189] 3.1.0-hw-lockdep+ #58
[  366.199259] ---------------------------------
[  366.199331] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[  366.199407] kworker/0:1/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
[  366.199480]  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.199773] {IN-SOFTIRQ-W} state was registered at:
[  366.199846]   [<ffffffff81098b7c>] __lock_acquire+0xcbc/0x2180
[  366.199973]   [<ffffffff8109a149>] lock_acquire+0x109/0x140
[  366.200096]   [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[  366.200220]   [<ffffffff8166eb6d>] udp_queue_rcv_skb+0x26d/0x4b0
[  366.200346]   [<ffffffff8166f4d3>] __udp4_lib_rcv+0x2f3/0x910
[  366.200470]   [<ffffffff8166fb05>] udp_rcv+0x15/0x20
[  366.200592]   [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[  366.200718]   [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[  366.200841]   [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[  366.200965]   [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[  366.201086]   [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[  366.201211]   [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[  366.201335]   [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[  366.201458]   [<ffffffff810640e8>] __do_softirq+0x138/0x250
[  366.201582]   [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.201706]   [<ffffffff810153c5>] do_softirq+0x95/0xd0
[  366.202822]   [<ffffffff81063efd>] local_bh_enable+0xed/0x110
[  366.202822]   [<ffffffff81617a68>] dev_queue_xmit+0x1a8/0x8a0
[  366.202822]   [<ffffffff81621fca>] neigh_resolve_output+0x17a/0x220
[  366.202822]   [<ffffffff8164ab7c>] ip_finish_output+0x2ec/0x590
[  366.202822]   [<ffffffff8164aea8>] ip_output+0x88/0xe0
[  366.202822]   [<ffffffff81649b08>] ip_local_out+0x28/0x80
[  366.202822]   [<ffffffff81649b69>] ip_send_skb+0x9/0x40
[  366.202822]   [<ffffffff8166dce8>] udp_send_skb+0x108/0x370
[  366.202822]   [<ffffffff8167093c>] udp_sendmsg+0x7dc/0x920
[  366.202822]   [<ffffffff81678c4f>] inet_sendmsg+0xbf/0x120
[  366.202822]   [<ffffffff81602193>] sock_sendmsg+0xe3/0x110
[  366.202822]   [<ffffffff81602ab5>] sys_sendto+0x105/0x140
[  366.202822]   [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[  366.202822] irq event stamp: 1175966
[  366.202822] hardirqs last  enabled at (1175964): [<ffffffff816f9174>] restore_args+0x0/0x30
[  366.202822] hardirqs last disabled at (1175965): [<ffffffff8106415d>] __do_softirq+0x1ad/0x250
[  366.202822] softirqs last  enabled at (1175966): [<ffffffff810641a6>] __do_softirq+0x1f6/0x250
[  366.202822] softirqs last disabled at (1175907): [<ffffffff817030fc>] call_softirq+0x1c/0x30
[  366.202822] 
[  366.202822] other info that might help us debug this:
[  366.202822]  Possible unsafe locking scenario:
[  366.202822] 
[  366.202822]        CPU0
[  366.202822]        ----
[  366.202822]   lock(slock-AF_INET);
[  366.202822]   <Interrupt>
[  366.202822]     lock(slock-AF_INET);
[  366.202822] 
[  366.202822]  *** DEADLOCK ***
[  366.202822] 
[  366.202822] 1 lock held by kworker/0:1/0:
[  366.202822]  #0:  (slock-AF_INET){+.?...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[  366.202822] 
[  366.202822] stack backtrace:
[  366.202822] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-hw-lockdep+ #58
[  366.202822] Call Trace:
[  366.202822]  [<ffffffff81095f31>] print_usage_bug+0x241/0x310
[  366.202822]  [<ffffffff810964b4>] mark_lock+0x4b4/0x6c0
[  366.202822]  [<ffffffff81097300>] ? check_usage_forwards+0x110/0x110
[  366.202822]  [<ffffffff81096762>] mark_held_locks+0xa2/0x130
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff81096b0d>] trace_hardirqs_on_caller+0x13d/0x1c0
[  366.202822]  [<ffffffff813a60ee>] trace_hardirqs_on_thunk+0x3a/0x3f
[  366.202822]  [<ffffffff816f9174>] ? retint_restore_args+0x13/0x13
[  366.202822]  [<ffffffff8101b80e>] ? mwait_idle+0x14e/0x170
[  366.202822]  [<ffffffff8101b805>] ? mwait_idle+0x145/0x170
[  366.202822]  [<ffffffff81013156>] cpu_idle+0x96/0xf0
[  366.202822]  [<ffffffff816ef2eb>] start_secondary+0x1ca/0x1ff

...which of course is a different splat, so I ran it again:

[   49.028097] =======================================================
[   49.028244] [ INFO: possible circular locking dependency detected ]
[   49.028321] 3.1.0-hw-lockdep+ #58
[   49.028391] -------------------------------------------------------
[   49.028466] tcsh/2490 is trying to acquire lock:
[   49.028539]  (slock-AF_INET/1){+.-...}, at: [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.028882] 
[   49.028883] but task is already holding lock:
[   49.029018]  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.029310] 
[   49.029310] which lock already depends on the new lock.
[   49.029312] 
[   49.029513] 
[   49.029514] the existing dependency chain (in reverse order) is:
[   49.029653] 
[   49.029654] -> #1 (slock-AF_INET){+.-...}:
[   49.029986]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.030115]        [<ffffffff816f842c>] _raw_spin_lock+0x3c/0x50
[   49.030242]        [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.031959]        [<ffffffff8164f963>] inet_csk_clone+0x13/0x90
[   49.032008]        [<ffffffff816697d5>] tcp_create_openreq_child+0x25/0x4d0
[   49.032008]        [<ffffffff81667aa8>] tcp_v4_syn_recv_sock+0x48/0x2c0
[   49.032008]        [<ffffffff81669625>] tcp_check_req+0x335/0x4c0
[   49.032008]        [<ffffffff81666c8e>] tcp_v4_do_rcv+0x29e/0x460
[   49.032008]        [<ffffffff816676dc>] tcp_v4_rcv+0x88c/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81613ce0>] process_backlog+0xd0/0x1e0
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063ded>] local_bh_enable_ip+0xed/0x110
[   49.032008]        [<ffffffff816f8ccf>] _raw_spin_unlock_bh+0x3f/0x50
[   49.032008]        [<ffffffff81605ca1>] release_sock+0x161/0x1d0
[   49.032008]        [<ffffffff8167911d>] inet_stream_connect+0x6d/0x2f0
[   49.032008]        [<ffffffff815ffe4b>] kernel_connect+0xb/0x10
[   49.032008]        [<ffffffff816addb6>] xs_tcp_setup_socket+0x2a6/0x4c0
[   49.032008]        [<ffffffff81078d29>] process_one_work+0x1e9/0x560
[   49.032008]        [<ffffffff81079433>] worker_thread+0x193/0x420
[   49.032008]        [<ffffffff81080496>] kthread+0x96/0xb0
[   49.032008]        [<ffffffff81703004>] kernel_thread_helper+0x4/0x10
[   49.032008] 
[   49.032008] -> #0 (slock-AF_INET/1){+.-...}:
[   49.032008]        [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.032008]        [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.032008]        [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.032008]        [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.032008]        [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.032008]        [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.032008]        [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.032008]        [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.032008]        [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.032008]        [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.032008]        [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.032008]        [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.032008]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.032008]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.032008]        [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.032008]        [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.032008]        [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.032008]        [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.032008]        [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.032008]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.032008]        [<ffffffff816f90b3>] ret_from_intr+0x0/0x1a
[   49.032008]        [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.032008]        [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.032008]        [<ffffffff81060590>] do_wait+0x150/0x270
[   49.032008]        [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.032008]        [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.032008] 
[   49.032008] other info that might help us debug this:
[   49.032008] 
[   49.032008]  Possible unsafe locking scenario:
[   49.032008] 
[   49.032008]        CPU0                    CPU1
[   49.032008]        ----                    ----
[   49.032008]   lock(slock-AF_INET);
[   49.039565]                                lock(slock-AF_INET/1);
[   49.039565]                                lock(slock-AF_INET);
[   49.039565]   lock(slock-AF_INET/1);
[   49.039565] 
[   49.039565]  *** DEADLOCK ***
[   49.039565] 
[   49.039565] 3 locks held by tcsh/2490:
[   49.039565]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff8160738e>] sk_clone+0x10e/0x3e0
[   49.039565]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81613815>] __netif_receive_skb+0x165/0x560
[   49.039565]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816446d0>] ip_local_deliver_finish+0x40/0x2f0
[   49.039565] 
[   49.039565] stack backtrace:
[   49.039565] Pid: 2490, comm: tcsh Not tainted 3.1.0-hw-lockdep+ #58
[   49.039565] Call Trace:
[   49.039565]  <IRQ>  [<ffffffff81097dab>] print_circular_bug+0x21b/0x330
[   49.039565]  [<ffffffff81099f00>] __lock_acquire+0x2040/0x2180
[   49.039565]  [<ffffffff8109a149>] lock_acquire+0x109/0x140
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816f83da>] _raw_spin_lock_nested+0x3a/0x50
[   49.039565]  [<ffffffff816676b7>] ? tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816676b7>] tcp_v4_rcv+0x867/0xc10
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644790>] ip_local_deliver_finish+0x100/0x2f0
[   49.039565]  [<ffffffff816446d0>] ? ip_local_deliver_finish+0x40/0x2f0
[   49.039565]  [<ffffffff81644a0d>] ip_local_deliver+0x8d/0xa0
[   49.039565]  [<ffffffff81644033>] ip_rcv_finish+0x1a3/0x510
[   49.039565]  [<ffffffff81644612>] ip_rcv+0x272/0x2f0
[   49.039565]  [<ffffffff81613b87>] __netif_receive_skb+0x4d7/0x560
[   49.039565]  [<ffffffff81613815>] ? __netif_receive_skb+0x165/0x560
[   49.039565]  [<ffffffff81615c44>] netif_receive_skb+0x104/0x120
[   49.039565]  [<ffffffff81615b63>] ? netif_receive_skb+0x23/0x120
[   49.039565]  [<ffffffff816161cb>] ? dev_gro_receive+0x29b/0x380
[   49.039565]  [<ffffffff816160c2>] ? dev_gro_receive+0x192/0x380
[   49.039565]  [<ffffffff81615d90>] napi_skb_finish+0x50/0x70
[   49.039565]  [<ffffffff81616455>] napi_gro_receive+0xc5/0xd0
[   49.039565]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[   49.039565]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[   49.039565]  [<ffffffff816166a0>] net_rx_action+0x140/0x2c0
[   49.039565]  [<ffffffff810640e8>] __do_softirq+0x138/0x250
[   49.039565]  [<ffffffff817030fc>] call_softirq+0x1c/0x30
[   49.039565]  [<ffffffff810153c5>] do_softirq+0x95/0xd0
[   49.039565]  [<ffffffff81063cbd>] irq_exit+0xdd/0x110
[   49.039565]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0
[   49.039565]  [<ffffffff816f90b3>] common_interrupt+0x73/0x73
[   49.039565]  <EOI>  [<ffffffff810944fd>] ? trace_hardirqs_off+0xd/0x10
[   49.039565]  [<ffffffff816f864f>] ? _raw_write_unlock_irq+0x2f/0x50
[   49.039565]  [<ffffffff816f864b>] ? _raw_write_unlock_irq+0x2b/0x50
[   49.039565]  [<ffffffff8105f63f>] release_task+0x24f/0x4c0
[   49.039565]  [<ffffffff8105f414>] ? release_task+0x24/0x4c0
[   49.039565]  [<ffffffff810601de>] wait_consider_task+0x92e/0xb90
[   49.039565]  [<ffffffff81096b0d>] ? trace_hardirqs_on_caller+0x13d/0x1c0
[   49.039565]  [<ffffffff81060590>] do_wait+0x150/0x270
[   49.039565]  [<ffffffff81096b9d>] ? trace_hardirqs_on+0xd/0x10
[   49.039565]  [<ffffffff81060751>] sys_wait4+0xa1/0xf0
[   49.039565]  [<ffffffff8105e9b0>] ? wait_noreap_copyout+0x150/0x150
[   49.039565]  [<ffffffff81700e92>] system_call_fastpath+0x16/0x1b
[   49.045277] huh, entered softirq 3 NET_RX ffffffff81616560 preempt_count 00000102, exited with 00000103?

Did that help? I'm not sure if that's what you wanted to see...

Simon-

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Tony Lindgren @ 2011-11-02 23:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, linux-kernel, linux-omap
In-Reply-To: <4EB1D2C7.2070700@gmail.com>

* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:56]:
> On 03/11/2011 00:24, Tony Lindgren wrote:
> 
> 
> > 
> > Seems to be SLOB for omap1_defconfig.
> > 
> > Tony
> 
> 
> OK this makes sense now
> 
> Your patch is absolutely needed, I completely forgot about SLOB :(
> 
> since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
> power of two.
> 
> [   60.305763] malloc(size=385)->ffff880112c11e38 ksize=386 -> nsize=2
> [   60.305921] malloc(size=385)->ffff88007c92ce28 ksize=386 -> nsize=2
> [   60.306898] malloc(size=656)->ffff88007c44ad28 ksize=656 -> nsize=272
> [   60.325385] malloc(size=656)->ffff88007c575868 ksize=656 -> nsize=272
> [   60.325531] malloc(size=656)->ffff88011c777230 ksize=656 -> nsize=272
> [   60.325701] malloc(size=656)->ffff880114011008 ksize=656 -> nsize=272
> [   60.346716] malloc(size=385)->ffff880114142008 ksize=386 -> nsize=2
> [   60.346900] malloc(size=385)->ffff88011c777690 ksize=386 -> nsize=2
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Thanks !

OK, thanks for explaining why it happens :) I also verified this does
not happen with SLAB or SLUB, only with SLOB.

I've updated the patch withyour comments and ack, updated patch below.

Regards,

Tony 


From: Tony Lindgren <tony@atomide.com>
Date: Wed, 2 Nov 2011 15:46:41 -0700
Subject: [PATCH] net: Add back alignment for size for __alloc_skb

Commit 87fb4b7b533073eeeaed0b6bf7c2328995f6c075 (net: more
accurate skb truesize) changed the alignment of size. This
can cause problems at least on some machines with NFS root:

Unhandled fault: alignment exception (0x801) at 0xc183a43a
Internal error: : 801 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.1.0-08784-g5eeee4a #733)
pc : [<c02fbba0>]    lr : [<c02fbb9c>]    psr: 60000013
sp : c180fef8  ip : 00000000  fp : c181f580
r10: 00000000  r9 : c044b28c  r8 : 00000001
r7 : c183a3a0  r6 : c1835be0  r5 : c183a412  r4 : 000001f2
r3 : 00000000  r2 : 00000000  r1 : ffffffe6  r0 : c183a43a
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 10004000  DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc180e270)
Stack: (0xc180fef8 to 0xc1810000)
fee0:                                                       00000024 00000000
ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
Function entered at [<c02fbba0>] from [<c019dfc4>]
Function entered at [<c019dfc4>] from [<c01f34f4>]
Function entered at [<c01f34f4>] from [<c01f4918>]
Function entered at [<c01f4918>] from [<c0008758>]
Function entered at [<c0008758>] from [<c04b2224>]
Function entered at [<c04b2224>] from [<c001056c>]
Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)

Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
skb->end can be unaligned without this patch.

As explained by Eric Dumazet <eric.dumazet@gmail.com>, this happens
only with SLOB, and not with SLAB or SLUB:

* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:56]:
>
> Your patch is absolutely needed, I completely forgot about SLOB :(
>
> since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
> power of two.
>
> [   60.305763] malloc(size=385)->ffff880112c11e38 ksize=386 -> nsize=2
> [   60.305921] malloc(size=385)->ffff88007c92ce28 ksize=386 -> nsize=2
> [   60.306898] malloc(size=656)->ffff88007c44ad28 ksize=656 -> nsize=272
> [   60.325385] malloc(size=656)->ffff88007c575868 ksize=656 -> nsize=272
> [   60.325531] malloc(size=656)->ffff88011c777230 ksize=656 -> nsize=272
> [   60.325701] malloc(size=656)->ffff880114011008 ksize=656 -> nsize=272
> [   60.346716] malloc(size=385)->ffff880114142008 ksize=386 -> nsize=2
> [   60.346900] malloc(size=385)->ffff88011c777690 ksize=386 -> nsize=2

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -189,6 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
+	size = SKB_DATA_ALIGN(size);
 	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	data = kmalloc_node_track_caller(size, gfp_mask, node);
 	if (!data)

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Eric Dumazet @ 2011-11-02 23:31 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: David Miller, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102232403.GO31337@atomide.com>

On 03/11/2011 00:24, Tony Lindgren wrote:


> 
> Seems to be SLOB for omap1_defconfig.
> 
> Tony


OK this makes sense now

Your patch is absolutely needed, I completely forgot about SLOB :(

since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
power of two.

[   60.305763] malloc(size=385)->ffff880112c11e38 ksize=386 -> nsize=2
[   60.305921] malloc(size=385)->ffff88007c92ce28 ksize=386 -> nsize=2
[   60.306898] malloc(size=656)->ffff88007c44ad28 ksize=656 -> nsize=272
[   60.325385] malloc(size=656)->ffff88007c575868 ksize=656 -> nsize=272
[   60.325531] malloc(size=656)->ffff88011c777230 ksize=656 -> nsize=272
[   60.325701] malloc(size=656)->ffff880114011008 ksize=656 -> nsize=272
[   60.346716] malloc(size=385)->ffff880114142008 ksize=386 -> nsize=2
[   60.346900] malloc(size=385)->ffff88011c777690 ksize=386 -> nsize=2

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks !


^ permalink raw reply

* Re: Subnet router anycast for FE80/10 ?
From: David Stevens @ 2011-11-02 23:27 UTC (permalink / raw)
  To: David Lamparter; +Cc: Andreas Hofmeister, David Lamparter, netdev, netdev-owner
In-Reply-To: <20111102215359.GJ297914@jupiter.n2.diac24.net>

netdev-owner@vger.kernel.org wrote on 11/02/2011 02:53:59 PM:

> From: David Lamparter <equinox@diac24.net>

> Going back to Andreas's original question about Subnet-Router Anycast
> for fe80::/64 (or /10), RFC 4291 says
>    +------------------------------------------------+----------------+
>    |                   subnet prefix                | 00000000000000 |
>    +------------------------------------------------+----------------+
> 
>    The "subnet prefix" in an anycast address is the prefix that
>    identifies a specific link.
> 
> But fe80::/64 does not identify a specific link, as it is link-local and
> would specify all links but not one specifically. So, fe80:: is not a
> Subnet-Router anycast address, I'd say.

fe80:: is in fact a subnet-router anycast address because it's a valid
prefix with all-0's host part. Adding it, as linux does,  simply means
a host can use "fe80::" to get an answer from any router on a particular
link. That might be useful, e.g., if a host wants routing protocol 
information
from a directly attached router.

It isn't unique on a host, but all LL addresses require a scope_id to
identify an interface, anyway, so there is no ambiguity. Any multihomed
v6 host will have multiple fe80/10 routes -- one for each interface--
too, used for receiving packets. Those routes and those anycast addresses
only matter for input processing, as is true for all LL addresses.

So, there's no particular reason I see to treat LL as a special case and
exclude them. I haven't seen any RFC wording that forbids it, though the
implementation predates RFC4291.

You certainly don't have to use them in any application if you don't like 
it. :-)

                                                                +-DLS

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Tony Lindgren @ 2011-11-02 23:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, linux-kernel, linux-omap
In-Reply-To: <4EB1D05B.6020605@gmail.com>

* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:46]:
> On 03/11/2011 00:19, Tony Lindgren wrote:
> 
> > * David Miller <davem@davemloft.net> [111102 15:42]:
> >> c> 
> >>> The issue is calculation of skb->end, which is based upon calculated
> >>> 'size' variable.
> >>>
> >>> skb->end determines alignment of skb_shared_info, which is where the
> >>> alignment problem is occuring for Tony.
> >>
> >> Right, and SMP_CACHE_BYTES setting should save us in any case.
> >>
> >> For ARM, SMP_CACHE_BYTES seems to be set to L1_CACHE_BYTES which in
> >> turn is set via ARM_L1_CACHE_SHIFT which can be set seemingly to any
> >> value but the defaults are 5 and 6 which should be OK.
> >>
> >> So unless Tony is using a non-standard setting of ARM_L1_CACHE_SHIFT,
> >> this report is a bit mysterious.
> > 
> > This is happening at least with omap1_defconfig. In that case we have
> > ARM_L1_CACHE_SHIFT=5.
> > 
> > Regards,
> > 
> > Tony
> 
> 
> Do you use SLOB, SLUB or SLAB ?

Seems to be SLOB for omap1_defconfig.

Tony

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: David Miller @ 2011-11-02 23:22 UTC (permalink / raw)
  To: tony; +Cc: eric.dumazet, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102231917.GN31337@atomide.com>

From: Tony Lindgren <tony@atomide.com>
Date: Wed, 2 Nov 2011 16:19:17 -0700

> * David Miller <davem@davemloft.net> [111102 15:42]:
>> c> 
>> > The issue is calculation of skb->end, which is based upon calculated
>> > 'size' variable.
>> > 
>> > skb->end determines alignment of skb_shared_info, which is where the
>> > alignment problem is occuring for Tony.
>> 
>> Right, and SMP_CACHE_BYTES setting should save us in any case.
>> 
>> For ARM, SMP_CACHE_BYTES seems to be set to L1_CACHE_BYTES which in
>> turn is set via ARM_L1_CACHE_SHIFT which can be set seemingly to any
>> value but the defaults are 5 and 6 which should be OK.
>> 
>> So unless Tony is using a non-standard setting of ARM_L1_CACHE_SHIFT,
>> this report is a bit mysterious.
> 
> This is happening at least with omap1_defconfig. In that case we have
> ARM_L1_CACHE_SHIFT=5.

Ok, so ksize(x) gives an odd value for whichever allocator you are
using.  Which is the point Eric was trying to make.  It becomes
clearernow.

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Eric Dumazet @ 2011-11-02 23:20 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: David Miller, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102231917.GN31337@atomide.com>

On 03/11/2011 00:19, Tony Lindgren wrote:

> * David Miller <davem@davemloft.net> [111102 15:42]:
>> c> 
>>> The issue is calculation of skb->end, which is based upon calculated
>>> 'size' variable.
>>>
>>> skb->end determines alignment of skb_shared_info, which is where the
>>> alignment problem is occuring for Tony.
>>
>> Right, and SMP_CACHE_BYTES setting should save us in any case.
>>
>> For ARM, SMP_CACHE_BYTES seems to be set to L1_CACHE_BYTES which in
>> turn is set via ARM_L1_CACHE_SHIFT which can be set seemingly to any
>> value but the defaults are 5 and 6 which should be OK.
>>
>> So unless Tony is using a non-standard setting of ARM_L1_CACHE_SHIFT,
>> this report is a bit mysterious.
> 
> This is happening at least with omap1_defconfig. In that case we have
> ARM_L1_CACHE_SHIFT=5.
> 
> Regards,
> 
> Tony


Do you use SLOB, SLUB or SLAB ?

Thanks

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Tony Lindgren @ 2011-11-02 23:19 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102.191716.1682282796914825621.davem@davemloft.net>

* David Miller <davem@davemloft.net> [111102 15:42]:
> c> 
> > The issue is calculation of skb->end, which is based upon calculated
> > 'size' variable.
> > 
> > skb->end determines alignment of skb_shared_info, which is where the
> > alignment problem is occuring for Tony.
> 
> Right, and SMP_CACHE_BYTES setting should save us in any case.
> 
> For ARM, SMP_CACHE_BYTES seems to be set to L1_CACHE_BYTES which in
> turn is set via ARM_L1_CACHE_SHIFT which can be set seemingly to any
> value but the defaults are 5 and 6 which should be OK.
> 
> So unless Tony is using a non-standard setting of ARM_L1_CACHE_SHIFT,
> this report is a bit mysterious.

This is happening at least with omap1_defconfig. In that case we have
ARM_L1_CACHE_SHIFT=5.

Regards,

Tony

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: David Miller @ 2011-11-02 23:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tony, netdev, linux-kernel, linux-omap
In-Reply-To: <4EB1CEC4.3090909@gmail.com>

c> 
> The issue is calculation of skb->end, which is based upon calculated
> 'size' variable.
> 
> skb->end determines alignment of skb_shared_info, which is where the
> alignment problem is occuring for Tony.

Right, and SMP_CACHE_BYTES setting should save us in any case.

For ARM, SMP_CACHE_BYTES seems to be set to L1_CACHE_BYTES which in
turn is set via ARM_L1_CACHE_SHIFT which can be set seemingly to any
value but the defaults are 5 and 6 which should be OK.

So unless Tony is using a non-standard setting of ARM_L1_CACHE_SHIFT,
this report is a bit mysterious.

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Eric Dumazet @ 2011-11-02 23:14 UTC (permalink / raw)
  To: David Miller; +Cc: tony, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102.190955.1902322759075682192.davem@davemloft.net>

On 03/11/2011 00:09, David Miller wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 02 Nov 2011 23:55:11 +0100
> 
>> There is a problem with your kmalloc() or alignments on your architecture.
>>
>> What is the SMP_CACHE_BYTES value ?
> 
> kmalloc() behavior doesn't have anything to do with this bug.
> 
> The issue is calculation of skb->end, which is based upon calculated
> 'size' variable.
> 
> skb->end determines alignment of skb_shared_info, which is where the
> alignment problem is occuring for Tony.
> 


I understood that David, but check the code, and you can see that exact
skb->end depends also on ksize()

So maybe the right fix is to make sure skb->end is properly aligned, say
with SLOB

So a more generic fix is welcomed.

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: David Miller @ 2011-11-02 23:13 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tony, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102.190955.1902322759075682192.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 02 Nov 2011 19:09:55 -0400 (EDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 02 Nov 2011 23:55:11 +0100
> 
>> There is a problem with your kmalloc() or alignments on your architecture.
>> 
>> What is the SMP_CACHE_BYTES value ?
> 
> kmalloc() behavior doesn't have anything to do with this bug.

Sorry, I take that back, I see now how ksize() and friends are being
used here.

^ permalink raw reply

* Re: [PATCH] r8169: Fix WOL in power down case
From: Francois Romieu @ 2011-11-02 23:08 UTC (permalink / raw)
  To: Stefan Becker; +Cc: netdev
In-Reply-To: <loom.20111101T174532-579@post.gmane.org>

Stefan Becker <chemobejk@gmail.com> :
[...]
> I just upgraded to Fedora 16 with Kernel 3.1.0 and WoL stopped working in the
> power down case on my R8168 :-(

Plain power down, not suspend/resume ?

[...]
> According to the code the combination "RTL8168c/8111c" means RTL_GIGA_MAC_VER_19
> to _22, so the code changes in the fix commit don't apply to them. Could it be
> that the fix doesn't cover all the necessary HW?

Yes.

However I am almost sure the regression would come from a different commit
and I'd rather identify it. Can you narrow the kernel version a bit ?
Your vendor's kernel identifier before regression would be a good start.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: David Miller @ 2011-11-02 23:09 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tony, netdev, linux-kernel, linux-omap
In-Reply-To: <4EB1CA4F.3090403@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 02 Nov 2011 23:55:11 +0100

> There is a problem with your kmalloc() or alignments on your architecture.
> 
> What is the SMP_CACHE_BYTES value ?

kmalloc() behavior doesn't have anything to do with this bug.

The issue is calculation of skb->end, which is based upon calculated
'size' variable.

skb->end determines alignment of skb_shared_info, which is where the
alignment problem is occuring for Tony.


^ permalink raw reply

* Re: Linux 3.1-rc9
From: Steven Rostedt @ 2011-11-02 23:00 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development
In-Reply-To: <20111102221023.GA27457@home.goodmis.org>

On Wed, Nov 02, 2011 at 06:10:23PM -0400, Steven Rostedt wrote:
> Thomas pointed me here.
> 
> On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> > [104661.244767] 
> > [104661.244767]  Possible unsafe locking scenario:
> > [104661.244767]        
> > [104661.244767]        CPU0                    CPU1
> > [104661.244767]        ----                    ----
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]                                lock(slock-AF_INET);
> > [104661.244767]   lock(slock-AF_INET);
> > [104661.244767] 
> > [104661.244767]  *** DEADLOCK ***
> > [104661.244767] 
> 
> Bah, I used the __print_lock_name() function to show the lock names in
> the above, which leaves off the subclass number. I'll go write up a
> patch that fixes that.
> 

Simon,

If you are still triggering the bug. Could you do me a favor and apply
the following patch. Just to make sure it fixes the confusing output
from above.

Thanks,

-- Steve


diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..d821ac9 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -490,16 +490,22 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS])
 	usage[i] = '\0';
 }
 
-static int __print_lock_name(struct lock_class *class)
+static void __print_lock_name(struct lock_class *class)
 {
 	char str[KSYM_NAME_LEN];
 	const char *name;
 
 	name = class->name;
-	if (!name)
+	if (!name) {
 		name = __get_key_name(class->key, str);
-
-	return printk("%s", name);
+		printk("%s", name);
+	} else {
+		printk("%s", name);
+		if (class->name_version > 1)
+			printk("#%d", class->name_version);
+		if (class->subclass)
+			printk("/%d", class->subclass);
+	}
 }
 
 static void print_lock_name(struct lock_class *class)
@@ -509,17 +515,8 @@ static void print_lock_name(struct lock_class *class)
 
 	get_usage_chars(class, usage);
 
-	name = class->name;
-	if (!name) {
-		name = __get_key_name(class->key, str);
-		printk(" (%s", name);
-	} else {
-		printk(" (%s", name);
-		if (class->name_version > 1)
-			printk("#%d", class->name_version);
-		if (class->subclass)
-			printk("/%d", class->subclass);
-	}
+	printk(" (");
+	__print_lock_name(class);
 	printk("){%s}", usage);
 }
 

^ permalink raw reply related

* Re: [PATCH] net: Add back alignment for size for __alloc_skb
From: Eric Dumazet @ 2011-11-02 22:55 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: David S. Miller, netdev, linux-kernel, linux-omap
In-Reply-To: <20111102224317.GM31337@atomide.com>

On 02/11/2011 23:43, Tony Lindgren wrote:

> Commit 87fb4b7b533073eeeaed0b6bf7c2328995f6c075 (net: more
> accurate skb truesize) changed the alignment of size. This
> can cause problems at least on some machines with NFS root:
> 
> Unhandled fault: alignment exception (0x801) at 0xc183a43a
> Internal error: : 801 [#1] PREEMPT
> Modules linked in:
> CPU: 0    Not tainted  (3.1.0-08784-g5eeee4a #733)
> pc : [<c02fbba0>]    lr : [<c02fbb9c>]    psr: 60000013
> sp : c180fef8  ip : 00000000  fp : c181f580
> r10: 00000000  r9 : c044b28c  r8 : 00000001
> r7 : c183a3a0  r6 : c1835be0  r5 : c183a412  r4 : 000001f2
> r3 : 00000000  r2 : 00000000  r1 : ffffffe6  r0 : c183a43a
> Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> Control: 0005317f  Table: 10004000  DAC: 00000017
> Process swapper (pid: 1, stack limit = 0xc180e270)
> Stack: (0xc180fef8 to 0xc1810000)
> fee0:                                                       00000024 00000000
> ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
> ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
> ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
> ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
> ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
> ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
> ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
> ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
> Function entered at [<c02fbba0>] from [<c019dfc4>]
> Function entered at [<c019dfc4>] from [<c01f34f4>]
> Function entered at [<c01f34f4>] from [<c01f4918>]
> Function entered at [<c01f4918>] from [<c0008758>]
> Function entered at [<c0008758>] from [<c04b2224>]
> Function entered at [<c04b2224>] from [<c001056c>]
> Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)
> 
> Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
> skb->end can be unaligned without this patch.
> 
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> 
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -189,6 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
>  	 * Both skb->head and skb_shared_info are cache line aligned.
>  	 */
> +	size = SKB_DATA_ALIGN(size);
>  	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  	data = kmalloc_node_track_caller(size, gfp_mask, node);
>  	if (!data)



Hmm, I dont think this is the right way to fix the bug.

There is a problem with your kmalloc() or alignments on your architecture.

What is the SMP_CACHE_BYTES value ?


^ permalink raw reply

* [PATCH] l2tp: fix l2tp_recv_dequeue()
From: Eric Dumazet @ 2011-11-02 22:53 UTC (permalink / raw)
  To: Misha Labjuk; +Cc: netdev
In-Reply-To: <CAKpUy7Y794CJkAEtbsbTecbPq+KkniSTx4ftCncohzGsuJfk-w@mail.gmail.com>

On 02/11/2011 23:35, Misha Labjuk wrote:

> Thanks!!  Panic disappeared.
> 
>


Thanks !

Here is the official submission then :)

[PATCH] l2tp: fix l2tp_recv_dequeue()

Misha Labjuk reported panics occurring in l2tp_recv_dequeue()

If we release reorder_q.lock, we must not keep a dangling pointer (tmp)
and restart the whole loop.

Reported-by: Misha Labjuk <spiked.yar@gmail.com>
Tested-by: Misha Labjuk <spiked.yar@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/l2tp/l2tp_core.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 34b2dde..bf8d50c 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -397,6 +397,7 @@ static void l2tp_recv_dequeue(struct l2tp_session
*session)
 	 * expect to send up next, dequeue it and any other
 	 * in-sequence packets behind it.
 	 */
+start:
 	spin_lock_bh(&session->reorder_q.lock);
 	skb_queue_walk_safe(&session->reorder_q, skb, tmp) {
 		if (time_after(jiffies, L2TP_SKB_CB(skb)->expires)) {
@@ -433,7 +434,7 @@ static void l2tp_recv_dequeue(struct l2tp_session
*session)
 		 */
 		spin_unlock_bh(&session->reorder_q.lock);
 		l2tp_recv_dequeue_skb(session, skb);
-		spin_lock_bh(&session->reorder_q.lock);
+		goto start;
 	}

 out:

^ permalink raw reply related

* Re: [PATCH] drivers/net/usb/asix:  resync from vendor's copy
From: Mark Lord @ 2011-11-02 22:44 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <1320266549.2782.21.camel@bwh-desktop>

On 11-11-02 04:42 PM, Ben Hutchings wrote:
[various good notes on things that still need fixing]

Thanks Ben, those are exactly the kind of things
I was hoping for people like you to point out here.

Also, I have now opened a line of communications with ASIX themselves,
and have their blessing for this project.  They've also now sent me
a newer code dump than the one I was using.  I'll diff the two and
add in any relevant changes for the next iteration.

Cheers

^ permalink raw reply

* [PATCH] net: Add back alignment for size for __alloc_skb
From: Tony Lindgren @ 2011-11-02 22:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, linux-omap, Eric Dumazet

Commit 87fb4b7b533073eeeaed0b6bf7c2328995f6c075 (net: more
accurate skb truesize) changed the alignment of size. This
can cause problems at least on some machines with NFS root:

Unhandled fault: alignment exception (0x801) at 0xc183a43a
Internal error: : 801 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.1.0-08784-g5eeee4a #733)
pc : [<c02fbba0>]    lr : [<c02fbb9c>]    psr: 60000013
sp : c180fef8  ip : 00000000  fp : c181f580
r10: 00000000  r9 : c044b28c  r8 : 00000001
r7 : c183a3a0  r6 : c1835be0  r5 : c183a412  r4 : 000001f2
r3 : 00000000  r2 : 00000000  r1 : ffffffe6  r0 : c183a43a
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 10004000  DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc180e270)
Stack: (0xc180fef8 to 0xc1810000)
fee0:                                                       00000024 00000000
ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
Function entered at [<c02fbba0>] from [<c019dfc4>]
Function entered at [<c019dfc4>] from [<c01f34f4>]
Function entered at [<c01f34f4>] from [<c01f4918>]
Function entered at [<c01f4918>] from [<c0008758>]
Function entered at [<c0008758>] from [<c04b2224>]
Function entered at [<c04b2224>] from [<c001056c>]
Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)

Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
skb->end can be unaligned without this patch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -189,6 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
+	size = SKB_DATA_ALIGN(size);
 	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	data = kmalloc_node_track_caller(size, gfp_mask, node);
 	if (!data)

^ permalink raw reply

* Re: Linux 3.1-rc9
From: Eric Dumazet @ 2011-11-02 22:42 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development, Balazs Scheidler,
	KOVACS Krisztian
In-Reply-To: <20111102191621.GF5971@hostway.ca>

On 02/11/2011 20:16, Simon Kirby wrote:

 
> Actually, we have an anti-abuse daemon that injects blackhole routes, so
> this makes sense. (The daemon was written before ipsets were merged and
> normal netfilter rules make it fall over under attack.)
> 
> I'll try with this patch. Thanks!
> 


Thanks !

Here is the official submission, please add your 'Tested-by' signature
when you can confirm problem goes away.

(It did here, when I injected random NULL returns from
inet_csk_route_child_sock(), so I am confident this is the problem you hit )

[PATCH] net: add missing bh_unlock_sock() calls

Simon Kirby reported lockdep warnings and following messages :

[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

Problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we should release socket
lock before freeing it.

Another lock imbalance exists if __inet_inherit_port() returns an error
since commit 093d282321da ( tproxy: fix hash locking issue when using
port redirection in __inet_inherit_port()) a backport is also needed for
>= 2.6.37 kernels.

Reported-by: Dimon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Balazs Scheidler <bazsi@balabit.hu>
CC: KOVACS Krisztian <hidden@balabit.hu>
---
 net/dccp/ipv4.c     |    1 +
 net/ipv4/tcp_ipv4.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 332639b..90a919a 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -433,6 +433,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0ea10ee..683d97a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1510,6 +1510,7 @@ exit:
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
 	return NULL;
 put_and_exit:
+	bh_unlock_sock(newsk);
 	sock_put(newsk);
 	goto exit;
 }

^ permalink raw reply related

* Re: PROBLEM: pppol2tp over pppoe NULL pointer dereference
From: Misha Labjuk @ 2011-11-02 22:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1320217652.30178.1.camel@edumazet-laptop>

Thanks!!  Panic disappeared.

2011/11/2 Eric Dumazet <eric.dumazet@gmail.com>:
> OK thanks, could you try the following patch as well ?
>
> If we release reorder_q.lock, we must not keep a dangling pointer (tmp)
> and restart the whole loop.
>
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 34b2dde..bf8d50c 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -397,6 +397,7 @@ static void l2tp_recv_dequeue(struct l2tp_session *session)
>         * expect to send up next, dequeue it and any other
>         * in-sequence packets behind it.
>         */
> +start:
>        spin_lock_bh(&session->reorder_q.lock);
>        skb_queue_walk_safe(&session->reorder_q, skb, tmp) {
>                if (time_after(jiffies, L2TP_SKB_CB(skb)->expires)) {
> @@ -433,7 +434,7 @@ static void l2tp_recv_dequeue(struct l2tp_session *session)
>                 */
>                spin_unlock_bh(&session->reorder_q.lock);
>                l2tp_recv_dequeue_skb(session, skb);
> -               spin_lock_bh(&session->reorder_q.lock);
> +               goto start;
>        }
>
>  out:
>

^ permalink raw reply

* [PATCH] macvlan: receive multicast with local address
From: Stephen Hemminger @ 2011-11-02 22:11 UTC (permalink / raw)
  To: David Miller, Arnd Bergmann; +Cc: netdev

When implementing VRRP v2 using macvlan several problems were
discovered.  VRRP is weird in that all routers participating
in a redundant group use the same virtual MAC address.
Macvlan is a natural driver to use for this but it doesn't
work.  The problem is that packets with a macvlan device's
source address are not received.

The problem is actually a regression that date back almost 2 years now.
The original problems started with:

commit 618e1b7482f7a8a4c6c6e8ccbe140e4c331df4e9
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Thu Nov 26 06:07:10 2009 +0000

    macvlan: implement bridge, VEPA and private mode
    
This patches restores the original 2.6.32 behavior. Allowing multicast
packets received with the VRRP source address to be received.


Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
This should go to -net, and -stable (since it is a regression).

P.s: The VEPA patch also introduced another regression, it changed
the default mode from private to VEPA which also breaks application
assumptions. But changing it back after 2 years also risks also breaking
things.

--- a/drivers/net/macvlan.c	2011-11-02 13:12:52.186069720 -0700
+++ b/drivers/net/macvlan.c	2011-11-02 14:50:54.476670346 -0700
@@ -192,6 +192,13 @@ static rx_handler_result_t macvlan_handl
 			 */
 			macvlan_broadcast(skb, port, src->dev,
 					  MACVLAN_MODE_VEPA);
+		else {
+			/* forward to original port. */
+			vlan = src;
+			ret = macvlan_broadcast_one(skb, vlan, eth, 0);
+			goto out;
+		}
+
 		return RX_HANDLER_PASS;
 	}
 

^ permalink raw reply

* Re: Linux 3.1-rc9
From: Steven Rostedt @ 2011-11-02 22:10 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Thomas Gleixner, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development
In-Reply-To: <20111031173246.GA10614@hostway.ca>

Thomas pointed me here.

On Mon, Oct 31, 2011 at 10:32:46AM -0700, Simon Kirby wrote:
> [104661.244767] 
> [104661.244767]  Possible unsafe locking scenario:
> [104661.244767]        
> [104661.244767]        CPU0                    CPU1
> [104661.244767]        ----                    ----
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]                                lock(slock-AF_INET);
> [104661.244767]   lock(slock-AF_INET);
> [104661.244767] 
> [104661.244767]  *** DEADLOCK ***
> [104661.244767] 

Bah, I used the __print_lock_name() function to show the lock names in
the above, which leaves off the subclass number. I'll go write up a
patch that fixes that.

Thanks,

-- Steve

^ permalink raw reply

* Re: Subnet router anycast for FE80/10 ?
From: David Lamparter @ 2011-11-02 21:53 UTC (permalink / raw)
  To: David Stevens; +Cc: David Lamparter, Andreas Hofmeister, netdev, netdev-owner
In-Reply-To: <OFCBEF5360.F10E7D75-ON8825793C.00613C5C-8825793C.00623B8A@us.ibm.com>

On Wed, Nov 02, 2011 at 10:52:57AM -0700, David Stevens wrote:
> > > This address seems not to be explicitly mentioned in any RFC, but RFC 
> > > 4291 says "All routers are required to support the Subnet-Router 
> anycast 
> > > addresses for the subnets to which they have interfaces."
> > 
> > That this directly contradicts RFC 2526 which specifies the
> > subnet-router anycast address to be either ::ffff:ffff:ffff:ff80 or
> > ::fcff:ffff:ffff:ff80 depending on the phase of the moon (well,
> > interface type actually, but same thing. Also, the /64 <> /10
> > distinction would matter here.)
>
>         The subnet-router anycast address is defined in section 2.6.1 of 
> RFC 4291 to be "all 0's" for the prefix. The definition above is for
> reserved anycast addresses. RFC 2526 says "IPv6 defines a required
> Subnet-Router anycast address [3] for all routers within a subnet prefix,
> and allows additional anycast addresses to be taken from the unicast
> address space. This document defines an additional set of reserved
> anycast addresses...".

Argh. I got thoroughly confused. Please ignore everything I said.

> > [...] only one router will receive the packet [...]
> 
>         The host implementation is very straightforward. Not every host
> on a segment has to use the *same* host for an anycast address (it's
> kind of the point that it won't, in fact). A host simply needs to
> do a solicitation for the anycast address and keep the first one that
> answers (by definition, the "closest").

Right. Sorry. I was half-asleep when I wrote my previous mail and
clearly didn't think through the entire problem. I should probably stop
writing mails when not at least 80% awake :).

Neighbor Discovery can indeed select one of the anycast hosts and talk
to it using its lower-layer address, and no other host should
receive/process the packet.

Going back to Andreas's original question about Subnet-Router Anycast
for fe80::/64 (or /10), RFC 4291 says
   +------------------------------------------------+----------------+
   |                   subnet prefix                | 00000000000000 |
   +------------------------------------------------+----------------+

   The "subnet prefix" in an anycast address is the prefix that
   identifies a specific link.

But fe80::/64 does not identify a specific link, as it is link-local and
would specify all links but not one specifically. So, fe80:: is not a
Subnet-Router anycast address, I'd say.


Hoping I was awake enough this time...


-David

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox