[PATCH net] net: neighbour: add neighbour dead check for neigh_timer

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
@ 2013-12-03 13:48 Ding Tianhong
  2013-12-03 15:03 ` Hannes Frederic Sowa
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Ding Tianhong @ 2013-12-03 13:48 UTC (permalink / raw)
  To: David S. Miller, Gao feng, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev

I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:

[64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
[64306.089535] PGD 0
[64306.089706] Oops: 0000 [#1] SMP
[64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
[64306.090142] Die func triggered, code:1
[64306.090147] CPU 1
[64306.090258] Supported: Yes, External
[64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
[64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
[64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
[64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
[64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
[64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
[64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
[64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
[64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
[64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
[64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
[64306.090310] Stack:
[64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
[64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
[64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
[64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
[64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
[64306.090464] Call Trace:

--------------------------- cut here -------------------------------------

I found the NULL place int the neigh_timer_handler, the neigh->ops is NULL,
so the calling of neigh->ops->solicit(neigh, skb) will panic, I found the
neigh has been freed via the kdump, the so I think the neigh was kfreed while
the neigh timer handler is running.

The situation is that there are several server in the local lan:
A: 128.5.10.83
B: 128.5.10.85
C: 128.5.10.xx

I panic the A by manual, and set B's IP to 128.5.10.83, so send broadcast to tell
the lan that B is 128.5.10.83, then the B panic, it is hard to appeared again, so
I only met once.

I think the reason is that:

	CPU0					CPU1
	-----					-----
						<SOFTIRQ>
						call_timer_fn();
						base->running_timer = neigh->timer;
						neigh_timer_handler();
	neigh_release();
	write_lock(&neigh->lock);
	del_timer(neigh->timer);
	write_unlock(&neigh->lock);
						write_lock(&neigh->lock);
	kfree(neigh);
 						neigh->ops->solicit();

--------------------------------------------------------------------------

The reason is : besides deactivating the timer it also makes sure the handler
has finished executing on other CPUs, But the del_timer() just only detach the
timer and not wait for the timer finish executing on other CPUs.

According to the David's opinion, the del_timer_sync() should belongs in
neigh_del_timer(), but the neigh_del_timer() will be called in
write_lock(&neigh->lock), it will occur deadlock if the timer is calling the same
lock, so del_timer_sync() should not be used in neigh_del_timer().

I fix the problem by add neigh->dead check in neigh_timer_handler(), because
if the neigh is in release path, the neigh is already in dead state, if the
timer is running on other CPUs, the timer will be finished and no problems
will occur when kfree the neighbour.

I think the latest kernel still has the problem and make the patch for it.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 net/core/neighbour.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ca15f32..38f3d23 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -888,6 +888,11 @@ static void neigh_timer_handler(unsigned long arg)
 
 	write_lock(&neigh->lock);
 
+	if (neigh->dead) {
+		pr_warn("neighbour is dead and should be destroyed\n");
+		goto out;
+	}
+
 	state = neigh->nud_state;
 	now = jiffies;
 	next = now + HZ;
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
@ 2013-12-03 15:03 ` Hannes Frederic Sowa
  2013-12-04  1:36   ` Ding Tianhong
  2013-12-03 16:28 ` Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-03 15:03 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: David S. Miller, Gao feng, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev

Hi Ding!

On Tue, Dec 03, 2013 at 09:48:45PM +0800, Ding Tianhong wrote:
> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
> 
> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
> [64306.089535] PGD 0
> [64306.089706] Oops: 0000 [#1] SMP
> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
> [64306.090142] Die func triggered, code:1
> [64306.090147] CPU 1
> [64306.090258] Supported: Yes, External
> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
> [64306.090310] Stack:
> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
> [64306.090464] Call Trace:

Do you still have the full output of the panic including the Code section
and could post it here?

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
  2013-12-03 15:03 ` Hannes Frederic Sowa
@ 2013-12-03 16:28 ` Eric Dumazet
  2013-12-04  1:59   ` Ding Tianhong
  2013-12-03 16:37 ` David Miller
  2013-12-04  2:37 ` Gao feng
  3 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-12-03 16:28 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: David S. Miller, Gao feng, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev

On Tue, 2013-12-03 at 21:48 +0800, Ding Tianhong wrote:
> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
> 
> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
> [64306.089535] PGD 0
> [64306.089706] Oops: 0000 [#1] SMP
> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
> [64306.090142] Die func triggered, code:1
> [64306.090147] CPU 1
> [64306.090258] Supported: Yes, External
> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
> [64306.090310] Stack:
> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
> [64306.090464] Call Trace:
> 
> --------------------------- cut here -------------------------------------
> 
> I found the NULL place int the neigh_timer_handler, the neigh->ops is NULL,
> so the calling of neigh->ops->solicit(neigh, skb) will panic, I found the
> neigh has been freed via the kdump, the so I think the neigh was kfreed while
> the neigh timer handler is running.
> 
> The situation is that there are several server in the local lan:
> A: 128.5.10.83
> B: 128.5.10.85
> C: 128.5.10.xx
> 
> I panic the A by manual, and set B's IP to 128.5.10.83, so send broadcast to tell
> the lan that B is 128.5.10.83, then the B panic, it is hard to appeared again, so
> I only met once.
> 
> I think the reason is that:
> 
> 	CPU0					CPU1
> 	-----					-----
> 						<SOFTIRQ>
> 						call_timer_fn();
> 						base->running_timer = neigh->timer;
> 						neigh_timer_handler();
> 	neigh_release();
> 	write_lock(&neigh->lock);
> 	del_timer(neigh->timer);
> 	write_unlock(&neigh->lock);
> 						write_lock(&neigh->lock);
> 	kfree(neigh);

Your patch would not help if the previous 2 actions are ordered like
that :

       kfree(neigh)
                                               write_lock(&neigh->lock);

>  						neigh->ops->solicit();
> 
> --------------------------------------------------------------------------
> 
> The reason is : besides deactivating the timer it also makes sure the handler
> has finished executing on other CPUs, But the del_timer() just only detach the
> timer and not wait for the timer finish executing on other CPUs.
> 
> According to the David's opinion, the del_timer_sync() should belongs in
> neigh_del_timer(), but the neigh_del_timer() will be called in
> write_lock(&neigh->lock), it will occur deadlock if the timer is calling the same
> lock, so del_timer_sync() should not be used in neigh_del_timer().
> 
> I fix the problem by add neigh->dead check in neigh_timer_handler(), because
> if the neigh is in release path, the neigh is already in dead state, if the
> timer is running on other CPUs, the timer will be finished and no problems
> will occur when kfree the neighbour.
> 
> I think the latest kernel still has the problem and make the patch for it.
> 
> Suggested-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  net/core/neighbour.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index ca15f32..38f3d23 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -888,6 +888,11 @@ static void neigh_timer_handler(unsigned long arg)
>  
>  	write_lock(&neigh->lock);
>  
> +	if (neigh->dead) {
> +		pr_warn("neighbour is dead and should be destroyed\n");
> +		goto out;
> +	}
> +
>  	state = neigh->nud_state;
>  	now = jiffies;
>  	next = now + HZ;

I do not think this patch is correct.

You need to test _before_ the write_lock() call.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
  2013-12-03 15:03 ` Hannes Frederic Sowa
  2013-12-03 16:28 ` Eric Dumazet
@ 2013-12-03 16:37 ` David Miller
  2013-12-04  2:37 ` Gao feng
  3 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2013-12-03 16:37 UTC (permalink / raw)
  To: dingtianhong; +Cc: gaofeng, yoshfuji, joe, vfalico, netdev

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Tue, 3 Dec 2013 21:48:45 +0800

> @@ -888,6 +888,11 @@ static void neigh_timer_handler(unsigned long arg)
>  
>  	write_lock(&neigh->lock);
>  
> +	if (neigh->dead) {
> +		pr_warn("neighbour is dead and should be destroyed\n");
> +		goto out;
> +	}
> +

I agree with Eric, if the neigh has been freed at this point you
cannot touch any member of it.

This means you cannot take neigh->lock.

This means you can't even test neigh->dead because it could be
arbitrary garbage, it's freed memory after all.

Worse yet, the neigh->timer is implicitly referenced by the timer
subsystem, the caller of neigh_timer_handler().  It is referencing
freed memory if it is touching neigh->timer members at all.

You can't avoid things by pretending that you can leave this timer
pending or potentially running asynchronously.

You must guarentee that all pending timer instances are complete
and no longer scheduled before you allow kfree(neigh) to execute.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-03 15:03 ` Hannes Frederic Sowa
@ 2013-12-04  1:36   ` Ding Tianhong
  0 siblings, 0 replies; 30+ messages in thread
From: Ding Tianhong @ 2013-12-04  1:36 UTC (permalink / raw)
  To: David S. Miller, Gao feng, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev, Hannes Frederic Sowa

On 2013/12/3 23:03, Hannes Frederic Sowa wrote:
> Hi Ding!
> 
> On Tue, Dec 03, 2013 at 09:48:45PM +0800, Ding Tianhong wrote:
>> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
>>
>> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.089535] PGD 0
>> [64306.089706] Oops: 0000 [#1] SMP
>> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
>> [64306.090142] Die func triggered, code:1
>> [64306.090147] CPU 1
>> [64306.090258] Supported: Yes, External
>> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
>> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
>> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
>> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
>> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
>> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
>> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
>> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
>> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
>> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
>> [64306.090310] Stack:
>> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
>> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
>> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
>> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
>> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
>> [64306.090464] Call Trace:
> 
> Do you still have the full output of the panic including the Code section
> and could post it here?
> 
> Greetings,
> 
>   Hannes
> 

ok, the bt -t is:

PID: 58359  TASK: ffff880c25d02300  CPU: 1   COMMAND: "socknal_cd04"
 #0 [ffff880c273496e0] machine_kexec at ffffffff810203a2
 #1 [ffff880c27349730] crash_kexec at ffffffff81085c80
 #2 [ffff880c27349800] oops_end at ffffffff81395c80
 #3 [ffff880c27349820] no_context at ffffffff8102d0a7
 #4 [ffff880c27349860] __bad_area_nosemaphore at ffffffff8102d365
 #5 [ffff880c27349888] vsnprintf at fffffff
 #6 [ffff880c27349958] neigh_timer_handler at ffffffff812f8e22
 #7 [ffff880c273499a8] neigh_timer_handler at ffffffff812f8e36
 #8 [ffff880c273499d8] ip_queue_xmit at ffffffff8131f8e0
 #9 [ffff880c27349a88] tcp_transmit_skb at ffffffff8133398b
#10 [ffff880c27349af8] tcp_connect at ffffffff813355c6
caitao 00137488(c00137488) 2013-11-19 10:27:31
#11 [ffff880c27349b28] tcp_v4_connect at ffffffff8133a85a
#12 [ffff880c27349bf8] inet_stream_connect at ffffffff813490a8
#13 [ffff880c27349c08] libcfs_debug_vmsg2 at ffffffffa0acd29c
#14 [ffff880c27349c78] libcfs_sock_connect at ffffffffa0ac3d04
#15 [ffff880c27349cc8] dequeue_task_fair at ffffffff81040445


and the memory for struct neighbour is:

crash>  neighbour ffff8801cddf1500
struct neighbour {
  next = 0x4037ad30000045, 
  tbl = 0x2484560013d0640, 
  parms = 0xdc03ff7f03484560, 
  dev = 0x23351d86, 
  used = 170277469553264, 
  confirmed = 649366077121561602, 
  updated = 0, 
  flags = 0 '\000', 
  nud_state = 0 '\000', 
  type = 0 '\000', 
  dead = 0 '\000', 
  probes = {
    counter = 0
  }, 
  lock = {
    raw_lock = {
      lock = 16777216
    }
  }, 
  ha = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00", 
  hh = 0x0, 
  refcnt = {
    counter = 0
  }, 
  output = 0, 
  arp_queue = {
    next = 0x10002, 
    prev = 0x0, 
    qlen = 1, 
    lock = {
      raw_lock = {
        slock = 0
      }
    }
  }, 
  timer = {
    entry = {
      next = 0x0, 
      prev = 0x0
    }, 
    expires = 0, 
    function = 0, 
    data = 0, 
    base = 0x0, 
    start_site = 0x0, 
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
    start_pid = 0
  }, 
  ops = 0x0, 
  primary_key = 0xffff8801cddf15f0 ""
}

Regards
Ding

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-03 16:28 ` Eric Dumazet
@ 2013-12-04  1:59   ` Ding Tianhong
  0 siblings, 0 replies; 30+ messages in thread
From: Ding Tianhong @ 2013-12-04  1:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Gao feng, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev

On 2013/12/4 0:28, Eric Dumazet wrote:
> On Tue, 2013-12-03 at 21:48 +0800, Ding Tianhong wrote:
>> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
>>
>> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.089535] PGD 0
>> [64306.089706] Oops: 0000 [#1] SMP
>> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
>> [64306.090142] Die func triggered, code:1
>> [64306.090147] CPU 1
>> [64306.090258] Supported: Yes, External
>> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
>> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
>> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
>> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
>> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
>> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
>> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
>> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
>> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
>> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
>> [64306.090310] Stack:
>> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
>> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
>> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
>> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
>> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
>> [64306.090464] Call Trace:
>>
>> --------------------------- cut here -------------------------------------
>>
>> I found the NULL place int the neigh_timer_handler, the neigh->ops is NULL,
>> so the calling of neigh->ops->solicit(neigh, skb) will panic, I found the
>> neigh has been freed via the kdump, the so I think the neigh was kfreed while
>> the neigh timer handler is running.
>>
>> The situation is that there are several server in the local lan:
>> A: 128.5.10.83
>> B: 128.5.10.85
>> C: 128.5.10.xx
>>
>> I panic the A by manual, and set B's IP to 128.5.10.83, so send broadcast to tell
>> the lan that B is 128.5.10.83, then the B panic, it is hard to appeared again, so
>> I only met once.
>>
>> I think the reason is that:
>>
>> 	CPU0					CPU1
>> 	-----					-----
>> 						<SOFTIRQ>
>> 						call_timer_fn();
>> 						base->running_timer = neigh->timer;
>> 						neigh_timer_handler();
>> 	neigh_release();
>> 	write_lock(&neigh->lock);
>> 	del_timer(neigh->timer);
>> 	write_unlock(&neigh->lock);
>> 						write_lock(&neigh->lock);
>> 	kfree(neigh);
> 
> Your patch would not help if the previous 2 actions are ordered like
> that :
> 
>        kfree(neigh)
>                                                write_lock(&neigh->lock);
> 

> I do not think this patch is correct.
> 
> You need to test _before_ the write_lock() call.
> 
>

Agree, I miss it, thanks.

I must make sure all pending timer is complete finished and not running on CPUs before kfree the neigh.

Regards
Ding

> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
                   ` (2 preceding siblings ...)
  2013-12-03 16:37 ` David Miller
@ 2013-12-04  2:37 ` Gao feng
  2013-12-04  4:04   ` Ding Tianhong
  3 siblings, 1 reply; 30+ messages in thread
From: Gao feng @ 2013-12-04  2:37 UTC (permalink / raw)
  To: Ding Tianhong, David S. Miller, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev

On 12/03/2013 09:48 PM, Ding Tianhong wrote:
> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
> 
> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
> [64306.089535] PGD 0
> [64306.089706] Oops: 0000 [#1] SMP
> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
> [64306.090142] Die func triggered, code:1
> [64306.090147] CPU 1
> [64306.090258] Supported: Yes, External
> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
> [64306.090310] Stack:
> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
> [64306.090464] Call Trace:
> 
> --------------------------- cut here -------------------------------------
> 
> I found the NULL place int the neigh_timer_handler, the neigh->ops is NULL,
> so the calling of neigh->ops->solicit(neigh, skb) will panic, I found the
> neigh has been freed via the kdump, the so I think the neigh was kfreed while
> the neigh timer handler is running.
> 
> The situation is that there are several server in the local lan:
> A: 128.5.10.83
> B: 128.5.10.85
> C: 128.5.10.xx
> 
> I panic the A by manual, and set B's IP to 128.5.10.83, so send broadcast to tell
> the lan that B is 128.5.10.83, then the B panic, it is hard to appeared again, so
> I only met once.
> 
> I think the reason is that:
> 
> 	CPU0					CPU1
> 	-----					-----
> 						<SOFTIRQ>
> 						call_timer_fn();
> 						base->running_timer = neigh->timer;
> 						neigh_timer_handler();
> 	neigh_release();
> 	write_lock(&neigh->lock);
> 	del_timer(neigh->timer);
> 	write_unlock(&neigh->lock);
> 						write_lock(&neigh->lock);
> 	kfree(neigh);
>  						neigh->ops->solicit();
> 
> --------------------------------------------------------------------------
> 
> The reason is : besides deactivating the timer it also makes sure the handler
> has finished executing on other CPUs, But the del_timer() just only detach the
> timer and not wait for the timer finish executing on other CPUs.
> 

the neigh->timer has the reference to the neigh, and this reference is
released at the end of neigh_timer_handler. so the neigh should not be
freed before the timer handler finished.

As your description, some logic ignores the reference of neigh and calls
neigh_destroy directly.

You should find out the incorrect destroying of the neigh.

> According to the David's opinion, the del_timer_sync() should belongs in
> neigh_del_timer(), but the neigh_del_timer() will be called in
> write_lock(&neigh->lock), it will occur deadlock if the timer is calling the same
> lock, so del_timer_sync() should not be used in neigh_del_timer().
> 
> I fix the problem by add neigh->dead check in neigh_timer_handler(), because
> if the neigh is in release path, the neigh is already in dead state, if the
> timer is running on other CPUs, the timer will be finished and no problems
> will occur when kfree the neighbour.
> 
> I think the latest kernel still has the problem and make the patch for it.
> 
> Suggested-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  net/core/neighbour.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index ca15f32..38f3d23 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -888,6 +888,11 @@ static void neigh_timer_handler(unsigned long arg)
>  
>  	write_lock(&neigh->lock);
>  
> +	if (neigh->dead) {
> +		pr_warn("neighbour is dead and should be destroyed\n");
> +		goto out;
> +	}
> +
>  	state = neigh->nud_state;
>  	now = jiffies;
>  	next = now + HZ;
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  2:37 ` Gao feng
@ 2013-12-04  4:04   ` Ding Tianhong
  2013-12-04  4:21     ` David Miller
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-04  4:04 UTC (permalink / raw)
  To: Gao feng, David S. Miller, YOSHIFUJI Hideaki, Joe Perches,
	Veaceslav Falico, Netdev

On 2013/12/4 10:37, Gao feng wrote:
> On 12/03/2013 09:48 PM, Ding Tianhong wrote:
>> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
>>
>> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.089535] PGD 0
>> [64306.089706] Oops: 0000 [#1] SMP
>> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
>> [64306.090142] Die func triggered, code:1
>> [64306.090147] CPU 1
>> [64306.090258] Supported: Yes, External
>> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
>> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
>> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
>> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
>> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
>> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
>> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
>> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
>> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
>> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
>> [64306.090310] Stack:
>> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
>> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
>> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
>> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
>> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
>> [64306.090464] Call Trace:
>>
>> --------------------------- cut here -------------------------------------
>>
>> I found the NULL place int the neigh_timer_handler, the neigh->ops is NULL,
>> so the calling of neigh->ops->solicit(neigh, skb) will panic, I found the
>> neigh has been freed via the kdump, the so I think the neigh was kfreed while
>> the neigh timer handler is running.
>>
>> The situation is that there are several server in the local lan:
>> A: 128.5.10.83
>> B: 128.5.10.85
>> C: 128.5.10.xx
>>
>> I panic the A by manual, and set B's IP to 128.5.10.83, so send broadcast to tell
>> the lan that B is 128.5.10.83, then the B panic, it is hard to appeared again, so
>> I only met once.
>>
>> I think the reason is that:
>>
>> 	CPU0					CPU1
>> 	-----					-----
>> 						<SOFTIRQ>
>> 						call_timer_fn();
>> 						base->running_timer = neigh->timer;
>> 						neigh_timer_handler();
>> 	neigh_release();
>> 	write_lock(&neigh->lock);
>> 	del_timer(neigh->timer);
>> 	write_unlock(&neigh->lock);
>> 						write_lock(&neigh->lock);
>> 	kfree(neigh);
>>  						neigh->ops->solicit();
>>
>> --------------------------------------------------------------------------
>>
>> The reason is : besides deactivating the timer it also makes sure the handler
>> has finished executing on other CPUs, But the del_timer() just only detach the
>> timer and not wait for the timer finish executing on other CPUs.
>>
> 
> the neigh->timer has the reference to the neigh, and this reference is
> released at the end of neigh_timer_handler. so the neigh should not be
> freed before the timer handler finished.
> 
> As your description, some logic ignores the reference of neigh and calls
> neigh_destroy directly.
> 
> You should find out the incorrect destroying of the neigh.
> 

The destroying neigh could be trigger by userspace, just like set the ip address which
in arp table to the local device ip, some I could not control it, it maybe anytime,
but the timer handler is execute by logic, this is normal, so I think the logic
is no problem, and the process of destroying neigh may conflict with the timer handler,
it is a synchronous problem to make sure the timer should be finished before the
reference neigh is freed.

Regards
Ding

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  4:04   ` Ding Tianhong
@ 2013-12-04  4:21     ` David Miller
  2013-12-04  6:19       ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2013-12-04  4:21 UTC (permalink / raw)
  To: dingtianhong; +Cc: gaofeng, yoshfuji, joe, vfalico, netdev

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Wed, 4 Dec 2013 12:04:31 +0800

> The destroying neigh could be trigger by userspace, just like set the ip address which
> in arp table to the local device ip, some I could not control it, it maybe anytime,
> but the timer handler is execute by logic, this is normal, so I think the logic
> is no problem, and the process of destroying neigh may conflict with the timer handler,
> it is a synchronous problem to make sure the timer should be finished before the
> reference neigh is freed.

The more I think about this, the more none of the explanations for this bug
make any sense.

neigh_destroy() _ONLY_ runs when:

	if (atomic_dec_and_test(&neigh->refcnt))

triggers in neigh_release().

This means it triggers if, and only if, neigh_refcnt goes to zero.

If the refcnt goes to zero, NO TIMER can be running.  If the timer is
running, then there refcnt must be at least '1'.

The only plausible theory would be that something is releasing a neigh
too early, when references to the neigh still actually exist.

And that's a bug that should be fixed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  4:21     ` David Miller
@ 2013-12-04  6:19       ` Ding Tianhong
  2013-12-04  6:27         ` Eric Dumazet
  2013-12-04  6:36         ` David Miller
  0 siblings, 2 replies; 30+ messages in thread
From: Ding Tianhong @ 2013-12-04  6:19 UTC (permalink / raw)
  To: David Miller; +Cc: gaofeng, yoshfuji, joe, vfalico, netdev

On 2013/12/4 12:21, David Miller wrote:
> From: Ding Tianhong <dingtianhong@huawei.com>
> Date: Wed, 4 Dec 2013 12:04:31 +0800
> 
>> The destroying neigh could be trigger by userspace, just like set the ip address which
>> in arp table to the local device ip, some I could not control it, it maybe anytime,
>> but the timer handler is execute by logic, this is normal, so I think the logic
>> is no problem, and the process of destroying neigh may conflict with the timer handler,
>> it is a synchronous problem to make sure the timer should be finished before the
>> reference neigh is freed.
> 
> The more I think about this, the more none of the explanations for this bug
> make any sense.
> 
> neigh_destroy() _ONLY_ runs when:
> 
> 	if (atomic_dec_and_test(&neigh->refcnt))
> 
> triggers in neigh_release().
> 
> This means it triggers if, and only if, neigh_refcnt goes to zero.
> 
> If the refcnt goes to zero, NO TIMER can be running.  If the timer is
> running, then there refcnt must be at least '1'.

Hi David:

Yes, you are right, but when the timer is running and prior to get the neigh->lock, the refcnt
could be dec to 0, you could not stop it by existing mechanism.

the refcnt of neighbour could only be inc by these actions:

1.create neighbour, the refcnt will be set to 1.
2.add timer, the refcnt++.
3.neigh_lookup, if found the neigh, refcnt++.

I can show the whole process of my analysis:

		CPU 0				CPU 1
		-----				-----
	create_neigh() => refcnt = 1;		
	add timer =>	refcnt++;
						<SOFTIRQ>
						base->running_timer = neigh->timer;
						neigh_timer_handler() => at this time, refcnt is 2;

user->	neigh_changeaddr()
	neigh_flush_dev();
	neigh_del_imer, refcnt dec to 1;
	release_neigh(), refcnt is 0,
	destroy_neigh()
	kfree(neighbour);
						write(neigh->lock)

So in my opinion, the point of the problem is that I should not kfree the neighbour until
the timer is not running on CPUs and not pending.

If I miss someghing, pls point out.

Regards
Ding

> 
> The only plausible theory would be that something is releasing a neigh
> too early, when references to the neigh still actually exist.
> 
> And that's a bug that should be fixed.
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  6:19       ` Ding Tianhong
@ 2013-12-04  6:27         ` Eric Dumazet
  2013-12-04  9:16           ` Ding Tianhong
  2013-12-04  6:36         ` David Miller
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-12-04  6:27 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: David Miller, gaofeng, yoshfuji, joe, vfalico, netdev

On Wed, 2013-12-04 at 14:19 +0800, Ding Tianhong wrote:
> On 2013/12/4 12:21, David Miller wrote:
> > From: Ding Tianhong <dingtianhong@huawei.com>
> > Date: Wed, 4 Dec 2013 12:04:31 +0800
> > 
> >> The destroying neigh could be trigger by userspace, just like set the ip address which
> >> in arp table to the local device ip, some I could not control it, it maybe anytime,
> >> but the timer handler is execute by logic, this is normal, so I think the logic
> >> is no problem, and the process of destroying neigh may conflict with the timer handler,
> >> it is a synchronous problem to make sure the timer should be finished before the
> >> reference neigh is freed.
> > 
> > The more I think about this, the more none of the explanations for this bug
> > make any sense.
> > 
> > neigh_destroy() _ONLY_ runs when:
> > 
> > 	if (atomic_dec_and_test(&neigh->refcnt))
> > 
> > triggers in neigh_release().
> > 
> > This means it triggers if, and only if, neigh_refcnt goes to zero.
> > 
> > If the refcnt goes to zero, NO TIMER can be running.  If the timer is
> > running, then there refcnt must be at least '1'.
> 
> Hi David:
> 
> Yes, you are right, but when the timer is running and prior to get the neigh->lock, the refcnt
> could be dec to 0, you could not stop it by existing mechanism.
> 
> the refcnt of neighbour could only be inc by these actions:
> 
> 1.create neighbour, the refcnt will be set to 1.
> 2.add timer, the refcnt++.
> 3.neigh_lookup, if found the neigh, refcnt++.
> 
> I can show the whole process of my analysis:
> 
> 		CPU 0				CPU 1
> 		-----				-----
> 	create_neigh() => refcnt = 1;		
> 	add timer =>	refcnt++;
> 						<SOFTIRQ>
> 						base->running_timer = neigh->timer;
> 						neigh_timer_handler() => at this time, refcnt is 2;
> 
> user->	neigh_changeaddr()
> 	neigh_flush_dev();
> 	neigh_del_imer, refcnt dec to 1;

Nope : del_timer() would return 0 here, so we do not decrement refcnt.

I can tell you, if this was not the case, a lot of things would be
terribly broken, like TCP stack.

> 	release_neigh(), refcnt is 0,
> 	destroy_neigh()
> 	kfree(neighbour);
> 						write(neigh->lock)
> 
> So in my opinion, the point of the problem is that I should not kfree the neighbour until
> the timer is not running on CPUs and not pending.
> 
> If I miss someghing, pls point out.

As David explained, if a timer is running, refcnt can not reach 0,
untill the timer handler finished.

So _something_ is calling neigh_release(n) without prior neigh_hold()

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  6:19       ` Ding Tianhong
  2013-12-04  6:27         ` Eric Dumazet
@ 2013-12-04  6:36         ` David Miller
  1 sibling, 0 replies; 30+ messages in thread
From: David Miller @ 2013-12-04  6:36 UTC (permalink / raw)
  To: dingtianhong; +Cc: gaofeng, yoshfuji, joe, vfalico, netdev

From: Ding Tianhong <dingtianhong@huawei.com>
Date: Wed, 4 Dec 2013 14:19:06 +0800

> Yes, you are right, but when the timer is running and prior to get
> the neigh->lock, the refcnt could be dec to 0, you could not stop it
> by existing mechanism.
> 
> the refcnt of neighbour could only be inc by these actions:
> 
> 1.create neighbour, the refcnt will be set to 1.
> 2.add timer, the refcnt++.
> 3.neigh_lookup, if found the neigh, refcnt++.

The refcnt cannot dec to 0 if the timer is running, that's why
the last action of the timer handler is neigh_release().

The timer handler always holds a reference to the neighbour entry for
which it is running, that is why it must always release it.

If it can dec to 0, then this neigh_release() call in the timer
handler is illegal.  But it is not.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  6:27         ` Eric Dumazet
@ 2013-12-04  9:16           ` Ding Tianhong
  2013-12-04 10:10             ` Gao feng
  2013-12-04 15:24             ` Eric Dumazet
  0 siblings, 2 replies; 30+ messages in thread
From: Ding Tianhong @ 2013-12-04  9:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, gaofeng, yoshfuji, joe, vfalico, netdev

On 2013/12/4 14:27, Eric Dumazet wrote:
> On Wed, 2013-12-04 at 14:19 +0800, Ding Tianhong wrote:
>> On 2013/12/4 12:21, David Miller wrote:
>>> From: Ding Tianhong <dingtianhong@huawei.com>
>>> Date: Wed, 4 Dec 2013 12:04:31 +0800
>>>
>>>> The destroying neigh could be trigger by userspace, just like set the ip address which
>>>> in arp table to the local device ip, some I could not control it, it maybe anytime,
>>>> but the timer handler is execute by logic, this is normal, so I think the logic
>>>> is no problem, and the process of destroying neigh may conflict with the timer handler,
>>>> it is a synchronous problem to make sure the timer should be finished before the
>>>> reference neigh is freed.
>>>
>>> The more I think about this, the more none of the explanations for this bug
>>> make any sense.
>>>
>>> neigh_destroy() _ONLY_ runs when:
>>>
>>> 	if (atomic_dec_and_test(&neigh->refcnt))
>>>
>>> triggers in neigh_release().
>>>
>>> This means it triggers if, and only if, neigh_refcnt goes to zero.
>>>
>>> If the refcnt goes to zero, NO TIMER can be running.  If the timer is
>>> running, then there refcnt must be at least '1'.
>>
>> Hi David:
>>
>> Yes, you are right, but when the timer is running and prior to get the neigh->lock, the refcnt
>> could be dec to 0, you could not stop it by existing mechanism.
>>
>> the refcnt of neighbour could only be inc by these actions:
>>
>> 1.create neighbour, the refcnt will be set to 1.
>> 2.add timer, the refcnt++.
>> 3.neigh_lookup, if found the neigh, refcnt++.
>>
>> I can show the whole process of my analysis:
>>
>> 		CPU 0				CPU 1
>> 		-----				-----
>> 	create_neigh() => refcnt = 1;		
>> 	add timer =>	refcnt++;
>> 						<SOFTIRQ>
>> 						base->running_timer = neigh->timer;
>> 						neigh_timer_handler() => at this time, refcnt is 2;
>>
>> user->	neigh_changeaddr()
>> 	neigh_flush_dev();
>> 	neigh_del_imer, refcnt dec to 1;
> 
> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
> 

The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 

> I can tell you, if this was not the case, a lot of things would be
> terribly broken, like TCP stack.
> 
>> 	release_neigh(), refcnt is 0,
>> 	destroy_neigh()
>> 	kfree(neighbour);
>> 						write(neigh->lock)
>>
>> So in my opinion, the point of the problem is that I should not kfree the neighbour until
>> the timer is not running on CPUs and not pending.
>>
>> If I miss someghing, pls point out.
> 
> As David explained, if a timer is running, refcnt can not reach 0,
> untill the timer handler finished.
> 
> So _something_ is calling neigh_release(n) without prior neigh_hold()
> 

Maybe I could try to find more about it, but it is hard to reoccur the problem.

Regards
Ding
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  9:16           ` Ding Tianhong
@ 2013-12-04 10:10             ` Gao feng
  2013-12-04 15:24             ` Eric Dumazet
  1 sibling, 0 replies; 30+ messages in thread
From: Gao feng @ 2013-12-04 10:10 UTC (permalink / raw)
  To: Ding Tianhong, Eric Dumazet; +Cc: David Miller, yoshfuji, joe, vfalico, netdev

On 12/04/2013 05:16 PM, Ding Tianhong wrote:
> On 2013/12/4 14:27, Eric Dumazet wrote:
>> On Wed, 2013-12-04 at 14:19 +0800, Ding Tianhong wrote:
>>> On 2013/12/4 12:21, David Miller wrote:
>>>> From: Ding Tianhong <dingtianhong@huawei.com>
>>>> Date: Wed, 4 Dec 2013 12:04:31 +0800
>>>>
>>>>> The destroying neigh could be trigger by userspace, just like set the ip address which
>>>>> in arp table to the local device ip, some I could not control it, it maybe anytime,
>>>>> but the timer handler is execute by logic, this is normal, so I think the logic
>>>>> is no problem, and the process of destroying neigh may conflict with the timer handler,
>>>>> it is a synchronous problem to make sure the timer should be finished before the
>>>>> reference neigh is freed.
>>>>
>>>> The more I think about this, the more none of the explanations for this bug
>>>> make any sense.
>>>>
>>>> neigh_destroy() _ONLY_ runs when:
>>>>
>>>> 	if (atomic_dec_and_test(&neigh->refcnt))
>>>>
>>>> triggers in neigh_release().
>>>>
>>>> This means it triggers if, and only if, neigh_refcnt goes to zero.
>>>>
>>>> If the refcnt goes to zero, NO TIMER can be running.  If the timer is
>>>> running, then there refcnt must be at least '1'.
>>>
>>> Hi David:
>>>
>>> Yes, you are right, but when the timer is running and prior to get the neigh->lock, the refcnt
>>> could be dec to 0, you could not stop it by existing mechanism.
>>>
>>> the refcnt of neighbour could only be inc by these actions:
>>>
>>> 1.create neighbour, the refcnt will be set to 1.
>>> 2.add timer, the refcnt++.
>>> 3.neigh_lookup, if found the neigh, refcnt++.
>>>
>>> I can show the whole process of my analysis:
>>>
>>> 		CPU 0				CPU 1
>>> 		-----				-----
>>> 	create_neigh() => refcnt = 1;		
>>> 	add timer =>	refcnt++;
>>> 						<SOFTIRQ>
>>> 						base->running_timer = neigh->timer;
>>> 						neigh_timer_handler() => at this time, refcnt is 2;
>>>
>>> user->	neigh_changeaddr()
>>> 	neigh_flush_dev();
>>> 	neigh_del_imer, refcnt dec to 1;
>>
>> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
>>
> 
> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 
> 
>> I can tell you, if this was not the case, a lot of things would be
>> terribly broken, like TCP stack.
>>
>>> 	release_neigh(), refcnt is 0,
>>> 	destroy_neigh()
>>> 	kfree(neighbour);
>>> 						write(neigh->lock)
>>>
>>> So in my opinion, the point of the problem is that I should not kfree the neighbour until
>>> the timer is not running on CPUs and not pending.
>>>
>>> If I miss someghing, pls point out.
>>
>> As David explained, if a timer is running, refcnt can not reach 0,
>> untill the timer handler finished.
>>
>> So _something_ is calling neigh_release(n) without prior neigh_hold()
>>

seems like neigh_release in neigh_del_timer should be removed, it can't
release the neigh for the timer which may want to call neigh_release too,
this release job should be done by timer handler.

we should make sure if the timer is not pending, it doesn't have the
reference of neigh.

It's great if we can reproduce this bug, but...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04  9:16           ` Ding Tianhong
  2013-12-04 10:10             ` Gao feng
@ 2013-12-04 15:24             ` Eric Dumazet
  2013-12-05  0:32               ` Gao feng
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2013-12-04 15:24 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: David Miller, gaofeng, yoshfuji, joe, vfalico, netdev

On Wed, 2013-12-04 at 17:16 +0800, Ding Tianhong wrote:
> >> 						base->running_timer = neigh->timer;
> >> 						neigh_timer_handler() => at this time, refcnt is 2;
> >>
> >> user->	neigh_changeaddr()
> >> 	neigh_flush_dev();
> >> 	neigh_del_imer, refcnt dec to 1;
> > 
> > Nope : del_timer() would return 0 here, so we do not decrement refcnt.
> > 
> 
> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 

Again no. You are very mistaken.

del_timer() return code is not a hint. Its a precise meaning.

It cannot return 1 if the timer function is running or is about to run.

If you believe there is  bug in del_timer(), fix it ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-04 15:24             ` Eric Dumazet
@ 2013-12-05  0:32               ` Gao feng
  2013-12-05  3:17                 ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Gao feng @ 2013-12-05  0:32 UTC (permalink / raw)
  To: Eric Dumazet, Ding Tianhong; +Cc: David Miller, yoshfuji, joe, vfalico, netdev

On 12/04/2013 11:24 PM, Eric Dumazet wrote:
> On Wed, 2013-12-04 at 17:16 +0800, Ding Tianhong wrote:
>>>> 						base->running_timer = neigh->timer;
>>>> 						neigh_timer_handler() => at this time, refcnt is 2;
>>>>
>>>> user->	neigh_changeaddr()
>>>> 	neigh_flush_dev();
>>>> 	neigh_del_imer, refcnt dec to 1;
>>>
>>> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
>>>
>>
>> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
>> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 
> 
> Again no. You are very mistaken.
> 
> del_timer() return code is not a hint. Its a precise meaning.
> 
> It cannot return 1 if the timer function is running or is about to run.
> 
> If you believe there is  bug in del_timer(), fix it ;)
> 
> 

Yes, you are right, __run_timers did this job.
So We still don't know what's the root reason.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-05  0:32               ` Gao feng
@ 2013-12-05  3:17                 ` Ding Tianhong
  2013-12-18  6:37                   ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-05  3:17 UTC (permalink / raw)
  To: Gao feng, Eric Dumazet; +Cc: David Miller, yoshfuji, joe, vfalico, netdev

On 2013/12/5 8:32, Gao feng wrote:
> On 12/04/2013 11:24 PM, Eric Dumazet wrote:
>> On Wed, 2013-12-04 at 17:16 +0800, Ding Tianhong wrote:
>>>>> 						base->running_timer = neigh->timer;
>>>>> 						neigh_timer_handler() => at this time, refcnt is 2;
>>>>>
>>>>> user->	neigh_changeaddr()
>>>>> 	neigh_flush_dev();
>>>>> 	neigh_del_imer, refcnt dec to 1;
>>>>
>>>> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
>>>>
>>>
>>> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
>>> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 
>>
>> Again no. You are very mistaken.
>>
>> del_timer() return code is not a hint. Its a precise meaning.
>>
>> It cannot return 1 if the timer function is running or is about to run.
>>
>> If you believe there is  bug in del_timer(), fix it ;)
>>
>>
> 
> Yes, you are right, __run_timers did this job.
> So We still don't know what's the root reason.
> 
Yes, I miss it, the running timer is detached from the list, thanks for all above.

Regards
Ding

> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-05  3:17                 ` Ding Tianhong
@ 2013-12-18  6:37                   ` Ding Tianhong
  2013-12-18  7:51                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-18  6:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, yoshfuji, joe, vfalico, netdev

On 2013/12/5 11:17, Ding Tianhong wrote:
> On 2013/12/5 8:32, Gao feng wrote:
>> On 12/04/2013 11:24 PM, Eric Dumazet wrote:
>>> On Wed, 2013-12-04 at 17:16 +0800, Ding Tianhong wrote:
>>>>>> 						base->running_timer = neigh->timer;
>>>>>> 						neigh_timer_handler() => at this time, refcnt is 2;
>>>>>>
>>>>>> user->	neigh_changeaddr()
>>>>>> 	neigh_flush_dev();
>>>>>> 	neigh_del_imer, refcnt dec to 1;
>>>>>
>>>>> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
>>>>>
>>>>
>>>> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
>>>> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 
>>>
>>> Again no. You are very mistaken.
>>>
>>> del_timer() return code is not a hint. Its a precise meaning.
>>>
>>> It cannot return 1 if the timer function is running or is about to run.
>>>
>>> If you believe there is  bug in del_timer(), fix it ;)
>>>
>>>
>>
>> Yes, you are right, __run_timers did this job.
>> So We still don't know what's the root reason.
>>
> Yes, I miss it, the running timer is detached from the list, thanks for all above.
> 
> Regards
> Ding
> 


Hi Eric:

I was so doubt about the situation, can you give me some advise?

	CPU0					  CPU1					  CPU2
      --------		      			--------                		---------
neigh_timer_handler				
write_lock(n->lock);		
	...
write_unlock(n->lock);
n->ref_cnt = 2 or 3(if mode_time)				
	...					neigh_flush_dev
						write_lock(n->lock);
						n->ref_cnt = 2;
						n->nud_state = NUD_NONE;
						write_unlock(n->lock);
						neigh_release()
						n->ref_cnt = 1;
						...					neigh_periodic_work
											write_lock(n->lock);
											write_unlock(n->lock);
											neigh_release();
											kfree(n)
n->ops->solicit()									...
...

if that possible? or I was totally wrong? pls give me some advise if I miss something, thanks a lot.

Best Regards
Ding

>>
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18  6:37                   ` Ding Tianhong
@ 2013-12-18  7:51                     ` Hannes Frederic Sowa
  2013-12-18  8:19                       ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-18  7:51 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

Hi Ding!

May I step in for short?

On Wed, Dec 18, 2013 at 02:37:35PM +0800, Ding Tianhong wrote:
> On 2013/12/5 11:17, Ding Tianhong wrote:
> > On 2013/12/5 8:32, Gao feng wrote:
> >> On 12/04/2013 11:24 PM, Eric Dumazet wrote:
> >>> On Wed, 2013-12-04 at 17:16 +0800, Ding Tianhong wrote:
> >>>>>> 						base->running_timer = neigh->timer;
> >>>>>> 						neigh_timer_handler() => at this time, refcnt is 2;
> >>>>>>
> >>>>>> user->	neigh_changeaddr()
> >>>>>> 	neigh_flush_dev();
> >>>>>> 	neigh_del_imer, refcnt dec to 1;
> >>>>>
> >>>>> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
> >>>>>
> >>>>
> >>>> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
> >>>> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 
> >>>
> >>> Again no. You are very mistaken.
> >>>
> >>> del_timer() return code is not a hint. Its a precise meaning.
> >>>
> >>> It cannot return 1 if the timer function is running or is about to run.
> >>>
> >>> If you believe there is  bug in del_timer(), fix it ;)
> >>>
> >>>
> >>
> >> Yes, you are right, __run_timers did this job.
> >> So We still don't know what's the root reason.
> >>
> > Yes, I miss it, the running timer is detached from the list, thanks for all above.
> > 
> > Regards
> > Ding
> > 
> 
> 
> Hi Eric:
> 
> I was so doubt about the situation, can you give me some advise?
> 
> 	CPU0					  CPU1					  CPU2
>       --------		      			--------                		---------
> neigh_timer_handler				
> write_lock(n->lock);		
> 	...
> write_unlock(n->lock);
> n->ref_cnt = 2 or 3(if mode_time)				
> 	...					neigh_flush_dev
> 						write_lock(n->lock);
> 						n->ref_cnt = 2;
> 						n->nud_state = NUD_NONE;
> 						write_unlock(n->lock);
> 						neigh_release()
> 						n->ref_cnt = 1;
> 						...					neigh_periodic_work
> 											write_lock(n->lock);
> 											write_unlock(n->lock);
> 											neigh_release();
> 											kfree(n)
> n->ops->solicit()									...
> ...
> 
> if that possible? or I was totally wrong? pls give me some advise if I miss something, thanks a lot.

When you first posted the patch I had my doubts that there is such
race. E.g. n->dead was 0 in the neigh you posted. After all the neigh
timer has its own reference and does check ->dead before proceeding. I maybe
wrong because the memory could already be overwritten.

(Maybe we should move the ->dead check inside the write_lock to use it as a
barrier.)

Maybe there was another dereference which caused the bug. Could you check if
you have the code section from the panic? It should look something like

Code: FF FF FF .. <FF> ..

Sometimes using addr2line on vmlinux with the RIP could give another hint.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18  7:51                     ` Hannes Frederic Sowa
@ 2013-12-18  8:19                       ` Ding Tianhong
  2013-12-18  8:41                         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-18  8:19 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

On 2013/12/18 15:51, Hannes Frederic Sowa wrote:
> Hi Ding!
> 
> May I step in for short?
> 
> On Wed, Dec 18, 2013 at 02:37:35PM +0800, Ding Tianhong wrote:
>> On 2013/12/5 11:17, Ding Tianhong wrote:
>>> On 2013/12/5 8:32, Gao feng wrote:
>>>> On 12/04/2013 11:24 PM, Eric Dumazet wrote:
>>>>> On Wed, 2013-12-04 at 17:16 +0800, Ding Tianhong wrote:
>>>>>>>> 						base->running_timer = neigh->timer;
>>>>>>>> 						neigh_timer_handler() => at this time, refcnt is 2;
>>>>>>>>
>>>>>>>> user->	neigh_changeaddr()
>>>>>>>> 	neigh_flush_dev();
>>>>>>>> 	neigh_del_imer, refcnt dec to 1;
>>>>>>>
>>>>>>> Nope : del_timer() would return 0 here, so we do not decrement refcnt.
>>>>>>>
>>>>>>
>>>>>> The first call for del_timer() will return 1, because the timer->entry.next is not NULL,
>>>>>> then in the neigh_destroy, the del_timer() again will return 0 because timer->entry.next is NULL. 
>>>>>
>>>>> Again no. You are very mistaken.
>>>>>
>>>>> del_timer() return code is not a hint. Its a precise meaning.
>>>>>
>>>>> It cannot return 1 if the timer function is running or is about to run.
>>>>>
>>>>> If you believe there is  bug in del_timer(), fix it ;)
>>>>>
>>>>>
>>>>
>>>> Yes, you are right, __run_timers did this job.
>>>> So We still don't know what's the root reason.
>>>>
>>> Yes, I miss it, the running timer is detached from the list, thanks for all above.
>>>
>>> Regards
>>> Ding
>>>
>>
>>
>> Hi Eric:
>>
>> I was so doubt about the situation, can you give me some advise?
>>
>> 	CPU0					  CPU1					  CPU2
>>       --------		      			--------                		---------
>> neigh_timer_handler				
>> write_lock(n->lock);		
>> 	...
>> write_unlock(n->lock);
>> n->ref_cnt = 2 or 3(if mode_time)				
>> 	...					neigh_flush_dev
>> 						write_lock(n->lock);
>> 						n->ref_cnt = 2;
>> 						n->nud_state = NUD_NONE;
>> 						write_unlock(n->lock);
>> 						neigh_release()
>> 						n->ref_cnt = 1;
>> 						...					neigh_periodic_work
>> 											write_lock(n->lock);
>> 											write_unlock(n->lock);
>> 											neigh_release();
>> 											kfree(n)
>> n->ops->solicit()									...
>> ...
>>
>> if that possible? or I was totally wrong? pls give me some advise if I miss something, thanks a lot.
> 
> When you first posted the patch I had my doubts that there is such
> race. E.g. n->dead was 0 in the neigh you posted. After all the neigh
> timer has its own reference and does check ->dead before proceeding. I maybe
> wrong because the memory could already be overwritten.
> 
> (Maybe we should move the ->dead check inside the write_lock to use it as a
> barrier.)
> 
> Maybe there was another dereference which caused the bug. Could you check if
> you have the code section from the panic? It should look something like
> 
> Code: FF FF FF .. <FF> ..
> 
> Sometimes using addr2line on vmlinux with the RIP could give another hint.
> 
> Greetings,
> 
>   Hannes
> 

Hi Hannes:

Thanks for your attention. I hope it useful to you.

235 [64306.090258] Supported: Yes, External
236 [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
237 [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
238 [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
239 [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
240 [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
241 [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
242 [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
243 [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560

ffffffff812f8e36 (t) neigh_timer_handler+278  ../linux/net/core/neighbour.c: 876
crash> dis -l neigh_timer_handler
/usr/src/linux/net/core/neighbour.c: 798
0xffffffff812f8d20 <neigh_timer_handler>:       sub    $0x28,%rsp
0xffffffff812f8d24 <neigh_timer_handler+4>:     mov    %r14,0x20(%rsp)
/usr/src/linux/net/core/neighbour.c: 804
0xffffffff812f8d29 <neigh_timer_handler+9>:     lea    0x40(%rdi),%r14
/usr/src/linux/net/core/neighbour.c: 798
0xffffffff812f8d2d <neigh_timer_handler+13>:    mov    %rbx,(%rsp)
0xffffffff812f8d31 <neigh_timer_handler+17>:    mov    %rdi,%rbx
0xffffffff812f8d34 <neigh_timer_handler+20>:    mov    %rbp,0x8(%rsp)
0xffffffff812f8d39 <neigh_timer_handler+25>:    mov    %r12,0x10(%rsp)
/usr/src/linux/net/core/neighbour.c: 804
0xffffffff812f8d3e <neigh_timer_handler+30>:    mov    %r14,%rdi
/usr/src/linux/net/core/neighbour.c: 798
0xffffffff812f8d41 <neigh_timer_handler+33>:    mov    %r13,0x18(%rsp)
/usr/src/linux/net/core/neighbour.c: 878
0xffffffff812f8d46 <neigh_timer_handler+38>:    xor    %r13d,%r13d
/usr/src/linux/net/core/neighbour.c: 804
0xffffffff812f8d49 <neigh_timer_handler+41>:    callq  0xffffffff81394bb0 <_write_lock>
/usr/src/linux/net/core/neighbour.c: 806
0xffffffff812f8d4e <neigh_timer_handler+46>:    movzbl 0x39(%rbx),%edx
/usr/src/linux/net/core/neighbour.c: 807
0xffffffff812f8d52 <neigh_timer_handler+50>:    mov    0x634ba7(%rip),%rdi        # 0xffffffff8192d900
0xffffffff812f8d59 <neigh_timer_handler+57>:    lea    0x70(%rbx),%r12
/usr/src/linux/net/core/neighbour.c: 808
0xffffffff812f8d5d <neigh_timer_handler+61>:    lea    0xfa(%rdi),%rbp
/usr/src/linux/net/core/neighbour.c: 806
0xffffffff812f8d64 <neigh_timer_handler+68>:    movzbl %dl,%eax
/usr/src/linux/net/core/neighbour.c: 810
0xffffffff812f8d67 <neigh_timer_handler+71>:    test   $0x1b,%al
0xffffffff812f8d69 <neigh_timer_handler+73>:    je     0xffffffff812f8e50 <neigh_timer_handler+304>
/usr/src/linux/net/core/neighbour.c: 817
0xffffffff812f8d6f <neigh_timer_handler+79>:    test   $0x2,%al
0xffffffff812f8d71 <neigh_timer_handler+81>:    je     0xffffffff812f8e98 <neigh_timer_handler+376>
/usr/src/linux/net/core/neighbour.c: 818
0xffffffff812f8d77 <neigh_timer_handler+87>:    mov    0x10(%rbx),%r8
0xffffffff812f8d7b <neigh_timer_handler+91>:    movslq 0x5c(%r8),%rsi
0xffffffff812f8d7f <neigh_timer_handler+95>:    add    0x28(%rbx),%rsi
0xffffffff812f8d83 <neigh_timer_handler+99>:    cmp    %rdi,%rsi
0xffffffff812f8d86 <neigh_timer_handler+102>:   js     0xffffffff812f8f58 <neigh_timer_handler+568>
0xffffffff812f8d8c <neigh_timer_handler+108>:   mov    %rsi,%rbp
/usr/src/linux/net/core/neighbour.c: 857
0xffffffff812f8d8f <neigh_timer_handler+111>:   movzbl %dl,%eax
0xffffffff812f8d92 <neigh_timer_handler+114>:   test   $0x11,%al
0xffffffff812f8d94 <neigh_timer_handler+116>:   je     0xffffffff812f8db0 <neigh_timer_handler+144>
/usr/src/linux/net/core/neighbour.c: 768
0xffffffff812f8d96 <neigh_timer_handler+118>:   test   $0x10,%al
/usr/src/linux/include/asm/atomic_64.h: 23
0xffffffff812f8d98 <neigh_timer_handler+120>:   mov    0x3c(%rbx),%esi
/usr/src/linux/net/core/neighbour.c: 767
0xffffffff812f8d9b <neigh_timer_handler+123>:   mov    0x10(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 768
0xffffffff812f8d9f <neigh_timer_handler+127>:   je     0xffffffff812f8fe8 <neigh_timer_handler+712>
0xffffffff812f8da5 <neigh_timer_handler+133>:   mov    0x68(%rcx),%edx
/usr/src/linux/net/core/neighbour.c: 857
0xffffffff812f8da8 <neigh_timer_handler+136>:   cmp    %edx,%esi
0xffffffff812f8daa <neigh_timer_handler+138>:   jge    0xffffffff812f9000 <neigh_timer_handler+736>
/usr/src/linux/net/core/neighbour.c: 864
0xffffffff812f8db0 <neigh_timer_handler+144>:   test   $0x1b,%al
0xffffffff812f8db2 <neigh_timer_handler+146>:   lea    0x70(%rbx),%r12
0xffffffff812f8db6 <neigh_timer_handler+150>:   je     0xffffffff812f8df1 <neigh_timer_handler+209>
/usr/src/linux/net/core/neighbour.c: 865
0xffffffff812f8db8 <neigh_timer_handler+152>:   mov    0x634b41(%rip),%rax        # 0xffffffff8192d900
0xffffffff812f8dbf <neigh_timer_handler+159>:   mov    %rbp,%rdx
0xffffffff812f8dc2 <neigh_timer_handler+162>:   sub    %rax,%rdx
0xffffffff812f8dc5 <neigh_timer_handler+165>:   mov    %rdx,%rax
0xffffffff812f8dc8 <neigh_timer_handler+168>:   sub    $0x7d,%rax
0xffffffff812f8dcc <neigh_timer_handler+172>:   js     0xffffffff812f9020 <neigh_timer_handler+768>
/usr/src/linux/net/core/neighbour.c: 867
0xffffffff812f8dd2 <neigh_timer_handler+178>:   lea    0x98(%rbx),%rdi
0xffffffff812f8dd9 <neigh_timer_handler+185>:   mov    %rbp,%rsi
0xffffffff812f8ddc <neigh_timer_handler+188>:   callq  0xffffffff81057db0 <mod_timer>
0xffffffff812f8de1 <neigh_timer_handler+193>:   test   %eax,%eax
0xffffffff812f8de3 <neigh_timer_handler+195>:   je     0xffffffff812f8fd0 <neigh_timer_handler+688>
0xffffffff812f8de9 <neigh_timer_handler+201>:   movzbl 0x39(%rbx),%eax
0xffffffff812f8ded <neigh_timer_handler+205>:   lea    0x70(%rbx),%r12
/usr/src/linux/net/core/neighbour.c: 870
0xffffffff812f8df1 <neigh_timer_handler+209>:   test   $0x11,%al
0xffffffff812f8df3 <neigh_timer_handler+211>:   je     0xffffffff812f8e50 <neigh_timer_handler+304>
/usr/src/linux/include/linux/skbuff.h: 787
0xffffffff812f8df5 <neigh_timer_handler+213>:   mov    0x80(%rbx),%rbp
0xffffffff812f8dfc <neigh_timer_handler+220>:   lea    0x80(%rbx),%rax
/usr/src/linux/include/linux/skbuff.h: 788
0xffffffff812f8e03 <neigh_timer_handler+227>:   cmp    %rbp,%rax
0xffffffff812f8e06 <neigh_timer_handler+230>:   je     0xffffffff812f9060 <neigh_timer_handler+832>
/usr/src/linux/net/core/neighbour.c: 873
0xffffffff812f8e0c <neigh_timer_handler+236>:   test   %rbp,%rbp
0xffffffff812f8e0f <neigh_timer_handler+239>:   je     0xffffffff812f8e21 <neigh_timer_handler+257>
/usr/src/linux/net/core/neighbour.c: 874
0xffffffff812f8e11 <neigh_timer_handler+241>:   mov    %rbp,%rdi
0xffffffff812f8e14 <neigh_timer_handler+244>:   mov    $0x20,%esi
0xffffffff812f8e19 <neigh_timer_handler+249>:   callq  0xffffffff812e83c0 <skb_copy>
0xffffffff812f8e1e <neigh_timer_handler+254>:   mov    %rax,%rbp
/usr/src/linux/include/asm/spinlock.h: 294
0xffffffff812f8e21 <neigh_timer_handler+257>:   lock addl $0x1000000,0x40(%rbx)
/usr/src/linux/net/core/neighbour.c: 876
0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
/usr/src/linux/net/core/neighbour.c: 877
0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax
/usr/src/linux/include/asm/atomic_64.h: 93
0xffffffff812f8e3d <neigh_timer_handler+285>:   lock incl 0x3c(%rbx)
/usr/src/linux/net/core/neighbour.c: 878
0xffffffff812f8e41 <neigh_timer_handler+289>:   mov    %rbp,%rdi
0xffffffff812f8e44 <neigh_timer_handler+292>:   callq  0xffffffff812e57c0 <kfree_skb>
0xffffffff812f8e49 <neigh_timer_handler+297>:   jmp    0xffffffff812f8e58 <neigh_timer_handler+312>
0xffffffff812f8e4b <neigh_timer_handler+299>:   nopl   0x0(%rax,%rax,1)
/usr/src/linux/include/asm/spinlock.h: 294
0xffffffff812f8e50 <neigh_timer_handler+304>:   lock addl $0x1000000,0x40(%rbx)
/usr/src/linux/net/core/neighbour.c: 884
0xffffffff812f8e58 <neigh_timer_handler+312>:   test   %r13d,%r13d
0xffffffff812f8e5b <neigh_timer_handler+315>:   je     0xffffffff812f8e65 <neigh_timer_handler+325>
/usr/src/linux/net/core/neighbour.c: 885
0xffffffff812f8e5d <neigh_timer_handler+317>:   mov    %rbx,%rdi
0xffffffff812f8e60 <neigh_timer_handler+320>:   callq  0xffffffff812f8950 <neigh_update_notify>
/usr/src/linux/include/asm/atomic_64.h: 123
0xffffffff812f8e65 <neigh_timer_handler+325>:   lock decl (%r12)
0xffffffff812f8e6a <neigh_timer_handler+330>:   sete   %al
/usr/src/linux/include/net/neighbour.h: 288
0xffffffff812f8e6d <neigh_timer_handler+333>:   test   %al,%al
0xffffffff812f8e6f <neigh_timer_handler+335>:   jne    0xffffffff812f8f18 <neigh_timer_handler+504>
/usr/src/linux/net/core/neighbour.c: 888
0xffffffff812f8e75 <neigh_timer_handler+341>:   mov    (%rsp),%rbx
0xffffffff812f8e79 <neigh_timer_handler+345>:   mov    0x8(%rsp),%rbp
0xffffffff812f8e7e <neigh_timer_handler+350>:   mov    0x10(%rsp),%r12
0xffffffff812f8e83 <neigh_timer_handler+355>:   mov    0x18(%rsp),%r13
0xffffffff812f8e88 <neigh_timer_handler+360>:   mov    0x20(%rsp),%r14
0xffffffff812f8e8d <neigh_timer_handler+365>:   add    $0x28,%rsp
0xffffffff812f8e91 <neigh_timer_handler+369>:   retq   
0xffffffff812f8e92 <neigh_timer_handler+370>:   nopw   0x0(%rax,%rax,1)
/usr/src/linux/net/core/neighbour.c: 836
0xffffffff812f8e98 <neigh_timer_handler+376>:   test   $0x8,%al
0xffffffff812f8e9a <neigh_timer_handler+378>:   je     0xffffffff812f8f40 <neigh_timer_handler+544>
/usr/src/linux/net/core/neighbour.c: 837
0xffffffff812f8ea0 <neigh_timer_handler+384>:   mov    0x10(%rbx),%r8
0xffffffff812f8ea4 <neigh_timer_handler+388>:   movslq 0x60(%r8),%rax
0xffffffff812f8ea8 <neigh_timer_handler+392>:   add    0x28(%rbx),%rax
0xffffffff812f8eac <neigh_timer_handler+396>:   cmp    %rdi,%rax
0xffffffff812f8eaf <neigh_timer_handler+399>:   js     0xffffffff812f9030 <neigh_timer_handler+784>
/usr/src/linux/net/core/neighbour.c: 840
0xffffffff812f8eb5 <neigh_timer_handler+405>:   movb   $0x2,0x39(%rbx)
/usr/src/linux/net/core/neighbour.c: 689
0xffffffff812f8eb9 <neigh_timer_handler+409>:   mov    0xe8(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 841
0xffffffff812f8ec0 <neigh_timer_handler+416>:   mov    0x634a39(%rip),%rax        # 0xffffffff8192d900
/usr/src/linux/net/core/neighbour.c: 691
0xffffffff812f8ec7 <neigh_timer_handler+423>:   mov    0x68(%rbx),%rdx
/usr/src/linux/net/core/neighbour.c: 841
0xffffffff812f8ecb <neigh_timer_handler+427>:   mov    %rax,0x30(%rbx)
/usr/src/linux/net/core/neighbour.c: 689
0xffffffff812f8ecf <neigh_timer_handler+431>:   mov    0x20(%rcx),%rax
/usr/src/linux/net/core/neighbour.c: 691
0xffffffff812f8ed3 <neigh_timer_handler+435>:   test   %rdx,%rdx
/usr/src/linux/net/core/neighbour.c: 689
0xffffffff812f8ed6 <neigh_timer_handler+438>:   mov    %rax,0x78(%rbx)
/usr/src/linux/net/core/neighbour.c: 691
0xffffffff812f8eda <neigh_timer_handler+442>:   jne    0xffffffff812f8ee7 <neigh_timer_handler+455>
0xffffffff812f8edc <neigh_timer_handler+444>:   jmp    0xffffffff812f8efe <neigh_timer_handler+478>
0xffffffff812f8ede <neigh_timer_handler+446>:   xchg   %ax,%ax
0xffffffff812f8ee0 <neigh_timer_handler+448>:   mov    0xe8(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 692
0xffffffff812f8ee7 <neigh_timer_handler+455>:   mov    0x28(%rcx),%rax
0xffffffff812f8eeb <neigh_timer_handler+459>:   mov    %rax,0x88(%rdx)
/usr/src/linux/net/core/neighbour.c: 691
0xffffffff812f8ef2 <neigh_timer_handler+466>:   mov    (%rdx),%rdx
0xffffffff812f8ef5 <neigh_timer_handler+469>:   test   %rdx,%rdx
0xffffffff812f8ef8 <neigh_timer_handler+472>:   jne    0xffffffff812f8ee0 <neigh_timer_handler+448>
0xffffffff812f8efa <neigh_timer_handler+474>:   mov    0x10(%rbx),%r8
/usr/src/linux/net/core/neighbour.c: 844
0xffffffff812f8efe <neigh_timer_handler+478>:   movslq 0x5c(%r8),%rbp
0xffffffff812f8f02 <neigh_timer_handler+482>:   add    0x28(%rbx),%rbp
0xffffffff812f8f06 <neigh_timer_handler+486>:   movzbl 0x39(%rbx),%edx
0xffffffff812f8f0a <neigh_timer_handler+490>:   mov    $0x1,%r13d
0xffffffff812f8f10 <neigh_timer_handler+496>:   jmpq   0xffffffff812f8d8f <neigh_timer_handler+111>
0xffffffff812f8f15 <neigh_timer_handler+501>:   nopl   (%rax)
/usr/src/linux/include/net/neighbour.h: 289
0xffffffff812f8f18 <neigh_timer_handler+504>:   mov    %rbx,%rdi
/usr/src/linux/net/core/neighbour.c: 888
0xffffffff812f8f1b <neigh_timer_handler+507>:   mov    0x8(%rsp),%rbp
0xffffffff812f8f20 <neigh_timer_handler+512>:   mov    (%rsp),%rbx
0xffffffff812f8f24 <neigh_timer_handler+516>:   mov    0x10(%rsp),%r12
0xffffffff812f8f29 <neigh_timer_handler+521>:   mov    0x18(%rsp),%r13
0xffffffff812f8f2e <neigh_timer_handler+526>:   mov    0x20(%rsp),%r14
0xffffffff812f8f33 <neigh_timer_handler+531>:   add    $0x28,%rsp
/usr/src/linux/include/net/neighbour.h: 289
0xffffffff812f8f37 <neigh_timer_handler+535>:   jmpq   0xffffffff812f8ba0 <neigh_destroy>
0xffffffff812f8f3c <neigh_timer_handler+540>:   nopl   0x0(%rax)
/usr/src/linux/net/core/neighbour.c: 854
0xffffffff812f8f40 <neigh_timer_handler+544>:   mov    0x10(%rbx),%rax
0xffffffff812f8f44 <neigh_timer_handler+548>:   movslq 0x54(%rax),%rax
0xffffffff812f8f48 <neigh_timer_handler+552>:   lea    (%rax,%rdi,1),%rbp
0xffffffff812f8f4c <neigh_timer_handler+556>:   jmpq   0xffffffff812f8d8f <neigh_timer_handler+111>
0xffffffff812f8f51 <neigh_timer_handler+561>:   nopl   0x0(%rax)
/usr/src/linux/net/core/neighbour.c: 822
0xffffffff812f8f58 <neigh_timer_handler+568>:   movslq 0x60(%r8),%rax
0xffffffff812f8f5c <neigh_timer_handler+572>:   add    0x20(%rbx),%rax
0xffffffff812f8f60 <neigh_timer_handler+576>:   cmp    %rdi,%rax
0xffffffff812f8f63 <neigh_timer_handler+579>:   js     0xffffffff812f9070 <neigh_timer_handler+848>
/usr/src/linux/net/core/neighbour.c: 825
0xffffffff812f8f69 <neigh_timer_handler+585>:   movb   $0x8,0x39(%rbx)
/usr/src/linux/net/core/neighbour.c: 672
0xffffffff812f8f6d <neigh_timer_handler+589>:   mov    0xe8(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 826
0xffffffff812f8f74 <neigh_timer_handler+596>:   mov    0x634985(%rip),%rax        # 0xffffffff8192d900
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f8f7b <neigh_timer_handler+603>:   mov    0x68(%rbx),%rdx
/usr/src/linux/net/core/neighbour.c: 826
0xffffffff812f8f7f <neigh_timer_handler+607>:   mov    %rax,0x30(%rbx)
/usr/src/linux/net/core/neighbour.c: 672
0xffffffff812f8f83 <neigh_timer_handler+611>:   mov    0x18(%rcx),%rax
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f8f87 <neigh_timer_handler+615>:   test   %rdx,%rdx
/usr/src/linux/net/core/neighbour.c: 672
0xffffffff812f8f8a <neigh_timer_handler+618>:   mov    %rax,0x78(%rbx)
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f8f8e <neigh_timer_handler+622>:   jne    0xffffffff812f8faf <neigh_timer_handler+655>
/usr/src/linux/net/core/neighbour.c: 828
0xffffffff812f8f90 <neigh_timer_handler+624>:   movslq 0x60(%r8),%rax
0xffffffff812f8f94 <neigh_timer_handler+628>:   movzbl 0x39(%rbx),%edx
0xffffffff812f8f98 <neigh_timer_handler+632>:   xor    %r13d,%r13d
0xffffffff812f8f9b <neigh_timer_handler+635>:   lea    (%rax,%rdi,1),%rbp
0xffffffff812f8f9f <neigh_timer_handler+639>:   jmpq   0xffffffff812f8d8f <neigh_timer_handler+111>
0xffffffff812f8fa4 <neigh_timer_handler+644>:   nopl   0x0(%rax)
0xffffffff812f8fa8 <neigh_timer_handler+648>:   mov    0xe8(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 675
0xffffffff812f8faf <neigh_timer_handler+655>:   mov    0x18(%rcx),%rax
0xffffffff812f8fb3 <neigh_timer_handler+659>:   mov    %rax,0x88(%rdx)
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f8fba <neigh_timer_handler+666>:   mov    (%rdx),%rdx
0xffffffff812f8fbd <neigh_timer_handler+669>:   test   %rdx,%rdx
0xffffffff812f8fc0 <neigh_timer_handler+672>:   jne    0xffffffff812f8fa8 <neigh_timer_handler+648>
0xffffffff812f8fc2 <neigh_timer_handler+674>:   mov    0x10(%rbx),%r8
0xffffffff812f8fc6 <neigh_timer_handler+678>:   jmp    0xffffffff812f8f90 <neigh_timer_handler+624>
0xffffffff812f8fc8 <neigh_timer_handler+680>:   nopl   0x0(%rax,%rax,1)
/usr/src/linux/net/core/neighbour.c: 868
0xffffffff812f8fd0 <neigh_timer_handler+688>:   lea    0x70(%rbx),%r12
/usr/src/linux/include/asm/atomic_64.h: 93
0xffffffff812f8fd4 <neigh_timer_handler+692>:   lock incl 0x70(%rbx)
0xffffffff812f8fd8 <neigh_timer_handler+696>:   movzbl 0x39(%rbx),%eax
0xffffffff812f8fdc <neigh_timer_handler+700>:   jmpq   0xffffffff812f8df1 <neigh_timer_handler+209>
0xffffffff812f8fe1 <neigh_timer_handler+705>:   nopl   0x0(%rax)
/usr/src/linux/net/core/neighbour.c: 768
0xffffffff812f8fe8 <neigh_timer_handler+712>:   mov    0x6c(%rcx),%edx
0xffffffff812f8feb <neigh_timer_handler+715>:   add    0x68(%rcx),%edx
0xffffffff812f8fee <neigh_timer_handler+718>:   add    0x70(%rcx),%edx
0xffffffff812f8ff1 <neigh_timer_handler+721>:   jmpq   0xffffffff812f8da8 <neigh_timer_handler+136>
0xffffffff812f8ff6 <neigh_timer_handler+726>:   nopw   %cs:0x0(%rax,%rax,1)
/usr/src/linux/net/core/neighbour.c: 859
0xffffffff812f9000 <neigh_timer_handler+736>:   movb   $0x20,0x39(%rbx)
/usr/src/linux/net/core/neighbour.c: 861
0xffffffff812f9004 <neigh_timer_handler+740>:   mov    %rbx,%rdi
0xffffffff812f9007 <neigh_timer_handler+743>:   mov    $0x1,%r13d
0xffffffff812f900d <neigh_timer_handler+749>:   callq  0xffffffff812f8880 <neigh_invalidate>
0xffffffff812f9012 <neigh_timer_handler+754>:   movzbl 0x39(%rbx),%eax
0xffffffff812f9016 <neigh_timer_handler+758>:   jmpq   0xffffffff812f8db0 <neigh_timer_handler+144>
0xffffffff812f901b <neigh_timer_handler+763>:   nopl   0x0(%rax,%rax,1)
/usr/src/linux/net/core/neighbour.c: 866
0xffffffff812f9020 <neigh_timer_handler+768>:   mov    0x6348d9(%rip),%rax        # 0xffffffff8192d900
0xffffffff812f9027 <neigh_timer_handler+775>:   lea    0x7d(%rax),%rbp
0xffffffff812f902b <neigh_timer_handler+779>:   jmpq   0xffffffff812f8dd2 <neigh_timer_handler+178>
/usr/src/linux/net/core/neighbour.c: 847
0xffffffff812f9030 <neigh_timer_handler+784>:   movb   $0x10,0x39(%rbx)
/usr/src/linux/net/core/neighbour.c: 848
0xffffffff812f9034 <neigh_timer_handler+788>:   mov    0x6348c5(%rip),%rax        # 0xffffffff8192d900
/usr/src/linux/net/core/neighbour.c: 850
0xffffffff812f903b <neigh_timer_handler+795>:   mov    $0x10,%edx
/usr/src/linux/include/asm/atomic_64.h: 35
0xffffffff812f9040 <neigh_timer_handler+800>:   movl   $0x0,0x3c(%rbx)
/usr/src/linux/net/core/neighbour.c: 848
0xffffffff812f9047 <neigh_timer_handler+807>:   mov    %rax,0x30(%rbx)
/usr/src/linux/net/core/neighbour.c: 850
0xffffffff812f904b <neigh_timer_handler+811>:   movslq 0x54(%r8),%rax
0xffffffff812f904f <neigh_timer_handler+815>:   lea    (%rax,%rdi,1),%rbp
0xffffffff812f9053 <neigh_timer_handler+819>:   jmpq   0xffffffff812f8d8f <neigh_timer_handler+111>
0xffffffff812f9058 <neigh_timer_handler+824>:   nopl   0x0(%rax,%rax,1)
/usr/src/linux/include/linux/skbuff.h: 788
0xffffffff812f9060 <neigh_timer_handler+832>:   xor    %ebp,%ebp
0xffffffff812f9062 <neigh_timer_handler+834>:   jmpq   0xffffffff812f8e21 <neigh_timer_handler+257>
0xffffffff812f9067 <neigh_timer_handler+839>:   nopw   0x0(%rax,%rax,1)
/usr/src/linux/net/core/neighbour.c: 831
0xffffffff812f9070 <neigh_timer_handler+848>:   movb   $0x4,0x39(%rbx)
/usr/src/linux/net/core/neighbour.c: 672
0xffffffff812f9074 <neigh_timer_handler+852>:   mov    0xe8(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f907b <neigh_timer_handler+859>:   mov    $0x4,%edx
/usr/src/linux/net/core/neighbour.c: 832
0xffffffff812f9080 <neigh_timer_handler+864>:   mov    0x634879(%rip),%rax        # 0xffffffff8192d900
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f9087 <neigh_timer_handler+871>:   mov    0x68(%rbx),%rsi
0xffffffff812f908b <neigh_timer_handler+875>:   mov    $0x1,%r13d
/usr/src/linux/net/core/neighbour.c: 832
0xffffffff812f9091 <neigh_timer_handler+881>:   mov    %rax,0x30(%rbx)
/usr/src/linux/net/core/neighbour.c: 672
0xffffffff812f9095 <neigh_timer_handler+885>:   mov    0x18(%rcx),%rax
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f9099 <neigh_timer_handler+889>:   test   %rsi,%rsi
/usr/src/linux/net/core/neighbour.c: 672
0xffffffff812f909c <neigh_timer_handler+892>:   mov    %rax,0x78(%rbx)
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f90a0 <neigh_timer_handler+896>:   jne    0xffffffff812f90b7 <neigh_timer_handler+919>
0xffffffff812f90a2 <neigh_timer_handler+898>:   jmpq   0xffffffff812f8d8f <neigh_timer_handler+111>
0xffffffff812f90a7 <neigh_timer_handler+903>:   nopw   0x0(%rax,%rax,1)
0xffffffff812f90b0 <neigh_timer_handler+912>:   mov    0xe8(%rbx),%rcx
/usr/src/linux/net/core/neighbour.c: 675
0xffffffff812f90b7 <neigh_timer_handler+919>:   mov    0x18(%rcx),%rax
0xffffffff812f90bb <neigh_timer_handler+923>:   mov    %rax,0x88(%rsi)
/usr/src/linux/net/core/neighbour.c: 674
0xffffffff812f90c2 <neigh_timer_handler+930>:   mov    (%rsi),%rsi
0xffffffff812f90c5 <neigh_timer_handler+933>:   test   %rsi,%rsi
0xffffffff812f90c8 <neigh_timer_handler+936>:   jne    0xffffffff812f90b0 <neigh_timer_handler+912>
0xffffffff812f90ca <neigh_timer_handler+938>:   jmpq   0xffffffff812f8f06 <neigh_timer_handler+486>
0xffffffff812f90cf <neigh_timer_handler+943>:   nop


crash>  neighbour ffff8801cddf1500
struct neighbour {
  next = 0x4037ad30000045, 
  tbl = 0x2484560013d0640, 
  parms = 0xdc03ff7f03484560, 
  dev = 0x23351d86, 
  used = 170277469553264, 
  confirmed = 649366077121561602, 
  updated = 0, 
  flags = 0 '\000', 
  nud_state = 0 '\000', 
  type = 0 '\000', 
  dead = 0 '\000', 
  probes = {
    counter = 0
  }, 
  lock = {
    raw_lock = {
      lock = 16777216
    }
  }, 
  ha = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\0
00", 
  hh = 0x0, 
  refcnt = {
    counter = 0
  }, 
  output = 0, 
  arp_queue = {
    next = 0x10002, 
    prev = 0x0, 
    qlen = 1, 
    lock = {
      raw_lock = {
        slock = 0
      }
    }
  }, 
  timer = {
    entry = {
      next = 0x0, 
      prev = 0x0
    }, 
    expires = 0, 
    function = 0, 
    data = 0, 
    base = 0x0, 
    start_site = 0x0, 
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
    start_pid = 0
  }, 
  ops = 0x0, 
  primary_key = 0xffff8801cddf15f0 ""
}


crash> struct -o neighbour
struct neighbour {
    [0] struct neighbour *next;
    [8] struct neigh_table *tbl;
   [16] struct neigh_parms *parms;
   [24] struct net_device *dev;
   [32] long unsigned int used;
   [40] long unsigned int confirmed;
   [48] long unsigned int updated;
   [56] __u8 flags;
   [57] __u8 nud_state;
   [58] __u8 type;
   [59] __u8 dead;
   [60] atomic_t probes;
   [64] rwlock_t lock;
   [68] unsigned char ha[32];
  [104] struct hh_cache *hh;
  [112] atomic_t refcnt;
  [120] int (*output)(struct sk_buff *);
  [128] struct sk_buff_head arp_queue;
  [152] struct timer_list timer;
  [232] const struct neigh_ops *ops;
  [240] u8 primary_key[0];
}
SIZE: 240


> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18  8:19                       ` Ding Tianhong
@ 2013-12-18  8:41                         ` Hannes Frederic Sowa
  2013-12-18  8:57                           ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-18  8:41 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
> 0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
> 0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
> 0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
> 0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
> /usr/src/linux/net/core/neighbour.c: 877
> 0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax

For me it looks like this:

%rax is neigh->ops and the function pointer solicit is NULL and causes the the
page fault.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18  8:41                         ` Hannes Frederic Sowa
@ 2013-12-18  8:57                           ` Ding Tianhong
  2013-12-18  9:28                             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-18  8:57 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

On 2013/12/18 16:41, Hannes Frederic Sowa wrote:
> On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
>> 0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
>> 0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
>> 0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
>> 0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
>> /usr/src/linux/net/core/neighbour.c: 877
>> 0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax
> 
> For me it looks like this:
> 
> %rax is neigh->ops and the function pointer solicit is NULL and causes the the
> page fault.
> 
> 
yes, it is. So I was trying to find the situation that may free the neighbour when
the timer is running, but I could not yet.

Regards
Ding

> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18  8:57                           ` Ding Tianhong
@ 2013-12-18  9:28                             ` Hannes Frederic Sowa
  2013-12-18 10:02                               ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-18  9:28 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

On Wed, Dec 18, 2013 at 04:57:01PM +0800, Ding Tianhong wrote:
> On 2013/12/18 16:41, Hannes Frederic Sowa wrote:
> > On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
> >> 0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
> >> 0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
> >> 0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
> >> 0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
> >> /usr/src/linux/net/core/neighbour.c: 877
> >> 0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax
> > 
> > For me it looks like this:
> > 
> > %rax is neigh->ops and the function pointer solicit is NULL and causes the the
> > page fault.
> > 
> > 
> yes, it is. So I was trying to find the situation that may free the neighbour when
> the timer is running, but I could not yet.

Hm. Ok. It is actually ops which is NULL, not the function pointer, may bad.

Could you try to follow param or table links and check if this is an arp or
ndisc one? Maybe some interactions with arp.c or ndisc.c causes this bug?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18  9:28                             ` Hannes Frederic Sowa
@ 2013-12-18 10:02                               ` Ding Tianhong
  2013-12-18 10:21                                 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-18 10:02 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

On 2013/12/18 17:28, Hannes Frederic Sowa wrote:
> On Wed, Dec 18, 2013 at 04:57:01PM +0800, Ding Tianhong wrote:
>> On 2013/12/18 16:41, Hannes Frederic Sowa wrote:
>>> On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
>>>> 0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
>>>> 0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
>>>> 0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
>>>> 0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
>>>> /usr/src/linux/net/core/neighbour.c: 877
>>>> 0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax
>>>
>>> For me it looks like this:
>>>
>>> %rax is neigh->ops and the function pointer solicit is NULL and causes the the
>>> page fault.
>>>
>>>
>> yes, it is. So I was trying to find the situation that may free the neighbour when
>> the timer is running, but I could not yet.
> 
> Hm. Ok. It is actually ops which is NULL, not the function pointer, may bad.
> 
> Could you try to follow param or table links and check if this is an arp or
> ndisc one? Maybe some interactions with arp.c or ndisc.c causes this bug?
> 
> 

David and Eric has said that someone may called neigh_release in a wrong place, I agree with that,
and review the code which calling the function in the kernel, I could not find any obvious problem,
and doubt with the situation:

CPU0					  CPU1					  CPU2
      --------		      			--------                		---------
neigh_timer_handler				
write_lock(n->lock);		
	...
write_unlock(n->lock);
n->ref_cnt = 2 or 3(if mode_time)				
	...					neigh_flush_dev
						write_lock(n->lock);
						n->ref_cnt = 2;
						n->nud_state = NUD_NONE;
						write_unlock(n->lock);
						neigh_release()
						n->ref_cnt = 1;
						...					neigh_periodic_work
											write_lock(n->lock);
											write_unlock(n->lock);
											neigh_release();
											kfree(n)
n->ops->solicit()									...
...


.

Regards
Ding


> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18 10:02                               ` Ding Tianhong
@ 2013-12-18 10:21                                 ` Hannes Frederic Sowa
  2013-12-18 11:57                                   ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-18 10:21 UTC (permalink / raw)
  To: Ding Tianhong; +Cc: Eric Dumazet, David Miller, yoshfuji, joe, vfalico, netdev

On Wed, Dec 18, 2013 at 06:02:33PM +0800, Ding Tianhong wrote:
> On 2013/12/18 17:28, Hannes Frederic Sowa wrote:
> > On Wed, Dec 18, 2013 at 04:57:01PM +0800, Ding Tianhong wrote:
> >> On 2013/12/18 16:41, Hannes Frederic Sowa wrote:
> >>> On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
> >>>> 0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
> >>>> 0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
> >>>> 0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
> >>>> 0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
> >>>> /usr/src/linux/net/core/neighbour.c: 877
> >>>> 0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax
> >>>
> >>> For me it looks like this:
> >>>
> >>> %rax is neigh->ops and the function pointer solicit is NULL and causes the the
> >>> page fault.
> >>>
> >>>
> >> yes, it is. So I was trying to find the situation that may free the neighbour when
> >> the timer is running, but I could not yet.
> > 
> > Hm. Ok. It is actually ops which is NULL, not the function pointer, may bad.
> > 
> > Could you try to follow param or table links and check if this is an arp or
> > ndisc one? Maybe some interactions with arp.c or ndisc.c causes this bug?
> > 
> > 
> 
> David and Eric has said that someone may called neigh_release in a wrong place, I agree with that,
> and review the code which calling the function in the kernel, I could not find any obvious problem,
> and doubt with the situation:

Maybe it could also be a reference count overflow and we wrap around to zero
again? Otherwise I agree, it really looks like this is the case.

> CPU0					  CPU1					  CPU2
>       --------		      			--------                		---------
> neigh_timer_handler				
> write_lock(n->lock);		
> 	...
> write_unlock(n->lock);
> n->ref_cnt = 2 or 3(if mode_time)				



> 	...					neigh_flush_dev
> 						write_lock(n->lock);
> 						n->ref_cnt = 2;
> 						n->nud_state = NUD_NONE;
> 						write_unlock(n->lock);
> 						neigh_release()
> 						n->ref_cnt = 1;
> 						...					neigh_periodic_work
> 											write_lock(n->lock);
> 											write_unlock(n->lock);
> 											neigh_release();
> 											kfree(n)
> n->ops->solicit()									...

On CPU0 the neigh_release happens after solicit. So the timer_handler should
still be guarded to not touch already freed memory. The table lock should make
sure that we either see a reference from the hash table or we don't (with
appropriate reference count). It looks consistent for me for now.

I guess you cannot reproduce this?

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18 10:21                                 ` Hannes Frederic Sowa
@ 2013-12-18 11:57                                   ` Ding Tianhong
  2013-12-18 14:27                                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-18 11:57 UTC (permalink / raw)
  To: Ding Tianhong, Eric Dumazet, David Miller, yoshfuji, joe, vfalico,
	netdev

于 2013/12/18 18:21, Hannes Frederic Sowa 写道:
> On Wed, Dec 18, 2013 at 06:02:33PM +0800, Ding Tianhong wrote:
>> On 2013/12/18 17:28, Hannes Frederic Sowa wrote:
>>> On Wed, Dec 18, 2013 at 04:57:01PM +0800, Ding Tianhong wrote:
>>>> On 2013/12/18 16:41, Hannes Frederic Sowa wrote:
>>>>> On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
>>>>>> 0xffffffff812f8e29 <neigh_timer_handler+265>:   mov    0xe8(%rbx),%rax
>>>>>> 0xffffffff812f8e30 <neigh_timer_handler+272>:   mov    %rbp,%rsi
>>>>>> 0xffffffff812f8e33 <neigh_timer_handler+275>:   mov    %rbx,%rdi
>>>>>> 0xffffffff812f8e36 <neigh_timer_handler+278>:   callq  *0x8(%rax)						<-----crash
>>>>>> /usr/src/linux/net/core/neighbour.c: 877
>>>>>> 0xffffffff812f8e39 <neigh_timer_handler+281>:   lea    0x3c(%rbx),%rax
>>>>>
>>>>> For me it looks like this:
>>>>>
>>>>> %rax is neigh->ops and the function pointer solicit is NULL and causes the the
>>>>> page fault.
>>>>>
>>>>>
>>>> yes, it is. So I was trying to find the situation that may free the neighbour when
>>>> the timer is running, but I could not yet.
>>>
>>> Hm. Ok. It is actually ops which is NULL, not the function pointer, may bad.
>>>
>>> Could you try to follow param or table links and check if this is an arp or
>>> ndisc one? Maybe some interactions with arp.c or ndisc.c causes this bug?
>>>
>>>
>>
>> David and Eric has said that someone may called neigh_release in a wrong place, I agree with that,
>> and review the code which calling the function in the kernel, I could not find any obvious problem,
>> and doubt with the situation:
> 
> Maybe it could also be a reference count overflow and we wrap around to zero
> again? Otherwise I agree, it really looks like this is the case.
> 
>> CPU0					  CPU1					  CPU2
>>       --------		      			--------                		---------
>> neigh_timer_handler				
>> write_lock(n->lock);		
>> 	...
>> write_unlock(n->lock);
>> n->ref_cnt = 2 or 3(if mode_time)				
> 
> 
> 
>> 	...					neigh_flush_dev
>> 						write_lock(n->lock);
>> 						n->ref_cnt = 2;
>> 						n->nud_state = NUD_NONE;
>> 						write_unlock(n->lock);
>> 						neigh_release()
>> 						n->ref_cnt = 1;
>> 						...					neigh_periodic_work
>> 											write_lock(n->lock);
>> 											write_unlock(n->lock);
>> 											neigh_release();
>> 											kfree(n)
>> n->ops->solicit()									...
> 
> On CPU0 the neigh_release happens after solicit. So the timer_handler should
> still be guarded to not touch already freed memory. The table lock should make
> sure that we either see a reference from the hash table or we don't (with
> appropriate reference count). It looks consistent for me for now.
> 
> I guess you cannot reproduce this?
> 
> Greetings,
> 
>   Hannes
> 

yes, I cannot repruduce the bug again.

Regards
Ding
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18 11:57                                   ` Ding Tianhong
@ 2013-12-18 14:27                                     ` Hannes Frederic Sowa
  2013-12-18 15:12                                       ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-18 14:27 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Ding Tianhong, Eric Dumazet, David Miller, yoshfuji, joe, vfalico,
	netdev

On Wed, Dec 18, 2013 at 07:57:40PM +0800, Ding Tianhong wrote:
> yes, I cannot repruduce the bug again.

Hmm, it actually seems hard to hit even if the race happens. Even if slab
poisoning is active it would only hit if ->solicit would be called again,
because that is the only pointer dereference directly used in the old memory.

neigh_alloc allocates memory with kzalloc, so it would null out that memory,
so the race would not only have to race with kfree, the memory needs to be
reallocated in the mean time.

I would suggest adding some poisoning manually in neigh_release before kfree
and check for this in all periodic called functions. Maybe we can see it
again?

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18 14:27                                     ` Hannes Frederic Sowa
@ 2013-12-18 15:12                                       ` Ding Tianhong
  2013-12-18 15:46                                         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 30+ messages in thread
From: Ding Tianhong @ 2013-12-18 15:12 UTC (permalink / raw)
  To: Ding Tianhong, Eric Dumazet, David Miller, yoshfuji, joe, vfalico,
	netdev

于 2013/12/18 22:27, Hannes Frederic Sowa 写道:
> On Wed, Dec 18, 2013 at 07:57:40PM +0800, Ding Tianhong wrote:
>> yes, I cannot repruduce the bug again.
> 
> Hmm, it actually seems hard to hit even if the race happens. Even if slab
> poisoning is active it would only hit if ->solicit would be called again,
> because that is the only pointer dereference directly used in the old memory.
> 
> neigh_alloc allocates memory with kzalloc, so it would null out that memory,
> so the race would not only have to race with kfree, the memory needs to be
> reallocated in the mean time.
> 
> I would suggest adding some poisoning manually in neigh_release before kfree
> and check for this in all periodic called functions. Maybe we can see it
> again?
> 
> Greetings,
> 
>   Hannes
> 

Great, thanks for your help, I think make the neigh_release not kfree neighbour until
the timer is over is a clear way to  fix this, maybe you could another idea, glad to
hear your opinion.

Regards
Ding

> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18 15:12                                       ` Ding Tianhong
@ 2013-12-18 15:46                                         ` Hannes Frederic Sowa
  2013-12-19  3:32                                           ` Ding Tianhong
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Frederic Sowa @ 2013-12-18 15:46 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Ding Tianhong, Eric Dumazet, David Miller, yoshfuji, joe, vfalico,
	netdev

On Wed, Dec 18, 2013 at 11:12:51PM +0800, Ding Tianhong wrote:
> 于 2013/12/18 22:27, Hannes Frederic Sowa 写道:
> > On Wed, Dec 18, 2013 at 07:57:40PM +0800, Ding Tianhong wrote:
> >> yes, I cannot repruduce the bug again.
> > 
> > Hmm, it actually seems hard to hit even if the race happens. Even if slab
> > poisoning is active it would only hit if ->solicit would be called again,
> > because that is the only pointer dereference directly used in the old memory.
> > 
> > neigh_alloc allocates memory with kzalloc, so it would null out that memory,
> > so the race would not only have to race with kfree, the memory needs to be
> > reallocated in the mean time.
> > 
> > I would suggest adding some poisoning manually in neigh_release before kfree
> > and check for this in all periodic called functions. Maybe we can see it
> > again?
> >
> Great, thanks for your help, I think make the neigh_release not kfree neighbour until
> the timer is over is a clear way to  fix this, maybe you could another idea, glad to
> hear your opinion.

But I don't suggest this as an fix, just as a help for debugging this issue.

Maybe you could also store the _RET_IP_ in the to be freed struct neighbour
(just before kfree) and thus have it available in case the machine panics (or
simply print it with printk).

Maybe it would make sense to use kmem_cache_create and kmem_cache_alloc for
struct neighs so we can better utilize the slub debugging features.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
  2013-12-18 15:46                                         ` Hannes Frederic Sowa
@ 2013-12-19  3:32                                           ` Ding Tianhong
  0 siblings, 0 replies; 30+ messages in thread
From: Ding Tianhong @ 2013-12-19  3:32 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Eric Dumazet, David Miller, yoshfuji, joe,
	vfalico, netdev

On 2013/12/18 23:46, Hannes Frederic Sowa wrote:
> On Wed, Dec 18, 2013 at 11:12:51PM +0800, Ding Tianhong wrote:
>> 于 2013/12/18 22:27, Hannes Frederic Sowa 写道:
>>> On Wed, Dec 18, 2013 at 07:57:40PM +0800, Ding Tianhong wrote:
>>>> yes, I cannot repruduce the bug again.
>>>
>>> Hmm, it actually seems hard to hit even if the race happens. Even if slab
>>> poisoning is active it would only hit if ->solicit would be called again,
>>> because that is the only pointer dereference directly used in the old memory.
>>>
>>> neigh_alloc allocates memory with kzalloc, so it would null out that memory,
>>> so the race would not only have to race with kfree, the memory needs to be
>>> reallocated in the mean time.
>>>
>>> I would suggest adding some poisoning manually in neigh_release before kfree
>>> and check for this in all periodic called functions. Maybe we can see it
>>> again?
>>>
>> Great, thanks for your help, I think make the neigh_release not kfree neighbour until
>> the timer is over is a clear way to  fix this, maybe you could another idea, glad to
>> hear your opinion.
> 
> But I don't suggest this as an fix, just as a help for debugging this issue.
> 
> Maybe you could also store the _RET_IP_ in the to be freed struct neighbour
> (just before kfree) and thus have it available in case the machine panics (or
> simply print it with printk).
> 
> Maybe it would make sense to use kmem_cache_create and kmem_cache_alloc for
> struct neighs so we can better utilize the slub debugging features.
> 
> Greetings,
> 
>   Hannes
> 
> 

Good idea, I will try it, but I still could not make it happen again.

I can repeat the process that the problem happed:

(1).A: xxx.xxx.xxx.83, B:xxx.xxx.xxx.84
(2). down A, B instead of A, ifconfig B xxx.xxx.xxx.83
(3).use "/sbin/arping -I %s -U -b -c 1 -w 4 %s "to tell vlan B is xxx.xxx.xxx.83,
(4). then it happened. 

Regards
Ding
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-12-19  3:39 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
2013-12-03 15:03 ` Hannes Frederic Sowa
2013-12-04  1:36   ` Ding Tianhong
2013-12-03 16:28 ` Eric Dumazet
2013-12-04  1:59   ` Ding Tianhong
2013-12-03 16:37 ` David Miller
2013-12-04  2:37 ` Gao feng
2013-12-04  4:04   ` Ding Tianhong
2013-12-04  4:21     ` David Miller
2013-12-04  6:19       ` Ding Tianhong
2013-12-04  6:27         ` Eric Dumazet
2013-12-04  9:16           ` Ding Tianhong
2013-12-04 10:10             ` Gao feng
2013-12-04 15:24             ` Eric Dumazet
2013-12-05  0:32               ` Gao feng
2013-12-05  3:17                 ` Ding Tianhong
2013-12-18  6:37                   ` Ding Tianhong
2013-12-18  7:51                     ` Hannes Frederic Sowa
2013-12-18  8:19                       ` Ding Tianhong
2013-12-18  8:41                         ` Hannes Frederic Sowa
2013-12-18  8:57                           ` Ding Tianhong
2013-12-18  9:28                             ` Hannes Frederic Sowa
2013-12-18 10:02                               ` Ding Tianhong
2013-12-18 10:21                                 ` Hannes Frederic Sowa
2013-12-18 11:57                                   ` Ding Tianhong
2013-12-18 14:27                                     ` Hannes Frederic Sowa
2013-12-18 15:12                                       ` Ding Tianhong
2013-12-18 15:46                                         ` Hannes Frederic Sowa
2013-12-19  3:32                                           ` Ding Tianhong
2013-12-04  6:36         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).