All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ding Tianhong <dingtianhong@huawei.com>
To: Gao feng <gaofeng@cn.fujitsu.com>,
	"David S. Miller" <davem@davemloft.net>,
	YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>,
	"Joe Perches" <joe@perches.com>,
	Veaceslav Falico <vfalico@redhat.com>,
	Netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
Date: Wed, 4 Dec 2013 12:04:31 +0800	[thread overview]
Message-ID: <529EA9CF.2090008@huawei.com> (raw)
In-Reply-To: <529E9579.7090201@cn.fujitsu.com>

On 2013/12/4 10:37, Gao feng wrote:
> On 12/03/2013 09:48 PM, Ding Tianhong wrote:
>> I have met the oops in Suse11 SP2, the kernel is 2.6.32.59-0.7-default:
>>
>> [64306.089036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [64306.089343] IP: [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.089535] PGD 0
>> [64306.089706] Oops: 0000 [#1] SMP
>> [64306.089935] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:0/0:1:0:0/scsi_generic/sg0/dev
>> [64306.090142] Die func triggered, code:1
>> [64306.090147] CPU 1
>> [64306.090258] Supported: Yes, External
>> [64306.090262] Pid: 58359, comm: socknal_cd04 Tainted: P          N  2.6.32.59-0.7-default #1 T3500 G3
>> [64306.090266] RIP: 0010:[<ffffffff812f8e36>]  [<ffffffff812f8e36>] neigh_timer_handler+0x116/0x3b0
>> [64306.090272] RSP: 0018:ffff880c273499d8  EFLAGS: 00010206
>> [64306.090275] RAX: 0000000000000000 RBX: ffff8801cddf1500 RCX: ffff8801cddf14f2
>> [64306.090278] RDX: 0000000000000000 RSI: ffff8805e40d3a28 RDI: ffff8801cddf1500
>> [64306.090281] RBP: ffff8805e40d3a28 R08: ffff8801cddf1530 R09: ffff880c27349b17
>> [64306.090284] R10: 000000000000000e R11: ffffffff812f8e22 R12: ffff880185c0e840
>> [64306.090287] R13: 0000000000000000 R14: ffff8805e40d3a60 R15: 0000000003484560
>> [64306.090291] FS:  00007f081210e700(0000) GS:ffff880036420000(0000) knlGS:0000000000000000
>> [64306.090295] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [64306.090297] CR2: 0000000000000008 CR3: 0000000001804000 CR4: 00000000000406e0
>> [64306.090301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [64306.090304] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [64306.090308] Process socknal_cd04 (pid: 58359, threadinfo ffff880c27348000, task ffff880c25d02300)
>> [64306.090310] Stack:
>> [64306.090426]  ffffffff8131f8e0 0000000000000003 ffff8801c189eb40 000000000000000a
>> [64306.090437] <0> ffffffff00000000 0000000000000002 ffff880c27349a30 ffffffffa304790c
>> [64306.090444] <0> 0000000000000246 000051010a010000 ffffffff81318357 31312d3300007fff
>> [64306.090450] <0> ffff880c27349bb8 0248456003484560 ffff8801c189eb40 ffffffff81869300
>> [64306.090456] <0> ffff880185c0e840 ffff8801cddf1514 ffff8805e40d3a28 00000000000000d0
>> [64306.090464] Call Trace:
>>
>> --------------------------- cut here -------------------------------------
>>
>> I found the NULL place int the neigh_timer_handler, the neigh->ops is NULL,
>> so the calling of neigh->ops->solicit(neigh, skb) will panic, I found the
>> neigh has been freed via the kdump, the so I think the neigh was kfreed while
>> the neigh timer handler is running.
>>
>> The situation is that there are several server in the local lan:
>> A: 128.5.10.83
>> B: 128.5.10.85
>> C: 128.5.10.xx
>>
>> I panic the A by manual, and set B's IP to 128.5.10.83, so send broadcast to tell
>> the lan that B is 128.5.10.83, then the B panic, it is hard to appeared again, so
>> I only met once.
>>
>> I think the reason is that:
>>
>> 	CPU0					CPU1
>> 	-----					-----
>> 						<SOFTIRQ>
>> 						call_timer_fn();
>> 						base->running_timer = neigh->timer;
>> 						neigh_timer_handler();
>> 	neigh_release();
>> 	write_lock(&neigh->lock);
>> 	del_timer(neigh->timer);
>> 	write_unlock(&neigh->lock);
>> 						write_lock(&neigh->lock);
>> 	kfree(neigh);
>>  						neigh->ops->solicit();
>>
>> --------------------------------------------------------------------------
>>
>> The reason is : besides deactivating the timer it also makes sure the handler
>> has finished executing on other CPUs, But the del_timer() just only detach the
>> timer and not wait for the timer finish executing on other CPUs.
>>
> 
> the neigh->timer has the reference to the neigh, and this reference is
> released at the end of neigh_timer_handler. so the neigh should not be
> freed before the timer handler finished.
> 
> As your description, some logic ignores the reference of neigh and calls
> neigh_destroy directly.
> 
> You should find out the incorrect destroying of the neigh.
> 

The destroying neigh could be trigger by userspace, just like set the ip address which
in arp table to the local device ip, some I could not control it, it maybe anytime,
but the timer handler is execute by logic, this is normal, so I think the logic
is no problem, and the process of destroying neigh may conflict with the timer handler,
it is a synchronous problem to make sure the timer should be finished before the
reference neigh is freed.

Regards
Ding

  reply	other threads:[~2013-12-04  4:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
2013-12-03 15:03 ` Hannes Frederic Sowa
2013-12-04  1:36   ` Ding Tianhong
2013-12-03 16:28 ` Eric Dumazet
2013-12-04  1:59   ` Ding Tianhong
2013-12-03 16:37 ` David Miller
2013-12-04  2:37 ` Gao feng
2013-12-04  4:04   ` Ding Tianhong [this message]
2013-12-04  4:21     ` David Miller
2013-12-04  6:19       ` Ding Tianhong
2013-12-04  6:27         ` Eric Dumazet
2013-12-04  9:16           ` Ding Tianhong
2013-12-04 10:10             ` Gao feng
2013-12-04 15:24             ` Eric Dumazet
2013-12-05  0:32               ` Gao feng
2013-12-05  3:17                 ` Ding Tianhong
2013-12-18  6:37                   ` Ding Tianhong
2013-12-18  7:51                     ` Hannes Frederic Sowa
2013-12-18  8:19                       ` Ding Tianhong
2013-12-18  8:41                         ` Hannes Frederic Sowa
2013-12-18  8:57                           ` Ding Tianhong
2013-12-18  9:28                             ` Hannes Frederic Sowa
2013-12-18 10:02                               ` Ding Tianhong
2013-12-18 10:21                                 ` Hannes Frederic Sowa
2013-12-18 11:57                                   ` Ding Tianhong
2013-12-18 14:27                                     ` Hannes Frederic Sowa
2013-12-18 15:12                                       ` Ding Tianhong
2013-12-18 15:46                                         ` Hannes Frederic Sowa
2013-12-19  3:32                                           ` Ding Tianhong
2013-12-04  6:36         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=529EA9CF.2090008@huawei.com \
    --to=dingtianhong@huawei.com \
    --cc=davem@davemloft.net \
    --cc=gaofeng@cn.fujitsu.com \
    --cc=joe@perches.com \
    --cc=netdev@vger.kernel.org \
    --cc=vfalico@redhat.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.