From: Ding Tianhong <dthxman@gmail.com>
To: Ding Tianhong <dingtianhong@huawei.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
David Miller <davem@davemloft.net>,
yoshfuji@linux-ipv6.org, joe@perches.com, vfalico@redhat.com,
netdev@vger.kernel.org
Subject: Re: [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler()
Date: Wed, 18 Dec 2013 19:57:40 +0800 [thread overview]
Message-ID: <52B18DB4.80403@gmail.com> (raw)
In-Reply-To: <20131218102132.GB3505@order.stressinduktion.org>
于 2013/12/18 18:21, Hannes Frederic Sowa 写道:
> On Wed, Dec 18, 2013 at 06:02:33PM +0800, Ding Tianhong wrote:
>> On 2013/12/18 17:28, Hannes Frederic Sowa wrote:
>>> On Wed, Dec 18, 2013 at 04:57:01PM +0800, Ding Tianhong wrote:
>>>> On 2013/12/18 16:41, Hannes Frederic Sowa wrote:
>>>>> On Wed, Dec 18, 2013 at 04:19:43PM +0800, Ding Tianhong wrote:
>>>>>> 0xffffffff812f8e29 <neigh_timer_handler+265>: mov 0xe8(%rbx),%rax
>>>>>> 0xffffffff812f8e30 <neigh_timer_handler+272>: mov %rbp,%rsi
>>>>>> 0xffffffff812f8e33 <neigh_timer_handler+275>: mov %rbx,%rdi
>>>>>> 0xffffffff812f8e36 <neigh_timer_handler+278>: callq *0x8(%rax) <-----crash
>>>>>> /usr/src/linux/net/core/neighbour.c: 877
>>>>>> 0xffffffff812f8e39 <neigh_timer_handler+281>: lea 0x3c(%rbx),%rax
>>>>>
>>>>> For me it looks like this:
>>>>>
>>>>> %rax is neigh->ops and the function pointer solicit is NULL and causes the the
>>>>> page fault.
>>>>>
>>>>>
>>>> yes, it is. So I was trying to find the situation that may free the neighbour when
>>>> the timer is running, but I could not yet.
>>>
>>> Hm. Ok. It is actually ops which is NULL, not the function pointer, may bad.
>>>
>>> Could you try to follow param or table links and check if this is an arp or
>>> ndisc one? Maybe some interactions with arp.c or ndisc.c causes this bug?
>>>
>>>
>>
>> David and Eric has said that someone may called neigh_release in a wrong place, I agree with that,
>> and review the code which calling the function in the kernel, I could not find any obvious problem,
>> and doubt with the situation:
>
> Maybe it could also be a reference count overflow and we wrap around to zero
> again? Otherwise I agree, it really looks like this is the case.
>
>> CPU0 CPU1 CPU2
>> -------- -------- ---------
>> neigh_timer_handler
>> write_lock(n->lock);
>> ...
>> write_unlock(n->lock);
>> n->ref_cnt = 2 or 3(if mode_time)
>
>
>
>> ... neigh_flush_dev
>> write_lock(n->lock);
>> n->ref_cnt = 2;
>> n->nud_state = NUD_NONE;
>> write_unlock(n->lock);
>> neigh_release()
>> n->ref_cnt = 1;
>> ... neigh_periodic_work
>> write_lock(n->lock);
>> write_unlock(n->lock);
>> neigh_release();
>> kfree(n)
>> n->ops->solicit() ...
>
> On CPU0 the neigh_release happens after solicit. So the timer_handler should
> still be guarded to not touch already freed memory. The table lock should make
> sure that we either see a reference from the hash table or we don't (with
> appropriate reference count). It looks consistent for me for now.
>
> I guess you cannot reproduce this?
>
> Greetings,
>
> Hannes
>
yes, I cannot repruduce the bug again.
Regards
Ding
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
next prev parent reply other threads:[~2013-12-18 12:09 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-03 13:48 [PATCH net] net: neighbour: add neighbour dead check for neigh_timer_handler() Ding Tianhong
2013-12-03 15:03 ` Hannes Frederic Sowa
2013-12-04 1:36 ` Ding Tianhong
2013-12-03 16:28 ` Eric Dumazet
2013-12-04 1:59 ` Ding Tianhong
2013-12-03 16:37 ` David Miller
2013-12-04 2:37 ` Gao feng
2013-12-04 4:04 ` Ding Tianhong
2013-12-04 4:21 ` David Miller
2013-12-04 6:19 ` Ding Tianhong
2013-12-04 6:27 ` Eric Dumazet
2013-12-04 9:16 ` Ding Tianhong
2013-12-04 10:10 ` Gao feng
2013-12-04 15:24 ` Eric Dumazet
2013-12-05 0:32 ` Gao feng
2013-12-05 3:17 ` Ding Tianhong
2013-12-18 6:37 ` Ding Tianhong
2013-12-18 7:51 ` Hannes Frederic Sowa
2013-12-18 8:19 ` Ding Tianhong
2013-12-18 8:41 ` Hannes Frederic Sowa
2013-12-18 8:57 ` Ding Tianhong
2013-12-18 9:28 ` Hannes Frederic Sowa
2013-12-18 10:02 ` Ding Tianhong
2013-12-18 10:21 ` Hannes Frederic Sowa
2013-12-18 11:57 ` Ding Tianhong [this message]
2013-12-18 14:27 ` Hannes Frederic Sowa
2013-12-18 15:12 ` Ding Tianhong
2013-12-18 15:46 ` Hannes Frederic Sowa
2013-12-19 3:32 ` Ding Tianhong
2013-12-04 6:36 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52B18DB4.80403@gmail.com \
--to=dthxman@gmail.com \
--cc=davem@davemloft.net \
--cc=dingtianhong@huawei.com \
--cc=eric.dumazet@gmail.com \
--cc=joe@perches.com \
--cc=netdev@vger.kernel.org \
--cc=vfalico@redhat.com \
--cc=yoshfuji@linux-ipv6.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.