* neigh use-after-free
@ 2015-04-03 20:27 Jörn Engel
2015-04-03 20:52 ` Alexei Potashnik
0 siblings, 1 reply; 10+ messages in thread
From: Jörn Engel @ 2015-04-03 20:27 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Alexei Potashnik
[-- Attachment #1: Type: text/plain, Size: 5388 bytes --]
After hunting down a mystery crash in the timer code, we added a debug
patch (attached) and got two useful backtraces.
[ 4974.815900] ------------[ cut here ]------------
[ 4974.815914] WARNING: at include/net/neighbour.h:315 neigh_hold.part.26+0x1e/0x27()
[ 4974.815987] CPU: 47 PID: 9066 Comm: iscsi_ttx-8 Tainted: G O 3.10.59+
[ 4974.815989] Hardware name: /0X3D66, BIOS 2.0.19 08/29/2013
[ 4974.815990] 0000000000000009 ffff884d472e1828 ffffffff8163baae ffff884d472e1860
[ 4974.815998] ffffffff8103f141 ffff884e267e0c00 ffff884d55121ef0 ffff884e267e0c28
[ 4974.816001] 0000000000000000 ffff887f0aad2b58 ffff884d472e1870 ffffffff8103f21a
[ 4974.816004] Call Trace:
[ 4974.816012] [<ffffffff8163baae>] dump_stack+0x19/0x1b
[ 4974.816017] [<ffffffff8103f141>] warn_slowpath_common+0x61/0x80
[ 4974.816019] [<ffffffff8103f21a>] warn_slowpath_null+0x1a/0x20
[ 4974.816021] [<ffffffff8163de95>] neigh_hold.part.26+0x1e/0x27
[ 4974.816027] [<ffffffff8155f08c>] neigh_add_timer+0x3c/0x60
[ 4974.816029] [<ffffffff8155f1ab>] __neigh_event_send+0xfb/0x220
[ 4974.816031] [<ffffffff8155f40b>] neigh_resolve_output+0x13b/0x220
[ 4974.816038] [<ffffffff8158c950>] ip_finish_output+0x1b0/0x3b0
[ 4974.816040] [<ffffffff8158ddd8>] ip_output+0x58/0x90
[ 4974.816042] [<ffffffff8158d5a5>] ip_local_out+0x25/0x30
[ 4974.816045] [<ffffffff8158d8f7>] ip_queue_xmit+0x137/0x380
[ 4974.816051] [<ffffffff815a47a5>] tcp_transmit_skb+0x445/0x8a0
[ 4974.816054] [<ffffffff815a4d40>] tcp_write_xmit+0x140/0xb00
[ 4974.816058] [<ffffffff815a59ae>] __tcp_push_pending_frames+0x2e/0xc0
[ 4974.816062] [<ffffffff81597198>] tcp_sendmsg+0x118/0xd90
[ 4974.816070] [<ffffffff81278b55>] ? debug_object_deactivate+0x115/0x170
[ 4974.816076] [<ffffffff815bf434>] inet_sendmsg+0x64/0xb0
[ 4974.816080] [<ffffffff8153da56>] sock_sendmsg+0x76/0x90
[ 4974.816086] [<ffffffff81046e89>] ? local_bh_enable_ip+0x89/0xf0
[ 4974.816092] [<ffffffff81641d75>] ? _raw_spin_lock_irqsave+0x25/0x60
[ 4974.816095] [<ffffffff8153daa7>] kernel_sendmsg+0x37/0x50
[ 4974.816106] [<ffffffffa06e6536>] tx_data+0xb6/0x160 [iscsi_target_mod]
[ 4974.816111] [<ffffffffa06e661a>] iscsit_send_tx_data+0x3a/0x90 [iscsi_target_mod]
[ 4974.816116] [<ffffffffa06e899d>] iscsit_send_unsolicited_nopin+0x7d/0x170 [iscsi_target_mod]
[ 4974.816121] [<ffffffffa06e92c5>] iscsit_immediate_queue+0x355/0x4c0 [iscsi_target_mod]
[ 4974.816126] [<ffffffffa06ebb08>] ? iscsi_target_tx_thread+0xe8/0x240 [iscsi_target_mod]
[ 4974.816131] [<ffffffffa06ebb16>] iscsi_target_tx_thread+0xf6/0x240 [iscsi_target_mod]
[ 4974.816135] [<ffffffff810657e0>] ? wake_up_bit+0x30/0x30
[ 4974.816140] [<ffffffffa06eba20>] ? iscsit_thread_get_cpumask+0x30/0x30 [iscsi_target_mod]
[ 4974.816142] [<ffffffff81064aa0>] kthread+0xc0/0xd0
[ 4974.816145] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.816150] [<ffffffff8164a56c>] ret_from_fork+0x7c/0xb0
[ 4974.816152] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.816153] ---[ end trace 1e6b1f72dd5d5dc7 ]---
[ 4974.885829] ------------[ cut here ]------------
[ 4974.885841] WARNING: at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
[ 4974.885846] ODEBUG: free active (active state 0) object type: timer_list hint: neigh_timer_handler+0x0/0x310
[ 4974.885920] CPU: 46 PID: 240 Comm: ksoftirqd/46 Tainted: G W O 3.10.59+
[ 4974.885923] 0000000000000009 ffff883f268edc18 ffffffff8163baae ffff883f268edc50
[ 4974.885927] ffffffff8103f141 ffff884d77102f28 ffffffff81e356e0 ffffffff81b70b74
[ 4974.885932] ffffffff83113a10 0000000000000001 ffff883f268edcb0 ffffffff8103f1ac
[ 4974.885935] Call Trace:
[ 4974.885942] [<ffffffff8163baae>] dump_stack+0x19/0x1b
[ 4974.885946] [<ffffffff8103f141>] warn_slowpath_common+0x61/0x80
[ 4974.885948] [<ffffffff8103f1ac>] warn_slowpath_fmt+0x4c/0x50
[ 4974.885950] [<ffffffff81278233>] debug_print_object+0x83/0xa0
[ 4974.885953] [<ffffffff815604d0>] ? neigh_periodic_work+0x1f0/0x1f0
[ 4974.885956] [<ffffffff8127936b>] debug_check_no_obj_freed+0x20b/0x250
[ 4974.885959] [<ffffffff810bd901>] ? rcu_process_callbacks+0x261/0x5f0
[ 4974.885963] [<ffffffff81142840>] kfree+0x90/0x160
[ 4974.885965] [<ffffffff810bd901>] rcu_process_callbacks+0x261/0x5f0
[ 4974.885969] [<ffffffff81047e3f>] __do_softirq+0xff/0x250
[ 4974.885971] [<ffffffff81047fcd>] run_ksoftirqd+0x3d/0x60
[ 4974.885974] [<ffffffff8106c95f>] smpboot_thread_fn+0xff/0x1a0
[ 4974.885977] [<ffffffff8106c860>] ? lg_local_lock_cpu+0x40/0x40
[ 4974.885980] [<ffffffff81064aa0>] kthread+0xc0/0xd0
[ 4974.885983] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.885988] [<ffffffff8164a56c>] ret_from_fork+0x7c/0xb0
[ 4974.885991] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.885992] ---[ end trace 1e6b1f72dd5d5dc8 ]---
neigh_add_timer() takes a reference on an object that already had a
refcount of zero, so neigh_destroy() was called and, module an rcu
grace period, the object has been freed. That's bad.
Alexei suspects the problem might be in ip_finish_output2().
__ipv4_neigh_lookup_noref() is done inside rcu_read_lock_bh(), so it
should be safe. Unless we schedule a timer and therefore access the
object after rcu_read_unlock_bh().
Maybe someone more familiar with the code has an idea?
Jörn
--
Money can buy bandwidth, but latency is forever.
-- John R. Mashey
[-- Attachment #2: net-Check-for-neighbor-refcount-bugs.patch --]
[-- Type: text/x-diff, Size: 1421 bytes --]
>From c61f1fd27e19d6614dcb381b4d8e7bac0429e412 Mon Sep 17 00:00:00 2001
From: Joern Engel <joern@logfs.org>
Date: Thu, 2 Apr 2015 15:19:57 -0700
Subject: [PATCH] net: Check for neighbor refcount bugs
Debugging why we crash in the timer softirq without this is pretty
painful. Should be low-overhead and give a more meaningful backtrace.
Signed-off-by: Joern Engel <joern@logfs.org>
---
include/net/neighbour.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 7e748ad8b50c..a19d02bb336c 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -297,8 +297,10 @@ static inline struct neigh_parms *neigh_parms_clone(struct neigh_parms *parms)
static inline void neigh_release(struct neighbour *neigh)
{
- if (atomic_dec_and_test(&neigh->refcnt))
+ if (atomic_dec_and_test(&neigh->refcnt)) {
+ WARN_ON_ONCE(timer_pending(&neigh->timer));
neigh_destroy(neigh);
+ }
}
static inline struct neighbour * neigh_clone(struct neighbour *neigh)
@@ -308,7 +310,10 @@ static inline struct neighbour * neigh_clone(struct neighbour *neigh)
return neigh;
}
-#define neigh_hold(n) atomic_inc(&(n)->refcnt)
+static inline void neigh_hold(struct neighbour *neigh)
+{
+ WARN_ON_ONCE(atomic_inc_return(&neigh->refcnt) == 1);
+}
static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
{
--
2.1.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: neigh use-after-free
2015-04-03 20:27 neigh use-after-free Jörn Engel
@ 2015-04-03 20:52 ` Alexei Potashnik
2015-04-03 23:34 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Potashnik @ 2015-04-03 20:52 UTC (permalink / raw)
To: Joern Engel, David S. Miller; +Cc: netdev
Would this be an appropriate solution:
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index b49e8ba..38265f2 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -168,7 +168,8 @@ static int neigh_forced_gc(struct neigh_table *tbl)
static void neigh_add_timer(struct neighbour *n, unsigned long when)
{
- neigh_hold(n);
+ if (!atomic_inc_not_zero(&n->refcnt))
+ return;
if (unlikely(mod_timer(&n->timer, when))) {
printk("NEIGH: BUG, double timer add, state is %x\n",
n->nud_state);
Alexei
-----Original Message-----
From: Jörn Engel [mailto:joern@purestorage.com]
Sent: Friday, April 03, 2015 1:28 PM
To: David S. Miller
Cc: netdev@vger.kernel.org; Alexei Potashnik
Subject: neigh use-after-free
After hunting down a mystery crash in the timer code, we added a debug
patch (attached) and got two useful backtraces.
[ 4974.815900] ------------[ cut here ]------------ [ 4974.815914]
WARNING: at include/net/neighbour.h:315 neigh_hold.part.26+0x1e/0x27()
[ 4974.815987] CPU: 47 PID: 9066 Comm: iscsi_ttx-8 Tainted: G O
3.10.59+
[ 4974.815989] Hardware name: /0X3D66, BIOS 2.0.19 08/29/2013
[ 4974.815990] 0000000000000009 ffff884d472e1828 ffffffff8163baae
ffff884d472e1860 [ 4974.815998] ffffffff8103f141 ffff884e267e0c00
ffff884d55121ef0 ffff884e267e0c28 [ 4974.816001] 0000000000000000
ffff887f0aad2b58 ffff884d472e1870 ffffffff8103f21a [ 4974.816004] Call
Trace:
[ 4974.816012] [<ffffffff8163baae>] dump_stack+0x19/0x1b [ 4974.816017]
[<ffffffff8103f141>] warn_slowpath_common+0x61/0x80 [ 4974.816019]
[<ffffffff8103f21a>] warn_slowpath_null+0x1a/0x20 [ 4974.816021]
[<ffffffff8163de95>] neigh_hold.part.26+0x1e/0x27 [ 4974.816027]
[<ffffffff8155f08c>] neigh_add_timer+0x3c/0x60 [ 4974.816029]
[<ffffffff8155f1ab>] __neigh_event_send+0xfb/0x220 [ 4974.816031]
[<ffffffff8155f40b>] neigh_resolve_output+0x13b/0x220 [ 4974.816038]
[<ffffffff8158c950>] ip_finish_output+0x1b0/0x3b0 [ 4974.816040]
[<ffffffff8158ddd8>] ip_output+0x58/0x90 [ 4974.816042]
[<ffffffff8158d5a5>] ip_local_out+0x25/0x30 [ 4974.816045]
[<ffffffff8158d8f7>] ip_queue_xmit+0x137/0x380 [ 4974.816051]
[<ffffffff815a47a5>] tcp_transmit_skb+0x445/0x8a0 [ 4974.816054]
[<ffffffff815a4d40>] tcp_write_xmit+0x140/0xb00 [ 4974.816058]
[<ffffffff815a59ae>] __tcp_push_pending_frames+0x2e/0xc0
[ 4974.816062] [<ffffffff81597198>] tcp_sendmsg+0x118/0xd90 [
4974.816070] [<ffffffff81278b55>] ? debug_object_deactivate+0x115/0x170
[ 4974.816076] [<ffffffff815bf434>] inet_sendmsg+0x64/0xb0 [ 4974.816080]
[<ffffffff8153da56>] sock_sendmsg+0x76/0x90 [ 4974.816086]
[<ffffffff81046e89>] ? local_bh_enable_ip+0x89/0xf0 [ 4974.816092]
[<ffffffff81641d75>] ? _raw_spin_lock_irqsave+0x25/0x60 [ 4974.816095]
[<ffffffff8153daa7>] kernel_sendmsg+0x37/0x50 [ 4974.816106]
[<ffffffffa06e6536>] tx_data+0xb6/0x160 [iscsi_target_mod] [ 4974.816111]
[<ffffffffa06e661a>] iscsit_send_tx_data+0x3a/0x90 [iscsi_target_mod] [
4974.816116] [<ffffffffa06e899d>]
iscsit_send_unsolicited_nopin+0x7d/0x170 [iscsi_target_mod] [ 4974.816121]
[<ffffffffa06e92c5>] iscsit_immediate_queue+0x355/0x4c0 [iscsi_target_mod]
[ 4974.816126] [<ffffffffa06ebb08>] ? iscsi_target_tx_thread+0xe8/0x240
[iscsi_target_mod] [ 4974.816131] [<ffffffffa06ebb16>]
iscsi_target_tx_thread+0xf6/0x240 [iscsi_target_mod] [ 4974.816135]
[<ffffffff810657e0>] ? wake_up_bit+0x30/0x30 [ 4974.816140]
[<ffffffffa06eba20>] ? iscsit_thread_get_cpumask+0x30/0x30
[iscsi_target_mod] [ 4974.816142] [<ffffffff81064aa0>] kthread+0xc0/0xd0
[ 4974.816145] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.816150] [<ffffffff8164a56c>] ret_from_fork+0x7c/0xb0 [
4974.816152] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.816153] ---[ end trace 1e6b1f72dd5d5dc7 ]--- [ 4974.885829]
------------[ cut here ]------------ [ 4974.885841] WARNING: at
lib/debugobjects.c:260 debug_print_object+0x83/0xa0() [ 4974.885846]
ODEBUG: free active (active state 0) object type: timer_list hint:
neigh_timer_handler+0x0/0x310
[ 4974.885920] CPU: 46 PID: 240 Comm: ksoftirqd/46 Tainted: G W O
3.10.59+
[ 4974.885923] 0000000000000009 ffff883f268edc18 ffffffff8163baae
ffff883f268edc50 [ 4974.885927] ffffffff8103f141 ffff884d77102f28
ffffffff81e356e0 ffffffff81b70b74 [ 4974.885932] ffffffff83113a10
0000000000000001 ffff883f268edcb0 ffffffff8103f1ac [ 4974.885935] Call
Trace:
[ 4974.885942] [<ffffffff8163baae>] dump_stack+0x19/0x1b [ 4974.885946]
[<ffffffff8103f141>] warn_slowpath_common+0x61/0x80 [ 4974.885948]
[<ffffffff8103f1ac>] warn_slowpath_fmt+0x4c/0x50 [ 4974.885950]
[<ffffffff81278233>] debug_print_object+0x83/0xa0 [ 4974.885953]
[<ffffffff815604d0>] ? neigh_periodic_work+0x1f0/0x1f0 [ 4974.885956]
[<ffffffff8127936b>] debug_check_no_obj_freed+0x20b/0x250
[ 4974.885959] [<ffffffff810bd901>] ? rcu_process_callbacks+0x261/0x5f0 [
4974.885963] [<ffffffff81142840>] kfree+0x90/0x160 [ 4974.885965]
[<ffffffff810bd901>] rcu_process_callbacks+0x261/0x5f0 [ 4974.885969]
[<ffffffff81047e3f>] __do_softirq+0xff/0x250 [ 4974.885971]
[<ffffffff81047fcd>] run_ksoftirqd+0x3d/0x60 [ 4974.885974]
[<ffffffff8106c95f>] smpboot_thread_fn+0xff/0x1a0 [ 4974.885977]
[<ffffffff8106c860>] ? lg_local_lock_cpu+0x40/0x40 [ 4974.885980]
[<ffffffff81064aa0>] kthread+0xc0/0xd0 [ 4974.885983]
[<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.885988] [<ffffffff8164a56c>] ret_from_fork+0x7c/0xb0 [
4974.885991] [<ffffffff810649e0>] ? kthread_create_on_node+0x120/0x120
[ 4974.885992] ---[ end trace 1e6b1f72dd5d5dc8 ]---
neigh_add_timer() takes a reference on an object that already had a
refcount of zero, so neigh_destroy() was called and, module an rcu grace
period, the object has been freed. That's bad.
Alexei suspects the problem might be in ip_finish_output2().
__ipv4_neigh_lookup_noref() is done inside rcu_read_lock_bh(), so it
should be safe. Unless we schedule a timer and therefore access the
object after rcu_read_unlock_bh().
Maybe someone more familiar with the code has an idea?
Jörn
--
Money can buy bandwidth, but latency is forever.
-- John R. Mashey
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: neigh use-after-free
2015-04-03 20:52 ` Alexei Potashnik
@ 2015-04-03 23:34 ` Eric Dumazet
2015-04-04 0:27 ` Alexei Potashnik
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-04-03 23:34 UTC (permalink / raw)
To: Alexei Potashnik; +Cc: Joern Engel, David S. Miller, netdev
On Fri, 2015-04-03 at 13:52 -0700, Alexei Potashnik wrote:
> Would this be an appropriate solution:
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index b49e8ba..38265f2 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -168,7 +168,8 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>
> static void neigh_add_timer(struct neighbour *n, unsigned long when)
> {
> - neigh_hold(n);
> + if (!atomic_inc_not_zero(&n->refcnt))
> + return;
> if (unlikely(mod_timer(&n->timer, when))) {
> printk("NEIGH: BUG, double timer add, state is %x\n",
> n->nud_state);
This is a very ugly hack. Please find root cause.
The correct way to implement refcount on a timer is :
if (!mod_timer(&n->timer, when))
neigh_hold(&n->refcnt);
And current code seems to do that (It dumps a printk() and stack trace
otherwise)
Look at sk_reset_timer() for a good example.
Caller of this function must own a reference.
(Or the socket lock in case of sk_reset_timer())
If refcnt is 0 at this point, it means there is another bug, and we
should fix it instead of trying to work around it.
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: neigh use-after-free
2015-04-03 23:34 ` Eric Dumazet
@ 2015-04-04 0:27 ` Alexei Potashnik
2015-04-04 0:44 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Potashnik @ 2015-04-04 0:27 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Joern Engel, David S. Miller, netdev
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>
> On Fri, 2015-04-03 at 13:52 -0700, Alexei Potashnik wrote:
> > Would this be an appropriate solution:
> >
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c index
> > b49e8ba..38265f2 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -168,7 +168,8 @@ static int neigh_forced_gc(struct neigh_table
> > *tbl)
> >
> > static void neigh_add_timer(struct neighbour *n, unsigned long when)
> > {
> > - neigh_hold(n);
> > + if (!atomic_inc_not_zero(&n->refcnt))
> > + return;
> > if (unlikely(mod_timer(&n->timer, when))) {
> > printk("NEIGH: BUG, double timer add, state is %x\n",
> > n->nud_state);
>
> This is a very ugly hack. Please find root cause.
>
> The correct way to implement refcount on a timer is :
>
> if (!mod_timer(&n->timer, when))
> neigh_hold(&n->refcnt);
>
> And current code seems to do that (It dumps a printk() and stack trace
> otherwise)
The printk() and stack trace don't get printed because this is a different
case --
it’s not double timer add, but rather add of the timer after neighbor object
is deleted.
> If refcnt is 0 at this point, it means there is another bug, and we should
> fix it
> instead of trying to work around it.
I agree. And it seems to be clear what code does it:
ip_finish_output2
+-> __ipv4_neigh_lookup_noref
Lookup name is clearly indicating that noref is intentional. What's not
clear is why.
The change seems to have originated in
commit a263b3093641fb1ec377582c90986a7fd0625184
Author: David S. Miller <davem@davemloft.net>
Date: Mon Jul 2 02:02:15 2012
ipv4: Make neigh lookups directly in output packet path.
Do not use the dst cached neigh, we'll be getting rid of that.
Signed-off-by: David S. Miller <davem@davemloft.net>
Before that neigh object came from dst_get_neighbour_noref(), which used to
return dst->_neighbour. Now, if dst->_neighbour used own a reference, than
the
above commit has introduced the bug. Otherwise, the issue has always been
there
and, perhaps, David Miller can explain why it was deemed safe to operate on
neigh
object here without extra ref.
Alexei
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: neigh use-after-free
2015-04-04 0:27 ` Alexei Potashnik
@ 2015-04-04 0:44 ` Eric Dumazet
2015-04-04 0:57 ` Alexei Potashnik
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-04-04 0:44 UTC (permalink / raw)
To: Alexei Potashnik; +Cc: Joern Engel, David S. Miller, netdev
On Fri, 2015-04-03 at 17:27 -0700, Alexei Potashnik wrote:
> > -----Original Message-----
> > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> >
> > On Fri, 2015-04-03 at 13:52 -0700, Alexei Potashnik wrote:
> > > Would this be an appropriate solution:
> > >
> > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c index
> > > b49e8ba..38265f2 100644
> > > --- a/net/core/neighbour.c
> > > +++ b/net/core/neighbour.c
> > > @@ -168,7 +168,8 @@ static int neigh_forced_gc(struct neigh_table
> > > *tbl)
> > >
> > > static void neigh_add_timer(struct neighbour *n, unsigned long when)
> > > {
> > > - neigh_hold(n);
> > > + if (!atomic_inc_not_zero(&n->refcnt))
> > > + return;
> > > if (unlikely(mod_timer(&n->timer, when))) {
> > > printk("NEIGH: BUG, double timer add, state is %x\n",
> > > n->nud_state);
> >
> > This is a very ugly hack. Please find root cause.
> >
> > The correct way to implement refcount on a timer is :
> >
> > if (!mod_timer(&n->timer, when))
> > neigh_hold(&n->refcnt);
> >
> > And current code seems to do that (It dumps a printk() and stack trace
> > otherwise)
>
> The printk() and stack trace don't get printed because this is a different
> case --
> it’s not double timer add, but rather add of the timer after neighbor object
> is deleted.
>
> > If refcnt is 0 at this point, it means there is another bug, and we should
> > fix it
> > instead of trying to work around it.
>
> I agree. And it seems to be clear what code does it:
>
> ip_finish_output2
> +-> __ipv4_neigh_lookup_noref
>
> Lookup name is clearly indicating that noref is intentional. What's not
> clear is why.
> The change seems to have originated in
>
> commit a263b3093641fb1ec377582c90986a7fd0625184
> Author: David S. Miller <davem@davemloft.net>
> Date: Mon Jul 2 02:02:15 2012
>
> ipv4: Make neigh lookups directly in output packet path.
>
> Do not use the dst cached neigh, we'll be getting rid of that.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Before that neigh object came from dst_get_neighbour_noref(), which used to
> return dst->_neighbour. Now, if dst->_neighbour used own a reference, than
> the
> above commit has introduced the bug. Otherwise, the issue has always been
> there
> and, perhaps, David Miller can explain why it was deemed safe to operate on
> neigh
> object here without extra ref.
>
> Alexei
Your debugging code might be wrong.
At entry of neigh_add_timer(), n->refcnt can be 0 if caller holds
the correct lock, preventing the timer handler to proceed and decrement
refcnt.
neigh_timer_handler() does :
write_lock(&neigh->lock);
...
write_unlock(&neigh->lock);
neigh_release(neigh);
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: neigh use-after-free
2015-04-04 0:44 ` Eric Dumazet
@ 2015-04-04 0:57 ` Alexei Potashnik
2015-04-04 1:10 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Potashnik @ 2015-04-04 0:57 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Joern Engel, David S. Miller, netdev
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, April 03, 2015 5:44 PM
> To: Alexei Potashnik
> Cc: Joern Engel; David S. Miller; netdev@vger.kernel.org
> Subject: Re: neigh use-after-free
>
> On Fri, 2015-04-03 at 17:27 -0700, Alexei Potashnik wrote:
> > > -----Original Message-----
> > > From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> > >
> > > On Fri, 2015-04-03 at 13:52 -0700, Alexei Potashnik wrote:
> > > > Would this be an appropriate solution:
> > > >
> > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c index
> > > > b49e8ba..38265f2 100644
> > > > --- a/net/core/neighbour.c
> > > > +++ b/net/core/neighbour.c
> > > > @@ -168,7 +168,8 @@ static int neigh_forced_gc(struct neigh_table
> > > > *tbl)
> > > >
> > > > static void neigh_add_timer(struct neighbour *n, unsigned long
> > > > when) {
> > > > - neigh_hold(n);
> > > > + if (!atomic_inc_not_zero(&n->refcnt))
> > > > + return;
> > > > if (unlikely(mod_timer(&n->timer, when))) {
> > > > printk("NEIGH: BUG, double timer add, state is
> > > > %x\n",
> > > > n->nud_state);
> > >
> > > This is a very ugly hack. Please find root cause.
> > >
> > > The correct way to implement refcount on a timer is :
> > >
> > > if (!mod_timer(&n->timer, when))
> > > neigh_hold(&n->refcnt);
> > >
> > > And current code seems to do that (It dumps a printk() and stack
> > > trace
> > > otherwise)
> >
> > The printk() and stack trace don't get printed because this is a
> > different case -- it’s not double timer add, but rather add of the
> > timer after neighbor object is deleted.
> >
> > > If refcnt is 0 at this point, it means there is another bug, and we
> > > should fix it instead of trying to work around it.
> >
> > I agree. And it seems to be clear what code does it:
> >
> > ip_finish_output2
> > +-> __ipv4_neigh_lookup_noref
> >
> > Lookup name is clearly indicating that noref is intentional. What's
> > not clear is why.
> > The change seems to have originated in
> >
> > commit a263b3093641fb1ec377582c90986a7fd0625184
> > Author: David S. Miller <davem@davemloft.net>
> > Date: Mon Jul 2 02:02:15 2012
> >
> > ipv4: Make neigh lookups directly in output packet path.
> >
> > Do not use the dst cached neigh, we'll be getting rid of that.
> >
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > Before that neigh object came from dst_get_neighbour_noref(), which
> > used to return dst->_neighbour. Now, if dst->_neighbour used own a
> > reference, than the above commit has introduced the bug. Otherwise,
> > the issue has always been there and, perhaps, David Miller can explain
> > why it was deemed safe to operate on neigh object here without extra
> > ref.
> >
> > Alexei
>
> Your debugging code might be wrong.
>
> At entry of neigh_add_timer(), n->refcnt can be 0 if caller holds the
> correct
> lock, preventing the timer handler to proceed and decrement refcnt.
>
> neigh_timer_handler() does :
>
> write_lock(&neigh->lock);
> ...
> write_unlock(&neigh->lock);
> neigh_release(neigh);
>
My understanding of the state when issue occurs is:
- neigh_destroy() has finished executing
- no timer for this neigh entry is pending or running
(which makes neigh_timer_handler behavior irrelevant)
- but kfree_rcu callback for the neigh entry has not executed yet
- at this point IP code finds the entry with __ipv4_neigh_lookup_noref
and starts operating on it.
Alexei
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: neigh use-after-free
2015-04-04 0:57 ` Alexei Potashnik
@ 2015-04-04 1:10 ` Eric Dumazet
2015-04-04 1:20 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-04-04 1:10 UTC (permalink / raw)
To: Alexei Potashnik; +Cc: Joern Engel, David S. Miller, netdev
On Fri, 2015-04-03 at 17:57 -0700, Alexei Potashnik wrote:
>
> My understanding of the state when issue occurs is:
> - neigh_destroy() has finished executing
> - no timer for this neigh entry is pending or running
> (which makes neigh_timer_handler behavior irrelevant)
> - but kfree_rcu callback for the neigh entry has not executed yet
> - at this point IP code finds the entry with __ipv4_neigh_lookup_noref
> and starts operating on it.
>
Right. So far no rule here seems to be violated.
As long as IP code finds a neighbor, it is allowed to use it.
Freeing of the neighbor should not happen before rcu grace period.
Problem is __neigh_event_send() is ignoring n->dead
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: neigh use-after-free
2015-04-04 1:10 ` Eric Dumazet
@ 2015-04-04 1:20 ` Eric Dumazet
2015-04-04 1:32 ` Alexei Potashnik
2015-04-04 16:06 ` Julian Anastasov
0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-04-04 1:20 UTC (permalink / raw)
To: Alexei Potashnik; +Cc: Joern Engel, David S. Miller, netdev
On Fri, 2015-04-03 at 18:10 -0700, Eric Dumazet wrote:
> On Fri, 2015-04-03 at 17:57 -0700, Alexei Potashnik wrote:
>
> >
> > My understanding of the state when issue occurs is:
> > - neigh_destroy() has finished executing
> > - no timer for this neigh entry is pending or running
> > (which makes neigh_timer_handler behavior irrelevant)
> > - but kfree_rcu callback for the neigh entry has not executed yet
> > - at this point IP code finds the entry with __ipv4_neigh_lookup_noref
> > and starts operating on it.
> >
>
> Right. So far no rule here seems to be violated.
>
> As long as IP code finds a neighbor, it is allowed to use it.
>
> Freeing of the neighbor should not happen before rcu grace period.
>
> Problem is __neigh_event_send() is ignoring n->dead
>
>
Please try following patch :
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 3de6542560288b3896ab243879a7b4a9b098ca0d..3a2928332b31bfd421ca409ba9bc4e82f82b3552 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -957,7 +957,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
-
+ if (neigh->dead)
+ goto out_unlock_bh;
if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
NEIGH_VAR(neigh->parms, APP_PROBES)) {
^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: neigh use-after-free
2015-04-04 1:20 ` Eric Dumazet
@ 2015-04-04 1:32 ` Alexei Potashnik
2015-04-04 16:06 ` Julian Anastasov
1 sibling, 0 replies; 10+ messages in thread
From: Alexei Potashnik @ 2015-04-04 1:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Joern Engel, David S. Miller, netdev
Thanks. Will try.
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>
>
> Please try following patch :
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c index
> 3de6542560288b3896ab243879a7b4a9b098ca0d..3a2928332b31bfd421ca409ba
> 9bc4e82f82b3552 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -957,7 +957,8 @@ int __neigh_event_send(struct neighbour *neigh,
> struct sk_buff *skb)
> rc = 0;
> if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY |
> NUD_PROBE))
> goto out_unlock_bh;
> -
> + if (neigh->dead)
> + goto out_unlock_bh;
> if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
> if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
> NEIGH_VAR(neigh->parms, APP_PROBES)) {
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: neigh use-after-free
2015-04-04 1:20 ` Eric Dumazet
2015-04-04 1:32 ` Alexei Potashnik
@ 2015-04-04 16:06 ` Julian Anastasov
1 sibling, 0 replies; 10+ messages in thread
From: Julian Anastasov @ 2015-04-04 16:06 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Alexei Potashnik, Joern Engel, David S. Miller, netdev
Hello,
On Fri, 3 Apr 2015, Eric Dumazet wrote:
> > Problem is __neigh_event_send() is ignoring n->dead
> >
> >
>
> Please try following patch :
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 3de6542560288b3896ab243879a7b4a9b098ca0d..3a2928332b31bfd421ca409ba9bc4e82f82b3552 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -957,7 +957,8 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
> rc = 0;
> if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
> goto out_unlock_bh;
At this point we can have NUD_STALE, NUD_INCOMPLETE or
NUD_FAILED. Not sure about NUD_STALE but for others we should
call kfree_skb(skb) and to return rc = 1. It is possible that
we never resolved this entry, so rc = 0 is not correct for all
cases.
When n->dead = 1 the neigh is unlinked but the
option to call somehow __neigh_create() looks complex.
As result, in a rare case we can drop packets while
neigh_periodic_work is removing NUD_STALE entry. May be
it can take a RCU grace period.
> -
> + if (neigh->dead)
> + goto out_unlock_bh;
> if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
> if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
> NEIGH_VAR(neigh->parms, APP_PROBES)) {
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-04-04 16:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-03 20:27 neigh use-after-free Jörn Engel
2015-04-03 20:52 ` Alexei Potashnik
2015-04-03 23:34 ` Eric Dumazet
2015-04-04 0:27 ` Alexei Potashnik
2015-04-04 0:44 ` Eric Dumazet
2015-04-04 0:57 ` Alexei Potashnik
2015-04-04 1:10 ` Eric Dumazet
2015-04-04 1:20 ` Eric Dumazet
2015-04-04 1:32 ` Alexei Potashnik
2015-04-04 16:06 ` Julian Anastasov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).