netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] prevent oops in udp rcv path
@ 2013-03-07 22:36 Tom Parkin
  2013-03-07 22:36 ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Tom Parkin
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Parkin @ 2013-03-07 22:36 UTC (permalink / raw)
  To: netdev; +Cc: Tom Parkin

During stress testing of some l2tp patches, I've been able to cause an oops in
the udp rcv path by tearing down l2tp sessions while they're passing data.
The oops text is below -- this is for a udp-encap l2tp tunnel containing a
number of ethernet pseudowires.

So far, I've only managed to reproduce this oops with a PREEMPT kernel running
on a VM.  Based on my debugging here it seems that the failure case is caused
by a fragmented IP packet being queued/reassembled across the device shutdown
event.  When such a packet hits udp_rcv, skb_dst(skb)->dev is NULL, which
leads to an oops when the receive code attempts to associate the skb with a
udp socket.

The accompanying patch, which I don't really propose as a fix so much as an
illustration of what goes wrong, "fixes" this problem by dropping packets with
a NULL dev field in the dst_entry.

I'm not sure what is the real root cause of this bug, though -- hence the RFC.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000478
IP: [<ffffffff81618c54>] __udp4_lib_rcv+0x514/0xa80
PGD ac38067 PUD 11492067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: l2tp_eth l2tp_netlink l2tp_core microcode psmouse seri0
CPU 0
Pid: 12607, comm: ip Tainted: G        W    3.8.0-l2tp-pw-fixups-2-dev-6+x
RIP: 0010:[<ffffffff81618c54>]  [<ffffffff81618c54>] __udp4_lib_rcv+0x5140
RSP: 0000:ffff880014403a10  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff880010c3f700 RCX: 000000006800000a
RDX: 0000000000008813 RSI: 000000008300000a RDI: 0000000000000246
RBP: ffff880014403a90 R08: 0000000000008813 R09: 0000000000000003
R10: 000000008300000a R11: 0000000000000001 R12: ffff88000c232ea2
R13: 0000000000000011 R14: ffffffff81cc19c0 R15: 00000000000005de
FS:  00007f2d5efd6700(0000) GS:ffff880014400000(0000) knlGS:00000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000478 CR3: 0000000011c35000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 12607, threadinfo ffff88000c05c000, task ffff88000bc1a2e)
Stack:
 ffff880014403a50 da950d56ef5b821f ffff88000c05dfd8 000000008300000a
 0000000000000003 ffff88006800000a 8813881314403a50 ffffffff81ce5750
 ffff880014403a90 6800000a8300000a 0000000000000246 ffff880010c3f700
Call Trace:
 <IRQ>
 [<ffffffff816191da>] udp_rcv+0x1a/0x20
 [<ffffffff815e0a96>] ip_local_deliver_finish+0xd6/0x530
 [<ffffffff815e0a0a>] ? ip_local_deliver_finish+0x4a/0x530
 [<ffffffff815e1b44>] ip_local_deliver+0x134/0x410
 [<ffffffff815e1080>] ip_rcv_finish+0x190/0x8d0
 [<ffffffff815e203d>] ip_rcv+0x21d/0x300
 [<ffffffff815a531b>] __netif_receive_skb_core+0xa7b/0xd50
 [<ffffffff815a49e2>] ? __netif_receive_skb_core+0x142/0xd50
 [<ffffffff815a5611>] __netif_receive_skb+0x21/0x70
 [<ffffffff815a5823>] netif_receive_skb+0x23/0x1f0
 [<ffffffff815a6678>] napi_gro_receive+0xe8/0x140
 [<ffffffffa0002aa8>] e1000_clean_rx_irq+0x2b8/0x520 [e1000]
 [<ffffffffa000317e>] e1000_clean+0x28e/0x990 [e1000]
 [<ffffffff810af709>] ? __lock_acquire+0x469/0x1e60
 [<ffffffff815a73a9>] net_rx_action+0x179/0x3c0
 [<ffffffff810ad496>] ? mark_held_locks+0x86/0x150
 [<ffffffff81084718>] ? sched_clock_cpu+0xb8/0x130
 [<ffffffff8104b548>] __do_softirq+0xe8/0x460
 [<ffffffff8136876d>] ? do_raw_spin_unlock+0x5d/0xb0
 [<ffffffff816e1a7c>] call_softirq+0x1c/0x30
 [<ffffffff810042a5>] do_softirq+0xa5/0xe0
 [<ffffffff8104ba6e>] irq_exit+0x9e/0xc0
 [<ffffffff816e1b73>] do_IRQ+0x63/0xd0
 [<ffffffff816d85b2>] common_interrupt+0x72/0x72
 <EOI>
 [<ffffffff81134c62>] ? find_get_page+0xb2/0x230
 [<ffffffff81134d95>] ? find_get_page+0x1e5/0x230
 [<ffffffff81134bb5>] ? find_get_page+0x5/0x230
 [<ffffffff81009d22>] ? native_sched_clock+0x22/0x80
 [<ffffffff8113696b>] filemap_fault+0x8b/0x4f0
 [<ffffffff8115b929>] __do_fault+0x69/0x4c0
 [<ffffffff810af709>] ? __lock_acquire+0x469/0x1e60
 [<ffffffff8115e800>] handle_pte_fault+0x90/0x850
 [<ffffffff81084575>] ? sched_clock_local+0x25/0x90
 [<ffffffff811601e1>] handle_mm_fault+0x241/0x340
 [<ffffffff816dbba7>] __do_page_fault+0x197/0x5e0
 [<ffffffff811ad7f9>] ? fget_light+0x3e9/0x4e0
 [<ffffffff813622bd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff816dbffe>] do_page_fault+0xe/0x10
 [<ffffffff816d88e8>] page_fault+0x28/0x30
Code: 45 85 c9 75 07 44 8b 8b a0 00 00 00 85 f6 8b 4a 10 44 8b 52 0c 0f 8
RIP  [<ffffffff81618c54>] __udp4_lib_rcv+0x514/0xa80
 RSP <ffff880014403a10>
CR2: 0000000000000478
---[ end trace da950d56ef5b8221 ]---

Tom Parkin (1):
  udp: don't rereference dst_entry dev pointer on rcv

 net/ipv4/udp.c |    3 +++
 1 file changed, 3 insertions(+)

--
1.7.9.5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-07 22:36 [RFC PATCH] prevent oops in udp rcv path Tom Parkin
@ 2013-03-07 22:36 ` Tom Parkin
  2013-03-07 22:47   ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Parkin @ 2013-03-07 22:36 UTC (permalink / raw)
  To: netdev; +Cc: Tom Parkin

When a fragmented IP packet is queued during device teardown, it is possible
for the reassembled packet to hit the UDP rcv path with a NULL dst_entry dev
pointer.  Drop such packets to prevent an oops.
---
 net/ipv4/udp.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0a073a2..c38a4b1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1700,6 +1700,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		return __udp4_lib_mcast_deliver(net, skb, uh,
 				saddr, daddr, udptable);
 
+	if (skb_dst(skb)->dev == NULL)
+		goto drop;
+
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 
 	if (sk != NULL) {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-07 22:36 ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Tom Parkin
@ 2013-03-07 22:47   ` Eric Dumazet
  2013-03-07 23:15     ` David Miller
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-03-07 22:47 UTC (permalink / raw)
  To: Tom Parkin; +Cc: netdev

On Thu, 2013-03-07 at 22:36 +0000, Tom Parkin wrote:
> When a fragmented IP packet is queued during device teardown, it is possible
> for the reassembled packet to hit the UDP rcv path with a NULL dst_entry dev
> pointer.  Drop such packets to prevent an oops.
> ---
>  net/ipv4/udp.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 0a073a2..c38a4b1 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1700,6 +1700,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  		return __udp4_lib_mcast_deliver(net, skb, uh,
>  				saddr, daddr, udptable);
>  
> +	if (skb_dst(skb)->dev == NULL)
> +		goto drop;
> +
>  	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
>  
>  	if (sk != NULL) {


Hmm... couldnt it be tested in reassembly layer instead ?

Why is it specific to UDP ?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-07 22:47   ` Eric Dumazet
@ 2013-03-07 23:15     ` David Miller
  2013-03-13 23:27       ` Tom Parkin
  0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2013-03-07 23:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tparkin, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 07 Mar 2013 14:47:24 -0800

> On Thu, 2013-03-07 at 22:36 +0000, Tom Parkin wrote:
>> When a fragmented IP packet is queued during device teardown, it is possible
>> for the reassembled packet to hit the UDP rcv path with a NULL dst_entry dev
>> pointer.  Drop such packets to prevent an oops.
>> ---
>>  net/ipv4/udp.c |    3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 0a073a2..c38a4b1 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -1700,6 +1700,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>>  		return __udp4_lib_mcast_deliver(net, skb, uh,
>>  				saddr, daddr, udptable);
>>  
>> +	if (skb_dst(skb)->dev == NULL)
>> +		goto drop;
>> +
>>  	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
>>  
>>  	if (sk != NULL) {
> 
> 
> Hmm... couldnt it be tested in reassembly layer instead ?
> 
> Why is it specific to UDP ?

Furthermore, when devices are unregistered we set the route's device
pointer to point to the loopback device, not NULL, exactly to avoid
this kind of problem.

I don't see anything in our generic DST handler nor the ipv4 specific
route handling, that would set dst->dev to NULL.

You really have to show us how this can actually happen.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-07 23:15     ` David Miller
@ 2013-03-13 23:27       ` Tom Parkin
  2013-03-14  1:18         ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Parkin @ 2013-03-13 23:27 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

[-- Attachment #1: Type: text/plain, Size: 5477 bytes --]

On Thu, Mar 07, 2013 at 06:15:27PM -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 07 Mar 2013 14:47:24 -0800
> 
> > On Thu, 2013-03-07 at 22:36 +0000, Tom Parkin wrote:
> >> When a fragmented IP packet is queued during device teardown, it is possible
> >> for the reassembled packet to hit the UDP rcv path with a NULL dst_entry dev
> >> pointer.  Drop such packets to prevent an oops.
> >> ---
> >>  net/ipv4/udp.c |    3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> >> index 0a073a2..c38a4b1 100644
> >> --- a/net/ipv4/udp.c
> >> +++ b/net/ipv4/udp.c
> >> @@ -1700,6 +1700,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
> >>  		return __udp4_lib_mcast_deliver(net, skb, uh,
> >>  				saddr, daddr, udptable);
> >>  
> >> +	if (skb_dst(skb)->dev == NULL)
> >> +		goto drop;
> >> +
> >>  	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
> >>  
> >>  	if (sk != NULL) {
> > 
> > 
> > Hmm... couldnt it be tested in reassembly layer instead ?
> > 
> > Why is it specific to UDP ?
> 
> Furthermore, when devices are unregistered we set the route's device
> pointer to point to the loopback device, not NULL, exactly to avoid
> this kind of problem.
> 
> I don't see anything in our generic DST handler nor the ipv4 specific
> route handling, that would set dst->dev to NULL.
> 
> You really have to show us how this can actually happen.

I've been working to this end, and while I don't have a root cause as
yet, I do have some more information.

I think what's happening is that the dst_entry refcounting is getting
screwed up either by the ip defrag code, or by something before that
in the rcv path.  What I see from ftrace debugging is that an skb
fragment ends up queued on the reassembly queue while pointing to a
dst_entry with a refcount of 0.  If the dst_entry should be deleted before
the final fragment in the frame arrives, then we end up accessing
free'd memory.

So far as I can make out, the l2tp code isn't doing anything untoward
which is causing this bug.  My stress test simply makes it easier to
reproduce because I'm setting up and tearing down routes and devices
a lot while passing data.  I'm lucky in that my dev branch seems to
reproduce this more easily than net-next master does, although the
same oops occurs on master if you're prepared to wait around for long
enough.

This ftrace debug log snippet shows the sort of behaviour I'm seeing.
The numbers in brackets after some dst pointer values represent the
refcount for that dst:

# The dst_entry is created with a refcount of 1
  <idle>-0     [000] ..s2   112.770192: dst_alloc: dst ffff880012bbb0c0, refcnt 1
# First fragment is queued
  <idle>-0     [000] ..s2   112.770193: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0(1) : is fragment
  <idle>-0     [000] ..s2   112.770206: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0 : fragment queued
# Second and final fragment arrives, reassemble
      ip-10970 [000] ..s1   112.770678: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : is fragment
# skb_morph bumps refcount to 2, skb_consume drops it back down to 1
      ip-10970 [000] ..s2   112.770682: ip_defrag: >>> clone skb ffff880010937e00 with dst ffff880012bbb0c0
      ip-10970 [000] ..s2   112.770691: __copy_skb_header: don't dst_clone ffff880012bbb0c0
      ip-10970 [000] ..s2   112.770691: ip_defrag: >>> morph skb ffff880010937e00 from ffff880012864600
      ip-10970 [000] ..s2   112.770692: skb_release_head_state: drop skb ffff880010937e00 dst ref
      ip-10970 [000] ..s2   112.770692: __copy_skb_header: cloning dst ffff880012bbb0c0 (skb ffff880012864600 -> skb ffff880010937e00)
      ip-10970 [000] ..s2   112.770692: ip_defrag: >>> consume skb ffff880012864600
      ip-10970 [000] ..s2   112.770693: skb_release_head_state: drop skb ffff880012864600 dst ref
      ip-10970 [000] ..s2   112.770693: dst_release: dst ffff880012bbb0c0 newrefcnt 1
      ip-10970 [000] ..s2   112.770698: ip_defrag: >>> coalesce loop
      ip-10970 [000] ..s2   112.770698: ip_defrag: kfree_skb_partial(ffff880010937500, false)
      ip-10970 [000] ..s2   112.770699: skb_release_head_state: drop skb ffff880010937500 dst ref
# skb is reassembled and delivered, dst has refcount of 1 now
      ip-10970 [000] ..s1   112.770705: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : queue defragmented
# l2tp_eth uses dev_forward_skb, which calls skb_dst_drop
      ip-10970 [000] ..s1   112.770707: skb_release_head_state: drop skb ffff880010937e00 dst ref
      ip-10970 [000] ..s1   112.770708: dst_release: dst ffff880012bbb0c0 newrefcnt 0
# Another skb arrives; dst refcount remains at 0
  <idle>-0     [000] ..s2   112.771481: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0(0) : is fragment
  <idle>-0     [000] ..s2   112.771494: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0 : fragment queued

The strange thing is that once the dst refcount reaches zero, another
skb hitting ip_input doesn't bump the refcount back up.  This is
partially why I'm not sure whether the error is caused by the defrag
code, or by something prior to that in the rcv path.
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-13 23:27       ` Tom Parkin
@ 2013-03-14  1:18         ` Eric Dumazet
  2013-03-14 14:45           ` Tom Parkin
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-03-14  1:18 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Wed, 2013-03-13 at 23:27 +0000, Tom Parkin wrote:

> I've been working to this end, and while I don't have a root cause as
> yet, I do have some more information.
> 
> I think what's happening is that the dst_entry refcounting is getting
> screwed up either by the ip defrag code, or by something before that
> in the rcv path.  What I see from ftrace debugging is that an skb
> fragment ends up queued on the reassembly queue while pointing to a
> dst_entry with a refcount of 0.  If the dst_entry should be deleted before
> the final fragment in the frame arrives, then we end up accessing
> free'd memory.
> 
> So far as I can make out, the l2tp code isn't doing anything untoward
> which is causing this bug.  My stress test simply makes it easier to
> reproduce because I'm setting up and tearing down routes and devices
> a lot while passing data.  I'm lucky in that my dev branch seems to
> reproduce this more easily than net-next master does, although the
> same oops occurs on master if you're prepared to wait around for long
> enough.
> 
> This ftrace debug log snippet shows the sort of behaviour I'm seeing.
> The numbers in brackets after some dst pointer values represent the
> refcount for that dst:
> 
> # The dst_entry is created with a refcount of 1
>   <idle>-0     [000] ..s2   112.770192: dst_alloc: dst ffff880012bbb0c0, refcnt 1
> # First fragment is queued
>   <idle>-0     [000] ..s2   112.770193: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0(1) : is fragment
>   <idle>-0     [000] ..s2   112.770206: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0 : fragment queued
> # Second and final fragment arrives, reassemble
>       ip-10970 [000] ..s1   112.770678: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : is fragment
> # skb_morph bumps refcount to 2, skb_consume drops it back down to 1
>       ip-10970 [000] ..s2   112.770682: ip_defrag: >>> clone skb ffff880010937e00 with dst ffff880012bbb0c0
>       ip-10970 [000] ..s2   112.770691: __copy_skb_header: don't dst_clone ffff880012bbb0c0
>       ip-10970 [000] ..s2   112.770691: ip_defrag: >>> morph skb ffff880010937e00 from ffff880012864600
>       ip-10970 [000] ..s2   112.770692: skb_release_head_state: drop skb ffff880010937e00 dst ref
>       ip-10970 [000] ..s2   112.770692: __copy_skb_header: cloning dst ffff880012bbb0c0 (skb ffff880012864600 -> skb ffff880010937e00)
>       ip-10970 [000] ..s2   112.770692: ip_defrag: >>> consume skb ffff880012864600
>       ip-10970 [000] ..s2   112.770693: skb_release_head_state: drop skb ffff880012864600 dst ref
>       ip-10970 [000] ..s2   112.770693: dst_release: dst ffff880012bbb0c0 newrefcnt 1
>       ip-10970 [000] ..s2   112.770698: ip_defrag: >>> coalesce loop
>       ip-10970 [000] ..s2   112.770698: ip_defrag: kfree_skb_partial(ffff880010937500, false)
>       ip-10970 [000] ..s2   112.770699: skb_release_head_state: drop skb ffff880010937500 dst ref
> # skb is reassembled and delivered, dst has refcount of 1 now
>       ip-10970 [000] ..s1   112.770705: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : queue defragmented
> # l2tp_eth uses dev_forward_skb, which calls skb_dst_drop
>       ip-10970 [000] ..s1   112.770707: skb_release_head_state: drop skb ffff880010937e00 dst ref
>       ip-10970 [000] ..s1   112.770708: dst_release: dst ffff880012bbb0c0 newrefcnt 0
> # Another skb arrives; dst refcount remains at 0
>   <idle>-0     [000] ..s2   112.771481: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0(0) : is fragment
>   <idle>-0     [000] ..s2   112.771494: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0 : fragment queued
> 
> The strange thing is that once the dst refcount reaches zero, another
> skb hitting ip_input doesn't bump the refcount back up.  This is
> partially why I'm not sure whether the error is caused by the defrag
> code, or by something prior to that in the rcv path.

Ah thanks for this, as this definitely makes more sense ;)

Could you try the following fix ?

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index b6d30ac..87f4ecb 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -529,6 +529,7 @@ found:
 	    qp->q.meat == qp->q.len)
 		return ip_frag_reasm(qp, prev, dev);
 
+	skb_dst_force(skb);
 	inet_frag_lru_move(&qp->q);
 	return -EINPROGRESS;
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-14  1:18         ` Eric Dumazet
@ 2013-03-14 14:45           ` Tom Parkin
  2013-03-14 15:05             ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Tom Parkin @ 2013-03-14 14:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 719 bytes --]

On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> Ah thanks for this, as this definitely makes more sense ;)
> 
> Could you try the following fix ?
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index b6d30ac..87f4ecb 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -529,6 +529,7 @@ found:
>  	    qp->q.meat == qp->q.len)
>  		return ip_frag_reasm(qp, prev, dev);
>  
> +	skb_dst_force(skb);
>  	inet_frag_lru_move(&qp->q);
>  	return -EINPROGRESS;
>

Thanks Eric, with this patch I can no longer reproduce the oops :-)
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-14 14:45           ` Tom Parkin
@ 2013-03-14 15:05             ` Eric Dumazet
  2013-03-14 15:29               ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-03-14 15:05 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Thu, 2013-03-14 at 14:45 +0000, Tom Parkin wrote:
> On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> > Ah thanks for this, as this definitely makes more sense ;)
> > 
> > Could you try the following fix ?
> > 
> > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > index b6d30ac..87f4ecb 100644
> > --- a/net/ipv4/ip_fragment.c
> > +++ b/net/ipv4/ip_fragment.c
> > @@ -529,6 +529,7 @@ found:
> >  	    qp->q.meat == qp->q.len)
> >  		return ip_frag_reasm(qp, prev, dev);
> >  
> > +	skb_dst_force(skb);
> >  	inet_frag_lru_move(&qp->q);
> >  	return -EINPROGRESS;
> >
> 
> Thanks Eric, with this patch I can no longer reproduce the oops :-)

Thanks for testing.

I am considering an alternative patch :

We can drop the reference instead, and use the dst of the last skb.

This would help to not dirty the dst refcount.

I'll send an updated version.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-14 15:05             ` Eric Dumazet
@ 2013-03-14 15:29               ` Eric Dumazet
  2013-03-14 16:14                 ` Tom Parkin
  2013-04-11 16:32                 ` Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-03-14 15:29 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Thu, 2013-03-14 at 16:05 +0100, Eric Dumazet wrote:
> On Thu, 2013-03-14 at 14:45 +0000, Tom Parkin wrote:
> > On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> > > Ah thanks for this, as this definitely makes more sense ;)
> > > 
> > > Could you try the following fix ?
> > > 
> > > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > > index b6d30ac..87f4ecb 100644
> > > --- a/net/ipv4/ip_fragment.c
> > > +++ b/net/ipv4/ip_fragment.c
> > > @@ -529,6 +529,7 @@ found:
> > >  	    qp->q.meat == qp->q.len)
> > >  		return ip_frag_reasm(qp, prev, dev);
> > >  
> > > +	skb_dst_force(skb);
> > >  	inet_frag_lru_move(&qp->q);
> > >  	return -EINPROGRESS;
> > >
> > 
> > Thanks Eric, with this patch I can no longer reproduce the oops :-)
> 
> Thanks for testing.
> 
> I am considering an alternative patch :
> 
> We can drop the reference instead, and use the dst of the last skb.
> 
> This would help to not dirty the dst refcount.
> 
> I'll send an updated version.
> 

OK, I tested the following one instead, please test it so that I can send an official patch.

Thanks

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index b6d30ac..2d23830 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -529,6 +529,7 @@ found:
 	    qp->q.meat == qp->q.len)
 		return ip_frag_reasm(qp, prev, dev);
 
+	skb_dst_drop(skb);
 	inet_frag_lru_move(&qp->q);
 	return -EINPROGRESS;
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-14 15:29               ` Eric Dumazet
@ 2013-03-14 16:14                 ` Tom Parkin
  2013-03-23 10:31                   ` Damien Wyart
  2013-04-11 16:32                 ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Tom Parkin @ 2013-03-14 16:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

On Thu, Mar 14, 2013 at 04:29:06PM +0100, Eric Dumazet wrote:
> OK, I tested the following one instead, please test it so that I can send an official patch.
> 
> Thanks
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index b6d30ac..2d23830 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -529,6 +529,7 @@ found:
>  	    qp->q.meat == qp->q.len)
>  		return ip_frag_reasm(qp, prev, dev);
>  
> +	skb_dst_drop(skb);
>  	inet_frag_lru_move(&qp->q);
>  	return -EINPROGRESS;
>

Thanks again, Eric.

Sadly, with this patch the oops is back :-(  It falls over quite
quickly for me.
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-14 16:14                 ` Tom Parkin
@ 2013-03-23 10:31                   ` Damien Wyart
  2013-03-23 15:09                     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Damien Wyart @ 2013-03-23 10:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Parkin, David Miller, netdev

Hi Eric,

* Tom Parkin <tparkin@katalix.com> [2013-03-14 16:14]:
> On Thu, Mar 14, 2013 at 04:29:06PM +0100, Eric Dumazet wrote:
> > OK, I tested the following one instead, please test it so that I can send an official patch.

> > Thanks

> > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > index b6d30ac..2d23830 100644
> > --- a/net/ipv4/ip_fragment.c
> > +++ b/net/ipv4/ip_fragment.c
> > @@ -529,6 +529,7 @@ found:
> >  	    qp->q.meat == qp->q.len)
> >  		return ip_frag_reasm(qp, prev, dev);

> > +	skb_dst_drop(skb);
> >  	inet_frag_lru_move(&qp->q);
> >  	return -EINPROGRESS;

> Thanks again, Eric.

> Sadly, with this patch the oops is back :-(  It falls over quite
> quickly for me.

Just a quick check: do you plan to have another look at this in the
coming days/weeks? You said you had a huge backlog, but in the meantime,
you sent many patches on many topics, so just wanted to check if this
one had not be forgotten for ever.

Thanks,
-- 
Damien Wyart

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-23 10:31                   ` Damien Wyart
@ 2013-03-23 15:09                     ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-03-23 15:09 UTC (permalink / raw)
  To: Damien Wyart; +Cc: Tom Parkin, David Miller, netdev

On Sat, 2013-03-23 at 11:31 +0100, Damien Wyart wrote:
> Hi Eric,
> 
> * Tom Parkin <tparkin@katalix.com> [2013-03-14 16:14]:
> > On Thu, Mar 14, 2013 at 04:29:06PM +0100, Eric Dumazet wrote:
> > > OK, I tested the following one instead, please test it so that I can send an official patch.
> 
> > > Thanks
> 
> > > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > > index b6d30ac..2d23830 100644
> > > --- a/net/ipv4/ip_fragment.c
> > > +++ b/net/ipv4/ip_fragment.c
> > > @@ -529,6 +529,7 @@ found:
> > >  	    qp->q.meat == qp->q.len)
> > >  		return ip_frag_reasm(qp, prev, dev);
> 
> > > +	skb_dst_drop(skb);
> > >  	inet_frag_lru_move(&qp->q);
> > >  	return -EINPROGRESS;
> 
> > Thanks again, Eric.
> 
> > Sadly, with this patch the oops is back :-(  It falls over quite
> > quickly for me.
> 
> Just a quick check: do you plan to have another look at this in the
> coming days/weeks? You said you had a huge backlog, but in the meantime,
> you sent many patches on many topics, so just wanted to check if this
> one had not be forgotten for ever.

Hi Damien. This one is part of my backlog, I will send a patch soon.

Thanks

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-03-14 15:29               ` Eric Dumazet
  2013-03-14 16:14                 ` Tom Parkin
@ 2013-04-11 16:32                 ` Eric Dumazet
  2013-04-11 17:53                   ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-04-11 16:32 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Thu, 2013-03-14 at 16:29 +0100, Eric Dumazet wrote:
> On Thu, 2013-03-14 at 16:05 +0100, Eric Dumazet wrote:
> > On Thu, 2013-03-14 at 14:45 +0000, Tom Parkin wrote:
> > > On Thu, Mar 14, 2013 at 02:18:04AM +0100, Eric Dumazet wrote:
> > > > Ah thanks for this, as this definitely makes more sense ;)
> > > > 
> > > > Could you try the following fix ?
> > > > 
> > > > diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> > > > index b6d30ac..87f4ecb 100644
> > > > --- a/net/ipv4/ip_fragment.c
> > > > +++ b/net/ipv4/ip_fragment.c
> > > > @@ -529,6 +529,7 @@ found:
> > > >  	    qp->q.meat == qp->q.len)
> > > >  		return ip_frag_reasm(qp, prev, dev);
> > > >  
> > > > +	skb_dst_force(skb);
> > > >  	inet_frag_lru_move(&qp->q);
> > > >  	return -EINPROGRESS;
> > > >
> > > 
> > > Thanks Eric, with this patch I can no longer reproduce the oops :-)
> > 
> > Thanks for testing.
> > 
> > I am considering an alternative patch :
> > 
> > We can drop the reference instead, and use the dst of the last skb.
> > 
> > This would help to not dirty the dst refcount.
> > 
> > I'll send an updated version.
> > 
> 
> OK, I tested the following one instead, please test it so that I can send an official patch.
> 
> Thanks
> 
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index b6d30ac..2d23830 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -529,6 +529,7 @@ found:
>  	    qp->q.meat == qp->q.len)
>  		return ip_frag_reasm(qp, prev, dev);
>  
> +	skb_dst_drop(skb);
>  	inet_frag_lru_move(&qp->q);
>  	return -EINPROGRESS;
>  
> 

Short update : I do not understand yet why this patch is not working.

Normally, the reassembled packet should get the dst from the last skb
(the one completing the packet)...

I have to make more experiments.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-04-11 16:32                 ` Eric Dumazet
@ 2013-04-11 17:53                   ` Eric Dumazet
  2013-04-11 22:26                     ` Eric Dumazet
                                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-04-11 17:53 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Thu, 2013-04-11 at 09:32 -0700, Eric Dumazet wrote:

> Short update : I do not understand yet why this patch is not working.
> 
> Normally, the reassembled packet should get the dst from the last skb
> (the one completing the packet)...
> 
> I have to make more experiments.

OK I think I've nailed it, please try following patch (I tried it on
net-next, but it should apply on previous kernels)

Thanks !

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 9385206..f8f1a87 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -494,9 +494,16 @@ found:
 		qp->q.max_size = skb->len + ihl;
 
 	if (qp->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
-	    qp->q.meat == qp->q.len)
-		return ip_frag_reasm(qp, prev, dev);
+	    qp->q.meat == qp->q.len) {
+		unsigned long dst_save = skb->_skb_refdst;
 
+		skb->_skb_refdst = 0;
+		err = ip_frag_reasm(qp, prev, dev);
+		skb->_skb_refdst = dst_save;
+		return err;
+	}
+
+	skb_dst_drop(skb);
 	inet_frag_lru_move(&qp->q);
 	return -EINPROGRESS;
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index e6e44ce..aac7fa5 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -342,9 +342,17 @@ found:
 	}
 
 	if (fq->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
-	    fq->q.meat == fq->q.len)
-		return ip6_frag_reasm(fq, prev, dev);
+	    fq->q.meat == fq->q.len) {
+		int err;
+		unsigned long dstsave = skb->_skb_refdst;
+
+		skb->_skb_refdst = 0;
+		err = ip6_frag_reasm(fq, prev, dev);
+		skb->_skb_refdst = dstsave;
+		return err;
+	}
 
+	skb_dst_drop(skb);
 	inet_frag_lru_move(&fq->q);
 	return -1;
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-04-11 17:53                   ` Eric Dumazet
@ 2013-04-11 22:26                     ` Eric Dumazet
  2013-04-12 22:32                       ` David Miller
  2013-04-12  7:04                     ` Tom Parkin
  2013-04-16 20:20                     ` Tom Parkin
  2 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-04-11 22:26 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Thu, 2013-04-11 at 10:53 -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 09:32 -0700, Eric Dumazet wrote:
> 
> > Short update : I do not understand yet why this patch is not working.
> > 
> > Normally, the reassembled packet should get the dst from the last skb
> > (the one completing the packet)...
> > 
> > I have to make more experiments.
> 
> OK I think I've nailed it, please try following patch (I tried it on
> net-next, but it should apply on previous kernels)

By the way I am ashamed by commit 64f3b9e203bd068550
(net: ip_expire() must revalidate route)

Reading its changelog now, I understand I should have fixed this the
way we did today...

:(

    Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
    added a bug in IP defragmentation handling, in case timeout is fired.
    
    When a frame is defragmented, we use last skb dst field when building
    final skb. Its dst is valid, since we are in rcu read section.
    
    But if a timeout occurs, we take first queued fragment to build one ICMP
    TIME EXCEEDED message. Problem is all queued skb have weak dst pointers,
    since we escaped RCU critical section after their queueing. icmp_send()
    might dereference a now freed (and possibly reused) part of memory.
    
    Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is
    the only possible choice.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-04-11 17:53                   ` Eric Dumazet
  2013-04-11 22:26                     ` Eric Dumazet
@ 2013-04-12  7:04                     ` Tom Parkin
  2013-04-16 20:20                     ` Tom Parkin
  2 siblings, 0 replies; 21+ messages in thread
From: Tom Parkin @ 2013-04-12  7:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 772 bytes --]

On Thu, Apr 11, 2013 at 10:53:24AM -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 09:32 -0700, Eric Dumazet wrote:
> 
> > Short update : I do not understand yet why this patch is not working.
> > 
> > Normally, the reassembled packet should get the dst from the last skb
> > (the one completing the packet)...
> > 
> > I have to make more experiments.
> 
> OK I think I've nailed it, please try following patch (I tried it on
> net-next, but it should apply on previous kernels)
> 
> Thanks !

Many thanks for this updated patch, Eric.  I will test here as soon as I
can and report back -- this will likely be some time next week.

-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-04-11 22:26                     ` Eric Dumazet
@ 2013-04-12 22:32                       ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2013-04-12 22:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tparkin, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 11 Apr 2013 15:26:33 -0700

> On Thu, 2013-04-11 at 10:53 -0700, Eric Dumazet wrote:
>> On Thu, 2013-04-11 at 09:32 -0700, Eric Dumazet wrote:
>> 
>> > Short update : I do not understand yet why this patch is not working.
>> > 
>> > Normally, the reassembled packet should get the dst from the last skb
>> > (the one completing the packet)...
>> > 
>> > I have to make more experiments.
>> 
>> OK I think I've nailed it, please try following patch (I tried it on
>> net-next, but it should apply on previous kernels)
> 
> By the way I am ashamed by commit 64f3b9e203bd068550
> (net: ip_expire() must revalidate route)
> 
> Reading its changelog now, I understand I should have fixed this the
> way we did today...
> 
> :(

There is still time to do it the right way Eric :-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-04-11 17:53                   ` Eric Dumazet
  2013-04-11 22:26                     ` Eric Dumazet
  2013-04-12  7:04                     ` Tom Parkin
@ 2013-04-16 20:20                     ` Tom Parkin
  2013-04-16 22:55                       ` [PATCH] net: drop dst before queueing fragments Eric Dumazet
  2013-04-16 22:56                       ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Eric Dumazet
  2 siblings, 2 replies; 21+ messages in thread
From: Tom Parkin @ 2013-04-16 20:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 823 bytes --]

On Thu, Apr 11, 2013 at 10:53:24AM -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 09:32 -0700, Eric Dumazet wrote:
> 
> > Short update : I do not understand yet why this patch is not working.
> > 
> > Normally, the reassembled packet should get the dst from the last skb
> > (the one completing the packet)...
> > 
> > I have to make more experiments.
> 
> OK I think I've nailed it, please try following patch (I tried it on
> net-next, but it should apply on previous kernels)
> 
> Thanks !

Thanks Eric, I've tested this patch here and it looks good.  Usually I
see an oops within 30 mins or so with my workload, but with this
change I've run for over two hours with no issues.

-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] net: drop dst before queueing fragments
  2013-04-16 20:20                     ` Tom Parkin
@ 2013-04-16 22:55                       ` Eric Dumazet
  2013-04-17  5:15                         ` David Miller
  2013-04-16 22:56                       ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-04-16 22:55 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

From: Eric Dumazet <edumazet@google.com>

Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, as non refcounted
dst could escape an RCU protected section.

Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed
the case of timeouts, but not the general problem.

Tom Parkin noticed crashes in UDP stack and provided a patch,
but further analysis permitted us to pinpoint the root cause.

Before queueing a packet into a frag list, we must drop its dst,
as this dst has limited lifetime (RCU protected)

When/if a packet is finally reassembled, we use the dst of the very
last skb, still protected by RCU and valid, as the dst of the
reassembled packet.

Use same logic in IPv6, as there is no need to hold dst references.

Reported-by: Tom Parkin <tparkin@katalix.com>
Tested-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_fragment.c |   14 ++++++++++----
 net/ipv6/reassembly.c  |   12 ++++++++++--
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a6445b8..52c273e 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -248,8 +248,7 @@ static void ip_expire(unsigned long arg)
 		if (!head->dev)
 			goto out_rcu_unlock;
 
-		/* skb dst is stale, drop it, and perform route lookup again */
-		skb_dst_drop(head);
+		/* skb has no dst, perform route lookup again */
 		iph = ip_hdr(head);
 		err = ip_route_input_noref(head, iph->daddr, iph->saddr,
 					   iph->tos, head->dev);
@@ -523,9 +522,16 @@ found:
 		qp->q.max_size = skb->len + ihl;
 
 	if (qp->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
-	    qp->q.meat == qp->q.len)
-		return ip_frag_reasm(qp, prev, dev);
+	    qp->q.meat == qp->q.len) {
+		unsigned long orefdst = skb->_skb_refdst;
 
+		skb->_skb_refdst = 0UL;
+		err = ip_frag_reasm(qp, prev, dev);
+		skb->_skb_refdst = orefdst;
+		return err;
+	}
+
+	skb_dst_drop(skb);
 	inet_frag_lru_move(&qp->q);
 	return -EINPROGRESS;
 
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 196ab93..0ba10e5 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -330,9 +330,17 @@ found:
 	}
 
 	if (fq->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
-	    fq->q.meat == fq->q.len)
-		return ip6_frag_reasm(fq, prev, dev);
+	    fq->q.meat == fq->q.len) {
+		int res;
+		unsigned long orefdst = skb->_skb_refdst;
+
+		skb->_skb_refdst = 0UL;
+		res = ip6_frag_reasm(fq, prev, dev);
+		skb->_skb_refdst = orefdst;
+		return res;
+	}
 
+	skb_dst_drop(skb);
 	inet_frag_lru_move(&fq->q);
 	return -1;
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
  2013-04-16 20:20                     ` Tom Parkin
  2013-04-16 22:55                       ` [PATCH] net: drop dst before queueing fragments Eric Dumazet
@ 2013-04-16 22:56                       ` Eric Dumazet
  1 sibling, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-04-16 22:56 UTC (permalink / raw)
  To: Tom Parkin; +Cc: David Miller, netdev

On Tue, 2013-04-16 at 21:20 +0100, Tom Parkin wrote:

> Thanks Eric, I've tested this patch here and it looks good.  Usually I
> see an oops within 30 mins or so with my workload, but with this
> change I've run for over two hours with no issues.
> 

Thanks Tom, I submitted the official patch.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] net: drop dst before queueing fragments
  2013-04-16 22:55                       ` [PATCH] net: drop dst before queueing fragments Eric Dumazet
@ 2013-04-17  5:15                         ` David Miller
  0 siblings, 0 replies; 21+ messages in thread
From: David Miller @ 2013-04-17  5:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tparkin, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 16 Apr 2013 15:55:41 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
> added a bug in IP defragmentation handling, as non refcounted
> dst could escape an RCU protected section.
> 
> Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed
> the case of timeouts, but not the general problem.
> 
> Tom Parkin noticed crashes in UDP stack and provided a patch,
> but further analysis permitted us to pinpoint the root cause.
> 
> Before queueing a packet into a frag list, we must drop its dst,
> as this dst has limited lifetime (RCU protected)
> 
> When/if a packet is finally reassembled, we use the dst of the very
> last skb, still protected by RCU and valid, as the dst of the
> reassembled packet.
> 
> Use same logic in IPv6, as there is no need to hold dst references.
> 
> Reported-by: Tom Parkin <tparkin@katalix.com>
> Tested-by: Tom Parkin <tparkin@katalix.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-04-17  5:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-07 22:36 [RFC PATCH] prevent oops in udp rcv path Tom Parkin
2013-03-07 22:36 ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Tom Parkin
2013-03-07 22:47   ` Eric Dumazet
2013-03-07 23:15     ` David Miller
2013-03-13 23:27       ` Tom Parkin
2013-03-14  1:18         ` Eric Dumazet
2013-03-14 14:45           ` Tom Parkin
2013-03-14 15:05             ` Eric Dumazet
2013-03-14 15:29               ` Eric Dumazet
2013-03-14 16:14                 ` Tom Parkin
2013-03-23 10:31                   ` Damien Wyart
2013-03-23 15:09                     ` Eric Dumazet
2013-04-11 16:32                 ` Eric Dumazet
2013-04-11 17:53                   ` Eric Dumazet
2013-04-11 22:26                     ` Eric Dumazet
2013-04-12 22:32                       ` David Miller
2013-04-12  7:04                     ` Tom Parkin
2013-04-16 20:20                     ` Tom Parkin
2013-04-16 22:55                       ` [PATCH] net: drop dst before queueing fragments Eric Dumazet
2013-04-17  5:15                         ` David Miller
2013-04-16 22:56                       ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).