netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tom Parkin <tparkin@katalix.com>
To: David Miller <davem@davemloft.net>
Cc: eric.dumazet@gmail.com, netdev@vger.kernel.org
Subject: Re: [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv
Date: Wed, 13 Mar 2013 23:27:43 +0000	[thread overview]
Message-ID: <20130313232743.GA3686@raven> (raw)
In-Reply-To: <20130307.181527.390191009324148471.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 5477 bytes --]

On Thu, Mar 07, 2013 at 06:15:27PM -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 07 Mar 2013 14:47:24 -0800
> 
> > On Thu, 2013-03-07 at 22:36 +0000, Tom Parkin wrote:
> >> When a fragmented IP packet is queued during device teardown, it is possible
> >> for the reassembled packet to hit the UDP rcv path with a NULL dst_entry dev
> >> pointer.  Drop such packets to prevent an oops.
> >> ---
> >>  net/ipv4/udp.c |    3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> >> index 0a073a2..c38a4b1 100644
> >> --- a/net/ipv4/udp.c
> >> +++ b/net/ipv4/udp.c
> >> @@ -1700,6 +1700,9 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
> >>  		return __udp4_lib_mcast_deliver(net, skb, uh,
> >>  				saddr, daddr, udptable);
> >>  
> >> +	if (skb_dst(skb)->dev == NULL)
> >> +		goto drop;
> >> +
> >>  	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
> >>  
> >>  	if (sk != NULL) {
> > 
> > 
> > Hmm... couldnt it be tested in reassembly layer instead ?
> > 
> > Why is it specific to UDP ?
> 
> Furthermore, when devices are unregistered we set the route's device
> pointer to point to the loopback device, not NULL, exactly to avoid
> this kind of problem.
> 
> I don't see anything in our generic DST handler nor the ipv4 specific
> route handling, that would set dst->dev to NULL.
> 
> You really have to show us how this can actually happen.

I've been working to this end, and while I don't have a root cause as
yet, I do have some more information.

I think what's happening is that the dst_entry refcounting is getting
screwed up either by the ip defrag code, or by something before that
in the rcv path.  What I see from ftrace debugging is that an skb
fragment ends up queued on the reassembly queue while pointing to a
dst_entry with a refcount of 0.  If the dst_entry should be deleted before
the final fragment in the frame arrives, then we end up accessing
free'd memory.

So far as I can make out, the l2tp code isn't doing anything untoward
which is causing this bug.  My stress test simply makes it easier to
reproduce because I'm setting up and tearing down routes and devices
a lot while passing data.  I'm lucky in that my dev branch seems to
reproduce this more easily than net-next master does, although the
same oops occurs on master if you're prepared to wait around for long
enough.

This ftrace debug log snippet shows the sort of behaviour I'm seeing.
The numbers in brackets after some dst pointer values represent the
refcount for that dst:

# The dst_entry is created with a refcount of 1
  <idle>-0     [000] ..s2   112.770192: dst_alloc: dst ffff880012bbb0c0, refcnt 1
# First fragment is queued
  <idle>-0     [000] ..s2   112.770193: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0(1) : is fragment
  <idle>-0     [000] ..s2   112.770206: ip_local_deliver: skb ffff880012864600, dst ffff880012bbb0c0 : fragment queued
# Second and final fragment arrives, reassemble
      ip-10970 [000] ..s1   112.770678: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : is fragment
# skb_morph bumps refcount to 2, skb_consume drops it back down to 1
      ip-10970 [000] ..s2   112.770682: ip_defrag: >>> clone skb ffff880010937e00 with dst ffff880012bbb0c0
      ip-10970 [000] ..s2   112.770691: __copy_skb_header: don't dst_clone ffff880012bbb0c0
      ip-10970 [000] ..s2   112.770691: ip_defrag: >>> morph skb ffff880010937e00 from ffff880012864600
      ip-10970 [000] ..s2   112.770692: skb_release_head_state: drop skb ffff880010937e00 dst ref
      ip-10970 [000] ..s2   112.770692: __copy_skb_header: cloning dst ffff880012bbb0c0 (skb ffff880012864600 -> skb ffff880010937e00)
      ip-10970 [000] ..s2   112.770692: ip_defrag: >>> consume skb ffff880012864600
      ip-10970 [000] ..s2   112.770693: skb_release_head_state: drop skb ffff880012864600 dst ref
      ip-10970 [000] ..s2   112.770693: dst_release: dst ffff880012bbb0c0 newrefcnt 1
      ip-10970 [000] ..s2   112.770698: ip_defrag: >>> coalesce loop
      ip-10970 [000] ..s2   112.770698: ip_defrag: kfree_skb_partial(ffff880010937500, false)
      ip-10970 [000] ..s2   112.770699: skb_release_head_state: drop skb ffff880010937500 dst ref
# skb is reassembled and delivered, dst has refcount of 1 now
      ip-10970 [000] ..s1   112.770705: ip_local_deliver: skb ffff880010937e00, dst ffff880012bbb0c0(1) : queue defragmented
# l2tp_eth uses dev_forward_skb, which calls skb_dst_drop
      ip-10970 [000] ..s1   112.770707: skb_release_head_state: drop skb ffff880010937e00 dst ref
      ip-10970 [000] ..s1   112.770708: dst_release: dst ffff880012bbb0c0 newrefcnt 0
# Another skb arrives; dst refcount remains at 0
  <idle>-0     [000] ..s2   112.771481: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0(0) : is fragment
  <idle>-0     [000] ..s2   112.771494: ip_local_deliver: skb ffff880012864500, dst ffff880012bbb0c0 : fragment queued

The strange thing is that once the dst refcount reaches zero, another
skb hitting ip_input doesn't bump the refcount back up.  This is
partially why I'm not sure whether the error is caused by the defrag
code, or by something prior to that in the rcv path.
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

  reply	other threads:[~2013-03-13 23:27 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-07 22:36 [RFC PATCH] prevent oops in udp rcv path Tom Parkin
2013-03-07 22:36 ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Tom Parkin
2013-03-07 22:47   ` Eric Dumazet
2013-03-07 23:15     ` David Miller
2013-03-13 23:27       ` Tom Parkin [this message]
2013-03-14  1:18         ` Eric Dumazet
2013-03-14 14:45           ` Tom Parkin
2013-03-14 15:05             ` Eric Dumazet
2013-03-14 15:29               ` Eric Dumazet
2013-03-14 16:14                 ` Tom Parkin
2013-03-23 10:31                   ` Damien Wyart
2013-03-23 15:09                     ` Eric Dumazet
2013-04-11 16:32                 ` Eric Dumazet
2013-04-11 17:53                   ` Eric Dumazet
2013-04-11 22:26                     ` Eric Dumazet
2013-04-12 22:32                       ` David Miller
2013-04-12  7:04                     ` Tom Parkin
2013-04-16 20:20                     ` Tom Parkin
2013-04-16 22:55                       ` [PATCH] net: drop dst before queueing fragments Eric Dumazet
2013-04-17  5:15                         ` David Miller
2013-04-16 22:56                       ` [RFC PATCH] udp: don't rereference dst_entry dev pointer on rcv Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130313232743.GA3686@raven \
    --to=tparkin@katalix.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).