netdev.vger.kernel.org archive mirror
From: Eyal Birger <eyal.birger@gmail.com>
To: Steffen Klassert <steffen.klassert@secunet.com>
Cc: <netdev@vger.kernel.org>, <herbert@gondor.apana.org.au>,
	<davem@davemloft.net>, <shmulik@metanetworks.com>,
	Wei Wang <weiwan@google.com>,
	fw@strlen.de
Subject: Re: xfrm, ip tunnel: non released device reference upon device unregistration
Date: Tue, 6 Feb 2018 12:42:02 +0200
Message-ID: <20180206124202.4c86ddc9@jimi>
In-Reply-To: <20180206085338.wvxybymmqn6r67j2@gauss3.secunet.de>

Hi Steffen,

On Tue, 6 Feb 2018 09:53:38 +0100
Steffen Klassert <steffen.klassert@secunet.com> wrote:

> Cc Wei Wang
> 
> On Sun, Feb 04, 2018 at 01:21:18PM +0200, Eyal Birger wrote:
> > Hi,
> > 
> > We've encountered a non released device reference upon device
> > unregistration which seems to stem from xfrm policy code.
> > 
> > The setup includes:
> > - an underlay device (e.g. eth0) using IPv4
> > - an xfrm IPv6 over IPv4 tunnel routed via the underlay device
> > - an ipip6 tunnel over the xfrm IPv6 tunnel
> > 
> > When tearing down the underlay device, after traffic had passed via
> > the ipip6 tunnel, log messages of the following form are observed:
> > 
> > unregister_netdevice: waiting for eth0 to become free. Usage count = 2
> 
> Looks like this happened when the dst garbage collection code was
> removed. I could not point to a commit that introduced it so I
> did a bisection and this pointed to:
> 
> commit 9514528d92d4cbe086499322370155ed69f5d06c
> ipv6: call dst_dev_put() properly
> 
> With this commit we leak the one refcount and some further commit
> leaked the second one.
> 
> > 
> > The below synthetic script reproduces this consistently on a fresh
> > ubuntu vm running net-next v4.15-6066-ge9522a5:
> > ---------------------------------------------------------
> > #!/bin/bash
> > 
> > ipsec_underlay_dst=192.168.6.1
> > ipsec_underlay_src=192.168.5.2
> > ipv6_pfx=1234
> > local_ipv6_addr="$ipv6_pfx::1"
> > remote_ipv6_addr="$ipv6_pfx::2"
> > 
> > # create dummy ipsec underlay
> > ip l add dev dummy1 type dummy
> > ip l set dev dummy1 up
> > ip r add "$ipsec_underlay_dst/32" dev dummy1
> > ip -6 r add "$ipv6_pfx::/16" dev dummy1
> > 
> > ip a add dev dummy1 "$local_ipv6_addr/128"
> > ip a add dev dummy1 "$ipsec_underlay_src/24"
> > 
> > # add xfrm policy and state
> > ip x p add src "$local_ipv6_addr/128" dst "$ipv6_pfx::/16" dir out \
> >     tmpl src "$ipsec_underlay_src" dst "$ipsec_underlay_dst" proto esp \
> >     reqid 1 mode tunnel
> > ip x s add src "$ipsec_underlay_src" dst "$ipsec_underlay_dst" \
> >     proto esp spi 0xcd440ce6 reqid 1 mode tunnel \
> >     auth-trunc 'hmac(sha1)' 0x34a546d309031628962b814ef073aff1a638ad21 96 \
> >     enc 'cbc(aes)' 0xf31e14149c328297fe7925ad7448420e \
> >     encap espinudp 4500 4500 0.0.0.0
> > 
> > # add 4o6 tunnel
> > ip l add tnl46 type ip6tnl mode ipip6 local "$local_ipv6_addr" \
> >     remote "$remote_ipv6_addr"
> > ip l set dev tnl46 up
> > ip r add 10.64.0.0/10 dev tnl46
> > 
> > # pass traffic so route is cached
> > ping -w 1 -c 1 10.64.0.1
> > 
> > # remove dummy underlay
> > ip l del dummy1
> > ---------------------------------------------------------
> > 
> > Analysis:
> > 
> > ip6_tunnel holds a dst_cache which caches its underlay dst objects.
> > When devices are unregistered, non-xfrm dst objects are invalidated
> > by their original creators (ipv4/ipv6/...) and thus are wiped from
> > dst_cache.
> > 
> > xfrm created routes otoh are not tracked by xfrm, and are not
> > invalidated upon device unregistration, thus hold the device upon
> > unregistration.
> > 
> > The following rough sketch patch illustrates an approach overcoming
> > this issue:
> > ---------------------------------------------------------  

[snip]

> > ---------------------------------------------------------
> > 
> > This approach has the unfortunate side effects of adding a spin
> > lock for the tracked list, as well as increasing struct xfrm_dst.  
> 
> Reintroducing garbage collection is probably not such a good idea. I
> think the patch below should fix it in a less intrusive way.
> 
> 
> Subject: [PATCH RFC] xfrm: Fix netdev refcount leak when flushing the
> percpu dst cache.
> 
> The dst garbage collection code is removed, so we need to call
> dst_dev_put() on cached dst entries before we release them.
> Otherwise we leak the refcount to the netdev.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/xfrm/xfrm_policy.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7a23078132cf..7836b7601b49 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1715,8 +1715,10 @@ static int xfrm_expand_policies(const struct flowi *fl, u16 family,
>  static void xfrm_last_dst_update(struct xfrm_dst *xdst, struct xfrm_dst *old)
>  {
>  	this_cpu_write(xfrm_last_dst, xdst);
> -	if (old)
> +	if (old) {
> +		dst_dev_put(&old->u.dst);
>  		dst_release(&old->u.dst);
> +	}
>  }
>  
>  static void __xfrm_pcpu_work_fn(void)
> @@ -1787,6 +1789,7 @@ void xfrm_policy_cache_flush(void)
>  		old = per_cpu(xfrm_last_dst, cpu);
>  		if (old && !xfrm_bundle_ok(old)) {
>  			per_cpu(xfrm_last_dst, cpu) = NULL;
> +			dst_dev_put(&old->u.dst);
>  			dst_release(&old->u.dst);
>  		}
>  		rcu_read_unlock();

I have tested this and indeed it prevents the leak.
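
For readers following along, the effect of the added dst_dev_put() calls can be sketched with a toy userspace model of the reference counting. This is not kernel code: the toy_* types and functions and underlay_refs_after_flush() are hypothetical names, the "placeholder device" is an assumption standing in for whatever device a dead dst is re-pointed to, and only a single leaked reference is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; only the reference counts are modelled. */
struct toy_dev {
	int refcnt;
};

struct toy_dst {
	struct toy_dev *dev;
	int refcnt;
};

/* Placeholder device that a dead dst is re-pointed to (assumption). */
static struct toy_dev toy_placeholder_dev;

static void toy_dst_init(struct toy_dst *dst, struct toy_dev *dev)
{
	dst->dev = dev;
	dst->refcnt = 1;
	dev->refcnt++;		/* the dst pins its device */
}

/* Model of dst_dev_put(): move the device reference to the placeholder. */
static void toy_dst_dev_put(struct toy_dst *dst)
{
	toy_placeholder_dev.refcnt++;
	dst->dev->refcnt--;
	dst->dev = &toy_placeholder_dev;
}

/* Model of dst_release(): the device is only dropped with the last reference. */
static void toy_dst_release(struct toy_dst *dst)
{
	assert(dst->refcnt > 0);
	if (--dst->refcnt == 0)
		dst->dev->refcnt--;
}

/* Returns the references left on the underlay after xfrm drops its cached dst. */
static int underlay_refs_after_flush(bool call_dev_put)
{
	struct toy_dev underlay = { .refcnt = 0 };
	struct toy_dst xdst;

	toy_dst_init(&xdst, &underlay);
	xdst.refcnt++;			/* the tunnel's dst_cache keeps a reference */

	if (call_dev_put)
		toy_dst_dev_put(&xdst);
	toy_dst_release(&xdst);		/* xfrm drops its cached reference */

	return underlay.refcnt;
}
```

In this model, the underlay stays pinned for as long as the tunnel's dst_cache holds the dst unless the device reference is handed off first, which is what the patch does.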

But... IIUC the xfrm_last_dst cache is a single instance that is updated
every time a new bundle is created, whereas ip6_tunnel uses a different
dst_cache for each tunnel.

Invalidating the dst every time a new bundle is created effectively means
that in a multi-tunnel scenario (multiple ip6_tunnels over multiple
xfrm policies) only one ip6_tunnel dst_cache holds a valid entry at a time.

When multiple tunnels are used at the same time, I think this
essentially renders the ip6_tunnel dst_cache useless.
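
The concern can be illustrated with a toy userspace model (a sketch under stated assumptions, not kernel code). It assumes one shared slot standing in for xfrm_last_dst on a given CPU, one cached dst per tunnel standing in for each ip6_tunnel dst_cache, and that replacing the slot marks the previous dst dead, as the added dst_dev_put() would. All toy_* names and count_rebuilds() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TUNNELS 8
#define MAX_DSTS 1024

/* Toy stand-in for an xfrm bundle; "dead" models the state after dst_dev_put(). */
struct toy_dst {
	bool dead;
};

static struct toy_dst pool[MAX_DSTS];
static size_t pool_used;
static struct toy_dst *last_dst;			/* models xfrm_last_dst on one CPU */
static struct toy_dst *tunnel_cache[MAX_TUNNELS];	/* models one dst_cache per tunnel */

/* Create a bundle and publish it in the single slot, killing the previous one. */
static struct toy_dst *toy_bundle_create(void)
{
	struct toy_dst *xdst;

	assert(pool_used < MAX_DSTS);
	xdst = &pool[pool_used++];
	xdst->dead = false;
	if (last_dst)
		last_dst->dead = true;	/* effect of the added dst_dev_put() */
	last_dst = xdst;
	return xdst;
}

/* Returns true if the tunnel's cached dst was still usable (a cache hit). */
static bool toy_tunnel_xmit(int tunnel)
{
	struct toy_dst *cached = tunnel_cache[tunnel];

	if (cached && !cached->dead)
		return true;
	tunnel_cache[tunnel] = toy_bundle_create();
	return false;
}

/* Send `rounds` packets round-robin over `ntunnels` tunnels; count cache misses. */
static int count_rebuilds(int ntunnels, int rounds)
{
	int misses = 0;

	assert(ntunnels > 0 && ntunnels <= MAX_TUNNELS);
	pool_used = 0;
	last_dst = NULL;
	for (int i = 0; i < MAX_TUNNELS; i++)
		tunnel_cache[i] = NULL;

	for (int i = 0; i < rounds; i++)
		if (!toy_tunnel_xmit(i % ntunnels))
			misses++;
	return misses;
}
```

With a single tunnel, only the first packet misses. With two tunnels used alternately, every packet misses in this model, because each new bundle kills the dst the other tunnel has cached.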

Eyal.
