From: Steffen Klassert <steffen.klassert@secunet.com>
To: Dan Streetman <dan.streetman@canonical.com>
Cc: Dan Streetman <ddstreet@ieee.org>,
Jay Vosburgh <jay.vosburgh@canonical.com>,
<netdev@vger.kernel.org>
Subject: Re: xfrm4_garbage_collect reaching limit
Date: Fri, 11 Sep 2015 11:48:08 +0200
Message-ID: <20150911094808.GI25499@secunet.com>
In-Reply-To: <CAOZ2QJPt=3YmLG=S2Zf70yNPwLf-dbWtiyWJNbfYMRrQ5C6EZg@mail.gmail.com>
Hi Dan.
On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
> Hi Steffen,
>
> I've been working with Jay on an ipsec issue, which I believe he
> discussed with you.
Yes, we talked about this at the LPC.
> In this case xfrm4_garbage_collect
> returns an error because the number of xfrm4 dst entries has exceeded
> twice the gc_thresh, which causes new allocations of xfrm4 dst objects
> to fail, thus making the ipsec connection unusable (until dst objects
> are removed/freed).
>
> The main reason the count reaches the limit is that the
> xfrm4_policy_afinfo.garbage_collect function - which (indirectly)
> points to flow_cache_flush - doesn't actually guarantee any xfrm4
> dst will get cleaned up; it only cleans up unused entries.
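For reference, the check that trips here is xfrm4_garbage_collect() in
net/ipv4/xfrm4_policy.c; paraphrased from memory (details vary a bit
between kernel versions), it is roughly:

	static int xfrm4_garbage_collect(struct dst_ops *ops)
	{
		struct net *net = container_of(ops, struct net,
					       xfrm.xfrm4_dst_ops);

		/* flush unused flow cache entries ... */
		xfrm4_policy_afinfo.garbage_collect(net);

		/* ... but still fail if we remain above the hard limit */
		return dst_entries_get_slow(ops) > ops->gc_thresh * 2;
	}

So the flush frees only unused entries, and if the remaining (active)
entries still exceed 2 * gc_thresh, dst_alloc() refuses to create a
new xfrm4 dst.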
>
> The flow cache hashtable's high watermark does restrict how many
> flow cache entries exist (by shrinking each per-cpu hashtable once it
> has 4k entries), and therefore indirectly controls the total number of
> xfrm4 dst objects. However, there's a mismatch between the default
> xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
> objects) - and the flow cache hashtable limit of 4k objects per cpu.
> Any system with 16 or fewer cpus will have a total limit of 64k (or
> less) flow cache entries, so the 64k xfrm4 dst entry limit will never
> be exceeded. However, for any system with more than 16 cpus, the flow
> cache limit is greater than the xfrm4 dst limit, so the xfrm4 dst
> allocation can fail, rendering the ipsec connection unusable.
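Just to spell out the arithmetic with the defaults you quote (a flow
cache watermark of 4k entries per cpu, gc_thresh of 32k, and a hard
limit of 2 * gc_thresh):

    16 cpus: 16 * 4096 = 65536 <= 2 * 32768           -> limit never exceeded
    17 cpus: 17 * 4096 = 69632 >  2 * 32768 = 65536   -> allocations can fail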
>
> The most obvious solution is for the system admin to increase the
> xfrm4_gc_thresh value, although it's not at all obvious to the
> end-user what value they should set it to :-)
Yes, a static gc threshold is always wrong for some workloads. So
the user needs to adjust it to his needs, even if the right value
is not obvious.
> Possibly the
> default value of xfrm4_gc_thresh could be set proportional to
> num_online_cpus(), but that doesn't help when cpus are onlined after
> boot.
This could be an option; we could change the xfrm4_gc_thresh value from
a cpu notifier callback if more cpus come up after boot.
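A (completely untested) sketch of what such a callback could look like;
it ignores the per-netns copies of the dst_ops, and the scaling factor
is only an example:

	#include <linux/cpu.h>		/* register_hotcpu_notifier(), CPU_* */
	#include <linux/cpumask.h>	/* num_online_cpus() */

	/* Hypothetical: scale the IPv4 xfrm gc threshold with the number
	 * of online cpus, so that 2 * gc_thresh keeps pace with the 4k
	 * per-cpu flow cache watermark.
	 */
	static int xfrm4_gc_thresh_cpu_callback(struct notifier_block *nfb,
						unsigned long action,
						void *hcpu)
	{
		switch (action & ~CPU_TASKS_FROZEN) {
		case CPU_ONLINE:
		case CPU_DEAD:
			xfrm4_dst_ops.gc_thresh = num_online_cpus() * 2048;
			break;
		}
		return NOTIFY_OK;
	}

	static struct notifier_block xfrm4_gc_thresh_cpu_notifier = {
		.notifier_call = xfrm4_gc_thresh_cpu_callback,
	};

	/* registered once at init time, e.g. from xfrm4_init():
	 *	register_hotcpu_notifier(&xfrm4_gc_thresh_cpu_notifier);
	 */

A real patch would also have to decide how this interacts with a value
the admin has already set via the sysctl.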
> Also, a warning message indicating the xfrm4_gc_thresh limit
> was reached, and a suggestion to increase the limit, may help anyone
> who hits the issue.
>
> I'm not sure if something more aggressive is appropriate, like
> removing active entries during garbage collection.
It would not make too much sense to push an active flow out of the
fastpath just to add some other flow. If the number of active
entries is too high, there is no option other than increasing the
gc threshold.
You could try to reduce the number of active entries by shutting
down stale security associations frequently.
> Or, removing the
> failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can
> always be allocated,
This would open the door to DoS attacks; we can't do this.
> or just increasing it from gc_thresh * 2 up to *
> 4 or more.
This would just defer the problem, so it's not a real solution.
That said, whatever we do, we just paper over the real problem,
which is the flowcache itself. Everything that needs this kind
of garbage collecting is fundamentally broken. But as long as
nobody volunteers to work on a replacement, we have to live
with this situation somehow.