Re: [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eyal Birger <eyal.birger@gmail.com>
To: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>,
	netfilter-devel@vger.kernel.org, Karel Rericha <karel@maxtel.cz>,
	Shmulik Ladkani <shmulik.ladkani@gmail.com>
Subject: Re: [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior
Date: Tue, 14 Dec 2021 12:37:27 +0200	[thread overview]
Message-ID: <CAHsH6Gs9wX7-_DS8MA8QQZEFzzP1A2AMhAeqMcSOLda2GE_-5Q@mail.gmail.com> (raw)
In-Reply-To: <20211201112454.GA2315@breakpoint.cc>

On Wed, Dec 1, 2021 at 1:24 PM Florian Westphal <fw@strlen.de> wrote:
>
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > I do not see how.
> >
> > I started a patchset, but the single hashtable for every netns might
> > defeat the batching, if there is a table per netns then it should be
> > similar to 67cc570edaa0.
>
> I see.
>
> > > for going with 2m.
> >
> > Default 2m is probably too large? This should be set at least to the
> > UDP unreplied timeout, ie. 30s?
>
> Perhaps but I don't think 30s is going to resolve the issue at hand
> (burstiness).
>
> > Probably set default scan interval to 20s and reduce it if there is
> > workload coming in the next scan round? It is possible to count for
> > the number of entries that will expired in the next round, if this
> > represents a large % of entries, then reduce the scan interval of the
> > vgarbage collector.
>
> I don't see how thios helps, see below.
>
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 770a63103c7a..3f6731a9c722 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -77,7 +77,8 @@ static __read_mostly bool nf_conntrack_locks_all;
> >  /* serialize hash resizes and nf_ct_iterate_cleanup */
> >  static DEFINE_MUTEX(nf_conntrack_mutex);
> >
> > -#define GC_SCAN_INTERVAL     (120u * HZ)
> > +/* Scan every 20 seconds which is 2/3 of the UDP unreplied timeout. */
> > +#define GC_SCAN_INTERVAL     (20u * HZ)
> >  #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10)
> >
> >  #define MIN_CHAINLEN 8u
> > @@ -1418,12 +1419,22 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
> >       return false;
> >  }
> >
> > +static bool nf_ct_is_expired_next_run(const struct nf_conn *ct)
> > +{
> > +     unsigned long next_timestamp = nfct_time_stamp + GC_SCAN_INTERVAL;
> > +
> > +     return (__s32)(ct->timeout - next_timestamp) <= 0;
> > +}
>
> Ok.
>
> >  static void gc_worker(struct work_struct *work)
> >  {
> > +     unsigned long next_run_expired_entries = 0, entries = 0, idle;
> >       unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION;
> >       unsigned int i, hashsz, nf_conntrack_max95 = 0;
> >       unsigned long next_run = GC_SCAN_INTERVAL;
> >       struct conntrack_gc_work *gc_work;
> > +     bool next_run_expired;
> > +
> >       gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
> >
> >       i = gc_work->next_bucket;
> > @@ -1448,6 +1459,8 @@ static void gc_worker(struct work_struct *work)
> >                       struct nf_conntrack_net *cnet;
> >                       struct net *net;
> >
> > +                     next_run_expired = false;
> > +                     entries++;
> >                       tmp = nf_ct_tuplehash_to_ctrack(h);
> >
> >                       if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > @@ -1458,6 +1471,9 @@ static void gc_worker(struct work_struct *work)
> >                       if (nf_ct_is_expired(tmp)) {
> >                               nf_ct_gc_expired(tmp);
> >                               continue;
> > +                     } else if (nf_ct_is_expired_next_run(tmp)) {
> > +                             next_run_expired = true;
> > +                             next_run_expired_entries++;
>
> This means this expires within next 20s, but not now.
>
> > @@ -1511,7 +1531,22 @@ static void gc_worker(struct work_struct *work)
> >       if (next_run) {
> >               gc_work->early_drop = false;
> >               gc_work->next_bucket = 0;
> > +             /*
> > +              * Calculate gc workload for the next run, adjust the gc
> > +              * interval not to reap expired entries in bursts.
> > +              *
> > +              * Adjust scan interval linearly based on the percentage of
> > +              * entries that will expire in the next run. The scan interval
> > +              * is inversely proportional to the workload.
> > +              */
> > +             if (entries == 0) {
> > +                     next_run = GC_SCAN_INTERVAL;
> > +             } else {
> > +                     idle = 100u - (next_run_expired_entries * 100u / entries);
> > +                     next_run = GC_SCAN_INTERVAL * idle / 100u;
>
> AFAICS we may now schedule next run for 'right now' even though that
> would not find any expired entries (they might all have a timeout of
> 19s). Next round would reap no entries, then resched again immediately
>
> (the nf_ct_is_expired_next_run expire count assumes next run is in
>  20s, not before).
>
> This would burn cycles for 19s before those entries can be expired.
>
> Not sure how to best avoid this, perhaps computing the remaining avg timeout
> of the nf_ct_is_expired_next_run() candidates would help?

At least for us - where our load is mostly constant - using an avg
seems like a good approach.

next prev parent reply	other threads:[~2021-12-14 10:37 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-21 17:05 [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior Florian Westphal
2021-11-23 13:24 ` Pablo Neira Ayuso
2021-11-23 13:30   ` Florian Westphal
2021-11-30 21:31     ` Pablo Neira Ayuso
2021-12-01 11:24       ` Florian Westphal
2021-12-14 10:37         ` Eyal Birger [this message]
2022-01-11 19:44           ` Florian Westphal
2021-11-23 14:01   ` Eyal Birger
2021-11-24  9:17     ` Karel Rericha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHsH6Gs9wX7-_DS8MA8QQZEFzzP1A2AMhAeqMcSOLda2GE_-5Q@mail.gmail.com \
    --to=eyal.birger@gmail.com \
    --cc=fw@strlen.de \
    --cc=karel@maxtel.cz \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=shmulik.ladkani@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).