From: Eyal Birger <eyal.birger@gmail.com>
To: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>,
netfilter-devel@vger.kernel.org, Karel Rericha <karel@maxtel.cz>,
Shmulik Ladkani <shmulik.ladkani@gmail.com>
Subject: Re: [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior
Date: Tue, 14 Dec 2021 12:37:27 +0200 [thread overview]
Message-ID: <CAHsH6Gs9wX7-_DS8MA8QQZEFzzP1A2AMhAeqMcSOLda2GE_-5Q@mail.gmail.com> (raw)
In-Reply-To: <20211201112454.GA2315@breakpoint.cc>
On Wed, Dec 1, 2021 at 1:24 PM Florian Westphal <fw@strlen.de> wrote:
>
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > I do not see how.
> >
> > I started a patchset, but the single hashtable for every netns might
> > defeat the batching, if there is a table per netns then it should be
> > similar to 67cc570edaa0.
>
> I see.
>
> > > for going with 2m.
> >
> > Default 2m is probably too large? This should be set at least to the
> > UDP unreplied timeout, ie. 30s?
>
> Perhaps but I don't think 30s is going to resolve the issue at hand
> (burstiness).
>
> > Probably set default scan interval to 20s and reduce it if there is
> > workload coming in the next scan round? It is possible to count for
> > the number of entries that will expired in the next round, if this
> > represents a large % of entries, then reduce the scan interval of the
> > vgarbage collector.
>
> I don't see how thios helps, see below.
>
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 770a63103c7a..3f6731a9c722 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -77,7 +77,8 @@ static __read_mostly bool nf_conntrack_locks_all;
> > /* serialize hash resizes and nf_ct_iterate_cleanup */
> > static DEFINE_MUTEX(nf_conntrack_mutex);
> >
> > -#define GC_SCAN_INTERVAL (120u * HZ)
> > +/* Scan every 20 seconds which is 2/3 of the UDP unreplied timeout. */
> > +#define GC_SCAN_INTERVAL (20u * HZ)
> > #define GC_SCAN_MAX_DURATION msecs_to_jiffies(10)
> >
> > #define MIN_CHAINLEN 8u
> > @@ -1418,12 +1419,22 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
> > return false;
> > }
> >
> > +static bool nf_ct_is_expired_next_run(const struct nf_conn *ct)
> > +{
> > + unsigned long next_timestamp = nfct_time_stamp + GC_SCAN_INTERVAL;
> > +
> > + return (__s32)(ct->timeout - next_timestamp) <= 0;
> > +}
>
> Ok.
>
> > static void gc_worker(struct work_struct *work)
> > {
> > + unsigned long next_run_expired_entries = 0, entries = 0, idle;
> > unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION;
> > unsigned int i, hashsz, nf_conntrack_max95 = 0;
> > unsigned long next_run = GC_SCAN_INTERVAL;
> > struct conntrack_gc_work *gc_work;
> > + bool next_run_expired;
> > +
> > gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
> >
> > i = gc_work->next_bucket;
> > @@ -1448,6 +1459,8 @@ static void gc_worker(struct work_struct *work)
> > struct nf_conntrack_net *cnet;
> > struct net *net;
> >
> > + next_run_expired = false;
> > + entries++;
> > tmp = nf_ct_tuplehash_to_ctrack(h);
> >
> > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > @@ -1458,6 +1471,9 @@ static void gc_worker(struct work_struct *work)
> > if (nf_ct_is_expired(tmp)) {
> > nf_ct_gc_expired(tmp);
> > continue;
> > + } else if (nf_ct_is_expired_next_run(tmp)) {
> > + next_run_expired = true;
> > + next_run_expired_entries++;
>
> This means this expires within next 20s, but not now.
>
> > @@ -1511,7 +1531,22 @@ static void gc_worker(struct work_struct *work)
> > if (next_run) {
> > gc_work->early_drop = false;
> > gc_work->next_bucket = 0;
> > + /*
> > + * Calculate gc workload for the next run, adjust the gc
> > + * interval not to reap expired entries in bursts.
> > + *
> > + * Adjust scan interval linearly based on the percentage of
> > + * entries that will expire in the next run. The scan interval
> > + * is inversely proportional to the workload.
> > + */
> > + if (entries == 0) {
> > + next_run = GC_SCAN_INTERVAL;
> > + } else {
> > + idle = 100u - (next_run_expired_entries * 100u / entries);
> > + next_run = GC_SCAN_INTERVAL * idle / 100u;
>
> AFAICS we may now schedule next run for 'right now' even though that
> would not find any expired entries (they might all have a timeout of
> 19s). Next round would reap no entries, then resched again immediately
>
> (the nf_ct_is_expired_next_run expire count assumes next run is in
> 20s, not before).
>
> This would burn cycles for 19s before those entries can be expired.
>
> Not sure how to best avoid this, perhaps computing the remaining avg timeout
> of the nf_ct_is_expired_next_run() candidates would help?
At least for us - where our load is mostly constant - using an avg
seems like a good approach.
next prev parent reply other threads:[~2021-12-14 10:37 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-21 17:05 [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior Florian Westphal
2021-11-23 13:24 ` Pablo Neira Ayuso
2021-11-23 13:30 ` Florian Westphal
2021-11-30 21:31 ` Pablo Neira Ayuso
2021-12-01 11:24 ` Florian Westphal
2021-12-14 10:37 ` Eyal Birger [this message]
2022-01-11 19:44 ` Florian Westphal
2021-11-23 14:01 ` Eyal Birger
2021-11-24 9:17 ` Karel Rericha
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAHsH6Gs9wX7-_DS8MA8QQZEFzzP1A2AMhAeqMcSOLda2GE_-5Q@mail.gmail.com \
--to=eyal.birger@gmail.com \
--cc=fw@strlen.de \
--cc=karel@maxtel.cz \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=shmulik.ladkani@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).