Re: [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Florian Westphal <fw@strlen.de>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>,
	netfilter-devel@vger.kernel.org, Karel Rericha <karel@maxtel.cz>,
	Shmulik Ladkani <shmulik.ladkani@gmail.com>,
	Eyal Birger <eyal.birger@gmail.com>
Subject: Re: [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior
Date: Wed, 1 Dec 2021 12:24:54 +0100	[thread overview]
Message-ID: <20211201112454.GA2315@breakpoint.cc> (raw)
In-Reply-To: <YaaYK9i2hixxbs70@salvia>

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I do not see how.
> 
> I started a patchset, but the single hashtable for every netns might
> defeat the batching, if there is a table per netns then it should be
> similar to 67cc570edaa0.

I see.

> > for going with 2m.
> 
> Default 2m is probably too large? This should be set at least to the
> UDP unreplied timeout, ie. 30s?

Perhaps but I don't think 30s is going to resolve the issue at hand
(burstiness).

> Probably set default scan interval to 20s and reduce it if there is
> workload coming in the next scan round? It is possible to count for
> the number of entries that will expired in the next round, if this
> represents a large % of entries, then reduce the scan interval of the
> vgarbage collector.

I don't see how thios helps, see below.

> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 770a63103c7a..3f6731a9c722 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -77,7 +77,8 @@ static __read_mostly bool nf_conntrack_locks_all;
>  /* serialize hash resizes and nf_ct_iterate_cleanup */
>  static DEFINE_MUTEX(nf_conntrack_mutex);
>  
> -#define GC_SCAN_INTERVAL	(120u * HZ)
> +/* Scan every 20 seconds which is 2/3 of the UDP unreplied timeout. */
> +#define GC_SCAN_INTERVAL	(20u * HZ)
>  #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
>  
>  #define MIN_CHAINLEN	8u
> @@ -1418,12 +1419,22 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
>  	return false;
>  }
>  
> +static bool nf_ct_is_expired_next_run(const struct nf_conn *ct)
> +{
> +	unsigned long next_timestamp = nfct_time_stamp + GC_SCAN_INTERVAL;
> +
> +	return (__s32)(ct->timeout - next_timestamp) <= 0;
> +}

Ok.

>  static void gc_worker(struct work_struct *work)
>  {
> +	unsigned long next_run_expired_entries = 0, entries = 0, idle;
>  	unsigned long end_time = jiffies + GC_SCAN_MAX_DURATION;
>  	unsigned int i, hashsz, nf_conntrack_max95 = 0;
>  	unsigned long next_run = GC_SCAN_INTERVAL;
>  	struct conntrack_gc_work *gc_work;
> +	bool next_run_expired;
> +
>  	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
>  
>  	i = gc_work->next_bucket;
> @@ -1448,6 +1459,8 @@ static void gc_worker(struct work_struct *work)
>  			struct nf_conntrack_net *cnet;
>  			struct net *net;
>  
> +			next_run_expired = false;
> +			entries++;
>  			tmp = nf_ct_tuplehash_to_ctrack(h);
>  
>  			if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> @@ -1458,6 +1471,9 @@ static void gc_worker(struct work_struct *work)
>  			if (nf_ct_is_expired(tmp)) {
>  				nf_ct_gc_expired(tmp);
>  				continue;
> +			} else if (nf_ct_is_expired_next_run(tmp)) {
> +				next_run_expired = true;
> +				next_run_expired_entries++;

This means this expires within next 20s, but not now.

> @@ -1511,7 +1531,22 @@ static void gc_worker(struct work_struct *work)
>  	if (next_run) {
>  		gc_work->early_drop = false;
>  		gc_work->next_bucket = 0;
> +		/*
> +		 * Calculate gc workload for the next run, adjust the gc
> +		 * interval not to reap expired entries in bursts.
> +		 *
> +		 * Adjust scan interval linearly based on the percentage of
> +		 * entries that will expire in the next run. The scan interval
> +		 * is inversely proportional to the workload.
> +		 */
> +		if (entries == 0) {
> +			next_run = GC_SCAN_INTERVAL;
> +		} else {
> +			idle = 100u - (next_run_expired_entries * 100u / entries);
> +			next_run = GC_SCAN_INTERVAL * idle / 100u;

AFAICS we may now schedule next run for 'right now' even though that
would not find any expired entries (they might all have a timeout of
19s). Next round would reap no entries, then resched again immediately

(the nf_ct_is_expired_next_run expire count assumes next run is in
 20s, not before).

This would burn cycles for 19s before those entries can be expired.

Not sure how to best avoid this, perhaps computing the remaining avg timeout
of the nf_ct_is_expired_next_run() candidates would help?

next prev parent reply	other threads:[~2021-12-01 11:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-21 17:05 [PATCH nf-next] netfilter: conntrack: allow to tune gc behavior Florian Westphal
2021-11-23 13:24 ` Pablo Neira Ayuso
2021-11-23 13:30   ` Florian Westphal
2021-11-30 21:31     ` Pablo Neira Ayuso
2021-12-01 11:24       ` Florian Westphal [this message]
2021-12-14 10:37         ` Eyal Birger
2022-01-11 19:44           ` Florian Westphal
2021-11-23 14:01   ` Eyal Birger
2021-11-24  9:17     ` Karel Rericha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211201112454.GA2315@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=eyal.birger@gmail.com \
    --cc=karel@maxtel.cz \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=shmulik.ladkani@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.