From mboxrd@z Thu Jan 1 00:00:00 1970 From: Florian Westphal Subject: Re: [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux Date: Mon, 16 Jan 2017 18:33:49 +0100 Message-ID: <20170116173349.GA12001@breakpoint.cc> References: <1484564166-5886-1-git-send-email-fw@strlen.de> <1fa618d3-4c2a-72f1-d5f8-ddfadc6ad029@6wind.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 8bit Cc: Florian Westphal , netfilter-devel@vger.kernel.org, nuclearcat@nuclearcat.com To: Nicolas Dichtel Return-path: Received: from Chamillionaire.breakpoint.cc ([146.0.238.67]:51224 "EHLO Chamillionaire.breakpoint.cc" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750777AbdAPReE (ORCPT ); Mon, 16 Jan 2017 12:34:04 -0500 Content-Disposition: inline In-Reply-To: <1fa618d3-4c2a-72f1-d5f8-ddfadc6ad029@6wind.com> Sender: netfilter-devel-owner@vger.kernel.org List-ID: Nicolas Dichtel wrote: > Le 16/01/2017 à 11:56, Florian Westphal a écrit : > > This further refines the changes made to conntrack gc_worker in > > commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics"). > > > > The main of this change is to reduce the scan interval when evictions > > take place. > > > > However, on the reporters' setup, there are 1-2 Million conntrack entries > > in total, and roughly 8k new connections per second. > > > > In this case we'll always evict at least one entry per gc cycle and scan > > interval is always at 1 jiffy. > > > > Given we scan ~10k entries per run its clearly wrong to reduce interval > > based on number of evictions, it will only waste cpu cycles. > > > > Thus only look at the ratio (scanned entries vs. evicted entries) to make > > a decision on whether to reduce or not. > > > > Given the fact that evictor is supposed to only kick in when system turns > > idle after a busy period, we should pick a high ratio -- this makes it 50%. > > > > Also, get rid of GC_MAX_EVICTS. Instead of breaking loop and instant > > resched, just don't bother checking this at all (the loop calls > > cond_resched for every bucket anyway). > > > > Removal of GC_MAX_EVICTS was suggested by Nicolas in the past; I > > should have listened. > I (still) agree with that part, maybe it could be done in another patch. Done. > > Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries") > > Signed-off-by: Florian Westphal > With that patch, it's worse than before. I regularly have a delay > 30 seconds > in my tests (this was infrequent before the patch). :-/ We could lower the ratio of course, but under synflood with 3 second syn_recv timeout I get a scan/evict ratio of ~15%, and we should not schedule worker all the time in that case. > A solution, to fix the balance between cpu cycles consumption and short ct > events delays may be to introduce a new sysctl which allows the user to choose > between two predefined configs: short delay or long delay for ct events. > Even if I don't like this, maybe it could be acceptable with a default value set > to 'short delay'. TBH in that case I'd prefer to just give up, and change worker to this: for (i = 0; i < nf_conntrack_htable_size; i++) { [..] hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) { tmp = nf_ct_tuplehash_to_ctrack(h); if (nf_ct_is_expired(tmp)) { nf_ct_gc_expired(tmp); continue; } } [..] cond_resched_rcu_qs(); } if (!gc_work->exiting) queue_delayed_work(system_long_wq, &gc_work->dwork, nf_conntrack_gc_interval); } Where nf_conntrack_gc_interval gets exported to userspace (in milliseconds), with a default set e.g. to one minute. Obvious downside: with low setting like 1 second we'll do frequent rescan of entire table.