From: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
To: Florian Westphal <fw@strlen.de>
Cc: Guillaume Nault <g.nault@alphalink.fr>,
Netfilter Devel <netfilter-devel@vger.kernel.org>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
nicolas.dichtel@6wind.com, netdev-owner@vger.kernel.org
Subject: Re: 4.9 conntrack performance issues
Date: Sun, 15 Jan 2017 02:18:45 +0200
Message-ID: <d6dfdd8cf83933fc8f548da62a147775@nuclearcat.com>
In-Reply-To: <20170114235333.GA13421@breakpoint.cc>
On 2017-01-15 01:53, Florian Westphal wrote:
> Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
>
> [ CC Nicolas since he also played with gc heuristics in the past ]
>
>> Sorry if I added someone to CC wrongly, please let me know if I
>> should remove them.
>> I have been running 4.9 on my NAT successfully for several days now,
>> and the panic issue seems to have disappeared. But I have started to
>> face another issue: it seems the garbage collector is hogging one of
>> the CPUs.
>>
>> It was handling the load very well on 4.8 and below. It might still
>> be fine, but I suspect queues that belong to the hogged CPU might
>> experience issues.
>
> The worker doesn't grab locks for long and calls the scheduler for
> every bucket to give other threads a chance to run.
>
> It also doesn't block soft interrupts.
>
>> Is there anything that can be done to improve CPU load distribution
>> or reduce the single-core load?
>
> No, I am afraid we don't export any of the heuristics as tuneables so
> far.
>
> You could try changing defaults in net/netfilter/nf_conntrack_core.c:
>
> #define GC_MAX_BUCKETS_DIV 64u
> /* upper bound of scan intervals */
> #define GC_INTERVAL_MAX (2 * HZ)
> /* maximum conntracks to evict per gc run */
> #define GC_MAX_EVICTS 256u
>
> (the first two result in ~2 minute worst case timeout detection
> on a fully idle system).
>
> For instance you could use
>
> GC_MAX_BUCKETS_DIV -> 128
> GC_INTERVAL_MAX -> 30 * HZ
>
> (This means that it takes one hour for a dead connection to be picked
> up on an idle system, but that's only relevant if you use conntrack
> events to log when a connection went down and need more precise
> accounting).
Not a big deal in my case.
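If I read the heuristic right, a full table scan takes GC_MAX_BUCKETS_DIV
intervals, so with the stock defaults that is roughly 64 * 2 s = 128 s,
i.e. the ~2 minutes you mention, and 128 * 30 s would give the one hour
above. Just to make sure I apply it correctly, I assume the change would
look something like this (only a sketch of what I plan to try against the
4.9 defaults you quoted, not a tested patch):

/* net/netfilter/nf_conntrack_core.c -- local tuning sketch, untested */

/* scan at most 1/128th of the table per gc run instead of 1/64th */
#define GC_MAX_BUCKETS_DIV	128u
/* upper bound of scan intervals: back off to 30 s instead of 2 s */
#define GC_INTERVAL_MAX		(30 * HZ)
/* maximum conntracks to evict per gc run (unchanged) */
#define GC_MAX_EVICTS		256u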
>
> I suspect you might also have to change
>
> 1011 } else if (expired_count) {
> 1012 gc_work->next_gc_run /= 2U;
> 1013 next_run = msecs_to_jiffies(1);
> 1014 } else {
>
> line 1013 to
> next_run = msecs_to_jiffies(HZ / 2);
>
> or something like this to avoid frequent rescans.
OK
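And just so I patch the right spot, I assume the branch would then read
roughly like this (my sketch only, based on the lines you quoted):

	} else if (expired_count) {
		gc_work->next_gc_run /= 2U;
		/* back off instead of rescanning after 1 ms */
		next_run = msecs_to_jiffies(HZ / 2);
	} else {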
>
> The gc is also done from the packet path (i.e. accounted
> towards (k)softirq).
>
> How many total connections is the machine handling on average?
> And how many new/delete events happen per second?
1-2 million connections; at the current moment, 988k.
I don't know if this is the correct method to measure the event rate:
NAT ~ # timeout -t 5 conntrack -E -e NEW | wc -l
conntrack v1.4.2 (conntrack-tools): 40027 flow events have been shown.
40027
NAT ~ # timeout -t 5 conntrack -E -e DESTROY | wc -l
conntrack v1.4.2 (conntrack-tools): 40951 flow events have been shown.
40951
It is not peak time, so these values can be 2-3 times higher at peak, but
even right now it is hogging one core, leaving only 20% idle, while the
others are 80-83% idle.
>
>>  88.98%  0.00%  kworker/24:1  [kernel.kallsyms]  [k] process_one_work
>>          |
>>          ---process_one_work
>>             |
>>             |--54.65%--gc_worker
>>             |          |
>>             |           --3.58%--nf_ct_gc_expired
>>             |                    |
>>             |                    |--1.90%--nf_ct_delete
>
> I'd be interested to see how often that shows up on other cores
> (from packet path).
The other CPUs look totally different. This is the top entry:
99.60%  0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
        |
        ---start_secondary
           |
            --99.42%--cpu_startup_entry
                      |
                       --98.04%--default_idle_call
                                 arch_cpu_idle
                                 |
                                 |--48.58%--call_function_single_interrupt
                                 |          |
                                 |           --46.36%--smp_call_function_single_interrupt
                                 |                     smp_trace_call_function_single_interrupt
                                 |                     |
                                 |                     |--44.18%--irq_exit
                                 |                     |          |
                                 |                     |          |--43.37%--__do_softirq
                                 |                     |          |          |
                                 |                     |          |           --43.18%--net_rx_action
                                 |                     |          |                     |
                                 |                     |          |                     |--36.02%--process_backlog
                                 |                     |          |                     |          |
                                 |                     |          |                     |           --35.64%--__netif_receive_skb
gc_worker didn't appear on the other cores at all.
Or am I checking something wrong?