From: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
To: Florian Westphal <fw@strlen.de>
Cc: Guillaume Nault <g.nault@alphalink.fr>,
Netfilter Devel <netfilter-devel@vger.kernel.org>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
nicolas.dichtel@6wind.com, netdev-owner@vger.kernel.org
Subject: Re: 4.9 conntrack performance issues
Date: Sun, 15 Jan 2017 02:18:45 +0200
Message-ID: <d6dfdd8cf83933fc8f548da62a147775@nuclearcat.com>
In-Reply-To: <20170114235333.GA13421@breakpoint.cc>
On 2017-01-15 01:53, Florian Westphal wrote:
> Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
>
> [ CC Nicolas since he also played with gc heuristics in the past ]
>
>> Sorry if I added someone wrongly to CC; please let me know if I should
>> remove anyone.
>> I just ran 4.9 successfully on my NAT several days ago, and it seems the
>> panic issue has disappeared. But I started to face another issue: it seems
>> the garbage collector is hogging one of the CPUs.
>>
>> It was handling load very well at 4.8 and below. It might still be fine,
>> but I suspect queues that belong to the hogged CPU might experience issues.
>
> The worker doesn't grab locks for long and calls scheduler for every
> bucket to give a chance for other threads to run.
>
> It also doesn't block softinterrupts.
>
>> Is there anything that can be done to improve CPU load distribution or
>> reduce single-core load?
>
> No, I am afraid we don't export any of the heuristics as tuneables so far.
>
> You could try changing defaults in net/netfilter/nf_conntrack_core.c:
>
> #define GC_MAX_BUCKETS_DIV 64u
> /* upper bound of scan intervals */
> #define GC_INTERVAL_MAX (2 * HZ)
> /* maximum conntracks to evict per gc run */
> #define GC_MAX_EVICTS 256u
>
> (the first two result in ~2 minute worst case timeout detection
> on a fully idle system).
>
> For instance you could use
>
> GC_MAX_BUCKETS_DIV -> 128
> GC_INTERVAL_MAX -> 30 * HZ
>
> (This means that it takes one hour for a dead connection to be picked
> up on an idle system, but that's only relevant in case you use
> conntrack events to log when a connection went down and need more
> precise accounting).
Not a big deal in my case.
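
As a rough check of the two figures quoted above: assuming the worker covers 1/GC_MAX_BUCKETS_DIV of the table per run and, on an idle system, waits GC_INTERVAL_MAX between runs, the worst case is the product of the two. A minimal Python sketch of that arithmetic (the function name is only illustrative, it is not from the kernel):

```python
def worst_case_detection_seconds(max_buckets_div, interval_max_seconds):
    """Seconds needed for one full pass over the table on an idle system.

    Assumption: each gc run scans 1/max_buckets_div of the buckets and
    consecutive runs are spaced interval_max_seconds apart.
    """
    return max_buckets_div * interval_max_seconds


if __name__ == "__main__":
    # 4.9 defaults: GC_MAX_BUCKETS_DIV = 64, GC_INTERVAL_MAX = 2 * HZ (2 seconds)
    default_s = worst_case_detection_seconds(64, 2)
    # Suggested values: GC_MAX_BUCKETS_DIV = 128, GC_INTERVAL_MAX = 30 * HZ
    relaxed_s = worst_case_detection_seconds(128, 30)
    print(default_s, "seconds, roughly 2 minutes")
    print(relaxed_s / 60, "minutes, roughly one hour")
```

With the defaults this gives 128 seconds, and with the suggested values 3840 seconds (64 minutes), which matches the "~2 minute" and "one hour" estimates.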
>
> I suspect you might also have to change
>
> 1011 } else if (expired_count) {
> 1012 gc_work->next_gc_run /= 2U;
> 1013 next_run = msecs_to_jiffies(1);
> 1014 } else {
>
> line 1013 to
> next_run = msecs_to_jiffies(HZ / 2);
>
> or something like this to not have frequent rescans.
OK
>
> The gc is also done from the packet path (i.e. accounted
> towards (k)softirq).
>
> How many total connections is the machine handling on average?
> And how many new/delete events happen per second?
1-2 million connections; at the moment 988k.
I don't know if this is the correct method to measure the event rate:
NAT ~ # timeout -t 5 conntrack -E -e NEW | wc -l
conntrack v1.4.2 (conntrack-tools): 40027 flow events have been shown.
40027
NAT ~ # timeout -t 5 conntrack -E -e DESTROY | wc -l
conntrack v1.4.2 (conntrack-tools): 40951 flow events have been shown.
40951
It is not peak time, so the values can be 2-3 times higher at peak, but
even right now it is hogging one core, leaving it only 20% idle, while
the others are 80-83% idle.
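
To turn the counts above into per-second rates, divide by the 5-second window. A small Python sketch of that conversion, assuming the 2-3x peak factor applies linearly (the helper names are illustrative only):

```python
def events_per_second(event_count, window_seconds):
    """Average event rate over the sampling window."""
    return event_count / window_seconds


def peak_range(rate, low_factor=2, high_factor=3):
    """Estimated peak-time rate range, assuming a 2-3x multiplier."""
    return rate * low_factor, rate * high_factor


if __name__ == "__main__":
    new_rate = events_per_second(40027, 5)       # NEW events
    destroy_rate = events_per_second(40951, 5)   # DESTROY events
    print("NEW/s:", new_rate, "peak estimate:", peak_range(new_rate))
    print("DESTROY/s:", destroy_rate, "peak estimate:", peak_range(destroy_rate))
```

That is roughly 8000 NEW and 8200 DESTROY events per second off-peak, so something like 16000-24500 per second at peak.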
>
>> 88.98%  0.00%  kworker/24:1  [kernel.kallsyms]  [k] process_one_work
>>         |
>>         ---process_one_work
>>            |
>>            |--54.65%--gc_worker
>>            |          |
>>            |           --3.58%--nf_ct_gc_expired
>>            |                     |
>>            |                     |--1.90%--nf_ct_delete
>
> I'd be interested to see how often that shows up on other cores
> (from packet path).
The other CPUs look totally different.
This is the top entry:
99.60%  0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
        |
        ---start_secondary
           |
            --99.42%--cpu_startup_entry
                      |
                       --98.04%--default_idle_call
                                 arch_cpu_idle
                                 |
                                 |--48.58%--call_function_single_interrupt
                                 |          |
                                 |           --46.36%--smp_call_function_single_interrupt
                                 |                     smp_trace_call_function_single_interrupt
                                 |                     |
                                 |                     |--44.18%--irq_exit
                                 |                     |          |
                                 |                     |          |--43.37%--__do_softirq
                                 |                     |          |          |
                                 |                     |          |           --43.18%--net_rx_action
                                 |                     |          |                     |
                                 |                     |          |                     |--36.02%--process_backlog
                                 |                     |          |                     |          |
                                 |                     |          |                     |           --35.64%--__netif_receive_skb
gc_worker didn't appear on the other cores at all.
Or am I checking something wrong?