From: Patrick McHardy <kaber@trash.net>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>,
avorontsov@ru.mvista.com, netdev@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free()
Date: Tue, 24 Mar 2009 13:43:54 +0100
Message-ID: <49C8D58A.6060401@trash.net>
In-Reply-To: <49C8D13D.10307@cosmosbay.com>

Eric Dumazet wrote:
>> In a stress situation, you feed deleted conntracks to call_rcu() faster than
>> blimit lets them be freed (10 actual frees per RCU softirq invocation).
>>
>> So with the default qhimark of 10000, about 10000 conntracks
>> can sit in RCU queues (per CPU) before actually being freed.
>>
>> Only on reaching 10000 does RCU enter a special mode that frees all queued
>> items, instead of a small batch of 10.
>>
>> To solve your problem we can:
>>
>> 1) reduce qhimark from 10000 to 1000 (for example)
>> This should probably be done anyway, to reduce spikes in RCU code when
>> freeing all 10000 elements at once...
>> OR
>> 2) change conntrack tunable (max conntrack entries on your machine)
>> OR
>> 3) change net/netfilter/nf_conntrack_core.c to decrement net->ct.count
>> in nf_conntrack_free() instead of callback.
>>
>> [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free()
>>
>> We use RCU to defer freeing of conntrack structures. In a DoS situation, RCU might
>> accumulate about 10,000 elements per CPU in its internal queues. To get accurate
>> conntrack counts (at the expense of slightly more RAM used), the conntrack
>> counter should not include elements that are about to be freed and are still
>> waiting in RCU queues. We thus decrement it in nf_conntrack_free(), not in the
>> RCU callback.
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>>
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>> index f4935e3..6478dc7 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -516,16 +516,17 @@ EXPORT_SYMBOL_GPL(nf_conntrack_alloc);
>>  static void nf_conntrack_free_rcu(struct rcu_head *head)
>>  {
>>  	struct nf_conn *ct = container_of(head, struct nf_conn, rcu);
>> -	struct net *net = nf_ct_net(ct);
>>
>>  	nf_ct_ext_free(ct);
>>  	kmem_cache_free(nf_conntrack_cachep, ct);
>> -	atomic_dec(&net->ct.count);
>>  }
>>
>>  void nf_conntrack_free(struct nf_conn *ct)
>>  {
>> +	struct net *net = nf_ct_net(ct);
>> +
>>  	nf_ct_ext_destroy(ct);
>> +	atomic_dec(&net->ct.count);
>>  	call_rcu(&ct->rcu, nf_conntrack_free_rcu);
>>  }
>>  EXPORT_SYMBOL_GPL(nf_conntrack_free);
>
> I forgot to say this is what we do for 'struct file' freeing as well. We
> decrement nr_files in file_free(), not in file_free_rcu().

While temporarily exceeding the limit by up to 10000 entries is
quite a lot, I guess the important thing is that it can't grow
unbounded, so I think this patch is fine.
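
For illustration only, the trade-off can be modelled in userspace. This is a rough sketch, not kernel code: struct model, model_alloc(), model_free(), model_process() and run_stress() are hypothetical names, and the batching rule (10 frees per pass, everything once 10000 are queued) is a simplification of the blimit/qhimark behaviour described in the quoted mail, not the actual RCU implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Values taken from the quoted mail; the model itself is hypothetical. */
#define QHIMARK 10000	/* queue length that triggers a full flush */
#define BLIMIT  10	/* frees per "softirq" pass otherwise */

struct entry { struct entry *next; };

struct model {
	int count;		/* stands in for net->ct.count */
	int pending;		/* entries queued for deferred free */
	struct entry *head, *tail;
	int dec_early;		/* 1: decrement in free(), 0: in callback */
};

struct result { int drops; int max_pending; };

struct entry *model_alloc(struct model *m, int max)
{
	struct entry *e;

	if (m->count >= max)
		return NULL;	/* "table full, dropping packet" */
	e = malloc(sizeof(*e));
	if (e)
		m->count++;
	return e;
}

void model_free(struct model *m, struct entry *e)
{
	if (m->dec_early)
		m->count--;	/* patched behaviour */
	e->next = NULL;
	if (m->tail)
		m->tail->next = e;
	else
		m->head = e;
	m->tail = e;
	m->pending++;
}

/* One deferred-free pass: a small batch, or everything at QHIMARK. */
void model_process(struct model *m)
{
	int batch = (m->pending >= QHIMARK) ? m->pending : BLIMIT;

	while (batch-- > 0 && m->head) {
		struct entry *e = m->head;

		m->head = e->next;
		if (!m->head)
			m->tail = NULL;
		free(e);
		m->pending--;
		if (!m->dec_early)
			m->count--;	/* unpatched behaviour */
	}
}

/* Create and immediately free 'churn' entries, one pass every 'per_tick'. */
struct result run_stress(int dec_early, int max, int churn, int per_tick)
{
	struct model m = { 0 };
	struct result r = { 0, 0 };
	int i;

	m.dec_early = dec_early;
	for (i = 0; i < churn; i++) {
		struct entry *e = model_alloc(&m, max);

		if (e)
			model_free(&m, e);
		else
			r.drops++;
		if (m.pending > r.max_pending)
			r.max_pending = m.pending;
		if ((i + 1) % per_tick == 0)
			model_process(&m);
	}
	while (m.pending > 0)	/* drain what is left */
		model_process(&m);
	return r;
}
```

Calling run_stress(0, 1000, 50000, 100) models the old behaviour and reports drops, because queued entries still count against the limit. run_stress(1, 1000, 50000, 100) models the patch: no drops, while the number of queued-but-unfreed entries stays bounded by roughly the flush threshold, which is the memory overshoot being accepted here.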