From: Eric Dumazet <dada1@cosmosbay.com>
To: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Cc: avorontsov@ru.mvista.com, Patrick McHardy <kaber@trash.net>,
	netdev@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: [PATCH] conntrack: Reduce conntrack count in nf_conntrack_free()
Date: Tue, 24 Mar 2009 13:07:16 +0100	[thread overview]
Message-ID: <49C8CCF4.5050104@cosmosbay.com> (raw)
In-Reply-To: <OF9168DCC3.5E31F8E4-ONC1257583.003B48CC-C1257583.003C0A5E@transmode.se>

Joakim Tjernlund wrote:
> Eric Dumazet <dada1@cosmosbay.com> wrote on 24/03/2009 10:12:53:
>> Joakim Tjernlund wrote:
>>> Patrick McHardy <kaber@trash.net> wrote on 23/03/2009 18:49:15:
>>>> Joakim Tjernlund wrote:
>>>>> Patrick McHardy <kaber@trash.net> wrote on 23/03/2009 13:29:33:
>>>>>
>>>>>
>>>>>>> There is no /proc/net/netfilter/nf_conntrack. There is a
>>>>>>> /proc/net/nf_conntrack though and it is empty. If I telnet
>>>>>>> to the board I see:
>>>>>>>
>>>>>> That means that something is leaking conntrack references, most
>>>>>> likely by leaking skbs. Since I haven't seen any other reports,
>>>>>> my guess would be the ucc_geth driver.
>>>>>>
>>>>> Mucking around with the ucc_geth driver I found that if I:
>>>>>  - Move TX from IRQ to NAPI context
>>>>>  - double the weight.
>>>>>  - after booting up, wait a few mins until the JFFS2 GC kernel
>>>>>    thread has stopped scanning the FS
>>>>>
>>>>> Then the "nf_conntrack: table full, dropping packet." msgs stops.
>>>>> Does this seem right to you guys?
>>>> No. As I said, something seems to be leaking packets. You should be
>>>> able to confirm that by checking the sk_buff slabs in /proc/slabinfo.
>>>> If that *doesn't* show any signs of a leak, please run "conntrack -E"
>>>> to capture the conntrack events before the "table full" message
>>>> appears and post the output.
>>> skbuff does not differ much, but others do
>>>
>>> Before ping:
>>>   skbuff_fclone_cache    0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
>>>   skbuff_head_cache     20     20    192   20    1 : tunables  120   60    0 : slabdata      1      1      0
>>>   size-64              731    767     64   59    1 : tunables  120   60    0 : slabdata     13     13      0
>>>   nf_conntrack          10     19    208   19    1 : tunables  120   60    0 : slabdata      1      1      0
>>>
>>> During ping: 
>>>   skbuff_fclone_cache    0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
>>>   skbuff_head_cache     40     40    192   20    1 : tunables  120   60    0 : slabdata      2      2      0
>>>   size-64             8909   8909     64   59    1 : tunables  120   60    0 : slabdata    151    151      0
>>>   nf_conntrack        5111   5111    208   19    1 : tunables  120   60    0 : slabdata    269    269      0
>>>
>>> This feels more like the freeing of conntrack objects is delayed and
>>> they build up when ping flooding.
>>>
>>> Don't have "conntrack -E" for my embedded board so that will have to
>>> wait a bit longer.
>> I don't understand how your ping can use so many conntrack entries...
>>
>> Then, as I said yesterday, I believe you have an RCU delay, because of
>> a misbehaving driver or something...
>>
>> grep RCU .config
> grep RCU .config
> # RCU Subsystem
> CONFIG_CLASSIC_RCU=y
> # CONFIG_TREE_RCU is not set
> # CONFIG_PREEMPT_RCU is not set
> # CONFIG_TREE_RCU_TRACE is not set
> # CONFIG_PREEMPT_RCU_TRACE is not set
> # CONFIG_RCU_TORTURE_TEST is not set
> # CONFIG_RCU_CPU_STALL_DETECTOR is not set
> 
>> grep CONFIG_SMP .config
> grep CONFIG_SMP .config
> # CONFIG_SMP is not set
> 
>> You could change qhimark from 10000 to 1000 in kernel/rcuclassic.c (line 80)
>> as a workaround. It should force a quiescent state after 1000 freed
>> conntracks.
> 
> Right, doing this almost killed all conntrack messages; I had to stress it
> pretty hard before I saw a handful of "nf_conntrack: table full, dropping
> packet" messages.
> 
> RCU is not my cup of tea, do you have any ideas where to look?

In a stress situation, you feed deleted conntracks to call_rcu() faster than
blimit allows them to be freed (10 real frees per RCU softirq invocation).

So with the default qhimark of 10000, about 10000 conntracks can sit in RCU
queues (per CPU) before being really freed.

Only when the queue length hits 10000 does RCU enter a special mode and free
all queued items, instead of a small batch of 10.
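
For reference, these are the batching tunables near the top of
kernel/rcuclassic.c (values quoted from memory against 2.6.29, so
double-check your tree):

	static int blimit = 10;      /* Maximum callbacks per softirq. */
	static int qhimark = 10000;  /* If this many pending, ignore blimit. */
	static int qlowmark = 100;   /* If this many pending, use blimit. */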

To solve your problem we can:

1) reduce qhimark from 10000 to 1000 (for example). This probably should
   be done anyway, to reduce latency spikes in RCU code when freeing a
   whole batch of 10000 elements...
OR
2) raise the conntrack tunable (max number of conntrack entries on your
   machine); see the example after this list.
OR
3) change net/netfilter/nf_conntrack_core.c to decrement net->ct.count
   in nf_conntrack_free() instead of in the RCU callback, as the patch
   below does.

[PATCH] conntrack: Reduce conntrack count in nf_conntrack_free()

We use RCU to defer freeing of conntrack structures. In a DoS situation, RCU
might accumulate about 10,000 elements per CPU in its internal queues. To get
accurate conntrack counts (at the expense of slightly more RAM used), the
conntrack counter should not take into account elements that are about to be
freed and are waiting in RCU queues. We thus decrement it in
nf_conntrack_free(), not in the RCU callback.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f4935e3..6478dc7 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -516,16 +516,17 @@ EXPORT_SYMBOL_GPL(nf_conntrack_alloc);
 static void nf_conntrack_free_rcu(struct rcu_head *head)
 {
 	struct nf_conn *ct = container_of(head, struct nf_conn, rcu);
-	struct net *net = nf_ct_net(ct);
 
 	nf_ct_ext_free(ct);
 	kmem_cache_free(nf_conntrack_cachep, ct);
-	atomic_dec(&net->ct.count);
 }
 
 void nf_conntrack_free(struct nf_conn *ct)
 {
+	struct net *net = nf_ct_net(ct);
+
 	nf_ct_ext_destroy(ct);
+	atomic_dec(&net->ct.count);
 	call_rcu(&ct->rcu, nf_conntrack_free_rcu);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_free);



