From: Afi Gjermund <afigjermund@gmail.com>
To: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Jan Engelhardt <jengelh@medozas.de>,
netfilter-devel@vger.kernel.org
Subject: Re: nf_conntrack_count versus '/proc/net/nf_conntrack | wc -l' count
Date: Thu, 18 Feb 2010 16:53:51 -0800
Message-ID: <48ceaa831002181653o549964c3w76bc27dd66864f8b@mail.gmail.com>
In-Reply-To: <48ceaa831002181139k134dadbp2bc65857eac6af59@mail.gmail.com>

On Thu, Feb 18, 2010 at 11:39 AM, Afi Gjermund <afigjermund@gmail.com> wrote:
> On Thu, Feb 18, 2010 at 10:19 AM, Patrick McHardy <kaber@trash.net> wrote:
>> Afi Gjermund wrote:
>>> On Thu, Feb 18, 2010 at 10:07 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>>>> Shouldn't the value after the flush be 0? The traffic that has created
>>>>>>> this mess is from a REDIRECT rule in the PREROUTING chain of the 'nat'
>>>>>>> table.
>>>>>> Could you post a copy of these rules ?
>>>>>>
>>>>> iptables -t nat -A PREROUTING -p tcp -s X.X.X.X -d X.X.X.X --sport X
>>>>> --dport X -j REDIRECT --to-port X
>>>> Yes, I understood you were using such rules, but I cannot understand how
>>>> it can trigger without real nics being plugged. So I asked you some
>>>> details, apparently you don't want to provide them and prefer to hide from
>>>> us :)
>>>>
>>> Lol, sorry. The X values are dynamic and depend on what network the
>>> device happens to be on, as well as the ephemeral source port.
>>>
>>> iptables -t nat -A PREROUTING -p tcp -s 172.168.8.45 -d 172.168.8.200
>>> --sport 4351 --dport 4500 -j REDIRECT --to-port 45001
>>
>> NAT is unlikely to be the cause since it's widely used and there
>> are no other reports of leaks. Please describe your full setup,
>> especially things like traffic scheduling, network devices,
>> userspace queueing etc etc.
>>
>
> The device has 2 network interfaces that are configured in a bridge
> (eth0,eth1). The traffic scheduling has not been changed from the
> default kernel configuration.
>
> Problem path:
> The problem I am seeing is that my tcp connections enter the
> /proc/net/nf_conntrack table, then disappear over time but the
> nf_conntrack_count never decreases. Over time, the nf_conntrack_count
> hits the 4096 nf_conntrack_max and the kernel begins to drop packets
> even though the /proc/net/nf_conntrack table is not full (has < 100
> connections).
>
> In testing I decided to set the nf_conntrack_max to 100, and fill the
> table via the connections above. I then removed both Ethernet cables to
> ensure no new connections could be made. I also set the
> nf_conntrack_tcp_timeout_established to 60 seconds. I left this for 2
> hours and saw that the /proc/net/nf_conntrack table was empty while
> the nf_conntrack_count was still 100.
>
> I also created a kernel module that calls the nf_conntrack_flush()
> function. This seems to clear only the /proc/net/nf_conntrack table,
> but not the count. If I also do an atomic_set(&nf_conntrack_count,0)
> then (obviously) the count becomes 0. It is as if the connections are
> being removed from the table but the count is not being decremented,
> and I am not sure why. As far as I understand it, they should be in
> sync.
>
I have found the issue that was causing this problem. A userspace
application that used the NFQUEUE mechanism to queue packets to
userspace was returning a verdict of STOLEN on the first UDP packet
seen. This appears to have left entries in the connection table that
could not be flushed via nf_conntrack_flush(). After I changed the
verdict to DROP, the problem no longer occurred.
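
For anyone who wants to watch for the same symptom (the count staying
high while the table is nearly empty), the two values can be compared
directly. This is only a minimal sketch: the function name
conntrack_drift is made up for illustration, and the default paths
assume a kernel that exposes the counter under /proc/sys/net/netfilter/
and the table at /proc/net/nf_conntrack. Reading the table usually
requires root.

```python
from pathlib import Path


def conntrack_drift(
    count_path="/proc/sys/net/netfilter/nf_conntrack_count",  # assumed path
    table_path="/proc/net/nf_conntrack",                      # assumed path
):
    """Return (count, table_entries, drift) for the conntrack subsystem.

    count         -- value of the nf_conntrack_count counter
    table_entries -- number of non-empty lines in the table file,
                     i.e. the equivalent of `wc -l`
    drift         -- count minus table_entries
    """
    count = int(Path(count_path).read_text().strip())
    with open(table_path) as table:
        entries = sum(1 for line in table if line.strip())
    return count, entries, count - entries
```

Calling conntrack_drift() periodically and logging the third value is
enough to see the pattern. A small, short-lived difference is not
necessarily a problem; a drift that only ever grows is the behaviour
described in this thread.
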
I tracked the problem down to that application after noticing that the
Timer value of the connections in the table stayed at 3000 (the
nf_conntrack_udp_timeout of 30 x 100) from one dump to the next:
Feb 18 22:56:31 titan user.info kernel: =========================== Table Dump =========================
Feb 18 22:56:31 titan user.info kernel: ---- Set ----
Feb 18 22:56:31 titan user.info kernel: Timer is : 3000
Feb 18 22:56:31 titan user.info kernel: tuple dump: IP_CT_DIR_ORIGINAL
Feb 18 22:56:31 titan user.info kernel:
Feb 18 22:56:31 titan user.warn kernel: tuple c321cc70: l3num 2 protonum 17 srcIP 172.16.8.45 srcPort 4858 -> dstIP 172.16.8.7 dstPort 45001
Feb 18 22:56:31 titan user.info kernel: tuple dump: IP_CT_DIR_REPLY
Feb 18 22:56:31 titan user.info kernel:
Feb 18 22:56:31 titan user.warn kernel: tuple c321cca8: l3num 2 protonum 17 srcIP 172.16.8.7 srcPort 45001 -> dstIP 172.16.8.45 dstPort 4858
Feb 18 22:56:31 titan user.info kernel: ---- End Set ----
Feb 18 22:56:31 titan user.info kernel: =========================== End Table Dump =========================

Feb 18 22:57:03 titan user.info kernel: =========================== Table Dump =========================
Feb 18 22:57:03 titan user.info kernel: ---- Set ----
Feb 18 22:57:03 titan user.info kernel: Timer is : 3000
Feb 18 22:57:03 titan user.info kernel: tuple dump: IP_CT_DIR_ORIGINAL
Feb 18 22:57:03 titan user.info kernel:
Feb 18 22:57:03 titan user.warn kernel: tuple c321cc70: l3num 2 protonum 17 srcIP 172.16.8.45 srcPort 4858 -> dstIP 172.16.8.7 dstPort 45001
Feb 18 22:57:03 titan user.info kernel: tuple dump: IP_CT_DIR_REPLY
Feb 18 22:57:03 titan user.info kernel:
Feb 18 22:57:03 titan user.warn kernel: tuple c321cca8: l3num 2 protonum 17 srcIP 172.16.8.7 srcPort 45001 -> dstIP 172.16.8.45 dstPort 4858
Feb 18 22:57:03 titan user.info kernel: ---- End Set ----
Feb 18 22:57:03 titan user.info kernel: =========================== End Table Dump =========================
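
Spotting a timer that never moves is easy with one entry but tedious
with many, so here is a rough sketch of how dumps like the two above
could be checked automatically. It assumes the exact dump format shown
here ("Timer is : N" followed by the ORIGINAL-direction tuple line,
with each dump closed by an "End Table Dump" banner), which comes from
a private debugging module and not from any standard kernel output.
The name stuck_flows and the regular expression are illustrative only.

```python
import re
from collections import defaultdict

# One "Set" in a dump: the timer value, then the first tuple line that
# follows it (the IP_CT_DIR_ORIGINAL tuple in the format shown above).
SET_RE = re.compile(
    r"Timer is : (\d+).*?"
    r"srcIP (\S+) srcPort (\d+) -> dstIP (\S+) dstPort (\d+)"
)


def stuck_flows(log_text):
    """Return {flow: timer} for flows whose timer is identical in every dump.

    A flow is (src_ip, src_port, dst_ip, dst_port) taken from the
    ORIGINAL-direction tuple. Only flows seen in at least two dumps are
    considered, since a single sample cannot show a stuck timer.
    """
    # Collapse all whitespace so lines wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    # Every complete dump ends with the "End Table Dump" banner; whatever
    # follows the last banner is not a complete dump and is ignored.
    dumps = flat.split("End Table Dump")[:-1]

    timers = defaultdict(list)
    for dump in dumps:
        for timer, src, sport, dst, dport in SET_RE.findall(dump):
            timers[(src, int(sport), dst, int(dport))].append(int(timer))

    return {
        flow: values[0]
        for flow, values in timers.items()
        if len(values) >= 2 and len(set(values)) == 1
    }
```

Given the two dumps above as input, this reports the single flow
172.16.8.45:4858 -> 172.16.8.7:45001 with a timer of 3000. An entry
whose timer counts down between dumps is not reported.
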
Thank you all for your help! Hopefully this may help other people as well.
Afi