From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bob Halley Subject: Netfilter Connection Tracking Race Condition in Kernel 2.4.x Date: Mon, 24 Jul 2006 17:31:15 -0700 Message-ID: <44C56653.10901@nominum.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: To: netfilter-devel@lists.netfilter.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: netfilter-devel-bounces@lists.netfilter.org Errors-To: netfilter-devel-bounces@lists.netfilter.org List-Id: netfilter-devel.vger.kernel.org This is bugzilla 495, resent to netfilter-devel by request. Background Our application uses ip_queue in prerouting to divert DNS UDP packets to a userland daemon which inspects them and then issues a NF_ACCEPT or NF_DROP verdict back to the kernel. We found that if several packets with the same conntrack tuple, i.e. the same src addr, src port, dst addr, and dst port, arrive very close together, then only the first one accepted by our software actually makes it back out to the wire; the others are silently dropped. Analysis We instrumented the kernel to find out where the drop was occurring. The code doing the dropping was ip_refrag() in net/ipv4/netfilter/ip_conntrack_standalone.c, specifically: /* We've seen it coming out the other side: confirm */ if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT) return NF_DROP; The dropping is caused by a race between the first packet of a given tuple making it to confirmed state, and the arrival of another packet with the same tuple. If a second packet arrives before the first is confirmed, it is assigned a new connection tracking context instead of joining that of the first unconfirmed packet. When the second packet is finally handled by ip_refrag(), the call to ip_confim() finds that there is already a confirmed entry in the table, and returns NF_DROP. From the comments in __ip_contrack_confirm(), we infer that this is to deal with duplicated datagrams and some REJECT case, but it's the wrong thing in this case because the subsequent packets are neither duplicates nor REJECTs. We were using RHEL 3 kernel 2.4.21-40 initially. We looked at later 2.4.x kernels and found some promising looking changes, namely the addition of an unconfirmed list, in more recent 2.4.x kernels. We built a 2.4.32 kernel and tested it, but the problem remained. We looked into the nature of the unconfirmed list and discovered that it was solving a different problem, but could be a useful starting point for a fix. Fix We decided to eliminate the race by having subsequent packets with the same conntrack tuple join the conntrack context of the first packet instead of creating a new conntrack context for each of them. Here's the patch: --- linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c.orig 2005-04-03 18:42:20.000000000 -0700 +++ linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c 2006-07-24 13:23:25.000000000 -0700 @@ -777,6 +777,14 @@ /* look for tuple match */ h = ip_conntrack_find_get(&tuple, NULL); if (!h) { + READ_LOCK(&ip_conntrack_lock); + h = LIST_FIND(&unconfirmed, conntrack_tuple_cmp, + struct ip_conntrack_tuple_hash *, &tuple, NULL); + if (h) + atomic_inc(&h->ctrack->ct_general.use); + READ_UNLOCK(&ip_conntrack_lock); + } + if (!h) { h = init_conntrack(&tuple, proto, skb); if (!h) return NULL; This patch reliably ends the race, and we no longer have mysteriously disappearing packets. Not being netfilter experts, we're not certain that this patch has no other side effects, and would appreciate any advice or alternative fixes that people who know more than we do have to offer. Regards, Bob Halley Brian Wellington