From: Patrick McHardy
Subject: Re: DDoS attack causing bad effect on conntrack searches
Date: Tue, 01 Jun 2010 12:18:27 +0200
Message-ID: <4C04DE73.6050605@trash.net>
To: Eric Dumazet
Cc: Changli Gao, hawk@comx.dk, Jesper Dangaard Brouer, paulmck@linux.vnet.ibm.com, Linux Kernel Network Hackers, Netfilter Developers
In-Reply-To: <1275368732.2478.88.camel@edumazet-laptop>

Eric Dumazet wrote:
> On Tuesday 01 June 2010 at 08:28 +0800, Changli Gao wrote:
>> On Tue, Jun 1, 2010 at 5:21 AM, Eric Dumazet wrote:
>>> I had a look at current conntrack and found the 'unconfirmed' list was
>>> maybe a candidate for a potential blackhole.
>>>
>>> That is, if a reader happens to hit an entry that is moved from regular
>>> hash table slot 'hash' to the unconfirmed list,
>> Sorry, but I can't find where we do these things. The unconfirmed list is
>> used to track the unconfirmed cts, whose corresponding skbs are still
>> on the path from the first to the last netfilter hook.
>> As soon as the
>> skbs end their travel in netfilter, the corresponding cts will be
>> confirmed (moving the ct from the unconfirmed list to the regular hash table).
>>
>
> So netfilter is a monolithic thing.
>
> When a packet begins its travel into netfilter, you guarantee that no
> other packet can also begin its travel and find an unconfirmed
> conntrack?

Correct, the unconfirmed list exists only for cleanup.

> I wonder why we use atomic ops then to track the confirmed bit :)

Good question, that looks unnecessary :)

>> The unconfirmed list should be small, as network receive processing runs in BH.
>
> So according to you, netfilter/ct runs only in the input path?
>
> So I assume a packet is handled by CPU X, which creates a new conntrack
> (possibly early dropping an old entry that was previously in a standard
> hash chain) and inserts it in the unconfirmed list. _You_ guarantee another CPU
> Y, handling another packet, possibly sent by a hacker reading your
> netdev mails, cannot find the conntrack that was early dropped?
>
>> How about implementing the unconfirmed list as a per-cpu variable?
>
> I first implemented such a patch to reduce cache line contention, then I
> asked myself: what exactly is an unconfirmed conntrack? Can their
> number be unbounded? If yes, we have a problem, even on a two-CPU
> machine. Using two lists instead of one won't solve the fundamental
> problem.

If a new conntrack is created in PRE_ROUTING or LOCAL_OUT, it will be
added to the unconfirmed list and moved to the hash as soon as the
packet passes POST_ROUTING. This means the number of unconfirmed entries
created by the network is bounded by the number of CPUs due to BH
processing. The number created by locally generated packets is unbounded
in the case of preemptible kernels, however.

> The real question is, why do we need this unconfirmed 'list' in the
> first place? Is it really a private per-cpu thing?
> Can you prove this,
> with respect to lockless lookups, and things like NFQUEUE?

It's used for cleaning up conntracks not yet in the hash table on
module unload (or manual flush). It is supposed to be write-only
during regular operation.

> Each conntrack object has two list anchors. One for IP_CT_DIR_ORIGINAL,
> one for IP_CT_DIR_REPLY.
>
> The unconfirmed list uses the first anchor. This means another cpu can
> definitely find an unconfirmed item in a regular hash chain, since we
> don't respect an RCU grace period before re-using an object.
>
> If memory was not a problem, we probably would use a third anchor to
> avoid this, or regular RCU instead of the SLAB_DESTROY_BY_RCU variant.

So I guess we should check the CONFIRMED bit when searching in the hash
table.
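To make the CONFIRMED-bit check concrete, here is a minimal, single-threaded C model of such a lookup. It is a sketch under stated assumptions, not the nf_conntrack code. The names struct ct_entry, ct_find(), ct_find_get(), ct_get_not_zero() and ct_put() are hypothetical. A plain singly linked chain stands in for the hlist_nulls hash chains, and there is no real RCU. The sketch only shows the shape of the check. While walking a chain, it skips entries that are not confirmed. After taking a reference, it re-checks both the tuple and the confirmed bit, because an object from a SLAB_DESTROY_BY_RCU cache may have been reused in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a conntrack entry; this is
 * NOT the kernel's struct nf_conn and the field names are illustrative. */
struct ct_entry {
	unsigned int tuple;     /* stands in for the full connection tuple */
	atomic_bool confirmed;  /* stands in for the IPS_CONFIRMED status bit */
	atomic_int use;         /* reference count; 0 means "being freed" */
	struct ct_entry *next;  /* single anchor, shared by a hash chain and
	                           the unconfirmed list in this model */
};

/* Take a reference unless the count already reached zero
 * (same idea as atomic_inc_not_zero() in the kernel). */
static bool ct_get_not_zero(struct ct_entry *ct)
{
	int old = atomic_load(&ct->use);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ct->use, &old, old + 1))
			return true;
	}
	return false;
}

static void ct_put(struct ct_entry *ct)
{
	int old = atomic_fetch_sub(&ct->use, 1);

	assert(old > 0);        /* catch reference count underflow */
}

/* Walk one hash chain. An unconfirmed entry never legitimately lives in
 * the hash table, so it is treated as a non-match: in this model a reader
 * only reaches one if the object was recycled onto the unconfirmed list
 * while the reader was standing on it. */
static struct ct_entry *ct_find(struct ct_entry *head, unsigned int tuple)
{
	for (struct ct_entry *ct = head; ct != NULL; ct = ct->next)
		if (ct->tuple == tuple && atomic_load(&ct->confirmed))
			return ct;
	return NULL;
}

/* Lookup plus reference. Objects may be reused without waiting for an RCU
 * grace period, so the match is re-checked once the reference is held and
 * the lookup restarts from the bucket head if the object changed. */
static struct ct_entry *ct_find_get(struct ct_entry *head, unsigned int tuple)
{
	for (;;) {
		struct ct_entry *ct = ct_find(head, tuple);

		if (ct == NULL)
			return NULL;
		if (!ct_get_not_zero(ct))
			return NULL;    /* entry is being freed: report a miss */
		if (ct->tuple == tuple && atomic_load(&ct->confirmed))
			return ct;      /* still the entry we matched */
		ct_put(ct);             /* recycled meanwhile: drop it and retry */
	}
}
```

Usage, in the same model: take a chain a -> b -> c in which b is unconfirmed. Here ct_find_get(&a, b.tuple) returns NULL instead of handing out b. A lookup for c's tuple returns &c with its use count raised by one, which the caller later drops with ct_put(). On a failed re-check, the sketch restarts from the bucket head rather than continuing from ct->next. That is the conservative choice, because a recycled object's next pointer may lead into a different list.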