* Re: ip_conntrack performance issues - also semantic issues
From: Patrick Schaaf @ 2003-01-19  7:51 UTC
To: Don Cohen; +Cc: Martin Josefsson, netfilter-devel

Hi Don & all,

On Sat, Jan 18, 2003 at 11:13:27PM -0800, Don Cohen wrote:
>
> [This is not to the list, but feel free to put it or replies to
> parts of it there if you think they're of general interest.]

I'll do so. Your point about the post-dequeueing hook warrants
thinking about by the masses :)

> I have one big complaint with conntrack, which is related to
> performance but also semantics.

I don't think it's semantics per se; to me you are talking about
an (important) implementation detail. Fixing it will _hopefully_
not require new semantics (as understood by the end user).

> The semantic problem is that not all packets are forwarded.
> What we really want is two different conntrack hooks.
> The first as soon as the packet arrives classifies it in terms of
> what has been seen before. This is used by filters, schedulers, etc.
> However that one does NOT update the conntrack data structure.

It is already the case that a NEW conntrack structure is put into
the _hashtable_ only after running through all the filters - as the
last thing in POSTROUTING. It happens right at the point where the
packet will then be ENqueued to the outgoing network device.

If I understand your complaint correctly, it is really two complaints
in one:

1) that a NEW conntrack need not be allocated in full.
2) that the putting into hashes (which presupposes allocation in full)
   happens before ENqueueing the packet, and not after DEqueueing,
   so potential drops by egress shaping are not seen and handled.
Addressing point 1) would help overhead in the case of the filter rules
themselves handling a DoS attack (by dropping suitable packets).
Addressing point 2) would _additionally_ cover the CLS/SCHED policing.

2) does not make much sense if the overhead reduction of 1) has not
already been accomplished. 1) makes sense by itself, and can be
implemented without touching the base network stack.

Would you agree that the two points are related, but independent?

Fixing point 1) would need no change in semantics (but changes in the
internal APIs): for each packet which now gets a NEW conntrack, instead
let the skbuff reference a shared, unspecific "THE NEW CONNTRACK".
Only when an individual conntrack is required (by NAT module calls on
a packet which has the shared, unspecific "THE NEW CONNTRACK") will a
real conntrack structure be allocated, on demand. The same must happen
on the POSTROUTING conntrack hook, before the individual NEW connection's
conntrack is put into the hashes. "ALLOCATE ON DEMAND" is the general
theme.

Of course, almost every place in iptables where we now assume we have
an individual conntrack must learn to individualize "THE NEW CONNTRACK"
when encountered. Big code audit time. Always a good thing - Don, is that
a job for you, if people commit to taking the changes? :-)

Regarding point 2), there is a (temporal) semantic change involved.
With that approach, it takes potentially much longer until the conntrack
is created. The packet can sit in the output queue for quite a long time,
if the output interface is a slow one, and filled to the brim.

So, the question is, are there real world protocols where several packets
back to back go from A to B, before packets flow back? Such protocols
already have a window of opportunity for SNAFU in the current scheme,
but updating the hashes after the output queue may aggravate the symptoms.
(I have no protocol in mind, just being paranoid...)

best regards
  Patrick
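
The "allocate on demand" idea from the message above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct conntrack`, `struct packet`, `mark_new()`, `individualize()` and `demo_flood()` are hypothetical names invented for this sketch, not the kernel's structures or functions, and the "filter" is a trivial modulo test.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct conntrack {
    int confirmed;      /* 1 once the entry would be in the hash table */
    unsigned int id;    /* illustrative per-connection payload */
};

struct packet {
    struct conntrack *ct;   /* what the skbuff would reference */
};

/* One shared, unspecific placeholder referenced by every NEW packet. */
static struct conntrack the_new_conntrack;

static unsigned int allocations;    /* counts real allocations, for the demo */

/* PREROUTING stage: classify the packet as NEW without allocating. */
static void mark_new(struct packet *pkt)
{
    pkt->ct = &the_new_conntrack;
}

/* Called by any code that needs an individual conntrack (NAT, confirm). */
static struct conntrack *individualize(struct packet *pkt)
{
    if (pkt->ct == &the_new_conntrack) {
        struct conntrack *ct = calloc(1, sizeof(*ct));

        if (ct == NULL)
            return NULL;
        ct->id = ++allocations;
        pkt->ct = ct;
    }
    return pkt->ct;
}

/* Demo: 1000 NEW packets arrive; the "filter" drops all but every 100th. */
static unsigned int demo_flood(void)
{
    int i;

    allocations = 0;
    for (i = 0; i < 1000; i++) {
        struct packet pkt;
        struct conntrack *ct;

        mark_new(&pkt);
        if (i % 100 != 0)
            continue;           /* dropped by the filter: nothing allocated */
        ct = individualize(&pkt);
        if (ct != NULL) {
            ct->confirmed = 1;  /* POSTROUTING would hash it here */
            free(ct);           /* demo only: release immediately */
        }
    }
    return allocations;
}
```

In this sketch only the ten packets that survive the filter trigger an allocation, which is the saving point 1) is after. The cost is the extra pointer comparison in `individualize()` on every access, which a real implementation would have to pay in the normal, unfiltered case as well.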
[parent not found: <15914.25470.189261.168220@isis.cs3-inc.com>]
* Re: ip_conntrack performance issues - also semantic issues
From: Patrick Schaaf @ 2003-01-19  9:16 UTC
To: Don Cohen; +Cc: netfilter-devel

On Sun, Jan 19, 2003 at 12:36:14AM -0800, Don Cohen wrote:
> > I'll do so. Your point about the post-dequeueing hook warrants
> > thinking about by the masses :)
> The reason I didn't is that I already suggested this to the list.

I'm of the opinion that good points warrant occasional repeating, until
they have permeated enough skulls to be resolved once and for all.
Without the repetition, I wouldn't have thought about some of the things
I write now.

I'll again put this reply onto the list, see new point 3), below.

As you indicate you are not fully familiar with it, here's the story
about the "allocate early, hash late" approach which is in the current
code. It's really pretty simple. See net/ipv4/netfilter/ip_conntrack_core.c:

- init_conntrack() allocates a new conntrack, triggered by
  resolve_normal_ct(), which in turn is called by ip_conntrack_in() -
  the PREROUTING hook function, sitting there before all other functions
  (mangle/nat/filter). There are no other call chains. The new conntrack
  is NOT put into the hashes, but only referenced by the skbuff (the
  packet under inspection). The state match, NAT, etc. all use this
  skbuff reference!

- __ip_conntrack_confirm() puts the conntrack of an skbuff that passed
  through to POSTROUTING into the hashes. This function is wrapped in
  a 'static inline' in include/linux/netfilter_ipv4/ip_conntrack_core.h,
  called ip_conntrack_confirm(). For normal iptables operation, this in
  turn is called from exactly one place, ip_confirm() in
  net/ipv4/netfilter/ip_conntrack_standalone.c. That ip_confirm(), in
  turn, is the last LOCAL_IN hook function, and (indirectly through
  ip_refrag()) the last POSTROUTING hook function.

Point 2), i.e. your "put it after egress dequeue" proposal, would mean
delaying the confirm operation now done in ip_refrag(). A new hook in
the core network stack, called after egress dequeueing, would again
call ip_confirm(), as in the LOCAL_IN case.

Hmm, here's a point 3): in the LOCAL_IN case, the incoming packet may
be a UDP packet to a nonexistent local port. This packet will be dropped
by the UDP bind hash lookup, but that comes AFTER the LOCAL_IN hook, so
there will be a conntrack remaining (with 30 seconds timeout, IIRC).
So, the proposed after-egress-dequeue hook should also be called from
the local delivery code, after real user level interest has been checked.

> > 1) that a NEW conntrack need not be allocated in full.
> I guess you're saying that the allocation takes a significant amount
> of time, entering into hash is another chunk, and the first part is
> done at prerouting.

I'm pretty sure that the allocation is the main cost; putting into
the hashes is not performance critical per se (if the lists are short).

> In which case I agree allocation should be delayed. But of course
> this means that various code has to be changed to expect new
> connections to be unallocated.

This may come at a cost for the normal case. In the normal case, the
allocation cost _has_ to be paid (or we wouldn't load ip_conntrack),
and the constant per-packet checking would be pure overhead.

For filtered DoS packets not belonging to any connection, it would
certainly be a significant saving.

[BIG SNIP]

> I don't know of any, but for such protocols I guess the right thing
> would be to allocate enough at arrival of the first packet to
> recognize the second packet on its arrival, etc.

This is already done, in a sense. The really large NAT pieces are kept
outside the main ip_conntrack structure, allocated on demand. As far as
I know the protocol helpers operate (or could operate) the same way.
Keep the "equal for all" conntracking structure simple.
BTW, there's still 10-15% cruft there which could be removed.

> A borderline case - even tcp does allow, e.g., a syn followed by a
> repeat of the syn if not answered promptly. But in that case both
> can be classified as new. Only in the second phase it's important
> to recognize that the second one is a repeat of the first.

In the current ip_conntrack_confirm() implementation, when that happens,
the second packet is DROPped, and its conntrack freed shortly thereafter.
This happens whenever ip_conntrack_confirm() finds one or the other
direction of the connection in the hashes.

I think that's about the sanest thing one can do, realizing that before
that point, we potentially set up conflicting NAT information for one
and the same end-user TCP connection. Letting both progress is a sure
way to disaster. With the given approach, there's a chance for everything
to just work.

best regards
  Patrick
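
The "allocate early, hash late" flow and the duplicate-SYN behaviour described in the message above can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the kernel code: `struct tuple`, `struct tuple_node`, `confirm()`, `init_ct()`, the toy hash and the 64-bucket table are all hypothetical and chosen for brevity. It only models the rule that a conntrack is dropped at confirm time if either direction is already hashed.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 64

/* Hypothetical, simplified types; not the kernel's real structures. */
struct tuple {
    unsigned int src, dst;
    unsigned short sport, dport;
};

struct tuple_node {
    struct tuple t;
    struct tuple_node *next;
};

struct conntrack {
    struct tuple_node dir[2];   /* [0] original direction, [1] reply */
};

enum verdict { VERDICT_DROP, VERDICT_ACCEPT };

static struct tuple_node *table[HASH_SIZE];

static unsigned int hash_tuple(const struct tuple *t)
{
    /* Toy hash for illustration only. */
    return (t->src ^ t->dst ^ t->sport ^ ((unsigned int)t->dport << 16))
           % HASH_SIZE;
}

static int tuple_equal(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport;
}

static int in_hash(const struct tuple *t)
{
    const struct tuple_node *n;

    for (n = table[hash_tuple(t)]; n != NULL; n = n->next)
        if (tuple_equal(&n->t, t))
            return 1;
    return 0;
}

/* "Allocate early": fill in both directions, but do not hash anything. */
static void init_ct(struct conntrack *ct, unsigned int src,
                    unsigned short sport, unsigned int dst,
                    unsigned short dport)
{
    ct->dir[0].t = (struct tuple){ src, dst, sport, dport };
    ct->dir[1].t = (struct tuple){ dst, src, dport, sport };
    ct->dir[0].next = ct->dir[1].next = NULL;
}

/* "Hash late": drop if either direction is already known, else insert. */
static enum verdict confirm(struct conntrack *ct)
{
    int i;

    if (in_hash(&ct->dir[0].t) || in_hash(&ct->dir[1].t))
        return VERDICT_DROP;
    for (i = 0; i < 2; i++) {
        unsigned int h = hash_tuple(&ct->dir[i].t);

        ct->dir[i].next = table[h];
        table[h] = &ct->dir[i];
    }
    return VERDICT_ACCEPT;
}

/* Two SYNs of the same connection both get an unconfirmed conntrack. */
static int demo_retransmitted_syn(void)
{
    static struct conntrack first, second;
    int i;

    for (i = 0; i < HASH_SIZE; i++)
        table[i] = NULL;
    init_ct(&first, 0x0a000001u, 1025, 0x0a000002u, 80);
    init_ct(&second, 0x0a000001u, 1025, 0x0a000002u, 80);
    return confirm(&first) == VERDICT_ACCEPT &&
           confirm(&second) == VERDICT_DROP;
}
```

In this model the first SYN is confirmed and the repeated SYN is dropped at confirm time, because its original-direction tuple is already in the table. Locking, timers and reference counting, which the real code has to deal with, are deliberately left out.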
* Re: ip_conntrack performance issues - also semantic issues
From: Martin Josefsson @ 2003-01-19  9:40 UTC
To: Patrick Schaaf; +Cc: Don Cohen, Netfilter-devel

On Sun, 2003-01-19 at 10:16, Patrick Schaaf wrote:
> > > 1) that a NEW conntrack need not be allocated in full.
> > I guess you're saying that the allocation takes a significant amount
> > of time, entering into hash is another chunk, and the first part is
> > done at prerouting.
>
> I'm pretty sure that the allocation is the main cost, putting into
> the hashes is not performance critical per se (if the lists are short).

My small unscientific tests (performed quite some time ago) say that the
allocation isn't the most costly operation performed.
The test was a simple SYN-flood (random source IPs and ports) being
forwarded by a router. If I didn't load ip_conntrack at all everything
was fine and I had lots of cpu left. When I loaded ip_conntrack the cpu
usage went to 100%. Then I tried blocking the flood in filter/FORWARD
and that helped _a lot_, and in this case the allocation is still
happening for each packet. But of course blocking it in raw/PREROUTING
is even better (before the packets reach ip_conntrack).

My point is that until someone really profiles ip_conntrack I'm not sure
I'll believe that the allocation is the most costly operation.
We need good profiles for a lot of different situations.

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.
* Re: ip_conntrack performance issues - also semantic issues
From: Patrick Schaaf @ 2003-01-19  9:55 UTC
To: Martin Josefsson; +Cc: Patrick Schaaf, Don Cohen, Netfilter-devel

On Sun, Jan 19, 2003 at 10:40:47AM +0100, Martin Josefsson wrote:
>
> > I'm pretty sure that the allocation is the main cost, putting into
> > the hashes is not performance critical per se (if the lists are short).
>
> My small unscientific tests (performed quite some time ago) say that the
> allocation isn't the most costly operation performed.
> The test was a simple SYN-flood (random source IPs and ports) being
> forwarded by a router. If I didn't load ip_conntrack at all everything
> was fine and I had lots of cpu left. When I loaded ip_conntrack the cpu
> usage went to 100%. Then I tried blocking the flood in filter/FORWARD
> and that helped _a lot_ and in this case the allocation is still
> happening for each packet.

OK, thanks for that information. Still, the cost won't be in the operation
of putting the conntrack into the hashes, but in the consequences of that:
each hashed conntrack contributes to future lookup cost (you may have had
significant hash bucket chain lengths). Also, at the same point where
the conntrack enters the hashes, it is also add_timer()ed for the first
time. That touches kernel locks, IIRC, and may contribute significantly
to the cost you saw.

Such a test, IFF it also performed significant egress DROPping of the
SYNs (somehow), should clearly show the benefit of Don's proposal of
delaying hash (and timer) update until egress dequeueing.

> My point is that until someone really profiles ip_conntrack I'm not sure
> I'll believe that the allocation is the most costly operation.
> We need good profiles for a lot of different situations.

One or two, for common situations, would already be a good start :)

Maybe I should update my TSC based netfilter hook profiling patch, as
it provides a good, low-impact, and precise facility for that. If somebody
wants to give it a run & has problems extracting it from the dated patch
posted below, please mail me directly.

best regards
  Patrick

[1] http://lists.netfilter.org/pipermail/netfilter-devel/2002-August/008876.html
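
The remark above that every hashed conntrack adds to future lookup cost can be made concrete with a toy chained hash table. This is a sketch under stated assumptions: `struct entry`, `lookup_steps()` and `miss_cost()` are hypothetical names, and the `key % nbuckets` hash spreads keys perfectly evenly, which a real flood of random tuples would only approximate. It says nothing about timer or lock cost, which the message also suspects.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
    unsigned int key;
    struct entry *next;
};

/* Count the chain nodes a lookup for `key` has to visit. */
static unsigned int lookup_steps(struct entry **buckets,
                                 unsigned int nbuckets, unsigned int key)
{
    unsigned int steps = 0;
    const struct entry *e;

    for (e = buckets[key % nbuckets]; e != NULL; e = e->next) {
        steps++;
        if (e->key == key)
            break;
    }
    return steps;
}

/*
 * Fill a table with keys 0..nentries-1, then look up a key that is not
 * present, as the first packet of a new connection would be.
 * nbuckets must be greater than zero.
 */
static unsigned int miss_cost(unsigned int nbuckets, unsigned int nentries)
{
    struct entry **buckets = calloc(nbuckets, sizeof(*buckets));
    unsigned int i, steps;

    if (buckets == NULL)
        return 0;
    for (i = 0; i < nentries; i++) {
        struct entry *e = malloc(sizeof(*e));

        if (e == NULL)
            break;
        e->key = i;
        e->next = buckets[i % nbuckets];
        buckets[i % nbuckets] = e;
    }
    steps = lookup_steps(buckets, nbuckets, nentries);

    /* Release everything again so the function can be called repeatedly. */
    for (i = 0; i < nbuckets; i++) {
        while (buckets[i] != NULL) {
            struct entry *e = buckets[i];

            buckets[i] = e->next;
            free(e);
        }
    }
    free(buckets);
    return steps;
}
```

With 64 buckets, a failed lookup walks one node when 64 entries are hashed and 100 nodes when 6400 are, so in this model the per-packet miss cost grows linearly with the number of confirmed entries once the bucket count is fixed.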
* Re: ip_conntrack performance issues - also semantic issues
From: Harald Welte @ 2003-01-31 11:35 UTC
To: Martin Josefsson; +Cc: Patrick Schaaf, Don Cohen, Netfilter-devel

On Sun, Jan 19, 2003 at 10:40:47AM +0100, Martin Josefsson wrote:
> On Sun, 2003-01-19 at 10:16, Patrick Schaaf wrote:
>
> > > > 1) that a NEW conntrack need not be allocated in full.
> > > I guess you're saying that the allocation takes a significant amount
> > > of time, entering into hash is another chunk, and the first part is
> > > done at prerouting.
> >
> > I'm pretty sure that the allocation is the main cost, putting into
> > the hashes is not performance critical per se (if the lists are short).
>
> My small unscientific tests (performed quite some time ago) say that the
> allocation isn't the most costly operation performed.

It would also surprise me if allocation cost was _that_ high. Remember,
we are already using the slab cache for this.

> The test was a simple SYN-flood (random source IPs and ports) being
> forwarded by a router. If I didn't load ip_conntrack at all everything
> was fine and I had lots of cpu left. When I loaded ip_conntrack the cpu
> usage went to 100%. Then I tried blocking the flood in filter/FORWARD
> and that helped _a lot_ and in this case the allocation is still
> happening for each packet. But of course blocking it in raw/PREROUTING
> is even better (before the packets reach ip_conntrack)

yup. Why don't we have a 'raw' or 'prestate' table in patch-o-matic
yet? This is a very easy job, can't anybody please add this missing
feature?

> /Martin

--
Live long and prosper
- Harald Welte / laforge@gnumonks.org               http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V--
PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*)
* Re: ip_conntrack performance issues - also semantic issues
From: Martin Josefsson @ 2003-01-31 22:58 UTC
To: Harald Welte; +Cc: Patrick Schaaf, Don Cohen, Netfilter-devel

On Fri, 2003-01-31 at 12:35, Harald Welte wrote:
> > My small unscientific tests (performed quite some time ago) say that the
> > allocation isn't the most costly operation performed.
>
> It would also surprise me if allocation cost was _that_ high. Remember,
> we are already using the slab cache for this.

I'll ask Robert Olsson (of NAPI fame) to run a few tests with conntrack
to see where we spend most time in the case of a single (or two) udp
stream(s) on both UP and SMP. I think we should start optimizing for the
already_in_conntrack case first and then move on to fixing the
new_conntrack problem (the global lock is probably the worst problem for
SMP, on UP I have no idea as of now).

> yup. Why don't we have a 'raw' or 'prestate' table in patch-o-matic
> yet? This is a very easy job, can't anybody please add this missing
> feature?

We do have such a table in patch-o-matic/userspace, raw.patch and
raw.patch.ipv6; it was Jozsef that made those patches.
They contain the raw table, TRACE (every further match is printk'd) and
NOTRACK (assign a dummy conntrack to the skb) targets.

I've used the raw table to block a few DDoS's (some were >140kpps)
without any problems. I've also tested the TRACE and NOTRACK targets
(only in a testmachine, not in production) and they also worked fine.

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.