From mboxrd@z Thu Jan  1 00:00:00 1970
From: Simon Kirby
Subject: Re: Route cache performance
Date: Tue, 13 Sep 2005 15:14:48 -0700
Message-ID: <20050913221448.GD15704@netnation.com>
References: <20050824000158.GA8137@netnation.com>
 <20050825181111.GB14336@netnation.com>
 <20050825200543.GA6612@yakov.inr.ac.ru>
 <20050825212211.GA23384@netnation.com>
 <20050826115520.GA12351@yakov.inr.ac.ru>
 <17167.29239.469711.847951@robur.slu.se>
 <20050906235700.GA31820@netnation.com>
 <17182.64751.340488.996748@robur.slu.se>
 <20050907162854.GB24735@netnation.com>
 <20050907195911.GA8382@yakov.inr.ac.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
To: Alexey Kuznetsov, Robert Olsson, Eric Dumazet, netdev@oss.sgi.com
Content-Disposition: inline
In-Reply-To: <20050907195911.GA8382@yakov.inr.ac.ru>
Sender: netdev-bounce@oss.sgi.com
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

On Wed, Sep 07, 2005 at 11:59:11PM +0400, Alexey Kuznetsov wrote:

> Hello!
>
> > Yes, setting maxbatch to 10000 also results in working gc,
>
> Could you try lower values? F.e. I guess 300 or a little more
> (it is netdev_max_backlog) should be enough.

300 seems to be sufficient, but I'm not sure what this depends on
(load, HZ, timing of some sort?).  See below for the full tests.

> > for the normal case also hurts the DoS case...and it really hurts
> > when the DoS case is the normal case.
>
> 5.7% is not "really hurts" yet. :-)

I decided to try out FreeBSD for comparison, as I've heard people say
that it handles this case quite well.  The results are interesting.

FreeBSD seems to have a route cache; however, it keys only on the
destination.  When a new destination is seen, the route table entry that
matched is "cloned": the MTU, etc., are copied, the destination is
rewritten to the exact IP (as opposed to a network route), and path MTU
discovery results are maintained in this entry, keyed by destination
address only.

I'm not sure if Linux could work the same way with the source routing
tables enabled, but perhaps it's possible to either disable the source
side of the route cache when policy routing is disabled, or instantiate
a route cache hash per route table, or something.  Actually, is there
ever a valid case where the source needs to be tracked in the route
cache when policy routing is disabled?  A local socket will track MSS
correctly, while a forwarded packet will create or use an entry without
touching it, so I don't see why not.  Anyway, packets go through FreeBSD
at the same speed whether or not the source is spoofed.
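To make the keying difference concrete, here is a rough userspace
sketch.  It is an illustration only, not FreeBSD or Linux kernel code;
the hash functions, bucket count, and flood size are made up, and the
real lookup keys of course carry more than this (TOS, interfaces, and
so on).  A destination-only key needs a single cache entry for a
spoofed-source flood aimed at one host, while a source+destination key
spreads the same flood across the whole hash table:

/* Illustration only; not kernel code. */
#include <stdint.h>
#include <stdio.h>

#define CACHE_BUCKETS 4096u             /* stand-in for rhash_size */

/* dst-only key: one cloned entry per destination, any number of sources */
static unsigned int hash_dst_only(uint32_t daddr)
{
	return (daddr ^ (daddr >> 16)) & (CACHE_BUCKETS - 1);
}

/* src+dst key: every new (spoofed) source looks like a new flow */
static unsigned int hash_src_dst(uint32_t saddr, uint32_t daddr)
{
	uint32_t h = saddr ^ daddr;

	return (h ^ (h >> 16)) & (CACHE_BUCKETS - 1);
}

int main(void)
{
	uint32_t daddr = 0x0a000001;    /* 10.0.0.1: the one real target */
	unsigned int used[CACHE_BUCKETS] = { 0 };
	unsigned int buckets_touched = 0;
	uint32_t src;

	/* a 100000-packet flood toward one destination, with incrementing
	 * sources standing in for random spoofed sources */
	for (src = 1; src <= 100000; src++) {
		unsigned int b = hash_src_dst(src, daddr);

		if (!used[b]++)
			buckets_touched++;
	}

	printf("dst-only keying: every packet maps to bucket %u (one entry)\n",
	       hash_dst_only(daddr));
	printf("src+dst keying:  flood spread over %u of %u buckets\n",
	       buckets_touched, CACHE_BUCKETS);
	return 0;
}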
Also, there is a "fastforwarding" sysctl that sends forwarded packets
straight from the input interrupt/poll without queueing them in a soft
interrupt ("NETISR").

Polling mode on FreeBSD isn't as nice as NAPI in that it's fully manual
(on or off), and when it's on it triggers entirely from the timer
interrupt unless told to also trigger from the idle loop.  The
user/kernel balancing is also manual, and I can't seem to get it to
forward as fast as with polling disabled no matter how I adjust it.

TEST RESULTS
------------

All Linux tests are with NAPI enabled and the e1000 driver native to
that kernel unless otherwise specified.  maxbatch does not exist in
kernels < 2.6.9, and rhash_size does not exist in 2.4.

Sender: 367 Mbps, 717883 pps, valid src/dst, 64 byte (Ethernet) packets

2.4.27-rc1: 297 Mbps forwarded (w/idle time?!)
2.4.31: 296 Mbps forwarded (w/idle time?!)
2.6.13-rc6: 173 Mbps forwarded
FreeBSD 5.4-RELEASE (HZ=1000): 103 Mbps forwarded (dead userland)
 `- net.inet.ip.fastforwarding=1: 282 Mbps forwarded (dead userland)
 `- kern.polling.enable=1: 75.3 Mbps forwarded
 `- kern.polling.idle_poll=1: 226 Mbps forwarded

Sender: 348 Mbps, 680416 pps, random src, valid dst, 64 bytes
(All FreeBSD tests have identical results.)

2.4.27-rc1: 122 Mbps forwarded
2.4.27-rc1 gc_elasticity=1: 182 Mbps forwarded
2.4.27-rc1+2.4.31_e1000: 117 Mbps forwarded
2.4.27-rc1+2.4.31_e1000 gc_elasticity=1: 170 Mbps forwarded
2.4.31: 95.1 Mbps forwarded
2.4.31 gc_elasticity=1: 122 Mbps forwarded
2.6.13-rc6: <1 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=30: <1 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=60: 1.5 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=100: 2.6 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=150: 3.8 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=200: 6.9 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=250: 15.4 Mbps forwarded (dst overflow)
2.6.13-rc6 maxbatch=300: 58.6 Mbps forwarded (gc balanced)
2.6.13-rc6 maxbatch=350: 60.5 Mbps forwarded
2.6.13-rc6 maxbatch=400: 59.4 Mbps forwarded
2.6.13-rc6 maxbatch=450: 59.1 Mbps forwarded
2.6.13-rc6 maxbatch=500: 62.0 Mbps forwarded
2.6.13-rc6 maxbatch=550: 61.9 Mbps forwarded
2.6.13-rc6 maxbatch=1000: 61.4 Mbps forwarded
2.6.13-rc6 maxbatch=2000: 60.2 Mbps forwarded
2.6.13-rc6 maxbatch=3000: 60.1 Mbps forwarded
2.6.13-rc6 maxbatch=5000: 59.1 Mbps forwarded
2.6.13-rc6 maxbatch=MAXINT: 59.1 Mbps forwarded
2.6.13-rc6 dst_free: 66.0 Mbps forwarded
2.6.13-rc6 dst_free max_size=rhash_size: 79.2 Mbps forwarded
------------

2.6 definitely has better dst cache gc balancing than 2.4.  I can set
max_size=rhash_size in 2.6.13-rc6 and it will just work, even without
adjusting gc_elasticity or gc_thresh.  In 2.4.27 and 2.4.31, the only
parameter that appears to help is gc_elasticity; if I just adjust
max_size, it overflows and falls over.

I note that the actual read-copy-update (RCU) "maxbatch" limit was added
in 2.6.9.  Before then, it seems there was no limit (effectively
infinite).  Was it added for latency reasons?

Time permitting, I'd also like to run some profiles.  It's interesting
to note that 2.6 is slower at forwarding even straight duplicate small
packets.  We should definitely get to the bottom of that.

Simon-