From: Jesper Dangaard Brouer
To: Eric Dumazet
Cc: "David S. Miller", Hannes Frederic Sowa, netdev@vger.kernel.org
Subject: Re: [net-next PATCH 4/4] net: frag LRU list per CPU
Date: Thu, 25 Apr 2013 15:59:29 +0200
Message-ID: <1366898369.26911.604.camel@localhost>

On Wed, 2013-04-24 at 17:25 -0700, Eric Dumazet wrote:
> On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > The global LRU list is the major bottleneck in fragmentation handling
> > (after the recent frag optimization).
> >
> > Simply change to using an LRU list per CPU, instead of a single
> > shared LRU list. This was the simplest approach to removing the LRU
> > list that I could come up with. The previous "direct hash cleaning"
> > approach was getting too complicated, and interacted badly with netns.
> >
> > The /proc/sys/net/ipv4/ipfrag_*_thresh values are now per-CPU limits,
> > and have been reduced to 2 MB (from 4 MB).
> >
> > Performance compared to net-next (953c96e):
> >
> > Test-type:   20G64K    20G3F  20G64K+DoS  20G3F+DoS  20G64K+MQ  20G3F+MQ
> > -----------  -------  -------  ----------  ---------  ---------  --------
> > net-next:    17417.4  11376.5     3853.43    6170.56      174.8     402.9
> > LRU-pr-CPU:  19047.0  13503.9    10314.10   12363.20     1528.7    2064.9
>
> Having a per cpu memory limit is going to be a nightmare for machines
> with 64+ cpus.

I do see your concern, but the struct frag_cpu_limit is only 26 bytes,
well, actually 32 bytes due to alignment. And the limit sort of scales
with the system size. But yes, I'm not a complete fan of it...
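For reference, here is a minimal sketch of that per-CPU struct
(illustrative only, reconstructed from the description above, not the
exact code from the patch):

struct frag_cpu_limit {
	atomic_t         mem;      /* frag memory accounted on this CPU */
	struct list_head lru_list; /* per-CPU LRU of frag queues        */
	spinlock_t       lru_lock; /* protects lru_list                 */
};

/* On 64-bit that is 4 + 16 + 4 = 24 bytes of fields, padded up to 32
 * bytes by the 8-byte alignment of the list_head pointers. */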
> Most machines use a single cpu to receive network packets. In some
> situations, every network interrupt is balanced onto all cpus.
> Fragments for the same reassembled packet can be serviced on
> different cpus.
>
> So your results are good because your irq affinities were properly
> tuned.

Yes, the irq affinities need to be aligned for this to work. I do see
your point about the single CPU that distributes packets via RSS
(Receive Side Scaling).

> Why don't you remove the lru instead ?

That was my attempt in my previous patch set:
 "net: frag code fixes and RFC for LRU removal"
 http://thread.gmane.org/gmane.linux.network/266323/

I thought you "shot me down" on that approach. That is why I'm doing
all this work on the LRU-per-CPU stuff now... (This is actually
consuming a lot of my time, not knowing which direction you want me to
run in...)

We/you have to make a choice between:
 1) "Remove LRU and do direct cleaning on hash table"
 2) "Fix LRU mem acct scalability issue"

> Clearly, removing the oldest frag was an implementation choice.
>
> We know that a slow sender has no chance to complete a packet if the
> attacker can create new fragments fast enough : frag_evictor() will
> keep the attacker fragments in memory and throw away good fragments.

Let me quote myself (which you cut out):

On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> I have also tested that a 512 Kbit/s simulated link (with HTB) still
> works (sending 3x UDP fragments) under the DoS test 20G3F+MQ, which
> is sending approx 1 Mpps on a 10Gbit/s NIC.

My experiments show that removing the oldest frag is actually quite a
good solution, and the LRU list does have some advantages. The LRU
list is the whole reason that I can make a 512 Kbit/s link work at the
same time as a 10Gbit/s DoS attack (well, my packet generator was
limited to 1 million packets per sec, which is not 10G).

The calculation (also spelled out in the small C sketch in the PS
below): the 512 Kbit/s link sends 1500 byte packets spaced 23.44 ms
apart:

 (1500 * 8) / 512000 = 0.0234375 sec

Thus, I need enough buffering/mem limit to keep a minimum of 24 ms
worth of data around, so that a new fragment can arrive and move the
frag queue to the tail of the LRU.

 On a  1 Gbit/s link this is: 0.0234375 *  1*10^9 =  23.4375 Mbit, approx  3 MB
 On a 10 Gbit/s link this is: 0.0234375 * 10*10^9 = 234.375  Mbit, approx 30 MB

Yes, the frag sizes do get accounted as bigger than their wire size,
but I hope you get the point, which is: we don't need that much memory
to "protect" fragments from a slow sender, with the LRU system.

--Jesper
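PS: The back-of-envelope numbers above, as a tiny stand-alone
user-space C program (only the constants from this mail; nothing here
is taken from the kernel code):

#include <stdio.h>

/* How much frag memory the LRU needs so a slow sender's frag queue
 * survives until its next fragment arrives (and moves the queue back
 * to the LRU tail), while an attacker fills memory at link speed.
 */
int main(void)
{
	const double pkt_bits  = 1500.0 * 8.0;          /* one fragment      */
	const double slow_rate = 512e3;                 /* 512 Kbit/s sender */
	const double spacing   = pkt_bits / slow_rate;  /* ~23.4 ms          */
	const double attack[]  = { 1e9, 10e9 };         /* 1G and 10G floods */
	int i;

	printf("fragment spacing: %.4f sec\n", spacing);
	for (i = 0; i < 2; i++) {
		double need_bits = spacing * attack[i];
		printf("%5.0f Mbit/s attack -> need %8.3f Mbit (~%.1f MB)\n",
		       attack[i] / 1e6, need_bits / 1e6, need_bits / 8e6);
	}
	return 0;
}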