Re: [net-next PATCH 4/4] net: frag LRU list per CPU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	netdev@vger.kernel.org
Subject: Re: [net-next PATCH 4/4] net: frag LRU list per CPU
Date: Thu, 25 Apr 2013 15:59:29 +0200	[thread overview]
Message-ID: <1366898369.26911.604.camel@localhost> (raw)
In-Reply-To: <1366849557.8964.110.camel@edumazet-glaptop>

On Wed, 2013-04-24 at 17:25 -0700, Eric Dumazet wrote:
> On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > The global LRU list is the major bottleneck in fragmentation handling
> > (after the recent frag optimization).
> > 
> > Simply change to use a LRU list per CPU, instead of a single shared
> > LRU list.  This was the simples approach of removing the LRU list, I
> > could come up with.  The previous "direct hash cleaning" approach was
> > getting too complicated, and interacted badly with netns.
> > 
> > The /proc/sys/net/ipv4/ipfrag_*_thresh values are now per CPU limits,
> > and have been reduced to 2 Mbytes (from 4 MB).
> > 
> > Performance compared to net-next (953c96e):
> > 
> > Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
> > ----------  -------   -------  ----------  ---------  --------  -------
> > (953c96e)
> >  net-next:  17417.4   11376.5   3853.43     6170.56     174.8    402.9
> > LRU-pr-CPU: 19047.0   13503.9  10314.10    12363.20    1528.7   2064.9
> 
> Having per cpu memory limit is going to be a nightmare for machines with
> 64+ cpus

I do see your concern, but the struct frag_cpu_limit is only 26 bytes,
well actually 32 bytes due alignment.  And the limit sort of scales with
the system size.  But yes, I'm not a complete fan of it...


> Most machines use a single cpu to receive network packets. In some
> situations, every network interrupt is balanced onto all cpus. fragments
> for the same reassembled packet can be serviced on different cpus.
> 
> So your results are good because your irq affinities were properly
> tuned.

Yes, the irq affinities needs to be aligned for this to work.  I do see
your point about the single CPU that distributes via RSS (Receive Side
Scaling).


> Why don't you remove the lru instead ?

That was my attempt in my previous patch set:
  "net: frag code fixes and RFC for LRU removal"
  http://thread.gmane.org/gmane.linux.network/266323/

I though that you "shot-me-down" on that approach.  That is why I'm
doing all this work on the LRU per CPU, stuff now... (this is actually
consuming a lot of my time, not knowing which direction you want me to
run in...)

We/you have to make a choice between:
1) "Remove LRU and do direct cleaning on hash table"
2) "Fix LRU mem acct scalability issue"


> Clearly, removing the oldest frag was an implementation choice.
> 
> We know that a slow sender has no chance to complete a packet if the
> attacker can create new fragments fast enough : frag_evictor() will keep
> the attacker fragments in memory and throw away good fragments.

Let me quote my self (which you cut out):

On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> I have also tested that a 512 Kbit/s simulated link (with HTB) still
> works (with sending 3x UDP fragments) under the DoS test 20G3F+MQ,
> which is sending approx 1Mpps on a 10Gbit/s NIC

My experiments show that removing the oldest frag is actually a quite
good solution.  And the LRU list do have some advantages...

The LRU list is the hole reason, that I can make a 512 Kbit/s link work,
at the same time of a of a 10Gbit/s DoS attack (well my packet generator
was limited to 1 million packets per sec, which is not 10G)

The calculations:
  The 512 Kbit/s link will send packets spaced with 23.43 ms.
  (1500*8)/512000 = 0.0234375

Thus, I need enough buffering/ mem limit to keep minimum 24 ms worth of
data, for a new fragment to arrive, which will move the frag queue to
the tail of the LRU.

On a 1Gbit/s link this is: approx 24 MB
 (0.0234375*1000*10^6/10^6 23.4375 MB)

On a 10Gbit/s link this is: approx 240 MB
 (0.0234375*10000*10^6/10^6 234.375 MB)

Yes, the frag sizes do get account to be bigger, but I hope you get the
point, which is:
 We don't need that much memory to "protect" fragment from slow sender,
with the LRU system.

--Jesper

next prev parent reply	other threads:[~2013-04-25 13:59 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-24 15:47 [net-next PATCH 0/4] net: frag patchset for fixing LRU scalability issue Jesper Dangaard Brouer
2013-04-24 15:48 ` [net-next PATCH 1/4] Revert "inet: limit length of fragment queue hash table bucket lists" Jesper Dangaard Brouer
2013-04-25  0:00   ` Eric Dumazet
2013-04-25 13:10     ` Jesper Dangaard Brouer
2013-04-25 13:58       ` David Laight
2013-05-02  7:59     ` Jesper Dangaard Brouer
2013-05-02 15:16       ` Eric Dumazet
2013-05-03  9:15         ` Jesper Dangaard Brouer
2013-04-24 15:48 ` [net-next PATCH 2/4] net: increase frag hash size Jesper Dangaard Brouer
2013-04-24 22:09   ` Sergei Shtylyov
2013-04-25 10:13     ` Jesper Dangaard Brouer
2013-04-25 12:13       ` Sergei Shtylyov
2013-04-25 19:11       ` David Miller
2013-04-24 23:48   ` Eric Dumazet
2013-04-25  3:26   ` Hannes Frederic Sowa
2013-04-25 19:52   ` [net-next PATCH V2] " Jesper Dangaard Brouer
2013-04-29 17:44     ` David Miller
2013-04-24 15:48 ` [net-next PATCH 3/4] net: avoid false perf interpretations in frag code Jesper Dangaard Brouer
2013-04-24 23:48   ` Eric Dumazet
2013-04-24 23:54     ` David Miller
2013-04-25 10:57     ` Jesper Dangaard Brouer
2013-04-25 19:13       ` David Miller
2013-04-24 15:48 ` [net-next PATCH 4/4] net: frag LRU list per CPU Jesper Dangaard Brouer
2013-04-25  0:25   ` Eric Dumazet
2013-04-25  2:05     ` Eric Dumazet
2013-04-25 14:06       ` Jesper Dangaard Brouer
2013-04-25 14:37         ` Eric Dumazet
2013-04-25 13:59     ` Jesper Dangaard Brouer [this message]
2013-04-25 14:10       ` Eric Dumazet
2013-04-25 14:18       ` Eric Dumazet
2013-04-25 19:15         ` Jesper Dangaard Brouer
2013-04-25 19:22           ` Eric Dumazet
2013-04-24 16:21 ` [net-next PATCH 0/4] net: frag patchset for fixing LRU scalabilityissue David Laight
2013-04-25 11:39   ` Jesper Dangaard Brouer
2013-04-25 12:57     ` David Laight
2013-04-24 17:27 ` [net-next PATCH 0/4] net: frag patchset for fixing LRU scalability issue Hannes Frederic Sowa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1366898369.26911.604.camel@localhost \
    --to=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=hannes@stressinduktion.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.