From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
netdev@vger.kernel.org
Subject: Re: [net-next PATCH 4/4] net: frag LRU list per CPU
Date: Thu, 25 Apr 2013 15:59:29 +0200 [thread overview]
Message-ID: <1366898369.26911.604.camel@localhost> (raw)
In-Reply-To: <1366849557.8964.110.camel@edumazet-glaptop>
On Wed, 2013-04-24 at 17:25 -0700, Eric Dumazet wrote:
> On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > The global LRU list is the major bottleneck in fragmentation handling
> > (after the recent frag optimization).
> >
> > Simply change to use a LRU list per CPU, instead of a single shared
> > LRU list. This was the simples approach of removing the LRU list, I
> > could come up with. The previous "direct hash cleaning" approach was
> > getting too complicated, and interacted badly with netns.
> >
> > The /proc/sys/net/ipv4/ipfrag_*_thresh values are now per CPU limits,
> > and have been reduced to 2 Mbytes (from 4 MB).
> >
> > Performance compared to net-next (953c96e):
> >
> > Test-type: 20G64K 20G3F 20G64K+DoS 20G3F+DoS 20G64K+MQ 20G3F+MQ
> > ---------- ------- ------- ---------- --------- -------- -------
> > (953c96e)
> > net-next: 17417.4 11376.5 3853.43 6170.56 174.8 402.9
> > LRU-pr-CPU: 19047.0 13503.9 10314.10 12363.20 1528.7 2064.9
>
> Having per cpu memory limit is going to be a nightmare for machines with
> 64+ cpus
I do see your concern, but the struct frag_cpu_limit is only 26 bytes,
well actually 32 bytes due alignment. And the limit sort of scales with
the system size. But yes, I'm not a complete fan of it...
> Most machines use a single cpu to receive network packets. In some
> situations, every network interrupt is balanced onto all cpus. fragments
> for the same reassembled packet can be serviced on different cpus.
>
> So your results are good because your irq affinities were properly
> tuned.
Yes, the irq affinities needs to be aligned for this to work. I do see
your point about the single CPU that distributes via RSS (Receive Side
Scaling).
> Why don't you remove the lru instead ?
That was my attempt in my previous patch set:
"net: frag code fixes and RFC for LRU removal"
http://thread.gmane.org/gmane.linux.network/266323/
I though that you "shot-me-down" on that approach. That is why I'm
doing all this work on the LRU per CPU, stuff now... (this is actually
consuming a lot of my time, not knowing which direction you want me to
run in...)
We/you have to make a choice between:
1) "Remove LRU and do direct cleaning on hash table"
2) "Fix LRU mem acct scalability issue"
> Clearly, removing the oldest frag was an implementation choice.
>
> We know that a slow sender has no chance to complete a packet if the
> attacker can create new fragments fast enough : frag_evictor() will keep
> the attacker fragments in memory and throw away good fragments.
Let me quote my self (which you cut out):
On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> I have also tested that a 512 Kbit/s simulated link (with HTB) still
> works (with sending 3x UDP fragments) under the DoS test 20G3F+MQ,
> which is sending approx 1Mpps on a 10Gbit/s NIC
My experiments show that removing the oldest frag is actually a quite
good solution. And the LRU list do have some advantages...
The LRU list is the hole reason, that I can make a 512 Kbit/s link work,
at the same time of a of a 10Gbit/s DoS attack (well my packet generator
was limited to 1 million packets per sec, which is not 10G)
The calculations:
The 512 Kbit/s link will send packets spaced with 23.43 ms.
(1500*8)/512000 = 0.0234375
Thus, I need enough buffering/ mem limit to keep minimum 24 ms worth of
data, for a new fragment to arrive, which will move the frag queue to
the tail of the LRU.
On a 1Gbit/s link this is: approx 24 MB
(0.0234375*1000*10^6/10^6 23.4375 MB)
On a 10Gbit/s link this is: approx 240 MB
(0.0234375*10000*10^6/10^6 234.375 MB)
Yes, the frag sizes do get account to be bigger, but I hope you get the
point, which is:
We don't need that much memory to "protect" fragment from slow sender,
with the LRU system.
--Jesper
next prev parent reply other threads:[~2013-04-25 13:59 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-24 15:47 [net-next PATCH 0/4] net: frag patchset for fixing LRU scalability issue Jesper Dangaard Brouer
2013-04-24 15:48 ` [net-next PATCH 1/4] Revert "inet: limit length of fragment queue hash table bucket lists" Jesper Dangaard Brouer
2013-04-25 0:00 ` Eric Dumazet
2013-04-25 13:10 ` Jesper Dangaard Brouer
2013-04-25 13:58 ` David Laight
2013-05-02 7:59 ` Jesper Dangaard Brouer
2013-05-02 15:16 ` Eric Dumazet
2013-05-03 9:15 ` Jesper Dangaard Brouer
2013-04-24 15:48 ` [net-next PATCH 2/4] net: increase frag hash size Jesper Dangaard Brouer
2013-04-24 22:09 ` Sergei Shtylyov
2013-04-25 10:13 ` Jesper Dangaard Brouer
2013-04-25 12:13 ` Sergei Shtylyov
2013-04-25 19:11 ` David Miller
2013-04-24 23:48 ` Eric Dumazet
2013-04-25 3:26 ` Hannes Frederic Sowa
2013-04-25 19:52 ` [net-next PATCH V2] " Jesper Dangaard Brouer
2013-04-29 17:44 ` David Miller
2013-04-24 15:48 ` [net-next PATCH 3/4] net: avoid false perf interpretations in frag code Jesper Dangaard Brouer
2013-04-24 23:48 ` Eric Dumazet
2013-04-24 23:54 ` David Miller
2013-04-25 10:57 ` Jesper Dangaard Brouer
2013-04-25 19:13 ` David Miller
2013-04-24 15:48 ` [net-next PATCH 4/4] net: frag LRU list per CPU Jesper Dangaard Brouer
2013-04-25 0:25 ` Eric Dumazet
2013-04-25 2:05 ` Eric Dumazet
2013-04-25 14:06 ` Jesper Dangaard Brouer
2013-04-25 14:37 ` Eric Dumazet
2013-04-25 13:59 ` Jesper Dangaard Brouer [this message]
2013-04-25 14:10 ` Eric Dumazet
2013-04-25 14:18 ` Eric Dumazet
2013-04-25 19:15 ` Jesper Dangaard Brouer
2013-04-25 19:22 ` Eric Dumazet
2013-04-24 16:21 ` [net-next PATCH 0/4] net: frag patchset for fixing LRU scalabilityissue David Laight
2013-04-25 11:39 ` Jesper Dangaard Brouer
2013-04-25 12:57 ` David Laight
2013-04-24 17:27 ` [net-next PATCH 0/4] net: frag patchset for fixing LRU scalability issue Hannes Frederic Sowa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1366898369.26911.604.camel@localhost \
--to=brouer@redhat.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=hannes@stressinduktion.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).