From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	fw@strlen.de, netdev@vger.kernel.org, pablo@netfilter.org,
	tgraf@suug.ch, amwang@redhat.com, kaber@trash.net,
	paulmck@linux.vnet.ibm.com, herbert@gondor.hengli.com.au
Subject: Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm frag queues
Date: Fri, 30 Nov 2012 11:04:06 +0100
Message-ID: <1354269846.11754.381.camel@localhost>
In-Reply-To: <1354230100.3299.40.camel@edumazet-glaptop>

On Thu, 2012-11-29 at 15:01 -0800, Eric Dumazet wrote:
> On Thu, 2012-11-29 at 23:17 +0100, Jesper Dangaard Brouer wrote:
> 
> > For example lets give a threshold of 2000 MBytes:
> > 
> > [root@dragon ~]# sysctl -w net/ipv4/ipfrag_high_thresh=$(((1024**2*2000)))
> > net.ipv4.ipfrag_high_thresh = 2097152000
> > 
> > [root@dragon ~]# sysctl -w net/ipv4/ipfrag_low_thresh=$(((1024**2*2000)-655350))
> > net.ipv4.ipfrag_low_thresh = 2096496650
> > 
> > 4x10 Netperf adjusted output:
> >  Socket  Message  Elapsed      Messages
> >  Size    Size     Time         Okay Errors   Throughput
> >  bytes   bytes    secs            #      #   10^6bits/sec
> > 
> >  229376   65507   20.00      298685      0    7826.35
> >  212992           20.00          27              0.71
> > 
> >  229376   65507   20.00      366668      0    9607.71
> >  212992           20.00          13              0.34
> > 
> >  229376   65507   20.00      254790      0    6676.20
> >  212992           20.00          14              0.37
> > 
> >  229376   65507   20.00      309293      0    8104.33
> >  212992           20.00          15              0.39
> > 
> > Can we agree that the current evictor strategy is broken?
> 
> Not really, you drop packets because of another limit.

Then tell me which limit?
And notice the result is the same for 200 MBytes threshold.

As I wrote *just* above the section you quoted:

On Thu, 2012-11-29 at 23:17 +0100, Jesper Dangaard Brouer wrote:
> [...] Thus, we must drop packets, or else the NIC will do it for
> us... for fragments we need do this "dropping" more intelligent. 

So, I think it is the NIC dropping packets, in this case... what do you
claim?



I still claim that the current evictor strategy is broken!

We need to drop fragments more intelligently in software. As DaveM
correctly states, the code/algorithm needs to take some "probability
of fulfillment" into account, which is actually what my evictor
code implements. (I don't claim it's perfect: it currently has
fairness/fair-queue issues. I have a plan for fixing them, but let's not
clutter up this answer.)


So, let me instead show, with tests, that the evictor strategy is
broken, while keeping the original default thresh settings:

# grep . /proc/sys/net/ipv4/ipfrag_*_thresh
/proc/sys/net/ipv4/ipfrag_high_thresh:262144
/proc/sys/net/ipv4/ipfrag_low_thresh:196608

Test purpose: I will demonstrate, on a single 10G link, that starting
several ("N") netperf UDP fragmentation flows hurts performance, and
then claim that this is caused by the bad evictor strategy.

Test setup:
 - Disable Ethernet flow control
 - netperf packet size 65507
 - Run netserver on one NUMA node
 - Start netperf clients against a NIC on the other NUMA node
 - (The NUMA imbalance helps the effect occur at lower N) 
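
To give a feel for how little headroom the default thresholds leave with
this packet size, here is a rough back-of-the-envelope sketch in Python.
The 1500-byte MTU is an assumption (the MTU is not stated in this mail),
and the helper names are made up for illustration. It counts payload
bytes only. The kernel charges more than the payload for each fragment,
so the real number of datagrams that fit is lower than the bound printed
here.

```python
import math

MTU = 1500            # ASSUMPTION: standard Ethernet MTU, not stated in the mail
IP_HEADER = 20        # IPv4 header without options
UDP_HEADER = 8
UDP_PAYLOAD = 65507   # netperf packet size from the test setup
HIGH_THRESH = 262144  # default ipfrag_high_thresh shown above


def fragments_per_datagram(udp_payload, mtu=MTU):
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Fragment data must be a multiple of 8 bytes (except the last one).
    per_frag = (mtu - IP_HEADER) // 8 * 8
    ip_payload = udp_payload + UDP_HEADER
    return math.ceil(ip_payload / per_frag)


def max_datagrams_by_payload(udp_payload, thresh=HIGH_THRESH):
    """Upper bound on datagrams under reassembly, counting payload only."""
    return thresh // (udp_payload + UDP_HEADER)


print("fragments per datagram:", fragments_per_datagram(UDP_PAYLOAD))
print("datagrams fitting in high_thresh (upper bound):",
      max_datagrams_by_payload(UDP_PAYLOAD))
```

Under these assumptions, only a handful of datagrams can be under
reassembly at once before the evictor starts running.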

Result: N=1  8040 Mbit/s
Result: N=2  9584 Mbit/s (4739+4845)
Result: N=3  4055 Mbit/s (1436+1371+1248)
Result: N=4  2247 Mbit/s (1538+29+54+626)
Result: N=5   879 Mbit/s (78+152+226+125+298)
Result: N=6   293 Mbit/s (85+55+32+57+46+18)
Result: N=7   354 Mbit/s (70+47+33+80+20+72+32)
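
For anyone who wants to re-check the numbers, a small Python sketch that
recomputes the aggregate from the per-flow figures above and shows the
spread between the slowest and fastest flow. The figures are copied from
the results list. For N=1 only the total was reported, so it is treated
as a single flow. The `summarize` helper is hypothetical, just for
illustration.

```python
# Per-flow throughput in Mbit/s, copied from the results above.
results = {
    1: [8040],
    2: [4739, 4845],
    3: [1436, 1371, 1248],
    4: [1538, 29, 54, 626],
    5: [78, 152, 226, 125, 298],
    6: [85, 55, 32, 57, 46, 18],
    7: [70, 47, 33, 80, 20, 72, 32],
}


def summarize(flows):
    """Return (aggregate, slowest flow, fastest flow) in Mbit/s."""
    return sum(flows), min(flows), max(flows)


for n, flows in sorted(results.items()):
    total, slowest, fastest = summarize(flows)
    print(f"N={n}: total {total:5d} Mbit/s, "
          f"slowest {slowest:5d}, fastest {fastest:5d}")
```

The aggregate drops sharply from N=3 onwards, and at N=4 the individual
flows also differ widely from each other.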

Can we, now, agree that the current evictor strategy is broken?!?


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
