public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: netdev <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>
Subject: Re: Loopback performance from kernel 2.6.12 to 2.6.37
Date: Thu, 18 Nov 2010 18:41:53 +0100	[thread overview]
Message-ID: <1290102113.2781.237.camel@edumazet-laptop> (raw)
In-Reply-To: <1290088353.2781.137.camel@edumazet-laptop>

On Thursday 18 November 2010 at 14:52 +0100, Eric Dumazet wrote:
> On Tuesday 9 November 2010 at 15:25 +0100, Eric Dumazet wrote:
> 
> > So far, so good. These are the expected numbers. Now we have to
> > understand why corei7 gets 38 seconds instead of 8 :)
> > 
> > 
> 
> My tests show a problem with backlog processing and the overly large
> TCP windows we have now (at least on loopback, and with aggressive
> senders).
> 
> Basically, with the huge TCP windows we now have by default (4 Mbytes),
> the reader process may have to process up to 4 Mbytes of backlogged data
> in __release_sock() before returning from the 'small' read(fd, buffer,
> 1024) done by netcat.
> 
> While it processes this backlog, it sends TCP ACKs, allowing the sender
> to send new frames that may then be dropped because of
> sk_rcvqueues_full(), or to keep filling the receive queue up to the
> receiver window, feeding the task looping in __release_sock().
> 
> 
> This blows away CPU caches completely [data is queued, and the dequeue
> happens long afterwards], and the latency of a single read() can be
> very high. This eventually stalls the whole pipeline of user
> processing.
> 
> 
> <digression>
> I also understand why UDP latencies are so badly impacted. If we
> receive a burst of frames on the same socket, the user process reading
> the first frame may be forced to process the whole backlog before
> returning to userland.
> 
> Really, we must remove lock_sock() from the UDP input path.
> 
> commit 95766fff6b9a78d1 ([UDP]: Add memory accounting) was a big
> mistake.
> </digression>
> 
> 
> 
> On my server machine with 6 Mbytes of L2 cache, you don't see the
> problem, while on my laptop with 3 Mbytes of L2 cache, you can.
> 
> I caught this thanks to the new SNMP counter added in 2.6.34
> (TCPBacklogDrop), which could easily increase by 1000 during the test.
> 
> 
> I built a test program, probably easier to use than the various netcat
> flavors. It also uses only two tasks, which is better if you have a
> Core 2 Duo like the one in my laptop ;)
> 
> To reproduce the problem, run it with option -l 4M
> 
> $ netstat -s|grep TCPBacklogDrop
>     TCPBacklogDrop: 788
> $ time ./loopback_transfert -l 1k;netstat -s|grep TCPBacklogDrop
> 
> real	0m14.013s
> user	0m0.630s
> sys	0m13.250s
>     TCPBacklogDrop: 788
> $ time ./loopback_transfert -l 128k;netstat -s|grep TCPBacklogDrop
> 
> real	0m7.447s
> user	0m0.030s
> sys	0m5.490s
>     TCPBacklogDrop: 789
> $ time ./loopback_transfert -l 1M;netstat -s|grep TCPBacklogDrop
> 
> real	0m11.206s
> user	0m0.020s
> sys	0m7.150s
>     TCPBacklogDrop: 793
> $ time ./loopback_transfert -l 4M;netstat -s|grep TCPBacklogDrop
> 
> real	0m10.347s
> user	0m0.000s
> sys	0m6.120s
>     TCPBacklogDrop: 1510
> $ time ./loopback_transfert -l 16k;netstat -s|grep TCPBacklogDrop
> 
> real	0m6.810s
> user	0m0.040s
> sys	0m6.670s
>     TCPBacklogDrop: 1511
> 

I forgot to include test results from my dev machine (a server-class
machine with 8 Mbytes of L2 cache), NUMA,
2 sockets: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz

# netstat -s|grep TCPBacklogDrop
    TCPBacklogDrop: 8891
# time ./loopback_transfert -l 16k;netstat -s|grep TCPBacklogDrop

real	0m7.033s
user	0m0.010s
sys	0m4.580s
    TCPBacklogDrop: 9239
# time ./loopback_transfert -l 1M;netstat -s|grep TCPBacklogDrop

real	0m5.408s
user	0m0.000s
sys	0m2.880s
    TCPBacklogDrop: 9243
# time ./loopback_transfert -l 4M;netstat -s|grep TCPBacklogDrop

real	0m2.965s
user	0m0.000s
sys	0m2.070s
    TCPBacklogDrop: 10485
# time ./loopback_transfert -l 6M;netstat -s|grep TCPBacklogDrop

real	0m7.711s
user	0m0.000s
sys	0m3.180s
    TCPBacklogDrop: 13537
# time ./loopback_transfert -l 8M;netstat -s|grep TCPBacklogDrop

real	0m11.497s
user	0m0.020s
sys	0m3.830s
    TCPBacklogDrop: 17108


As soon as our working set is larger than the L2 cache, this is very slow.

perf profile for the -l 8M bench:

# Events: 7K cycles
#
# Overhead  Command      Shared Object                               Symbol
# ........  .......  .................  ...................................
#
    40.97%  loopback_transf  [kernel.kallsyms]  [k] copy_user_generic_string
    18.57%    :8968  [kernel.kallsyms]  [k] copy_user_generic_string
     3.54%    :8968  [kernel.kallsyms]  [k] get_page_from_freelist
     3.36%    :8968  [kernel.kallsyms]  [k] tcp_sendmsg
     1.17%    :8968  [kernel.kallsyms]  [k] put_page
     0.99%    :8968  [kernel.kallsyms]  [k] free_hot_cold_page
     0.99%    :8968  [kernel.kallsyms]  [k] __might_sleep
     0.88%    :8968  [kernel.kallsyms]  [k] __ticket_spin_lock
     0.81%  loopback_transf  [kernel.kallsyms]  [k] free_pcppages_bulk
     0.79%    :8968  [kernel.kallsyms]  [k] __alloc_pages_nodemask
     0.63%  loopback_transf  [kernel.kallsyms]  [k] put_page
     0.63%  loopback_transf  [kernel.kallsyms]  [k] __might_sleep
     0.63%  loopback_transf  [kernel.kallsyms]  [k] tcp_transmit_skb
     0.57%    :8968  [kernel.kallsyms]  [k] skb_release_data
     0.55%  loopback_transf  [kernel.kallsyms]  [k] free_hot_cold_page
     0.53%    :8968  [kernel.kallsyms]  [k] tcp_ack
     0.50%  loopback_transf  [kernel.kallsyms]  [k] __inet_lookup_established
     0.49%  loopback_transf  [kernel.kallsyms]  [k] skb_copy_datagram_iovec
     0.47%    :8968  [kernel.kallsyms]  [k] __rmqueue
     0.45%    :8968  [kernel.kallsyms]  [k] get_pageblock_flags_group
     0.41%    :8968  [kernel.kallsyms]  [k] zone_watermark_ok
     0.41%    :8968  [kernel.kallsyms]  [k] __inc_zone_state
     0.40%  loopback_transf  [kernel.kallsyms]  [k] skb_release_data
     0.39%    :8968  [kernel.kallsyms]  [k] tcp_transmit_skb




Thread overview: 19+ messages
     [not found] <1288954189.28003.178.camel@firesoul.comx.local>
     [not found] ` <1288988955.2665.297.camel@edumazet-laptop>
     [not found]   ` <1289213926.15004.19.camel@firesoul.comx.local>
     [not found]     ` <1289214289.2820.188.camel@edumazet-laptop>
2010-11-08 15:06       ` Loopback performance from kernel 2.6.12 to 2.6.37 Eric Dumazet
2010-11-09  0:05         ` Andrew Hendry
2010-11-09  5:22           ` Eric Dumazet
2010-11-09  6:23             ` Eric Dumazet
2010-11-09  6:30               ` Andrew Hendry
2010-11-09  6:38                 ` Eric Dumazet
2010-11-09  6:42                   ` Eric Dumazet
2010-11-09 13:59         ` Jesper Dangaard Brouer
2010-11-09 14:06           ` Eric Dumazet
2010-11-09 14:16           ` Jesper Dangaard Brouer
2010-11-09 14:25             ` Eric Dumazet
2010-11-18 13:52               ` Eric Dumazet
2010-11-18 17:41                 ` Eric Dumazet [this message]
2010-11-18 17:48                   ` David Miller
2010-11-09 14:38             ` Jesper Dangaard Brouer
2010-11-10 11:24               ` Jesper Dangaard Brouer
2010-12-12 15:48                 ` Arnaldo Carvalho de Melo
2010-11-09 21:35 Xose Vazquez Perez
2010-11-10  8:49 ` Jesper Dangaard Brouer
