All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Daniel Borkmann <dborkman@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	cwang@twopensource.com, Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [RFC PATCH] pktgen: skb bursting via skb->xmit_more API
Date: Sat, 30 Aug 2014 15:37:42 +0200	[thread overview]
Message-ID: <20140830153742.7f27d98c@redhat.com> (raw)
In-Reply-To: <20140827211300.26976.52104.stgit@dragon>


On Wed, 27 Aug 2014 23:13:00 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> This patch just demonstrates the effect of delaying the HW tailptr.
> Let me demonstrate the performance effect of bulking packet with pktgen.
> 
> These results is a **single** CPU pktgen TX via script:
>  https://github.com/netoptimizer/network-testing/blob/master/pktgen/pktgen02_burst.sh
> 
> Cmdline args:
>  ./pktgen02_burst.sh -i eth5 -d 192.168.21.4 -m 00:12:c0:80:1d:54 -b $skb_burst
> 
> Special case skb_burst=1 does not burst, but activates the
> skb_burst_count++ and writing to skb->xmit_more.
> 
> Performance
>  skb_burst=0  tx:5614370 pps
>  skb_burst=1  tx:5571279 pps ( -1.38 ns (worse))
>  skb_burst=2  tx:6942821 pps ( 35.46 ns)
>  skb_burst=3  tx:7556214 pps ( 11.69 ns)
>  skb_burst=4  tx:7740632 pps ( 3.15 ns)
>  skb_burst=5  tx:7972489 pps ( 3.76 ns)
>  skb_burst=6  tx:8129856 pps ( 2.43 ns)
>  skb_burst=7  tx:8281671 pps ( 2.25 ns)
>  skb_burst=8  tx:8383790 pps ( 1.47 ns)
>  skb_burst=9  tx:8451248 pps ( 0.95 ns)
>  skb_burst=10 tx:8503571 pps ( 0.73 ns)
>  skb_burst=16 tx:8745878 pps ( 3.26 ns)
>  skb_burst=24 tx:8871629 pps ( 1.62 ns)
>  skb_burst=32 tx:8945166 pps ( 0.93 ns)
> 
> skb_burst=(0 vs 32) improvement:
>  (1/5614370*10^9)-(1/8945166*10^9) = 66.32 ns
>   + 3330796 pps

A more interesting benchmark with pktgen is to see what happens if
pktgen have to free and allocate a new SKB everytime in the transmit
loop.  Because this adds a relatively significant delay between packets.

Baseline before with SKB_CLONE=100000 (and skb_burst=0), was
5614370pps.  Corrosponding to a 178 nanosec delay between packets
(1/5614370*10^9).

Pktgen performance drops to 2421076 pps with SKB_CLONE=0 (and
skb_burst=0), causing a full free+alloc cycle (also keeping the
do_gettimeofday() timestamp). This corrosponds to (1/2421076*10^9)
413 nanosec between packets.

Interesting this also tell us that the stack overhead + pktgen
packet-init is (413-178=) 235ns. (The do_gettimeofday contributes
23ns, leaving 212ns).

Results:
 skb_burst=0  2421076 pps
 skb_burst=1  2410301 pps ( -1.85 ns (worse))
 skb_burst=2  2580824 pps ( 27.41 ns)
 skb_burst=3  2678276 pps ( 14.10 ns)
 skb_burst=4  2729021 pps (  6.94 ns)
 skb_burst=5  2742044 pps (  1.74 ns)
 skb_burst=6  2763974 pps (  2.89 ns)
 skb_burst=7  2772413 pps (  1.10 ns)
 skb_burst=8  2788705 pps (  2.10 ns)
 skb_burst=9  2791055 pps (  0.30 ns)
 skb_burst=10 2791726 pps (  0.09 ns)
 skb_burst=16 2819949 pps (  3.58 ns)
 skb_burst=24 2817786 pps ( -0.27 ns)
 skb_burst=32 2813690 pps ( -0.51 ns)

Perhaps a little bit interesting that performance slightly decreases
after skb_burst=16, but this could simply be caused by the accuracy
level (as those tests had a variation of min:-0.250 max:1.811 ns).

skb_burst=(0 vs 32) improvement:
 (1/2421076*10^9)-(1/2813690*10^9) = 57.63 ns
 2,813,690-2,421,076 = +392,614 pps

Bulking via HW ring buffer tailptr "flush", still showed a significant
performance improvement, even with this spacing caused by pktgen
free+alloc+init+timestamp.  I tried to tcpdump packets on the sink
host, but I could not "see" the bulking (this is most likely a problem
with the sink and tcpdumps time resolution).


Setup notes:
 - pktgen TX single CPU test (E5-2695)
 - ethtool -C eth5 rx-usecs 30
 - tuned-adm profile latency-performance
 - IRQ aligned to CPUs
 - Ethernet Flow-Control disabled
 - No Hyper-Threading
 - netfilter_unload_modules.sh

Need something to relate these nanosec to?
Go read:
 http://netoptimizer.blogspot.dk/2014/05/the-calculations-10gbits-wirespeed.html
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

      parent reply	other threads:[~2014-08-30 13:37 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-27 21:13 [RFC PATCH] pktgen: skb bursting via skb->xmit_more API Jesper Dangaard Brouer
2014-08-27 21:36 ` David Miller
2014-08-30 13:37 ` Jesper Dangaard Brouer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140830153742.7f27d98c@redhat.com \
    --to=brouer@redhat.com \
    --cc=cwang@twopensource.com \
    --cc=davem@davemloft.net \
    --cc=dborkman@redhat.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hannes@stressinduktion.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.