Re: TX performance regression caused by the mbuf cachline split

From: Paul Emmerich <emmericp@net.in.tum.de>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: TX performance regression caused by the mbuf cachline split
Date: Mon, 15 Feb 2016 20:15:23 +0100
Message-ID: <56C223CB.9080901@net.in.tum.de>
In-Reply-To: <2601191342CEEE43887BDE71AB9772582142EB46@irsmsx105.ger.corp.intel.com>

Hi,

Here's a rather late follow-up. I've only recently found the need to
seriously address DPDK 2.x support in MoonGen, mostly for better support
of XL710 NICs (which I still dislike, but people are using them...).

On 13.05.15 11:03, Ananyev, Konstantin wrote:
> Before we start to discuss your findings, there is one thing in your test app that looks strange to me:
> You use BATCH_SIZE==64 for TX packets, but your mempool cache_size==32.
> This is not really a good choice, as it means that on each iteration your mempool cache will be exhausted,
> and you'll end up doing ring_dequeue().
> I'd suggest you use something like '2 * BATCH_SIZE' for the mempool's cache size;
> that should improve your numbers (at least it did for me).

Thanks for pointing that out. However, my real app did not have this bug 
and I also saw the performance improvement there.

> Though, I suppose that scenario might be improved without manual 'prefetch' - by reordering the code a bit.
> Below are 2 small patches that introduce rte_pktmbuf_bulk_alloc() and modify your test app to use it.
> Could you give it a try and see whether it helps to close the gap between 1.7.1 and 2.0?
> I don't have the same box at hand, but on my IVB box the results are quite promising:
> at 1.2 GHz for simple_tx there is practically no difference in results (-0.33%),
> for full_tx the drop is reduced to 2%.
> That's comparing DPDK 1.7.1 + test app with cache_size=2*batch_size vs
> latest DPDK + test app with cache_size=2*batch_size+bulk_alloc.

The bulk_alloc patch is great and helps. I'd love to see such a function 
in DPDK.

I agree that this is a better solution than prefetching. I also can't 
see a difference with/without prefetching when using bulk alloc.


  Paul

Thread overview: 10+ messages
2015-05-11  0:14 TX performance regression caused by the mbuf cachline split Paul Emmerich
2015-05-11  9:13 ` Luke Gorrie
2015-05-11 10:16   ` Paul Emmerich
2015-05-11 22:32 ` Paul Emmerich
2015-05-11 23:18   ` Paul Emmerich
2015-05-12  0:28     ` Marc Sune
2015-05-12  0:38       ` Marc Sune
2015-05-13  9:03     ` Ananyev, Konstantin
2016-02-15 19:15       ` Paul Emmerich [this message]
2016-02-19 12:31         ` Olivier MATZ
