From: Olivier Matz <olivier.matz@6wind.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: andrew.rybchenko@oktetlabs.ru, thomas@monjalon.net,
bruce.richardson@intel.com, jerinjacobk@gmail.com, dev@dpdk.org
Subject: Re: [PATCH] mempool: test performance with constant n
Date: Mon, 24 Jan 2022 11:26:00 +0100 [thread overview]
Message-ID: <Ye5+uJDd/75qq/7R@platinum> (raw)
In-Reply-To: <20220119113732.40167-1-mb@smartsharesystems.com>
Hi Morten,
Thank you for enhancing the mempool test. Please see some comments
below.
On Wed, Jan 19, 2022 at 12:37:32PM +0100, Morten Brørup wrote:
> "What gets measured gets done."
>
> This patch adds mempool performance tests where the number of objects to
> put and get is constant at compile time, which may significantly improve
> the performance of these functions. [*]
>
> Also, it is ensured that the array holding the object used for testing
> is cache line aligned, for maximum performance.
>
> And finally, the following entries are added to the list of tests:
> - Number of kept objects: 512
> - Number of objects to get and to put: The number of pointers fitting
> into a cache line, i.e. 8 or 16
>
> [*] Some example performance test (with cache) results:
>
> get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> get_bulk=4 put_bulk=4 keep=128 constant_n=true rate_persec=622159462
>
> get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> get_bulk=8 put_bulk=8 keep=128 constant_n=true rate_persec=917582643
>
> get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> app/test/test_mempool_perf.c | 120 +++++++++++++++++++++--------------
> 1 file changed, 74 insertions(+), 46 deletions(-)
>
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index 87ad251367..ffefe934d5 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -1,5 +1,6 @@
> /* SPDX-License-Identifier: BSD-3-Clause
> * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2022 SmartShare Systems
> */
>
> #include <string.h>
> @@ -55,19 +56,24 @@
> *
> * - Bulk get from 1 to 32
> * - Bulk put from 1 to 32
> + * - Bulk get and put from 1 to 32, compile time constant
> *
> * - Number of kept objects (*n_keep*)
> *
> * - 32
> * - 128
> + * - 512
> */
>
> #define N 65536
> #define TIME_S 5
> #define MEMPOOL_ELT_SIZE 2048
> -#define MAX_KEEP 128
> +#define MAX_KEEP 512
> #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
>
> +/* Number of pointers fitting into one cache line. */
> +#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE/sizeof(uintptr_t))
> +
nit: I think it's better to follow the coding rules and add spaces around the
'/', even though the line right above does not follow this convention either.
> #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
> #define RET_ERR() do { \
> LOG_ERR(); \
> @@ -80,16 +86,16 @@
> } while (0)
>
> static int use_external_cache;
> -static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
> +static unsigned int external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
>
> static uint32_t synchro;
>
> /* number of objects in one bulk operation (get or put) */
> -static unsigned n_get_bulk;
> -static unsigned n_put_bulk;
> +static int n_get_bulk;
> +static int n_put_bulk;
>
> /* number of objects retrieved from mempool before putting them back */
> -static unsigned n_keep;
> +static int n_keep;
>
> /* number of enqueues / dequeues */
> struct mempool_test_stats {
> @@ -104,20 +110,43 @@ static struct mempool_test_stats stats[RTE_MAX_LCORE];
> */
> static void
> my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
> - void *obj, unsigned i)
> + void *obj, unsigned int i)
> {
> uint32_t *objnum = obj;
> memset(obj, 0, mp->elt_size);
> *objnum = i;
> }
>
> +#define test_loop(x_keep, x_get_bulk, x_put_bulk) \
> + for (i = 0; likely(i < (N/x_keep)); i++) { \
> + /* get x_keep objects by bulk of x_get_bulk */ \
> + for (idx = 0; idx < x_keep; idx += x_get_bulk) {\
> + ret = rte_mempool_generic_get(mp, \
> + &obj_table[idx], \
> + x_get_bulk, \
> + cache); \
> + if (unlikely(ret < 0)) { \
> + rte_mempool_dump(stdout, mp); \
> + GOTO_ERR(ret, out); \
> + } \
> + } \
> + \
> + /* put the objects back by bulk of x_put_bulk */\
> + for (idx = 0; idx < x_keep; idx += x_put_bulk) {\
> + rte_mempool_generic_put(mp, \
> + &obj_table[idx], \
> + x_put_bulk, \
> + cache); \
> + } \
> + }
> +
I think a static __rte_always_inline function would do the job and is
clearer. Something like this:
static __rte_always_inline int
test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
	  unsigned int x_keep, unsigned int x_get_bulk, unsigned int x_put_bulk)
{
	void *obj_table[MAX_KEEP] __rte_cache_aligned;
	unsigned int idx;
	unsigned int i;
	int ret;

	for (i = 0; likely(i < (N / x_keep)); i++) {
		/* get x_keep objects by bulk of x_get_bulk */
		for (idx = 0; idx < x_keep; idx += x_get_bulk) {
			ret = rte_mempool_generic_get(mp,
						      &obj_table[idx],
						      x_get_bulk,
						      cache);
			if (unlikely(ret < 0)) {
				rte_mempool_dump(stdout, mp);
				return ret;
			}
		}

		/* put the objects back by bulk of x_put_bulk */
		for (idx = 0; idx < x_keep; idx += x_put_bulk) {
			rte_mempool_generic_put(mp,
						&obj_table[idx],
						x_put_bulk,
						cache);
		}
	}

	return 0;
}
> static int
> per_lcore_mempool_test(void *arg)
> {
> - void *obj_table[MAX_KEEP];
> - unsigned i, idx;
> + void *obj_table[MAX_KEEP] __rte_cache_aligned;
> + int i, idx;
> struct rte_mempool *mp = arg;
> - unsigned lcore_id = rte_lcore_id();
> + unsigned int lcore_id = rte_lcore_id();
> int ret = 0;
> uint64_t start_cycles, end_cycles;
> uint64_t time_diff = 0, hz = rte_get_timer_hz();
> @@ -139,6 +168,9 @@ per_lcore_mempool_test(void *arg)
> GOTO_ERR(ret, out);
> if (((n_keep / n_put_bulk) * n_put_bulk) != n_keep)
> GOTO_ERR(ret, out);
> + /* for constant n, n_get_bulk and n_put_bulk must be the same */
> + if (n_get_bulk < 0 && n_put_bulk != n_get_bulk)
> + GOTO_ERR(ret, out);
>
> stats[lcore_id].enq_count = 0;
>
> @@ -149,30 +181,18 @@ per_lcore_mempool_test(void *arg)
> start_cycles = rte_get_timer_cycles();
>
> while (time_diff/hz < TIME_S) {
> - for (i = 0; likely(i < (N/n_keep)); i++) {
> - /* get n_keep objects by bulk of n_bulk */
> - idx = 0;
> - while (idx < n_keep) {
> - ret = rte_mempool_generic_get(mp,
> - &obj_table[idx],
> - n_get_bulk,
> - cache);
> - if (unlikely(ret < 0)) {
> - rte_mempool_dump(stdout, mp);
> - /* in this case, objects are lost... */
> - GOTO_ERR(ret, out);
> - }
> - idx += n_get_bulk;
> - }
> -
> - /* put the objects back */
> - idx = 0;
> - while (idx < n_keep) {
> - rte_mempool_generic_put(mp, &obj_table[idx],
> - n_put_bulk,
> - cache);
> - idx += n_put_bulk;
> - }
> + if (n_get_bulk > 0) {
> + test_loop(n_keep, n_get_bulk, n_put_bulk);
> + } else if (n_get_bulk == -1) {
> + test_loop(-n_keep, 1, 1);
> + } else if (n_get_bulk == -4) {
> + test_loop(-n_keep, 4, 4);
> + } else if (n_get_bulk == -(int)CACHE_LINE_BURST) {
> + test_loop(-n_keep, CACHE_LINE_BURST, CACHE_LINE_BURST);
> + } else if (n_get_bulk == -32) {
> + test_loop(-n_keep, 32, 32);
> + } else {
> + GOTO_ERR(ret, out);
> }
> end_cycles = rte_get_timer_cycles();
> time_diff = end_cycles - start_cycles;
I'm not convinced that having negative values to mean "constant" is
clear. I'd prefer to have another global variable "use_constant_values",
which would give something like this:
		if (!use_constant_values)
			ret = test_loop(mp, cache, n_keep, n_get_bulk, n_put_bulk);
		else if (n_get_bulk == 1)
			ret = test_loop(mp, cache, n_keep, 1, 1);
		else if (n_get_bulk == 4)
			ret = test_loop(mp, cache, n_keep, 4, 4);
		else if (n_get_bulk == CACHE_LINE_BURST)
			ret = test_loop(mp, cache, n_keep,
					CACHE_LINE_BURST, CACHE_LINE_BURST);
		else if (n_get_bulk == 32)
			ret = test_loop(mp, cache, n_keep, 32, 32);
		else
			ret = -1;

		if (ret < 0)
			GOTO_ERR(ret, out);
> @@ -192,10 +212,10 @@ per_lcore_mempool_test(void *arg)
> static int
> launch_cores(struct rte_mempool *mp, unsigned int cores)
> {
> - unsigned lcore_id;
> + unsigned int lcore_id;
> uint64_t rate;
> int ret;
> - unsigned cores_save = cores;
> + unsigned int cores_save = cores;
>
> __atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>
> @@ -203,10 +223,14 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> memset(stats, 0, sizeof(stats));
>
> printf("mempool_autotest cache=%u cores=%u n_get_bulk=%u "
> - "n_put_bulk=%u n_keep=%u ",
> + "n_put_bulk=%u n_keep=%u constant_n=%s ",
> use_external_cache ?
> - external_cache_size : (unsigned) mp->cache_size,
> - cores, n_get_bulk, n_put_bulk, n_keep);
> + external_cache_size : (unsigned int) mp->cache_size,
> + cores,
> + n_get_bulk > 0 ? n_get_bulk : -n_get_bulk,
> + n_put_bulk > 0 ? n_put_bulk : -n_put_bulk,
> + n_keep > 0 ? n_keep : -n_keep,
> + n_get_bulk > 0 ? "false" : "true");
>
This would become much simpler with this new variable.
> if (rte_mempool_avail_count(mp) != MEMPOOL_SIZE) {
> printf("mempool is not full\n");
> @@ -253,12 +277,13 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> static int
> do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> {
> - unsigned bulk_tab_get[] = { 1, 4, 32, 0 };
> - unsigned bulk_tab_put[] = { 1, 4, 32, 0 };
> - unsigned keep_tab[] = { 32, 128, 0 };
> - unsigned *get_bulk_ptr;
> - unsigned *put_bulk_ptr;
> - unsigned *keep_ptr;
> + /* Negative n_get_bulk values represent constants in the test. */
> + int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, -1, -4, -(int)CACHE_LINE_BURST, -32, 0 };
> + int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> + int keep_tab[] = { 32, 128, 512, 0 };
> + int *get_bulk_ptr;
> + int *put_bulk_ptr;
> + int *keep_ptr;
> int ret;
>
Same here, changes would be minimal.
> for (get_bulk_ptr = bulk_tab_get; *get_bulk_ptr; get_bulk_ptr++) {
> @@ -266,13 +291,16 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
>
> n_get_bulk = *get_bulk_ptr;
> - n_put_bulk = *put_bulk_ptr;
> - n_keep = *keep_ptr;
> + n_put_bulk = n_get_bulk > 0 ? *put_bulk_ptr : n_get_bulk;
> + n_keep = n_get_bulk > 0 ? *keep_ptr : -*keep_ptr;
> ret = launch_cores(mp, cores);
>
> if (ret < 0)
> return -1;
No change would be required above (except setting use_constant_values = 0).
And below, we could simply add this instead:
	/* replay test with constant values */
	if (n_get_bulk == n_put_bulk) {
		use_constant_values = 1;
		ret = launch_cores(mp, cores);
		if (ret < 0)
			return -1;
	}
If you are ok with the proposed changes, I can directly submit a v2,
since I already made them locally.
Thanks,
Olivier
Thread overview: 9+ messages
2022-01-19 11:37 [PATCH] mempool: test performance with constant n Morten Brørup
2022-01-24 10:26 ` Olivier Matz [this message]
2022-01-24 10:37 ` Morten Brørup
2022-01-24 14:53 ` Olivier Matz
2022-01-24 14:57 ` Olivier Matz
2022-01-24 14:59 ` [PATCH v2] " Olivier Matz
2022-01-24 17:20 ` Morten Brørup
2022-01-25 12:56 ` Olivier Matz
2022-02-02 22:39 ` Thomas Monjalon