From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Jia He <hejianet@gmail.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Zhao, Bing" <ilovethull@163.com>,
	Olivier MATZ <olivier.matz@6wind.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"jia.he@hxt-semitech.com" <jia.he@hxt-semitech.com>,
	"jie2.liu@hxt-semitech.com" <jie2.liu@hxt-semitech.com>,
	"bing.zhao@hxt-semitech.com" <bing.zhao@hxt-semitech.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
Date: Mon, 23 Oct 2017 15:36:18 +0530
Message-ID: <20171023100617.GA17957@jerin>
In-Reply-To: <ab7154a2-a9f8-f12e-b6a0-2805c2065e2e@gmail.com>

-----Original Message-----
> Date: Mon, 23 Oct 2017 16:49:01 +0800
> From: Jia He <hejianet@gmail.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "Zhao, Bing"
>  <ilovethull@163.com>, Olivier MATZ <olivier.matz@6wind.com>,
>  "dev@dpdk.org" <dev@dpdk.org>, "jia.he@hxt-semitech.com"
>  <jia.he@hxt-semitech.com>, "jie2.liu@hxt-semitech.com"
>  <jie2.liu@hxt-semitech.com>, "bing.zhao@hxt-semitech.com"
>  <bing.zhao@hxt-semitech.com>, "Richardson, Bruce"
>  <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
>  loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.4.0
> 
> Hi Jerin
> 
> 
> On 10/20/2017 1:43 PM, Jerin Jacob Wrote:
> > -----Original Message-----
> > > 
> [...]
> > > dependent on each other.
> > > Thus a memory barrier is necessary.
> > Yes. The barrier is necessary.
> > In fact, upstream freebsd fixed this issue for arm64. DPDK ring
> > implementation is derived from freebsd's buf_ring.h.
> > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
> > 
> > I think the only outstanding issue is how to reduce the performance
> > impact for arm64. I believe using acquire/release semantics instead
> > of rte_smp_rmb() will reduce the performance overhead, as in the
> > similar ring implementations below:
> > freebsd: https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
> > odp: https://github.com/Linaro/odp/blob/master/platform/linux-generic/pktio/ring.c
> > 
> > Jia,
> > 1) Can you verify that using acquire/release semantics fixes the problem on your
> > platform, like the use of atomic_load_acq* in the reference code?
> > 2) If so, what is the overhead difference between acquire/release and plain
> > rte_smp_rmb() barriers? Based on that, we need to decide which path to take.
> I've tested 3 cases. The new 3rd case uses the load_acquire barrier
> (half barrier) you mentioned at the link above.
> The patch looks like this:
> @@ -408,8 +466,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>                 /* Reset n to the initial burst count */
>                 n = max;
> 
> -               *old_head = r->prod.head;
> -               const uint32_t cons_tail = r->cons.tail;
> +               *old_head = atomic_load_acq_32(&r->prod.head);
> +               const uint32_t cons_tail = atomic_load_acq_32(&r->cons.tail);
> 
> @@ -516,14 +576,15 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_s
>                 /* Restore n as it may change every loop */
>                 n = max;
> 
> -               *old_head = r->cons.head;
> -               const uint32_t prod_tail = r->prod.tail;
> +               *old_head = atomic_load_acq_32(&r->cons.head);
> +               const uint32_t prod_tail = atomic_load_acq_32(&r->prod.tail);
>                 /* The subtraction is done between two unsigned 32bits value
>                  * (the result is always modulo 32 bits even if we have
>                  * cons_head > prod_tail). So 'entries' is always between 0
>                  * and size(ring)-1. */
> 
> The half barrier patch passed the functional test.
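
For readers without the FreeBSD headers at hand: atomic_load_acq_32() is
FreeBSD's load-acquire primitive. A minimal sketch of the same ordering in
portable C11 follows, assuming <stdatomic.h> is available. The struct and
function names (ring_sketch, ring_headtail, ring_free_entries) are
hypothetical and the free-entries expression is only illustrative; this is
not the rte_ring code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one side (producer or consumer) of a ring. */
struct ring_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/* Hypothetical, cut-down ring: only the fields this sketch needs. */
struct ring_sketch {
	uint32_t capacity;
	struct ring_headtail prod;
	struct ring_headtail cons;
};

/*
 * Snapshot prod.head, then cons.tail, and return the free entries a
 * producer would see. The acquire on the first load keeps the second
 * load from being reordered ahead of it, which is the ordering the
 * patch above asks for.
 */
static uint32_t
ring_free_entries(struct ring_sketch *r, uint32_t *old_head)
{
	assert(r != NULL && old_head != NULL);

	*old_head = atomic_load_explicit(&r->prod.head, memory_order_acquire);
	uint32_t cons_tail =
		atomic_load_explicit(&r->cons.tail, memory_order_acquire);

	/* Unsigned arithmetic is modulo 2^32, so index wrap-around is safe. */
	return r->capacity + cons_tail - *old_head;
}
```

On arm64 a compiler would typically lower such acquire loads to a
load-acquire instruction rather than a separate barrier, which is why this
variant is called a "half barrier" in this thread.
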
> 
> As for the performance comparison on *arm64* (the debug patch is at
> http://dpdk.org/ml/archives/dev/2017-October/079012.html), please see the
> test results below:
> 
> [case 1] old codes, no barrier
> ============================================
>  Performance counter stats for './test --no-huge -l 1-10':
> 
>      689275.001200      task-clock (msec)         #    9.771 CPUs utilized
>               6223      context-switches          #    0.009 K/sec
>                 10      cpu-migrations            #    0.000 K/sec
>                653      page-faults               #    0.001 K/sec
>      1721190914583      cycles                    #    2.497 GHz
>      3363238266430      instructions              #    1.95  insn per cycle
>    <not supported> branches
>           27804740      branch-misses             #    0.00% of all branches
> 
>       70.540618825 seconds time elapsed
> 
> [case 2] full barrier with rte_smp_rmb()
> ============================================
>  Performance counter stats for './test --no-huge -l 1-10':
> 
>      582557.895850      task-clock (msec)         #    9.752 CPUs utilized
>               5242      context-switches          #    0.009 K/sec
>                 10      cpu-migrations            #    0.000 K/sec
>                665      page-faults               #    0.001 K/sec
>      1454360730055      cycles                    #    2.497 GHz
>       587197839907      instructions              #    0.40  insn per cycle
>    <not supported> branches
>           27799687      branch-misses             #    0.00% of all branches
> 
>       59.735582356 seconds time elapsed
> 
> [case 3] half barrier with load_acquire
> ============================================
>  Performance counter stats for './test --no-huge -l 1-10':
> 
>      660758.877050      task-clock (msec)         #    9.764 CPUs utilized
>               5982      context-switches          #    0.009 K/sec
>                 11      cpu-migrations            #    0.000 K/sec
>                657      page-faults               #    0.001 K/sec
>      1649875318044      cycles                    #    2.497 GHz
>       591583257765      instructions              #    0.36  insn per cycle
>    <not supported> branches
>           27994903      branch-misses             #    0.00% of all branches
> 
>       67.672855107 seconds time elapsed
> 
> Please note the context-switches in the perf results.
> The test results, sorted by elapsed time, are:
> full barrier < half barrier < no barrier
> 
> AFAICT, in this case the CPU reordering increases the likelihood of
> context switching and so increases the running time.

> Any ideas?

Regarding the performance test, it is better to run the ring perf test case
on _isolated_ cores to measure the impact on the number of enqueue/dequeue operations.

example:
./build/app/test -c 0xff -n 4
>>ring_perf_autotest

By default, DPDK on arm64 uses the EL0 counter to measure cycles. I
think on your SoC it runs at 50MHz or 100MHz, so you can follow the
scheme below to get accurate cycle measurements:

See: http://dpdk.org/doc/guides/prog_guide/profile_app.html
check: 44.2.2. High-resolution cycle counter
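
To put the resolution concern in numbers, here is a small arithmetic
sketch. It assumes the 2.497 GHz core clock reported in the perf output
quoted above and the 50MHz/100MHz counter rates guessed at in this message;
cpu_cycles_per_tick() is a hypothetical helper, not a DPDK API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * CPU cycles that elapse during one tick of a fixed-frequency counter,
 * rounded to the nearest integer. Anything shorter than one tick cannot
 * be resolved by that counter.
 */
static uint64_t
cpu_cycles_per_tick(uint64_t cpu_hz, uint64_t timer_hz)
{
	assert(timer_hz != 0);
	return (cpu_hz + timer_hz / 2) / timer_hz;
}
```

With those assumed rates, cpu_cycles_per_tick(2497000000ULL, 100000000ULL)
gives 25 and cpu_cycles_per_tick(2497000000ULL, 50000000ULL) gives 50, so a
single counter tick would span roughly 25 to 50 CPU cycles, which is coarse
for measuring individual enqueue/dequeue operations.
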
