From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Jia He <hejianet@gmail.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"Zhao, Bing" <ilovethull@163.com>,
Olivier MATZ <olivier.matz@6wind.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"jia.he@hxt-semitech.com" <jia.he@hxt-semitech.com>,
"jie2.liu@hxt-semitech.com" <jie2.liu@hxt-semitech.com>,
"bing.zhao@hxt-semitech.com" <bing.zhao@hxt-semitech.com>,
"Richardson, Bruce" <bruce.richardson@intel.com>
Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
Date: Mon, 23 Oct 2017 15:36:18 +0530
Message-ID: <20171023100617.GA17957@jerin>
In-Reply-To: <ab7154a2-a9f8-f12e-b6a0-2805c2065e2e@gmail.com>
-----Original Message-----
> Date: Mon, 23 Oct 2017 16:49:01 +0800
> From: Jia He <hejianet@gmail.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "Zhao, Bing"
> <ilovethull@163.com>, Olivier MATZ <olivier.matz@6wind.com>,
> "dev@dpdk.org" <dev@dpdk.org>, "jia.he@hxt-semitech.com"
> <jia.he@hxt-semitech.com>, "jie2.liu@hxt-semitech.com"
> <jie2.liu@hxt-semitech.com>, "bing.zhao@hxt-semitech.com"
> <bing.zhao@hxt-semitech.com>, "Richardson, Bruce"
> <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
> loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
> Thunderbird/52.4.0
>
> Hi Jerin
>
>
> On 10/20/2017 1:43 PM, Jerin Jacob Wrote:
> > -----Original Message-----
> > >
> [...]
> > > dependent on each other.
> > > Thus a memory barrier is necessary.
> > Yes. The barrier is necessary.
> > In fact, upstream freebsd fixed this issue for arm64. DPDK ring
> > implementation is derived from freebsd's buf_ring.h.
> > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
> >
> > I think the only outstanding issue is how to reduce the performance
> > impact for arm64. I believe using acquire/release semantics instead
> > of rte_smp_rmb() will reduce the performance overhead, as in the similar ring implementations below:
> > freebsd: https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
> > odp: https://github.com/Linaro/odp/blob/master/platform/linux-generic/pktio/ring.c
> >
> > Jia,
> > 1) Can you verify that the use of acquire/release semantics fixes the problem on your
> > platform? e.g. use atomic_load_acq* as in the reference code.
> > 2) If so, what is the overhead of acquire/release versus plain smp_rmb()
> > barriers? Based on that we need to decide which path to take.
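To make the two options in 1) and 2) concrete, here is a minimal C11 sketch of the consumer-side index loads. It is not DPDK code: `struct toy_ring` and the function names are hypothetical, `atomic_thread_fence(memory_order_acquire)` is used as a rough stand-in for rte_smp_rmb(), and `memory_order_acquire` loads play the role of FreeBSD's atomic_load_acq_32().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical head/tail pair, loosely modeled on the fields named in the
 * patch (prod.head, prod.tail, cons.head, cons.tail); not DPDK's layout. */
struct toy_headtail {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
};

struct toy_ring {
    struct toy_headtail prod;
    struct toy_headtail cons;
};

/* "Full barrier" style: relaxed loads separated by a read fence, so the
 * cons.head load cannot be satisfied after the prod.tail load.
 * On arm64, compilers typically emit "dmb ishld" for this fence. */
static uint32_t cons_entries_fence(struct toy_ring *r, uint32_t *old_head)
{
    *old_head = atomic_load_explicit(&r->cons.head, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    uint32_t prod_tail = atomic_load_explicit(&r->prod.tail, memory_order_relaxed);
    return prod_tail - *old_head; /* computed modulo 2^32 */
}

/* "Half barrier" style: load-acquire on each index, mirroring the
 * atomic_load_acq_32() calls in the quoted patch. Later loads cannot be
 * hoisted above an acquire load; on arm64 this is typically "ldar". */
static uint32_t cons_entries_acquire(struct toy_ring *r, uint32_t *old_head)
{
    *old_head = atomic_load_explicit(&r->cons.head, memory_order_acquire);
    uint32_t prod_tail = atomic_load_explicit(&r->prod.tail, memory_order_acquire);
    return prod_tail - *old_head; /* computed modulo 2^32 */
}
```

The design difference is scope: the fence orders every earlier load against every later load and store, while an acquire load only orders itself against what follows it. Whether that difference is measurable on a given arm64 core is what question 2) is asking.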
> I've tested 3 cases. The new 3rd case uses the load_acquire barrier
> (half barrier) you mentioned at the above link.
> The patch looks like this:
> @@ -408,8 +466,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
> /* Reset n to the initial burst count */
> n = max;
>
> - *old_head = r->prod.head;
> - const uint32_t cons_tail = r->cons.tail;
> + *old_head = atomic_load_acq_32(&r->prod.head);
> + const uint32_t cons_tail = atomic_load_acq_32(&r->cons.tail);
>
> @@ -516,14 +576,15 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_s
> /* Restore n as it may change every loop */
> n = max;
>
> - *old_head = r->cons.head;
> - const uint32_t prod_tail = r->prod.tail;
> + *old_head = atomic_load_acq_32(&r->cons.head);
> + const uint32_t prod_tail = atomic_load_acq_32(&r->prod.tail);
> /* The subtraction is done between two unsigned 32bits value
> * (the result is always modulo 32 bits even if we have
> * cons_head > prod_tail). So 'entries' is always between 0
> * and size(ring)-1. */
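The comment in the hunk above relies on unsigned 32-bit wraparound. The minimal sketch below (hypothetical helper names, not DPDK's API) shows that arithmetic, and also what it yields when the two index loads are reordered: if a stale prod.tail is combined with a newer cons.head, the subtraction wraps to a huge "entries" value instead of a small one, which is why the load order matters.

```c
#include <assert.h>
#include <stdint.h>

/* Entries a consumer may dequeue. Correct even after the 32-bit indexes
 * wrap, because unsigned subtraction is performed modulo 2^32. */
static uint32_t ring_used(uint32_t prod_tail, uint32_t cons_head)
{
    return prod_tail - cons_head;
}

/* Free slots a producer sees for a ring holding up to `capacity` items. */
static uint32_t ring_free(uint32_t capacity, uint32_t cons_tail, uint32_t prod_head)
{
    return capacity + cons_tail - prod_head;
}
```

For example, ring_used(5, 0xFFFFFFFE) is 7 across the wrap point, whereas ring_used(10, 12), i.e. a stale prod_tail of 10 paired with a cons_head that has already advanced to 12, is 0xFFFFFFFE.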
>
> The half barrier patch passed the functional test.
>
> As for the performance comparison on *arm64* (the debug patch is at
> http://dpdk.org/ml/archives/dev/2017-October/079012.html), please see the
> test results below:
>
> [case 1] old codes, no barrier
> ============================================
> Performance counter stats for './test --no-huge -l 1-10':
>
> 689275.001200 task-clock (msec) # 9.771 CPUs utilized
> 6223 context-switches # 0.009 K/sec
> 10 cpu-migrations # 0.000 K/sec
> 653 page-faults # 0.001 K/sec
> 1721190914583 cycles # 2.497 GHz
> 3363238266430 instructions # 1.95 insn per cycle
> <not supported> branches
> 27804740 branch-misses # 0.00% of all branches
>
> 70.540618825 seconds time elapsed
>
> [case 2] full barrier with rte_smp_rmb()
> ============================================
> Performance counter stats for './test --no-huge -l 1-10':
>
> 582557.895850 task-clock (msec) # 9.752 CPUs utilized
> 5242 context-switches # 0.009 K/sec
> 10 cpu-migrations # 0.000 K/sec
> 665 page-faults # 0.001 K/sec
> 1454360730055 cycles # 2.497 GHz
> 587197839907 instructions # 0.40 insn per cycle
> <not supported> branches
> 27799687 branch-misses # 0.00% of all branches
>
> 59.735582356 seconds time elapsed
>
> [case 3] half barrier with load_acquire
> ============================================
> Performance counter stats for './test --no-huge -l 1-10':
>
> 660758.877050 task-clock (msec) # 9.764 CPUs utilized
> 5982 context-switches # 0.009 K/sec
> 11 cpu-migrations # 0.000 K/sec
> 657 page-faults # 0.001 K/sec
> 1649875318044 cycles # 2.497 GHz
> 591583257765 instructions # 0.36 insn per cycle
> <not supported> branches
> 27994903 branch-misses # 0.00% of all branches
>
> 67.672855107 seconds time elapsed
>
> Please note the context-switches counts in the perf results.
> Sorted by elapsed time, the results are:
> full barrier < half barrier < no barrier
>
> AFAICT, in this case, CPU reordering increases the likelihood of
> context switches and thus increases the running time.
> Any ideas?
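The perf figures quoted above can be cross-checked with a little arithmetic. The sketch below only re-derives numbers already printed (IPC = instructions / cycles, and each run's elapsed time relative to the no-barrier run); the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the three perf runs quoted above. */
struct perf_run {
    const char *name;
    double cycles;
    double instructions;
    double elapsed_s;
};

static const struct perf_run runs[] = {
    { "no barrier",    1721190914583.0, 3363238266430.0, 70.540618825 },
    { "rte_smp_rmb()", 1454360730055.0,  587197839907.0, 59.735582356 },
    { "load-acquire",  1649875318044.0,  591583257765.0, 67.672855107 },
};

/* Instructions retired per cycle, as in perf's "insn per cycle" column. */
static double ipc(const struct perf_run *r)
{
    assert(r->cycles > 0.0);
    return r->instructions / r->cycles;
}

/* Wall-clock time relative to the no-barrier baseline (runs[0]). */
static double elapsed_vs_baseline(const struct perf_run *r)
{
    return r->elapsed_s / runs[0].elapsed_s;
}

static void print_summary(void)
{
    for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("%-14s IPC %.2f  time x%.3f\n",
               runs[i].name, ipc(&runs[i]), elapsed_vs_baseline(&runs[i]));
}
```

With these inputs, IPC comes out at about 1.95, 0.40 and 0.36, and the two barrier runs take about 85% and 96% of the no-barrier wall-clock time. The no-barrier run also retires roughly 5.7 times as many instructions as either barrier run. Elapsed time of this test alone may therefore not compare equal amounts of ring work, which fits the suggestion below to use a per-operation benchmark.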
Regarding the performance test, it is better to use the ring perf test case
on _isolated_ cores to measure the impact on the number of enqueue/dequeue operations.
example:
./build/app/test -c 0xff -n 4
RTE>>ring_perf_autotest
By default, DPDK on arm64 uses the el0 counter to measure cycles. I
think that on your SoC it will be running at 50MHz or 100MHz. So, you can
follow the scheme below to get accurate cycle measurements:
See: http://dpdk.org/doc/guides/prog_guide/profile_app.html
check: 44.2.2. High-resolution cycle counter
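A back-of-the-envelope conversion shows why a slow el0 counter gives coarse cycle numbers. Taking the 2.497 GHz core clock reported by perf above and Jerin's estimate of a 50 MHz or 100 MHz counter, one counter tick spans roughly 50 or 25 core cycles, so any operation cheaper than that registers as 0 or 1 tick. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Core clock cycles that elapse during one tick of a fixed-frequency counter. */
static double core_cycles_per_tick(double core_hz, double counter_hz)
{
    assert(counter_hz > 0.0);
    return core_hz / counter_hz;
}

/* Convert a tick delta from such a counter into estimated core cycles.
 * The estimate is only good to about one tick, i.e. +/- core_cycles_per_tick(). */
static double ticks_to_core_cycles(uint64_t ticks, double core_hz, double counter_hz)
{
    return (double)ticks * core_cycles_per_tick(core_hz, counter_hz);
}
```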
Thread overview: 41+ messages
2017-10-10 9:56 [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue Jia He
2017-10-12 15:53 ` Olivier MATZ
2017-10-12 16:15 ` Stephen Hemminger
2017-10-12 17:05 ` Ananyev, Konstantin
2017-10-12 17:23 ` Jerin Jacob
2017-10-13 1:02 ` Jia He
2017-10-13 1:15 ` Jia He
2017-10-13 1:16 ` Jia He
2017-10-13 1:49 ` Jerin Jacob
2017-10-13 3:23 ` Jia He
2017-10-13 5:57 ` Zhao, Bing
2017-10-13 7:33 ` Jianbo Liu
2017-10-13 8:20 ` Jia He
2017-10-19 10:02 ` Ananyev, Konstantin
2017-10-19 11:18 ` Zhao, Bing
2017-10-19 14:15 ` Ananyev, Konstantin
2017-10-19 20:02 ` Ananyev, Konstantin
2017-10-20 1:57 ` Jia He
2017-10-20 5:43 ` Jerin Jacob
2017-10-23 8:49 ` Jia He
2017-10-23 9:05 ` Kuusisaari, Juhamatti
2017-10-23 9:10 ` Bruce Richardson
2017-10-23 10:06 ` Jerin Jacob [this message]
2017-10-24 2:04 ` Jia He
2017-10-25 13:26 ` Jerin Jacob
2017-10-26 2:27 ` Jia He
2017-10-31 2:55 ` Jia He
2017-10-31 11:14 ` Jerin Jacob
2017-11-01 2:53 ` Jia He
2017-11-01 19:04 ` Jerin Jacob
2017-11-02 1:09 ` Jia He
2017-11-02 8:57 ` Jia He
2017-11-03 2:55 ` Jia He
2017-11-03 12:47 ` Jerin Jacob
2017-11-01 4:48 ` Jia He
2017-11-01 19:10 ` Jerin Jacob
2017-10-20 7:03 ` Ananyev, Konstantin
2017-10-13 0:24 ` Liu, Jie2
2017-10-13 2:12 ` Zhao, Bing
2017-10-13 2:34 ` Jerin Jacob
2017-10-16 10:51 ` Kuusisaari, Juhamatti