From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Jia He <hejianet@gmail.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"Zhao, Bing" <ilovethull@163.com>,
Olivier MATZ <olivier.matz@6wind.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"jia.he@hxt-semitech.com" <jia.he@hxt-semitech.com>,
"jie2.liu@hxt-semitech.com" <jie2.liu@hxt-semitech.com>,
"bing.zhao@hxt-semitech.com" <bing.zhao@hxt-semitech.com>,
"Richardson, Bruce" <bruce.richardson@intel.com>
Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
Date: Fri, 20 Oct 2017 11:13:21 +0530 [thread overview]
Message-ID: <20171020054319.GA4249@jerin> (raw)
In-Reply-To: <3e580cd7-2854-d855-be9c-7c4ce06e3ed5@gmail.com>
-----Original Message-----
> Date: Fri, 20 Oct 2017 09:57:58 +0800
> From: Jia He <hejianet@gmail.com>
> To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "Zhao, Bing"
> <ilovethull@163.com>, Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: Olivier MATZ <olivier.matz@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>,
> "jia.he@hxt-semitech.com" <jia.he@hxt-semitech.com>,
> "jie2.liu@hxt-semitech.com" <jie2.liu@hxt-semitech.com>,
> "bing.zhao@hxt-semitech.com" <bing.zhao@hxt-semitech.com>, "Richardson,
> Bruce" <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod
> loading when doing enqueue/dequeue
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
> Thunderbird/52.4.0
>
>
>
> On 10/20/2017 4:02 AM, Ananyev, Konstantin Wrote:
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > > Sent: Thursday, October 19, 2017 3:15 PM
> > > To: Zhao, Bing <ilovethull@163.com>; Jia He <hejianet@gmail.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > Cc: Olivier MATZ <olivier.matz@6wind.com>; dev@dpdk.org; jia.he@hxt-semitech.com; jie2.liu@hxt-semitech.com; bing.zhao@hxt-
> > > semitech.com
> > > Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
> > >
> > > > Hi,
> > > >
> > > > On 2017/10/19 18:02, Ananyev, Konstantin wrote:
> > > > > Hi Jia,
> > > > >
> > > > > > Hi
> > > > > >
> > > > > >
> > > > > > On 10/13/2017 9:02 AM, Jia He Wrote:
> > > > > > > Hi Jerin
> > > > > > >
> > > > > > >
> > > > > > > On 10/13/2017 1:23 AM, Jerin Jacob Wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > > Date: Thu, 12 Oct 2017 17:05:50 +0000
> > > > > > > > >
> > > > > > [...]
> > > > > > > > On the same lines,
> > > > > > > >
> > > > > > > > Jia He, jie2.liu, bing.zhao,
> > > > > > > >
> > > > > > > > Is this patch based on code review or do you saw this issue on any of
> > > > > > > > the
> > > > > > > > arm/ppc target? arm64 will have performance impact with this change.
> > > > > > sorry, miss one important information
> > > > > > Our platform is an aarch64 server with 46 cpus.
> > > > > > If we reduced the involved cpu numbers, the bug occurred less frequently.
> > > > > >
> > > > > > Yes, mb barrier impact the performance, but correctness is more
> > > > > > important, isn't it ;-)
> > > > > > Maybe we can find any other lightweight barrier here?
> > > > > >
> > > > > > Cheers,
> > > > > > Jia
> > > > > > > Based on mbuf_autotest, the rte_panic will be invoked in seconds.
> > > > > > >
> > > > > > > PANIC in test_refcnt_iter():
> > > > > > > (lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
> > > > > > > 1: [./test(rte_dump_stack+0x38) [0x58d868]]
> > > > > > > Aborted (core dumped)
> > > > > > >
> > > > > So is it only reproducible with mbuf refcnt test?
> > > > > Could it be reproduced with some 'pure' ring test
> > > > > (no mempools/mbufs refcnt, etc.)?
> > > > > The reason I am asking - in that test we also have mbuf refcnt updates
> > > > > (that's what for that test was created) and we are doing some optimizations here too
> > > > > to avoid excessive atomic updates.
> > > > > BTW, if the problem is not reproducible without mbuf refcnt,
> > > > > can I suggest to extend the test with:
> > > > > - add a check that enqueue() operation was successful
> > > > > - walk through the pool and check/printf refcnt of each mbuf.
> > > > > Hopefully that would give us some extra information what is going wrong here.
> > > > > Konstantin
> > > > >
> > > > >
> > > > Currently, the issue is only found in this case here on the ARM
> > > > platform, not sure how it is going with the X86_64 platform
> > > I understand that it is only reproducible on arm so far.
> > > What I am asking - with dpdk is there any other way to reproduce it (on arm)
> > > except then running mbuf_autotest?
> > > Something really simple that not using mbuf/mempool etc?
> > > Just do dequeue/enqueue from multiple threads and check data integrity at the end?
> > > If not - what makes you think that the problem is precisely in rte_ring code?
> > > Why not in rte_mbuf let say?
> > Actually I reread your original mail and finally get your point.
> > If I understand you correctly the problem with read reordering here is that
> > after we read prot.tail but before we read cons.head
> > both cons.head and prod.tail might be updated,
> > but for us prod.tail change might be unnoticed.
> > As an example:
> > time 0 (cons.head == 0, prod.tail == 0):
> > prod_tail = r->prod.tail; /* due read reordering */
> > /* prod_tail == 0 */
> >
> > time 1 (cons.head ==5, prod.tail == 5):
> > *old_head = r->cons.head;
> > /* cons.head == 5 */
> > *entries = (prod_tail - *old_head);
> > /* *entries == (0 - 5) == 0xFFFFFFFB */
> >
> > And we will move our cons.head forward, even though there are no filled entries in the ring.
> > Is that right?
> Yes
> > As I side notice, I wonder why we have here:
> > *entries = (prod_tail - *old_head);
> > instead of:
> > *entries = r->size + prod_tail - *old_head;
> > ?
> Yes, I agree with you at this code line.
> But reordering will still mess up things even after this change(I have
> tested, still the same as before)
> I thought the *entries is a door to prevent consumer from moving forward too
> fast than the producer.
> But in some cases, it is correct that prod_tail is smaller than *old_head
> due to the cirular queue.
> In other cases, it is incorrect because of read/read reordering.
>
> AFAICT, the root cause here is the producer tail and cosumer head are
> dependant on each other.
> Thus a memory barrier is neccessary.
Yes. The barrier is necessary.
In fact, upstream freebsd fixed this issue for arm64. DPDK ring
implementation is derived from freebsd's buf_ring.h.
https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
I think, the only outstanding issue is, how to reduce the performance
impact for arm64. I believe using accurate/release semantics instead
of rte_smp_rmb() will reduce the performance overhead like similar ring implementations below,
freebsd: https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L166
odp: https://github.com/Linaro/odp/blob/master/platform/linux-generic/pktio/ring.c
Jia,
1) Can you verify the use of accurate/release semantics fixes the problem in your
platform? like use of atomic_load_acq* in the reference code.
2) If so, What is the overhead between accurate/release and plane smp_smb()
barriers. Based on that we need decide what path to take.
Note:
This issue wont come in all the arm64 implementation. it comes on arm64
implementation with OOO(out of order) implementations.
>
> Cheers,
> Jia
>
> >
> > Konstantin
> >
> > > > . In another
> > > > mail of this thread, we've made a simple test based on this and captured
> > > > some information and I pasted there.(I pasted the patch there :-))
> > > Are you talking about that one:
> > > http://dpdk.org/dev/patchwork/patch/30405/
> > > ?
> > > It still uses test/test/test_mbuf.c...,
> > > but anyway I don't really understand how mbuf_autotest supposed
> > > to work with these changes:
> > > @@ -730,7 +739,7 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
> > > rte_ring_enqueue(refcnt_mbuf_ring, m);
> > > }
> > > }
> > > - rte_pktmbuf_free(m);
> > > + // rte_pktmbuf_free(m);
> > > }
> > > @@ -741,6 +750,12 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
> > > while (!rte_ring_empty(refcnt_mbuf_ring))
> > > ;
> > >
> > > + if (NULL != m) {
> > > + if (1 != rte_mbuf_refcnt_read(m))
> > > + printf("m ref is %u\n", rte_mbuf_refcnt_read(m));
> > > + rte_pktmbuf_free(m);
> > > + }
> > > +
> > > /* check that all mbufs are back into mempool by now */
> > > for (wn = 0; wn != REFCNT_MAX_TIMEOUT; wn++) {
> > > if ((i = rte_mempool_avail_count(refcnt_pool)) == n) {
> > >
> > > That means all your mbufs (except the last one) will still be allocated.
> > > So the test would fail - as it should, I think.
> > >
> > > > And
> > > > it seems that Juhamatti & Jacod found some reverting action several
> > > > months ago.
> > > Didn't get that one either.
> > > Konstantin
>
next prev parent reply other threads:[~2017-10-20 5:43 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-10 9:56 [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue Jia He
2017-10-12 15:53 ` Olivier MATZ
2017-10-12 16:15 ` Stephen Hemminger
2017-10-12 17:05 ` Ananyev, Konstantin
2017-10-12 17:23 ` Jerin Jacob
2017-10-13 1:02 ` Jia He
2017-10-13 1:15 ` Jia He
2017-10-13 1:16 ` Jia He
2017-10-13 1:49 ` Jerin Jacob
2017-10-13 3:23 ` Jia He
2017-10-13 5:57 ` Zhao, Bing
2017-10-13 7:33 ` Jianbo Liu
2017-10-13 8:20 ` Jia He
2017-10-19 10:02 ` Ananyev, Konstantin
2017-10-19 11:18 ` Zhao, Bing
2017-10-19 14:15 ` Ananyev, Konstantin
2017-10-19 20:02 ` Ananyev, Konstantin
2017-10-20 1:57 ` Jia He
2017-10-20 5:43 ` Jerin Jacob [this message]
2017-10-23 8:49 ` Jia He
2017-10-23 9:05 ` Kuusisaari, Juhamatti
2017-10-23 9:10 ` Bruce Richardson
2017-10-23 10:06 ` Jerin Jacob
2017-10-24 2:04 ` Jia He
2017-10-25 13:26 ` Jerin Jacob
2017-10-26 2:27 ` Jia He
2017-10-31 2:55 ` Jia He
2017-10-31 11:14 ` Jerin Jacob
2017-11-01 2:53 ` Jia He
2017-11-01 19:04 ` Jerin Jacob
2017-11-02 1:09 ` Jia He
2017-11-02 8:57 ` Jia He
2017-11-03 2:55 ` Jia He
2017-11-03 12:47 ` Jerin Jacob
2017-11-01 4:48 ` Jia He
2017-11-01 19:10 ` Jerin Jacob
2017-10-20 7:03 ` Ananyev, Konstantin
2017-10-13 0:24 ` Liu, Jie2
2017-10-13 2:12 ` Zhao, Bing
2017-10-13 2:34 ` Jerin Jacob
2017-10-16 10:51 ` Kuusisaari, Juhamatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171020054319.GA4249@jerin \
--to=jerin.jacob@caviumnetworks.com \
--cc=bing.zhao@hxt-semitech.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=hejianet@gmail.com \
--cc=ilovethull@163.com \
--cc=jia.he@hxt-semitech.com \
--cc=jie2.liu@hxt-semitech.com \
--cc=konstantin.ananyev@intel.com \
--cc=olivier.matz@6wind.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.