From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
Cc: dev@dpdk.org, stable@dpdk.org, olivier.matz@6wind.com,
chaozhu@linux.vnet.ibm.com, konstantin.ananyev@intel.com
Subject: Re: [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
Date: Tue, 17 Jul 2018 09:04:13 +0530 [thread overview]
Message-ID: <20180717033410.GA3344@jerin> (raw)
In-Reply-To: <CALBPOTWYxCN3+DL_5Ozx3parn+hyo582BJgrjRzicPOEwR1CcA@mail.gmail.com>
-----Original Message-----
> Date: Tue, 17 Jul 2018 11:54:18 +0900
> From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: dev@dpdk.org, stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
>
Cc: olivier.matz@6wind.com
Cc: chaozhu@linux.vnet.ibm.com
Cc: konstantin.ananyev@intel.com
>
> > Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> > Having said that, load-load barrier can be expressed very well with C11 memory
> > model. I guess ppc64 supports C11 memory model. If so,
> > Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> > original issue?
>
> Yes, the performance regression happens on non-x86 with single
> producer/consumer.
> The average latency of an enqueue was increased from 21 nsec to 24 nsec in my
> simple experiment. But, I think it is worth it.
That varies to machine to machine. What is the burst size etc.
>
>
> I also tested C11 rte_ring, however, it caused the same race condition in ppc64.
> I tried to fix the C11 problem as well, but I also found the C11
> rte_ring had other potential
> incorrect choices of memory orders, which caused another race
> condition in ppc64.
Does it happens on all ppc64 machines? Or on a specific machine?
Is following tests are passing on your system without the patch?
test/test/test_ring_perf.c
test/test/test_ring.c
>
> For example,
> __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(), but
> I am not sure why the load-acquire is used for the compare exchange.
It correct as per C11 acquire and release semantics.
> Also in update_tail, the pause can be called before the data copy because
> of ht->tail load without atomic_load_n.
>
> The memory order is simply difficult, so it might take a bit longer
> time to check
> if the code is correct. I think I can fix the C11 rte_ring as another patch.
>
> >>
> >> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
> >> It uses a single consumer and multiple producers for a rte_ring.
> >> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().
> >
> > Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> > Having said that, load-load barrier can be expressed very well with C11 memory
> > model. I guess ppc64 supports C11 memory model. If so,
> > Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> > original issue?
> >
> >>
> >> The reordered loads happened on r->prod.tail in
There is rte_smp_rmb() just before reading r->prod.tail in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_rte_ring_move_cons_head(). Would that not suffice the requirement?
Can you check adding compiler barrier and see is compiler is reordering
the stuff?
DPDK's ring implementation is based freebsd's ring implementation, I
don't see need for such barrier
https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h
If it is something specific to ppc64 or a specific ppc64 machine, we
could add a compile option as it is arch specific to avoid performance
impact on other architectures.
> >> __rte_ring_move_cons_head() (rte_ring_generic.h) and ring[idx] in
> >> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
> >> dependency, but the code does not satisfy it. Note that they are
> >> not reordered if __rte_ring_move_cons_head() with is_sc != 1 because
> >> cmpset invokes a read barrier.
> >>
> >> The paired stores on these loads are in ENQUEUE_PTRS() and
> >> update_tail(). Simplified code around the reorder is the following.
> >>
> >> Consumer Producer
> >> load idx[ring]
> >> store idx[ring]
> >> store r->prod.tail
> >> load r->prod.tail
> >>
> >> In this case, the consumer loads old idx[ring] and confirms the load
> >> is valid with the new r->prod.tail.
> >>
> >> I added a read barrier in the case where __IS_SC is passed to
> >> __rte_ring_move_cons_head(). I also fixed __rte_ring_move_prod_head()
> >> to avoid similar problems with a single producer.
> >>
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
> >> ---
> >> lib/librte_ring/rte_ring_generic.h | 10 ++++++----
> >> 1 file changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> >> index ea7dbe5b9..477326180 100644
> >> --- a/lib/librte_ring/rte_ring_generic.h
> >> +++ b/lib/librte_ring/rte_ring_generic.h
> >> @@ -90,9 +90,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> >> return 0;
> >>
> >> *new_head = *old_head + n;
> >> - if (is_sp)
> >> + if (is_sp) {
> >> + rte_smp_rmb();
> >> r->prod.head = *new_head, success = 1;
> >> - else
> >> + } else
> >> success = rte_atomic32_cmpset(&r->prod.head,
> >> *old_head, *new_head);
> >> } while (unlikely(success == 0));
> >> @@ -158,9 +159,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
> >> return 0;
> >>
> >> *new_head = *old_head + n;
> >> - if (is_sc)
> >> + if (is_sc) {
> >> + rte_smp_rmb();
> >> r->cons.head = *new_head, success = 1;
> >> - else
> >> + } else
> >> success = rte_atomic32_cmpset(&r->cons.head, *old_head,
> >> *new_head);
> >> } while (unlikely(success == 0));
> >> --
> >> 2.17.1
> >>
next prev parent reply other threads:[~2018-07-17 3:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-12 2:44 [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64 Takeshi Yoshimura
2018-07-12 17:08 ` Jerin Jacob
2018-07-17 2:54 ` Takeshi Yoshimura
2018-07-17 3:34 ` Jerin Jacob [this message]
2021-03-24 21:45 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2021-03-28 1:00 ` Honnappa Nagarahalli
2021-06-16 7:14 ` [dpdk-dev] 回复: " Feifei Wang
2021-06-16 16:37 ` [dpdk-dev] " Honnappa Nagarahalli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180717033410.GA3344@jerin \
--to=jerin.jacob@caviumnetworks.com \
--cc=chaozhu@linux.vnet.ibm.com \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@intel.com \
--cc=olivier.matz@6wind.com \
--cc=stable@dpdk.org \
--cc=t.yoshimura8869@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.