Subject: RE: [PATCH v4 1/2] ring: make soring to always finalize its own stage
Date: Tue, 28 Apr 2026 13:54:54 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F65827@smartserver.smartshare.dk>
In-Reply-To: <20260423091625.123642-2-konstantin.ananyev@huawei.com>
From: Morten Brørup
To: "Konstantin Ananyev"
List-Id: DPDK patches and discussions

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Thursday, 23 April 2026 11.16
>
> SORING internal finalize() function is MT-safe and can be called from
> multiple places: from its own stage release(), also from 'acquire()'
> for next stage or even from consumer's 'dequeue()'.
> But calling finalize() from not its own stage release() function
> creates extra contention and might slow-down ring operations, especially
> for the cases when we have multiple threads doing acquire/release
> for the same stage.
> We can't completely avoid calling finalize() from all these multiple
> places, as it can in some rare cases break soring behavior.
> But we can make release() for given stage to invoke it always.
> That increases number of 'finalize()' operations done from 'release()'
> for current stage, and helps to minimize number of finalize() calls from
> other stages, which, in turn, helps to reduce the contention.
> According to the soring_stress_autotest, for multiple workers (8+)
> it reduces number of cycles spent by 1.5x-1.8x factor.
> For l3fwd-like workload it improves things by ~20%.
> For small number of workers, I didn't observe any serious change.
> Note that it doesn't introduce any changes in functionality provided.
>
> Signed-off-by: Konstantin Ananyev

Good idea.
Not peeking into the tail to determine if finalize() might be omitted also makes release() cleaner.
Removing an optimization that was not really an optimization. :-)

Acked-by: Morten Brørup

> ---
>  lib/ring/soring.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/lib/ring/soring.c b/lib/ring/soring.c
> index 3b90521bdb..4bc2321fb5 100644
> --- a/lib/ring/soring.c
> +++ b/lib/ring/soring.c
> @@ -37,24 +37,24 @@
>   * plus current stage index).
>   * 'release()' extracts old head value from provided ftoken and checks that
>   * corresponding 'state[]' contains expected values(mostly for sanity
> - * purposes).
> - * Then it marks this state[] with 'SORING_ST_FINISH' flag to indicate
> - * that given subset of objects was released.
> - * After that, it checks does old head value equals to current tail value?
> - * If yes, then it performs 'finalize()' operation, otherwise 'release()'
> - * just returns (without spinning on stage tail value).
> - * As updated state[] is shared by all threads, some other thread can do
> - * 'finalize()' for given stage.
> - * That allows 'release()' to avoid excessive waits on the tail value.
> + * purposes). Then it marks this state[] with 'SORING_ST_FINISH' flag to
> + * indicate that given subset of objects was released.
> + * After that, it calls 'finalize()'.
>   * Main purpose of 'finalize()' operation is to walk through 'state[]'
>   * from current stage tail up to its head, check state[] and move stage tail
>   * through elements that already are in SORING_ST_FINISH state.
>   * Along with that, corresponding state[] values are reset to zero.
> - * Note that 'finalize()' for given stage can be done from multiple places:
> + * Note that updated state[] is shared by all threads, so
> + * 'finalize()' for given stage can be done from multiple places:
>   * 'release()' for that stage or from 'acquire()' for next stage
>   * even from consumer's 'dequeue()' - in case given stage is the last one.
>   * So 'finalize()' has to be MT-safe and inside it we have to
> - * guarantee that only one thread will update state[] and stage's tail values.
> + * guarantee that only one thread at a time will update state[] and
> + * stage's tail values (sort of critical-section).
> + * When multiple threads trying to do finalize() for the same stage,
> + * simultaneously one thread will win the race and do all the pending
> + * updates, while others will simply return (kind of try-lock scenario).
> + * That allows 'release()' to avoid excessive waits on the tail value.
>   */
>
>  #include "soring.h"
> @@ -442,7 +442,7 @@ static __rte_always_inline void
>  soring_release(struct rte_soring *r, const void *objs,
>  	const void *meta, uint32_t stage, uint32_t n, uint32_t ftoken)
>  {
> -	uint32_t idx, pos, tail;
> +	uint32_t idx, pos;
>  	struct soring_stage *stg;
>  	union soring_state st;
>
> @@ -479,12 +479,9 @@ soring_release(struct rte_soring *r, const void *objs,
>  	rte_atomic_store_explicit(&r->state[idx].raw, st.raw,
>  			rte_memory_order_relaxed);
>
> -	/* try to do finalize(), if appropriate */
> -	tail = rte_atomic_load_explicit(&stg->sht.tail.pos,
> -			rte_memory_order_relaxed);
> -	if (tail == pos)
> -		__rte_soring_stage_finalize(&stg->sht, stage, r->state, r->mask,
> -				r->capacity);
> +	/* now, try to do finalize() */
> +	__rte_soring_stage_finalize(&stg->sht, stage, r->state, r->mask,
> +			r->capacity);
>  }
>
>  /*
> --
> 2.51.0
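
For readers following the thread, the try-lock behaviour described in the updated comment block can be sketched with plain C11 atomics. This is a simplified toy model, not the DPDK code: every name (toy_stage, toy_acquire, toy_release, toy_finalize, TOY_FINISH) is hypothetical, the exclusive right to move the tail is modelled with an atomic_flag as an assumption about the general technique, and the memory orders are picked conservatively rather than copied from soring.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_SIZE   8u            /* hypothetical ring size, power of two */
#define TOY_MASK   (TOY_SIZE - 1u)
#define TOY_FINISH 0x80000000u   /* stand-in for SORING_ST_FINISH */

struct toy_stage {
	_Atomic uint32_t head;            /* next position to hand out */
	_Atomic uint32_t tail;            /* first position not yet finalized */
	atomic_flag busy;                 /* try-lock guarding tail and state[] */
	_Atomic uint32_t state[TOY_SIZE]; /* TOY_FINISH | batch size, or 0 */
};

static void toy_init(struct toy_stage *s)
{
	atomic_init(&s->head, 0);
	atomic_init(&s->tail, 0);
	atomic_flag_clear(&s->busy);
	for (uint32_t i = 0; i < TOY_SIZE; i++)
		atomic_init(&s->state[i], 0);
}

/* Reserve n positions and return the old head (no capacity check here). */
static uint32_t toy_acquire(struct toy_stage *s, uint32_t n)
{
	return atomic_fetch_add_explicit(&s->head, n, memory_order_acquire);
}

/* Walk state[] from tail towards head, skipping over finished batches. */
static void toy_finalize(struct toy_stage *s)
{
	/* try-lock: if another thread is already finalizing, simply return */
	if (atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire))
		return;

	uint32_t tail = atomic_load_explicit(&s->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&s->head, memory_order_relaxed);

	while (tail != head) {
		uint32_t st = atomic_load_explicit(&s->state[tail & TOY_MASK],
				memory_order_acquire);
		if ((st & TOY_FINISH) == 0)
			break;  /* oldest batch is still being processed */
		/* reset the slot and move past the whole batch */
		atomic_store_explicit(&s->state[tail & TOY_MASK], 0,
				memory_order_relaxed);
		tail += st & ~TOY_FINISH;
	}

	atomic_store_explicit(&s->tail, tail, memory_order_release);
	atomic_flag_clear_explicit(&s->busy, memory_order_release);
}

/* Mark the batch as finished, then always attempt finalize(). */
static void toy_release(struct toy_stage *s, uint32_t pos, uint32_t n)
{
	assert(n > 0 && n <= TOY_SIZE);  /* sanity check only */
	atomic_store_explicit(&s->state[pos & TOY_MASK], TOY_FINISH | n,
			memory_order_release);
	toy_finalize(s);
}
```

In this model, releasing a later batch first leaves the tail untouched, and releasing the oldest batch afterwards moves the tail past both batches in a single pass. A thread that loses the try-lock race returns right after the winner has already inspected its slot, so its batch can stay pending until the next finalize() call, which is consistent with the commit message's point that the other call sites cannot be dropped entirely.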