From mboxrd@z Thu Jan 1 00:00:00 1970
From: Olivier MATZ
Subject: Re: [PATCH] mbuf: optimize refcnt handling during free
Date: Fri, 27 Mar 2015 11:50:13 +0100
Message-ID: <551535E5.7020207@6wind.com>
References: <1427393457-7080-1-git-send-email-zoltan.kiss@linaro.org>
 <20150327102533.GA5375@hmsreliant.think-freely.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Cc: "dev-VfR2kkLFssw@public.gmane.org"
To: Neil Horman, "Wiles, Keith"
In-Reply-To: <20150327102533.GA5375-B26myB8xz7F8NnZeBjwnZQMhkBWG/bsMQH7oEaQurus@public.gmane.org>
List-Id: patches and discussions about DPDK
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev"

Hi Neil,

On 03/27/2015 11:25 AM, Neil Horman wrote:
> On Thu, Mar 26, 2015 at 09:00:33PM +0000, Wiles, Keith wrote:
>>
>> On 3/26/15, 1:10 PM, "Zoltan Kiss" wrote:
>>
>>> The current way is not the most efficient: if m->refcnt is 1, the
>>> second condition is never evaluated, and we set it to 0. If refcnt > 1,
>>> the second condition fails again, although the code suggests otherwise
>>> to branch prediction. Instead we should keep only the second condition
>>> and remove the duplicate set-to-zero.
>>>
>>> Signed-off-by: Zoltan Kiss
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.h | 5 +----
>>>  1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>> index 17ba791..3ec4024 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>> @@ -764,10 +764,7 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>>>  {
>>>  	__rte_mbuf_sanity_check(m, 0);
>>>
>>> -	if (likely (rte_mbuf_refcnt_read(m) == 1) ||
>>> -			likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
>>> -
>>> -		rte_mbuf_refcnt_set(m, 0);
>>> +	if (likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
>>>
>>>  		/* if this is an indirect mbuf, then
>>>  		 * - detach mbuf
>>
>> I fell for this one too, but read Bruce's email
>> http://dpdk.org/ml/archives/dev/2015-March/014481.html
>
> This is still the right thing to do, though; Bruce's reasoning is
> erroneous. Just because rte_mbuf_refcnt_read returns 1 doesn't mean you
> are the last user of the mbuf; you are only guaranteed that if the update
> operation returns zero.
>
> In other words:
> 	rte_mbuf_refcnt_update(m, -1)
>
> is an atomic operation, whereas
>
> 	if (likely (rte_mbuf_refcnt_read(m) == 1) ||
> 			likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
>
> is not.
>
> To illustrate, on two cpus, this might occur:
>
> CPU0                     CPU1
> rte_mbuf_refcnt_read     ...
>   returns 1              rte_mbuf_refcnt_read
> ...                        returns 1
> execute if clause        execute if clause
>
> In the above scenario both cpus fell into the if clause because they both
> held a pointer to the same buffer and both got a return value of one, so
> they skipped the update portion of the if clause and both executed the
> internal block of the conditional expression.
> You might be tempted to think that's ok, since that block just sets the
> refcnt to zero, and doing so twice isn't harmful, but the entire purpose
> of the if conditional above was to ensure that only one execution context
> ever executed the conditional for a given buffer. Look at what else
> happens in that conditional:

I disagree. I also spent some time thinking about this code, and I think
Bruce is right here.

If you read the mbuf refcnt and it returns 1, it means you are the last
user, so no other core references the mbuf anymore. Your scenario is not
possible, because two CPUs are not allowed to access the same mbuf pointer
at the same time. It would be like writing data into the mbuf while
another core is reading it.

If you think your scenario can happen, could you give an example of code
that would lead to such a case?

If you want to use a mbuf on 2 CPUs at the same time, you have to clone it
first, and in that case the reference counter would be at least 2,
preventing your scenario from happening.

Olivier

>
> static inline struct rte_mbuf* __attribute__((always_inline))
> __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> {
> 	__rte_mbuf_sanity_check(m, 0);
>
> 	if (likely (rte_mbuf_refcnt_read(m) == 1) ||
> 			likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
>
> 		rte_mbuf_refcnt_set(m, 0);
>
> 		/* if this is an indirect mbuf, then
> 		 * - detach mbuf
> 		 * - free attached mbuf segment
> 		 */
> 		if (RTE_MBUF_INDIRECT(m)) {
> 			struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
> 			rte_pktmbuf_detach(m);
> 			if (rte_mbuf_refcnt_update(md, -1) == 0)
> 				__rte_mbuf_raw_free(md);
> 		}
> 		return(m);
> 	}
> 	return (NULL);
> }
>
> If the buffer is indirect, another refcnt update occurs on the buf_addr
> mbuf, and in the scenario I outlined above, that refcnt will underflow,
> likely causing a buffer leak. Additionally, the return code of this
> function is designed to indicate to the caller whether they were the last
> user of the buffer.
> In the above scenario, two execution contexts will both be told that they
> were, which is wrong.
>
> Zoltan's patch is a good fix.
>
> Acked-by: Neil Horman
>