From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jerin Jacob Subject: Re: [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API Date: Fri, 13 Jan 2017 21:51:53 +0530 Message-ID: <20170113162152.GC17956@localhost.localdomain> References: <1482832175-27199-1-git-send-email-jerin.jacob@caviumnetworks.com> <1484212646-10338-1-git-send-email-jerin.jacob@caviumnetworks.com> <1484212646-10338-16-git-send-email-jerin.jacob@caviumnetworks.com> <6bb9980b-f546-38d5-044a-63507510f6a5@intel.com> <20170113081641.GA17635@localhost.localdomain> <20170113145753.GB13558@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Cc: , , , , , , , John Griffin , Fiona Trahe , Deepak Kumar Jain To: Ferruh Yigit Return-path: Received: from NAM03-CO1-obe.outbound.protection.outlook.com (mail-co1nam03on0065.outbound.protection.outlook.com [104.47.40.65]) by dpdk.org (Postfix) with ESMTP id 1502A1E2B for ; Fri, 13 Jan 2017 17:22:13 +0100 (CET) Content-Disposition: inline In-Reply-To: List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Fri, Jan 13, 2017 at 03:50:59PM +0000, Ferruh Yigit wrote: > On 1/13/2017 2:57 PM, Jerin Jacob wrote: > > On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote: > >> On 1/13/2017 8:17 AM, Jerin Jacob wrote: > >>> On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote: > >>>> Hi Jerin, > >>>> > >>>> On 1/12/2017 9:17 AM, Jerin Jacob wrote: > >>>> <...> > >>>> > >>>>> +#include > >>>>> + > >>>>> /* CSR write macro */ > >>>>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \ > >>>>> - (void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \ > >>>>> - = (val))) > >>>>> +#define ADF_CSR_WR(csrAddr, csrOffset, val) \ > >>>>> + rte_write32(val, (((uint8_t *)csrAddr) + csrOffset)) > >>>> > >>>> For IA, this update introduces an extra compiler barrier (rte_io_wmb()), > >>>> which is indeed not a must, is this correct? > >>> > >>> AFAIK, Compiler barrier is required for IA. I am not an IA expert, if > >>> someone thinks it needs to changed then I can fix it in following commit > >>> in this patch series by making rte_io_wmb() and rte_io_rmb() as empty. > >>> > >>> Let me know. > >>> > >>> AFAIK, Linux kernel code has a barrier in readl/writel for IA. > >>> > >>> Typically we don't use any non relaxed versions in fast path.In fast > >>> typically all the drivers has explicit write barrier for doorbell write > >>> and followed by a relaxed version of write. IMO, In any event, it won't > >>> generate performance regression. > >>> > >>> [dpdk-master] $ git show > >>> 70c343bdc8c33a51a9db23cd58122bdfc120a58f > >>> commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f > >>> Author: Jerin Jacob > >>> Date: Mon Dec 5 06:36:49 2016 +0530 > >>> > >>> eal/x86: define I/O device memory barriers for IA > >>> > >>> The patch does not provide any functional change for IA. > >>> I/O barriers are mapped to existing smp barriers. > >>> > >>> CC: Bruce Richardson > >>> CC: Konstantin Ananyev > >>> Signed-off-by: Jerin Jacob > >>> > >>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>> index 00b1cdf..4eac666 100644 > >>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h > >>> @@ -61,6 +61,12 @@ extern "C" { > >>> > >>> #define rte_smp_rmb() rte_compiler_barrier() > >>> > >>> +#define rte_io_mb() rte_mb() > >>> + > >>> +#define rte_io_wmb() rte_compiler_barrier() > >>> + > >>> +#define rte_io_rmb() rte_compiler_barrier() > >>> + > >>> /*------------------------- 16 bit atomic operations > >>> * -------------------------*/ > >>> > >>> #ifndef RTE_FORCE_INTRINSICS > >>> > >>>> > >>>> If so, does it make sense to override these functions for x86, and make > >>>> rte_writeX = rte_writeX_relaxed > >>>> rte_readX = rte_readX_relaxed > >>>> > >>>>> > >>>>> /* CSR read macro */ > >>>>> -#define ADF_CSR_RD(csrAddr, csrOffset) \ > >>>>> - (*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset))) > >>>>> +#define ADF_CSR_RD(csrAddr, csrOffset) \ > >>>>> + rte_read32((((uint8_t *)csrAddr) + csrOffset)) > >>>> > >>>> This patchset both introduces new rte_readX/rte_writeX functions, also > >>>> applies them into drivers. > >>>> > >>>> While applying them, it changes the behavior. > >>>> Like above code was doing a read, but after update it does read and > >>>> read_memory_barrier. > >>>> > >>>> What do you think this patchset updates usage in a manner that keeps > >>>> behavior exact same. Like using rte_read32_relaxed for this case. > >>>> And doing architecture related updates in a different patchset? > >>> > >>> Need to use rte_read32 at this commit otherwise it will break for ARM. > >>> That's was all point for this patchset. > >> > >> Why it breaks the ARM, is it because rte_*mb() updated for ARM in this > >> patchset (patch 7/29) ? > > > > Yes. > > > > > >> > >> I believe it is good to make these modifications in two phase: > > > > It is in two phases only. First introduced the API with implementation and > > enabled in each driver. Why did you think other-way around it is better? > > For two things: > 1- If something goes wrong, find the source of problem easier. How? Are you suggesting like this below, 0) Introduce rte_io_?mb() 1) Introduce readxx_relaxed and writexx_relaxed 2) Change all the drivers with readxx_relaxed and writexx_relaxed(15 change sets) to keep the same behavior 3) Introduce readxx and writexx 4) revert step 2 changes and make driver based on readxx and writexx(again 15 change sets) Instead of(existing one) 0) Introduce rte_io_?mb() 1) Introduce readxx_relaxed and writexx_relaxed 2) Introduce readxx and writexx 3) Change all the drivers with readxx_[relaxed] and writexx_[relaxed] Proposed scheme makes driver authors to review two check-ins. And git bisect fail on fourth case of instead of thrird case with existing one. Not sure what we soloving here? Thomas, Any thoughts? > 2- Make architectural changes obvious, right now it is a little hard to > see, and this again for item 1. > > But I also would like to hear more comments before you change/try anything. > > > I can rework and test if there is any value addition. If you concerned > > about git bisect ability then I don't think we are loosing that in this > > model. > > > > Thoughts? > > > >> - First replace old usage with rte_readX/rte_writeX while keeping exact > >> same behavior > >> > >> - Second, do architecture specific changes. Both in eal and drivers > >> level if required. > >> > >> Thanks, > >> ferruh > >> > >>> For performance regression, we can always verify by taking delta > >>> between this changeset and the previous changeset. If you think, I need > >>> to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that > >>> as well. > >>> > >>> > >>>> > >>>> This both makes easy to see architecture specific updates, and makes > >>>> easy to trace any possible performance issues by this patchset. > >>>> > >>>>> > >>>>> #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL > >>>>> #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL > >>>>> > >>>> > >> >