Message-ID: <1499669082.2865.8.camel@kernel.crashing.org>
Subject: Re: [PATCH 2/2] powerpc/mm/radix: Synchronize updates to the process table
From: Benjamin Herrenschmidt
To: Nicholas Piggin
Cc: linuxppc dev list, Michael Neuling, "Aneesh Kumar K.V"
Date: Mon, 10 Jul 2017 16:44:42 +1000
In-Reply-To: <20170710144006.669cab3a@roar.ozlabs.ibm.com>
References: <1499461936.3397.13.camel@kernel.crashing.org> <20170710144006.669cab3a@roar.ozlabs.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

On Mon, 2017-07-10 at 14:40 +1000, Nicholas Piggin wrote:
> On Fri, 07 Jul 2017 16:12:16 -0500
> Benjamin Herrenschmidt wrote:
>
> > When writing to the process table, we need to ensure the store is
> > visible to a subsequent access by the MMU. We assume we never have
> > the PID active while doing the update, so a ptesync/isync pair
> > should hopefully be a big enough hammer for our purpose.
> >
>
> Do we need this if it's going from invalid->valid?

No. While there is no valid bit in radix, I checked with HW and they
will not cache an entry that has an invalid RTS field.

We should ensure this gets architected for future impl. though.

> > Signed-off-by: Benjamin Herrenschmidt
> > ---
> >
> > Note: Architecturally, we also need to use a tlbie(l) with RIC=2
> > to flush the process table cache. However this is (very) expensive
> > and we know that POWER9 will invalidate its cache when hitting the
> > mtpid instruction.
> >
> > To be safe, we should add the tlbie for any ARCH300 processor we
> > don't know about though. (Aneesh, Nick do we need a ftr bit ?)
>
> Good question, I'm not sure. Aside from this particular thing, it
> seems like a good idea in general to add implementation specific
> tests into the ftr framework.
>
> We could add the PVR into it so we don't have to pollute FTR bits.
> The POWER9_DD1 bit for example could just be a PVR mask and cmp.

Reading the PVR isn't necessarily cheap though, we may want to cache
it.

> >
> >  arch/powerpc/mm/mmu_context_book3s64.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> > index 9404b5e..e3e2803 100644
> > --- a/arch/powerpc/mm/mmu_context_book3s64.c
> > +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> > @@ -138,6 +138,14 @@ static int radix__init_new_context(struct mm_struct *mm)
> >  	rts_field = radix__get_tree_size();
> >  	process_tb[index].prtb0 = cpu_to_be64(rts_field | __pa(mm->pgd) | RADIX_PGD_INDEX_SIZE);
> >
> > +	/*
> > +	 * Order the above store with subsequent update of the PID
> > +	 * register (at which point HW can start loading/caching
> > +	 * the entry) and the corresponding load by the MMU from
> > +	 * the L2 cache.
> > +	 */
> > +	asm volatile("ptesync;isync" : : : "memory");
> > +
> >  	mm->context.npu_context = NULL;
> >
> >  	return index;
> >
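
For the PVR mask-and-compare idea with a cached PVR raised in the thread,
a minimal userspace sketch might look like the following. Everything here
is hypothetical: struct pvr_match, get_cached_pvr(), pvr_matches() and the
simulated read_pvr_hw() are illustration-only names, not existing kernel
interfaces, and the mask/value constants are assumed values that should be
checked against the kernel's CPU table before being relied on. In the
kernel the hardware read would be an mfspr of the PVR rather than a
simulated variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor: an implementation test is a mask plus a value. */
struct pvr_match {
	uint32_t mask;
	uint32_t value;
};

/* Assumed, illustrative constants; verify against the real CPU table. */
static const struct pvr_match power9_dd1 = { 0xffffff00u, 0x004e0100u };
static const struct pvr_match power9_any = { 0xffff0000u, 0x004e0000u };
static const struct pvr_match power8_any = { 0xffff0000u, 0x004d0000u };

/* Simulated register so the sketch runs outside the kernel. */
static uint32_t simulated_pvr = 0x004e0100u;
static unsigned int pvr_hw_reads;

/* Stand-in for the (possibly slow) hardware read of the PVR. */
static uint32_t read_pvr_hw(void)
{
	pvr_hw_reads++;
	return simulated_pvr;
}

static uint32_t cached_pvr;
static bool cached_pvr_valid;

/* Read the PVR once, then serve every later query from the cached copy. */
static uint32_t get_cached_pvr(void)
{
	if (!cached_pvr_valid) {
		cached_pvr = read_pvr_hw();
		cached_pvr_valid = true;
	}
	return cached_pvr;
}

/* The "PVR mask and cmp" test: true if the masked PVR equals the value. */
static bool pvr_matches(const struct pvr_match *m)
{
	return (get_cached_pvr() & m->mask) == m->value;
}
```

With this shape, a check such as pvr_matches(&power9_dd1) costs one load,
one AND and one compare after the first call, and no FTR bit is consumed
per implementation quirk. The lazy initialisation above is not safe
against concurrent first callers; a real version would more likely fill
the cache once during early boot.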