Date: Mon, 30 May 2016 09:08:33 +1000
From: Anton Blanchard
To: Benjamin Herrenschmidt
Cc: "Aneesh Kumar K.V", paulus@samba.org, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org, Michael Neuling
Subject: Re: [PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64
Message-ID: <20160530090833.4400ac83@kryten>
In-Reply-To: <1464557231.3078.185.camel@kernel.crashing.org>
References: <1460182444-2468-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1460182444-2468-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20160529210356.66dd944e@kryten> <1464557231.3078.185.camel@kernel.crashing.org>

Hi Ben,

> That is surprising, do we have any idea what specifically increases
> the overhead so significantly ? Does gcc know about ldbrx/stdbrx ? I
> notice in our io.h for example we still do manual ld/std + swap
> because old processors didn't know these, we should fix that for
> CONFIG_POWER8 (or is it POWER7 that brought these ?).

The futex issue seems to be in __get_user_pages_fast():

	ld      r11,0(r6)
	...
	rldicl  r8,r11,32,32
	rotlwi  r28,r11,24
	rlwimi  r28,r11,8,8,15
	rotlwi  r6,r8,24
	rlwimi  r28,r11,8,24,31
	rlwimi  r6,r8,8,8,15
	rlwimi  r6,r8,8,24,31
	rldicr  r28,r28,32,31
	or      r28,r28,r6
	cmpdi   cr7,r28,0
	beq     cr7,2428

That's a whole lot of work just to check if a pte is zero. I assume
the reason gcc can't replace this with a byte-reversed load is that we
access the pte via the READ_ONCE() macro. I see the same issue in
unmap_page_range(), __hash_page_64K() and handle_mm_fault().

The other issue I see is when we access a pte via larx/stcx: there we
have no choice but to byte swap it manually. I see that in
__hash_page_64K():

	rldicl  r28,r30,32,32
	rotlwi  r0,r30,24
	rlwimi  r0,r30,8,8,15
	rotlwi  r10,r28,24
	rlwimi  r0,r30,8,24,31
	rlwimi  r10,r28,8,8,15
	rlwimi  r10,r28,8,24,31
	rldicr  r0,r0,32,31
	or      r0,r0,r10
	hwsync
	ldarx   r12,0,r6
	cmpd    r12,r11
	bne-    c00000000004fad0
	stdcx.  r0,0,r6
	bne-    c00000000004fab8
	hwsync

Anton
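
P.S. If anyone wants to poke at the codegen outside the kernel, here
is a minimal stand-alone sketch of the first pattern. The READ_ONCE()
and be64_to_cpu() below are my simplified stand-ins (assuming a
little-endian host and gcc), not the real kernel versions:

	#include <stdint.h>

	/* Simplified stand-in for the kernel's READ_ONCE(): force a
	 * volatile load so the compiler can't elide or reorder it. */
	#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

	/* Stand-in for be64_to_cpu() on a little-endian host. */
	static inline uint64_t be64_to_cpu(uint64_t x)
	{
		return __builtin_bswap64(x);
	}

	/* The __get_user_pages_fast() pattern: load a big-endian pte
	 * and test it against zero. The volatile load stops gcc from
	 * fusing the ld with the swap into a single ldbrx, so it
	 * emits the rotlwi/rlwimi sequence above instead -- even
	 * though a zero check doesn't need the swap at all. */
	int pte_none_sketch(const uint64_t *ptep)
	{
		uint64_t pte = be64_to_cpu(READ_ONCE(*ptep));

		return pte == 0;
	}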
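
The larx/stcx case can be sketched with the gcc atomic builtin: the
compiler emits a ldarx/stdcx. loop much like the listing above, and
both the expected and new values have to be swapped by hand up front,
because the reservation operates on the raw big-endian word in memory
(cpu_to_be64() again being a stand-in for the kernel helper):

	/* Stand-in for cpu_to_be64() on a little-endian host. */
	static inline uint64_t cpu_to_be64(uint64_t x)
	{
		return __builtin_bswap64(x);
	}

	/* The __hash_page_64K() pattern: atomically replace a pte,
	 * byte swapping old and new before the loop, since ldarx and
	 * stdcx. work on the raw big-endian word and can't byte
	 * reverse anything for us. Returns nonzero on success. */
	int pte_update_sketch(uint64_t *ptep, uint64_t old, uint64_t new)
	{
		uint64_t old_be = cpu_to_be64(old);

		return __atomic_compare_exchange_n(ptep, &old_be,
						   cpu_to_be64(new),
						   0, __ATOMIC_SEQ_CST,
						   __ATOMIC_RELAXED);
	}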