From: Peter Zijlstra
Subject: Re: [PATCH] locking/rwsem: Remove arch specific rwsem files
Date: Mon, 11 Feb 2019 18:04:47 +0100
Message-ID: <20190211170447.GO32477@hirez.programming.kicks-ass.net>
References: <1549850450-10171-1-git-send-email-longman@redhat.com>
 <20190211115833.GY32511@hirez.programming.kicks-ass.net>
To: Waiman Long
Cc: linux-arch@vger.kernel.org, linux-xtensa@linux-xtensa.org,
 Davidlohr Bueso, linux-ia64@vger.kernel.org, Tim Chen, Arnd Bergmann,
 linux-sh@vger.kernel.org, linux-hexagon@vger.kernel.org, x86@kernel.org,
 Will Deacon, linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
 Borislav Petkov, "H. Peter Anvin", linux-alpha@vger.kernel.org,
 sparclinux@vger.kernel.org, Thomas Gleixner,
 linuxppc-dev@lists.ozlabs.org, Andrew Morton,
 linux-arm-kernel@lists.infradead.org

On Mon, Feb 11, 2019 at 11:35:24AM -0500, Waiman Long wrote:
> On 02/11/2019 06:58 AM, Peter Zijlstra wrote:
> > Which is clearly worse. Now we can write that as:
> >
> >   int __down_read_trylock2(unsigned long *l)
> >   {
> >           long tmp = READ_ONCE(*l);
> >
> >           while (tmp >= 0) {
> >                   if (try_cmpxchg(l, &tmp, tmp + 1))
> >                           return 1;
> >           }
> >
> >           return 0;
> >   }
> >
> > which generates:
> >
> >   0000000000000030 <__down_read_trylock2>:
> >     30:   48 8b 07                mov    (%rdi),%rax
> >     33:   48 85 c0                test   %rax,%rax
> >     36:   78 18                   js     50 <__down_read_trylock2+0x20>
> >     38:   48 8d 50 01             lea    0x1(%rax),%rdx
> >     3c:   f0 48 0f b1 17          lock cmpxchg %rdx,(%rdi)
> >     41:   75 f0                   jne    33 <__down_read_trylock2+0x3>
> >     43:   b8 01 00 00 00          mov    $0x1,%eax
> >     48:   c3                      retq
> >     49:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
> >     50:   31 c0                   xor    %eax,%eax
> >     52:   c3                      retq
> >
> > Which is a lot better; but not quite there yet.
> >
> >
> > I've tried quite a bit, but I can't seem to get GCC to generate the:
> >
> >   add $1,%rdx
> >   jle
> >
> > required; stuff like:
> >
> >   new = old + 1;
> >   if (new <= 0)
> >
> > generates:
> >
> >   lea 0x1(%rax),%rdx
> >   test %rdx, %rdx
> >   jle
>
> Thanks for the suggested code snippet. So you want to replace "lea
> 0x1(%rax), %rdx" by "add $1,%rdx"?
>
> I think the compiler is doing that so as to use the address generation
> unit for addition instead of using the ALU. That will leave the ALU
> available for doing other arithmetic operation in parallel. I don't
> think it is a good idea to override the compiler and force it to use
> ALU. So I am not going to try doing that. It is only 1 or 2 more of
> codes anyway.

Yeah, I was trying to see what I could make it do.. #2 really should be
good enough, but you know how it is once you're poking at it :-)
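
For anyone who wants to poke at the #2 loop outside the kernel, here is a
minimal userspace sketch. It assumes C11 <stdatomic.h> as a stand-in for the
kernel's READ_ONCE() and try_cmpxchg(), and it assumes a negative count means
"a writer is involved, readers must back off". The names read_trylock_sketch()
and read_unlock_sketch() are made up for illustration and are not kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogue of the quoted __down_read_trylock2() loop.
 * Assumption: count >= 0 is the number of readers, count < 0 means
 * the lock is unavailable to readers.
 */
bool read_trylock_sketch(atomic_long *count)
{
	long old = atomic_load_explicit(count, memory_order_relaxed);

	while (old >= 0) {
		/*
		 * Like try_cmpxchg(): on failure, 'old' is overwritten
		 * with the value currently in *count, so the loop
		 * re-tests the sign without an extra load. The weak
		 * variant may fail spuriously; the loop simply retries.
		 */
		if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}

	return false;
}

/* Hypothetical counterpart: drop one reader reference. */
void read_unlock_sketch(atomic_long *count)
{
	atomic_fetch_sub_explicit(count, 1, memory_order_release);
}
```

The point of the shape is that a failed compare-and-exchange already hands
back the fresh value, which is why the generated loop in the quoted
disassembly jumps back to the test at 33 rather than to the load at 30.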