From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751372AbcFXTSG (ORCPT );
	Fri, 24 Jun 2016 15:18:06 -0400
Received: from merlin.infradead.org ([205.233.59.134]:51135 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750988AbcFXTSE (ORCPT );
	Fri, 24 Jun 2016 15:18:04 -0400
Date: Fri, 24 Jun 2016 21:17:34 +0200
From: Peter Zijlstra 
To: James Bottomley 
Cc: Davidlohr Bueso , mingo@kernel.org, davem@davemloft.net,
	cw00.choi@samsung.com, dougthompson@xmission.com, bp@alien8.de,
	mchehab@osg.samsung.com, gregkh@linuxfoundation.org, pfg@sgi.com,
	jikos@kernel.org, hans.verkuil@cisco.com, awalls@md.metrocast.net,
	dledford@redhat.com, sean.hefty@intel.com, kys@microsoft.com,
	heiko.carstens@de.ibm.com, sumit.semwal@linaro.org,
	schwidefsky@de.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -tip 00/12] locking/atomics: Add and use inc,dec calls
	for FETCH-OP flavors
Message-ID: <20160624191734.GE30154@twins.programming.kicks-ass.net>
References: <1466453164-13185-1-git-send-email-dave@stgolabs.net>
	<1466786765.2343.37.camel@HansenPartnership.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1466786765.2343.37.camel@HansenPartnership.com>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 24, 2016 at 09:46:05AM -0700, James Bottomley wrote:
> On Mon, 2016-06-20 at 13:05 -0700, Davidlohr Bueso wrote:
> > Hi,
> >
> > The series is really straightforward and based on Peter's work that
> > introduces[1] the atomic_fetch_$op machinery. Only patch 1 implements
> > the actual atomic_fetch_{inc,dec} calls based on
> > atomic_fetch_{add,sub}.
>
> Could I just ask why? atomic_inc_return(x) - 1 seems a reasonable
> thing to do to me. Is it because on architectures where atomics are
> implemented in asm, it costs us one more CPU instruction to do the
> extra decrement which gcc can't optimise? If that's it, I'm not sure
> the added complexity justifies the cycle savings.

That boat has sailed; fetch_$op is already implemented (mostly in asm)
for _all_ architectures. All Davidlohr does here is add
fetch_{inc,dec}(v) -> fetch_{add,sub}(1, v) macros because he's lazy.

In any case, fetch_$op is the natural form of atomics that return a
value; Linux has historically chosen the 'wrong' form. The fetch_$op,
test-and-modify, load-store form is what hardware typically does
natively, and it is the form that works for irreversible operations.

Sure, for reversible operations (add/sub) what you say can be (and is)
done, and then we hope the compiler knows that x - x == 0 (and it
typically does). As you say, that's slightly sub-optimal for archs
where the compiler cannot see into the atomic (typically LL/SC archs).

But add/sub were _2_ lines extra after I did all the groundwork for
fetch_{or,and,xor}. So we might as well save those few extra add/sub
cycles. Some of them are in fairly hot paths.

Lastly, and the weakest argument: fetch_$op is what C11 has, probably
because of the above reasons.