From mboxrd@z Thu Jan 1 00:00:00 1970
From: Borislav Petkov
Subject: Re: [PATCH 2/5] bitops: compile time optimization for hweight_long(CONSTANT)
Date: Mon, 8 Feb 2010 10:28:45 +0100
Message-ID: <20100208092845.GB12618@a1.tnic>
References: <1265222875.24455.1020.camel@laptop> <4B69D362.10608@zytor.com>
 <20100204151050.GC32711@aftab> <1265296432.22001.18.camel@laptop>
 <20100204155419.GD32711@aftab> <1265299457.22001.72.camel@laptop>
 <20100205121139.GA9044@aftab> <4B6C93A2.1090302@zytor.com>
 <20100206093659.GA28326@aftab> <4B6E1DA3.50204@zytor.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: Borislav Petkov, Peter Zijlstra, Andrew Morton, Wu Fengguang, LKML,
 Jamie Lokier, Roland Dreier, Al Viro, "linux-fsdevel@vger.kernel.org",
 Ingo Molnar, Brian Gerst
To: "H. Peter Anvin"
Return-path:
Content-Disposition: inline
In-Reply-To: <4B6E1DA3.50204@zytor.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Sat, Feb 06, 2010 at 05:55:47PM -0800, H. Peter Anvin wrote:
> > Well, the example Brian pointed me to - __mutex_fastpath_lock - lists
> > the full set of clobbered registers. Please elaborate on the assembly
> > wrapper for the function, wouldn't I need to list all the clobbered
> > registers there too or am I missing something?
> >
>
> The notion there would be that you do push/pop in the assembly wrapper.

Oh yes, something similar to SAVE/RESTORE_ALL in could work. Good idea!

> >> d) On the other hand, you do *not* need a "memory" clobber.
> >
> > Right, in this case we have all non-barrier like inlines so no memory
> > clobber, according to the comment above alternative() macro.
>
> OK, I'm missing something here.
>
> A few more notions:
>
> a. This is exactly the kind of code where you don't want to put
> "volatile" on your asm statement, because it's a pure compute.
>
> b. It is really rather pointless to go through the whole alternatives
> work if you are then going to put it inside a function which isn't an
> inline ...

Well, in the second version I did replace a 'call _hweightXX' with the
actual popcnt opcode, so the alternatives mechanism is only needed to do
the replacement during boot. We might just as well do

if (X86_FEATURE_POPCNT)
	__hw_popcnt()
else
	__software_hweight()

The only advantage of the alternatives is that they would save us the
if-else test above each time we do cpumask_weight. However, the if-else
approach is much more readable and obviates the need for all that macro
magic and for taking special care when calling a C function from within
asm. And since we do not call cpumask_weight all that often, I'll
honestly opt for the alternative-less solution... Hmm...

Thanks,
Boris.
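
For reference, the if-else variant discussed above could look roughly
like the userspace sketch below. This is not the kernel code from the
patch: cpu_has_popcnt(), hw_popcnt64(), sw_hweight64() and
hweight64_sketch() are hypothetical names, and the GCC/Clang builtin
__builtin_cpu_supports() is assumed here as a stand-in for a kernel-side
test of X86_FEATURE_POPCNT. The asm statement carries neither "volatile"
nor a "memory" clobber, since it is a pure computation.

```c
#include <stdint.h>

/* Hypothetical software fallback: classic parallel bit count. */
static unsigned int sw_hweight64(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ULL;
	w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
	w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	/* Sum the per-byte counts into the top byte. */
	return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Assumption: runtime feature test via the compiler builtin. */
static int cpu_has_popcnt(void)
{
	return __builtin_cpu_supports("popcnt") != 0;
}

/* Hypothetical hardware path: only the flags are clobbered. */
static unsigned int hw_popcnt64(uint64_t w)
{
	uint64_t res;

	__asm__("popcnt %1, %0" : "=r" (res) : "r" (w) : "cc");
	return (unsigned int)res;
}
#else
/* Non-x86-64 builds: always take the software path. */
static int cpu_has_popcnt(void) { return 0; }
static unsigned int hw_popcnt64(uint64_t w) { return sw_hweight64(w); }
#endif

/* The if-else dispatch: one feature test per call. */
static unsigned int hweight64_sketch(uint64_t w)
{
	if (cpu_has_popcnt())
		return hw_popcnt64(w);
	return sw_hweight64(w);
}
```

The feature test on every call is the overhead that the alternatives
mechanism would avoid by patching in the instruction once at boot.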