From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751119AbcEGWaW (ORCPT ); Sat, 7 May 2016 18:30:22 -0400 Received: from ns.horizon.com ([71.41.210.147]:20421 "HELO ns.horizon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751028AbcEGWaW (ORCPT ); Sat, 7 May 2016 18:30:22 -0400 Date: 7 May 2016 18:30:18 -0400 Message-ID: <20160507223018.17111.qmail@ns.horizon.com> From: "George Spelvin" To: sam@ravnborg.org, zengzhaoxiu@163.com Subject: Re: [patch V4] lib: GCD: Use binary GCD algorithm instead of Euclidean Cc: akpm@linux-foundation.org, linux@horizon.com, linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org In-Reply-To: <20160507112308.GA2612@ravnborg.org> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Sam Ravnborg wrote: > sparc64 have an efficient ffs implementation. > We use run-time patching to use the proper version > depending on the actual sparc cpu. > > As this is determinded at config time, then let the > sparc cpu that has the efficient ffs benefit from this. > > In other words - select CPU_NO_EFFICIENT_FFS only for SPARC32. I'm not sure this is the right thing. It's always a function call, and there's boot-time code patching to use either an unrolled binary search or a POPC instructon on processors that have that instruction. The NO_EFFICIENT_FFS isn't much slower than the __ffs version, so the call/return alone might eat the difference, and if the CPU doesn't have POPC support it's definitely a lose. Quite simply, gcd isn't important enough to be worth the same boot-time code patching, and if we have to use one on both types of CPU the the NO_EFFICIENT_FFS path is the safer alternative in case of uncertainty. Would you be willing to try benchmarking it? The baseline code, plus two versions of the __ffs code using the two different __ffs implementations (forced out of line by compiling from assembler source or using inine asm and __attribute((noinline))). By the way, the SPARC64 implementation could be improved. It's currently 5 instructions: __ffs: neg %o0, %g1 xnor %o0, %g1, %o1 popc %o1, %o0 retl sub %o0, 1, %o0 That could be improved to 4: sub %o0, 1, %g1 andn %g1, %o0, %g1 retl popc %g1, %o0